ltgoslo/slide

SLIDE

Data and code (still being added) for the paper Multi-label Scandinavian Language Identification (SLIDE) (presented at RESOURCEFUL-2025).

Reproduce metrics (Table 4):

cd src/
./run_all.sh

Reproduce evaluation on nordic_langid (Table 5):

Obtain the data from nordic_langid on Hugging Face and put the *test.csv files into src/evaluation.

cd src/
python3 nordic_langid2jsonl.py 
python3 evaluate.py --method bert --model ltg/SLIDE-base --dataset nordic_dsl_test50k.jsonl
python3 evaluate.py --method bert --model ltg/SLIDE-base --dataset nordic_dsl_test10k.jsonl

The values printed by these commands will differ from those in Table 5 of the paper.

The values in Table 5 were obtained with loose accuracy computed as it is described in the paper.

The current evaluate.py accepts a prediction if it is a subset of the gold languages, rather than if it merely intersects with them. (The values in Table 4 were obtained with this subset criterion.) Although this changes the exact values (by less than 2%), the ranking of the models remains the same.
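The difference between the two acceptance criteria can be illustrated with a minimal sketch. This is not the code from evaluate.py: the function names, the language labels, and the choice to reject empty predictions are assumptions made only for illustration.

```python
# Hypothetical helpers; they are not taken from evaluate.py.

def accepts_subset(pred, gold):
    """Accept if every predicted language is among the gold languages.

    Assumption: an empty prediction is rejected.
    """
    pred, gold = set(pred), set(gold)
    return bool(pred) and pred <= gold


def accepts_intersection(pred, gold):
    """Accept if at least one predicted language is among the gold languages."""
    return bool(set(pred) & set(gold))


def loose_accuracy(preds, golds, accept):
    """Share of examples whose prediction is accepted by `accept`."""
    pairs = list(zip(preds, golds))
    return sum(accept(p, g) for p, g in pairs) / len(pairs)


# Toy data with made-up labels.
golds = [{"nb", "da"}, {"sv"}, {"nn"}]
preds = [{"nb"}, {"sv", "da"}, {"nb"}]

subset_score = loose_accuracy(preds, golds, accepts_subset)
intersection_score = loose_accuracy(preds, golds, accepts_intersection)
print(subset_score, intersection_score)
```

In this toy example the second prediction contains one correct and one wrong language, so it is accepted under the intersection criterion and rejected under the subset criterion. The subset criterion can therefore never give a higher score than the intersection criterion.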
