lexibank/northperulex

CLDF dataset derived from Ugarte et al.'s "NorthPeruLex - A Lexical Dataset of Small Language Families and Isolates from Northern Peru" (forthcoming).

CLDF validation

How to cite

If you use these data, please cite:

  • the original source

    Ugarte, Carlos and Blum, Frederic and Ingunza, Adriano and Gonzales, Rosa and Peña, Jaime. Forthcoming. NorthPeruLex - A Lexical Dataset of Small Language Families and Isolates from Northern Peru.

  • the derived dataset, using the DOI of the specific released version you used

Description

This dataset brings together lexical data from isolates and small language families of northern Peru to investigate their historical relations.

This dataset is licensed under a CC-BY-4.0 license.

Conceptlists in Concepticon:

Notes

Work in progress.

Statistics

Glottolog: 97%, Concepticon: 100%, Source: 100%, BIPA: 100%, CLTS SoundClass: 100%

  • Varieties: 35 (linked to 34 different Glottocodes)
  • Concepts: 200 (linked to 200 different Concepticon concept sets)
  • Lexemes: 4,804
  • Sources: 16
  • Synonymy: 1.11
  • Invalid lexemes: 0
  • Tokens: 27,168
  • Segments: 342 (0 BIPA errors, 0 CLTS sound class errors, 337 CLTS modified)
  • Inventory size (avg): 37.43
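
The segment-related figures above can be illustrated with a small sketch. It assumes that "Tokens" counts every segment occurrence across all lexemes, that "Segments" counts distinct segments in the whole dataset, and that "Inventory size (avg)" is the number of distinct segments per variety, averaged over varieties. The function name `segment_statistics` and the sample rows are hypothetical; the column names `Language_ID` and `Segments` are assumed to follow the usual CLDF wordlist defaults, and real data may contain boundary markers that this sketch does not filter out.

```python
from collections import defaultdict


def segment_statistics(forms):
    """Compute token count, distinct segments, and mean inventory size.

    `forms` is an iterable of dicts with a 'Language_ID' key and a
    'Segments' key holding space-separated segments (an assumption
    about the table layout, not taken from this dataset).
    """
    tokens = 0
    distinct = set()
    inventories = defaultdict(set)  # variety -> set of its segments
    for row in forms:
        segments = row["Segments"].split()
        tokens += len(segments)
        distinct.update(segments)
        inventories[row["Language_ID"]].update(segments)
    if inventories:
        avg = sum(len(inv) for inv in inventories.values()) / len(inventories)
    else:
        avg = 0.0
    return {
        "tokens": tokens,
        "segments": len(distinct),
        "inventory_size_avg": avg,
    }


# Hypothetical sample rows, not taken from NorthPeruLex.
sample = [
    {"Language_ID": "lang_a", "Segments": "p a t a"},
    {"Language_ID": "lang_a", "Segments": "k a"},
    {"Language_ID": "lang_b", "Segments": "t i"},
]
stats = segment_statistics(sample)
print(stats)
```

From the reported figures alone, 27,168 tokens over 4,804 lexemes gives roughly 5.66 segments per lexeme, and 4,804 lexemes over 35 varieties gives roughly 137 lexemes per variety.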

Contributors

| Name | GitHub user | Description | Role |
| --- | --- | --- | --- |
| Carlos Ugarte | @MuffinLinwist | Data collector, CLDF conversion and annotation | Author, Editor |
| Frederic Blum | @FredericBlum | CLDF conversion and annotation | Author, Editor |
| Adriano Ingunza | @BadBatched | Data collector and annotation | Author |
| Rosa Gonzales | @rosalgm | Data collector and annotation | Author |
| Jaime Peña | @JaimePenat | Data collector and annotation | Author |

CLDF Datasets

The following CLDF datasets are available in cldf:
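
As a rough illustration of working with the files in `cldf`, the sketch below counts lexemes per variety using only the Python standard library. The file name `forms.csv` and the column `Language_ID` are assumptions based on common CLDF wordlist conventions and are not confirmed by this README; the helper `count_forms_per_language` is hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def count_forms_per_language(forms_path):
    """Return a Counter mapping each Language_ID to its number of rows.

    Assumes a comma-separated file with a header row that includes a
    'Language_ID' column (an assumption about the table layout).
    """
    with Path(forms_path).open(encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle)
        return Counter(row["Language_ID"] for row in reader)


if __name__ == "__main__":
    # Assumed location of the forms table inside a local clone.
    path = Path("cldf") / "forms.csv"
    if path.exists():
        for language, n_forms in count_forms_per_language(path).most_common():
            print(language, n_forms)
    else:
        print(f"{path} not found; adjust the path to your local copy.")
```

For anything beyond quick counts, a dedicated CLDF reader that interprets the dataset's metadata file is likely a better fit than parsing the CSV directly.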