Glottocodes: Identifiers Linking Families, Languages and Dialects

Harald Hammarström
Robert Forkel

Glottocodes constitute the backbone identification system for the language, dialect and family inventory Glottolog. In this paper, we summarize the motivation and history behind the system of glottocodes and describe the principles and practices of data curation, the technical infrastructure, and the systematics of updates and version tracking. Since our understanding of the target domain (the dialects, languages and language families of the entire world) is continually evolving, changes and updates are relatively common. The resulting data is assessed in terms of the FAIR (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship. As such, the glottocode system responds to an important challenge in the realm of Linguistic Linked Data, with numerous NLP applications.
