Abstract:
Glottocodes constitute the backbone identification system for
Glottolog (https://glottolog.org), the inventory of languages, dialects
and language families. In this paper, we summarize the motivation and
history behind the system of glottocodes and describe the principles
and practices of data curation, technical infrastructure and
update/version-tracking systematics. Since our understanding of the
target domain --- the dialects, languages and language families of
the entire world --- is continually evolving, changes and updates
are relatively common. The resulting data is assessed in terms of
the FAIR (Findable, Accessible, Interoperable, Reusable) Guiding
Principles for scientific data management and stewardship. As such,
the glottocode system responds to an important challenge in the
realm of Linguistic Linked Data, with numerous NLP applications.