Glottocodes: Identifiers Linking Families, Languages and Dialects to Comprehensive Reference Information

Tracking #: 2843-4057

Harald Hammarstrom
Robert Forkel

Responsible editor: 
Guest Editors Advancements in Linguistics Linked Data 2021

Submission type: 
Tool/System Report
Glottocodes constitute the backbone identification system for the language, dialect and family inventory Glottolog. In this paper, we summarize the motivation and history behind the system of glottocodes and describe the principles and practices of data curation, technical infrastructure and update/version-tracking systematics. Since our understanding of the target domain (the dialects, languages and language families of the entire world) is continually evolving, changes and updates are relatively common. The resulting data is assessed in terms of the FAIR (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship. As such, the glottocode system addresses an important challenge in the realm of Linguistic Linked Data, one with numerous NLP applications.

Solicited Reviews:
Review #1
By Steven Moran submitted on 04/Sep/2021
Review Comment:

This manuscript was submitted as 'Tools and Systems Report' and should be reviewed along the following dimensions: (1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided). (2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

Review #2
By Jeff Good submitted on 23/Sep/2021
Review Comment:

I reviewed an earlier version of this manuscript and recommended minor revisions to clarify some aspects of the original manuscript which I believed could be made more accurate or clear. My assessment of the earlier version was that this paper should be published, and this assessment has not changed. I also believe that my main points of concern have been adequately addressed in the revised version. So, at this point, I believe that this paper can be accepted. I noticed some minor issues with the English presentation, which suggest that the authors may want to proofread the manuscript carefully one more time, but I don't think these should prevent the paper from being accepted.