When Linguistics Meets Web Technologies. Recent advances in Modelling Linguistic Linked Open Data

Tracking #: 2859-4073

Fahad Khan
Christian Chiarcos
Thierry Declerck
Daniela Gifu
Elena González-Blanco García
Jorge Gracia
Max Ionov
Penny Labropoulou
Francesco Mambrini
John McCrae
Émilie Pagé-Perron
Marco Passarotti
Salvador Ros
Ciprian-Octavian Truica

Responsible editor: 
Philipp Cimiano

Submission type: 
Survey Article
This article provides an up-to-date and comprehensive survey of models (including vocabularies, taxonomies and ontologies) used for representing linguistic linked data (LLD). It focuses on the latest developments in the area and both builds upon and complements previous works covering similar territory. The article begins with an overview of recent trends which have had an impact on linked data models and vocabularies, such as the growing influence of the FAIR guidelines, the funding of several major projects in which LLD is a key component, and the increasing importance of the relationship of the digital humanities with LLD. Next, we give an overview of some of the best-known vocabularies and models in LLD. After this we look at some of the latest developments in community standards and initiatives such as OntoLex-Lemon, as well as recent work which has been carried out in corpora and annotation and LLD, including a discussion of the LLD metadata vocabularies META-SHARE and lime and of language identifiers. In the following part of the paper we look at work which has been realised in a number of recent projects and which has a significant impact on LLD vocabularies and models.

Solicited Reviews:
Review #1
Anonymous submitted on 28/Sep/2021
Review Comment:

The paper presents a very exhaustive state-of-the-art on LLD vocabularies and models, projects, resources.
The provided survey represents a useful reference for a broad audience, including non-experts in the field.

In the revised version of the paper, the authors have properly addressed the previous comments, making the content less repetitive and more concise, and adding summary tables and introductory paragraphs to improve the paper's organization.

One minor remark: Figure 3 seems to be a low-quality image.

Review #2
By Armando Stellato submitted on 30/Sep/2021
Minor Revision
Review Comment:

I have read the improved version of the submitted work and acknowledge the changes that have been brought to satisfy the requests of the reviewers and, in general, the quality of the article.

I am generally satisfied with this revision of the work; I have only a few remarks about the changes in section 2.1. What I flagged as an unclear and unexplained description of the properties of OWL has been replaced with another description that seems closer to an evangelist/seller approach than to a paragraph showing grounded facts or at least true common knowledge about OWL. A few examples:

* “It also facilitates the machine-readable description of the relationship between different definitions of concepts across languages or traditions.” It’s hard for me to get the contribution of OWL here. This is about content and domain semantics…maybe this is, again, a contribution of OntoLex specifically as an OWL vocabulary?

* “Using a machine readable description in OWL, once again, in conjunction with an ontology modelling methodology such as OntoClean [5], and together a more human readable description given as documentation, can help to clarify (according to the expressive limitations of OWL) what we mean when we use a concept like ‘Sense’ or ‘Morpheme’ in a dataset. “
- small note: add “with” in between “together” and “a”
- OntoClean is not a modeling methodology (unless we consider, in a very broad sense, a “modeling methodology” to be anything that supports, to any extent, the development of an ontology; but a methodology about “A” would provide you with the necessary and complete information for performing “A”, which OntoClean does not do with respect to modeling). OntoClean specifically helps in validating the logical consistency of taxonomic relationships by means of meta-properties that are assigned on the basis of the interpretation of the ontology. E.g., if the concepts of student and human are clear in the mind of the developer, student will be considered anti-rigid while human is rigid, and if, by mistake, human is put as a subclass of student, then (a processor implementing) OntoClean will report that this is not acceptable, because a rigid concept cannot be more specific than a non-rigid one. It is surely fascinating in that it provides computational rules for evaluating adequacy and consistency based on the interpretation of the concepts provided by the developers themselves. However, besides being a useful tool for validating our work, it does not contribute much to clarifying the precise meaning of logical terms, as suggested by the authors (that is clarified more by documentation, which the authors mention as well). Also, the expression “according to the expressive limitations of OWL” is not clear: why “according to”?
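
The student/human example can be made concrete with a minimal sketch of the rigidity check. This is only an illustration under stated assumptions: the function name, the meta-property labels and the dictionary-based representation are hypothetical and do not correspond to any existing OntoClean tool or API. It only shows the kind of rule involved, namely that a class tagged as rigid must not be placed under a class tagged as anti-rigid.

```python
# Hypothetical illustration of an OntoClean-style rigidity check.
# The labels and data structures below are assumptions made for this sketch,
# not part of any real OntoClean implementation.
RIGID = "+R"
ANTI_RIGID = "~R"


def rigidity_violations(meta, subclass_of):
    """Return the (sub, sup) pairs where a rigid class sits under an anti-rigid class."""
    return [
        (sub, sup)
        for sub, sup in subclass_of
        if meta.get(sub) == RIGID and meta.get(sup) == ANTI_RIGID
    ]


# Meta-properties assigned by the developer from their reading of the concepts.
meta = {"Human": RIGID, "Student": ANTI_RIGID}

# Intended taxonomy: Student is a subclass of Human.
intended = [("Student", "Human")]

# Mistaken taxonomy: Human was put under Student.
mistaken = [("Human", "Student")]

print(rigidity_violations(meta, intended))
print(rigidity_violations(meta, mistaken))
```

Note that the check depends entirely on the meta-properties supplied by the developer: it validates the taxonomy against that interpretation, but does not by itself say what ‘Student’ or ‘Human’ means.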

Overall, I’m in favor of accepting the paper with minor modifications, but that section needs to be cleaned up and clarified, as it is vague and imprecise.

Review #3
Anonymous submitted on 30/Nov/2021
Review Comment:

In general, the authors have appropriately addressed the issues mentioned in my previous review. In particular, in section 3, thanks to the two tables included and the additional explanations, the authors provide a more comprehensive picture of LLOD models. Section 4 now seems more balanced with regard to the information provided in its subsections and the attention given to the various models, standards and initiatives described there. The summaries included at the beginning of each section and subsection are also of much help. Section 5 has also been improved and now provides a much more comprehensive picture of projects and their relations to previously introduced models and initiatives.
Some additional remarks/suggestions for authors:
I’d suggest that the authors review the introduction (section 1). They claim “The purpose of this article is to provide a comprehensive and up-to-date survey of models used for representing linguistic linked data”, but this is not accurate, since the paper is much more ambitious and covers topics related to LLOD other than models alone.
I am not sure the introduction should give details about specific sub-sections (2.4, 4.1, 4.2) in which certain topics are provided, but rather remain at the level of “section”. Maybe they should consider converting certain subsections (2.4) into proper sections.
The introductory paragraph in section 2 is not really an “introductory paragraph”, but the concluding paragraph of a longer introduction that brings the authors to conclude that there are three topics or trends that are to be analysed with respect to LLD models. I would suggest they re-write this paragraph so that it becomes a real introduction to the section.
In section 2.2 they refer to the projects that will be described in more detail in section 5, but the projects (and sections) mentioned there differ from the ones finally included in section 5.
Minor comments:
• Consider removing this sentence “We will look at one series of FAIR related recommendations for models in Section 3 and see how they might be applied to the case of LLD.”, since it breaks the line of thought at that stage and appears in the last paragraph of section 2.1. “Part of the intention of this article, together with the foundational work carried out in [7], is to provide an overview of what exists out there in terms of LLD-focused models, to look at the areas which are receiving most attention in order to highlight those which are so far underrepresented. In addition in Section 3 we look at the most well known LLD models in the light of a recent series of recommendations on the publication of models as FAIR resources.”
• The English of the paper should be reviewed, especially as regards informal language: “we will look at ‘in the wild’ so to speak”…
• Check the use of capitalised L in Lemon (OntoLex-Lemon model)
• Not sure machine readable licensing has been dealt with in this project; rather, NIF was used for the annotations produced by NLP services. Please check: “Other projects with a significant recent impact on the application of LLD vocabularies include the Horizon 2020 project Lynx: Building the Legal Knowledge Graph for Smart Compliance Services in Multilingual Europe (2017-2021) [127] which has contributed to data modelling in the area of machine readable licensing, a topic that is much broader than the area covered by our survey”.
• Other minor errors:
o Section 2.2: along with an extended descriptions of a number of projects
o Section 2.3 emphases
o Section 3: . Every one of the models listed in the table at is an OWL ontology..
o Section 4.1: (one of which has been published and two which are still currently under development)
o Section 5: bracket missing in LiODi Section…; NexusLinguaram
o Section 5.1: They also include FREME180 which explored the application of the NIF and lemon;