LL(O)D and NLP Perspectives on Semantic Change for Humanities Research

Tracking #: 2688-3902

Authors: 
Florentina Armaselu
Elena-Simona Apostol
Fahad Khan
Chaya Liebeskind
Barbara McGillivray
Ciprian-Octavian Truica
Andrius Utka
Giedrė Valūnaitė Oleškevičienė
Marieke van Erp

Responsible editor: 
Philipp Cimiano

Submission type: 
Survey Article
Abstract: 
The paper presents a survey of the LLOD and NLP methods, tools and data for detecting and representing semantic change, with main application in humanities research. Its aim is to provide the starting points for the construction of a workflow and set of multilingual diachronic ontologies within the humanities use case of the COST Action 'Nexus Linguarum, European network for Web-centred linguistic data science'. The various sections focus on the essential aspects needed to understand the current trends and to build applications in this area of study.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Julia Bosque submitted on 07/Mar/2021
Suggestion:
Major Revision
Review Comment:

This survey provides an overview of the main approaches in the Semantic Web and NLP to detect and represent semantic change in the context of digital humanities research, specifically in the study of semantic shift in historical texts.

*Summary*
1. The topic of the paper, at the intersection of humanities and the Semantic Web, is interesting and relevant for advancing a line of research that still poses numerous challenges.
2. The quality of writing is good and the survey is well balanced, with a broad coverage encompassing theoretical standpoints and approaches, tools, repositories and datasets.
3. The granularity and length are also appropriate for the text to serve as an introductory text.

The need for *major changes* is due to several aspects:
- The motivation to combine digital humanities approaches with semantic technologies in the first place, and/or the need of this survey specifically, are points not clearly addressed.
- The survey requires a clearer definition of its scope and limitations.
- The current structure of the paper and the guided thread through the different sections make it difficult for the main message to come through: the sections and subsections alone are clear, but not the connection between some of them or the placement of specific content into specific subsections.

*Detailed review*

MOTIVATION: In order to serve as an introductory text (see survey criteria), the paper should provide some points on the added value of semantic technologies, in particular for digital humanities and the study of semantic change. This would illustrate the motivation for combining the two lines of work for newcomers from either area. Similarly, the contribution of this survey relative to other surveys on the topic (or the lack of them, if applicable) is not mentioned. For readers outside the community of the NexusLinguarum COST Action, the motivation and the gaps in the literature behind the use case that moves the authors to embark on this survey, and how the contribution addresses them, are not highlighted from the very beginning.

SCOPE: The paper focuses on the study of semantic change for a use case requiring the creation of diachronic ontologies. In an advanced section of the text, the authors refer to the challenges of historical text in particular. The scope seems to be specifically historical text in digital humanities, which would leave the studies of lexical semantic shift across the last decades out (e.g. [1]). The focus on historical text is not a drawback of the study, but the manuscript needs a clearer delimitation of its scope.

[1] Wegmann, A., Lemmerich, F., & Strohmaier, M. (2020). Detecting different forms of semantic shift in word embeddings via paradigmatic and syntagmatic association changes. In International Semantic Web Conference (pp. 619-635). Springer, Cham.

STRUCTURE:

1. The first core section of the paper starts with an overview of the theoretical frameworks to study semantic change. The authors organise this section in two subsections, for efforts that "depart from knowledge or from language". I suggest that the main differences between these two blocks are briefly outlined in more detail at the beginning of that section.

2. Relation between sections 2 and 3: In the "knowledge-oriented" block (Section 2) there are approaches that turn to Semantic Web technologies. Later, in Section 3, LLOD formalisms, we also have approaches that use semantic technologies, but these latter approaches are not placed in relation to the theoretical ones from the previous section. This results in no bridge between Sections 2 and 3. For example, the representation of etymologies in RDF via an Etymology class or the properties cognate or derivedFrom seems to be related to the conversion of etymological linguistic resources (e.g. WordNet, dictionaries, etc.) and seems to be closer to the language-oriented approaches than other works in that same section dealing with the representation of concept evolution as RDF, or with the representation of change following a perdurantist approach. The last two examples would be closer to the knowledge-oriented ones from the previous section.

3. Section 3, LLOD formalisms, deals with the representation of semantic shift in the Semantic Web context in general; not all of the works and resources described there specifically address linguistic linked data.

4. Subsection 4.2, NLP tools and normalisation, is focused on normalisation approaches to deal with spelling variation in historical text. Are there any other features of historical text that are challenging which could also be discussed? (a section devoted to challenges of historical text, for example, instead of a subsection for a specific feature). The following subsection (4.3) addresses NER and NEL in historical text, but the content here is not put in relation to semantic shift detection in particular and why NER or NEL play an important role there. When reading, this causes the focus in section 4 to change from "semantic shift detection" to "tools for NLP for historical texts in general" and then back to specific approaches to detect semantic change (next subsections).

5. Subsection 5.3 addresses both tools such as LODifier, LLODifier and CSV to RDF converters, as well as ontology learning tools. The content should distinguish tools that extract information and link entities to available ontologies from those that perform a shallow conversion to RDF and from those that are aimed at ontology learning. This also raises the question whether all tools mentioned there fit into the larger section "NLP for ontology generation" (Section 5), or whether that subsection gathers tools to generate RDF which could be useful for the generation/conversion of LLOD diachronic resources in general, not necessarily ontologies.

6. Section 6, on available diachronic resources, starts with an overview of available LLD resources, their publication, and some resources which are currently being modelled with OntoLex-Lemon. This is a key section given the focus of the paper, so restructuring the next part (p.15 - l.33) in a way that better conveys the authors’ points would help to better understand the state of the art (lack of guidelines, limitations for the representation, lack of ways to ease publication and maintenance of data for non-Semantic Web experts, potentially relevant and available digital resources).

7. Tables (or a single one) to summarize the list of approaches, tools, etc. and their use would help to illustrate the complexity and heterogeneous landscape that is discussed in the conclusions.

MINOR COMMENTS

The whole section 4 deals with NLP techniques and tools to automatically detect semantic change, so subsection 4.1 "Automatic detection of semantic change" could be renamed to “Overview”, otherwise this introductory part seems separate from the one on word embeddings.

TYPOS:

- as well as in the field of Linguistic Linked Open Data (LLOD) -> comma after this (p. 2)
- the terminological point which reveals the development of a terminological notion (p.5, l. 44) -> maybe rephrase
- 5Lexicon Model for Ontologies: Community Report, 10 May 2016 (w3.org)https://www.w3.org/2016/05/ontolex/#semantics -> Missing space
- "LLOD" used throughout the whole paper but both the title and p.6 (l.44) include the mention with parentheses (LL(O)D)
- p. 16, l.24, “learning to deal with the knowledge of the past and its evolution over time, also implies learning to deal with the knowledge of the future” -> remove comma
- Ref. [35] in the bibliography shows all in capital letters

Review #2
By Enrico Daga submitted on 12/May/2021
Suggestion:
Major Revision
Review Comment:

The article is a survey on theories, approaches, and technologies for supporting the identification and representation of semantic change in the context of humanities (particularly historical) corpora.

In what follows I refer to the journal guidelines for reviewing surveys: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.

The article is an interesting introductory text to the topic of semantic change, covering a wide body of literature that is relevant to the topic. I particularly liked the framing of the topic within theoretical frameworks in the humanities and how those notions relate to research in knowledge representation and semantic web (Section 2). Generally, I agree on the value of the distinction between knowledge-based approaches and language-oriented ones. However, maybe the distinction is not that sharp and mainly related to the research communities being isolated?
It could be useful to summarise the findings in a more structured way, for example, providing a (simple) taxonomy or visualisation of the different theoretical approaches and their similarities and differences. In general, I found the article very interesting in the content collated but without adding much to that in terms of insight.

One negative point is related to the (lack of) methodology, i.e. why the papers mentioned were chosen, where they come from, what body of sources was considered and why. Without a clear methodology it is hard to evaluate how comprehensive the survey is (2) and how balanced it is (for example, I expected some work in discourse analysis to tackle issues related to concept change - I am not an expert on that so I cannot claim there is any - but without a methodology section it is hard to assess why it is not there, whether it does not exist or is not relevant, or what else…).

I generally agree on the structure of the survey. However, the sections seem rather disconnected. How do the various NLP methods for detecting semantic change relate to the theoretical frameworks? Are they based on different or the same underlying assumptions on the nature of the problem? The same consideration can be made for sections 4 and 5. Section 5.3 is very similar to 5.1 in content; I recommend joining the two sections, as there is a lot of overlap. I am not sure that tools for converting structured linguistic resources to linked data are relevant to the issue of semantic change.

Finally, the article does not elaborate on the proposed material, e.g. it does not organise it in a taxonomy, there is no discussion on the open problems and challenges and on the research agenda. Overall, it is a (very well written) list of references relevant to the topic which are presented individually but without providing sufficient insight on the state of affairs.

The article is very well written and clear in the presentation (3).
The material is important to the SW community (4).

The article can be improved in the following directions: (a) a methodology that gives the boundaries to the survey, (b) a summary/structure/map of the material, with links across sections, and (c) a discussion of the findings, possibly in the light of a research agenda.

Review #3
By Thierry Declerck submitted on 19/May/2021
Suggestion:
Minor Revision
Review Comment:

I think the survey article is very good. Clearly written and following a clear cross-disciplinary ambition. I also found that the presented and discussed references are quite complete. I didn't find a lot of comments to add. But some questions/suggestions.
1) Is there an inconsistency in introducing references? Sometimes by the name of the author(s), like "Richter accounts for the distinction", and sometimes by the reference numbers, like "[11] proposes an adaption of Kuukkanen’s and Wang et al.’s interpretations".
2) Are there also synchronic "concept shifts" (between (sub-) domains)?
3) The authors mention semasiological and onomasiological approaches. It could be good to add a section on Terminology, as those two aspects are central to this field. And Terminology seems to be a field in which this tension between the desired stability of the concepts and the continuous changes in language use is crucial (see work by Christophe Roche or Sowa).
4) In the description of OntoLex-Lemon (section 3.1), I am missing a description of the LexicalConcept class. It seems to me that this class is important for encoding "conceptual shifts", not least because it includes a dedicated property called "definition". And it also has a link to the ontological world. My feeling is that the authors are focusing exclusively on the "Sense" and "Reference" aspects of the model. If we are talking about concept shifts, why not consider the "LexicalConcept" side of OntoLex-Lemon? Also it seems that Terminology would be best represented around this class and the associated properties (the most important one being "definition").
5) Maybe the authors could have a look at the RDF* W3C Community, which aims, among other things, at adding temporal aspects to edges/properties. This could offer a solution for future work?
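
To make point 5 more concrete, the following is a minimal Python sketch of the general idea of attaching a validity interval to an individual statement, which is the kind of edge-level annotation that RDF* is meant to allow. It is not RDF* itself and uses no RDF library. The names `Triple`, `AnnotatedTriple`, `senses_in`, the `ex:` identifiers and the years are all hypothetical and chosen only for illustration; `ontolex:sense` is used simply as a familiar predicate label.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Triple:
    """A plain subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class AnnotatedTriple:
    """A statement plus a validity interval attached to the statement itself.

    In RDF* Turtle syntax this would look roughly like:
        << ex:entry ontolex:sense ex:sense_old >> ex:validFrom 1500 .
    (ex:validFrom is a hypothetical property, not part of any standard.)
    """
    triple: Triple
    valid_from: int
    valid_until: Optional[int] = None  # None means "no known end"

    def holds_in(self, year: int) -> bool:
        """Return True if the statement is valid in the given year (inclusive)."""
        if year < self.valid_from:
            return False
        return self.valid_until is None or year <= self.valid_until


def senses_in(statements: List[AnnotatedTriple], entry: str, year: int) -> List[str]:
    """List the senses linked to `entry` whose link is valid in `year`."""
    return sorted(
        s.triple.obj
        for s in statements
        if s.triple.subject == entry
        and s.triple.predicate == "ontolex:sense"
        and s.holds_in(year)
    )


# Hypothetical data: one lexical entry whose older sense is gradually
# replaced by a newer one, with an overlap period.
STATEMENTS = [
    AnnotatedTriple(Triple("ex:entry", "ontolex:sense", "ex:sense_old"), 1500, 1800),
    AnnotatedTriple(Triple("ex:entry", "ontolex:sense", "ex:sense_new"), 1750),
]

if __name__ == "__main__":
    for year in (1600, 1775, 1900):
        print(year, senses_in(STATEMENTS, "ex:entry", year))
```

The design choice illustrated here is that the time span belongs to the link between entry and sense rather than to either resource, so the same sense can be reused with different intervals for different entries.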