Review Comment:
This survey provides an overview of the main approaches in the Semantic Web and NLP to detect and represent semantic change in the context of digital humanities research, specifically in the study of semantic shift in historical texts.
*Summary*
1. The topic of the paper, at the intersection of the humanities and the Semantic Web, is interesting and relevant for advancing a line of research that still poses numerous challenges.
2. The quality of writing is good and the survey is well balanced, with a broad coverage encompassing theoretical standpoints and approaches, tools, repositories and datasets.
3. The granularity and length are also appropriate for the paper to serve as an introductory text.
The need for *major changes* is due to several aspects:
- The motivation for combining digital humanities approaches with semantic technologies in the first place, and/or the need for this survey specifically, is not clearly addressed.
- The survey requires a clearer definition of its scope and limitations.
- The current structure of the paper and the thread running through the different sections make it difficult for the main message to come through: the sections and subsections are clear on their own, but the connections between some of them, and the placement of specific content in specific subsections, are not.
*Detailed review*
MOTIVATION: In order to serve as an introductory text (see survey criteria), the paper should explain the added value that semantic technologies in particular bring to digital humanities and the study of semantic change. This would illustrate the motivation for combining the two lines of research for newcomers from either area. Similarly, the contribution of this survey relative to other surveys on the topic (or the lack of them, if applicable) is not mentioned. For readers outside the community of the NexusLinguarum COST Action, the motivation and the gaps in the literature behind the use case that moves the authors to embark on this survey, and how the contribution addresses them, are not highlighted from the very beginning.
SCOPE: The paper focuses on the study of semantic change for a use case requiring the creation of diachronic ontologies. In a later section of the text, the authors refer to the challenges of historical text in particular. The scope seems to be specifically historical text in digital humanities, which would leave out studies of lexical semantic shift over recent decades (e.g. [1]). The focus on historical text is not a drawback of the study, but the manuscript needs a clearer delimitation of its scope.
[1] Wegmann, A., Lemmerich, F., & Strohmaier, M. (2020). Detecting different forms of semantic shift in word embeddings via paradigmatic and syntagmatic association changes. In International Semantic Web Conference (pp. 619-635). Springer, Cham.
STRUCTURE:
1. The first core section of the paper starts with an overview of the theoretical frameworks to study semantic change. The authors organise this section in two subsections, for efforts that "depart from knowledge or from language". I suggest that the main differences between these two blocks be outlined, briefly but more explicitly, at the beginning of that section.
2. Relation between Sections 2 and 3: In the "knowledge-oriented" block (Section 2) there are approaches that turn to Semantic Web technologies. Later, in Section 3 (LLOD formalisms), there are also approaches that use semantic technologies, but these are not placed in relation to the theoretical ones from the previous section. As a result, there is no bridge between Sections 2 and 3. For example, the representation of etymologies in RDF via an Etymology class or the properties cognate or derivedFrom seems to be related to the conversion of etymological linguistic resources (e.g. WordNet, dictionaries, etc.) and seems closer to the language-oriented approaches than other works in that same section dealing with the representation of concept evolution as RDF, or with the representation of change following a perdurantist approach. The last two examples would be closer to the knowledge-oriented approaches from the previous section.
3. Section 3, LLOD formalisms, deals with the representation of semantic shift in the Semantic Web context in general; not all of the works and resources described there specifically address linguistic linked data.
4. Subsection 4.2, NLP tools and normalisation, is focused on normalisation approaches to deal with spelling variation in historical text. Are there any other challenging features of historical text that could also be discussed (for example, in a section devoted to the challenges of historical text, instead of a subsection for a specific feature)? The following subsection (4.3) addresses NER and NEL in historical text, but the content is not related to semantic shift detection in particular, nor does it explain why NER or NEL play an important role there. When reading, this causes the focus of Section 4 to change from "semantic shift detection" to "tools for NLP for historical texts in general" and then back to specific approaches to detect semantic change (next subsections).
5. Subsection 5.3 addresses tools such as LODifier, LLODifier and CSV to RDF converters, as well as ontology learning tools. The content should distinguish tools that extract information and link entities to available ontologies from those that perform a shallow conversion to RDF and from those that are aimed at ontology learning. This also raises the question of whether all tools mentioned there fit into the larger section "NLP for ontology generation" (Section 5), or whether that subsection gathers tools to generate RDF which could be useful for the generation/conversion of LLOD diachronic resources in general, not necessarily ontologies.
6. Section 6, on available diachronic resources, starts with an overview of available LLD resources, their publication, and some resources which are currently being modelled with OntoLex-Lemon. This is a key section given the focus of the paper, so restructuring the next part (p.15 - l.33) in a way that better conveys the authors’ points would help readers to better understand the state of the art (lack of guidelines, limitations for the representation, lack of ways to ease publication and maintenance of data for non-Semantic Web experts, potentially relevant and available digital resources).
7. Tables (or a single one) summarising the approaches, tools, etc. and their use would help to illustrate the complex and heterogeneous landscape that is discussed in the conclusions.
MINOR COMMENTS
The whole of Section 4 deals with NLP techniques and tools to automatically detect semantic change, so subsection 4.1 "Automatic detection of semantic change" could be renamed to “Overview”; otherwise this introductory part seems separate from the one on word embeddings.
TYPOS:
- as well as in the field of Linguistic Linked Open Data (LLOD) -> comma after this (p. 2)
- the terminological point which reveals the development of a terminological notion (p.5, l. 44) -> maybe rephrase
- 5Lexicon Model for Ontologies: Community Report, 10 May 2016 (w3.org)https://www.w3.org/2016/05/ontolex/#semantics -> Missing space
- "LLOD" is used throughout the paper, but both the title and p.6 (l.44) use the form with parentheses (LL(O)D)
- p. 16, l.24, “learning to deal with the knowledge of the past and its evolution over time, also implies learning to deal with the knowledge of the future” -> remove comma
- Ref. [35] in the bibliography appears entirely in capital letters