RelTopic: A Graph-Based Semantic Relatedness Measure in Topic Ontologies and Its Applicability for Topic Labeling of Old Press Articles

Tracking #: 2536-3750

Authors: 
Mirna El Ghosh
Nicolas Delestre
Jean-Philippe Kotowicz
Cecilia Zanni-Merk
Habib Abdulrab

Responsible editor: 
Special Issue Cultural Heritage 2021

Submission type: 
Full Paper
Abstract: 
Graph-based semantic measures have been used to solve problems in several domains. They tend to compare ontological entities in order to estimate their semantic similarity or relatedness. While semantic similarity is applicable to hierarchies, semantic relatedness is adapted to ontologies. However, designing semantic relatedness measures is a difficult and challenging issue. In this paper, we propose a novel semantic measure within topic ontologies, named RelTopic, for assessing the relatedness of instances and topics. To design RelTopic, we considered topic ontologies as weighted graphs where topics and instances are represented as weighted nodes and semantic relations as weighted edges. The use of RelTopic is evaluated for labeling old press articles. For this purpose, a topic ontology, named Topic-OPA, is derived from open knowledge graphs by the application of a SPARQL-based fully automatic approach. The ontology building process is based mainly on a set of disambiguated named entities representing the articles. To demonstrate the performance of our approach, a use-case is presented in the context of the old French newspaper Le Matin. Our experiments show that RelTopic produces more than 80% relevant labeling topics as compared to the topics assigned by human annotators.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Angelo Salatino submitted on 25/Sep/2020
Suggestion:
Minor Revision
Review Comment:

Review

In this paper the authors present RelTopic, an automatic approach to annotate old press articles, specifically focussing on 48 “Le Matin” articles from 1910 to 1937. In particular, this approach is based on two main steps. First, they extract Topic-OPA, an ontology inferred from Wikidata using the entities found within the articles. Then, they map the concepts of the ontology to the content of the articles.
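
For readers unfamiliar with this kind of extraction, the first step could look roughly like the following sketch; it assumes the SPARQLWrapper library and an instance-of/subclass-of property path over Wikidata, which may well differ from the authors' actual extraction queries:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

WDQS = "https://query.wikidata.org/sparql"

def candidate_topics(qid):
    """Collect the classes reachable from a disambiguated Wikidata entity
    via instance-of (P31) / subclass-of (P279) paths."""
    sparql = SPARQLWrapper(WDQS, agent="topic-opa-sketch/0.1")
    sparql.setQuery(f"""
        SELECT DISTINCT ?topic ?topicLabel WHERE {{
          wd:{qid} wdt:P31/wdt:P279* ?topic .
          SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
        }}""")
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    return {b["topic"]["value"]: b["topicLabel"]["value"] for b in bindings}

# candidate_topics("Q42")  # classes reachable from one disambiguated entity
```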

The paper is written very well, and it is particularly easy to read. The only drawback I have to raise regarding its readability is that the paper is excessively long. However, I appreciate the thoroughness and the rigour in describing the approach, which does justice to the number of pages. Indeed, the amount of detail gives the reader a very exhaustive view of the approach.

I found this approach very interesting because it is slightly different from the classic approaches available in the literature, which are based on mapping text/documents to a taxonomy/ontology. This approach looks at the semantic properties of the entities in order to improve the relevance of the mapping.

I will raise my concerns in the order in which they arise while reading the paper.

Page 3, right column, row 38: “c1 and c2 change direction”. What do you mean? Is it because the path goes up and down within the levels of the taxonomy/ontology?

Page 4, left column, row 24. To me, it seems that limitations 2 and 3 are the same. Typically, a taxonomy is a monohierarchical structure and does not contemplate multiple ancestors (a feature of polyhierarchical structures). Therefore, saying that something is applicable only to taxonomies already implies that it won’t work on polyhierarchical structures.

Page 4, right column, row 32. You start the statement with “Natural language processing techniques …”, but you don’t have any reference supporting your claim. You could perhaps add: Salatino, Angelo A., et al. "The CSO classifier: Ontology-driven detection of research topics in scholarly articles." International Conference on Theory and Practice of Digital Libraries. Springer, Cham, 2019.

Another approach worth mentioning in the literature review concerning the automatic generation of ontologies of topics is Osborne, Francesco, and Enrico Motta. "Klink-2: integrating multiple web sources to generate semantic topic networks." International Semantic Web Conference. Springer, Cham, 2015.

Page 6, left column, row 26. The weight of an edge that is not subClassOf or instanceOf is 0.25. How did you decide on this value? Have you tried tuning it to see how the performance of your approach would change?
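
A simple sensitivity study along these lines could look like the following sketch; the 1.0 weight for hierarchical relations and the evaluate()/label_articles() helpers are placeholders of mine, not the authors' code:

```python
# Assign edge weights by relation type, then re-run the labelling pipeline
# over a small grid of values for the non-hierarchical weight.
HIERARCHICAL = {"subClassOf", "instanceOf"}

def edge_weight(relation, non_hierarchical=0.25):
    """Hypothetical weighting scheme: hierarchical relations get a fixed
    high weight, every other relation gets the tunable value."""
    return 1.0 if relation in HIERARCHICAL else non_hierarchical

# for w in (0.10, 0.25, 0.50, 0.75):
#     score = evaluate(label_articles(weight_fn=lambda r: edge_weight(r, w)))
#     print(w, score)
```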

Topic-OPA is built using Wikidata (English), taking advantage of the named entities available within the articles. However, the case-study articles (Le Matin) are written in French. It is not clear to me how this language dependency has been modelled within this workflow. Is there a translation process?

Although I like the settings of the paper, I would have liked to see a slightly bigger evaluation. The authors use only 48 articles from “Le Matin”. I am curious to see how this approach performs on another set of articles, on articles from another time period, on articles in a language other than French, or simply on a slightly bigger corpus, to see how it generalises. Such a result would let the community appreciate this endeavour even more.

Review #2
By Maria Belen Diaz Agudo submitted on 18/Nov/2020
Suggestion:
Major Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

The paper describes RelTopic, a semantic measure for assessing the relatedness of instances and topics in topic ontologies (weighted graphs of topics, instances and semantic relations). The authors also describe a topic ontology, named Topic-OPA, that is derived from Wikidata, and demonstrate the performance within a particular application and use case: the selection of the most relevant topics for labelling old articles from the French newspaper Le Matin.
The paper is well written, and the contributions, motivation and results are clear and understandable.
Scope and Generality of the approach
Even if the RelTopic measure seems general, it is not clear how to use it in other domains, ontologies or applications. The whole approach seems to be very specific.
Although the title suggests that the contributions are on the RelTopic measure, an important part of this paper is the process of building the topic ontology Topic-OPA, derived from Wikidata from the set of named entities of the corpus the authors want to label.
The problem is: “given an article A and a set of entities N collected from A and a topical structure T, the problem is to find the most relevant topics from T that label A”. I would appreciate it if the authors described whether and how this approach covers other well-known applications, for example in semantic search, where semantic relatedness helps to map a user’s specific search query to multiple equivalent formulations.
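For concreteness, this is how I read the quoted labelling problem (the averaging over entities and the top-k cut-off are my own assumptions, not necessarily the authors' aggregation strategy):

```python
def rank_topics(entities, topics, reltopic, k=3):
    """Score every topic t in T by its average relatedness to the entities N
    of article A, then keep the top-k topics as labels. reltopic(e, t)
    stands in for the relatedness measure."""
    scores = {t: sum(reltopic(e, t) for e in entities) / len(entities)
              for t in topics}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```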
Besides, many other measures exist in the realm of domain-independent NLP, and other authors have noted that the lack of domain-specific coverage in the resources used by these measures makes them ineffective for domain-specific tasks. This could be one advantage of your specific approach: do you think that RelTopic covers this gap? It is interesting how RelTopic combines two types of measures, path-based and concept-based, and claims to overcome the limitations of using each type separately. However, this has already been done in the description logics community (using LCS measures combined with role-based measures).
In the evaluation, RelTopic is compared with only three other measures. A more thorough evaluation would be appreciated. Besides, of the three measures, two are only applicable to taxonomies, and only the distance component is compared.
Each entity in N is compared with each topic to compute its relatedness, and the topics are then ranked. Is this process (described in Section 2) scalable to big problems or other applications?
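To make the scalability concern concrete: if each RelTopic evaluation involves a shortest-path search over the weighted graph, a naive labelling run costs on the order of |N| × |T| × (|E| + |V| log |V|) operations per article (one Dijkstra-style search per entity-topic pair). This is my own rough cost model, not one taken from the paper, but it suggests that the process grows quickly with the size of the ontology unless path computations are cached or bounded.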
Regarding the ontology: it is built ad hoc. It is a topic ontology named Topic-OPA, and it is only applicable in the domain of old press articles. I am not sure whether the review of different topic ontologies in the related work section (like the IPCC one for climate research) is very relevant for this research. Is Topic-OPA reusable for other applications?
It is interesting to see the structural measures of Topic-OPA, like the maximum depth (28) and the average depth, but there are many details and unreadable images that are not so useful.
The authors indicate that: “Topic ontology is built adhoc from and for a given corpus of articles and it cannot be compared with other ontologies”. This makes it difficult for me to evaluate and see the generality of the approach. They propose an application-based evaluation approach, measuring how well the measure can label a given article compared with human labelling.
The relatedness measure is interesting and well described. It combines semantic distances, centrality, and the allocation of nodes. The use of configurable weights also makes it usable for other ontologies (although this behaviour is not demonstrated).
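To illustrate what I mean by configurable behaviour, here is a sketch of such a weighted combination; the component names, the conversion of distance into proximity, and the default weights are illustrative only and not taken from the paper:

```python
def reltopic_score(distance, centrality, allocation,
                   alpha=0.4, beta=0.3, gamma=0.3):
    """Hypothetical shape of a configurable combination of the three
    components: a shorter semantic distance, a higher centrality and a
    higher allocation all push the score up. Re-tuning alpha/beta/gamma is
    what would adapt the measure to another ontology."""
    proximity = 1.0 / (1.0 + distance)
    return alpha * proximity + beta * centrality + gamma * allocation
```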
Related work:
The authors claim that few relatedness measures have been designed in the literature, as most efforts are directed at designing similarity measures. This statement needs further justification. Besides, there is a lot of research in other communities, like description logics, social networks and case-based reasoning, that also needs to be referenced.
I would recommend a very related work (not cited), by Ted Pedersen et al., that compares existing path-based measures, context vector measures, and variations that augment path-based measures with information content statistics from text corpora in the biomedical domain.
As this paper seems tightly coupled with the ASTURIAS project (Analyse STructURelle et Indexation sémantique d’ArticleS de presse), I would also appreciate having more information on the project: its type, scope, duration and other contributions.

Review #3
By Silvio Peroni submitted on 28/Dec/2020
Suggestion:
Major Revision
Review Comment:

The authors describe a relatedness measure (a.k.a. RelTopic) based on graphs (which represent an ontology) that can label texts with concepts defining their topics. They accompany the presentation of RelTopic with a section about its evaluation, highlighting the pros and cons of the implementation and tests they provided.

It is undoubtedly an interesting article that addresses a particular problem, i.e. classifying texts through an ontology. The whole process proposed starts from a requirement (and hypothesis) which is to have a good representation of a text via a set of entities available in a knowledge graph (in this context, Wikidata). Thus, supposing that this set is provided for each text to label, the approach proposed is fully automatic.

However, I have some issues with two aspects of the article that must be addressed appropriately. The first one is about the narrative, and the second one is about the evaluation. Both are discussed below, followed by a final remark.

Thus, while I value the authors' article, I believe additional work is needed for it to be appropriate for publication in the Semantic Web journal.

# Narrative

Section 2 of the article ("Problem Definition") highlights the particular research problem the authors are trying to address. The whole work is done in the context of a larger project (ASTURIAS) in which the approach proposed by the authors is just one part of the story (i.e. that of work package 3). It would be better to present ASTURIAS from the beginning of the introduction: which research problems it poses and how the authors want to address them (or part of them), and then focus on the RelTopic algorithm they propose for labelling texts. In this way, the authors can provide a more concrete and pragmatic setting for the general framework they propose. Then, Section 2 should be dedicated only to introducing the research problem under consideration and its initial hypothesis.

Also, since the hypothesis that "disambiguated" entities can be a good proxy for representing a text is a crucial aspect of the RelTopic proposal, it is essential to convince the reader that this hypothesis is valid, in particular in the context of the project. I know that this information should come from the people working on WP2 (and they are not necessarily the same authors as those of this article), but having a clear view of the validity of that hypothesis is crucial to claim the robustness of the approach proposed in this article.

Finally, I think it is necessary to specify why RelTopic and the entire framework for labelling texts have been proposed in the project context. I believe – but this is not explicitly stated in the article – that the framework's goal is to replace humans with software for labelling, in principle, a vast number of texts, something that would require too much human effort to do manually. If this is the real goal, it should be explicitly stated - and, honestly, it has consequences for the evaluation (see below).

# Evaluation

The quality of Topic-OPA and the use of the scores obtained with RelTopic are indeed difficult to assess. The best approach to adopt in these cases, to check the quality of a whole set of technologies, is to see whether (a) Topic-OPA + (b) RelTopic + (c) the automatic labelling process enable one to obtain results comparable to what humans do - in particular, if it is true that the goal is to substitute humans with software in the labelling of texts. Of course, this will not provide a specific evaluation for each framework component (i.e. a-c). Still, it would be appropriate to assess the framework overall, when all its components are used in combination, which is adequate for the project this work is part of.
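
Concretely, the overall check I have in mind is as simple as the following sketch, applied per article and averaged over the corpus (the function and topic names are mine, not the authors'):

```python
def label_precision(system_topics, human_topics):
    """Fraction of topics proposed by the full pipeline
    (Topic-OPA + RelTopic + labelling) that the human annotators
    also chose for the same article."""
    system_topics, human_topics = set(system_topics), set(human_topics)
    return len(system_topics & human_topics) / len(system_topics)

# label_precision({"aviation", "accident"}, {"aviation", "transport"})  # -> 0.5
```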

However, to do that, one should organise an appropriate testing session with several humans. Something is described in Section 8.2.1, but it is very general, and it does not seem enough to convince a reader that the approach works well - i.e. in a way similar to what humans do. Since even humans can contradict each other for specific texts, it would be essential to involve at least three distinct annotators for each text used in the evaluation, to also measure the possible agreement/disagreement between them. My perception is that the authors' framework should show an agreement similar to that shown by humans when they are asked to do the same job. In particular, I think it is not enough to measure how many of the topics identified by the framework match those returned by (one?) human annotator.
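
For example, with three annotators per article, the agreement could be quantified with Fleiss' kappa; a minimal sketch with made-up ratings, assuming the statsmodels library:

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows are articles, columns are annotators, values are the (integer-coded)
# topic each annotator assigned. The ratings below are illustrative only.
ratings = np.array([
    [0, 0, 1],   # article 1: two annotators chose topic 0, one chose topic 1
    [2, 2, 2],   # article 2: full agreement
    [1, 0, 1],
])

table, _ = aggregate_raters(ratings)   # counts per (article, topic category)
print(fleiss_kappa(table))             # agreement beyond chance among annotators
```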

It is also crucial to measure how much the initial hypothesis (i.e. the fact that disambiguated entities may be a proxy of an article) affects the results obtained. In particular, have the entities used as proxies for the texts in the evaluation been obtained using the software developed in WP2? Or have they been selected by hand by the authors of this article? In both cases, how can the authors be sure that such entities are good proxies for the texts used in the evaluation?

Thus, the authors must clearly explain how the testing session was organised, which information was provided to the humans, which newspapers were considered (better if more than one, unless the project only focuses on Le Matin), which data were collected, and which results were obtained. Reading the text, it seems that such a testing session was done naively, without considering all these variables. If that is the case, the authors should run a new testing session to gather meaningful data. Otherwise, if they already took these variables into account, it is necessary to clarify all the corresponding passages in the paper.

The purpose of such an evaluation should be to convince a reader that the authors' framework behaves in a way comparable with humans.

Finally, it would be good if all the data gathered in the evaluation, including the software implementing the framework and the documents describing the protocol used for the testing session, could be made available online for replicability purposes (e.g. on Zenodo, Figshare, Protocols, GitHub).

# A final remark

The approach adopted by the authors seems to be appropriate and generalisable for implementing labelling activities for any text, not only newspaper articles. Is that the case? To what extent? Could the authors elaborate more on this aspect?