Automatic Ontology Population from French Classified Advertisements

Tracking #: 2985-4199

Celine Alec

Responsible editor: Aldo Gangemi

Submission type: Full Paper
Artificial intelligence has become one of the most studied fields in computer science. It aims at building smart machines capable of performing tasks that typically require human intelligence. One of the challenging tasks consists in understanding texts written in natural language. Semantic Web technologies, in particular ontologies, can be used to help agents represent knowledge from a specific domain and reason as a human would. Ontology population from texts aims to translate textual content into ontological assertions. This paper presents an approach to automatic ontology population from French textual descriptions. The approach has been designed to be domain-independent, as long as a domain ontology is provided. It relies on a text-based and a knowledge-based analysis, which are fully explained. Experiments performed on French classified advertisements are discussed and provide encouraging results.

Solicited Reviews:
Review #1
By Ludovica Marinucci submitted on 20/Feb/2022
Major Revision
Review Comment:

The submitted article proposes an automatic approach for ontology population from French textual descriptions.

In its current state, the article needs major revisions due to several shortcomings, including:
1. Narrative and methodological challenges
2. Structure and language issues

Below, I address these issues individually and add some suggestions.

# Narrative and Methodology
The Introduction claims that the proposed methodology is suitable for ontology population from textual descriptions in any domain, even though it is based on a single use case, real estate house sale advertisements, which must be in French.
Unfortunately, the "fake" example presented in Section 3 is not a convincing way to describe the proposed approach, not only because it does not address the issues related to generalizing the approach beyond a specific domain, but especially because of the many linguistic differences between French and English, which are highlighted only incidentally in the Conclusion. The real French data of the use case (ad descriptions, KG, etc.) should be described in detail in Section 3, rather than through a few notes and a Zenodo link in the (too) short Evaluation, in order to extract and define commonalities, patterns and principles applicable to every domain in a structured manner. In this article, the author describes only the process and the population algorithm from a generic perspective.
The originality of the contribution is not clearly addressed, not even in relation to each related work cited, and it is also unclear why many of these works are included at all, as they are not used by the proposed approach. Furthermore, some bibliographic references are not up to date, and they show little familiarity with the field of ontology design as well as with the distinction between ontology and knowledge graph, terms used interchangeably throughout the article.

# Structure and language
Although the quality of the language is acceptable, there are flaws in almost every paragraph, so I suggest proofreading by a native English speaker.
Structure-wise, especially the subsections of Section 3 would benefit from being reorganized. For better readability, each layer in Figure 1 should again be inserted into each subsection where the specific layer is described.
In Sect. 3.1, a graphical representation that helps to visualize the classes, properties and instances of the ontology for the house sales domain would be really useful. To better understand the starting point, the following are also necessary: (i) some hints on the methodology used for its design; (ii) GitHub / Zenodo links (with a detailed README) where each OWL file can be found, and these files should be described in the section. These changes could perhaps shed light on ambiguous sentences, such as "The only condition is to have an ontology whose domain is the object's type" (cf. Introduction).
In Sect. 3.2, at least one example of French text should be described, with the English translation possibly given only in the notes, to better understand how matching is handled.
In Section 3.3, real examples in French would help to better understand (i) the problem of the absence of verbs or of their "meaning", and (ii) how it is possible to treat the KG and the MT indifferently. Furthermore, adding examples from at least one other domain could lend more credibility to the claim that the population algorithm is domain-independent.
In Section 4, it is not clear what the design criteria of the GS are; these should perhaps also have been mentioned at the beginning of Section 3 and represented in Figure 1. Furthermore, it is confusing to find the list of all the tools used during the process and not just those specific to the evaluation.
The Conclusions mention aspects to be addressed in the future that, however, find no footing in the paper as it is currently written. For example, testing the approach with different input ontologies representing the same domain is certainly interesting, but it is more important now to choose and properly detail at least one state-of-the-art ontology for the house sales domain, describing it in the appropriate section (i.e. Section 3). Furthermore, the problem of adapting the approach to other languages is mentioned only incidentally, whereas already in the present article it would be really useful to see examples of the treatment of property matching in French, in order to understand how the approach works and when it is possible, or not, to generalize the process to other languages.

Review #2
Anonymous submitted on 23/Oct/2022
Review Comment:

The submitted paper describes a pipeline to extract an OWL knowledge graph from text related to real estate advertisements.
The authors present the pipeline as an algorithmic method to maximise the useful knowledge to be extracted, through a quite complex pipeline that uses a sentence splitter, a tree tagger, and several algorithms that implement heuristics to achieve the required performance (which shows high accuracy with respect to assertion validity, but is not compared to alternative methods).
While I appreciate pragmatic approaches that include rule-based techniques for knowledge extraction, this work seems too ad hoc to be considered a scientific result valuable for a top journal like SWJ. Originality is low and mainly associated with specific solutions that accommodate the issues encountered in a few examples of a restricted textual domain. Significance, as anticipated, is hard to defend for this journal. The quality of writing should be improved, including not only English typos and syntax, but also the overall narrative, which reads more like a project report than a scientific paper.