Neighbourhood-based Cross-Language Ontology Matching

Tracking #: 3102-4316

Authors: 
Juliana Medeiros Destro
Gabriel Oliveira dos Santos
Julio Cesar dos Reis
Ariadne Maria Brito Rizzoni Carvalho
Ricardo da Silva Torres
Ivan Ricarte

Responsible editor: 
Jérôme Euzenat

Submission type: 
Full Paper
Abstract: 
Cross-language ontology alignments play a key role in the semantic integration of data described in different languages. The task of automatically identifying ontology mappings in this context requires exploring similarity measures as well as ontology structural information. Such measures compute the degree of relatedness between two given terms from ontology entities. The structural information in the ontologies may provide valuable insights about the concept alignments. Although the literature has extensively studied these measures for monolingual ontology alignments, the use of similarity measures and structural information for the creation of cross-language ontology mappings still requires further research. In this article, we define a novel technique for automatic cross-language ontology matching based on the combination of a composed similarity approach with the analysis of neighbour concepts to improve the effectiveness of the alignment results. Our composed similarity considers lexical, semantic, and structural aspects based on background knowledge to calculate the degree of similarity between contents of ontology entities in different languages. Experimental results with MultiFarm indicate the good effectiveness of our approach, which includes neighbour concepts in mapping identification.
Tags: 
Reviewed

Decision/Status: 
Reject (Two Strikes)

Solicited Reviews:
Review #1
By Bo Fu submitted on 09/May/2022
Suggestion:
Major Revision
Review Comment:

Thank you to the authors for their changes and clarifications provided in the revision and the cover letter.

It is said that the novelty of this paper lies in the weighted approach, combining various elements found in the existing literature (such as neighboring entities). One distinction the authors have made is how the neighboring entities are utilized, or rather, the timing of their utilization, i.e., to validate the soundness of a potential mapping (in this paper) vs. as an input variable in the generation of a mapping (in prior related work). This is somewhat puzzling, however: as long as these neighboring entities have influenced the final mapping outcome, aren't they inputs?

The title of the paper, as it currently stands (“neighborhood based…”), can easily be interpreted as a method that generates mappings based on discoveries/knowledge learned from the neighboring entities. The notion of weights, the key contribution of this paper, is not emphasized adequately. It may be necessary to reconsider a more accurate title.

Considering that weighted overlap was originally proposed in [4], it is essential that comparisons be made between this paper and the work presented in [4]. In particular, it would be necessary to include a discussion of how this paper differs from or expands on [4]. The manuscript only briefly mentions that “Our investigation explores a Weighted Overlap measure [4] relying on the neutral-domain semantic network BabelNet [5]…” Adding such a comparative discussion would likely strengthen the novelty and the contribution of this paper.

The related work section mentions SOCOM++, but Table 1 presents SOCOM (an earlier prototype with fewer features) instead. It would be necessary to update this row with the correct information for SOCOM++ so that it reflects the existing body of knowledge. I recall that SOCOM++ uses external resources including the Google Translate API 0.5, the Microsoft Translator API (later renamed Bing), the Big Huge Thesaurus API (based on WordNet, the Carnegie Mellon Pronouncing Dictionary, and crowd-sourced suggestions), and synonyms-fr.com [20]. Also, syntactic comparisons (utilizing the Alignment API and the LingPipe API) are used to generate mappings (post translation). Likewise, it is important to ensure that the details for the other systems shown in Table 1 are accurately presented.

One of the headings in Table 1 is “lexicon”, which is not defined in the paper. It may be better to use the terminology outlined in Section 3 for consistency in this table.

Review #2
By Jorge Gracia submitted on 25/May/2022
Suggestion:
Major Revision
Review Comment:

This article describes a novel method for automatic cross-lingual ontology alignment based on lexical, semantic, and structural aspects of the ontology. It combines a number of similarity metrics, relying on Google Translate and BabelNet for the cross-lingual aspects. An analysis of the similarity of neighbouring concepts (at depth 1) is added in case the initial similarity between terms is not conclusive enough. A preliminary evaluation with the MultiFarm track shows promising results.
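To make the reviewed mechanism concrete, the following minimal sketch illustrates the general pattern of a weighted combination of similarity scores with a neighbourhood-based fallback when the initial score is inconclusive. The function names, weights, and thresholds are illustrative assumptions of mine, not the authors' actual values.

```python
def combined_similarity(lex, sem, struct, weights=(0.4, 0.4, 0.2)):
    """Weighted combination of lexical, semantic, and structural
    similarity scores (weights are illustrative, not the paper's)."""
    w_lex, w_sem, w_struct = weights
    return w_lex * lex + w_sem * sem + w_struct * struct

def match(score, neighbour_scores, accept=0.8, reject=0.4):
    """Accept or reject a candidate mapping directly when the combined
    score is conclusive; otherwise fall back on the average similarity
    of depth-1 neighbour pairs (hypothetical thresholds)."""
    if score >= accept:
        return True
    if score < reject:
        return False
    # Inconclusive region: validate via neighbour agreement.
    if not neighbour_scores:
        return False
    return sum(neighbour_scores) / len(neighbour_scores) >= accept
```

Under this reading, the neighbourhood acts as a validator only for borderline candidates, which is the distinction the authors draw from prior work that uses neighbours as direct inputs to mapping generation.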

The paper is well written and organised in general. The topic, cross-lingual ontology matching, is a timely and interesting one. The system uses BabelNet as a source of background knowledge for cross-lingual ontology matching, through the use of NASARI vectors. The main novelty resides, according to the authors, in the weighted combination of semantic and syntactic similarities, leveraging the concept of neighbourhood to improve the correctness of the generated mappings.

This revised version of the manuscript fixes a good number of the minor issues identified by the reviewers, and some of the major ones; for instance, the comparison with the state of the art has been substantially improved. Access to the repository has been granted; however, the repository lacks a README file describing the project.

Despite the interest of this approach and the effort made in improving the paper, some important issues regarding the experimental setup remain open and need to be addressed before a later resubmission:

* The authors run the test on the same data used to tune the system. This is a methodological flaw even if, as stated by the authors, their approach is not based on machine learning models. They derived their optimal thresholds and weights from the same data that is later used for testing (instead of testing with unseen data), which might result in the system behaving well precisely on such data; its ability to generalize remains undemonstrated. This has not changed in the revised version of this work.

* This work only addresses type (ii) (same ontology) alignments in the evaluation, disregarding type (i) (different ontologies), which is the most interesting case. The lack of a type (i) evaluation is a considerable limitation, not sufficiently justified in the paper. This has not changed in the revised version of this work.

A couple of minor remarks:

- The definition of ontology does not consider individuals and is thus incomplete. The authors should define ontology properly and then indicate that the system does not cover individuals, instead of re-defining ontology to accommodate the system's limitations.

- The labels in Russian appear empty.