Neighbourhood-based Cross-Language Ontology Matching

Tracking #: 2934-4148

Authors: 
Juliana Medeiros Destro
Gabriel Oliveira dos Santos
Julio Cesar dos Reis
Ricardo da Silva Torres
Ariadne Maria Brito Rizzoni Carvalho
Ivan Ricarte

Responsible editor: 
Jérôme Euzenat

Submission type: 
Full Paper

Abstract: 
Cross-language ontology alignments play a key role in the semantic integration of data described in different languages. The task of automatically identifying ontology mappings in this context requires exploring similarity measures as well as ontology structural information. Such measures compute the degree of relatedness between two given terms from ontology entities. The structural information in the ontologies may provide valuable insights about the concept alignments. Although the literature has extensively studied these measures for monolingual ontology alignments, the use of similarity measures and structural information for the creation of cross-language ontology mappings still requires further research. In this article, we define a novel technique for automatic cross-language ontology matching based on the combination of a composed similarity approach with the analysis of neighbour concepts to improve the effectiveness of the alignment results. Our composed similarity considers lexical, semantic, and structural aspects based on background knowledge to calculate the degree of similarity between the contents of ontology entities in different languages. Experimental results with MultiFarm indicate good effectiveness of our approach, which includes neighbour concepts for mapping identification.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Bo Fu submitted on 28/Dec/2021
Suggestion:
Major Revision
Review Comment:

This paper presents a solution for mapping pair-wise ontologies that are described in different natural languages. The paper also reports results from a series of experiments using the multilingual datasets taken from the OAEI MultiFarm track. The topic is highly relevant to the journal and this work is of interest to the semantic web community, though there are a number of concerns that would require the authors' attention, as discussed in detail below.

Major concerns:

- Originality -

The biggest concern is that the novelty of this work is not immediately clear, considering that utilizing pivot languages (e.g. as previously demonstrated in [10]), similarity measures (e.g. several categories of algorithms to compute similarity measures such as node-based, edge-based etc. in [3]), background knowledge (e.g. [21] uses WordNet synsets), and neighboring entities (e.g. [21] makes use of structural information in the given ontologies that are referred to as source and target semantic surroundings) have all been investigated to date and that these strategies are not new. Moreover, the weighted overlap that was used to generate the composed similarity measure in this paper was originally proposed in [4]. Given these prior research efforts, it becomes critical for this paper to identify its contribution, and in particular, to outline the innovation of this work that advances the state of the art.

- State of the art -

The MultiFarm dataset was first introduced by the OAEI in 2012, and multiple participants have contributed to this track since then, but many of them were not mentioned or discussed in the related work section. For the purpose of completeness and the need to clarify the novelty of this paper, it would be necessary to include reviews of those algorithms that have participated in the MultiFarm track, discuss how they may have motivated this paper, and how this paper differs from these prior efforts within and beyond the OAEI community.

The discussions of the related work are relatively brief. As it currently stands, some may call it a short laundry list, as it lacks one-on-one comparisons to the proposed solution presented in this paper. It would be helpful to include critiques of prior work, e.g. how each prior work relates to this paper, what are their limitations, and how this paper differs/improves upon these limitations etc.

Considering that some of the solutions outlined share certain methods (e.g. background knowledge has been used by a number of algorithms), it would be helpful to present a table that gives a visual overview of the features used by existing solutions. The readers can then easily compare across existing algorithms, and find a set of checkmarks under corresponding column headings (e.g. thesauri, machine translation, structural information, etc.) in this table for each algorithm reviewed. This will also help with clarifying the technical innovation of this paper compared to prior work.

- Technical innovation -

The matching algorithm presented in this paper appears to be an implementation of mix-and-match elements previously proposed [3, 4, 10, 21] without much deliberation on the rationale or novelty. For instance, machine translation comes with a set of challenges that are yet unsolved in their own right: it can be a struggle to translate one natural language into another, let alone to translate each source and target language into a third one. While some scenarios may call for the use of a pivot language, it is uncertain that routinely utilizing pivot languages in every scenario is necessary or beneficial. It is debatable whether doing so is of any benefit, as arguably it may end up doubling the noise (i.e. poorly translated entities) now that machine translation is to be applied twice instead of once. It would be necessary to present convincing evidence, e.g. from experiments, that there are significant benefits that overwhelmingly outweigh the potential drawbacks of using pivot languages in every crosslingual ontology matching scenario.

Furthermore, when choosing a pivot language in the context of this paper, e.g. choosing English over Spanish or some other language, how does one go about making this choice? What are the potential factors that may influence this decision? How does one know the chosen one is the most appropriate pivot language to use amongst all other options? The drivers behind such decisions would hopefully not be a matter of convenience.

Improving the correctness of the mappings was highlighted as a major aim in the paper (p.2 and p.4), and it seems not much effort was invested in improving the completeness of the mappings. Though in some cases, it may be desirable to place greater emphasis on mapping completeness rather than correctness. What is the rationale for focusing on precision over recall (as opposed to focusing on both), considering F-measure was used to evaluate the overall mapping quality?

- Evaluation & Significance -

Given that the F-measures in Table 5 are largely in the range of 0.30+ ~ 0.60+, it is somewhat unconvincing to claim significant improvement over prior methods, especially since only selected results from OAEI 2018 were included as a benchmark in the evaluation, and the claim is not supported by any test of statistical significance.
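For illustration, the kind of check meant here could be a paired test over per-language-pair F-measures of the proposed system against a baseline. A minimal sketch follows, assuming SciPy's wilcoxon function is available; the numbers are placeholders, not results taken from the paper or from OAEI:

    # Paired Wilcoxon signed-rank test over per-language-pair F-measures.
    # The values below are placeholders for illustration only.
    from scipy.stats import wilcoxon

    f_proposed = [0.52, 0.61, 0.38, 0.47, 0.55, 0.43, 0.60, 0.35]
    f_baseline = [0.48, 0.58, 0.36, 0.49, 0.50, 0.41, 0.57, 0.33]

    stat, p_value = wilcoxon(f_proposed, f_baseline)
    print(f"Wilcoxon statistic = {stat:.3f}, p = {p_value:.4f}")
    # A small p-value (e.g. < 0.05) would support a claim of significant improvement.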

Moreover, what is the rationale for designing the experiments using the MultiFarm datasets released in 2015 and the OAEI results reported in 2018 when comparing the proposed solution to other existing matching techniques? Such an experimental setup may give the impression that benchmarks were cherry-picked so that the proposed solution would seem superior. It would be necessary to provide justifications for these decisions in the evaluation.

The experimental results showed that the highest F-measures were generated from English & German ontologies. Is this sufficient evidence of the effectiveness of the proposed solution, or a byproduct of circumstance, i.e. English is a Germanic language and this particular language pair is notably closer, lexically speaking, than other language pairs?

- Clarity -

It is not clear whether Fig. 1 presents a class hierarchy or not. If so, it is inconsistent with is-a relationships as it currently stands. It would be necessary to clarify the entity types. Also, it would be helpful to include examples of what this definition entails for the neighbor concepts of object/data properties & instances (beyond examples for classes only).

It would be helpful to make clear in earlier sections of the paper (before the definitions and the experiments) how the minimum threshold is determined in a given mapping scenario, and how it relates to the default threshold, particularly when both thresholds are in the same [0, 1] range. Likewise, how they relate to "a doubtful range" (also in the [0, 1] range) etc. Is the default threshold always 0.95, or is it set freely?
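To make the question concrete, the interplay between the two thresholds and the doubtful range could be spelled out as a small decision rule. The sketch below is only an assumed reading of the paper; the function name, parameter names, and values are hypothetical:

    # Hypothetical sketch of how a default threshold and a "doubtful range"
    # might interact; names and values are not taken from the paper.
    def decide_mapping(similarity, neighbour_similarity,
                       default_threshold=0.95, doubtful_range=(0.70, 0.95)):
        if similarity >= default_threshold:
            return "accept"                   # confident match
        low, high = doubtful_range
        if low <= similarity < high:
            # Inconclusive score: fall back on the neighbourhood comparison.
            return "accept" if neighbour_similarity >= default_threshold else "reject"
        return "reject"                       # clearly below the doubtful range

Making such a rule explicit early in the paper would answer most of the questions above.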

The example in Fig. 5 seems to present a very convenient scenario for the proposed solution. What if the target ontology had used "individual"/"human" instead of "person"? Could we then argue that Fig. 5 is simply a case of having used labels that are lexically closer?

Minor issues:

It is inaccurate to categorize SOCOM++ as a participant of the OAEI MultiFarm track, as this work concluded in 2011 prior to the introduction of MultiFarm.

“The obtained results indicate that syntactic and semantic similarities may have different weights in order to obtain a good accuracy.”
What is a “good” accuracy, e.g. is 0.8 considered good, or will 0.3 suffice? Do we mean precision, recall, f-measure above a certain threshold?

“There is a growing number of ontologies described in different natural languages.”
It would be helpful to provide some references such as the number of multilingual ontologies by domain/topic/language etc. to demonstrate “growth” as stated.

“This dataset has been extensively used to assess cross-language ontology matching methods.”
It would be helpful to support this statement with some numbers, e.g. since its introduction and over the years, the “extent” to which MultiFarm has been utilized.

“Several approaches have explored the translation effects and the user of a third language in cross-language ontology alignment.”
[9] gives an example of machine translation effects and [10] gives an example of pivot languages, but one example in each case cannot support a claim of “several”.

“…whenever the initial value of composed similarity is in a doubtful range…”
“Doubtful” is typically perceived as a personal feeling; it would be nice to use another phrasing that does not emphasize subjectivity so much. It would be helpful to have the definition of “doubtful range” (which first appears on p.7) in earlier sections of the paper.

Not sure what Fig. 3 contributes since the same information is also presented in Fig. 4. It seems somewhat redundant.

It is not clear why Table 3 is needed, as this information may be condensed in a sentence or two.

Review #2
By Jorge Gracia submitted on 17/Jan/2022
Suggestion:
Major Revision
Review Comment:

This article describes a novel method for automatic cross-lingual ontology alignment, based on lexical, semantic, and structural aspects of the ontology. It combines a number of similarity metrics, relying on Google Translate and BabelNet for the cross-lingual aspects. An analysis of the similarity of neighbouring concepts (at depth 1) is added in case the initial similarity between terms is not conclusive enough. A preliminary evaluation with the MultiFarm track shows promising results.

The paper is well written and organised in general. The topic, cross-lingual ontology matching, is a timely and interesting one. In fact, this is a challenging research area in which there is still a lot of room for improvement. The main novelty of this work, not sufficiently emphasized by the authors, is the use of BabelNet as a source of background knowledge for cross-lingual ontology matching, through the use of NASARI vectors.

The main drawback of this approach is the evaluation setup. The authors tuned their system with the same dataset used for testing. Ideally, they should have split the evaluation data into a development part and a testing part, to carry out the evaluation with alignments not previously seen by the system. Therefore, a comparison with other participant systems in MultiFarm is not completely valid, since MultiFarm participants were evaluated against blind test data. In particular, the authors chose the "Conference" ontology, and 45 language pairs out of the 55 available language pairs in the evaluation dataset. The authors should explain the reasons for this choice of languages.
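A simple way to address this would be to hold out part of the language pairs for testing. A minimal sketch, with illustrative pair names only:

    # Split the available language pairs into development and test subsets,
    # tune thresholds/weights on the former, and report only on the latter.
    import random

    language_pairs = ["en-de", "en-es", "en-fr", "de-es", "de-fr", "es-fr"]  # illustrative
    random.seed(42)
    random.shuffle(language_pairs)

    half = len(language_pairs) // 2
    dev_pairs, test_pairs = language_pairs[:half], language_pairs[half:]
    # Evaluate once on test_pairs, i.e. on alignments not seen during tuning.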

The results shown in Table 6 do not correspond to the ones published by the OAEI'18 organisers. Maybe the authors have filtered the participants' results to the ones of the "Conference" ontology only and only for the 45 language pairs examined, and re-computed the metrics. However, these details are not present in the paper and the source of such numbers is therefore unclear.

Another issue is the fact that MultiFarm considers two types of alignments: type (i) between different ontologies and type (ii) between the same ontology (in different languages). This work only addresses type (ii), which reduces considerably the interest of the evaluation. As stated by the OAEI organisers, "for the tasks of type (ii), good results are not only related to the use of specific techniques for dealing with cross-lingual ontologies, but also on the ability to exploit the identical structure of the ontologies". For this reason, purely monolingual methods that are good at computing structural similarities could give good results in type (ii) but perform poorly in type (i). It would be important to check whether the proposed system behaves well also for type (i) alignments.

The related work analysis is also problematic. The reported SoA of systems participating in MultiFarm is outdated, since there have been two more recent campaigns after 2019. The table of OAEI MultiFarm results is the one from 2018, disregarding the last three OAEI editions. Maybe there is a good reason to compare with 2018 only, which, however, is missing from the document. The table also includes a system not described in the SoA (XMAP).

The source code is not publicly available (it is behind login/password in their institutional repository https://gitlab.ic.unicamp.br/jreis/evocros ), and therefore the reproducibility aspects couldn't be tested properly.

In summary, this work is not yet ready for publication as a journal article, in my view. To address the aforementioned issues, the authors should extend the evaluation setup by considering type (i) alignments and clearly separating development and test data. Related work also needs improvement, as do the other issues described later in this review. It would be a good idea to participate in the next OAEI campaign.

Other issues:

* The classification of the CL matching systems given in the background section is not satisfactory, since the "information retrieval" approaches also include translation-based systems (e.g., KEPLER)

* In the introduction, "As differences between the used alphabets hamper the use of simple string comparison techniques, similarity measures play a key role..." I would add "semantic" here: "...semantic similarity measures play..."

* The authors state, in the background section, that "Our approach differs from the above-mentioned proposals because we combine both semantic and syntactic similarities by computing the composed similarity assigning weights to each similarity measure". This is not completely true, since a weighted combination of syntactic and semantic similarities is something extensively used in OM systems. I see the novelty in other aspects, such as the use of BabelNet and the fact that they do not rely only on a translation system for the cross-lingual aspects, as most other systems do.

* In the literature review, the authors should make clear that the reported methods on background-based ontology matching are just an illustrative sample, because the SoA on that matter is more extensive and the authors' analysis is far from exhaustive. The same applies to combined methods (lexical + structural + semantic based), which are all over the place in the OM literature but of which the authors only mention one work (Nguyen and Conrad).

* In definition 3.1: "Each relation r(c1, c2) in R" should be "Each relation r in R ". The definition of ontology is incomplete since it does not consider individuals.

* The neighbourhood of relations is not defined in Definition 3.2

* Definition 3.3 is not really formulated as a definition (I'd rather expect "We define CL Ontology Alignment as..."). Similarity (s_ij) is not adequate in the definition; I'd say "confidence degree". c1 = "Cabeça" and c2 = "Head" is also wrong in my view: there is no identity relation between the label and the concept (the latter is much more than a label). They could say, instead: concept c1 with associated label "Cabeça", etc.

* In definition 3.4 (mapping), the last sentence unnecessarily restricts the notion of mapping to string similarity. Actually, the authors' method does not fulfil such a definition (they do not only use string similarities).

* Definition 3.5 uses similarity and relatedness in an interchangeable way, while they are not the same concepts. Apparently 3.5 is a definition of similarity in general but ends up defining syntactic similarity (syn).

* When describing NASARI vectors, it is unclear what they mean by "contextual information". This description is a bit obscure and it is difficult to know which information is used to build the concept's vector.

* In Section 4: "These ontologies are converted to an object, preserving the relations and neighbourhood relationship between concepts." It is unclear which object they refer to (an object in the OO implementation?).

* Comparisons are made among entities of the same class, then those mappings above a threshold are kept. What happens if there is more than one candidate mapping involving the same entity? The paper does not mention any selection strategy (other OM approaches use the Hungarian algorithm, for instance; see the illustrative sketch at the end of this list).

* It is unclear which information characterising the ontology entities (e1, e2) is passed to BabelNet apart from the natural language tags and labels, and how the system deals with polysemy when getting the translations (w1, w2).

* Table 4 is unnecessary. They can simply list the chosen thresholds and weights and state that all the possible configurations were run.

* In the experimental section, a quantitative analysis (not only based on examples) of the impact of using the neighbourhood would have been a nice addition, to better justify why this is not always used.

* In section 6 the authors state that "it might be useful considering semantic algorithms such as stop-words elimination and stemming, etc. [...]". Notice that these are NOT semantic algorithms but syntactic ones.
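Regarding the missing selection strategy mentioned above, a one-to-one extraction step over the similarity matrix is a common choice. A sketch using the Hungarian algorithm (via scipy.optimize.linear_sum_assignment) could look like the following; the matrix and threshold are placeholder values, not the paper's data:

    # Extract a one-to-one alignment from a similarity matrix with the
    # Hungarian algorithm; the matrix values are placeholders only.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    similarity = np.array([
        [0.96, 0.40, 0.10],   # source entity 0 vs. target entities 0..2
        [0.35, 0.91, 0.20],
        [0.15, 0.30, 0.88],
    ])

    rows, cols = linear_sum_assignment(similarity, maximize=True)
    threshold = 0.80  # keep only confident correspondences
    mappings = [(int(r), int(c), float(similarity[r, c]))
                for r, c in zip(rows, cols) if similarity[r, c] >= threshold]
    print(mappings)  # [(0, 0, 0.96), (1, 1, 0.91), (2, 2, 0.88)]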

English is good in general. Some suggested improvements:

* Section 1 "Our experiments suggest that the threshold, language in which the ontologies are described and translation tool play an important role" -> "...the language in which the ontologies are described, and the translation tool play an important role"

* There is a strange encoding problem when citing Dowling and Gallier in the background section

* The sentence starting "The threshold 0.95..." in the last paragraph of section 5.1 is not understandable.