Using Background Knowledge to Enhance Ontology Matching: a Survey

Tracking #: 1525-2737

Zohra Bellahsene
Clement Jonquet

Responsible editor: Jérôme Euzenat

Submission type: Survey Article
The ontology matching research community has been very active for a decade. Recently proposed state-of-the-art approaches promote the use of external resources or previously discovered mappings as Background Knowledge (BK) for enhancing ontology matching quality. Several important questions related to the use of BK arise: (i) in which cases is the use of background knowledge justified and necessary? and (ii) what is the tradeoff between the complexity of the alignment methods and the background knowledge in terms of matching quality and execution time? Another interesting issue is the selection of the most useful BK for a given ontology alignment task. In this paper, we review the different approaches with respect to the kind of background knowledge used and the ontology matching techniques implemented, and provide a synthetic classification of these approaches. Furthermore, we address the problem of BK selection by providing a review of existing methods. Finally, we provide a comparative experimental review of BK-based alignment systems by analyzing the performance results they obtained during the Ontology Alignment Evaluation Initiative (OAEI) 2012-2016 campaigns. We thus evaluate the benefit of using BK and the improvement achieved by these approaches in comparison to systems that do not use BK.

Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 20/Jan/2017
Review Comment:

This paper is a survey of the existing literature, techniques and state-of-the-art tools on ontology matching using background knowledge. It covers the topic exhaustively, in a very clear and understandable style, so as to be a good introductory text for researchers, PhD students and practitioners who are interested in a readable and comprehensive introduction to the matter.
The survey covers all the different aspects of the topic with a very good balance (introduction, definitions of approaches, main issues, classification of tools, empirical part). The discussion part aims to answer some of the research questions posed at the beginning of the survey.
The material covered in this survey is relevant for the Semantic Web community, as the problem of whether and how to use and make sense of all the already available information to find new related information is still one of the crucial problems for the Semantic Web field. The experimental part of the survey shows how promising approaches using background knowledge are with respect to classical approaches.

Review #2
By Daniel Faria submitted on 27/Jan/2017
Minor Revision
Review Comment:

This survey provides an overview of the use of background knowledge in ontology matching, by classifying and organizing existing approaches, and analyzing the results they have obtained throughout the years in the Ontology Alignment Evaluation Initiative (OAEI).

With respect to the relevance of the topic, the use of background knowledge is extremely important to the Ontology Matching community, and to a lesser degree, to the broader Semantic Web community. It has been the target of enough original research work that I believe a survey of the topic is merited and would be useful to the community.

Regarding the manuscript, in terms of form, I found it well organized and clearly written, despite a number of grammar and word-usage errors which need correcting. In terms of content, I found it fairly comprehensive in its coverage, and accessible as an introductory text on the topic. However, there are several issues that need addressing before the manuscript is acceptable for publication.

Content Issues (in the order in which they appear in the text):

- Section 2.1; Background Knowledge: "we can define it as any external resource of knowledge which the use can improve the matching result"
This definition is erroneous: it corresponds exactly to the definition of a "useful BK" given later in the paper, and clearly the definitions of "BK" and "useful BK" cannot be the same. While the intent of using BK is indeed to improve matching results, intent and outcome do not always coincide. A knowledge source doesn't cease to be BK if it isn't proven useful, and defining it as such wouldn't be very helpful, because BK usefulness can only be proven a posteriori in the cases where a reference alignment exists.
What can be said about BK in OM is that it is any external knowledge resource that provides information, be it lexical or semantic, about the domain(s) of the ontology matching problem or some of the entities therein.

- Section 2.2; first paragraph: "the simple use of lexical resources such as WordNet to enrich concepts with synonyms before matching ontologies is not considered as BK-based matching in this paper"
I'm puzzled by this statement as it seemingly contradicts the authors' broad definition of BK given previously. Why wouldn't the use of BK for lexical enrichment qualify as BK-based matching? This division between the use of BK for lexical enrichment and its use as an intermediate seems arbitrary and artificial, as the two approaches are to some degree interchangeable, and will often lead to similar results. For instance, say that ontology S has the concept "hair" and ontology T has the concept "fur", and that "fur" and "hair" are WordNet synonyms. I can use WordNet as a lexical resource to enrich S and T by adding "fur" as a synonym of "hair" and "hair" as a synonym of "fur" respectively, which will enable me to map the two concepts. However, I can also use WordNet as an intermediate "ontology" (treating its synsets as classes), which will enable me to find S(hair) = WordNet(hair) <=> WordNet(fur) = T(fur), thereby arriving at the same mapping. And please note that this duality is not exclusive to WordNet: BK ontologies may generally be used as intermediates, but it is also possible to use them for lexical enrichment. In fact, this is something done by AML (using UBERON), which accounts in part for its success in the OAEI's Anatomy track.
While I understand that the authors wish the main focus of the paper to be on the "standard" use of BK sources as intermediates, they must also provide some coverage of the use of BK for lexical enrichment if they wish their survey to be comprehensive. Even if they still wish to differentiate between the two approaches, they should at least acknowledge and describe the latter, rather than outright excluding it from the manuscript.
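The duality described in the hair/fur example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of AML or any other system: a hand-made toy table (`SYNSETS`) stands in for WordNet, concepts are anchored by exact label match, and all function names are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical toy stand-in for WordNet: each synset is a set of synonymous lemmas.
# Real WordNet access (e.g. through NLTK) is deliberately not used here.
SYNSETS: List[Set[str]] = [{"hair", "fur", "pilus"}, {"coat", "pelage"}]


def synonyms(label: str) -> Set[str]:
    """Lexical enrichment: the label plus every lemma sharing a synset with it."""
    enriched = {label}
    for synset in SYNSETS:
        if label in synset:
            enriched |= synset
    return enriched


def anchors(label: str) -> Set[int]:
    """Anchoring: indices of the synsets (treated as BK classes) containing the label."""
    return {i for i, synset in enumerate(SYNSETS) if label in synset}


def match_by_enrichment(source: List[str], target: List[str]) -> Set[Tuple[str, str]]:
    """Map two concepts when their enriched label sets overlap."""
    return {(s, t) for s in source for t in target if synonyms(s) & synonyms(t)}


def match_via_intermediate(source: List[str], target: List[str]) -> Set[Tuple[str, str]]:
    """Map S(s) and T(t) when both are anchored to the same BK class."""
    return {(s, t) for s in source for t in target if anchors(s) & anchors(t)}


S = ["hair", "skin"]
T = ["fur", "bone"]
print(match_by_enrichment(S, T))     # {('hair', 'fur')}
print(match_via_intermediate(S, T))  # {('hair', 'fur')}
```

On this toy input both strategies return the same single mapping. They are not identical in general: two identical labels that are absent from the BK are still matched by enrichment, but not by exact-label anchoring.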

- Section 2.2: definitions of "useful BK" and "BK selection".
The definitions given are correct only in the context of using a single BK source. In a multi-BK setting, "BK selection" aims at finding the best combination of BK sources, and the concept of usefulness must consider not only the direct alignment, but also the alignment using other BK sources. Since the authors do contemplate the multi-BK setting in section 4.2, they should provide definitions contemplating the multi-BK setting in addition to those for the single BK setting at this stage.

- Section 3.2.1: " The size of the BK is generally large with respect to the ontologies to be aligned."
This is doubtful and unsupported. In the common case of using an ontology as BK, there is no reason to assume that the BK ontology will be any larger than the ontologies being matched, and I wouldn't expect it to be so on average. It may indeed be larger in some cases (e.g., if you use a broad multi-domain biomedical resource such as the UMLS Metathesaurus as BK), but it may also be smaller (e.g., when the BK source is an upper-level ontology, or when using BK sources that cover only a small part of the large ontologies being matched, as in the SNOMED-NCI matching problem from the OAEI), or be of approximately the same size.

- Section 5 - Table1
Not all the BK selection methods listed and discussed throughout the text are summarized in Table 1. Namely, references [16], [18], [21], and [25] are left out.

- Section 6.1: "AML-BK does not dynamically select its BK but uses a preselected BK that offer a good mapping gain score."
This is untrue. While there is a "preselection" of sorts in the sense that the BK sources available to AML are only a few, AML does test each of them in all biomedical tracks of the OAEI, using the mapping gain methodology to select which source(s) to use in each case. Within the set of available BK sources, there is no preselection of which BK sources are used in each track; for instance, DOID is only useful in SNOMED-NCI, but it is tested by AML in all other biomedical tasks. However, the process is deterministic, and thus we know what BK sources will be selected in each track after testing the algorithm, which is why we listed them in the OAEI paper.
Please correct this statement and revise all subsequent statements on this subject, namely in Section 6.4 and in the last paragraph of the conclusions.

- Section 6.5; first paragraph
While increasing recall is indeed the goal for which we use BK, and some loss in precision can be expected in general by using BK, the authors are overlooking a key piece of the puzzle in this discussion of the OAEI results: the fact that the LargeBio reference alignments are derived from the UMLS Metathesaurus and thus not actual gold standards. Concretely, in the task where the greatest losses in precision are observed (Task2), the reason for these losses is that the NCI Thesaurus includes a small branch on mouse anatomy in addition to its branch on human anatomy, and that by using UBERON as BK, tools obtain mappings between both these branches and the FMA. Now, because the UMLS is focused on human health, it does not include mappings between the NCI's mouse anatomy branch and FMA, but that doesn't mean that such mappings are incorrect. Being a cross-species anatomy ontology, UBERON does include them, and explicitly at that, in the form of cross-references (which are essentially mappings, and are manually curated). Thus, the substantial loss in precision that some tools experience in Task2 is mostly artificial, due to the incompleteness of the UMLS-derived reference alignment.
This does not mean that the authors' final statement of the paragraph is not valid. But they should be careful not to give too much highlight to the issue of precision. Also, the final statement needs to be elaborated as it is too shallow for a non-expert to follow.
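The effect of an incomplete reference alignment on measured precision can be illustrated with toy numbers. The sketch below is purely hypothetical: the mapping identifiers are invented placeholders (not actual NCI, FMA or UMLS entries) and the counts are not OAEI results. It only shows that correct mappings missing from the reference are counted as false positives.

```python
from typing import Set, Tuple

Mapping = Tuple[str, str]  # (source entity, target entity)


def precision(alignment: Set[Mapping], reference: Set[Mapping]) -> float:
    """Fraction of produced mappings that appear in the reference alignment."""
    return len(alignment & reference) / len(alignment) if alignment else 0.0


def recall(alignment: Set[Mapping], reference: Set[Mapping]) -> float:
    """Fraction of reference mappings that the system found."""
    return len(alignment & reference) / len(reference) if reference else 0.0


# Hypothetical placeholders: human-anatomy mappings, present in the reference...
human_branch = {("nci_h1", "fma_1"), ("nci_h2", "fma_2"),
                ("nci_h3", "fma_3"), ("nci_h4", "fma_4")}
# ...and plausible mouse-anatomy mappings that the reference happens to lack.
mouse_branch = {("nci_m1", "fma_1"), ("nci_m2", "fma_2")}

system = human_branch | mouse_branch  # a BK-based system finds both kinds

print(round(precision(system, human_branch), 3))       # 0.667 (incomplete reference)
print(precision(system, human_branch | mouse_branch))  # 1.0 (completed reference)
print(recall(system, human_branch))                    # 1.0
```

Recall is unaffected here; only precision drops, and only because the reference omits the mouse-branch mappings.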

- Section 6.5; second paragraph
Again, this discussion is too shallow. Faced with the evidence that the biomedical domain is the main target of BK usage in the OAEI, the authors should first explain why that is before drawing any conclusions. Concretely, in the biomedical domain there is a marriage of necessity and opportunity: on the one hand, the vocabulary is complex and specialized, which limits the effectiveness of generic lexical resources such as WordNet; on the other hand, there is an abundance of ontologies with overlapping domains, which can be exploited as BK. This combination of factors is particular to the biomedical domain, and thus it does not make sense to expect a comparable use of BK in other domains.
Furthermore, it is not true that BK is not used in other OAEI tracks. For instance, WordNet has been used by a number of tools in the Conference track. However, the authors have (erroneously, as I discussed above) discarded this use of WordNet as not being "BK-based matching". I don't really understand what sources of BK other than WordNet the authors would expect to be useful in this domain.
I also don't understand why "BK-based matching" approaches being adopted in other OAEI tracks would allow for a better evaluation. I agree that it would allow for a broader evaluation, but "better" suggests that there is something lacking in the evaluation in the biomedical domain, which is untrue - it is unquestionable that these approaches are effective in that domain.

- Section 7; question 2:
Here, again, the authors are discarding the use of WordNet for lexical enrichment as a valid use of BK when they state that BK is only used in the biomedical tracks. The reasons behind the focus of BK approaches on the biomedical domain, which I detailed in the previous point, should be used to elaborate and clarify this section.

- Section 7; question 3:
The conclusion of the authors is based upon a false premise. AML does not implement fewer similarity metrics than AgreementMaker employed in the Anatomy track; it implements equivalent metrics but does so more efficiently. Thus, AML's workflow for Anatomy is not simpler than AgreementMaker's; it is merely computationally more efficient. Moreover, the effort towards a more efficient workflow was not motivated by the use of BK, but by the fact that AgreementMaker could not cope with the LargeBio ontologies.
It should also be noted that AML obtained very solid results in Anatomy even without the use of BK (0.886 F-measure, higher than AgreementMaker's best result without BK, from 2010), and the same is also true for LargeBio as well as for other tracks. Thus, while it is true that a large part of AML's success in biomedical tracks is due to its use of BK, and I would also agree that the use of adequate BK can replace more complex matching algorithms, the authors cannot infer the latter from the former.

- Section 7; question 4:
The part about AML performing its selection a priori is erroneous, as detailed above. AML employs the algorithm detailed in reference [16], which has linear time complexity with regard to the size of the ontologies to test as BK. AML does benefit from having a small universe of ontologies to choose from in its OAEI configuration, whereas LogMap has to cope with all of BioPortal, but nevertheless, algorithmically, the two approaches should be comparable. Additionally, LogMapBio is slow not only because of the number of ontologies it considers, but also because it relies on BioPortal's API to access them, which creates a performance bottleneck.

Form Issues:
- The BK acronym:
If the authors define BK as being "Background Knowledge" (the concept), then they should refer to specific objects as "BK (re)sources" rather than simply BK. Alternatively, they should redefine BK to mean "Background Knowledge (re)source" and abstain from using it to refer to the general concept. Using the acronym both to refer to the concept of BK and to the objects that are instances of that concept makes the text confusing.

- Grammar & word-usage errors:
There are a number of these throughout the document, and it needs a careful revision by a fluent English speaker.
In the abstract alone:
"active since a decade" -> "active FOR a decade"
"in which cases the use of the background knowledge is justified and necessary?" -> "in which cases IS the use of the background knowledge justified and necessary?"
The latter is the most common grammar error in the manuscript, occurring multiple times: failure to observe subject-auxiliary inversion in interrogative sentences.
"regarding to the systems" -> either "regarding the systems" or "with regard to the systems"

Review #3
By Marta Sabou submitted on 03/Apr/2017
Major Revision
Review Comment:

This is a survey paper focusing on a sub-topic of the ontology matching field, namely background knowledge (BK) based ontology matching. Several approaches using background knowledge based techniques have been reported in the literature within the last decade. Therefore, this is a timely survey of this particular branch of the ontology matching field. Having said that, the covered material is likely to be interesting to a rather narrow subset of the broader Semantic Web community.

The paper identifies a general process for background knowledge based ontology matching and classifies the various approaches in terms of key criteria in the process. A special focus is put on reviewing the methods used for BK selection. Finally, the authors provide insights into the capabilities of background knowledge based ontology matching techniques by relying on experimental data derived from the OAEI campaigns, which is a strength of the paper.

Nevertheless, the paper in its current form falls short of the expectations of a suitable introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. In particular, the following aspects need improvement:

1) The formalization of the background knowledge based ontology matching problem and relevant subproblems, e.g., the formal description of the evaluation measures, should be improved. Also, this formalization should, whenever possible, be carried over into the main part of the survey, when characterizing individual approaches.

2) The classification of the approaches should be reconsidered. The currently proposed classification scheme captures the existing works from one particular perspective (i.e., dividing between approaches using one or multiple BK sources). Readers interested in other dimensions, e.g., the use of ontological vs. non-ontological BK sources, are at a loss, as this information is rather difficult to collect. Therefore, another approach for categorizing the approaches should be investigated – for example, a table in which all classification criteria have the same weight.

3) Individual matching approaches are often described rather briefly and do not always reflect all classification criteria. More complete details should be provided about the surveyed approaches.

4) The paper dedicates a section to an overview of BK selection mechanisms. To ensure that this survey will serve as suitable material for researchers interested in diverse aspects of this problem, similar sections should be dedicated to the other key stages of the process, e.g., anchoring, relation derivation, aggregation and selection, to provide an overview of typical techniques used for solving those stages.

5) The discussion of emerging topics in the area of background knowledge based ontology matching that should be investigated in the future by the community is very brief. Authors should put more effort into deriving a “research agenda” for this field as a conclusion of their paper.

Readability and clarity of the presentation: While the paper is generally easy to read, the clarity of the presentation should be improved. Besides improving and extending the formalizations used, several smaller comments and typos need correction (see following detailed comments). Overall, a language check by a native speaker would be advisable.

Minor Comments:
1) revise motivation: the vision of the Semantic Web and the need for ontology matching were known before and have significantly broader applicability than the context of Web 2.0.

2) the statement about ontologies being heterogeneous because “they describe the same domain” is counter-intuitive, as one would think the opposite, that is, ontologies that describe the same domain are homogeneous at least from the perspective of the domain they cover.

3) “Classical ontology matching approaches are based only …” is a strong statement made without support from suitable references and without clarifying what “classical approaches” actually means. Consider softening the statement to: “Classical ontology matching approaches are primarily based on…”

4) Sentence “There is agreement that the use of BK improves …” needs reference.

5) Consider using “suitable” instead of “appropriate”

Section 2:
1) What is an “indirect mapping”? Please define. Also, consider rewriting the caption of Fig1 to “Background-based ontology matching.”
2) “F-score” - use F-measure instead to be consistent with equation.
3) Section 2.4 – the meaning of terms used in the formulas that define P/R/F-measure need to be explained in text
4) Section 2 starts with some level of formalization in section 2.1, but this formalization becomes weaker in the rest of the section. The authors should use the formalization of the mapping process to also describe other relevant aspects, such as the meaning of the evaluation measures.
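For reference, the standard precision, recall and F-measure definitions that these comments ask to be explained can be written as follows. The symbols are chosen here for illustration and may differ from the manuscript's notation: $\mathcal{A}$ is the alignment produced by a matcher and $\mathcal{R}$ the reference alignment, both treated as sets of mappings.

```latex
P = \frac{|\mathcal{A} \cap \mathcal{R}|}{|\mathcal{A}|}, \qquad
R = \frac{|\mathcal{A} \cap \mathcal{R}|}{|\mathcal{R}|}, \qquad
F = \frac{2 \cdot P \cdot R}{P + R}
```

P penalizes wrong mappings, R penalizes missed ones, and F is their harmonic mean.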

Section 4:
1) “An important characteristic of upper ontologies are their important size” – please clarify the meaning of “important size”.

Table 1 – consider replacing “Aggregation and Filtering” with “Aggregation and Selection” to comply with Fig. 2.

“wise decision” => reconsider what “wise” means.

Consider including also the following extended version of the work presented in [43]: Marta Sabou, Mathieu d'Aquin, Enrico Motta: Exploring the Semantic Web as Background Knowledge for Ontology Matching. J. Data Semantics 11: 156-190 (2008)

Abstract: regarding to => in comparison to
Intro: “Does the use of BK will lead” => “Does the use of BK lead”

Some sentences lack verbs: “Section 6 the evaluation …”; “There several kinds… ” (Section 2); “Also called BK-matching…” (Section 2.2); “Hence the need…” (Section 3.2.3); “Especially in the case of … ” (Section 3.2.3); “In particular WordNet…” (Section 4.1.1); “A semantic technique …” (Section 4.1.1);

Rest of the paper:
-the formalization of the mapping is printed rather strangely; please revise the character encoding.
-“knowledge which the use” => “knowledge whose use”
-“more rich” => richer
-“then” => “more than”
-“biomdeical” => “biomedical”
-“to match biomedical ontologies matching by” => revise
-“work had reused” => “work has reused”
-“Indeed, BK of textual nature …” => please revise and finalize text in this paragraph
-“varietes” => “varies”
-“Does the use of properly selected BK will lead” => “Does the use of properly selected BK lead”