Background Knowledge in Ontology Matching: A Survey

Tracking #: 2952-4166

Authors: 
Jan Portisch
Michael Hladik
Heiko Paulheim

Responsible editor: 
Jérôme Euzenat

Submission type: 
Survey Article
Abstract: 
Schema matching is an integral part of the data integration process. One of the main challenges in schema matching is semantic heterogeneity, i.e., modeling differences between the two schemas that are to be integrated. The semantics within most schemas are, however, typically incomplete because schemas are designed within a certain context which is not explicitly modeled. Therefore, external background knowledge plays a major role in the task of (semi-)automated schema matching. In this survey, we introduce the reader to the general schema matching problem as well as to the ontology matching problem, which can be seen as a special case of the schema matching task. We review background knowledge sources as well as the approaches applied to make use of external knowledge. Our survey covers all ontology matching systems presented between 2004 and 2021 at a well-known ontology matching competition, together with systematically selected publications in the research field. We present a classification system for external background knowledge, concept linking strategies, and background knowledge exploitation approaches. We provide extensive examples and classify all ontology matching systems under review in a resource/strategy matrix obtained by coalescing the two classification systems. Lastly, we outline interesting and yet underexplored research directions for applying external knowledge within the ontology matching process.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 01/Feb/2022
Suggestion:
Minor Revision
Review Comment:

In direct answer to the previous reviews, the authors have reduced the scope of the survey. The text has been re-organised accordingly, including the addition of a new section dedicated to introducing background knowledge resources in ontology matching.

However, when I start reading the abstract, I still have the impression that the survey is on schema matching. The first lines of the abstract should therefore be revised, and the same holds for the introduction. The survey is now on ontology matching.

In the introduction, contextual matching has been introduced ("context-based matching, i.e. matching with intermediate resources"); the same should be done for "complex matching".

What about adding a third search parameter, "ontology mapping"?

It is interesting to have the inclusion and exclusion criteria for the papers considered in the survey (and to see all the big tables adjusted accordingly).

"In the area of semantic modeling, ontologies are typically used" => references?

"In this article, we also cover papers and systems which address the ontology integration problem where background knowledge plays a significant role in the matching phase" => this could be clarified in the introduction.

I still think that introducing precision, recall, etc. does not bring much, as the evaluation of the systems is not addressed.

Table 3 refers to OAEI 2020, while Figures 3 and 4 refer to OAEI 2021 (are the best 2020 performances preserved in 2021?).

Figure 9 could appear earlier in the text.

Besides these minor comments, the authors have carefully taken into account the comments of all reviewers.

Review #2
Anonymous submitted on 13/Feb/2022
Suggestion:
Accept
Review Comment:

This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

Review #3
Anonymous submitted on 14/Feb/2022
Suggestion:
Minor Revision
Review Comment:


I thank the authors for their thorough revision of the survey paper.
I find it much improved, with some minor aspects that could be touched up:

#1 The impact of BK on OM performance is addressed very briefly, with a new reference to Dragisic et al. 2017. I believe that this is a crucial topic for this survey, since it is a strong motivator for work in this area, and I would like to see it discussed in more detail. Is using BK worth it? This is a question this survey should aim to answer. A previous review indicated several works where results on this were presented but which were not included in the revision.

#2 There is a very interesting discussion on the multiple biases found. However, the conclusions only report on such biases and not on possible mitigation strategies. A clear one, alluded to in a previous section, is that the biases are directly influenced by the existence of public BK sources and by the existence of benchmarks at the OAEI. The authors should take this opportunity to emphasize the importance of establishing benchmarks and open data to support research impact.

#3 “While multiple automatic background knowledge selection approaches have been proposed (see Section 3.3)”: this topic is no longer addressed in Section 3.3.

#4 Figures 3 and 4 are not entirely correct. I am pretty sure more systems competed in OAEI 2021 than are shown.

#5 It is relevant to clarify that BioPortal contains upwards of 800 ontologies. This is relevant to understand that LogMapBio is not working over a handful of hand-picked ontologies (as AML does, for instance), but rather over a very large repository of ontologies, which actually tackles the challenge indicated in Section 8.