# Survey on complex ontology matching

### Tracking #: 1879-3092

Authors:
Elodie Thieblin
Ollivier Haemmerlé
Nathalie Hernandez
Cassia Trojahn dos Santos

Responsible editor:
Marta Sabou

Submission type:
Survey Article

Abstract:
Simple ontology alignments, largely studied in the literature, link a single entity of a source ontology to a single entity of a target ontology. One of the limitations of these alignments is, however, their lack of expressiveness, which can be overcome by complex alignments. While diverse state-of-the-art surveys mainly review the matching approaches in general, to the best of our knowledge, there is no study taking into account the specificities of the complex matching problem. In this paper, an overview of the different complex matching approaches is provided. This survey proposes a classification of the complex matching approaches based on their specificities (i.e., type of correspondence and guiding structure). The evaluation aspects and the limitations of these approaches are also discussed. Insights for future work in the field are provided.

Decision/Status:
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 24/Jun/2018
Suggestion:
Major Revision

Review Comment:

My apologies to the authors for the long delay in reviewing.

The paper is a survey on automated techniques for discovering complex correspondences between ontologies. A complex correspondence is one where at least one of the entities that are related is a compound entity, such as a concept union or intersection. The paper tries to classify and compare the systems based on characteristics that are specific to complex alignments, rather than reusing the typical features of all ontology matching systems. These characteristics are: type of correspondence, guiding structures, and member expression pre-definition.

The main contributions could be summarised as follows:
- it provides a single entry point to existing work in complex ontology matching;
- the systems of complex matching are classified along the dimensions of comparison listed above;
- it provides a number of challenges that could serve as the basis for the research of someone who would like to investigate this area (such as a doctoral student).

The survey is pretty good in terms of coverage, investigating the notion of ontology matching in a broad sense (it includes references to database schema matching, as well as other forms of "ontologies"). There are references to algorithms, implementations, matcher evaluation metrics and datasets.

One missing part could be the representation aspect: how are complex alignments represented, stored or saved, exchanged, etc.? On this aspect, there exists at least EDOAL, an Expressive and Declarative Ontology Alignment Language (http://alignapi.gforge.inria.fr/edoal.html), which can be used with Inria's Alignment API. In case a more formal bibliographic reference is needed, the language is directly inspired by Knowledge Web deliverable 2.2.10: Jérôme Euzenat, François Scharffe and Antoine Zimmermann. D2.2.10: Expressive alignment language and implementation. FP6 Knowledge Web deliverable, 2007.
It could be interesting to investigate how complex alignments are represented, especially in systems that do not deal with Web ontologies.

With that said, I find two things to argue against the paper, not in a rejecting manner but in a "needs improvement" way:

1) There are a number of technical issues that should be fixed before the paper is accepted. Most importantly, there are important problems in the formalisation. There are also several inaccuracies throughout the paper. I give thorough details about this below.

2) The choices of dimensions for classifying the complex matchers should be better justified. The paper apparently just asserts that those will be the ones used for the survey, without a clear motivation for rejecting other possibilities. The discussion even says that there could be other dimensions to study the approaches, yet it does not explain why they have not been chosen after all.

For these reasons, I request that a strongly revised version of the paper be resubmitted.

1. Introduction:
   - in the motivating example, after "complex correspondences are needed", the formulas are strange. They look like meta statements of first-order logic. The symbol \equiv is used in FOL literature to mean "the FOL formula on the left of the symbol is logically equivalent to the FOL formula on the right". If we take a look at Item 1, the left-hand side of the \equiv symbol is \forall x,y o1:priceInDollars(x,y). Every arrangement of the universe that makes this true must have every pair of things related by the predicate priceInDollars. On the right-hand side, we have o2:priceInEuro(x,conversionFunction(y)). The truth of this statement depends on what assignment we make for x and y, which are free variables in this formula. Clearly, if the left-hand side of \equiv is true, there is no reason that the right is true as well. So the equivalence is clearly wrong. It is quite probable that what the authors mean in fact is \leftrightarrow instead of \equiv.
     In this case, Item 1 becomes a single FOL formula which indeed expresses the fact that the price in dollars of something can be converted into the price in euros of the same thing. Assuming this is the case, then Item 3 is problematic: whether the formula is true or false depends on the assignment we make for y (which is a free variable). In this case, the formula should start with \forall x \exists y.
   - The meaning of Fig.2 is unclear. Why is "Data Models" here? Are all data models equally expressive? Are data models even knowledge representation models? What is "General Logic"? Is XML a knowledge representation model? etc.
2. Background:
   - "these appraoches are out of the scope of this study" -> why can't the work here be applied to them?
   - Sec.2.2 "those are out of the scope of this survey" -> why can't the results be applied to them as well?
   - Sec.2.3:
     * the definition of correspondence is never used anywhere. There are only pseudo-FOL formulas like the ones discussed above.
     * if a correspondence includes a value $n$, then it's not a triple (e1,e2,r) but a quadruple (e1,e2,r,n). If it is a triple, then don't mention this n. You do not use, or need, this n, anyway.
     * in the item list just following the definition of correspondence, there are so-called correspondences that do not follow the definition. The first one could be expressed like this "(o1:Person,o2:Person,\equiv)", but the next one is less clear. With a language like EDOAL, the 5 examples can be expressed as triples following Def.2.
3. Classification:
   - In Sec.3.2, in addition to the formulas using \equiv, there is one that has \sqsubseteq. It seems that this is used to express \rightarrow (implication) instead.
4. Complex alignment approaches:
   - The "type of knowledge representation model" is sometimes strange. First, since this is a survey on complex ontology matching, all approaches are matching ontologies (in a broad sense).
     So to say that an approach [for complex ontology matching] is for "ontology to ontology" is a bit strange. It seems that, by "ontology" here, you mean something more specific, like OWL ontologies or DL-based ontologies?
   - Sometimes, there is "relational database schema", sometimes "database schema", sometimes just "schema". What does each of these mean?
   - What is a "conceptual model"?
   - Svab-Zamazal and Svatek is not easy to follow. There should be an example, like in most other descriptions.
   - on p10, there are strange notations:
     * Table 2 starts with pattern forms that reuse the pseudo-FOL notation used so far. Then it uses a different notation with "contact", "union", "substr", then it uses curly brackets. There are strange equalities (maybe they are supposed to be equivalences in correspondences?)
     * Is "union" different from disjunction?
     * what are "v", "v1", "v2", etc.? Constants? Free variables? Existentially quantified variables?
     * the pattern forms do not have the provenance of the terms as in other examples (they use "p(x)" instead of "o1:p(x)")
     * in Ex.3, there is a mixture of FOL-like notation, the DL symbol \sqsubseteq, and the RDF term "rdf:type"! Please use a single representation for all correspondences.
   - in general, the examples used in the whole section are not very illustrative. They look more like the general case (with generic names like A, B, p1, p2) rather than actual examples.
   - in "Wu et al.", the notation "{passengers}={adults,children,seniors}" is disturbing. With the common interpretation of curly brackets, equal sign, and commas, we have that a singleton is equal to a 3-element set!
   - In Sec.4.3 "An et al." there is one more new notation "u \approxequiv s" which is not explained.
   - In Sec.4.6: "Table 3 ... the needed input" and later "with respect to the kind of input they exploit". It seems that the input mentioned in Table 3 is of a different nature from the one mentioned later.
     In fact, it seems that Table 3 does not really mention the input of the matching process (which should at least take 2 ontologies) but some other extra input. This is not really explained.
5. Evaluation:
   - on p22, end of Sec.5: yet another notation (DL-like this time) is used for expressing correspondences.
6. Discussion:
   - "there is a clear distinction between the approaches based ..." -> the distinction may be clear at this point for the authors, but it would be good to make explicit what distinguishes them clearly.
   - on p23, first column, other characteristics not used in the survey are mentioned as possible ways of classifying the approaches. But we would like to know why they have not been retained. As a result, the classification dimensions chosen in the paper seem a bit arbitrary (or at least, not too well justified).

Here are smaller issues (typos, grammar, etc.):

1. Introduction:
   - "Largely speaking" -> "Broadly speaking"
   - "e.g. ... etc." -> use either "e.g." or "etc.", not both
   - "Two 'paradigms' organise the field" -> what's described is hardly a paradigm. Moreover, why use single quotes?
   - "to fully overcome ontology conceptual heterogeneity" -> "to fully overcome conceptual heterogeneity"?
   - "a survey on ontology matching resaerchers" -> "research"?
   - "for different tasks [4], data translation [5]" -> ref [5] is clearly not about data translation. It seems refs 4 and 5 should be inverted.
   - the outline of the paper should be the last thing presented in the introduction. If a motivating example occurs after the outline, it should be in a separate section. If the motivating example is part of the introduction, then present it before explaining how the paper is organised.
   - "Consider three toy ontologies" -> why qualify them as "toys"? The figures could equally depict portions of large, complex ontologies.
   - "can help automatising the task" -> "can help automatise the task"
   - "will lead to a loss in information" -> "loss of information"
   - The motivating example ends abruptly, with no transition to what follows.
2. Background:
   - The quotation at the beginning of the section is not useful at all.
   - Sec.2.2: "for the o2:accepted property ." -> delete the extra space before the dot
   - Sec.2.3: it would be good to define an ontology alignment after correspondence.
3. Classification:
   - Sec.3.1 "the first one includes the matching process is guided"
   - Sec.3.2
     * "traducing" -> translating
     * "different matching strategy" -> strategies
4. Complex alignment approaches:
   - In Sec.4.1 "the labels of the ontologies entities" -> of the ontology entities
   - In 4.2:
     * "CGLUE, also presented in [30] is" -> missing comma before "is"
     * "Some of the searchers, use" -> delete comma
     * "from the target, schema." -> delete comma
     * "are ciloared with help of" -> with the help of
     * "It alignes" -> aligns
   - In Sec.4.4
     * "a XML schema" -> an XML schema
     * "between the schema's attributes" -> the schema attributes
     * "the ontology's data-properties" -> "ontology data properties" or "the data properties of the ontology"
     * in "Nunes et al.": """Each "individual" of""" -> why quotes? Moreover, they should be opening quotes and closing quote, not straight quotes
     * in "De Carvalho et al." """its "individual". Each "individual"""" -> idem
   - In Sec.4.5:
     * "the highest FOIL gain" -> what's this?
     * In BMO """into a "document"""" -> why quotes? use opening and closing quotes
     * "an Apriori algorithm" -> a priori
   - In Sec.4.6:
     * "only to Semantic Web" -> to the
     * "Very few approaches are available online" -> Very few implementations (the approaches themselves are all accessible online)
     * "on a guiding structures" -> structure
     * "with respect the kind" -> with respect to
     * in Table 3, the 3rd approach has lower case "onto"
5. Evaluation:
   - First sentence: it is not very interesting to know that some surveys did not address the complex matching perspective. It would be good to know if there is a survey that addressed it. If it's not the case, then the sentence should be that no survey addresses the problem.
   - In Table 4, there are footnote marks, but the corresponding footnotes are not there. In LaTeX, you can't directly put footnotes in tables; you need a little trick with \footnotemark and \footnotetext.
   - Fig.5: "Clio" is in the ontology-based systems rather than instance-based systems, but in Table 3, it has "matched instances" as its input.
   - Table 6 has missing footnotes.
6. Discussion:
   - "into two 'classes'" -> why quotes? Why are they single and not double?
   - "e.g., ..., etc." -> choose between e.g. and etc.
   - Ex. 5 could easily be inlined in the text rather than in an Example environment
   - "an input resources" -> resource
   - "Another aspect refers to the kind of relations of a correspondence generated" -> of a generated correspondence(?)
   - "hybrid" in quotes, why?
   - "function type For example" -> missing full stop
   - Ref.70 is the same as ref.13 with a missing author.
   - regarding the discussion on tickets, children, etc.: yes, tickets and children are not comparable, but numbers are numbers. I can say that I have as many tickets as I have children.
   - There is a part of a sentence repeated "comple domains where several etc..."
   - "the decidability of the merged ontology" -> it is not the ontology that is decidable or not. It is the ontology language or formalism.

References:
- ref 13: this is the second edition
- ref 15: "1: n" -> "1{:}n" to avoid extra space
- ref 31: "owl" should be in capital letters
- remove ref 70 and use 13 instead
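
The objection to the \equiv in Item 1 of the motivating example can be checked on small finite models. The Python sketch below is illustrative only: the domain, the two price relations and the doubling `conversion_function` are hypothetical stand-ins, not the paper's actual definitions. It evaluates the closed formula \forall x,y o1:priceInDollars(x,y), the open formula o2:priceInEuro(x,conversionFunction(y)) under a given assignment, and the biconditional reading \forall x,y (o1:priceInDollars(x,y) \leftrightarrow o2:priceInEuro(x,conversionFunction(y))).

```python
from itertools import product

# Hypothetical toy domain: two products and some integer prices.
DOMAIN = {"book", "pen", 3, 6, 10, 20}


def conversion_function(v):
    # Hypothetical total function standing in for conversionFunction:
    # doubles integer prices and leaves every other individual unchanged.
    return v * 2 if isinstance(v, int) else v


def closed_lhs_holds(price_in_dollars):
    # \forall x,y o1:priceInDollars(x,y): every pair of individuals must be related.
    return all((x, y) in price_in_dollars for x, y in product(DOMAIN, repeat=2))


def rhs_holds(price_in_euro, x, y):
    # o2:priceInEuro(x, conversionFunction(y)) under the assignment (x, y).
    return (x, conversion_function(y)) in price_in_euro


def biconditional_holds(price_in_dollars, price_in_euro):
    # \forall x,y (o1:priceInDollars(x,y) \leftrightarrow o2:priceInEuro(x, conversionFunction(y)))
    return all(
        ((x, y) in price_in_dollars) == rhs_holds(price_in_euro, x, y)
        for x, y in product(DOMAIN, repeat=2)
    )


# Model 1 (hypothetical): euro prices are exactly the converted dollar prices.
dollars_1 = {("book", 10), ("pen", 3)}
euros_1 = {("book", 20), ("pen", 6)}

# Model 2 (hypothetical): every pair is related by priceInDollars, priceInEuro is sparse.
dollars_2 = set(product(DOMAIN, repeat=2))
euros_2 = {("book", 20)}

if __name__ == "__main__":
    print(biconditional_holds(dollars_1, euros_1))  # True
    print(closed_lhs_holds(dollars_1))  # False
    print(closed_lhs_holds(dollars_2))  # True
    print(rhs_holds(euros_2, "pen", 3))  # False
```

In Model 1, the biconditional holds while the closed left-hand side is false, so the intended correspondence is captured only by the \leftrightarrow reading. In Model 2, the closed left-hand side is true, yet o2:priceInEuro(x,conversionFunction(y)) is false under the assignment x = pen, y = 3, so the two sides of \equiv are not equivalent.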