Automatic evaluation of complex alignments: an instance-based approach

Tracking #: 2379-3593

Elodie Thieblin
Ollivier Haemmerlé
Cassia Trojahn dos Santos

Responsible editor: Jens Lehmann

Submission type: Full Paper

Abstract: Ontology matching is the task of generating a set of correspondences (i.e., an alignment) between the entities of different ontologies. While most efforts on alignment evaluation have been dedicated to the evaluation of simple alignments (i.e., those linking one single entity of a source ontology to one single entity of a target ontology), the emergence of complex approaches requires new strategies for addressing the problem of automatically evaluating complex alignments (i.e., those composed of correspondences involving logical constructors or transformation functions). This paper proposes a benchmark for complex alignment evaluation composed of an automatic evaluation system that relies on queries and instances, and a dataset about conference organisation. This dataset is composed of populated ontologies and a set of competency questions for alignment as SPARQL queries. State-of-the-art alignments are evaluated and a discussion on the difficulties of the evaluation task is provided.
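To make the instance-based idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): it compares the answer set of a competency question over the source ontology with the answer set of the aligned query over the target ontology, scoring them by instance identity. It assumes, as the benchmark does, that the two populated ontologies share a controlled set of instance identifiers; all names and data are illustrative.

```python
# Illustrative sketch: instance-based comparison of two query answer sets
# over ontologies populated with a shared, controlled set of instances.

def instance_scores(source_answers, target_answers):
    """Precision/recall of the target query's answers against the answers
    of the source competency question, compared by instance identity."""
    source, target = set(source_answers), set(target_answers)
    if not target:
        return 0.0, 0.0
    overlap = source & target
    precision = len(overlap) / len(target)
    recall = len(overlap) / len(source) if source else 0.0
    return precision, recall

# Hypothetical answer sets: instances returned by a CQA over the source
# ontology, and by the corresponding aligned query over the target one.
src = ["ex:paper1", "ex:paper2", "ex:paper3"]
tgt = ["ex:paper2", "ex:paper3", "ex:poster1"]
p, r = instance_scores(src, tgt)
```

Note that this scoring is only meaningful when the same individuals carry the same identifiers in both ontologies; otherwise the instances themselves would first have to be matched, which is precisely the difficulty Review #2 raises below.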
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 31/Dec/2019
Review Comment:

This paper proposes a benchmark for complex alignment evaluation composed of an automatic evaluation system that relies on SPARQL queries and instances, and a data set about conference organization. Complex alignment evaluation is still an under-explored area, and any improvement in this evaluation is very valuable. Considering the contributions of this manuscript (a richly described account of previous related work, an automatic approach for evaluating complex alignments, a new data set, and an evaluation), I suggest accepting the manuscript in its current format.

Review #2
Anonymous submitted on 06/Jan/2020
Minor Revision
Review Comment:

I appreciated the significant effort the authors invested into improving the paper according to the first round of reviews. However, I still see a few important questions and issues raised by the previous reviewers that were not properly answered.

The most important one is that the evaluation methods, which are the core contribution of the paper, require ontologies to be populated with instances in a "regular" and "controlled" manner. The authors point this out several times, yet the terms "regular" and "controlled" are never properly defined, which is a major omission given the importance of these constraints. Does it mean that every class (that is concerned by the alignment) should have at least one instance in both ontologies? Furthermore, "intrinsic precision" also requires that instances across the two ontologies be comparable, i.e. have the same identifiers; otherwise the instances themselves would need to be aligned, which is a problem comparable in difficulty to the one being solved! Reviewer 2 explicitly asked for a clarification of these constraints, yet none was provided in the revised manuscript.

My opinion is that the fact that the authors did not manage to find a single real-world ontology that would fulfil the above requirements means that the practical usability of the method is questionable. Section 6 proposes a method for generating synthetic instances for evaluation; however, this requires the alignments to be known a priori, which only makes sense in artificial settings such as OAEI. To me this limits the impact of the results. The authors should reflect on the real-world usability of their method within the paper, preferably in the introduction and/or the conclusion.

These clarifications would be welcome especially given the otherwise deep understanding and insight the authors demonstrate of the problem area and the field of study in general.

Furthermore, I have one problem to signal with respect to paper structure, also addressed previously by Reviewer 1. I understand that the authors have restructured section 4 (the workflow) in particular. However, the new structure is not very well balanced:
- 4.1 generic workflow (2 pages);
- 4.2 simple alignment workflow (1 page);
- 4.3 complex alignment workflow (1 page);
- 4.3.1 example (over 3 pages).
One concern is that there is a lot of redundancy across the subsections, which makes the section very long (7 pages). Section 4.1 should introduce the various notions (anchor selection, syntactic/semantic/instance-based comparison, etc.) and they should not be re-explained in every consecutive subsection. Also, the length of 4.3.1 alone (>3 pages) is due to the fact that it actually contains two examples, one based on reference alignments and the other on reference queries. For balance and readability, I would consider either transforming 4.3.1 into two subsections 4.4 and 4.5, or into a single subsection 4.4 where the two examples are presented simultaneously (which should be possible as the main difference between the two examples seems to be in the anchoring step only). I do not insist on these particular solutions, but would expect the redundancy and imbalance issues to be addressed in some way.

Finally, a few minor mistakes (the list is not exhaustive):
- p. 11: " the relations between the correspondences between the correspondences";
- p. 11: "is a wrong" => is wrong;
- pp. 2, 14, and 22: "Intrinsic precision balances the CQA coverage *by like* precision balances recall in information retrieval." => replace "by like" by "just like" (BTW, why do you need to repeat this same sentence three times in the paper?).

Review #3
By Pavel Shvaiko submitted on 08/Feb/2020
Major Revision
Review Comment:

A revised version of the manuscript has improved in many specific aspects; however, it still bears three flaws, involving fundamental questions that were insufficiently addressed in the previous round of reviews.

The authors state “The comparison of the objects can be performed in a syntactic, semantic or instance-based way”. It is insufficiently justified why exactly these dimensions are considered and as such these appear rather ad hoc. This undermines completeness of the approach. The authors stated in the response letter that “This has been clarified in the re-organisation of the evaluation workflow section.” The new Section 4 only bluntly introduces these categories, hence, the previously raised problems still persist.

The authors state that "the equivalence relation is considered more specific than the subsumption relations". This appears counter-intuitive, contradicts set theory, and thus undermines the soundness of the approach. The authors stated in the response letter that the equivalence relation is preferred over subsumption relations and that this is a design decision. This design choice, however, remains unjustified in the article; hence, the previously raised problems still persist.

The work on the proposed dataset assumes consistent population of ontologies, this aspect was addressed in the manuscript through artificial populations. Are there any practical applications where such an approach could be deemed realistic? The authors stated in the response letter that “applicability is on knowledge bases artificially populated with common instances”. This reaffirms that the problem of practical significance of the work done still persists.

Review #4
Anonymous submitted on 18/Mar/2020
Review Comment:

I thank the authors for the extensive revision they did. They addressed all of my major concerns adequately.
I have found a few typos and have one suggestion.


Page 2
"This benchmark is composed of a dataset involving ontologies, populated with controlled instances, reference competency question queries, and an automatic evaluation system."
I think it would be a good idea to mention here that the set of instances is shared between the two ontologies.

Page 1
"Intrinsic precision balances the CQA coverage by like precision balances recall in information retrieval." remove "by"

"The ontologies cmt and ekaw here come from the Conference dataset [18]." remove "here"
Page 5
"In 2019, the benchmark presented in this paper has been used to automatically evaluating complex alignments." evaluating -> evaluate

"close metric to relaxed precision and recall [32] gas been applied to entity" gas -> has