Review Comment:
This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.
The paper tackles the challenge of identifying complex mappings across ontologies and proposes CANARD, an approach that takes as input requirements expressed as unary or binary SPARQL queries, so-called Competency Questions for Alignment (CQAs). CANARD distinguishes itself as the first approach to use CQAs to reduce the search space, which makes it capable of handling large ontologies without relying on any complex mapping patterns. The work was previously presented as an ISWC 2020 paper, and the current journal version explicitly lists the extensions. This is the second-round review, and I didn’t participate in the first round. After reading the first-round comments, the authors’ responses, the current updated version, and the ISWC 2020 version, I believe that the paper presents a unique approach with a fair set of extensions over its conference publication, that the evaluation is comprehensive, and that the analysis has depth, showing both the advantages and the limitations. I would therefore like to recommend Minor Revision.
There are still some issues, small or big, that should be addressed before the paper becomes ready for a journal publication. Let me list them as follows.
TITLE. The title of the paper needs to be reworked. Firstly, “alignment need” is seldom used in the paper, whereas “user’s need” or “user knowledge need” occurs a lot. Secondly, “ABox-based relation discovery” is seldom used in the paper either, and what is the relationship between “alignment need” and “ABox-based relation discovery”? As a matter of fact, it seems to me that the involvement of a user is not necessary in the approach, as it is not interactive mapping anyway. The terms “user” and “user knowledge need” are both ambiguous in the paper. I suggest renaming the paper to something like “CANARD: An Approach for Generating Expressive Correspondences based on Competency Questions for Alignment”, as this is exactly what the approach is. When introducing the problem or discussing the application, you can mention WHO can provide these CQAs: those who intend to query the source ontology (for instance, an ontology-based application user), those who intend to link the source ontology to the target one (for instance, an ontology engineer), or others. I notice that “user” is mentioned frequently throughout the paper, so this “de-user” revision may take some work.
ABSTRACT. The abstract is incomplete as nothing has been said about the results of the evaluation, nor any conclusions about the approach.
INTRODUCTION. Before listing the extensions based on the ISWC 2020 conference version, an explicit statement of the contributions of the paper is needed.
MAIN STEPS. In Section 4, each main step of the approach is described. Some of the steps are approximate and involve uncertainty, like 4.4 Label Similarity, while others might be sound and complete, like 4.1 Translating CQAs into DL Formulae. This should be explicitly stated for each step, as the former kind needs empirical evaluation and the latter a theoretical proof.
QUALITATIVE ANALYSIS. I agree with a comment from the previous review round that a qualitative analysis of the resulting mappings is needed to complement the quantitative evaluation. One way to do this is to add, in Section 5.6 Comparison on the OAEI Systems, a table of (some of) the complex mappings identified by CANARD but missed by the other systems. This would convincingly demonstrate the power of the proposed approach.
SIMPLE VS COMPLEX. As CANARD identifies both simple and complex mappings simultaneously, cases where one-to-one correspondences are misjudged as complex ones, and where complex correspondences are misjudged as one-to-one ones, should be discussed. In real-world ontology applications, being good at identifying simple mappings is as important as identifying complex ones.
MORE THAN TWO ONTOLOGIES. When introducing CQAs, “two or more ontologies” occurs a couple of times in the paper. If CANARD is capable of matching more than two ontologies in a nontrivial way, please elaborate on this matter; otherwise remove “or more”.
LIMITATION. I don’t think that requiring the user to be familiar with SPARQL and the source ontology is a serious limitation of the approach; it is a problem concerning the application of the approach. More fundamentally, being constrained by the expressivity of SPARQL limits the complex matches that can be found. This should be discussed in depth in the paper, together with possible extensions as future work.
RELATED WORK. Matching methods that can identify both simple and complex correspondences without relying on any questions, patterns, or instances, such as the following, should be mentioned:
Mengyi Zhao, Songmao Zhang, Weizhuo Li and Guowei Chen. Matching biomedical ontologies based on formal concept analysis. Journal of Biomedical Semantics, 2018 Mar 19;9(1):11. doi: 10.1186/s13326-018-0178-9
Lastly, several writing errors:
- Page 2, “a system that discovers expressive correspondences” (footnote 2): the content of footnote 2 should be moved into the main text of the paper, as it's important to know what kind of expressive correspondences can be found by CANARD.
- Page 2, Section 2.1, using “:” to link a definition with an example is not appropriate.
- Page 28, “with a (the detailed … in Section 5.3”: the phrase is incomplete.