Review Comment:
This manuscript is a fresh resubmission, not a revision, of a manuscript that I had already reviewed for this same journal. Therefore, I will sometimes comment on how this manuscript differs from the one (SWJ 3703) I had previously reviewed.
In the context of ontology learning, enrichment, and validation, axiom scoring is the task of evaluating the acceptability of a (candidate) axiom against the known facts. The authors of this manuscript study axiom scoring in a scenario of ontology evolution, whereby new facts (represented by RDF triples) are added to a knowledge base at different times and the underlying ontology needs to be revised based on the newly acquired knowledge.
To this end, the authors adapt a possibilistic axiom-scoring heuristic proposed in the literature to the case of ontology evolution based on RDF data streams. After defining the problem of axiom testing against a sliding window of a stream of RDF data, focusing on the property axioms of functionality, inverse functionality, transitivity, irreflexivity, symmetry and asymmetry, they introduce their adaptation of the possibilistic scoring approach and validate it through two experiments.
In the first experiment, they use the CMT ontology to test the extent to which the possibilistic approach can correctly score some axioms that are known to hold, using three different sliding window sizes, and compare it to traditional information-retrieval measures such as precision and F1-score. The results allow them to conclude that the possibilistic approach is robust and applicable when scoring axioms in streams and with limited data, more so than a strictly probability-based one.
The second experiment considers an actual scenario of ontology evolution, using the game-related fictional domain of Pokémon, where successive generations (I through IX) provide sets of instances with different properties. To deal with this scenario, they compare three approaches: (i) using the plain possibilistic score, (ii) using the same score together with a user-defined threshold for axiom acceptance, and (iii) defining an evolving possibilistic score as a weighted average of the past and present score for each axiom (a sort of moving average). The results suggest that approach (ii) is the most effective of the three at capturing, in the ontology, the changes occurring in the stream of instances.
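For concreteness, the evolving score in approach (iii) can be read as an exponential moving average over successive windows. The following is a minimal sketch of that reading; the weight alpha, the function name, and the score values are my own illustrative assumptions, not taken from the manuscript:

```python
# Hedged sketch of approach (iii): an evolving possibilistic score
# computed as a weighted average of the past and present score.
# The weight `alpha` and the example scores are illustrative
# assumptions, not values from the manuscript under review.
def evolving_score(previous: float, current: float, alpha: float = 0.5) -> float:
    """Weighted average of the past (evolving) score and the present one."""
    return alpha * previous + (1.0 - alpha) * current

# Hypothetical stream of per-window scores for one axiom.
scores = [0.9, 0.8, 0.2, 0.3]
acc = scores[0]
for s in scores[1:]:
    acc = evolving_score(acc, s)
print(round(acc, 4))  # the evolving score after the last window
```

Under this reading, alpha controls how much inertia the score has: a high alpha makes the ontology slow to react to new instances, a low alpha makes it track the most recent window closely.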
The idea of applying the possibilistic axiom scoring heuristics to RDF data streams for ontology evolution is novel and the proposed adaptation of the approach is original.
The article is well-written and easy to read. I found a few typos, which are detailed below.
The empirical validation is convincing, although choosing a real-world ontology of practical relevance would have made Experiment II even more compelling. Nevertheless, I am inclined to believe that the domain of Pokémon can serve as a simplified model of the phenomena one could observe in real-world scenarios, such as streams of sensor data, which motivate this contribution, as explained in Section 3.1.
Overall, the paper is technically sound, although I found issues with some of the definitions and with the terminology. However, these issues are easy to fix and do not affect the substantial correctness of the proposed approach.
Detailed Comments
To begin with, I am pleased to notice that the remarks I had made on manuscript SWJ 3703 have been taken into account by the authors to prepare this new version.
In particular, I find that the newly written Section 3.3 answers my remarks, resolves my doubts, and is much more intuitive.
While the presentation of the manuscript has improved, some new issues have been introduced.
The first issue has to do with terminology. For some reason, the authors have decided to rename "confirmations" as "examples", probably because the concept is dual to "counterexample"; this has left the term "confirmation" free to denote what was formerly called "selective confirmations". However, this terminological choice is unfortunate and runs into problems. The first is that the "selective confirmation principle" is known by that name in the literature, including [11]. The authors have changed its name to "selective example principle", which, apart from calling the same principle by another name, is much less semantically motivated: it is not clear how an example can be "selective". The authors have also (inadvertently, I hope) changed the title of [11] to match their terminology, thus introducing an error in the bibliography. Furthermore, using "confirmation" (a term that is used in the literature) to mean what the literature calls "selective confirmation" can only create confusion. I think names are important, and changing names that have become established in the literature is a decision that should not be taken lightly. My advice is to adhere to the established conventions.
The second issue has to do with Definition 1. In it, the authors define an individual as a set of RDF triples having the same entity as their subject. In the same definition, they introduce the notation FI(i), where i is an individual, to mean the facts about i, defined as all triples with the same subject, which to me is exactly the same as what the authors call an individual. But if FI(i) = i, why do we need the notation FI(·)?
However, later, in Definitions 4 and 5, the authors talk about a "named individual" i as a subject of properties P(i, x), and they refer to its set of facts as FI(i), which contradicts Definition 1.
So, in the end, I think that, after all, what the authors call an individual is not a set of RDF triples but rather an "individual" in the same sense as in description logics, which is in perfect agreement with their usage of "named individual".
Section 3.4 has been moved to its current position from the Section on experiments. I think this is a good idea, because it introduces one of the contributions made by the authors. However, as a result, the reference to "this experiment" no longer makes sense and is out of context. I suggest replacing it with a more general term such as "scenario" or "application", and replacing "this" with a qualification of the scenario or application the evolving ARI is designed to address.
Typos and minor issues:
- sliding windows sizes -> sliding window sizes
- through means of -> by means of
- The alignment of some formulas should be fixed
- ... is to not include ... -> ... is not to include ...
- decision reached ARI -> decision reached using ARI
Assessment of the data file
A link is provided by the authors to the GitHub repository of the TICO_Lite tool.
Upon inspection, this repository appears to be well organized, but a README file is missing. I warmly recommend that the authors add one to the repository.
All the data artifacts used in the article are present, and the provided resources appear to be complete for replication of the experiments.