Review Comment:
This paper proposes a new unsupervised approach to learning disjointness axioms based on the novel concept of terminological cluster trees that extend a previous approach proposed by the authors.
I generally like the work proposed, but I also think that the paper needs to be substantially improved along a number of dimensions.
First of all, this work reminded me of the COBWEB algorithm proposed by Fisher in 1987. There are clearly novel aspects, but I would have liked to see a mention and discussion of this seminal work:
Fisher, D.H. Machine Learning (1987) 2: 139. https://doi.org/10.1023/A:1022852608280
Besides this minor point, I find that the work is not positioned very well. The authors should explain better what the benefit of inferring disjointness axioms actually is. They talk very vaguely about improving the “effectiveness” of reasoning, but I would have liked to see less vague statements and more accurate claims or empirical evidence that disjointness axioms can improve reasoning. What type of (DL) tasks can profit from that? Do we get more inconsistencies? More inferences? What is the gain?
In detail:
In the introduction one finds the following statements:
„the effectiveness of most services is strictly dependent on the quality of the ontologies, namely on how precisely they model the intended domains“
=> It is unclear what effectiveness means (see my comments above and below on this). It is also unclear what type of services is being referred to here. Reasoning services? What kind of reasoning services? Subsumption reasoning? Do the authors mean by effectiveness the ability to spot inconsistencies? This needs to be stated in a clearer fashion.
„debugging strategies can be better supported when accurate ontologies are considered“
=> What is the notion of „accuracy“ here? More axioms? Then please say so. Actually, debugging without any axioms will indeed be quite useless.
„strong and explicit forms of negation“
=> What does „strong“ mean here? What are „implicit“ forms of negation? Can you give an example?
„many currently ontologies are often loosely designed in this respect“
=> Please be more precise here. Loosely designed in which respect?
„they provide only a rather approximate representation of the domains, failing to capture all the underlying constraints or even distorting the intended semantics“
=> What does it mean to provide an approximate representation of the domains? Does it mean, according to Guarino, that the ontology does not manage to rule out conceptualizations that are not possible according to human intuitions? In which sense is the semantics distorted?
„Conversely, available intensional knowledge is only marginally taken into account“.
=> What do you mean with „intensional knowledge“ here?
„a data-driven approach could be devised with the goal of finding partitions of similar individuals of the knowledge base ... by minimizing overlap.“
=> In order to grasp this, I think you would need to introduce an example of a cluster tree and give a rough explanation of how it would be created in an incremental, top-down fashion. Most readers will know decision trees, so you could draw an analogy with the way decision trees are constructed, while clearly stating that your approach is unsupervised and not trained in a risk minimization framework.
Other than that, before introducing the specific algorithmic challenges solved by the given paper on pages 3-4, I would have liked to see some concrete examples of clusterings early in the paper, before the actual method is introduced, to help the reader understand what is meant by a conceptual clustering and how it works. I think the basic idea can be illustrated in a figure early in the paper without any need for formality. On the basis of a few pictorial examples, the authors could introduce the (technical) problems related to i) inconsistencies, ii) noisy individuals and iii) completeness of the refinement operator, as mentioned in the intro. These three technical problems can IMHO only be appreciated after having seen some small examples of such cluster trees. I think an informally depicted example early on would also help the authors to introduce their ideas and motivate their approach better. In general, it would be nice to have examples of desirable disjointness axioms to be induced in the intro, possibly aligned with the example of a cluster tree.
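To make the suggested decision-tree analogy concrete: the kind of incremental, top-down construction I have in mind could be sketched roughly as below. This is a toy illustration only, assuming 1-D numeric points and centroid distance as the separation measure; the function names and the threshold split are my own invention and do not correspond to the authors' DL refinement operator or their dissimilarity over individuals.

```python
# Toy sketch of top-down (divisive) cluster-tree construction, analogous
# to decision-tree induction but unsupervised: each node greedily picks
# the split that maximizes a separation measure between the two
# sub-clusters, instead of minimizing a supervised risk.

def mean(xs):
    return sum(xs) / len(xs)

def separation(left, right):
    # Distance between cluster centroids; stands in for a separation
    # measure computed between pairs of sub-clusters.
    return abs(mean(left) - mean(right))

def build_cluster_tree(points, min_size=2):
    """Recursively split `points` into a binary cluster tree."""
    best = None
    for t in sorted(set(points)):          # candidate split thresholds
        left = [p for p in points if p < t]
        right = [p for p in points if p >= t]
        if len(left) < min_size or len(right) < min_size:
            continue
        s = separation(left, right)
        if best is None or s > best[0]:
            best = (s, left, right)
    if best is None:                       # too few points to split further
        return {"leaf": sorted(points)}
    _, left, right = best
    return {"left": build_cluster_tree(left, min_size),
            "right": build_cluster_tree(right, min_size)}

# Two well-separated groups end up in distinct leaves:
tree = build_cluster_tree([1, 2, 3, 10, 11, 12])
# -> {"left": {"leaf": [1, 2, 3]}, "right": {"leaf": [10, 11, 12]}}
```

Even a small, informal figure built from an example like this would let readers see how the three technical problems above arise in practice.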
A second major point: I feel there are too many contributions in this paper! On the one hand, you introduce your TCT construction approach, provide algorithmic details on how the TCTs are constructed, and provide an evaluation and comparison to other algorithms. For me, this would be sufficient as a contribution. Speeding up the computation via a distributed processing approach using Spark is a second contribution that I would advise leaving out of this paper, which is very dense in terms of technical description and evaluation anyway. There is enough content without the distributed processing aspects.
In general, I would recommend the authors to remove the part on distributed reasoning and focus more on providing concrete examples that illustrate the behavior of the algorithm for inducing the terminological trees.
Minor comments:
Page 2
According to [26] -> bad style to use references as words, use According to X et al. [26]
As discussed in [24] -> as above
Page 3
Depends on a notion of purity no comma that determines ...
There are some issues no comma that were not investigated in previous works ...
Let us suppose that an automatic method may find** (no „s“)
Propose the „related axiom“: which related axiom?
In order to enable the exploration of a larger search space, it should be possible to tune...
Page 4
Such that those within each group/cluster are more closely related/similar to one another no comma than the objects...
A further interesting class of methods is represented by (delete „the“) conceptual clustering approaches.
Unclear sentence: „rather than extensionally, defined in logical terms, concepts, that can be reasoned with“
Page 5
Odd sentence: „Also note that no comma the problem of discovering disjointness axioms could be formalized in different ways, depending on the type of resorting? to the (supervised or unsupervised) machine learning approach to be employed.
Section 3
The proposed approach is grounded on a two-step (no „s“) process.
Definition 2
„accounts for the individuals belonging to C“ => accounts in which sense?
cohesion of? exceeds a given threshold
In the inductive step no comma that occurs when ...
Unclear: for each of them the subsets of I made up of (known) the positive and negative instances are both not empty...
Cannot parse this sentence: Then no comma function SELECTBEST CONCEPT evaluates the candidate specializations in terms of a separation measure computed as the distance between pairs of sub-clusters determined the positive and negative instances w.r.t. the candidate concepts.
The algorithm is able to determine it „naturally“ according to the data distribution: what does „naturally“ mean here?
Page 6
Section 3.1.1.
Given the a concept -> „the „ or „a“
Page 8
May miss some important features (concepts) that would (delete „to“) describe...
Page 9
The following sentence is broken:
„While... the separation measure to be maximized in the procedure for the TCTs resorts to a measure to be maximized ultimately relies on a distance defined over the individuals occurring in the knowledge base.“
Page 10
Section 4
In the experiments, we considered a variety of? freely...
Page 11
MONETARY is an ontolog*y*
Page 12
The first 2 sentences right at the start of Section 4.2 make no sense
Page 14
Below table
Indeed comma
The captions of tables 2 and 3 as well as 4 and 5 seem to be pairwise alike. What is the difference between these tables?
Page 19
In particular, with high dimensional beams, ... whereas no comma the use of low dimensional
Page 21
Has been pointed out also in [25] -> using reference as a word again
Page 22
In [17], a tool for repairing => [17] is used as a word here!
Page 24
Conclusion
We have cast the task of discovering axioms as the/a (one of both!) clustering problem that was solved
We compared the Spark-based (what?) of the refinement operator with that used in [22] => reference used as word!
There are more minor issues that I did not write down. The authors are advised to check the manuscript carefully.