An Unsupervised Approach to Disjointness Learning based on Terminological Cluster Trees

Tracking #: 1965-3178

Authors: 
Giuseppe Rizzo
Claudia d'Amato
Nicola Fanizzi

Responsible editor: 
Philipp Cimiano

Submission type: 
Full Paper
Abstract: 
In the context of the research on the Semantic Web regarded as a Web of Data, efforts have been devoted to improving the quality of the ontologies that are used as vocabularies to enable complex services based on automated reasoning. From various surveys it emerges that many domains would require better ontologies that include non-negligible constraints. In this respect, disjointness axioms are representative of this general problem: these axioms are essential for making the negative knowledge about the domain of interest explicit, yet they are often overlooked during the modeling process (thus affecting the efficacy of the reasoning services). To tackle this problem, automated methods for discovering these axioms can be used as a tool for supporting knowledge engineers in the task of modeling new ontologies or evolving existing ones. The current solutions, either those based on statistical correlations or those relying on external corpora, often do not fully exploit the terminology of the knowledge base. Stemming from this consideration, we have been investigating alternative methods to elicit disjointness axioms from existing ontologies based on the induction of terminological cluster trees, which are logic trees in which each node stands for a cluster of individuals that emerges as a sub-concept. The growth of such trees relies on a divide-and-conquer procedure that assigns, to the cluster representing the root node, one of the concept descriptions generated via a refinement operator and selected according to a heuristic based on the minimization of the risk of overlap between the candidate sub-clusters (quantified in terms of the distance between two prototypical individuals). Preliminary work has shown some shortcomings that are tackled in this paper.
To tackle the task of disjointness axiom discovery we have extended the terminological cluster tree induction framework with various contributions, which can be summarized as follows: 1) the adoption of different distance measures for clustering the individuals of a knowledge base; 2) the adoption of different heuristics for selecting the most promising concept descriptions; 3) a modified version of the refinement operator to prevent the introduction of inconsistency during the elicitation of the new axioms; 4) the integration of frameworks for distributed and efficient in-memory processing, namely Spark, for scaling up the set of candidate concepts generated through the refinement operator. A wide empirical evaluation showed the feasibility of the proposed extensions and the improvement with respect to alternative approaches.
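The divide-and-conquer induction procedure summarized in the abstract can be sketched roughly as follows. This is a toy Python illustration, not the authors' implementation: individuals are plain numeric feature vectors rather than ABox individuals, candidate concepts are stand-in boolean predicates rather than descriptions produced by a refinement operator, and the prototypical individual of a cluster is approximated by its medoid.

```python
import math

def dist(a, b):
    # Euclidean distance between two feature vectors (a stand-in for the
    # semantic distance measures over individuals used in the paper)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def medoid(cluster):
    # The individual minimizing the total distance to all others:
    # a simple proxy for the cluster's prototypical individual
    return min(cluster, key=lambda x: sum(dist(x, y) for y in cluster))

def induce_tct(cluster, candidates, threshold=1.0):
    """Grow a cluster tree top-down: at each node, pick the candidate
    concept whose positive/negative split maximizes the distance between
    the two sub-cluster medoids; stop when no split separates the
    prototypes by more than `threshold`."""
    best, best_sep = None, -1.0
    for name, pred in candidates:
        pos = [x for x in cluster if pred(x)]
        neg = [x for x in cluster if not pred(x)]
        if not pos or not neg:
            continue  # a concept covering all or none of the cluster is useless
        sep = dist(medoid(pos), medoid(neg))
        if sep > best_sep:
            best, best_sep = (name, pred, pos, neg), sep
    if best is None or best_sep < threshold:
        return {"leaf": cluster}  # stop: the cluster is cohesive enough
    name, _, pos, neg = best
    return {"concept": name,
            "pos": induce_tct(pos, candidates, threshold),
            "neg": induce_tct(neg, candidates, threshold)}

# Toy example: two well-separated groups of 2D individuals and one
# hypothetical candidate concept ("X>5")
individuals = [(0, 0), (1, 0), (9, 9), (10, 9)]
candidates = [("X>5", lambda p: p[0] > 5)]
tree = induce_tct(individuals, candidates)
```

In the paper's setting, sibling leaves reached through a concept and its complement correspond to clusters whose emerging concept descriptions become candidates for disjointness axioms.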
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 23/Oct/2018
Suggestion:
Major Revision
Review Comment:

This paper proposes a new unsupervised approach to learning disjointness axioms based on the novel concept of terminological cluster trees that extend a previous approach proposed by the authors.

I generally like the work proposed, but I also think that the paper needs to be substantially improved along a number of dimensions.

First of all, this work reminded me of the COBWEB algorithm proposed by Fisher in 1987. There are clearly novel aspects, but I would have liked to see a mention and discussion of this seminal work:

Fisher, D.H. Machine Learning (1987) 2: 139. https://doi.org/10.1023/A:1022852608280

Besides this minor thing, I find the positioning of the work not very well done. The authors should explain better what the benefit of inferring disjointness axioms actually is. They talk very vaguely about improving “effectiveness” of reasoning, but I would have liked to see less vague statements and more accurate claims or empirical evidence that disjointness axioms can improve reasoning. What type of (DL) tasks can profit from that? Do we get more inconsistencies? More inferences? What is the gain?

In detail:

In the introduction one finds the following statements:

„the effectiveness of most services is strictly dependent on the quality of the ontologies, namely on how precisely they model the intended domains“

=> It is unclear what effectiveness means (see my comments above and below on this). It is also unclear what type of services is referred to here. Reasoning services? What kind of reasoning services? Subsumption reasoning? Do the authors mean by effectiveness the ability to spot inconsistencies? This needs to be stated in a clearer fashion.

„debugging strategies can be better supported when accurate ontologies are considered“

=> What is the notion of „accuracy“ here? More axioms? Then please say so. Actually, debugging without any axioms will indeed be quite useless.

„strong and explicit forms of negation“

=> What does „strong“ mean here? What are „implicit“ forms of negation? Can you give an example?

„many currently ontologies are often loosely designed in this respect“

=> Please be more precise here. Loosely designed in which respect?

„they provide only a rather approximate representation of the domains, failing to capture all the underlying constraints or even distorting the intended semantics“

=> What does it mean to provide an approximate representation of the domains? Does it mean, according to Guarino, that the ontology does not manage to rule out conceptualizations that are not possible according to human intuitions? In which sense is the semantics distorted?

„Conversely, available intensional knowledge is only marginally taken into account“.

=> What do you mean with „intensional knowledge“ here?

„a data-driven approach could be devised with the goal of finding partitions of similar individuals of the knowledge base ... by minimizing overlap.“

=> In order to grasp this, I think you would need to introduce an example of a cluster tree and give a rough explanation how it would be created in an incremental, top-down fashion. Most people might know decision trees, so you might resort to the way decision trees are constructed for analogy, but clearly saying that your approach is unsupervised and not trained in a risk minimization framework.

Other than that, before introducing the specific algorithmic challenges solved by the given paper on pages 3-4, I think I would have liked to see some concrete examples of clusterings already early in the paper before introducing the actual method to help the reader in understanding what is meant by a conceptual clustering and how it works. I think the basic idea can be illustrated in a Figure early in the paper without any need of formality. On the basis of a few pictorial examples, the authors could introduce the (technical) problems related to i) inconsistencies, ii) noisy individuals and iii) completeness of the refinement operator as mentioned in the intro. These three technical problems can IMHO only be perceived after having seen some small examples of such cluster trees. I think having early on an informally depicted example would also help the authors to introduce their ideas and motivate their approach better. Having examples of desirable disjointness axioms to be induced, possibly aligned with the example of a cluster tree, in the intro would be nice in general.

A second major point: I feel there are too many contributions in this paper! On the one hand, you introduce your TCT construction approach, provide algorithmic details on how the TCTs are constructed, and provide an evaluation and comparison to other algorithms. For me, this would be sufficient as a contribution. Speeding up the computation via a distributed processing approach using Spark would be a second contribution that I would advise to leave out from this paper, which is very dense in terms of technical description and evaluation anyway. There is enough content without the distributed processing aspects.

In general, I would recommend the authors to remove the part on distributed reasoning and focus more on providing concrete examples that illustrate the behavior of the algorithm for inducing the terminological trees.

Minor comments:

Page 2

According to [26] -> bad style to use references as words, use According to X et al. [26]

As discussed in [24] -> as above

Page 3

Depends on a notion of purity no comma that determines ...

There are some issues no comma that were not investigated in previous works ...

Let us suppose that an automatic method may find** (no „s“)

Propose the „related axiom“: which related axiom?

In order to enable the exploration of a larger search space, it should be possible to tune...

Page 4

Such that those within each group/cluster are more closely related/similar to one another no comma than the objects...

A further interesting class of methods is represented by (delete „the“) conceptual clustering approaches.

Unclear sentence: „rather than extensionally, defined in logical terms, concepts, that can be reasoned with“

Page 5

Odd sentence: „Also note that no comma the problem of discovering disjointness axioms could be formalized in different ways, depending on the type of resorting? to the (supervised or unsupervised) machine learning approach to be employed.

Section 3

The proposed approach is grounded on a two-step (no „s“) process.

Definition 2

„accounts for the individuals belonging to C“ => accounts in which sense?

cohesion of? exceeds a given threshold

In the inductive step no comma that occurs when ...

Unclear: for each of them the subsets of I made up of (known) the positive and negative instances are both not empty...

Cannot parse this sentence: Then no comma function SELECTBEST CONCEPT evaluates the candidate specializations in terms of a separation measure computed as the distance between pairs of sub-clusters determined the positive and negative instances w.r.t. the candidate concepts.

The algorithm is able to determine it „naturally“ according to the data distribution: what does „naturally“ mean here?

Page 6

Section 3.1.1.

Given the a concept -> „the „ or „a“

Page 8

May miss some important features (concepts) that would (delete „to“) describe...

Page 9

The following sentence is broken:

„While... the separation measure to be maximized in the procedure for the TCTs resorts to a measure to be maximized ultimately relies on a distance defined over the individuals occurring in the knowledge base.“

Page 10

Section 4

In the experiments, we considered a variety of? freely...

Page 11

MONETARY is an ontolog*y*

Page 12

The first 2 sentences right at the start of Section 4.2 make no sense

Page 14

Below table

Indeed comma

The captions of tables 2 and 3 as well as 4 and 5 seem to be pairwise alike. What is the difference between these tables?

Page 19

In particular, with high dimensional beams, ... whereas no comma the use of low dimensional

Page 21

Has been pointed out also in [25] -> using reference as a word again

Page 22

In [17], a tool for repairing => [17] is used as a word here!

Page 24

Conclusion

We have cast the task of discovering axioms as the/a (one of both!) clustering problem that was solved

We compared the Spark-based (what?) of the refinement operator with that used in [22] => reference used as word!

There are more minor issues I did not write down. The authors are advised to check the manuscript carefully.

Review #2
Anonymous submitted on 19/Jan/2019
Suggestion:
Minor Revision
Review Comment:

This paper introduces a significant extension of an existing method for learning disjointness axioms through the induction of so-called Terminological Cluster Trees. It extends previous work by the same authors by improving the induction methods (refinement operator, distance measures and heuristics) and by a parallelised implementation to enhance performance. The new methods are extensively assessed via a number of empirical experiments, both on the quality of the results and the performance.

I like the paper, and believe that the extension is significant enough to warrant a journal publication. The paper is generally well written and structured, and the approach as well as evaluation are, to the best of my knowledge, sound. As far as I can see all relevant related work has been discussed.
There are some minor points, though, that I believe need to be addressed by the authors before final publication; all of them relate to the write-up and do not require extra research or evaluations. These edits are concerned with the definitions of TCTs and the notion of pure clusters, the representativeness of the testing ontologies, and the presentation of the results.

1) The most important problem I have with the paper is that I find the notion of the desired TCTs, so the actual goal of the paper, not sufficiently well defined or discussed. On top of Definition 2, you define the criterion that the partitions induced should be pure in terms of cohesion. Both those terms are not introduced and (formally) defined. In neither section 2 nor 3 could I find a satisfactory definition of these notions. Of course, I know what the intention is, and I have an intuition what the authors want to achieve, but this should be made explicit. It is the claim of the paper that the algorithms solve those problems (or at least approximate them, I believe), but it is not clear which the interesting TCTs are. Later the confusion is extended by using the notion of ideality, defined by fitness and completeness. What I am missing is a definition of “good” TCTs, and some lemma explaining “how close” your algorithms come to calculating these “good” TCTs (I understand that your algorithm is not ideal, so I expect it not to be “perfect” either). For example, would ideality imply pureness, or any other property of good TCTs?
So, this issue has to be fixed, with both concrete definitions and an extended discussion. Do not get me wrong, I believe that it is acceptable to develop a heuristic algorithm that calculates “good” rather than necessarily “perfect” TCTs. But in a scientific article these properties should be explained to the reader, and systematically explored.

2) I find the evaluation very thorough and sound, and it gives interesting insights into the behaviour of the new method. The question is, however, to what extent the chosen ontologies are representative of the ontologies for which the methods are designed in the first instance. The ontologies have been chosen so as to have disjointness axioms, but my guess would be that this already gives you a strong bias towards expressive ontologies. This can also be seen by the expressiveness, which is mostly almost full OWL, while I would claim that the algorithms are most useful for less expressive ontologies, e.g. in RDFS. The question is thus how valid your evaluation will be for those ontologies.
The second question is about the expressiveness of the disjointness axioms. My guess is that in the chosen ontologies the disjointness axioms will be very simple, while your algorithm should be capable of producing far more complex ones (as you show in 4.2.3). Does that somehow put the results of the evaluation into question?

3) A far less important question concerns the presentation of the results. While the results in Tables 2-9 should definitively be part of the paper, I was wondering whether they should not be presented in the Appendix in their complete form. In the respective sections, I would prefer some smaller tables that focus on the most interesting elements of the experiments. The way the results are presented is rather tedious and repetitive, and the reader wonders whether there is anything interesting to be found. I would hope that the authors could find a better way to have more focus in the data representations. This will also allow the text to be actually closer to the tables, which will make the paper more readable.

Finally, there are some minor points, mostly concerning editing. I do not claim that this is complete.
Page 3
- C1Par2: here the notions of purity and cohesion are introduced, but it does not say much about what they actually are.
- C1Par 3: The term TCT has not been introduced yet.
- C2 it should possible -> it should BE possible
Page 4:
Clusters are introduced, but it is not said what a good cluster is.
Page 5
- In section 3: Here you should discuss what a good TCT is.
- You have not yet introduced TDTs. They should also be discussed in the related work.
- Or its (measure of) cohesion of exceeds?
Page 6:
- Given the a concept -> Given a concept
- 3.1.1. here you should actually define what your algorithm does, and relate it to the requirement of TCTs being good.
Page 7: be intended for an use -> for a use
Page 10: a variety freely -> a variety of freely
Page 12: can you add more information about the axioms to Table 1. Length? Complexity?
Page 16. Table 6. Should here not be a comparison with your approach as well? Where are these numbers for TCT, as compared to PCC and NAR?
Page 24: as the a clustering problem -> as a clustering problem.