An Unsupervised Approach to Disjointness Learning based on Terminological Cluster Trees

Tracking #: 1965-3178

This paper is currently under review
Claudia d'Amato
Nicola Fanizzi

Responsible editor: 
Philipp Cimiano

Submission type: 
Full Paper
In the context of research on the Semantic Web regarded as a Web of Data, efforts have been devoted to improving the quality of the ontologies that are used as vocabularies to enable complex services based on automated reasoning. Various surveys indicate that many domains would require better ontologies that include non-negligible constraints. Disjointness axioms are representative of this general problem: they are essential for making the negative knowledge about the domain of interest explicit, yet they are often overlooked during the modeling process, which affects the efficacy of the reasoning services. To tackle this problem, automated methods for discovering these axioms can be used as a tool for supporting knowledge engineers in modeling new ontologies or evolving existing ones. Current solutions, whether based on statistical correlations or relying on external corpora, often do not fully exploit the terminology of the knowledge base. Stemming from this consideration, we have been investigating alternative methods to elicit disjointness axioms from existing ontologies based on the induction of terminological cluster trees, which are logic trees in which each node stands for a cluster of individuals that emerges as a sub-concept. The growth of such trees relies on a divide-and-conquer procedure that assigns, to the cluster representing the root node, one of the concept descriptions generated via a refinement operator and selected according to a heuristic based on minimizing the risk of overlap between the candidate sub-clusters (quantified in terms of the distance between two prototypical individuals). Preliminary work showed some shortcomings that are tackled in this paper.
To tackle the task of disjointness axiom discovery, we have extended the terminological cluster tree induction framework with several contributions, which can be summarized as follows: 1) the adoption of different distance measures for clustering the individuals of a knowledge base; 2) the adoption of different heuristics for selecting the most promising concept descriptions; 3) a modified version of the refinement operator that prevents the introduction of inconsistency during the elicitation of the new axioms; 4) the integration of a framework for distributed and efficient in-memory processing, namely Spark, for scaling up the set of candidate concepts generated through the refinement operator. A wide empirical evaluation showed the feasibility of the proposed extensions and an improvement over alternative approaches.
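As a rough illustration of the divide-and-conquer growth described in the abstract, the following Python sketch builds a small cluster tree. It is not the authors' implementation and rests on several simplifying assumptions: individuals are modeled as sets of atomic concept names, a fixed list of candidate names stands in for the refinement operator, the distance is a plain Jaccard distance, and the prototypical individual is taken to be the cluster medoid. The names `Node`, `distance`, `medoid`, and `grow` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    concept: Optional[str]          # concept installed at this node (None for a leaf)
    individuals: list               # cluster of individuals routed to this node
    left: Optional["Node"] = None   # sub-cluster of instances of the concept
    right: Optional["Node"] = None  # sub-cluster of the remaining individuals


def distance(a: frozenset, b: frozenset) -> float:
    """Jaccard distance between two individuals (assumed measure)."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def medoid(cluster: list) -> frozenset:
    """Prototype of a cluster: the member closest to all the others."""
    return min(cluster, key=lambda x: sum(distance(x, y) for y in cluster))


def grow(individuals: list, candidates: list, min_size: int = 2) -> Node:
    """Recursively split a cluster on the candidate concept whose two
    sub-clusters have the most distant prototypes (lowest overlap risk)."""
    if len(individuals) < min_size:
        return Node(None, individuals)
    best = None
    for name in candidates:
        inside = [i for i in individuals if name in i]
        outside = [i for i in individuals if name not in i]
        if not inside or not outside:
            continue  # this candidate does not split the cluster
        score = distance(medoid(inside), medoid(outside))
        if best is None or score > best[0]:
            best = (score, name, inside, outside)
    if best is None:
        return Node(None, individuals)
    _, name, inside, outside = best
    rest = [c for c in candidates if c != name]
    return Node(name, individuals,
                grow(inside, rest, min_size),
                grow(outside, rest, min_size))
```

In this toy setting, concepts installed along different branches of the resulting tree would be the natural candidates for disjointness axioms. Deriving and validating such axioms against a real knowledge base requires a reasoner and is outside the scope of the sketch.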