Ontology Verbalization using Semantic-Refinement

Tracking #: 1500-2712

Authors: 
Vinu Ellampallil Venugopal
P Sreenivasa Kumar

Responsible editor: 
Rinke Hoekstra

Submission type: 
Full Paper
Abstract: 
In this paper, we propose a rule-based technique to generate redundancy-free natural language (NL) descriptions of Web Ontology Language (OWL) entities. Existing approaches to verbalizing OWL ontologies generate NL text segments that stay close to their counterpart statements in the ontology. Some of these approaches also group and aggregate these NL text segments to produce a more fluent and comprehensive form of the content. Restricting our attention to descriptions of individuals and atomic concepts, we find that the available tools determine the set of all logical conditions satisfied by the given individual/concept name and translate these conditions verbatim into corresponding NL descriptions. Because such descriptions have high fidelity to the OWL representation of the entities, their human-understandability is affected by repetitions and redundancies. In the literature, no effort has been made to remove redundancies and repetitions at the logical level before generating the NL descriptions of entities, and we find this to be the main reason for the lack of readability of the generated text. In this paper, we propose a technique called semantic-refinement to generate meaningful and easily understandable (what we call redundancy-free) text descriptions of individuals and concepts of a given OWL ontology. We identify the combinations of OWL/DL constructs that lead to repetitive/redundant descriptions and propose a series of refinement rules to rewrite the conditions that are satisfied by an individual/concept in a meaning-preserving manner. The reduced set of conditions is then employed for generating textual descriptions. Our experiments show that the semantic-refinement technique leads to significantly improved descriptions of ontology entities. 
We also test the effectiveness and usefulness of the generated descriptions for the purpose of validating ontologies and find that the proposed technique is indeed helpful in this context. The details of an empirical study to support the claim are provided in the paper.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 12/Feb/2017
Suggestion:
Minor Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

Summary
=======

This paper describes a technique for verbalizing OWL ontologies in natural language (English).
Verbalization is an important technique for debugging ontologies with the assistance of domain experts, who are usually unfamiliar with semantic web formalisms (description logics, rule-based queries, etc.) and standards (OWL, SPARQL, RDF, etc.). The main problem in the state of the art of ontology verbalization is the generation of long and redundant verbalizations, which hamper understandability.

The authors propose a refinement-based approach to verbalization, in which OWL statements are (1) simplified/normalized (in the sense that a minimal and more specific --w.r.t. concept subsumption-- rewriting of concepts and roles is operated on the statements/assertions/concepts) and then (2) verbalized in (controlled) English. The authors focus their work on ABox assertions and on OWL DL ontologies (OWL 1.1 standard, i.e., SHIQ).
They propose a number of (sound and complete) rewriting rules, and conduct a usability experiment (or survey) with domain experts that allegedly shows that their technique does indeed improve understandability.

I have some comments on the survey, described after the global assessment, which I would like the authors to address before this paper is published.

Assessment:
==========

(1) Originality: The paper addresses a known problem from a somewhat new perspective. It is in a way similar to the work by Franconi et al. (QUELO system) for ontology verbalization, but focuses on ABox statements and on statement/concept rewriting rather than NLG and sentence planning.

(2) Significance: The results are interesting for the controlled language and ontology verbalization semantic web sub-communities.

(3) The paper is well written, bar some minor typos here and there (e.g., formulas exceeding margins).

To review:
=========

While the paper is technically sound, I find some small issues in the survey, in particular (1) the size of the sample and (2) the absence of statistical significance tests. While it does seem that the overall trend observed holds (i.e., the more refined a statement, the more understandable it is), is this result statistically significant? If I run a t-test or a chi-squared test, will the p-value be < 0.05 (for a no-improvement null hypothesis)?
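
As a minimal sketch of the kind of check asked for above, the following computes a Pearson chi-squared test on a 2x2 table (description type versus understood or not understood). The function name `chi2_2x2` and the counts in the usage lines are hypothetical placeholders, not data from the paper; the survey's real counts would have to be substituted, and with a small sample an exact test may be more appropriate.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) on the table
    [[a, b], [c, d]]. Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# HYPOTHETICAL counts for illustration only (not taken from the survey):
# rows = refined / unrefined descriptions, columns = understood / not understood.
stat, p_value = chi2_2x2(30, 10, 18, 22)
print(f"chi2 = {stat:.2f}, p = {p_value:.4f}, significant at 0.05: {p_value < 0.05}")
```

The null hypothesis here is that understandability is independent of whether the description was refined, which corresponds to the no-improvement hypothesis mentioned above.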

Review #2
By Allan Third submitted on 25/Sep/2017
Suggestion:
Minor Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

Thank you for this interesting paper on reducing the repetitiveness of information in natural language verbalisations of ontologies by application of rules to OWL axioms pre-verbalisation. The aim of doing so is to improve the readability of verbalisations and make them more useful for domain experts to read.

The work presented is certainly original and seems highly likely to be a significant and useful approach to ontology verbalisation, although I believe there could be some improvements in the presentation, and there are some points that I'd like to have clarified.

Firstly, the rules are presented as all in some sense performing a similar task, that of refining which semantic concepts should be verbalised and which should not. However, there appear to be at least two different and quite distinct sub-tasks within this, and it would be interesting and helpful to see this distinction drawn out, and perhaps separated in the text. These tasks are content selection and inference. For quite a few of the examples given, it seems that a great deal of the redundancy-reduction which is achieved comes from only mentioning the most specific class in a subclass-chain, rather than listing the same property repeatedly for each class in the chain (content selection), whereas others involve deriving from, e.g., conjunctions. I wonder if it might not be easier to follow, particularly the inference-based rules, if they could be stated in terms of the inferences that were being carried out instead of in syntactic terms.
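
To make the content-selection sub-task concrete, here is a minimal sketch of keeping only the most specific classes from a subclass chain. It is not the authors' rule set: the function `most_specific`, the dictionary encoding of the hierarchy, and the class names in the usage lines are hypothetical, and the hierarchy is assumed to be acyclic.

```python
def most_specific(classes, superclasses):
    """Return the classes that are not a (transitive) superclass of another
    class in the list. `superclasses` maps a class name to its direct
    superclasses; the hierarchy is assumed to contain no cycles."""

    def ancestors(cls):
        # Collect every class reachable by following superclass links.
        seen, stack = set(), list(superclasses.get(cls, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(superclasses.get(current, ()))
        return seen

    redundant = set()
    for cls in classes:
        redundant |= ancestors(cls)
    return [cls for cls in classes if cls not in redundant]


# Hypothetical hierarchy for illustration: CatOwner < PetOwner < Person.
hierarchy = {"CatOwner": ["PetOwner"], "PetOwner": ["Person"]}
print(most_specific(["Person", "PetOwner", "CatOwner"], hierarchy))
```

Under these assumptions only the most specific class survives, which is the selection step; the inference-based rules would need a separate treatment.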

In several of the examples, the selection of content seems a little arbitrary, with the presented "simplified" text containing less information than the original, where some of the "missing" information does *not* appear to be redundant. For example, on p3, section 1, the "simplified" example "Sam is a cat-owner having at least one cat as pet" certainly communicates the description of the individual Sam, but is missing the generic/definitional semantics in the original text - "A cat-owner is a person having at least one cat as pet". The former does not rule out the possibility of non-Sam individuals being cat-owners who have no cat as pet. Similarly on p13, the example relating to Florida in Table 10 omits that Florida is a major administrative subdivision, which, at least on a surface reading, does not appear to be redundant when compared to the rest of the information which is selected to show about Florida. All of this raises questions about the approach and how we can be sure in general that potentially important information isn't being omitted, and it would be very helpful if the reasons behind the selection of content were made clear.

Finally, and potentially related to the above, is the question of context-sensitivity of the word "redundant". It is possible that, in the Florida example, "major administrative subdivision" is omitted because there is a more specific subclass mentioned. However, I wouldn't say that this constitutes redundancy for someone who does not know the ontology in question and who wouldn't always recognise that a more generic class was being omitted. That is to say, the idea of redundancy depends on what the reader already knows. It would therefore be useful to have some discussion on what is intended by "redundancy" in this paper.

There is other work on redundancy in ontology verbalisation which might be relevant. Apologies for the self-citation, but http://dl.acm.org/citation.cfm?id=2392726 covers another form of redundancy which might complement the approach given here.