Question Answering on RDF KBs using Controlled Natural Language and Semantic Autocompletion

Tracking #: 1492-2704

Giuseppe Mazzeo
Carlo Zaniolo

Responsible editor: Guest Editors ENLI4SW 2016

Submission type: Full Paper
The fast growth in number, size and availability of RDF knowledge bases (KBs) is creating a pressing need for research advances that will let people consult them without having to learn structured query languages, such as SPARQL, and the internal organization of the KBs. In this paper, we present our Question Answering (QA) system, which accepts questions posed in a Controlled Natural Language. The questions entered by the user are annotated on the fly, and a KB-driven autocompletion system displays suggestions computed in real time from the partially completed sentence the person is typing. By following these patterns, users can enter only semantically correct questions, which are unambiguously interpreted by the system. This approach assures high levels of usability and generality. Experiments conducted on well-known QA benchmarks, including questions on the encyclopedic DBpedia and on specialized domains, such as music and medicine, show better accuracy and precision than previous systems.


Solicited Reviews:
Review #1
By Giorgos Stoilos submitted on 21/Dec/2016
Major Revision
Review Comment:

The paper presents a question answering system that works on top of RDF KBs. It uses a controlled natural language and relies on RDF triples to perform a form of auto-completion of the user query. This is accomplished by matching user keywords to RDF entities and then suggesting subsequent terms according to the RDF graph. This process is controlled by an automaton. Moreover, some consistency rules encoded in the automaton can help in term disambiguation.

The studied problem is interesting, and the proposed system seems to deliver what is promised in the introduction. The evaluation also shows that, in general, it performs better than the existing state of the art.

The related work seems fairly complete (the authors do mention quite a few other systems and techniques). Nevertheless, although I am not an expert in the area, I happen to be aware of the following papers which, as far as I can remember, also use the idea of traversing the RDF graph in order to construct user queries incrementally. Could you comment on the similarities and differences with your approach, and perhaps cite them in the paper and include a comparison in the related work (if they are indeed related)?

Thanh Tran, Philipp Cimiano, Sebastian Rudolph, Rudi Studer: Ontology-Based Interpretation of Keywords for Semantic Search. ISWC/ASWC 2007: 523-536

Gideon Zenz, Xuan Zhou, Enrico Minack, Wolf Siberski, Wolfgang Nejdl: From keywords to semantic queries - Incremental query construction on the semantic web. J. Web Sem. 7(3): 166-176 (2009).

The definitions given on pages 6 and 7 need some fixing. Take for example the domain of a property: "the domain of a property p is the set of elements {t} such that if t is an entity there exists a triple ...". The empty set does (vacuously) satisfy these properties, hence the empty set is a valid domain of a property, although obviously this is not intended. The correct definition would be

"the domain of a property p is the minimal set of elements S such that:
- if there exists a triple ⟨t, p, o⟩, then S contains t
- ...

The same applies to the other two definitions.
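To make the point about vacuous satisfaction concrete, here is a minimal Python sketch. It assumes a KB given as (subject, predicate, object) string tuples, implements only the first rule of the corrected definition (the remaining rules are elided above), and uses hypothetical function names that do not come from the paper. It contrasts the original reading, which the empty set satisfies, with the corrected closure condition, and computes the minimal set satisfying the latter.

```python
from typing import Iterable, Set, Tuple

# Assumed KB representation: each triple is (subject, predicate, object).
Triple = Tuple[str, str, str]


def satisfies_flawed(candidate: Set[str], p: str, kb: Iterable[Triple]) -> bool:
    """Original reading: every t in the candidate occurs as a subject of p.
    The empty set satisfies this vacuously, so it does not pin down a domain."""
    subjects = {s for s, pred, _ in kb if pred == p}
    return all(t in subjects for t in candidate)


def closed_under_rule(candidate: Set[str], p: str, kb: Iterable[Triple]) -> bool:
    """Corrected condition (first rule only): whenever a triple (t, p, o)
    exists, t must belong to the candidate set."""
    return all(s in candidate for s, pred, _ in kb if pred == p)


def domain(p: str, kb: Iterable[Triple]) -> Set[str]:
    """Minimal set closed under the rule: exactly the subjects of p-triples."""
    return {s for s, pred, _ in kb if pred == p}


if __name__ == "__main__":
    kb = [
        ("Paris", "capitalOf", "France"),
        ("Rome", "capitalOf", "Italy"),
        ("France", "currency", "Euro"),
    ]
    print(satisfies_flawed(set(), "capitalOf", kb))   # True: empty set passes
    print(closed_under_rule(set(), "capitalOf", kb))  # False: empty set rejected
    print(sorted(domain("capitalOf", kb)))            # ['Paris', 'Rome']
```

Any superset of the subjects (e.g. adding "Italy") also satisfies `closed_under_rule`, which is why the corrected definition asks for the *minimal* such set.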

There are a few typos here and there, e.g., "the the", "able able". Please proofread the paper.

The phrase "semantically correct", used in the abstract as well as in the introduction, is somewhat ill-defined. What does semantically correct mean?

It was not possible to understand the sentence "the results obtained on each question the benchmarks are reported in [10]"

The word "remarkable" in the conclusions is a bit too strong.

Review #2
By Mariano Rico submitted on 27/Jan/2017
Review Comment:

The topic is relevant and the writing is good.
I read the paper with interest. However, after checking the previous work of the first author, I had an unpleasant surprise: most of the paper's content is included in these two works, which are NOT cited:

Maurizio Atzori, Shi Gao, Giuseppe M. Mazzeo, Carlo Zaniolo
Answering End-User Questions, Queries and Searches on Wikipedia and its History. IEEE Data Eng. Bull. 39(3): 85-96 (2016)

Giuseppe M. Mazzeo, Carlo Zaniolo
Answering Controlled Natural Language Questions on RDF Knowledge Bases. EDBT 2016: 608-611

Both papers can be easily found in Google Scholar.

In my humble opinion this reason alone is enough for a rejection but, setting this fact aside, I also miss a link to the online system. Other journals may accept papers describing online systems without even a minimal demo version, but SWJ emphasises the use of online demos or other mechanisms to check the implementation of the described system.

Review #3
Anonymous submitted on 08/Mar/2017
Review Comment:

The paper presents a system for question answering over RDF KBs. It uses controlled natural language and a form of KB-driven auto-completion to avoid ambiguities in query interpretation and to assist users in formulating KB-compliant queries. To this end, user inputs are matched to RDF entities of the underlying graph, and query formulation is seen as navigation over the latter, using an automaton to control the transitions between states that realise query patterns.

The problem addressed is very interesting and relevant to the scope of the special issue. The approach is well motivated and explained; moreover, the evaluation experiments are promising, as the presented system performs better than the relevant state-of-the-art systems against which comparisons have been carried out. The related work is quite extensive; nonetheless, more recent publications could be added to the surveys on NL approaches to QA referenced in the introduction (e.g. “C. Unger, A. Freitas, and P. Cimiano. An introduction to Question Answering over Linked Data. Reasoning Web Summer School, pages 100-140, 2014.”). Moreover, since the use of CNL and KB-driven auto-completion is very close to the idea of feedback and clarification dialogues presented in “D. Damljanovic, M. Agatonovic, H. Cunningham, K. Bontcheva. Improving habitability of natural language interfaces for querying ontologies with feedback and clarification dialogues. J. Web Sem. 19: 1-21 (2013)”, this work should also be discussed.

However, there are some concerns with respect to considering the manuscript for publication in its current form. The first is that a significant part of its contents can be found in two recent publications by the authors that have not been cited in the current submission: “Answering Controlled Natural Language Questions on RDF Knowledge Bases, EDBT 2016: 608-611” and “Answering End-User Questions, Queries and Searches on Wikipedia and its History, IEEE Data Eng. Bull. 39(3): 85-96 (2016)”. Moreover, as there is no advance in the supported question-answering expressivity (the same extensions are reported as future work in all of them), the contribution of the current manuscript is considerably weakened. Second, although datasets from the well-established QALD benchmarks are used in the evaluation experiments, it is not clear why only the QALD3 MusicBrainz dataset was considered for the comparison with SWIPE, and not a DBpedia one, either from QALD6 or even from QALD4, for which a cited work reports that SWIPE is the most effective visual query system. Last, although the performance over the QALD datasets backs up the claim that the presented approach does not appear to reduce expressive power (since, with few exceptions due to unsupported features, the initial question can be rewritten in an equivalent CANaLI form), there is no end-user evaluation of how intuitive and practical the system is to use, nor a comprehensive listing of the types of questions/constructions that are supported (e.g. queries that can be expressed using what-questions (“what is the age ...”) but not how-questions (“how old is...”)). Also, a link to the online demo should be provided explicitly; currently it can only be found by following the reference to the results of the experimental evaluation for all considered benchmarks.

There are a few typos and sentences that would benefit from proofreading; examples include “The results obtained on each question the benchmarks are reported in [10].”, “The results of that comparison Figure 10”, etc.