Constructing a natural language interface for semantic web based sports news aggregator.

Tracking #: 1483-2695

Authors: 
Dzung Tuan Cao
Quang-Minh Nguyen

Responsible editor: 
Guest Editors ENLI4SW 2016

Submission type: 
Full Paper
Abstract: 
News search using conventional keyword-based methods often returns results whose content does not meet users' desires. We aim to develop a news aggregation system, called BKSport, focused on the sports domain and supporting a semantic search function. The news search subsystem takes a natural language question as input and returns relevant news corresponding to the semantics of the question, including the answer and related information. In this research, we propose a novel method to transform natural language questions into SPARQL-based semantic queries over RDF data in the semantic data store. Experiments are performed on a set of questions of various categories, and the results show that the proposed method provides high precision.
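For illustration only (this example is not taken from the paper; the prefix bk: and all class and property names are hypothetical), a question such as "Which team did Chelsea defeat on 10 March 2016?" might be mapped to a SPARQL query of roughly the following shape:

    PREFIX bk:  <http://example.org/bksport#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?team WHERE {
      ?match a bk:Match ;             # a match event annotated in the news store
             bk:winner bk:Chelsea ;   # Chelsea is recorded as the winner
             bk:loser  ?team ;        # the defeated team is the requested answer
             bk:date   "2016-03-10"^^xsd:date .
    }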
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Mariana Damova submitted on 29/Nov/2016
Suggestion:
Major Revision
Review Comment:

A rather straightforward and standard account addressing the problem of natural language to ontology interoperability, "Constructing a natural language interface for semantic web based sports news aggregator" does not quite meet the reader's expectations. The authors motivate their work with processing sports-related articles on the Web, but address the use case only weakly and remotely. The paper reads as a "diary", describing the subsequent steps in the elaboration of a natural language analysis engine. The list of linguistic phenomena covered, e.g. S-V-O structures, temporal expressions, conjunction, and wh- and yes/no questions, is not justified sufficiently clearly in relation to the use cases. However, the classification of questions into groups of expressed intentions is a good start. Unfortunately, their analysis and interpretation are not further explained in the paper, which would have largely contributed to the coherence of the presented text and to the clarity of the contribution of the presented research. The paper could also benefit from a more structured problem statement about the interpretation and semantic representation of questions, and from a solution built in relation to this problem statement. It does not become clear enough how the language structures discussed in the paper were selected, and why. Nor does it become clear how the Stanford parser is integrated in the presented solution. The several running linguistic examples point to commonly addressed linguistic phenomena without proposing an out-of-the-ordinary interpretation or handling.

One running example, describing one team defeating another team, looks out of place. Speaking of the Semantic Web, I would expect a solution inferring the information about the defeat, and an explanation of how that is done, rather than the presentation of a conversion of an S-V-O structure into a Semantic Web triple.
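To make the distinction concrete, here is a purely illustrative Turtle sketch (the vocabulary is hypothetical and not the paper's ontology) contrasting a direct S-V-O triple with an event-based representation from which the defeat could be inferred:

    @prefix bk: <http://example.org/bksport#> .

    # Direct S-V-O conversion of "Manchester City defeated Chelsea"
    bk:Manchester_City bk:defeated bk:Chelsea .

    # Event-based alternative: the defeat is implied by the match result
    bk:match_42 a bk:Match ;
        bk:homeTeam bk:Manchester_City ;
        bk:awayTeam bk:Chelsea ;
        bk:winner   bk:Manchester_City .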

A lot of text is dedicated to the description of the ontology and the entity extractor KIM. This section could have been much shorter, putting emphasis on their role in semantically annotating the content and consequently returning query results.

The authors do not show a convincing experiment and evaluation. The table listing the exemplary queries does not explain the experimentation method and does not prove the strength of the proposed approach.

Apart from the worked-out, step-by-step example of the SPARQL query construction, it would have been good to have a worked-out example from an NL query to the returned results.

On the positive side, the paper outlines a detailed literature and related work review, and it would have been beneficial to position the proposed approach within this context in terms of differences from, contributions to, and advancement of the state of the art. The presented figures shed light on details that have not been addressed in the text of the paper, and are a good supplement to the presentation of the proposed approach.

Overall, the presented approach is interesting because of the classification of the types of questions, but in order to be considered as contributing to the field the authors should show how they handle the groups of questions, and make the interpretation of temporal expressions more convincing and more complete, both with respect to the temporal expressions covered and with respect to the motivation for including them in their account (the example SPARQL query showing an interval needs to be reconsidered, as do the presented SPARQL query and its explanation); the same holds for conjunction. The font of the SPARQL examples should be changed so that they become readable. Currently, it is difficult to distinguish triples, prefixes, etc.
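For reference, a temporal interval constraint in SPARQL is typically expressed as a FILTER over a date range; a minimal sketch (with hypothetical vocabulary, not the query from the paper) for "in March 2016" would be:

    PREFIX bk:  <http://example.org/bksport#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?news WHERE {
      ?news bk:publishedDate ?date .
      # the month is rendered as a half-open interval over its start and end dates
      FILTER (?date >= "2016-03-01"^^xsd:date && ?date < "2016-04-01"^^xsd:date)
    }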

To sum up, the paper does not stand out with its originality, as it describes a straightforward, standard method of handling natural language expressions. It does not describe the transformation from language to triples in sufficient detail and, more importantly, does not address cases beyond the state of the art. The presented results are not convincing, because the paper does not show an evaluation methodology or a comparison with other approaches to demonstrate the superiority of the proposed approach. The relatedness of the examples and the account to the sports domain remains quite remote throughout the paper. As stated above, the paper is written as a "chronological diary". It needs to present the problem being solved in a more convincing manner, and make the text more coherent by reducing the description of tools, instruments, and methods that are common knowledge for the average reader, putting more emphasis and attention on the interesting points of converting NL into SPARQL on the one hand, and relating this to the sports domain on the other. Further, the paper should better describe the theoretical framework the authors commit to. We see discourse representation structures from Discourse Representation Theory, we see one- and two-place predicates, a constituent structure of a syntactic tree, and a mention of the Stanford parser. Finally, some quantification related to the covered linguistic structures, the number and type of SPARQL constructs, the size of the ontology, the size of the text corpus on which the experiments are carried out, and statistics about the results would be good to have.

The paper can be improved substantially, and it may consequently offer an acceptable contribution to the SWJ. The topic of natural language to ontology interoperability is very relevant for SW research and applications.

Review #2
By Elena Simperl submitted on 29/Dec/2016
Suggestion:
Major Revision
Review Comment:

Verdict: Accepted — Subject to Major Revisions

The paper presents a system that transforms natural language questions into SPARQL queries.

The system can be used in a question answering context in order to allow users who are not familiar with SPARQL to ask their questions. At the same time, it could potentially reduce the effort required to write SPARQL queries by supporting natural language instead of structured queries as the user interface.

The work is novel in the sense that it is applied to a new system used in a new domain (sports news). The use of ontologies to help transfer the system to other domains is sensible, but not particularly novel. In general, my biggest concern is that the work is not compared thoroughly with other works in this space.

The authors provide many examples to describe the most complex components of the proposed architecture. This is appreciated.

The related work is comprehensive, but could be improved. It focuses on SPARQL-based question answering for obvious reasons. However, at the technical level, the paper addresses the problem of semantic parsing in order to transform natural language questions into knowledge-base queries. There has been much work on semantic parsers, and recently on learnable semantic parsers (Kwiatkowski et al. 2013), that leverage a structured knowledge base (e.g., Freebase) to answer user questions. Research in this space seems relevant to the system proposed in the paper and should be discussed in the next iteration of the paper.

Another area of improvement is the evaluation. The criteria according to which the authors built their question corpus need to be discussed in greater detail. It is not clear how the survey that led to the proposed distribution of the questions across the different types was conducted. The precision of the system would probably be different for a different distribution of questions and, as a result, the choices that led to the selection of those 41 sentences should be discussed in more detail.

The idea of evaluating the performance of the system against a weighted version of precision is interesting. However, the inclusion of standard precision and recall scores would underline the proposed metric's reliability. Furthermore, even if most competing systems focus on the general domain, it might be worthwhile to compare the performance of the proposed system against other approaches that transform natural language questions into SPARQL queries.
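For reference (these definitions are not taken from the paper), the standard unweighted metrics against which a weighted precision would normally be reported are:

    Precision = (questions answered correctly) / (questions for which a query was produced)
    Recall    = (questions answered correctly) / (all questions in the test set)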

The paper is reasonably well written, but it would have helped if the authors had proofread it more thoroughly before submission.

There are some grammatical and typographical mistakes. Some representative examples are enumerated below:

1. The second affiliation is misspelled as "Hanoi University or Science and Technology".
2. In the second to last paragraph in the "Related Work" section, the word "domain" is written as "do main".
3. In the third section, the second sentence of the second paragraph should be re-phrased.
4. The indentation across the different types of questions in "4. Question classification" is inconsistent.
5. The first sentence of "6.1 Question preprocessing" should be eliminated.
6. The third paragraph of "6.3.2.2 Identification of ordinary variables and ..." is repeated three times.
7. In both Figure 8 and the description in 6.3.2.3 about the identification of the quantify constraint, it is not clear how the system is able to treat comparison scenarios expressed with "at least" or "at most" as a subset of those expressed with "than" (see the illustrative sketch after this list).
8. The indentation in 6.4.2.1 is not consistent across the different types of clauses.
9. The first sentence of "6.5.2 Recognition of classes" should be re-phrased.
10. In the "Experiments" section, it is mentioned that 45 sentences were used for the evaluation of the system; however, only 41 are presented in the corresponding table (Table 2).
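Regarding item 7, the point can be made concrete with a minimal SPARQL sketch (hypothetical vocabulary, not taken from the paper): the three readings differ only in the comparison operator, and the paper should state explicitly how each is produced.

    PREFIX bk: <http://example.org/bksport#>
    SELECT ?player WHERE {
      ?player bk:goalsScored ?n .
      FILTER (?n > 3)       # "more than three"
      # FILTER (?n >= 3)    # "at least three"
      # FILTER (?n <= 3)    # "at most three"
    }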

There are also some inconsistencies with some of the cross-references in the paper. For instance, in the first example of "6.4.2.3 Representation of temporal constraint in SPARQL query", the reference to Section 3 should have been a reference to Section 6.3.2.4. Sections 6.5.3 and 6.6 have similar issues. The summary of the structure of the paper in the last paragraph of the introduction is also inconsistent with the actual content of the subsequent sections. The cross-reference to "Section 5" in the last paragraph of 7.1 should also probably be to "Section 4".

Furthermore, some citations are missing. For instance, in "Related Works" there are three mentions of the Mooney dataset; to be self-contained, the paper needs to tell readers how they could get familiar with this dataset. Moreover, in "6.5.1 Named entity recognition" the Semantic Annotation Platform KIM is not properly cited either.

In summary, for the paper to be accepted, I would recommend, besides fixing the issues documented in this review, two major areas of improvement:

- the related work (semantic parsing)
- the evaluation: including a more detailed description of the corpus and its design, and a thorough comparison of the system's performance with that of direct competitors.

Review #3
Anonymous submitted on 02/Jan/2017
Suggestion:
Reject
Review Comment:

The main weakness of this work is the evaluation of the proposed process. For evaluating the conversion of natural language queries to SPARQL, only 42 questions were used. These are too few to draw any safe conclusion. Moreover, the distinct entities that occur in these queries are very few. It would be more reliable if there were many more questions and if they corresponded to questions submitted by real users.

But even with this very small set of questions, the paper does not report detailed evaluation results. Only percentages that concern the entire process are reported. In this way, the reader cannot evaluate how good the five steps of the process are, or how alternative steps would affect the results (or even how WordNet or KIM affect the results).

Another weakness of the paper is that its scope is very limited. It deals only with the translation of natural language queries to SPARQL queries and does not evaluate the responses to these queries. Even if a translation to SPARQL is correct, this does not necessarily mean that the end user will be pleased with the returned results. The contents of the KB, the entity resolution approach that is followed, and many other factors will affect the effectiveness of the entire approach. Furthermore, ranking (which is of crucial importance in this kind of application) is not elaborated.

Overall, in my opinion this work is not mature enough for publication in a journal.

SOME SUGGESTIONS PER SECTION

Abstract

The first sentence needs improvement.

Section 2

The set of related works could be clustered, and Section 2 could be partitioned into subsections. As regards graphical query builders, I would add a recent survey on faceted exploration of RDF/S datasets.

It would be interesting if the authors could identify those approaches/systems which are most closely related to their work, make the novelty clear, and then design a comparative evaluation (which could be presented in Section 7).

Section 3
Too short.

Section 4

Probably the ontology should be presented here.

Section 5

The way news items are actually annotated is not clear.

Section 6

The ontology (Fig 10) should be presented earlier. That would allow the reader to understand earlier how the target SPARQL query should look.

Fig 10: use grayscale (which box is red is not clear in B/W printouts).

How was the BKSport ontology constructed? Is it adequate for this domain?
I am not sure that it allows clearly modeling the periods during which a player was a member of a team (and information about the player's contract with the team).
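A hypothetical modelling sketch (not the BKSport ontology; names and dates are purely illustrative) of the kind of structure that would be needed, using a reified membership resource that carries the contract period:

    @prefix bk:  <http://example.org/bksport#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    # The membership itself is a resource, so the period can be attached to it
    bk:membership_123 a bk:TeamMembership ;
        bk:player    bk:Wayne_Rooney ;
        bk:team      bk:Manchester_United ;
        bk:startDate "2004-08-31"^^xsd:date ;
        bk:endDate   "2017-07-09"^^xsd:date .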

Section 7

The text says 45 questions, but the table shows 41.

Section 8

The authors say that the ontology will be improved. This, however, will affect the translation process.

