Survey on Challenges of Question Answering in the Semantic Web

Tracking #: 1375-2587

Authors: 
Konrad Höffner
Sebastian Walter
Edgard Marx
Ricardo Usbeck
Jens Lehmann
Axel-Cyrille Ngonga Ngomo

Responsible editor: 
Marta Sabou

Submission type: 
Survey Article
Abstract: 
Semantic Question Answering (SQA) removes two major access requirements to the Semantic Web: the mastery of a formal query language like SPARQL and knowledge of a specific vocabulary. Because of the complexity of natural language, SQA presents difficult challenges and many research opportunities. Instead of a shared effort, however, many essential components are redeveloped, which is an inefficient use of researchers’ time and resources. This survey analyzes 62 different SQA systems, which are systematically and manually selected using predefined inclusion and exclusion criteria, leading to 72 selected publications out of 1960 candidates. We identify common challenges, structure solutions, and provide recommendations for future systems. This work is based on publications from the end of 2010 to July 2015 and is also compared to older but similar surveys.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
By Chris Biemann submitted on 26/Apr/2016
Suggestion:
Accept
Review Comment:

This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.

(1) Very
(2) Very
(3) Very
(4) Very

This paper presents an extensive survey of Question Answering Systems on Linked Data (Semantic Question Answering, SQA) for the period 2010-2015.
The task of SQA is to retrieve answers from structured SW resources for natural language queries. This is the first revision of the paper.

I am happy to see that my comments on the previous version were addressed very nicely. Thus, I vote for accepting the paper since it is an important addition to the literature and nicely serves as a more current survey on the field of SQA.

Below are some minor errors that should be addressed in the final version. I am sure I did not find all of them.

Minor remarks
- "don't" -> do not, in various places
- make sure to drop spaces before \footnote{}, e.g. fn 13
- 5.1: an Apache Lucene ndex which implements a Levenshtein Automaton -> Lucene index, which
- 6. than 100 languages which -> than 100 languages, which
AVERBIS

The formatting of the references needs to be revisited. For example, contrast:

[18] R. Blanco, P. Mika, and S. Vigna. Effective and efficient entity search in RDF data. In L. Aroyo, C. Welty, H. Alani, J. Taylor, and A. Bernstein, editors, Proceedings of the 10th International Conference on The Semantic Web—Volume Part I, volume 7031 of Lecture Notes in Computer Science, pages 83–97, Berlin Heidelberg, Germany, 2011. Springer-Verlag. ISBN 978-3-642-25072-9. . URL http://dl.acm.org/citation.cfm?id=2063016.2063023.

versus

[21] A. Both, D. Diefenbach, K. Singh, S. Shekarpour, D. Cherix, and C. Lange34. Qanary–An extensible vocabulary for open Question Answering systems. ESWC, 2016.
[104] A.-C. Ngonga Ngomo. Link discovery with guaranteed reduction ratio in affine spaces with minkowski measures. In Proceedings of ISWC, 2012.

in length and verbosity. There is no right or wrong here; it is a matter of consistency.

Also, there are some errors. Please check all references carefully; I list just a few:

[14] A. Ben Abacha and P. Zweigenbaum. Medical Question Answering: Translating medical questions into SPARQL queries. In IHI ’12: Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, pages 41–50, New York, NY, USA, 2012. Association for Computational Linguistics.

No, this is ACM, not ACL.

[68] B. Hamp and H. Feldweg. Germanet—a lexical-semantic net for German. In I. Mani and M. Maybury, editors, Proceedings of the ACL/EACL ’97 Workshop on Intelligent Scalable Text Summarization, pages 9–15. Citeseer, 1997.

CiteSeer is not a publisher. Please do not blindly copy BibTeX entries from the web. Also, check title capitalization; for example, this one should be {GermaNet}.

[93] Natural Language Processing and Information Systems: 19th International Conference on Applications of Natural Language to Information Systems, NLDB 2014, Montpellier, France, June 18–20, 2014. Proceedings, volume 8455 of Lecture Notes in Computer Science, Berlin Heidelberg, Germany, 2014. L’Unité Mixte de Recherche Territoires, Environnement, Télédétection et Information Spatiale (TETIS), Springer.

TETIS???

[102] N. Nakashole, G. Weikum, and F. Suchanek. PATTY: A taxonomy of relational patterns with semantic types. In EMNLP-CoNLL 2012, 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference, pages 1135–1145, Stroudsburg, PA, USA, 2012. Association for Computational Linguistics.

Stroudsburg, PA, USA is not the location of the venue, which is what most other references give.

Review #2
By Chris Welty submitted on 26/Apr/2016
Suggestion:
Accept
Review Comment:

I disagreed with the other reviewers and thought the paper provided a valuable resource ONLY IF it could be published quickly. Quit nit-picking and get this out there; it's a reference, not a result.

Review #3
Anonymous submitted on 08/May/2016
Suggestion:
Minor Revision
Review Comment:

Overall Rate

The reviewer appreciates the restructuring of the introduction regarding the overall organisation of Sections 4 and 5. It also clarifies the terminology adopted in some subsections, which is indeed consistent.

(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.

Very high suitability, as outlined in my previous review.

(2) How comprehensive and how balanced is the presentation and coverage.

With proper justification, the authors restrict their focus well within the wide existing literature, and the discussion is well balanced.

(3) Readability and clarity of the presentation.

The revised version is much more readable. It makes the overall picture clearer and more consistent.

(4) Importance of the covered material to the broader Semantic Web community.

The paper discusses guidelines for future work as a summary of current trends. I appreciated the discussion, although it is confined to too limited a space at the end.

Pointwise comments.

Please avoid capital letters after a colon, e.g., on page 1375-2587:
"Our contributions are threefold: First, we complement existing work with 72 publications about 62 systems ..."
or on page 20:
"This survey follows a strict discovery methodology: Objective inclusion and exclusion"
or on page 4:
"This accellerates the evolution of QA in four different ways: Firstly, new systems"
On page 7, please change
"More elaborate normalizations use natural language programming (NLP)"
into
"More elaborate normalizations use natural language processing (NLP)"

Please remove "talks about their" on page 9:

"... the time consuming querying step, by combining restrictions for each meaning.
talks about their
Each term is mapped to a Dependency-based ..."


Comments

We thank the reviewers for the detailed descriptions of the bibliography problems and further minor errors. We corrected them as requested.

However, we require clarification on three minor issues:

1. As far as we know, a capital letter is used after a colon if more than one sentence follows; thus, we did not change the capitalization in the two cases of this kind requested by reviewer #3 (First, ... Second, ... Finally, ...). The following sentence was rewritten using a semicolon: “This survey follows a strict discovery methodology; Objective inclusion and exclusion...”

2. It was commented that we did not provide the place of the conference venue for proceedings. As far as we know, the BibTeX “address” field is meant for the address of the publisher, not the address of the conference. This is also mentioned in the Semantic Web Journal FAQ (http://www.semantic-web-journal.net/faq#q10): “[...]Author(s), title of paper, editor(s) of the proceedings, title of the book (as on the book cover, i.e. the conference acronym does not suffice), publisher, location (city) of the publisher, year of publication, page numbers.[...]”

3. We did not understand the comment “TETIS???” on the NLDB 2014 proceedings. Is TETIS not the correct organization? On the conference web page, http://www.lirmm.fr/nldb2014/submission.html, three organizers are listed, “TETIS”, “cirad” and “irstea”, with TETIS listed first. We agree that it looks a bit strange for the organization to appear between the publisher and the publisher's address, but this order is determined by the BibTeX style “abbrvnat”, as mentioned in the SWJ FAQ: “note that the order of the items is defined by the publisher's style files, and ordering is done automatically if using BibTeX: ”.

This is the BibTeX entry in paper.bib:

@proceedings{nldb2014,
title={Natural Language Processing and Information Systems: 19th International Conference on Applications of Natural Language to Information Systems, NLDB 2014, Montpellier, France, June 18--20, 2014. Proceedings},
author={M{\'e}tais, Elisabeth and Roche, Mathieu and Teisseire, Maguelonne},
volume={8455},
year={2014},
publisher={Springer},
address={Berlin Heidelberg, Germany},
series = {Lecture Notes in Computer Science},
organization = {L'Unit{\'e} Mixte de Recherche Territoires, Environnement, T{\'e}l{\'e}d{\'e}tection et Information Spatiale (TETIS)}
}

Just a few comments:

Re. 2: you are correct that the BibTeX address field is meant for the address of the publisher. Note that sometimes the location of the conference is part of the book title, though.

Re. 3: If in doubt, you can use the bibliographic information given by the publishing house; in this case, you can find it at http://www.springer.com/us/book/9783319079820

Re. 1: I honestly do not know; use your best judgement.