Abstract:
Recent, intensive research in the biomedical domain has made it
possible to accumulate and disseminate biomedical knowledge through
various knowledge bases that are increasingly available on the Web.
Exploiting this knowledge requires creating links between these bases
and using them jointly. Linked Data, the SPARQL language and natural
language question-answering interfaces provide interesting solutions
for querying such knowledge bases. However, while using biomedical Linked
Data is crucial, life-science researchers may have difficulties using
the SPARQL language. Interfaces based on natural language question
answering are recognized to be suitable for querying knowledge
bases. In this paper, we propose a method for translating natural
language questions into SPARQL queries. We use Natural Language
Processing tools, semantic resources and RDF triple descriptions. Our
four-step method linguistically and semantically annotates the
questions, abstracts them, builds a representation of the SPARQL
queries, and finally generates the queries. The method was developed
on 50 questions over three biomedical knowledge bases used in Task 2 of
the QALD-4 challenge and evaluated on 27 new questions. It achieves
good performance, with an F-measure of 0.78 on the test set. The
method for translating questions into SPARQL queries is implemented as
a Perl module and is available at
http://search.cpan.org/~thhamon/RDF-NLP-SPARQLQuery/.
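
To make the four steps named above concrete, here is a minimal toy sketch in Python. It is not the Perl module and does not reproduce the authors' method: the lexicon, the prefixed names (`drugbank:Aspirin`, `sider:sideEffect`), the function names and the question-type heuristic are all hypothetical, chosen only to show how annotation, abstraction, representation building and query generation might chain together.

```python
import re

# Hypothetical lexicon mapping surface terms to (role, prefixed name).
# These entries are illustrative assumptions, not taken from the paper.
LEXICON = {
    "side effects": ("predicate", "sider:sideEffect"),
    "aspirin": ("entity", "drugbank:Aspirin"),
}


def annotate(question):
    """Step 1: tag known terms in the question, in order of appearance."""
    text = question.lower()
    found = []
    for term, (role, name) in LEXICON.items():
        pos = text.find(term)
        if pos >= 0:
            found.append((pos, role, name))
    return [(role, name) for _, role, name in sorted(found)]


def abstract_question(question, annotations):
    """Step 2: keep only the question type and the annotations."""
    is_yes_no = re.match(r"(?i)^(is|are|does|do)\b", question) is not None
    return {"type": "ASK" if is_yes_no else "SELECT", "annotations": annotations}


def build_representation(abstraction):
    """Step 3: combine entities and predicates into triple patterns."""
    notes = abstraction["annotations"]
    predicates = [name for role, name in notes if role == "predicate"]
    entities = [name for role, name in notes if role == "entity"]
    triples = [(e, p, "?v") for p in predicates for e in entities]
    return {"type": abstraction["type"], "triples": triples}


def generate(representation):
    """Step 4: serialize the representation as a SPARQL query string."""
    body = " ".join(f"{s} {p} {o} ." for s, p, o in representation["triples"])
    if representation["type"] == "ASK":
        return f"ASK {{ {body} }}"
    return f"SELECT DISTINCT ?v WHERE {{ {body} }}"


def translate(question):
    """Run the four steps in sequence."""
    annotations = annotate(question)
    abstraction = abstract_question(question, annotations)
    return generate(build_representation(abstraction))


print(translate("What are the side effects of aspirin?"))
```

The generated string omits PREFIX declarations, so it would need them prepended before being sent to a real endpoint.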