Bio-SODA - A Question Answering System for Domain Knowledge Graphs

Tracking #: 2601-3815

This paper is currently under review
Ana Claudia Sima
Tarcisio Mendes de Farias
Maria Anisimova
Christophe Dessimoz
Marc Robinson-Rechavi
Erich Zbinden
Kurt Stockinger

Responsible editor: 
GQ Zhang

Submission type: 
Full Paper
The problem of question answering over structured data has become a growing research field in both the relational database and the Semantic Web communities, with significant effort devoted to question answering over knowledge graphs (KGQA). However, many of these approaches target open-domain question answering and often cannot be applied directly to the complex closed-domain settings of scientific datasets. In this paper, we focus on the specific challenges of question answering over closed-domain knowledge graphs and derive design goals for KGQA systems in this context. Moreover, we introduce our prototype implementation, Bio-SODA, a question answering system that generates SPARQL queries over closed-domain KGs without requiring training data in the form of question-answer pairs. Bio-SODA uses a generic graph-based approach to translate questions into a ranked list of candidate queries. Furthermore, we use a novel ranking algorithm that includes node centrality as a measure of the relevance of candidate matches to a user question. Our experiments with real-world datasets across several domains, including the last official closed-domain Question Answering over Linked Data (QALD) challenge (the QALD4 biomedical task), show that Bio-SODA outperforms the generic KGQA systems available for testing in a closed-domain setting, increasing the F1-score by at least 20% across all datasets tested. We also provide a new bioinformatics benchmark with complex queries drafted in collaboration with domain experts. For these types of real-world queries, the advantage of Bio-SODA is even more significant: it outperforms state-of-the-art systems by up to 46% in F1-score.
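The abstract does not state the exact ranking formula, so the following is only a minimal sketch of the general idea of using node centrality as a relevance signal for candidate matches. It assumes that candidates are ordered first by string similarity between a question keyword and a node label, with PageRank centrality breaking ties. `difflib.SequenceMatcher.ratio()` stands in for whatever similarity measure the system actually uses. The function names `pagerank` and `rank_candidates`, as well as the toy graph, are hypothetical and are not Bio-SODA's implementation.

```python
from difflib import SequenceMatcher


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed graph given as (src, dst) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


def rank_candidates(keyword, labels, centrality):
    """Hypothetical ranking: order by label similarity, break ties by centrality."""
    scored = []
    for node, label in labels.items():
        sim = SequenceMatcher(None, keyword.lower(), label.lower()).ratio()
        scored.append((sim, centrality.get(node, 0.0), node))
    scored.sort(reverse=True)  # highest similarity first, then highest centrality
    return [node for _, _, node in scored]


# Toy KG (hypothetical): two nodes share the label "gene", but ex:Gene is far
# more connected, so centrality decides which one ranks first.
edges = [
    ("ex:g1", "ex:Gene"), ("ex:g2", "ex:Gene"), ("ex:g3", "ex:Gene"),
    ("ex:old1", "ex:ObsoleteGene"),
    ("ex:Gene", "ex:Protein"),
]
labels = {"ex:Gene": "gene", "ex:ObsoleteGene": "Gene", "ex:Protein": "protein"}
pr = pagerank(edges)
print(rank_candidates("gene", labels, pr))
# → ['ex:Gene', 'ex:ObsoleteGene', 'ex:Protein']
```

Lexicographic ordering is used here only because it is easy to inspect. A weighted combination of similarity and centrality would be an equally plausible reading of the abstract.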