Estimating query rewriting quality over LOD

Tracking #: 1827-3040

Authors: 
Ana Torre
Jesús Bermúdez
Arantza Illarramendi

Responsible editor: 
Guest Editors IE of Semantic Data 2017

Submission type: 
Full Paper
Abstract: 
It is becoming increasingly necessary to query data stored in different publicly accessible datasets, such as those in the Linked Data environment, in order to obtain as much information as possible on distinct topics. However, users find it difficult to query datasets that use different vocabularies and data structures. For this reason, it is useful to develop systems that can rewrite queries on demand. However, such systems often cannot guarantee a semantics-preserving rewriting because of the heterogeneity of the vocabularies. At this point, estimating the quality of the produced rewriting becomes crucial. In this paper we present a novel framework that, given a query written in the vocabulary the user is most familiar with, rewrites the query in terms of the vocabulary of a target dataset. The framework also reports the quality of the rewritten query with two scores: first, a similarity factor based on the rewriting process itself, and second, a quality score provided by a predictive model. This model is built by a machine learning algorithm that learns from a set of queries and their intended (gold standard) rewritings. The feasibility of the framework has been validated in a real scenario.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
By Roberto García submitted on 24/Feb/2018
Suggestion:
Accept
Review Comment:

The authors have addressed my only concern. They added a paragraph to clarify how they generated the gold standard.

Review #2
By Luiz André Portes Pais Leme submitted on 02/Mar/2018
Suggestion:
Accept
Review Comment:

This paper tackles an important problem for spreading the use of Linked Data on the Web, namely automatic data integration, and it has improved greatly since the first version. All the issues pointed out were adequately handled, and it is ready for publication.