Materialisation approaches for Façade-based data access with SPARQL

Tracking #: 3331-4545

This paper is currently under review
Luigi Asprino
Enrico Daga
Justin Dowdy
Aldo Gangemi
Paul Mulholland

Responsible editor: Ruben Verborgh

Submission type: Full Paper
The Knowledge Graph concept is gaining momentum as an ideal approach to data integration. Therefore, it is of paramount importance to equip knowledge engineers with tools for accessing data from multiple, heterogeneous resources. The successful W3C standard SPARQL is the reference language for interacting with RDF knowledge graphs. For that reason, several approaches extend SPARQL to access data in non-RDF formats. Recent research proposes relying on an intermediate RDF model, named Façade-X, whose components can be transparently mapped to various file formats. However, although Façade-X specifies how its components map to many different formats (CSV, JSON, HTML, Markdown, and others), it is still unclear how to implement a SPARQL execution engine that relies on it. In other words, what are the possible strategies for executing Façade-X queries? This article explores materialisation approaches for executing Façade-X queries. Specifically, we study two in-memory strategies for performing Façade-X data access with SPARQL. A complete materialised view strategy fully transforms the data source into RDF. By contrast, a sliced materialised view strategy segments the data source and generates an RDF view of each part. Both strategies can be optimised by materialising only the part of the RDF graph that has potential matches with triple patterns in the query (triple-filtering). In addition, we compare these in-memory strategies with an on-disk alternative, which relies on a temporary database instance. We analyse the characteristics of these methods and perform extensive experiments, reporting on the benefits and limitations of each approach. Finally, we contribute guidelines and best practices derived from the findings.
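To make the distinction between the strategies concrete, the following is a minimal Python sketch and not the implementation evaluated in the paper. It assumes a CSV source, represents triples as plain tuples of strings, and uses a simplified row-and-cell mapping that is only loosely modelled on Façade-X. The names `row_to_triples`, `complete_view`, `sliced_views` and `matches` are hypothetical, and triple patterns are given as tuples in which `None` stands for a variable.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# A triple pattern; None plays the role of a SPARQL variable (assumption).
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def row_to_triples(index: int, row: Dict[str, str]) -> Iterator[Triple]:
    """Illustrative mapping: one container node per row, one triple per cell."""
    subject = f"_:row{index}"
    yield ("_:root", f"rdf:_{index}", subject)
    for key, value in row.items():
        yield (subject, f"xyz:{key}", value)


def matches(triple: Triple, pattern: Pattern) -> bool:
    """True if every constant term of the pattern equals the triple's term."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def keep(triple: Triple, patterns: Optional[List[Pattern]]) -> bool:
    """Triple-filtering: keep a triple only if some query pattern may match it."""
    return patterns is None or any(matches(triple, p) for p in patterns)


def complete_view(rows: Iterable[Dict[str, str]],
                  patterns: Optional[List[Pattern]] = None) -> Set[Triple]:
    """Complete strategy: materialise the whole source as one in-memory graph."""
    graph: Set[Triple] = set()
    for index, row in enumerate(rows, start=1):
        graph.update(t for t in row_to_triples(index, row) if keep(t, patterns))
    return graph


def sliced_views(rows: Iterable[Dict[str, str]],
                 patterns: Optional[List[Pattern]] = None) -> Iterator[Set[Triple]]:
    """Sliced strategy: yield one small graph per slice (here, one per row)."""
    for index, row in enumerate(rows, start=1):
        yield {t for t in row_to_triples(index, row) if keep(t, patterns)}


if __name__ == "__main__":
    source = "name,city\nAda,London\nAlan,Wilmslow\n"
    rows = list(csv.DictReader(io.StringIO(source)))
    city_only = [(None, "xyz:city", None)]
    print(len(complete_view(rows)))             # whole graph
    print(len(complete_view(rows, city_only)))  # filtered graph
    for view in sliced_views(rows, city_only):  # one filtered view per row
        print(sorted(view))
```

In this sketch the complete view holds every triple at once, whereas the sliced views keep only one row's triples in memory at a time, so a real engine would have to evaluate the query against each slice in turn. Triple-filtering reduces the size of either kind of view without changing which triples can match the query's patterns.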