Differential Privacy and SPARQL

Tracking #: 2610-3824

This paper is currently under review
Carlos Buil Aranda
Jorge Lobo
Federico Olmedo

Responsible editor: 
Guest Editors ST 4 Data and Algorithmic Governance 2020

Submission type: 
Full Paper
Differential privacy is a framework that provides formal tools to develop algorithms to access databases and answer numerical and statistical queries with quantifiable accuracy and privacy guarantees. The notions of differential privacy are defined independent of the data model and the query language. Most results have been on aggregation queries such as counting or finding maximum or average values, and on grouping queries over aggregations such as the creation of histograms. The data model has been typically the relational model and the query language SQL. However, good realizations of deferential privacy for queries that required joins had been limited. This has imposed severe restrictions on applying differential privacy in RDF knowledge graphs and SPARQL. By the simple nature of RDF data, most interesting queries accessing RDF graphs will require intensive use of joins. Recently though, new techniques have been developed that can be applied to many types of joins in SQL with reasonable results. This opened the question of whether these new definitions can be transferred to RDF and SPARQL. In this paper we provide a positive answer to this question by presenting an algorithm that can answer count queries over a large class of SPARQL queries that guarantees differential privacy, if the RDF graph is accompanied with some natural semantic information about its structure. We have implemented our algorithm and conducted several experiments, showing the feasibility of our approach for large databases. Our aim has been to present an approach that can be used as a stepping stone towards extensions and other realizations of differential privacy for SPARQL and RDF.
Full PDF Version: 
Under Review