Abstract:
This paper demonstrates that the presence of blank nodes in RDF data
poses a problem for the distributed processing of SPARQL
queries. It is shown that the usual decomposition strategies from
the literature leak information, even when the information derives
from a single source. It is argued that this leakage, and the
measures needed to repair it, must be accounted for in a formal
semantics. To this end, a set semantics for SPARQL is generalized
with a parameter representing execution contexts. This makes it
possible to track the naming of blank nodes across execution
contexts, which in turn makes it possible to articulate a
decomposition strategy that is provably sound and complete with
respect to any selection of RDF sources, even when blank nodes are
allowed. Alas,
this strategy is not computationally tractable. However, knowledge
about the sources, where available, can be used to reduce the
computational burden considerably.