Recursion in SPARQL

Tracking #: 2276-3489

Authors: 
Adrián Soto
Juan Reutter1
Domagoj Vrgoč1

Responsible editor: 
Oscar Corcho

Submission type: 
Full Paper
Abstract: 
The need for recursive queries in the Semantic Web setting is becoming more and more apparent with the emergence of datasets where different pieces of information are connected by complicated patterns. This was acknowledged by the W3C committee by the inclusion of property paths in the SPARQL standard. However, as more data becomes available, it is becoming clear that property paths alone are not enough to capture all recursive queries that the users are interested in. In this paper we propose a general purpose recursion operator to be added to SPARQL, formalize its syntax and develop algorithms for evaluating it in practical scenarios. We also show how to implement it as a plug-in on top of existing systems and test its performance on several real world datasets.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Click to Expand/Collapse
Review #1
Anonymous submitted on 14/Oct/2019
Suggestion:
Major Revision
Review Comment:

This work proposes and formalizes a recursion operator to be added to SPARQL. The manuscript is well written, easy to read. It handles a real problem and an interesting problem. The work is a concrete and in­depth research emphasizing the importance of the study to get the attention of the audience. The work is mostly clearly presented but it would be good to say more about its contribution. The abstract is completely the same with respect to the work published 4 years ago at ISWC. Personally, I think the abstracts should be different or at least this abstract must focus on what is the contribution of this work when it is compared against the previous work. Also, the content of the introduction is broadly similar. For reader, it would be useful if a more detailed explanation is included into the introduction to highlight the new findings.

Review #2
Anonymous submitted on 17/Nov/2019
Suggestion:
Minor Revision
Review Comment:

The study of recursion in SPARQL is very well-motivated. The problem is of practical importance, is challenging, and unresolved. Hence, the current paper is timely and has excellent potential for real impact.

The work presented is novel and non-trivial. The proposed extension, reminiscent of recursion in SQL, is natural and effective. The foundational results are very welcome and, to my reading, correct. The practical algorithmic results show the potential for uptake in the community. The experimental results are promising, demonstrating the potential for practical impact.

The writing is clear and effective. Altogether, this is a very fine manuscript presenting timely, interesting, and relevant results.

I have only two remarks for improving the paper. First, given the work ongoing in the property graph world on query language design and engineering, extending the related work discussion to make bridges to this world would be very helpful. For example, what is the relationship of recursive SPARQL and linear-recursive SPARQL to the Regular Queries of Reutter, Romero, and Vardi? A recent survey can be found in Bonifati et al. Querying Graphs, Morgan & Claypool, 2018.

Second, the experimental study should be extended, to show the scalability on larger graphs, on the order of 10's of millions to billions of triples. The current scalability study is limited to very small graphs (GMark Graph1, ..., Graph3), and hence not very insightful/convincing. I would encourage the authors to dig deeper here.

Review #3
Anonymous submitted on 29/Jan/2020
Suggestion:
Major Revision
Review Comment:

The paper presents a proposal for the introduction of recursion in SPARQL,
beyond the property paths already adopted by SPARQL itself. The idea is
similar to that of Datalog, where "outputs" of rules (i.e., inferred ground
facts) can be used as inputs for other rules until a fixpoint is reached.

The introduction and motivation are clearly exposed in section 1; the PROV-O
example appear indeed convincing. However, the related work fails to mention
relevant approaches to recursive queries in SPARQL, such those in the paper

"Expressive languages for querying the Semantic Web" by M. Arenas, G. Gottlob
and A. Pieris, TODS, 2009

which was published a while ago. Since Datalog recursion is mentioned in the
paper (see also my comments below), a thorough comparison with the Datalog-like
languages for the Semantic Web is surely needed here.

Section 2 is concise and generally clear. I recommend a short explanation of
what IRIs are, so as to clarify their role in the syntax and semantics later
explained. Moreover, the syntax and semantics of SPARQL should be defined
formally and in full, so as to make the technical statements in the paper
comprehensible to the reader.

Section 3 deals with the formal aspects of recursion in SPARQL. The technical
part here needs more detail. The discussion on negation and stable model
semantics in section 3.2 is sound but too high-level; stratified negation is
not mentioned; the stable model semantics is merely skimmed, while it should
have been formally defined. Also, the statement "one could simply choose to
allow [negation and operators that simulate it], at the risk [of] allowing
users to write queries that enter an infinite loop". Here the approach to the
language is completely procedural, which appears not to be suitable; if the
semantics is not defined, allowing the use of the operators in question gives
no guarantee on the query results and makes little sense. In fact, the whole
section 3.2 relies on a procedural approach. In the example in the paragraph
"Creating values", it seems that new triples are inserted but it is not clear
at all what the problem is; why is the insertion non-monotone? Why cannot
answering be defined here?

The "Queries we support" paragraph should include, at the risk of being
pedantic, a formal and clear definition of the adopted language. The
complexity analysis in proposition 3.1 seems rather simple and relying on [35].

Section 4 deals with "realistic recursion". The definition at the beginning of
the section is imprecise. It *defines* the answer ans(Q,D) by means of a
recursion that only considers D \union D_i \ D_(i-1), while it should more
clearly state (as explained in the text) that such recursion is equivalent to
the "normal" one in the case of linear queries. Also, linear Datalog (and its
definition) should be explained here in its correspondence to the definition of
linear queries in this paper.

The algorithm in section 4.1 (algorithm 2) is seemingly based on known
techniques, and a survey is cited [20], which considers Datalog. The novelty
of algorithm 2 as it is presented is not at all clear.

Section 4.2 considers limiting the depth of the recursion. Again, while the
content is sound, the approach leaves to be desired as it considers a merely
procedural approach to the language without providing a semantics for the
restriction. This makes the syntactic addition in question is of very limited
interest to the community.

The experimental evaluation in section 5 is generally sound and I won't discuss
details about it. The datasets are well chosen and the use cases appear
reasonable. The comparison with the evaluation of property paths is sound,
though I would not call the comparison "unfair" as it is done at page 13;
rather, the authors should leave as future work the possibility of treating
property paths with an ad-hoc technique.

As a general remark, this reviewer notes that in the experimentation again a
question arises, which underlies the whole research presented here: what is the
relationship with recursion in Datalog-like languages? It is known that SPARQL
queries can be treated by means of Datalog-like languages, of which several
implementations, variously optimized, exist. What if the proposed approach was
treated by using one of these languages? This could at the very least shed
more light on the theoretical part, without hindering the meaningfulness of the
proposal of adding recursion to the SPAQRL standard. Moreover, if recursion is
to be implemented as a plug-in on top of existing systems, as done in the
paper, the techniques adopted could well rely on other algorithms and
languages.

This paper, in the opinion of this reviewer, should be revised by including a
thorough comparison with other approaches to recursion in SPARQL, showing why
and how the proposed algorithms are better that the state of the art. The
above technical remarks should be addressed so as to make the paper more
readable and self-contained. Finally, a thorough proofreading is required; a
few typos (not a complete list) are below:

- page 3, 2nd col: "de usual syntax"

- page 6, 2nd col: "the risk allowing" -> "the risk of allowing"

- page 7, 2nd col: "much more SPARQL queries" -> "many more SPARQL queries"

- page 8, 1st col: "First two classes" -> "The first two classes".