Review Comment:
The paper presents a proposal for the introduction of recursion in SPARQL,
beyond the property paths already adopted by SPARQL itself. The idea is
similar to that of Datalog, where "outputs" of rules (i.e., inferred ground
facts) can be used as inputs for other rules until a fixpoint is reached.
The introduction and motivation are clearly exposed in section 1; the PROV-O
example appear indeed convincing. However, the related work fails to mention
relevant approaches to recursive queries in SPARQL, such those in the paper
"Expressive languages for querying the Semantic Web" by M. Arenas, G. Gottlob
and A. Pieris, TODS, 2009
which was published a while ago. Since Datalog recursion is mentioned in the
paper (see also my comments below), a thorough comparison with the Datalog-like
languages for the Semantic Web is surely needed here.
Section 2 is concise and generally clear. I recommend a short explanation of
what IRIs are, so as to clarify their role in the syntax and semantics later
explained. Moreover, the syntax and semantics of SPARQL should be defined
formally and in full, so as to make the technical statements in the paper
comprehensible to the reader.
Section 3 deals with the formal aspects of recursion in SPARQL. The technical
part here needs more detail. The discussion on negation and stable model
semantics in section 3.2 is sound but too high-level; stratified negation is
not mentioned; the stable model semantics is merely skimmed, while it should
have been formally defined. Also, the statement "one could simply choose to
allow [negation and operators that simulate it], at the risk [of] allowing
users to write queries that enter an infinite loop". Here the approach to the
language is completely procedural, which appears not to be suitable; if the
semantics is not defined, allowing the use of the operators in question gives
no guarantee on the query results and makes little sense. In fact, the whole
section 3.2 relies on a procedural approach. In the example in the paragraph
"Creating values", it seems that new triples are inserted but it is not clear
at all what the problem is; why is the insertion non-monotone? Why cannot
answering be defined here?
The "Queries we support" paragraph should include, at the risk of being
pedantic, a formal and clear definition of the adopted language. The
complexity analysis in proposition 3.1 seems rather simple and relying on [35].
Section 4 deals with "realistic recursion". The definition at the beginning of
the section is imprecise. It *defines* the answer ans(Q,D) by means of a
recursion that only considers D \union D_i \ D_(i-1), while it should more
clearly state (as explained in the text) that such recursion is equivalent to
the "normal" one in the case of linear queries. Also, linear Datalog (and its
definition) should be explained here in its correspondence to the definition of
linear queries in this paper.
The algorithm in section 4.1 (algorithm 2) is seemingly based on known
techniques, and a survey is cited [20], which considers Datalog. The novelty
of algorithm 2 as it is presented is not at all clear.
Section 4.2 considers limiting the depth of the recursion. Again, while the
content is sound, the approach leaves to be desired as it considers a merely
procedural approach to the language without providing a semantics for the
restriction. This makes the syntactic addition in question is of very limited
interest to the community.
The experimental evaluation in section 5 is generally sound and I won't discuss
details about it. The datasets are well chosen and the use cases appear
reasonable. The comparison with the evaluation of property paths is sound,
though I would not call the comparison "unfair" as it is done at page 13;
rather, the authors should leave as future work the possibility of treating
property paths with an ad-hoc technique.
As a general remark, this reviewer notes that in the experimentation again a
question arises, which underlies the whole research presented here: what is the
relationship with recursion in Datalog-like languages? It is known that SPARQL
queries can be treated by means of Datalog-like languages, of which several
implementations, variously optimized, exist. What if the proposed approach was
treated by using one of these languages? This could at the very least shed
more light on the theoretical part, without hindering the meaningfulness of the
proposal of adding recursion to the SPAQRL standard. Moreover, if recursion is
to be implemented as a plug-in on top of existing systems, as done in the
paper, the techniques adopted could well rely on other algorithms and
languages.
This paper, in the opinion of this reviewer, should be revised by including a
thorough comparison with other approaches to recursion in SPARQL, showing why
and how the proposed algorithms are better that the state of the art. The
above technical remarks should be addressed so as to make the paper more
readable and self-contained. Finally, a thorough proofreading is required; a
few typos (not a complete list) are below:
- page 3, 2nd col: "de usual syntax"
- page 6, 2nd col: "the risk allowing" -> "the risk of allowing"
- page 7, 2nd col: "much more SPARQL queries" -> "many more SPARQL queries"
- page 8, 1st col: "First two classes" -> "The first two classes".
|