Review Comment:
This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.
The paper defines and studies Extended Property Paths (EPPs), which extend the Property Paths (PPs) of the SPARQL query language with features such as path and node tests, path conjunction and path difference. Although the non-recursive fragment of EPPs can be translated into SPARQL (as shown in the paper), EPPs can be more concise. In addition, the paper shows that full EPPs could be handled by existing SPARQL processors by a reasonably small modification to one of the evaluation functions.
The paper also studies the problem of answering queries where the reasoning capabilities of the RDFS entailment regime are taken into account. The authors show that this reasoning can be captured by query rewriting using EPPs. Empirical studies reported in the paper show that the overhead of the rewritings is not very large.
The paper is comprehensive and well written, and presents an interesting extension to SPARQL that provides increased expressive power. However, the novelty of EPPs is rather limited. As mentioned in fact by the authors on several occasions, many of the features of EPPs as well as results on expressiveness and computational complexity carry over from languages based on Nested Regular Expression (NREs), which have been studied by various authors over a number of years. Other features of EPPs are present in languages such as GXPath. Nevertheless, the focus on extending SPARQL with such features is useful. As a result, I believe the paper would be of interest and use to a significant number of researchers.
More specific comments follow below.
Since NRE-based languages specifically introduced tests to check existence of paths satisfying conditions, I think that it should be clarified in Example 3 what aspects of TP used in the example are not supported by them. This is done for Example 4, e.g.
Form a practical point of view, the motivation for using nEPPs (rather than EPPs) in SPARQL seems weak. They cannot be combined with PPs because of overlap in syntax, and one wouldn't want to replace PPs with nEPPs because the former are more expressive.
In the discussion on the conciseness of EPPs compared to the translation on p 16, an example using p1/p2 is used and it is pointed out that the translation produces two triple patterns with a connecting variable. However, p1/p2 could be translated into SPARQL as is, so the example hardly motivates the conciseness of EPPs. Also I fail to see the point of mentioning that there are infinitely many possible translations and the algorithm described implements one of them.
Again on p 26, the paper claims that navigational queries can be written more succinctly using nEPPs rather than SPARQL. As evidence of this, the lengths of nEPPs and equivalent SPARQL queries are compared. However, the SPARQL queries used are those resulting from the translation presented in the paper. It is quite conceivable that an automated translation will produce much more verbose queries than natively hand-coded queries, so the comparison is unfair. It would be more convincing to present results on conciseness, independent of the translation.
Theorem 26 (on p 21) shows that there is an EPP query that cannot be expressed as a PP query. In the proof, the EPP query using triple pattern ?b (p & q)* ?e expressed on the graph G in Figure 13 is used. The claim is that no PP query returns exactly the same result on G. In the inductive step for P = P_1 AND P_2, it is claimed that "all the additional answers" from evaluating P_1 and P_2 on G "will not be discarded" because they both return all the "self" bindings. However, all that can be deduced is that all the "self" bindings will be returned; other bindings may not be. So one cannot conclude that P_1 AND P_2 will return more answers than the EPP query. In fact, it seems to me that the PP pattern ?b p ?e . ?b q ?e will return exactly the answers returned by the EPP query (although obviously not being equivalent to it in general). So I think a more complex graph is needed to prove this, and the proof needs to be fixed.
It is also claimed on p 22 that it follows from Theorem 26 that a particular kind of recursive EPP query cannot be evaluated using the standard PP machinery. But this doesn't follow (even if the proof of the theorem is corrected) because the query used here is syntactically different from the one used in the proof of the theorem. The query used here includes "/" and TP, whereas the one in the theorem uses "&".
Earlier on p 22, it states that for the base case of p, the query must be rewritten using EPP constructs. However, the corresponding entry in Table 9 shows only PP is necessary. This needs to be clarified or fixed.
In the related work section, the focus is (correctly) on SPARQL and extensions to it. So on p 29, e.g., it is stated that "only EPPs allow to test node values in a nested expression" and "none of these extensions can express the Italian exclusive friends query mentioned in the Introduction". However, I think it should be pointed out that languages such as GXPath (or certainly conjunctive GXPath) can express both of these kinds of queries. Section 9.2 does mention GXPath [43] but dismisses it because it is not SPARQL-based. I believe readers who are not familiar with the expressive power of GXPath may be left with a incorrect impression of the significance of the work reported in the present paper. By the way, the full version of the GXPath paper appeared as "Querying Graphs with Data" in J ACM in March 2016.
The following is a list of more minor comments and typos.
abstract, l 2: "evaluation and," -> "evaluation, and"
p 2, l 1: "neither are" -> "are neither"
p 2, ex 1: "request" -> "requests"
(many places): "NREs-based languages" -> "NRE-based languages"
p 3, fig 1 caption: insert "a" before "Knowledge"
p 3, ex 6: "arbitrary length with labels" is clumsy; perhaps "arbitrary length comprising edges labeled"
p 4, l 6: "EPP expressions" -> "expression"
p 4, l 8: in fact the standard uses a single polymorphic function named ALP, not ALP1
p 4, ex 7: the expression from Example 6 cannot be rewritten as shown since the two expressions are not equivalent.
p 4, l -4,-5: "a mean of" -> "a means of"
p 4, c 2, l -17: "multiset" -> "multisets"
p 4, c 2, l -9: "into" -> "using"
p 6, l -10: "the only" -> "only the"
According to the definitions on pages 5 and 6, an RDF triple cannot have a literal in the subject position but a SPARQL triple pattern can.
p 7: double "then" (after definition of 'optional')
p 8: "spec." -> "specification" (and elsewhere)
p 8: second "ALP1" -> "ALP2" (see comment above about the standard); also the ALP1 and ALP2 functions seem to be from [48] in fact.
p 8: "NREs-like" -> "NRE-like"
p 8, def 12: delete "is" after "epp"
p 8: "a IRI" -> "an IRI"
p 9, Table 3: In R2, the result should be a set of mappings, not a projection over two variables; also, shouldn't "?v" be "?v_n"?
p 9: delete "it" before "is translated"
p 9: space missing between ":leaderParty" and "_s"
p 10, fig 8, caption: insert "of" after "semantics"
p 10: "denotes" -> "denote" (l 11)
p 10, ex 13: "in predicate" -> "in the predicate"; "mapping" -> "mappings"; insert "is bound" after "?v_R)"
p 10: delete "are" (c 2, l 4)
p 11: delete "text" before ":Carrara"; "that i" -> "that is"
p 12: add comma before "a position symbol"
p 12: "wrt" -> "with respect to"
p 12: function "coor" -> "corr"
p 13: "same type of" -> "same type as"
p 13: "translates in" -> "translate to"
p 13: "test are" -> "tests are"
p 14: "EPPs test" -> "EPP tests" (Table 5 caption)
p 14: "in the fact" -> "to the fact"
p 14: "in input" => "as input"
p 14: "To given" -> "To give"
p 14: "a nEPP" -> "an nEPP"
p 14: "of the for" -> "of the form"
p 14: "in translated" -> "is translated"
p 14 "a EBC" -> "an EBC"
p 16: "a EPPs" -> "an EPP"
p 16: "Exended" -> "Extended"
p 16: "changes into the" -> "changes to"
p 16: "table is the" -> "table is that the"
p 17: "full EPPs" -> "ful EPP"
p 17: "Fig. 5" -> "Fig. 6"
p 17: the second aim of the section (5) is actually in Section 6.2, so this needs to be rewritten.
p 17: "of RDFS vocabulary consisting in" -> "of the RDFS vocabulary consisting of"
p 17: "Authors" -> "The authors"
p 17: include the translation Phi as part of what is given in Lemma 22.
p 17: "three expression" -> "three expressions"
p 18: "UL" in the caption of Table 7 should be bold "IL"
p 18 "Enconding" -> "Encoding"
p 18 "capture rule" -> "captures rule"
p 19: "such type of expressions" -> "such types of expressions"
p 21: "in virtue of" -> "by virtue of"
p 21: "than all" -> "then all"
p 22: "close language" -> "closed language"
p 22: "our rewriting" -> "our rewritings"
p 22: "regime" -> "regimes" (c 2, l 9)
p 22: "Authors" -> "The authors"
p 22: "symmertricProperty" -> "symmetricProperty"
p 22: "give a receipt"?
p 22: "currently support" -> "currently supports"
p 22: "as a future" -> "as future"
p 22: "KG" -> "KGs"
p 22: "Semantics" -> "semantics"
p 23: Table 10 uses both "|" and ":" to mean the same thing; also the definitions given for R4 and R9 use "|" and "wedge" incorrectly - rather use "where" and "and"
p 24: "wrt" -> "with respect to"
p 24: "EPPs language" -> "EPP language"
p 25: "ran" -> "run"
p 25: "nEPPs constructs" -> "nEPP constructs"
p 25: "characters long nEPPs" -> "character long nEPP"
p 26: "EPPs" -> "EPP" (l 2); "iEPPs" -> "iEPP" (l 3)
p 26: "including" -> "comprising"
p 26: "of a total" -> "with a total"
p 26: insert "are evaluated" before "via Jena"
p 27: "under entailment" -> "under the entailment"
p 27: remove comma after "in some cases"
p 27: add "operators" after "UNION"
p 28: "available into" -> "available for"?
p 28: "from some time" -> "for some time"
p 29: "it is the language having less" -> "it has fewer"
p 29: "languages but" -> "languages except"
p 29: "wrt" -> "with respect to"
p 30: "GxPAth" -> "GXPath"
p 30: "wrt" -> "with respect to"
p 30: "2-EXPTIME" what, hard or complete?
p 30: "Reutters" -> "Reutter"
p 31: "such extension" -> "such an extension"
p 31: reference [14] is not 543 pages long!
p 32: presumably reference [51] has now appeared
|