Review Comment:
General comments:
This is a very relevant paper for scientists and practitioners interested in understanding and using SPARQL as a query language over Linked Data sources. Although there have been previous papers on this general topic (some of them published in the Semantic Web Journal and involving at least one of the authors of this paper), this paper focuses on a feature introduced in the SPARQL 1.1 specification: property paths. This feature clearly reflects the need to traverse the graph structure of Linked Data on the Web, yet, most probably because of a syntactic restriction, it does not easily allow traversal across different graphs in the case of query federation. In any case, this limitation is not really a drawback in the context of Linked Data, where the notion of separate RDF graphs is less relevant.
Below I provide a short statement on each of the three main criteria to be addressed in reviews of full papers at SWJ:
(1) Originality: As discussed above, this is not the first paper that discusses how to use SPARQL for querying Linked Data, nor the first to examine the formal foundations of SPARQL querying over Linked Data. However, it adds an analysis of the simple but very powerful feature of property paths, and to this end it introduces a family of reachability-based semantics, the notion of Web-safeness, and a proof that there exists a syntactic property which allows determining whether a query is Web-safe. I consider this sufficient for a new publication in this area, especially as an extension of the corresponding earlier conference paper.
(2) Significance of the results: Given the growing number of approaches and tools exploring Linked Data querying, in contrast with the federated query processing approaches over SPARQL endpoints that were explored some years ago, these foundational results are highly relevant for understanding the characteristics and limits of SPARQL querying over Linked Data. Some of the proposed formalisations are even directly usable by those approaches (e.g., the Web-safeness proof).
(3) Quality of writing: The paper is easy to read (though, naturally, as dense as any formal work of this kind) and the formalisations are well done. The paper clearly presents the foundations underlying the formalisations, which makes it possible to understand the main advances presented here. The only improvement I would suggest is some additional examples, although I understand that these may have been omitted to avoid excessive length.
Additional comments:
As can be seen from my comments above, I have really enjoyed reading this very well-written paper, which, I acknowledge, provides a very clear view of the current state of the art in Linked Data querying and of the formalisations required for dealing with property paths.
I now turn to some detailed comments on aspects that I believe may be errors, or that at least require further explanation.
In Section 3.1, in the second paragraph, you say that alpha belongs to the set I union L union V. I may be wrong, but I cannot see how L can be included here, as literals are not allowed to be the subject of triples in RDF.
On page 6, just above Definition 5, the authors include the formula {s,p,o} intersection {s',p',o'} intersection B not equal to the empty set. However, I cannot see how these sets can be intersected, and I think this is an abuse of notation. I suggest reviewing and possibly rewriting this formula.
The following is a more general remark. Your formalisation allows any RDF document to contain triples whose subject differs from the URI being dereferenced. I know that this is not unusual in Linked Data querying, since sometimes we need to add triples about subjects other than the ones we are exposing, for completeness' sake or just for convenience, but I wonder whether this overly general approach has too strong an impact on your formalisation. I would feel even more confident with a formalisation that focuses only on accessing triples whose subject is the URI that has just been dereferenced. This is just an opinion (I am happy to accept the other option). It occurred to me while reading the last paragraph of Definition 5, where you refer to some u belonging to {s,p,o}; there I would forget about the s.
I really liked Section 4 on Web-aware semantics of property paths. There is only one place where I would need further explanation, namely where you say "For instance, the PP pattern .... context-based semantics)". I must admit that I did not see this very clearly.
Regarding Section 5, I have only one concern, for which I would like an explanation from the authors (I was even about to recommend a minor revision so as to ensure an answer, which I am sure will come). In the footnote on page 10 you state that adding further SPARQL features would be just an exercise without major implications for your results. However, I have the impression that it is not so easy, and I would ask for a technical report or similar (a one-page demonstration) indicating how you would deal with triple patterns that have a variable as predicate. I would really like to understand whether that case is so easy to handle.
In Definition 29, you have not defined the symbol "tilde".
One simple point about the evaluation (I am particularly sensitive to this, having recently worked with early-stage PhD students on defining their evaluation setups). I do not think that the characteristics of the computer setup used in Section 7 are very relevant, given the nature of your experiments. I would remove the comment on the MacBook Pro. Beyond that, I did not think that any experimentation was actually needed for this paper, but I found the experiments (which are correctly called experiments rather than an evaluation, something I appreciate a lot) very useful for helping readers understand the expected behaviour of the different approaches discussed.
Typos (some of them in Section 6; please recheck that section for grammar and typos):
- choses --> chooses
- algorithm consist of --> algorithm consists of