Review Comment:
# Summary
The reviewed paper is a survey on existing approaches for SPARQL query
relaxation, with a focus on the impact of RDF reification. SPARQL
query relaxation addresses the important problem of user queries that
return no answers or too few answers. This is a common problem when
SPARQL queries are composed manually because it is difficult for a
user to have detailed knowledge of an RDF dataset and its schemas.
Reification is another important problem related to knowledge
representation in RDF, and hence to SPARQL querying. Because no
standard for reification has emerged yet, reification is generally
ignored in solutions to other problems, like query relaxation. The
most original part of the survey is precisely to analyse the impact of
reification when applying query relaxation approaches.
The survey is structured as follows:
- Section 2: methodology to select the papers covered by the survey, resulting in a selection of 13 approaches in 12 research papers
- Section 3: background that explains the main notions found through the different approaches
- Section 4: description of the existing approaches in terms of the notions explained in Section 3
- Section 5: multi-faceted comparison of existing approaches
- Section 6: focus on RDF reification, presenting the different reification models, comparing them, and finally analyzing their impact on query relaxation approaches
- Section 7: a short section on challenges and open issues related to reification
# (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.
The paper does a good job at explaining the key notions involved in
query relaxation. It should be accessible to anyone with a basic
knowledge of RDF, RDFS, and SPARQL.
It provides a unique resource for getting a global and up-to-date overview
of SPARQL query relaxation approaches, along with their relative
strengths and weaknesses.
# (2) How comprehensive and how balanced is the presentation and coverage.
To the best of my knowledge, the set of presented approaches is
comprehensive, and those approaches are presented and compared in a
fair and accurate way.
In particular, two main categories of approaches are included, those
based on ontologies and entailment rules, and those based on
similarity of instances. It would have been tempting to consider only
the former; covering both makes the survey much more
interesting and valuable.
The survey is also comprehensive on the reification models, including
the most recent one, RDF-star. Although it is not yet a W3C
recommendation, it is the most promising method and its adoption is
rising fast.
# (3) Readability and clarity of the presentation.
The document is well written, and well illustrated with many concrete
examples. I particularly appreciated Section 3 (background), and
Section 5 (comparison). Section 2 (survey methodology) is also fine.
Section 4 has a lot of content, and an effort has been made to give an
accurate presentation of the different approaches without going too
deeply into the details. However, it is organized as a long list of
approaches, and each approach is defined in a textual form that is
often difficult to follow. Many terms are put in italics to indicate
that they are technical terms coming from the surveyed papers, but they
are often not defined. Here are examples of unclear content:
- rules applied in reverse order
- matchings that are not matchings of other relaxed triple patterns of t'
- k-relevant approximate answers
- BR studies the problem of relaxing queries as a batch-based process... (paragraph)
- Section 4.2.2: not clear what is compared: the entities in the user query with other entities? the candidate approximate answers with the query constraints?
I would suggest to focus on the key intuitions that make each approach
interesting, to use formal notations when they are clearer than
convoluted sentences (e.g.: matchings that are not matchings of other
relaxed triple patterns of t'), and to use examples to pinpoint key
aspects of each approach (in contrast with other approaches).
Section 5 and its synthetic tables are really interesting. The
organization of the comparison criteria makes sense and is informative.
Section 6 is interesting but unsatisfactory in its organisation
and, at some points, its content. First, the paper title says "under the
lens of RDF reification" but reification only comes at the end, when
most of the survey has been done. The section is mostly a micro-survey
on RDF reification, and then one page about its impact on query
relaxation. It would make more sense to have a subsection on
reification in Section 3 (background), and to add reification as a
comparison facet in Section 5 (comparison). The presentation of
reification could also be made shorter by merging comments on RDF
descriptions and SPARQL graph patterns, as they are the same.
Second, although it is true that query relaxation approaches have not
considered reification, 3 reification methods (out of 5) rely on
standard RDF so the approaches can be applied to such reified data. The
argumentation in Section 6.4 is too quick in concluding that they do not
work or work badly.
- Query size: the increase is not that big, and some approaches have been shown to work on large queries.
- Syntax support: the extension of relaxation rules seems rather immediate for named graphs (from triples to quadruples), and for RDF-star (in <<s p o>> q v, relax s, p, o, q, and v), even if such relaxations may not be optimal
- Relaxation over annotations: the argument focuses on values, which are in my view an orthogonal concern. No existing approach seems to consider values as objects in triples. Here, two concerns are mixed: literal values and reification (metadata annotations).
For me, the problem is rather that reification methods rely on some shapes that can be broken by the relaxation rules.
- Metadata type. Same as above, a different concern.
- Dataset size: I don't agree that ontology-based approaches are not impacted. They are not impacted in the relaxation process, but they are in the evaluation of relaxed queries, which is actually the bottleneck of those approaches in terms of efficiency.
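To illustrate the "Syntax support" point above, here is a minimal sketch of the simplest relaxation (replacing one constant by a fresh variable), applied uniformly to a triple pattern, a quad pattern, and an RDF-star annotated pattern. The function `relax_terms`, the flat-tuple encoding of patterns, and the example IRIs are my own assumptions for illustration; they are not taken from the paper or from any surveyed approach, and ontology-based rules are not covered.

```python
from itertools import count


def relax_terms(pattern):
    """Return all patterns obtained by replacing one constant with a fresh variable.

    `pattern` is a flat tuple of terms (hypothetical encoding); variables are
    strings starting with '?'. Fresh variables are named ?r1, ?r2, ... and are
    assumed not to clash with variables already used in the query.
    """
    fresh = count(1)
    relaxed = []
    for i, term in enumerate(pattern):
        if term.startswith("?"):
            continue  # already a variable, nothing to relax
        candidate = list(pattern)
        candidate[i] = f"?r{next(fresh)}"
        relaxed.append(tuple(candidate))
    return relaxed


triple = (":film1", ":starring", ":actor1")     # s p o
quad = triple + (":g1",)                         # named graphs: s p o g
annotated = triple + (":role", ":role1")         # RDF-star: quoted triple s p o, then q v

for pattern in (triple, quad, annotated):
    print(len(pattern), "terms:", len(relax_terms(pattern)), "one-step relaxations")
```

The same loop handles all three encodings, which is the sense in which the extension seems immediate. Whether the resulting relaxed queries are useful, or preserve the shapes that a reification model expects, is a separate question.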
The impact of reification on query relaxation should be studied more
thoroughly. One should ask: what if I apply approach X on reification
model Y? Ideally, this should be studied experimentally. If this is
hardly possible, for lack of source code for instance, a more rigorous
reasoning should be conducted.
# (4) Importance of the covered material to the broader Semantic Web community.
As already discussed in the summary above, the survey covers two
important topics of general interest to the Semantic Web community:
SPARQL query relaxation and RDF reification.
# Minor comments
- Section 3.1: the rules in Table 2 are actually clearer than their
paraphrasing in the text. Maybe the text could be made more
informal, to convey the intuitive meaning of rules, and explain the
more technical aspects of Table 2.
- p6, line 27: avoid the line return for the 3rd case
- Section 3.2.2: the definition for query excludes UNION and
OPTIONAL. Is this intended? Is there no approach dealing with
them?
- Section 3.2.2: triple deletion does not appear as a possible
relaxation. Is this equivalent to replacing all terms in a triple by
a variable? I don't think so. Some of the presented approaches do
perform triple deletion, so maybe this should be presented in this
section as a possible relaxation.
- p7, line 48: Sim(C,C') is only when C' is a super-class of C, right?
This should be made explicit.
- p8, lines 4, 14: it seems to me that the conditions "all the
instances belong to the superclass C'/superproperty P'" are not
required.
- Section 3.3.2: provide a justification for the definition of
Sim(tp,tp'): why an average and not a simple sum, given that the
components are information content?
- Section 3.3.3: same, justify the definition of Sim(Q,Q'): why a
product here and not a sum?
- p9, line 14: are builT from
- p14, line 29: the short DEPTH
- p17, line 26: Section 5 --> Section 5.1
- Section 5.1: comparision --> comparison (several times)
hierarchie --> hierarchy
- p20, Lattice pruning: what about approaches that explore the lattice
from the more general to the more specific? Don't they perform some
significant pruning by stopping traversal as soon as a query with no
(new) answers is encountered?
- Section 6.1: to avoid introducing literals, which are not covered in
previous sections, I would recommend choosing another running
example where annotation values are actually URIs. An example could
be an actor playing in some movie under some role (and possibly
using some language). This would simplify the discourse, and avoid
the filters in the queries.
- Table 8: columns #Triple and # triple patterns can be merged, the
only difference, an insignificant one, being the omission of the type in the
N-ary model (the same could be done for the standard reification
model). Then, c+n expressions should be given, rather than a fixed
value for 1 annotation. Then for named graphs, it should be "1+n
quads" to emphasize that there is no additional "triple" but an
additional term in each "triple". For RDF-star, I don't think it is
correct to say "1 triple". The genuine representation is
s p o {| q1 v1; q2 v2 |}.
which stands for 3 assertions
s p o.
<<s p o>> q1 v1.
<<s p o>> q2 v2.
So for me it's still "1+n assertions/triples".
About the overhead variables, I count 1 for N-ary, and 2 for named graphs.
- p27, line 36: rectagnular --> rectangular
- p28, line 44: singificant --> significant
- p29, line 5: meta-metadata --> metadata ?
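
Regarding the Section 3.2.2 comment on triple deletion, the sketch below shows one way in which deleting a triple pattern differs from replacing all its terms by fresh variables: under bag semantics, the fully generalized pattern matches every triple of the graph and so multiplies the solutions. The tiny evaluator, the graph, and the query are hypothetical and only meant to illustrate the point; they do not reproduce any surveyed approach.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def evaluate(bgp, graph):
    """Naive evaluation of a basic graph pattern; returns a list of solution mappings."""
    solutions = [{}]
    for pattern in bgp:
        solutions = [
            extended
            for binding in solutions
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return solutions


# Hypothetical data: the second query pattern fails because no :award triple exists.
graph = [
    (":film1", ":starring", ":actor1"),
    (":film1", ":director", ":person1"),
]
query = [("?f", ":starring", "?a"), ("?f", ":award", "?w")]

deleted = [query[0]]                              # triple deletion
generalized = [query[0], ("?v1", "?v2", "?v3")]   # all terms replaced by fresh variables

print(len(evaluate(query, graph)))
print(len(evaluate(deleted, graph)))
print(len(evaluate(generalized, graph)))
```

With this data the deleted form yields one solution, while the generalized form yields one solution per triple in the graph. The two also differ on an empty graph whenever the remaining patterns alone would have solutions, so treating deletion as a relaxation in its own right seems justified.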