Review Comment:
The authors have addressed almost all of my comments.
The major open issue is the definition of the Archival queries, which is still not well defined. See Comments 2 and 3.
======================================================================
COMMENT 1
Round 1 comment:
b) If the work is focused on relational databases, why aren't CSV dumps
sufficient?
Authors' response:
b) The fourth paragraph in the introduction section is sharpened and states
“…it is desirable for the contents of a database to be unloaded in a
neutral format … “ and “…preserved representations must include
sufficient meta-data to retrieve, explain, reproduce, and disseminate …”
Round 2 comment:
Is there any related work that could be cited on using CSV for long-term preservation of data? CSV could also be considered a "neutral format". However, there is no standard way of representing meta-data in CSV, unless the data dictionary of the database is also dumped to CSV.
Bottom line: I would like to see a clear case for why RDF is the chosen format rather than alternatives such as CSV.
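To make the meta-data gap concrete, here is a hypothetical sketch (the file names and the dictionary layout are my own ad-hoc choices; no standard prescribes them):

    employees.csv:
        id,name,dept_id
        1,Ada,10

    data_dictionary.csv (one possible ad-hoc encoding):
        table,column,type,constraint
        employees,id,INTEGER,PRIMARY KEY
        employees,dept_id,INTEGER,REFERENCES departments(id)

Without such a companion dump, the types, keys, and referential constraints needed to explain and reproduce the database are lost; even with it, the encoding is tool-specific rather than standardized.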
======================================================================
COMMENT 2
Round 1 comment:
c) After the keyword TRIPLES comes an "archived triple pattern". However, Query
A8 has a set of triple patterns. I assume that instead of an "archived
triple pattern", it is a basic graph pattern, which can have 1 or more triple
patterns.
Authors' response:
In the new semantic description in Sec 3.1 there is a clear explanation that
in a TRIPLES clause the user specifies “archived triple patterns” and an
optional “archive restriction” in a WHERE clause. An archive restriction
restricts the triples to archive. It consists of a graph pattern and may
include SPARQL functions, which is the case in A8.
Round 2 comment:
This is not yet clear. First of all, 'archived_triple_patterns' is not defined anywhere; I am having to look at the examples to understand what it is. I believe that an 'archived_triple_pattern' is a single triple pattern (s, p, o) where s, p, and o can be constant URIs or variables. In the examples I do not see a case with more than one triple pattern. If so, the term 'archived_triple_patternS' is misleading (it should not be plural), because it denotes only one triple pattern.
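For concreteness, here is a sketch of the two possible readings (the clause layout is my reconstruction from the paper's examples, not the paper's exact grammar):

    # Reading 1: TRIPLES takes exactly one triple pattern, as in every example I see
    TRIPLES ?subject ?property ?value
    WHERE { ... }                     # optional archive restriction

    # Reading 2: TRIPLES takes a basic graph pattern (one or more triple patterns);
    # no example in the paper exercises this
    TRIPLES ?subject rdf:type ?class . ?class rdfs:label ?label
    WHERE { ... }

If only Reading 1 is intended, the plural name is misleading; if Reading 2 is intended, please add an example with more than one triple pattern.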
======================================================================
COMMENT 3
Round 1 comment:
c) I would recommend to formally present the semantics either by 1) using
rules/datalog syntax to represent the translation or 2) defining its own
semantics following the approach of the semantics of SPARQL by Perez et al
(Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. 2009. Semantics and
complexity of SPARQL. ACM Trans. Database Syst.) and comparing the
expressivity of A-SPARQL with SPARQL CONSTRUCT. This way, there would be no
room for ambiguity.
Authors' response:
The translation rules are now simplified and much better explained.
Round 2 comment:
The translation rules from A-SPARQL to the generated SPARQL are not formalized; they consist of two sentences, which are hard to follow.
For example, the first states: "1) The CONSTRUCT clause of the translated SPARQL query consists of all the archived triple patterns in the TRIPLES clauses of the archive specifications."
If we look at query A2, there are two TRIPLES clauses, and each one has the following triple pattern: ?subject ?property ?value. If I follow the translation literally, the CONSTRUCT clause would contain this triple pattern twice. However, in the generated CONSTRUCT query Q2 this is not the case (and it is obviously not what is expected). This is an example of the ambiguity that arises from the lack of formality.
In 1) you refer to "TRIPLES", but in 2) you refer to archive specifications, archived triple patterns, and optional archive restrictions. Please use consistent terminology.
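To illustrate, a hypothetical sketch of the literal translation of A2 (the WHERE bodies are elided; the query shapes are my reading of the paper, not its exact text):

    # Rule 1 applied literally to A2's two TRIPLES clauses, each containing
    # the archived triple pattern ?subject ?property ?value:
    CONSTRUCT { ?subject ?property ?value .
                ?subject ?property ?value . }
    WHERE { ... }

    # What the paper's Q2 presumably contains (and what one would expect):
    CONSTRUCT { ?subject ?property ?value . }
    WHERE { ... }

A formal rule, e.g. one that collects the set (not the multiset) of archived triple patterns, would remove this ambiguity.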
======================================================================
COMMENT 4
Round 1 comment:
2) Following my question (1), the sub-views, even though not defined, seem
very similar to datalog rules of the W3C Direct Mapping (Appendix B) and
Sequeda et al's Augmented Direct Mapping. What is the relationship?
Authors' response:
It is now clearly stated in section 5.1 that “The RDB to RDF mapping in
SAQ conforms to the direct mapping recommended by W3C [23], and more
particularly to the augmented direct mapping proposed in [19], which is
proven to guarantee information preservation. “
Round 2 comment:
If this is the case, then I don't see the novelty of Section 5.1. I understand that it is needed for the system and that the unbounded predicate queries are executed and optimized on top of the RD-view (which is novel and a contribution of the work). I read Section 5.1 as restating [23] and [19] in a different syntax.
Perhaps add a clarification that this section is not a contribution in itself but is needed in order to understand how the system is built.
======================================================================
COMMENT 5
Round 1 comment:
6) The benchmark queries sometimes include the triple: ?class rdf:type
rdfs:Class. I believe that this triple is needed for SAQ because it accesses
its mapping table, which maps schema elements to RDFS elements. Did the
benchmark queries for D2RQ and Virtuoso include that triple? If so, this may
be a cause of the slow performance. If this is the case, what happens when
that triple is not included? What happens to SAQ?
Authors' response:
In Sec. 6.2 the following paragraph is added: “Since both D2RQ and Virtuoso
don’t generate for their default mapping a triple with the form (subject
rdf:type rdfs:Class), this triple was excluded from the definitions of
queries Q2, Q5 and Q6 for these systems.”
Round 2 comment:
This answer addresses only one part of my question. The other part is not answered: what happens in SAQ if <> rdf:type rdfs:Class queries are not included? Why do some queries have that triple pattern (A2, A5, …) while some don't (A8, …)?
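For reference, a hypothetical minimal query showing the pattern in question (not the benchmark's exact text):

    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?class ?subject WHERE {
      ?class rdf:type rdfs:Class .    # present in A2, A5, ...; absent in A8, ...
      ?subject rdf:type ?class .
    }

The open question is what SAQ returns for such a query when the first triple pattern is dropped.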
======================================================================
COMMENT 6
Round 1 comment:
- Why is [22] cited when making reference to Datalog? I would suggest to cite
instead the Foundations of Databases book by Abiteboul, Hull and Vianu.
Authors' response:
We actually use a Datalog dialect different from the above, which is the
reason for the reference.
Round 2 comment:
Which dialect of Datalog? And why that one? Please be specific.