Review Comment:
The manuscript introduces an approach for timestamp-based versioning of RDF datasets. The approach uses RDF-star to annotate every triple of the dataset with a 'valid_from' timestamp and a 'valid_to' timestamp, where the latter may be in the far future to cover cases of triples that are "valid until further notice."
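For concreteness, such an annotation might look as follows in Turtle-star (the prefix and predicate IRIs here are placeholders of my own, not necessarily the manuscript's exact vocabulary):

```turtle
PREFIX vers: <http://example.org/versioning#>   # hypothetical versioning namespace
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

# a data triple, annotated with its validity interval
<< <http://example.org/Alice> <http://example.org/worksFor> <http://example.org/AcmeInc> >>
    vers:valid_from  "2021-03-01T00:00:00"^^xsd:dateTime ;
    vers:valid_until "9999-12-31T00:00:00"^^xsd:dateTime .   # artificial "until further notice" timestamp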
The authors describe how to insert and update triples with timestamp annotations by using SPARQL-star Update statements, and how to retrieve/materialize a version of the dataset at any given timestamp by using SPARQL-star queries. Additionally, as an evaluation of their approach, the authors have conducted a simple experiment using a dataset of the BEAR RDF Archiving Benchmark and some triple pattern queries. The main observations of the experiment are i) that the tested systems (Jena TDB and GraphDB) achieve better query performance for some (unspecified) Named Graph representation of the timestamped data than for two variations of the authors' RDF-star approach, and ii) that GraphDB achieves better performance than Jena TDB.
I do not consider the contributions of this manuscript sufficient for a journal article; in fact, I wouldn't even consider them sufficient for a conference paper in the main Semantic Web conferences. Instead, I would consider the contributions more as something for a workshop paper. The proposed approach is just a straightforward application of RDF-star and SPARQL-star, which is not even made transparent for users (instead, users are assumed to include the timestamp annotations and the timestamp-related query patterns manually), and the evaluation is very simplistic and only scratches the surface in terms of insights that it provides about the proposed approach. Moreover, the manuscript contains several small inaccuracies, and several details are missing or are not captured thoroughly.
Having said that, I am happy to see that the authors are attempting this work and I strongly encourage them to expand what they currently have. The remainder of this review elaborates more on the aforementioned issues and provides suggestions to improve and expand this work.
CONCEPTUAL CONTRIBUTIONS
To provide an actual conceptual contribution related to the proposed application of RDF-star and SPARQL-star, I would like to see a well-defined approach to make the RDF-star-based timestamping of RDF datasets transparent to the users. That is, I would like to see a foundation for *automatically* translating any given SPARQL Update statement into a SPARQL-star Update statement that adds or updates the relevant timestamp annotations. Similarly, for materialization-related queries, I would like to see a foundation for automatically translating a SPARQL query, together with a timestamp, into a SPARQL-star query that produces the result of the given SPARQL query over the version of the dataset at the given timestamp.
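To illustrate what such a translation could produce: given a plain triple pattern query and a timestamp, a rewriting along the lines of the manuscript's materialization queries might look as follows (again, prefix and predicate names are placeholders of mine):

```sparql
# Original query:
#   SELECT ?o WHERE { <http://example.org/Alice> <http://example.org/worksFor> ?o }
# Automatically rewritten for the dataset version at timestamp 2022-01-01T00:00:00:
PREFIX vers: <http://example.org/versioning#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
SELECT ?o WHERE {
  << <http://example.org/Alice> <http://example.org/worksFor> ?o >>
      vers:valid_from  ?valid_from ;
      vers:valid_until ?valid_until .
  FILTER ( ?valid_from  <= "2022-01-01T00:00:00"^^xsd:dateTime )
  FILTER ( ?valid_until >  "2022-01-01T00:00:00"^^xsd:dateTime )
}
```

Such a translation would have to be defined for arbitrary graph patterns (every triple pattern gets its own pair of validity variables and filters), which is exactly the kind of foundation I am missing in the manuscript.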
In addition to materialization-related queries, I would like to see a thorough discussion of how the proposed application of RDF-star can be leveraged for other types of data archive queries (timestamp retrieval, delta materialization, cross-version queries, etc). Related to that, I see that the authors wanted to focus on materialization queries, but I don't see any clearly stated rationale for this focus; there are some vague references to recommendations of a data citation working group, but no concrete elaboration on these recommendations and no discussion of the relevant requirements.
Smaller issues about the description of the proposal:
* While the idea to use the VALUES feature for inserting triples makes sense, there is certainly a practical limit to it: it is not possible to bulk load an unlimited number of triples in a single insert statement. This limit should be discussed and may also be worth studying experimentally.
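For reference, I assume the VALUES-based insert has roughly the following shape (my own sketch, with hypothetical IRIs), in which every additional row grows the size of the single update request:

```sparql
PREFIX vers: <http://example.org/versioning#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
INSERT {
  ?s ?p ?o .                                    # assert the data triple itself
  << ?s ?p ?o >>
      vers:valid_from  "2022-01-01T00:00:00"^^xsd:dateTime ;
      vers:valid_until "9999-12-31T00:00:00"^^xsd:dateTime .
}
WHERE {
  VALUES (?s ?p ?o) {
    ( <http://example.org/Alice> <http://example.org/worksFor> <http://example.org/AcmeInc> )
    ( <http://example.org/Bob>   <http://example.org/worksFor> <http://example.org/AcmeInc> )
    # ... each further row increases the size of this one request
  }
}
```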
* The examples for the case of updates focus only on updating a single triple. It may not be obvious to the reader how updates are done when updating a combination of multiple triples or when bulk-updating multiple individual triples. Generally, as mentioned above, I would like to see a more generic treatment of how update statements need to be extended with the relevant timestamp-related patterns.
* The proposal for outdating a triple (Sec.3.6) requires that "an artificial valid_until timestamp must exist on that triple." While this requirement makes sense, there should also be a statement that specifies what happens, or what should happen, in cases in which the requirement is not satisfied.
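Note that with a DELETE/INSERT-based formulation of outdating, as sketched below (my own reconstruction, with hypothetical IRIs), an unsatisfied requirement would lead to a silent no-op rather than an error, and the authors should state whether that is the intended behavior:

```sparql
PREFIX vers: <http://example.org/versioning#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
DELETE { << ?s ?p ?o >> vers:valid_until ?old }
INSERT { << ?s ?p ?o >> vers:valid_until "2022-06-30T00:00:00"^^xsd:dateTime }
WHERE {
  VALUES (?s ?p ?o) {
    ( <http://example.org/Alice> <http://example.org/worksFor> <http://example.org/AcmeInc> )
  }
  << ?s ?p ?o >> vers:valid_until ?old .
  FILTER ( ?old = "9999-12-31T00:00:00"^^xsd:dateTime )   # the artificial timestamp
  # If the triple lacks the artificial vers:valid_until annotation,
  # the WHERE clause matches nothing and the update silently does nothing.
}
```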
* Section 3.7 claims that Tables 5 and 6 represent query results for the queries in Listings 9 and 10. That is incorrect because the values in the "Predicate" and the "Object" columns of the tables are not returned by the queries.
* The discussion related to DISTINCT at the end of Sec.3.7 does not make sense. Since the queries also contain the second FILTER condition (about ?valid_until), there would be no duplicates and no need for using DISTINCT (at least, if we assume that every data triple has only one vers:valid_from annotation and only one vers:valid_until annotation).
* In this discussion related to DISTINCT, the text mentions a condition with some "system_timestamp". It is not clear what that means.
EVALUATION
From a journal article I am expecting a much more comprehensive evaluation than what is provided in this manuscript.
* There is no study of the performance impact of the insert and update part of the proposal, although this part makes up 2/3 of the description of the proposal.
* The file-import part of the evaluation is somewhat unclear (and perhaps misleading?) because the baseline (some Named Graphs based approach) is not clearly defined.
* The evaluation is based on a single dataset and, additionally, there is no justification why only that dataset was used (considering that the BEAR benchmark consists of multiple datasets).
* The triple pattern lookup queries considered in the "Query Performance" experiment are a very simple form of queries. The authors do not provide any consideration of the practical relevance of such queries; how much can we actually learn about the approaches from such simple queries?
* There is no discussion whatsoever of the observations that can be made from the measurements. Why is there such a huge reduction of the file size and the memory footprint when converting the Named Graphs representation into the RDF-star-based representations? Why do the systems achieve better query performance for the Named Graphs representation of the timestamped data than for the RDF-star-based representations? Why does GraphDB achieve better performance than Jena TDB? Is the hypothesis that "we expect a better performance with former ones" (i.e., predicate-lookup queries) actually verified by the experiments? Etc.
* The claim that "using the proposed approach, even large scale and highly dynamic RDF datasets can be efficiently versioned" is absolutely not justified by the presented evaluation!
--> A few hundred MB are not "large scale" and neither are a few GB.
--> Also, there is nothing about "highly dynamic RDF datasets" in this evaluation; the authors just imported a file in which multiple dataset versions are represented. (see also my comment above about the lack of an evaluation of the insert and update part of the proposal)
Further smaller issues in the section about the evaluation:
* It needs to be specified which version of each of the systems was used exactly.
* Readers may not know what a ".ttl file" is and how it may be used to serialize datasets with RDF-star triples. In fact, the last paragraph of Sec.5 makes even me wonder what exactly the authors have done ("Once the turtle-star RDF serialization format becomes widely adopted we will fit our datasets into this format.") Does this mean the authors have not actually used Turtle-star for the serialization, but plain Turtle? How was it possible to represent the nested RDF-star triples then??
* What is the purpose of the "shell script" mentioned in Sec.4.1?
* The authors use the terms "compressed" and "compression" in several places, which is highly misleading because they have not actually used any compression techniques. Instead, there is simply a reduction of the dataset size (measured in terms of file size) after converting data from the Named Graphs representation to the RDF-star-based representations.
* It is not clear how to read things such as "173-256MB" in Sec.4.2.
* The authors should clarify what they mean by "storage consumption scaling factors".
* Table 7 says "mean ingestion time", but I don't see any indication of how many file-import runs were performed to calculate this mean.
RELATED WORK
The "Related Work" section needs to be improved as well. Currently, it appears as a semi-organized collection of some related work, mixed up with an introduction of the background of the presented work. Additionally, there is a paper that has two entries in the bibliography (namely, [2] and [17]), and there are entries for which it is not clear where these papers have been published (e.g., [3], [41], [42]).
Finally, I suggest also referencing the W3C Community Group Report on RDF-star and SPARQL-star, as this is the most recent document about the approach; in this context, I also suggest using the terms RDF-star and SPARQL-star instead of RDF* and SPARQL*.