Sequential Linked Data: the State of Affairs

Tracking #: 2578-3792

Enrico Daga
Albert Meroño-Peñuela
Enrico Motta

Responsible editor: 
Guest Editors Web of Data 2020

Submission type: 
Full Paper
Sequences are among the most important data structures in computer science. In the Semantic Web, however, little attention has been given to Sequential Linked Data. In previous work, we have discussed the data models that Knowledge Graphs commonly use for representing sequences and showed how these models have an impact on query performance and that this impact is invariant to triplestore implementations. However, the specific list operations that the management of Sequential Linked Data requires beyond the simple retrieval of an entire list or a range of its elements --e.g. to add or remove elements from a list--, and their impact in the various list data models, remain unclear. Covering this knowledge gap would be a significant step towards the realization of a Semantic Web list Application Programming Interface (API) that standardizes list manipulation and generalizes beyond specific data models. In order to address these challenges towards the realization of such an API, we build on our previous work in understanding the effects of various sequential data models for Knowledge Graphs, extending our benchmark and proposing a set of read-write Semantic Web list operations in SPARQL, with insert, update and delete support. To do so, we identify five classic list-based computer science sequential data structures (linked list, double linked list, stack, queue, and array), from which we derive nine atomic read-write operations for Semantic Web lists. We propose a SPARQL implementation of these operations with five typical RDF data models and compare their performance by executing them against six increasing dataset sizes and four different triplestores. In light of our results, we discuss the feasibility of our devised API and reflect on the state of affairs of Sequential Linked Data.
Full PDF Version: 

Minor Revision

Solicited Reviews:
Click to Expand/Collapse
Review #1
By Johannes Frey submitted on 10/Nov/2020
Minor Revision
Review Comment:

In the revised paper the authors significantly improved verbosity and clarity of the presentation of the results, fixed errors, typos as well as some flaws and removed redundant and implementation-specific information.

Amongst others, they improved the model pictures/examples and included a listing giving an overview over the queries for all models w.r.t. the GET operation, both helping readers not familiar with the models to understand and compare the models on a structural level. The plots were significantly improved resulting in a much better readability and the format is now optimized to compare the actual models for the same list. Furthermore, they added verbose descriptions of the figures, to be more self-explaining and summarizing the observed behaviour of the operation. The likert categories are defined by clear thresholds highlighted in the plots.
Moreover, they also extended the experiment by lower-half indexed queries and compared them to the higher-half queries.

Unfortunately, some issues raised in my previous review with regard to the design of the experiment (esp. the queries) remain or we misunderstood.
The benchmark queries now presented in Fig. 6 seem (still) not formulated in a pragmatic way and are a bit detached from reality. I think a person knowing about SPARQL and some basic quad store internals would write the queries differently and probably way more naturally/efficiently in order to achieve the specific purpose of the operations (esp. the get operation, still having the API use case in mind). In order to reduce personal bias in this matter, I asked 2 colleagues of mine to judge the shown queries. We came to very similar drafts of how the queries are supposed to look like (see further details below).

Although the authors tried to clarify and reframe the positioning of their work, it is hard for me to understand the actual objective of the revised paper. The authors mention that they intend to “compare observations” to study the “usefulness” of the sequence models. They state that they do not intend to “measure the cost of each operation”. But in fact the conducted experiments observe and compare the query performance of the individual operations for the different models and list sizes and classify them according to 5 likert categories. The datasets which contain only one list each are synthetic and seem to be designed to study exactly these costs in a way isolating the different comparison dimensions. Adding to the confusion the core research questions are defined as: R1: How to model lists *efficiently*? and R2: How do the operations *perform* in current models? I understand that it is not the objective of the revised paper to evaluate the performance of the databases itself and therefore the database configuration as well as the queries do not need to be fine-tuned for each system individually (e.g. usage of list:member SPARQL feature in JENA). I now also understand that the authors do not focus on inspecting optimal ways to model/encode these lists in RDF and comparing optimizations (e.g. adding a pointer in the RDF list entity to the end of the list to speed up append operation, or storing the number of elements for a list) although R1 could imply to do exactly that. From my point of view, it definitely is a research gap to accurately measure and compare the query time costs / query performance for the defined operations (even without testing further optimisations of the models itself to make them more efficient) and the paper in its current form definitely does/can address this.

However, sound/appropriate queries (optimized w.r.t. to the nature/benefits of the models and minimal in a sense that they do not perform unnecessary computation) are essential for meaningful “observations”. In my opinion, the current (inconvenient - from my/our perspective) realisation of the queries in Fig. 6 having harder complexity than necessary weaken the usefulness and trustworthiness of the work/experiment and as a consequence the subsequent performance results and conclusions can be questioned.
Although I do not share some of the assumptions and arguments (aiming at my comments from the previous review) made by the authors (e.g. that mixing query runs and databases from the models have an advantage over isolation to observe the performance), I consider the substantial missing step to being “accepted” for the SWJ issue is reconsolidating and fixing/optimizing the queries. I don’t know the exact number of queries (I assume at least a portion of the index-based queries) which need to be reworked, but from the perspective of the paper (concept and manuscript) itself I estimate that this would be only a “minor revision”. However, I acknowledge that repeating (parts of) the benchmarking procedure with improved queries and repeating the manual analysis, could be a “major” amount of work esp. if the observations/conclusions would vary significantly due to potential complexity class changes of the queries.

Figure 6 - “minimal query” issues:
a) The minimal query should look like the SET query`s WHERE part. No substring+int-casting+filtering should be needed (filtering can be faster in some cases then a JOIN though. A proper comparison would need to study both approaches, but I understand that this is not the focus of the authors research at the current stage, and I agree that the effects are likely to be minor due to the low number of triples and sequences of the datasets in general. )
b) The casting to integer is in question. A “minimal” query could use the string representation directly?
c) Sorting is an expensive operator. I don’t see a reason why the entire List should be sorted to retrieve only one element whereas the index is known and materialized? Having that in mind, casting to integer would be obsolete.
d) An exact length property path should be the perfect fit for that kind of query. I don’t see a reason why the kleene star should be used to materialize the positions for ALL events in the List. I do not understand the claimed necessity for the subquery?
e) Same as in d): the path lengths can be determined based on the requested index.

Fig. 12/13 The fact that 13b) 1k performs much better (more than an order of magnitude with small std. deviation) than 500 is still curious to me, now even more given that this effect can not be observed for the low index query.
What is meant by “results are equivalent”?
The changed y axis scale for 12d) makes it hard to compare between 12c) and and 13d). I suggest using a consistent scale for the plots of one operation.

p5l31 I don’t understand why the axiom is valid if and *only* if Q is empty
p10l14 “falicitate”
p10l17 “predicable”
p11l44 “as klong”
p14l19 “the item in to”

Review #2
By David Chaves-Fraga submitted on 17/Nov/2020
Minor Revision
Review Comment:

I would like to thank the authors about their effort to take into account my previous comments on the new version of the paper. I'm now really happy with its current state and the paper now clearly fills a gap in the state of the art. It also provides very useful results on the experimental part that will generate new ideas and developments in the management of sequential linked data models.

A set of minor but important comments that I suggest to improve the paper readability and clarify its contributions:
- I would suggest clarifying what are the research questions that want to be answered (two provided at line 14 - page 2 and the other two at line 30), so maybe they can be merged, IMO there is a bit of overlapping among them.
- About the contributions, I would say that the first one is not exactly a contribution. As I said in my previous review, the operations already exist. So the contributions could be the adaption (and formalization) of those operations for a Semantic Web List API.
- There are parts of the benchmark extension that seem more evaluation decisions than a general methodology to evaluate these approaches, so I would recommend rewriting a bit that sentences declaring that are part of the extension/preparation of the benchmark (e.g., line 3 - page 10 "When the operation requires pointing to a specific item (GET, SET, and REMOVE_AT), we performed two experiments for each operation"). My suggestion is to present these steps in a more general and abstract manner. Another point: line 42, second column, page 10 "We prepared a dataset for each...", is that not part of your evaluation and other potential users could use different data sizes? I assume that the benchmark provides a data generator so, that would be possible. If not, it should be clarified.

minor comments:
- review orphans words at the end of the paragraphs
- tables on top as figures with the caption above (table 2)
- "As klong as these" evaluation section, paragraph 3
- y-axes legend in all the figures

Review #3
By Luis-Daniel Ibáñez submitted on 26/Dec/2020
Review Comment:

I am satisfied with the answers provided by the authors and the content of the revision, therefore, the paper is now in an acceptable state for me.

There is a typo on p11, col 1, line44.