Review Comment:
In the full paper titled “Sequential Linked Data: the state of affairs” the authors carry out a performance study of 5 list/sequential data models in RDF using 4 different triple stores. The topic is a good fit for the special issue. The language is very good and comprehensible. The paper is well structured and follows a clear methodology. The methodology, the model survey, the related work section and the used LIST.MID benchmark are reused from previous works; as a consequence, the paper is very well self-contained. The novel contributions are a formalisation of 9 atomic read or write operations for sequences (e.g. first, rest), the definition of new benchmark queries covering these operations, and the performance study itself (varying along 4 dimensions: operation, sequence size (5 values), database system, and sequence model). The authors present detailed performance results (average response time over 10 runs) for every combination and summarise them in a comprehensive matrix along the dimensions operation and sequence model using 5 Likert-style categories.
However, the observed behaviour and the differences between the individual combinations shown in the individual plots and tables are not described at a sufficient level of depth for a journal paper, especially considering that there are 15 pages of text versus 18 pages of plots and tables. Large parts of the analysis and comparison of the individual plots are left to the reader.
Unfortunately, the design & realisation of the benchmark have flaws, and the setup & execution of the experiments raise questions. The experiment “datasets” contain only one sequence at a time, which does not seem to be a realistic benchmarking scenario for comparing different sequence models when having a sequential API in mind. The queries are fixed (there is no templating with regard to different sequences, and the fixed index positions for get, set and remove were chosen in a questionable way). The experiments are performed on the same database instance for the different sequence models.
Although it seems a-priori reasonable to drive the experiment design and queries by requirements from the mentioned sequential SPARQL-backed API use case, it is questionable whether this is a relevant and meaningful use case in practice. SPARQL was designed as a query language to traverse and analyse graph structures. A pure emulation of a sequential data structure with SPARQL seems synthetic and is not a pragmatic and efficient way to go from an engineering perspective. It would be more natural to store the individual RDF entities (or pointers to them) as text in a database optimised for this type of request and use that as the backend for the API instead. The power and potential of SPARQL in the context of RDF sequences lies in its expressiveness, which allows combining access to and manipulation of actual graph data and sequential data in one query (e.g. “Retrieve all albums from 2020 where the first track is shorter than a minute”). While the presented results give an indication of how the sole atomic sequential data structure operations perform, it is uncertain whether the findings also hold for such combined queries and operation chains, and whether they can be transferred to real-life workloads. These types of queries should be addressed (ideally with different levels of computational complexity) in the study.
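For illustration, such a combined query could look roughly as follows (a sketch only, shown for the RDF list model; the vocabulary terms midi:Album, midi:year, midi:hasTracks and midi:duration are hypothetical):

    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX midi: <http://example.org/midi#>

    # Albums from 2020 whose first track is shorter than one minute
    SELECT ?album WHERE {
      ?album a midi:Album ;
             midi:year 2020 ;
             midi:hasTracks ?list .        # entry point to the sequence
      ?list rdf:first ?firstTrack .        # RDF list model: head element
      ?firstTrack midi:duration ?seconds .
      FILTER(?seconds < 60)
    }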
The mentioned flaws, questions and further (minor) issues are discussed in detail below.
I encourage the authors to revise the work with extended benchmark datasets and queries and an improved setup and analysis.
I suggest a major revision, since the work addresses an interesting research gap.
=================================================
Further Comments:
p4l25b (E x L)
p5l36b e_i should be E, or the arrow should be = or the proper “maps to” operator (↦); what are L_{i-1} and E_n? This formalisation seems to raise more questions than a textual explanation would.
p6 Table 1. I do not understand why operations are classified with regard to “relevancy”, given that all operations have been evaluated for all models; moreover, Table 1 does not seem to be referenced in the text.
Figures 1-5: it is not really clear how the lists/events are linked to the track. _x seems to be the list entity in Figs. 1-2 and the track entity in Figs. 3-4, and in Fig. 5 it is missing completely; this should be made consistent.
p7l47a broken cross-reference “Section ??”
p8l11-24b and l36-49b the command line details should be removed
p9l14a not “from the current one” but “starting with the second one”
p9l36a The decision to pick the second half of the list introduces bias and seems ill-chosen. Obviously, there is more overhead in traversing to an element in the second half of a linked list, whereas for an index-based structure there is less overhead if an element is removed from the second half (fewer subsequent indices have to be rewritten); it is the other way around when the first half is chosen. Moreover, I do not understand the idea of a single fixed index position for a benchmark. This is usually handled via query templates and a set of random numbers (assuming a uniform distribution). These decisions, made without further justification, weaken the trust in the measured results.
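For illustration, a templated get query for the rdf:Seq model might look as follows (a sketch; %INDEX% is a hypothetical template marker that the benchmark driver would instantiate per run with positions drawn uniformly from [1, sequence length]):

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    # %INDEX% is replaced by the driver before execution,
    # e.g. yielding rdf:_217 for index 217
    SELECT ?element WHERE {
      ?seq rdf:_%INDEX% ?element .
    }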
p9l17b Nevertheless, the proper setup is one database instance per combination (or at least per model, if scalability is of minor interest) to ensure that there is no interference (e.g. all graphs sharing the same index structures and dictionaries). This also has the advantage that it is impossible to forget to restart the database when proceeding to the next combination, which raises the question of whether the authors did restart it.
p10 Table 3 shows a major weakness or design flaw of the benchmark: the “datasets” contain only one sequence each and are so small that they easily fit into main memory. I am convinced that the average datasets hosted in SPARQL endpoints in real life are much larger (many sequences, millions of triples). Moreover, there is a typo at “500(k) files”.
p10l41a-15b should be removed. The VMware hypervisor is an indicator of a virtual server. If other virtualised systems are running on the same physical hardware (e.g. cloud hosting), this could affect the benchmark results and would therefore make the setup unsuitable for benchmarking. This should be clarified or fixed.
p11l15a I do not understand the usage of the “a midi:track” triple pattern, since the entity ID is already provided; one could argue that this is not a “minimal query”.
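A minimal variant would drop the type pattern, since the subject IRI already identifies the track; a sketch (the track IRI and midi:hasEvent are illustrative):

    PREFIX midi: <http://example.org/midi#>

    SELECT ?event WHERE {
      <http://example.org/track/42> midi:hasEvent ?event .
    }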
p11l6b The query is confusing to me, since the SPARQL result set is not of the same list type (an RDF sequence in this case). I would not consider this a function L -> L as actually defined; chaining of this function would not be possible, although chaining is the primary use case of the rest function.
p11l11b an “OFFSET 1” expression would be more intuitive than the filter statement, potentially more efficient, and conforming to the “minimal/pure query” requirement. Bearing in mind that performance may vary depending on the query realisation, it can be seen as a methodological problem of the experiment that the effect of different query realisations on performance has not been studied.
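A sketch of the suggested variant, assuming the elements are ordered by a numeric index (the IRIs and property names are illustrative, not the authors’ query):

    PREFIX midi: <http://example.org/midi#>

    SELECT ?event WHERE {
      <http://example.org/track/42> midi:hasEvent ?event .
      ?event midi:position ?i .      # hypothetical index property
    }
    ORDER BY ?i
    OFFSET 1                         # skip the head instead of FILTER(?i > ...)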
p11l18b This is counterintuitive to me; I would expect the rest function to be very efficient for List, which offers the rest structure already at the storage level. Having read the query, I assume that the goal is not to evaluate how the rest function performs on the different RDF list data models, but how efficiently “the rest” of the elements is materialised from the model into a SPARQL result set? If that is the case, this is not explained clearly (“The operation returns the content of the sequence”).
p12l26a I did not understand this given the information in the paper so far; according to Figure 5, SOP does not have a dedicated sequence entity to which its members are directly connected.
p12l43a I do not agree. Unless negative index numbering is allowed, which would be surprising, append_front should lead to a huge workload due to rewriting all index numbers. For the URI-numbering model this should have dramatic consequences, since all entity IDs change, leading to updates of triples beyond the sequence-management triples.
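For the rdf:Seq model, for instance, an append_front amounts to an update touching every membership triple; a sketch (not the authors’ query):

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

    # Shift every rdf:_i membership property to rdf:_(i+1),
    # freeing rdf:_1 for the new head element
    DELETE { ?seq ?oldP ?elem }
    INSERT { ?seq ?newP ?elem }
    WHERE {
      ?seq ?oldP ?elem .
      FILTER(STRSTARTS(STR(?oldP),
             "http://www.w3.org/1999/02/22-rdf-syntax-ns#_"))
      BIND(xsd:integer(STRAFTER(STR(?oldP), "#_")) AS ?i)
      BIND(IRI(CONCAT("http://www.w3.org/1999/02/22-rdf-syntax-ns#_",
                      STR(?i + 1))) AS ?newP)
    }

For the URI-numbering model, the analogous rewrite would additionally rename the subject IRIs themselves, i.e. every triple mentioning a renamed entity would have to be updated.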
p12l14 I do not understand the difficulties. A zero-padding policy should not have any impact when removing an element (only when adding one; this should be discussed for append). Moreover, it would be possible to evaluate with and without padding restrictions. Nevertheless, all index numbers have to be rewritten (decremented) anyway.
p12l19b It would make sense to introduce “surrogate” IRIs (linked via owl:sameAs to the original resources/events) that serve only as alternative IDs for participation in a sequence. This should eliminate the impact of entity-specific triples. Both URI-numbering approaches (with and without surrogates) could be compared to study this impact. Surrogates would also allow an entity to participate in multiple sequences at different positions; that this is impossible without them seems a major drawback of the URI-numbering model and should be emphasised. Without surrogates, the comparison could be seen as comparing apples and oranges.
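A sketch of the surrogate idea (all IRIs illustrative): the position-encoding surrogate carries only the sequence membership, while the entity-specific triples stay with the original resource:

    PREFIX owl: <http://www.w3.org/2002/07/owl#>

    INSERT DATA {
      # surrogate IRIs encode the position in the sequence ...
      <http://example.org/track/42/event_0001>
          owl:sameAs <http://example.org/event/a8f3> .
      <http://example.org/track/42/event_0002>
          owl:sameAs <http://example.org/event/77c0> .
      # ... while note/velocity/etc. triples remain on the originals
    }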
p12l11-40 I was expecting optimal performance for “list” (because it seems easy to find and replace the end of a list), or at least performance similar to SOP. After looking at the query (which involves variable-length property paths), the performance results are not surprising anymore. It may be worth pointing out to the reader that an RDF list has no direct link to the end of the list. Nevertheless, this raises the question of whether the comparison between SOP and List is fair or meaningful, given that the SOP query leverages the “hasEvent” link to every event, while this is not possible for the RDF list. Additionally, it is worth mentioning that the SOP pattern actually recommends using subproperties of sequence:precedes/follows (otherwise one entity could be a member of only one SOP sequence). Moreover, the individual linking of each event to its track does not seem to be proposed in the referenced sequence ontology design pattern. All this adds up to a bias in favour of SOP.
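For reference, reaching the last element of an RDF list requires walking the whole chain with a variable-length path, roughly as follows (midi:hasEvents and the track IRI are illustrative):

    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX midi: <http://example.org/midi#>

    SELECT ?last WHERE {
      <http://example.org/track/42> midi:hasEvents ?list .
      ?list rdf:rest* ?node .          # traverse the entire chain
      ?node rdf:first ?last ;
            rdf:rest  rdf:nil .        # ... down to the final cell
    }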
p13l14 The query is tailored to one very specific use case and is not representative of the SOP model in general, because the substring index is hardcoded. While this is possible for rdf:_* properties, it is not applicable to subject IRIs, which can be of various (prefix) forms and lengths. The only requirement should be to follow a specific pattern that allows identifying/extracting the index number in a reliable and interoperable (between datasets and sequences) way. Considering the mentioned sequence API use case on top of a SPARQL endpoint, I wonder what such an API would look like if it needs to know the prefix length of the IRIs in the collection, or what the queries would look like if entities of different types, naturally having different prefix lengths, are contained in the same sequence.
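A prefix-agnostic alternative would be to extract a trailing numeric suffix via a regular expression instead of a hardcoded substring offset; a sketch (IRIs and midi:hasEvent illustrative, assuming members share a “trailing digits” naming pattern):

    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
    PREFIX midi: <http://example.org/midi#>

    SELECT ?event ?i WHERE {
      <http://example.org/track/42> midi:hasEvent ?event .
      # extract trailing digits regardless of prefix form and length
      BIND(xsd:integer(REPLACE(STR(?event),
                               "^.*[^0-9]([0-9]+)$", "$1")) AS ?i)
    }
    ORDER BY ?i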
p13l47a I assume the wrong table is referenced; the caption of Table 5 mentions a different purpose. What do Y, M and N represent?
all bar charts: I think using box plots (additionally showing the average) or error-bar plots (since 10 runs were performed for each configuration) would be better suited to studying the significance and variance of the results across different sequence lengths.
p18 Fig. 9: The fact that the 2k and 3k sizes perform much better than 500 or 1k can be an indicator of a problem with the experiment setup. This might be caused by caching and warm-up effects in the database if it was not restarted after every single run.
The colors for sop and list in the scalability view are hard to distinguish.
There is not much added value in the tables compared to the plots; the tables could be removed and presented on an experiment website instead (e.g. as markdown in the GitHub repo).
p30 caption should be “popoff”
Was a check for the correctness of the query results across different models and databases performed?