Evaluation of Metadata Representations in RDF stores

Tracking #: 1607-2819

Authors: 
Johannes Frey
Kay Müller
Sebastian Hellmann
Erhard Rahm
Maria-Esther Vidal

Responsible editor: 
Guest Editors Benchmarking Linked Data 2017

Submission type: 
Full Paper
Abstract: 
The maintenance and use of metadata such as provenance and time-related information (when was a data entity created or retrieved) is of increasing importance in the Semantic Web, especially for Big Data applications that work on heterogeneous data from multiple sources and which require high data quality. In an RDF dataset, it is possible to store metadata alongside the actual RDF data and several possible metadata representation models have been proposed. However, there is still no in-depth comparative evaluation of the main representation alternatives on both the conceptual level and the implementation level using different graph backends. In order to help to close this gap, we introduce major use cases and requirements for storing and using diverse kinds of metadata. Based on these requirements, we perform a detailed comparison and benchmark study for different RDF-based metadata representations, including a new approach based on so-called companion properties. The benchmark evaluation considers two datasets and evaluates different representations for three popular RDF stores.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Gabor Bella submitted on 25/Apr/2017
Suggestion:
Major Revision
Review Comment:

SUMMARY

The paper presents a thorough survey and evaluation of RDF metadata "representation models" (patterns). Evaluation is carried out based on a wide range of criteria (performance, number of RDF triples, usability, loading times, etc.) and parameters (input data representations, metadata patterns, triple stores, query types). The paper also introduces and evaluates a novel metadata representation model (MRM in short), called Companion Properties, against other well-known MRMs. Results show a heavy dependence on the input parameters, so it is not generally possible to designate a "winning" MRM. Rather, a judicious choice of MRM can only be made depending on the data representation (e.g., triples or quads), the triple store being used, and the complexity of the typical queries.

SCIENTIFIC CONTENT

The topic of the paper is a good fit for the SWJ Special Issue on benchmarking. The paper appears to offer the following contributions:
1) a survey on metadata usage in real-world data;
2) a thoroughly documented performance evaluation of RDF metadata usage: representation models, current practices, DB storage and SPARQL query support, query complexity, and standards conformance are all considered;
3) a novel MRM, Companion Properties, is introduced as an improvement to the Singleton Properties pattern, addressing common performance issues related to storing and querying huge numbers of triples with unique property names;
4) evaluations over large real-world datasets (collected from Wikidata and DBpedia).

The large number of contributions in this paper is simultaneously a strong and a weak point, as each individual contribution is minor and there is no single main novelty or message highlighted. The stated goal of the paper is to evaluate metadata representations; however, it does so by reproducing the experiments described in two earlier papers not by these authors ([9] and [10]), a fact that the paper clearly states. On the positive side, with respect to these two articles, the coverage of the subject is more comprehensive, the evaluation is somewhat finer-grained, and the results are documented in more detail. From an engineering perspective, the results should be useful for readers looking for well-performing combinations of metadata representation patterns and back-ends. From a theoretical standpoint, however, the experimental results on the whole mostly agree with [9] and [10] in the sense that there is no "single best" metadata representation or back-end, because performance depends on how each back-end is equipped to deal with various RDF constructs and SPARQL query types. This overall conclusion holds even though the detailed results are not always identical to those in the earlier works.

Overall, the question is whether the sum of these minor contributions reaches the critical mass for the paper to be accepted in the SWJ. I suggest acceptance because the paper is very relevant to the Special Issue, the general problem of RDF metadata is covered from several angles, and there is inherent value in being comprehensive. However, in its current form, the paper is too chaotic to be accepted (see below), so I suggest major revisions.

Specific questions that the paper should have answered:
(a) Why was CPPROP, one of the contributions of the paper, not evaluated in the quins experiment?
(b) Why are simple queries evaluated only on Wikidata and complex queries only on DBpedia? Why not evaluate all four combinations?

LENGTH AND STRUCTURE

The somewhat unclear message of the paper is not at all helped by its being too long and not very well structured. While its length (24 densely written pages) is partly due to being very detailed with respect to experimental design, setup, data, and results, which is OK, there is also a considerable amount of redundancy and verbosity that could have been eliminated. Some examples:
- p. 2, second column, on the principle of the reification of a triple not entailing the triple itself: the first three paragraphs could be replaced by a single short paragraph explaining this very straightforward principle.
- p. 3, first column: the introduction need not go this deep into details, just mention what is to come later in the paper.
- p. 3, second column, on nano-publications: this is again a lengthy presentation of a subject also covered in section 3.1.2.
- p. 6, section 3.2: granularity, which should be a separate dimension, is explained three times within three dimensions (3.2.1, 3.2.2, 3.2.3).
- p. 11, DBpedia: I feel that a full page on presenting metadata in DBpedia is unnecessarily long.

The structure and section titles of the paper could also be greatly improved. A figure on MRM schemas is provided in the introduction while the MRMs themselves are only explained in section 4 (p. 8). There are several forward references to this section and uses of terms before they are defined (e.g., singleton properties and n-ary relations on p. 5), which is typically a sign of poor paper structure. In section 3.2, the four presented dimensions of evaluation are mixed up: 3.2.1 is entitled "purpose and types" but it rather presents levels of granularity; 3.2.2 presents the dimension of MRMs but it also includes granularity; and 3.2.3 is entitled "dataset characteristics", which seems to be a generic "other" category, again including granularity (!). 3.2.4 states that "we define three complexity classes" but then fails to state which ones. In section 4, the titles of subsections are very confusing. I suggest that 4.1 be called "RDF compliant models" and 4.2 "Vendor-specific models". Otherwise it is not clear whether this section is about techniques, RDF stores, or models. Furthermore, the MRM descriptions are a mix of definition and evaluation, adding to the confusion. Section 5 presents evaluation datasets but Wikidata is already presented earlier in section 3.1. Section 6 presents the evaluation setup but it is not clear why evaluation dimensions are moved ahead to section 3.2 and evaluation criteria to 3.3. Throughout section 6, evaluation results are anticipated while they should be kept for section 7 (see sections 6.3.1 and 6.3.2).

The following paper structure would have been much clearer, as it separates the introductory and definitional part (sections 1-4) from the evaluation part (sections 5-8):
1. Introduction
2. Related Work
3. Metadata Representation Models
4. Survey on Metadata Use (formerly 3.1)
5. Evaluation Datasets
6. Evaluation Method (containing 3.2 and 3.3)
7. Evaluation Results
8. Conclusions and Future Work

LANGUAGE AND WRITING QUALITY

While the writing is generally understandable with no major issues, the English is imprecise throughout the paper, with many grammatical mistakes (too many to enumerate). A thorough proofreading is necessary.

Review #2
Anonymous submitted on 22/May/2017
Suggestion:
Major Revision
Review Comment:

The authors provide an experimental study and analysis, based on Wikidata and DBpedia-based datasets, of the most common RDF-compliant Metadata Representation Models (MRMs), tested on three prominent RDF stores (Virtuoso, Blazegraph, Stardog). A new MRM is proposed, named Companion Properties (cpprop), and a new dataset is created (the DBpedia-based one), targeting the case where provenance data (metadata and meta-metadata) are especially large. Finally, complex queries for the new dataset are created and use cases for proper benchmarking are introduced.

Strong Points:
• A new DBpedia-based (DBpedia) dataset is created for high provenance cases and various template queries, conforming to specific complexity classes, are evaluated on this dataset.
• A new MRM is proposed (cpprop) and is compared, among other MRMs, on the DBpedia dataset.
• The rdr MRM is modified properly for testing on Blazegraph RDF back-end.
• Analytic experiments on loading times and database sizes (Table 1: Wikidata, Table 3: DBpedia).
• Some interesting criteria and use cases for MRM evaluation, additional to related work, are examined, such as MRM overhead for data-only queries (Figure 3) and dataset impact based on similar query complexity (Figure 4 & Figure 5).

Weak Points:
• The Wikidata dataset ([9], [10]) and the life-science dataset in [4] contain a realistic, normal amount of metadata. The former is created and constantly updated for the purposes of a company, which makes the metadata cardinality realistic, whereas the latter is constructed by merging various life-science sources, which may generate a redundant but reasonable amount of metadata, as can be inferred from the total number of final triples in each MRM case. In contrast, the authors use the Wikidata history and DBpedia to construct a provenance dataset that contains an overwhelming amount of metadata and meta-metadata, which seems extraordinary to me for MRM benchmarking purposes.
• The Wikidata experiments ([9], [10]) are repeated to a significant extent. The authors could choose another dataset for their evaluation and compare the results with the Wikidata ones. Specifically, the authors could select the life-science dataset in [4], which is more provenance-oriented than Wikidata and better suited for comparison with the DBpedia dataset that they crafted.
• cpprop is not tested on the Wikidata dataset, and the authors do not explain this omission from their Wikidata experiments.
• In some cases, the experimental findings contradict the similar work done in [4] (subsection 7.6). This contradiction must be studied in more depth, and experiments with the same dataset would help us understand the cause of the discrepancy.
• The authors also state in the conclusions that “ngraphs and rdr support queries against meta-metadata much better than the other MRMs”, which does not hold for rdr if we look at the experiments as a whole. Besides, rdr is tested only on Blazegraph, so there is no basis for the authors to declare rdr a winner.
• There are some problems with the structure of the paper. Several parts are repeated, e.g., the Wikidata experiments and the presentation of the candidate datasets for analysis (3.1.2), and sections 3 and 4 are overly elaborate. Incidentally, the stdreif MRM is not presented in section 4.
• There are several presentation problems, and several parts of the paper are repeated. Table 3, based on the flow of the discussion, should swap position with Table 4. Moreover, Table 3 is not well described. What do the last two columns represent? What is the meaning and the purpose of rows two and three? What does Table 2 offer beyond Table 3? Finally, empty values in tables should be explained.

Review #3
By Theodore Dalamagas submitted on 01/Jun/2017
Suggestion:
Minor Revision
Review Comment:

This is a well written paper that focuses on the issue of metadata representation in RDF datasets. The authors discuss existing methods, models and representations for RDF metadata, and evaluate them on several different qualitative and quantitative aspects. Specifically, they present six different ways of describing (i.e., reifying) RDF triples, and they perform a requirements analysis over qualitative and quantitative characteristics of RDF datasets and their metadata.

The paper goes on to discuss the differences between the six models, through a comparison that is dictated by a set of requirements and use cases, also introduced by the authors. Finally, an experimental evaluation measuring loading and query times is performed on two real and widely used datasets, using three state of the art RDF/graph database engines.

The paper is well structured and the writing is clear. The problem is clearly described, and the various methods and models are detailed with the right amount of information for the reader. However, the paper lacks technical depth, since there is no novelty introduced - the contributions are limited to an experimental evaluation of existing approaches. This is not necessarily a problem, since the domain of RDF metadata representations is indeed lacking both a comparative evaluation of existing approaches and a thorough benchmark for future endeavors. However, it would be interesting to see a formal representation of the proposed requirements, e.g. in the form of a set of quality metrics.

Moreover, even though the authors do not present their benchmark evaluation (especially the workloads they use) as the selling point of the paper, I would strongly recommend that they formalize a few of these concepts into a more formal and reusable benchmarking framework for general-purpose evaluations of metadata representation approaches. I understand, however, that this is not the main focus of the paper.

The Related Work section should be extended to include arguments on how the limitations of the representation models can be amended by extending the declarative expressivity of the language. E.g., several approaches extend SPARQL in order to account for more intuitive ways to query annotations, metadata etc. E.g. see [1] for an annotation-enabled SPARQL extension, and [2] for a temporal metadata extension.
Thus, the RW section should be revised to include references on how the problem is tackled on the query language side. Even though the authors mention that triple revisioning is not the focus of the paper, a lot of work has been done on (temporal) annotations of RDF data, which are a special case of metadata. At least a minimal discussion of how the limitations of SPARQL over the evaluated representation models can be overcome by SPARQL extensions should be included.

Minor remarks and other comments:

- Figure 1 is referenced too early in the paper, since the reader is not necessarily familiar with any of the MRMs. The MRMs should be briefly introduced in the Introduction section.
- Section 1: "...the usage of Singleton Properties can lead in" -> can lead to*
- Section 3: It would be interesting to see some real-world statistics (e.g. from query logs) of what the actual usage of metadata information is (e.g. how often are these metadata queried in real world cases?)
- Section 3: "...A graphs contains all triples from one source..." -> A graph*
- Section 3.2.4: "...have an significant impact..." -> a significant*
- Section 3.2.4: After the sentence "Based on this work we define three complexity classes..." the section ends abruptly. I would expect at least a brief mention of the classes.
- Section 3.3.1: I am not fully convinced that the usability criterion is correct. Query complexity might be unavoidable in cases where the schema is inherently complex (e.g. scientific data)
- Section 4: I would refrain from placing queries and data in the main body of the paper, and would instead provide visual examples. The queries can be moved to appendices to improve readability.
- Section 6.3.1: "...the the..." -> remove excess "the"
- As a final note, I would move Section 4 further up in order to provide a basis for discussions (e.g., in section 2 or 3).

[1] Lopes, Nuno, Axel Polleres, Umberto Straccia, and Antoine Zimmermann. "AnQL: SPARQLing up annotated RDFS." The Semantic Web–ISWC 2010 (2010): 518-533.

[2] Meimaris, Marios, George Papastefanatos, Stratis Viglas, Yannis Stavrakas, Christos Pateritsas, and Ioannis Anagnostopoulos. "A query language for multi-version data web archives." Expert Systems 33, no. 4 (2016): 383-404.