RDF 1.1: Knowledge Representation and Data Integration Language for the Web

Tracking #: 1496-2708

Authors: 
Dominik Tomaszuk
David Wood

Responsible editor: 
Krzysztof Janowicz

Submission type: 
Survey Article
Abstract: 
Resource Description Framework (RDF) is seen as a solution provider in today's landscape of knowledge representation research. This survey outlines RDF, version 1.1, the W3C Recommendation for knowledge representation on the World Wide Web. In this article, we review and present works from RDF v1.0 and v1.1 implementations. We also provide insights on the reification, blank nodes and entailments. This article surveys current approaches, tools and applications for mapping from relational databases to RDF and from XML to RDF. We discuss RDF serializations, including formats with support for multiple graphs and we analyze RDF compression proposals. All approaches are presented in tabular format concisely and are grouped under a classification scheme. Moreover, the article provides an empirical study about usage of different RDF model elements as well as RDFS vocabulary terms. Finally, we present a summarized formal definition of RDF 1.1 and emphasize changes between RDF versions 1.0 and 1.1.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Antoine Zimmermann submitted on 09/Feb/2017
Suggestion:
Reject
Review Comment:

This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.

Summary:
=======
The paper presents a survey covering many topics related to RDF in general, and RDF 1.1 in particular. It deals with the abstract syntax, the formal semantics, serialisation syntaxes, compression, with issues on blank nodes, data integration, online usage.

Main comments:
=============
The paper succeeds in collecting a wide range of work related to RDF and can serve as a good source for students/researchers who want a reading list on this topic. In several places the paper reads much like a textbook rather than a scientific article, with very basic, informal presentation of the concepts. As such, it does not suit the readership of the Semantic Web Journal well (we can expect that they know a bit about RDF and the Semantic Web). Moreover, it does not even do a good job as a textbook because it is full of errors, ambiguities, and vagueness where there should be precise, rigorous definitions. Furthermore, most of the references are described very briefly, and very little analysis, conclusion or insight is given. The paper provides statistics about the authors' own experiments, but these are mostly redundant with existing work that already does this. The last sentences before the concluding section only mention more references and do not put anything in perspective. The conclusion of the paper is very short and shallow.

So to summarise:
- there is no real original research or new insight about the topic;
- the paper is full of problems, some of them quite serious, which I detail below.

Detailed remarks:
================
Sec.1:
- it is said that RDF was a response to the problem of natural language being ambiguous and hard to process, but this is not backed up with references. I doubt that this is what RDF initially aimed at.

Sec.1.1:
- "Our survey serves to surface some of final state of design" -> it is not clear what this means
- "analyze the RDF blank nodes" -> there is little analysis on this topic and nothing that has not been said before, especially in [92]
- "to compare the RDF reification approaches" -> there is very little in this regard

Sec.1.2:
- the section is called "Related Surveys" but contains many references that are not surveys
- ref.[111] should be "Draltan Marin. RDF formalization. Masters thesis. École polytechnique. August 2004." If the technical report of 2006 is cited, it should be "Claudio Gutierrez. A note on the history of the document: RDF Formalization, by Draltan Marin. Technical report. Universidad de Chile. 2006."
- Jeremy Carroll provided a formal analysis of comparing RDF graphs in 2002. He proves that isomorphism of RDF graphs can be reduced to known graph isomorphism problems and gives its complexity. This paper is not cited: "Jeremy J. Carroll. Matching RDF Graphs. In Ian Horrocks, James Hendler: The Semantic Web - ISWC 2002. First International Semantic Web Conference Sardinia, Italy, June 9–12, 2002 Proceedings. ISBN: 978-3-540-43760-4. Lecture Notes in Computer Science, vol.2342. Springer 2002."
- "In [116,117] ... . This paper" -> which paper? There are two cited
- While the section puts a lot of references in minimal space, it does not analyse them at all

Sec.2:
- this section gives a very basic overview of the semantic web standards that are certainly not interesting to the readership of the Semantic Web Journal.
- "An RDF constitutes ..." -> what is "an RDF"?
- "... which in the RDF terminology, are referred to as triples (or statements)" -> this would only be true in informal speech or prose. In RDF terminology (that is, according to the RDF Concepts spec), subject-predicate-object are "RDF triples", nothing else.
- Def.1: "assuming I is the set of [IRI] references" -> I should be the set of IRIs (not IRI references)
- Def.2 is not a definition
- "Note that in RDF 1.0 identifiers was RDF URI references" -> were URI references
- "is an URI" -> is a URI
- Def.3: this definition is wrong. Cf. RDF 1.1 Concepts
- "simple literals" -> not defined
- "RDF 1.1 supports the new datatype rdf:HTML" -> what does "supports" mean here? rdf:HTML is non-normative
- Def.4: "Blank nodes are defined as existential variables" -> no. blank nodes are defined as elements of an infinite set disjoint from IRIs and Literals. The relation between bnodes and existential variables is that of the semantics of bnodes. That's how they are interpreted, not how they are defined (similarly, literals are defined as a lexical string with a datatype IRI, not as a value in a value space).
- Def.4: bnodes are not used to denote anything. They indicate the existence of a thing and do not denote anything. They are not anonymous resources. They may indicate the existence of a thing that happens to be identified by an IRI or a literal
- "Given 2 blank nodes, it is not possible to determine whether or not they are the same" -> if they are 2, they are not the same. Being the same means there is only one bnode. The way bnodes work in RDF 1.0 and 1.1 is the same as far as RDF graphs and RDF graph serialisations are concerned. There is only an extension on how bnodes work in RDF datasets, a concept only defined in RDF 1.1.
- Def.6 "so-called context" -> so-called by whom? there is no such notion as "context" in RDF 1.1 Concepts.
- before Def.7, there is a little discussion about the semantics of RDF datasets, while there has been nothing said about the semantics of RDF graphs. This kind of remarks should come after presenting the formal semantics
- Def.7 is not complete. Graph names cannot repeat
- The description of what RDF containers are is vague and misleading. It is not clear why there is a focus on RDF containers, a rather unimportant feature of RDF.
- "Quite rarely used feature is reification" -> this needs a citation.
- "There are other proposals [87,118]" -> there are other approaches (a good reference for this would be "Daniel Hernández, Aidan Hogan and Markus Krötzsch. Reifying RDF: What Works Well With Wikidata? In Proc. of SSWS 2015")
- The section ends abruptly without transition or analysis
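
The remarks on Def.4 and Def.7 can be illustrated with a minimal Python sketch of the abstract syntax. It is only an illustration under stated assumptions: the class names and the make_dataset helper are hypothetical and come from neither the manuscript nor the RDF 1.1 Concepts document. Blank nodes are modelled as members of a set disjoint from IRIs and literals, with no label and no value, and a dataset refuses repeated graph names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: IRI          # in RDF 1.1 every literal has a datatype IRI
    language: str = ""     # non-empty only for language-tagged strings


class BlankNode:
    """Hypothetical blank node: no label, no value, identity only."""
    __slots__ = ()


def make_dataset(default_graph, named_pairs):
    """Build (default graph, named graphs); graph names must not repeat."""
    named = {}
    for name, graph in named_pairs:
        if not isinstance(name, (IRI, BlankNode)):
            raise TypeError("a graph name is an IRI or a blank node")
        if name in named:
            raise ValueError("graph names cannot repeat")
        named[name] = graph
    return default_graph, named
```

Two BlankNode() objects are always distinct in this sketch, which matches the remark that two blank nodes are, by construction, not the same. Their existential reading belongs to the semantics and is deliberately absent from the data structure.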

Sec.3:
- after Def.8, it is said that the graph isomorphism problem is GI-complete, which is correct. Then, Table 1 says it is NP-complete.
- the source for GI-completeness should be Carroll 2002 as cited above.
- Def.9 mentions common labels. There are no bnode labels in the abstract syntax and the sentence should stop with "do not share any blank nodes"
- "The labels of blank nodes are not of significance outside of the local scope RDF merge" -> what does this mean?
- "than the original graph" -> graphs
- Def.10: "a bijective function M: B-> B ... M is the identity map on RDF literals [etc]" -> it cannot be a function from B to B if it maps literals and other things
- Def.11 does not make sense as formalised
- before Def.12 "is a that"
- Def.12: "Assume that a map is a function ... there is no function" -> why introduce the term "map" that is never used? Also, the "otherwise" part is not useful because it's a definition
- after Def.12 "that the a subgraph"
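
The remark on Def.10 can be made concrete with a brute-force Python sketch. It is an illustration only: the BlankNode class and the function names are hypothetical, and IRIs and literals are represented as plain strings for brevity. The bijection is defined on blank nodes alone and is then extended with the identity on all other terms, which is why it cannot be typed as a function from B to B if it is also applied to literals.

```python
from itertools import permutations


class BlankNode:
    """Hypothetical stand-in for a blank node: identity only, no label."""


def blank_nodes(graph):
    """Collect the blank nodes occurring anywhere in a set of triples."""
    return {t for triple in graph for t in triple if isinstance(t, BlankNode)}


def isomorphic(g, h):
    """Try every bijection between the blank nodes of g and those of h.

    Each candidate is extended with the identity on IRIs and literals
    (plain strings here) and accepted if it maps g exactly onto h.
    """
    bg, bh = list(blank_nodes(g)), list(blank_nodes(h))
    if len(g) != len(h) or len(bg) != len(bh):
        return False
    for image in permutations(bh):
        m = dict(zip(bg, image))
        mapped = {tuple(m.get(t, t) for t in triple) for triple in g}
        if mapped == h:
            return True
    return False
```

The search enumerates all permutations of the blank nodes, so it is only usable on tiny graphs. It is meant to show the shape of the definition, not a practical algorithm.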

Sec.4:
- Def.14: "Let V be a vocabulary" -> interpretations in RDF 1.1 do not depend on a vocabulary
- after Def.14: what is H? what is I(H)?
- after Def.16: "in RDF 1.0 [...] D-entailment was described as an RDFS-entailment semantic extension. In RDF 1.1 it is defined as a RDF's direct extension" -> No. It is an extension of simple entailment.
- Def.17: there is a \mathcal{I} while the interpretation is just $I$.
- "A selection of the inference rules" -> rules of what?
- after Def.18: "in Table[newline]3" -> use non-breaking space

Sec.6:
- "Turtle (denotes ttl)" -> denoted ttl in Table 7
- "RDF[newline]1.1" -> non-breaking space
- Sec.6.1, after Ex.13: "the syntax is viewed as problematic to read and write for humans so one should consider using other syntaxes for data management" -> why is it a problem for data management? For RDF editors maybe
- after Ex.14: "separated by space key, tabulator key" -> separated by space, tabulation
- This section does not talk about TriX, RDF/JSON
- Table 7 is not referenced in the text apparently (it seems that text after the table says "Table 8" but should read Table 7)
- "below" -> above?
- Table 7: why are rdfa and xml "human readable"? What can we conclude about the table?

Sec.7:
- Table 8: why is there a star in the cell "standard" for HDT?
- "Several research areas have emerged around MapReduce and RDF compression" -> maybe cite "José M. Giménez-García, Javier D. Fernández, Miguel A. Martínez-Prieto. HDT-MR: A Scalable Solution for RDF Compression with HDT and MapReduce. ESWC 2015: 253-268"
- "In Table 8 ... below" -> above

Sec.8.1:
- "public-lod@w3.org mailing" -> mailing list
- "LOD Cloud.Each" -> space
- "The Fig.3" -> "Fig.3"
- Fig.3 is not useful.
- "The Table 9" -> Table 9
- the data from Table 9 comes from somewhere, cite it

Sec.8.2:
- "we define the riib ratio" -> this comes from other research, cite it
- "ddr_l metric for rdf:langString is almost 0" -> what is being computed here? Language-tagged strings are quite common. If the computation is about how many times rdf:langString shows up explicitly, then the value should be zero. There are no concrete syntaxes where one can explicitly write a language-tagged string with its type being explicit.
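
The point about rdf:langString can be sketched in Python. The regular expression below is a simplified approximation modelled on the N-Triples literal production (a quoted string followed by either a language tag or a datatype IRI, never both); the function name datatype_of is hypothetical. The datatype rdf:langString is derived from the presence of a language tag and never appears in the token itself.

```python
import re

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Simplified: quoted string, then EITHER @lang OR ^^<datatype>, or nothing.
LITERAL = re.compile(
    r'^"(?P<lex>(?:[^"\\]|\\.)*)"'
    r'(?:@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*)|\^\^<(?P<dt>[^>]+)>)?$'
)


def datatype_of(literal_token):
    """Return the datatype IRI implied by an N-Triples-style literal token."""
    m = LITERAL.match(literal_token)
    if m is None:
        raise ValueError(f"not a literal in this simplified grammar: {literal_token!r}")
    if m.group("lang"):
        return RDF_LANGSTRING      # implied by the tag, never written out
    return m.group("dt") or XSD_STRING
```

Under this reading, counting explicit occurrences of rdf:langString in serialized data would yield zero, whereas counting language-tagged literals would not. The metric needs to say which of the two it measures.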

Sec.8.3:
- "The results show that RDF*" -> what is RDF*?

Sec.8.4:
- "Table[newline]17" -> non-breaking space

At the end of Sec what do we conclude? What was the use of doing all this?

Sec.9: there is only very little related to KR in this paper. It's mostly about data management, with a little section on reasoning. Nothing on modelling knowledge and related things such as ontology engineering, reasoning algorithms, etc.

Ref.[62]: missing capital letters on rdf

Review #2
By Axel Polleres submitted on 15/May/2017
Suggestion:
Reject
Review Comment:

First and foremost, I would like to sincerely apologize for the delay of my review.

The present paper presents a very exhaustive literature review about works around RDF and RDF1.1.
I must say that I was impressed by the coverage of related literature, where I hardly found any work I would consider relevant missing (except for the compression chapter, see below, and also for the foundations of RDF). As such, the paper is useful; however, it also has its severe downsides:
First of all, the paper seems very unpolished, in parts containing grammatically incomplete sentences (I would have expected better proofreading with a native speaker as a co-author) and in parts lacking complete examples. Second, in terms of formality and definitions there are several drawbacks, as the level varies from formal but not properly explained definitions to informal, superficial descriptions of issues that would require more formal rigor to be fully understandable. In particular, for people to whom RDF is new, the style (despite the many good references) is not self-contained in all places. I will list details below.
Finally, the overall structure of the paper is a bit confusing and seems a bit of an unfocused collection:
An overview is given of RDF itself and its development (but without the history of, e.g., the pre-1999 version of RDF, which didn't have explicit (named) blank nodes, and without a comprehensive summary of the RDF 1.1 vs. RDF 1.0 differences, which I was eagerly hoping to find in such a paper); then mappings from relational DBs and XML to RDF are summarized (without, e.g., explicitly treating R2RML, which I was pretty surprised not to find).
Then an overview of an experiment is given, trying some tools and analyses (without a clear focus, though) on a dataset comprising a subset of the LOD-cloud. (BTW, e.g. the LOD-Laundromat project, which claims to enable exactly such experiments, isn't even mentioned.) I would partially question the usefulness of the experiment, in the sense that e.g. compression rates for HDT against other formats highly depend on the graph at hand, whereas, on the positive side, after the authors' own experiment a useful overview of larger-scale RDF dataset analyses is given.

In total, I am wavering between proposing a major revision, which would require a lot of work though, and suggesting a rejection with the option to resubmit a fully reworked version, rethinking the focus and structure of the paper.

Details, which I hope help the authors understand my rationale, as follows:

p.1 "Resource Description Framework (RDF) is seen as a solution provider" -> "The Resource Description Framework (RDF) is seen as a solution"

Also, I don’t know what “solution provider” means here… how can a framework be a solution provider?

p.2
In section 1.2, you write that Marin [111] was the first formal analysis of the RDF model. I would not say this, as ter Horst’s paper in 2005 was earlier.

BTW, you seem to miss:

C. Gutierrez, C. Hurtado, A. O. Mendelzon, Foundations of Semantic Web Databases, Proceedings ACM Symposium on Principles of Database Systems (PODS), Paris, France, June 2004, pp. 95 - 106

which was even earlier.

also: "Grau et al. [79]” … there is no "et al."

In general, the publications listed in this subsection would benefit from being presented chronologically or somehow more systematically, than just an unsorted enumeration.

p.3

"This data model should be general enough to provide a canonical representation for arbitrary content"

I have to stress that RDF does not provide this… there are many possible RDF representations for the same data.

"An RDF constitutes a universal method of the conceptual description or information modeling accessible in Web resources"

This sentence needs rewording. An RDF graph? The Resource Description Framework? Is it really a "method"? I don't think so.

Consider merging Definition 1 and Definition 5 into one.

Example 1: the triple is barely readable (small font)

“which are URI generalization” —> rewording needed.

Neither in Definition 3 nor later on is it explained what the lexical space and value space of a datatype are. Later, in Definition 16, you talk about L2V without having explained these things before, so it is confusing. This is one example of what I meant when saying that the paper, as a survey for people new to RDF, doesn't really do its job of being self-contained and understandable.

p.4:

"Note that RDF 1.0 in no way refers to any blank nodes internal structure. Given two blank nodes, it is not possible to determine whether or not they are the same. In RDF 1.1 blank node identifiers are local identifiers adapted in particular RDF serializations or implementations of an RDF store."

I am not sure which difference between RDF 1.0 and 1.1 you are hinting at here, whether there is any, and what exactly it is.

p.5

"RDF namespace that unify popular RDF patterns, such as RDF collections, containers and RDF reification."

I am afraid that, without further explanation of what a container or a collection is, this is not really clear to someone new to RDF.

Also, e.g. for the use of the list vocabulary, it is not noted that cyclic or unterminated lists in RDF are possible. I think this should be made clear. It would be good if the paper stressed, maybe even defined, "standard use" and "non-standard use" of the RDF and RDFS vocabulary (as done in the literature, e.g. by de Bruijn or by Hogan in his PhD thesis).
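
To show what cyclic and unterminated lists look like at the triple level, here is a minimal Python sketch. The function name list_status and its four result labels are hypothetical, terms are plain strings, and only the rdf:rest chain is inspected (rdf:first is not checked).

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def list_status(graph, head):
    """Classify the rdf:rest chain that starts at head.

    graph is a set of (subject, predicate, object) string triples.
    """
    seen = set()
    node = head
    while node != NIL:
        if node in seen:
            return "cyclic"            # the chain loops back on itself
        seen.add(node)
        rests = [o for s, p, o in graph if s == node and p == REST]
        if not rests:
            return "unterminated"      # chain stops without reaching rdf:nil
        if len(rests) > 1:
            return "branching"         # more than one rdf:rest on one node
        node = rests[0]
    return "well-formed"
```

All four outcomes are legal RDF graphs. Only the last corresponds to what a Turtle collection such as ( "x" "y" ) produces, which is the kind of distinction a definition of "standard use" would have to draw.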

"sec-
ond”

it looks strange that Example 8 cuts through the sentence

As for RDF* , more details would be good: to me, this looks like this proposal is dependent on the Turtle/N-Triples serialization, and not made on the level of the RDF data model.

Why don't you provide an example of the named graph notation for reification (mentioned briefly at the end of Section 2)?

Section 3:

What do you mean by "to inject the N-ary association"? Reword.

p.6

Definitions 8 and 9 look alien where they are located. It would be better to move Def 8 after "This gives rise to a notion of isomorphism between RDF graphs that are the same up to blank node relabeling: isomorphic RDF graphs can be considered as containing the same content."

and to move Def 9 after:

"The result of this operation on RDF graphs can produce more nodes than the original graph."

You speak about "GI-complete" in the text, where I guess you mean the refinement in Table 1 to NP-complete and PTIME for the ground case. This should be said explicitly; the term GI-complete is not that common.

Can you make clearer, or explain intuitively, what the difference between Def 8 and Def 10 is?

Definition 11 seems self-referential to me, i.e. you define Skolemization by "skolemization". Also, the definition doesn't say clearly that the Skolem terms need to be fresh constants, i.e. S \cap IRI(G) = \emptyset.
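
The freshness condition can be illustrated with a minimal Python sketch. The names BlankNode and skolemize are hypothetical, IRIs and literals are plain strings, and the example.org base is an assumption; its /.well-known/genid/ path follows the scheme suggested for Skolem IRIs in RDF 1.1 Concepts. Each blank node is replaced consistently by an IRI that does not already occur in the graph, so that S \cap IRI(G) = \emptyset holds by construction.

```python
from itertools import count


class BlankNode:
    """Hypothetical stand-in for a blank node: identity only, no label."""


def skolemize(graph, base="https://example.org/.well-known/genid/"):
    """Replace every blank node in graph by a fresh Skolem IRI.

    Returns the new graph and the blank-node-to-IRI mapping.
    """
    used = {t for triple in graph for t in triple if isinstance(t, str)}
    counter = count()
    mapping = {}

    def fresh():
        # Skip candidates that already occur in the graph.
        while True:
            candidate = f"{base}{next(counter)}"
            if candidate not in used:
                used.add(candidate)
                return candidate

    result = set()
    for triple in graph:
        new_triple = []
        for term in triple:
            if isinstance(term, BlankNode):
                if term not in mapping:
                    mapping[term] = fresh()
                term = mapping[term]
            new_triple.append(term)
        result.add(tuple(new_triple))
    return result, mapping
```

A definition written along these lines would not need to mention Skolemization in its own body: it only needs the injective mapping from blank nodes to IRIs and the disjointness condition.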

p.7

Instead of just referring to well-behaved RDF, you should make clear that NP-completeness stems from cyclic blank nodes only, which can't arise in what you call well-behaved RDF. Otherwise, how should the reader know why this concept is useful/needed? Also, it would be good to note that the first version of RDF didn't have named bnodes and thus was by definition well-behaved.

"Please note that the a subgraph can a graph with fewer triples."
… another non-sentence.

"Example 10. The example presents that the top graph is lean and the bottom graph in not lean."
In what sense does the example present that? I see two graphs… You should explain why one is lean and one isn't, based on the definitions, and not leave it as an exercise… Examples in a survey should be used for illustration, not for making the reader guess.

p.8

"In this paper the author proves that the RDF graph G simply-entails finite RDF graph H is NP-complete."
—> ??? reword.

"Definition 15: (semantic condition for blank nodes)”

What is defined here and what is the intuition?

“This semantics generalizes the RDFS 1.0 semantics"
—> this is strange since at this point in the paper, you haven’t even talked about RDFS semantics.

p.9

The presentation of the RDFS semantics is not much better than verbatim copying from the spec; in such a paper, I would expect the intuition to be presented rather than the spec definitions being copied verbatim.

"In [150] ter Horst studied RDFS 1.0 entailment and he proved that the consistency problem is in PTIME."

Which exact decision problem are you talking about? You haven't introduced what consistency is -> another case of not being self-contained.

p. 10

Table 4 comes without real explanation/verbal description

I was surprised not to see the interpolation lemma mentioned in Section 4 at all; cf. also Gutierrez et al. 2004, mentioned above.

Example 12 again would benefit from some more explanations.

"This subsection contains an overview and comparison of the approaches adopted by the RDF and relational database environments.”

approaches for what?

p.11

Table 5 … note that XSPARQL (which you later mention for XML mapping) has also been used/extended for relational mapping:

Nuno Lopes, Stefan Bischof, and Axel Polleres. On the semantics of heterogeneous querying of relational, XML, and RDF data with XSPARQL. In Proceedings of the 15th Portuguese Conference on Artificial Intelligence (EPIA2011) -- Computational Logic with Applications Track, Lisbon, Portugal, October 2011.

Additionally, I wonder why R2RML, as a W3C standard with several implementations, is only mentioned in a short sentence at the end of the page, without even an example: "Another language based of RDF for expressing mappings from relational databases to RDF datasets is R2RML". This is another example of the sheer enumeration in the paper, instead of explanation or at least a chronological listing of the related works.

p.13

"That approach can create RDF from XML document."
Doesn’t that apply to all the listed approaches in this section?

Overall, the section seems to be a sheer enumeration without any qualitative comparison or the like, and as such is not as useful as it could be.

It is also surprising that you start to describe RDF serializations only after the conversion techniques from relational and XML data. I would have expected that before.

What do you mean by "(denoted ttl)"? That this is a common file suffix? If so, say it. Again, an example that might be confusing for someone not familiar with RDF.

p.14

CURIEs, first mentioned here, are not explained (maybe a better approach would be to explain URIs, IRIs, namespaces and CURIEs all in one place further up).

"Nevertheless, the syntax is viewed as problematic to read and write for humans so one should consider using other syntaxes in data management."

Why? I do not think this is a valid argument for a machine-readable syntax; I suggest removing "so one should consider using other syntaxes in data management." as it's not really argued well.

p.15

Turtle and N-Triples should be swapped and an example for N-Triples should be given.

p.16

"However, it also extends the RDF data model, e.g. in JSON-LD predicates can be IRIs or blank nodes whereas in RDF have to be IRIs."

What is the rationale for that? Please provide an example.
What does the "@" sign mean in the serialization?

As for TriG, I find the syntax explanation insufficient: it is just a bullet list, a proper definition is missing, and also:
"4. optional blank lines.”
seems to be superfluous.

N-Quads [31] (denoted nq) is a line-based syntax for serializing RDF datasets.
—>
N-Quads [31] (denoted nq) is another line-based syntax for serializing RDF datasets.

Make clear/explicit that N-Triples is to Turtle what N-Quads is to TriG.

p.17

Table 7 is a bit hard to digest; it should also be explained in more detail in the text, particularly what "partial" means.

p.18

For RDF compression, note that e.g. EXI has also been used for RDF compression:

Käbisch, Sebastian, Daniel Peintner, and Darko Anicic. "Standardized and efficient RDF encoding for constrained embedded networks." European Semantic Web Conference. Springer International Publishing, 2015. http://link.springer.com/chapter/10.1007/978-3-319-18818-8_27

Also, I was surprised not to see any mention of Linked Data fragments or Triple pattern fragments.

Also, this one seems worth a mention:
José M. Giménez-García, Javier D. Fernández, Miguel A. Martínez-Prieto:
HDT-MR: A Scalable Solution for RDF Compression with HDT and MapReduce. ESWC 2015: 253-268

p.19 Fig 3 is not the latest version of the LOD cloud diagram.

p.20 The compression efficiency of HDT and other formats obviously depends on the dataset at hand and its graph structure… this doesn't come across well.

p. 25

[23] doesn't have a venue; the other references should also be checked again.

