Declarative Generation of RDF-star graphs from Heterogeneous Data

Tracking #: 3501-4715

Authors: 
Julián Arenas-Guerrero
Ana Iglesias-Molina
David Chaves-Fraga
Daniel Garijo
Oscar Corcho
Anastasia Dimou

Responsible editor: 
Guest Editors Tools Systems 2022

Submission type: 
Tool/System Report
Abstract: 
RDF-star has been proposed as an extension of RDF to make statements about statements. Libraries and graph stores have started adopting RDF-star, but the generation of RDF-star data remains largely unexplored. To allow generating RDF-star from heterogeneous data, RML-star was proposed as an extension of RML. However, no system has been developed so far that implements the RML-star specification. In this work, we present Morph-KGC^star , which extends the Morph-KGC materialization engine to generate RDF-star datasets. We validate Morph-KGC^star by running test cases derived from the N-Triples-star syntax tests and we apply it to two real-world use cases from the biomedical and open science domains. We compare the performance of our approach against other RDF-star generation methods (SPARQL-Anything), showing that Morph-KGC^star scales better for large input datasets, but it is slower when processing multiple smaller files.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Sebastián Ferrada submitted on 01/Aug/2023
Suggestion:
Accept
Review Comment:

This new version successfully addresses all my previous comments by being more precise about the purpose of RDF-star, providing comparable examples across the different reification options, improving the clarity of the main algorithm and its parameters, and resolving the presentation issues. Furthermore, the authors greatly improved the presentation of the paper by adding colors, highlights, and running examples.

Therefore, and in line with my previous review, I recommend that this paper be accepted.

Review #2
By Pierre-Antoine Champin submitted on 09/Sep/2023
Suggestion:
Minor Revision
Review Comment:

This paper presents RML-star, an extension of RML (RDF Mapping Language) to support the new features that RDF-star introduces into RDF. The paper also presents Morph-KGC^star, an implementation of RML-star, and compares it with other implementations of mapping languages that support RDF-star.

I am generally satisfied with how the authors took into account my remarks on the previous version of this paper. There are, however, a few remaining points (which, I admit, I missed in my first review) that I believe need fixing, all concerning Algorithm 1.

1) the first input of the algorithm is a "mapping rule", but the RML-star ontology (described in Figure 1) has no explicit concept of *rules* -- instead, it defines different kinds of *maps*. At first, I thought that the parameter 'm' was a TriplesMap, but that interpretation does not work, because the algorithm assumes that 'm' has exactly one of each: Subject Map (SM), Predicate Map (PM), and Object Map (OM). By contrast, a TriplesMap may have multiple PMs and OMs. Also, on p. 7, the authors mention "the mapping rules *within* a non-asserted triples map" (emphasis mine), so rules and maps are clearly different things. But again: where are the rules in the ontology?
Reading between the lines, I am guessing that rules are somehow generated from all the SM-PM-OM combinations inside a given map (see the sketch below). But that should be made explicit!
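
To make my guess concrete, here is a minimal sketch of how I imagine rules being derived from a triples map; the class, field, and function names below are mine, not taken from the RML-star ontology or from Morph-KGC^star:

from dataclasses import dataclass
from itertools import product
from typing import List

@dataclass
class TriplesMap:
    # Hypothetical, simplified view of a triples map: one subject map,
    # several predicate maps and object maps (names are illustrative only).
    subject_map: str
    predicate_maps: List[str]
    object_maps: List[str]

@dataclass
class Rule:
    # My guess at what a "mapping rule" is: exactly one SM, one PM, one OM.
    subject_map: str
    predicate_map: str
    object_map: str

def expand_rules(tm: TriplesMap) -> List[Rule]:
    # One rule per SM-PM-OM combination inside the triples map.
    return [Rule(tm.subject_map, pm, om)
            for pm, om in product(tm.predicate_maps, tm.object_maps)]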

2) if I apply the algorithm to the running example (Listing 8), specifically to the single rule within the map #dateTM, I come to the following conclusions:
* 'subjects' is populated, line 10 of the algorithm, with the following values:
- << :Angelica :jumps "4.80" >>
- << :Katerina :jumps "4.85" >>
* 'predicates' is populated, line 21 of the algorithm, with the following values:
- :date
* 'objects' is populated, line 19 of the algorithm, with the following values:
- "2022-03-21"
- "2022-03-19"
* these 3 collections (ordered? with repetition?) are passed to the function createTriples, line 27, which combines them into triples.

How it decides which subject to combine with which predicate and which object is not clear at all and should be made explicit (see the sketch below).
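
To illustrate why this matters, here is a hypothetical sketch contrasting the two readings of createTriples that I can think of, positional pairing versus a cartesian product; the function names below are mine, only the values come from the running example:

from itertools import product

subjects = ['<< :Angelica :jumps "4.80" >>', '<< :Katerina :jumps "4.85" >>']
predicates = [':date']
objects = ['"2022-03-21"', '"2022-03-19"']

def create_triples_positional(subjects, predicates, objects):
    # Reading 1: pair subjects and objects by position, repeating the predicate.
    return [(s, predicates[0], o) for s, o in zip(subjects, objects)]

def create_triples_product(subjects, predicates, objects):
    # Reading 2: cartesian product, which also yields unintended triples such as
    # << :Angelica :jumps "4.80" >> :date "2022-03-19".
    return list(product(subjects, predicates, objects))

print(create_triples_positional(subjects, predicates, objects))  # 2 triples
print(create_triples_product(subjects, predicates, objects))     # 4 triples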

3) this becomes trickier if I try to apply the algorithm to the more complex example in Listing 6. In my interpretation:
* 'subjects' is now populated with the following values:
- << :Angelica :jumps "4.80" >>
- << :Katerina :jumps "4.85" >>
- << :Angelica :scores "1211" >>
- << :Katerina :scores "1224" >>
* 'predicates' and 'objects' as above.
(NB: this would be even trickier if the OM was *also* a StarMapping generating multiple triples)
→ It is now even less clear how createTriples decides which subject goes with which predicate and which object.

I have a hunch that, in that case, you are actually considering that #dateTM contains 2 rules instead of 1, which could be summarized as:

- rule1: << [:{PERSON}] :jumps [MARK] >> :date [DATE]
- rule2: << [:{PERSON}] :scores [SCORE] >> :date [DATE]
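
Under that assumption, here is a sketch of how per-rule processing would keep the pairing unambiguous; the data rows and helper names below are illustrative only, not taken from the paper:

# Hypothetical source rows behind the running example (values are illustrative).
rows = [
    {"PERSON": "Angelica", "MARK": "4.80", "SCORE": "1211", "DATE": "2022-03-21"},
    {"PERSON": "Katerina", "MARK": "4.85", "SCORE": "1224", "DATE": "2022-03-19"},
]

def apply_rule(rows, quoted_predicate, quoted_object_ref):
    # Each rule is applied to the same source rows, so the quoted triple (subject)
    # and the date (object) stay aligned row by row.
    return [(f'<< :{r["PERSON"]} {quoted_predicate} "{r[quoted_object_ref]}" >>',
             ":date",
             f'"{r["DATE"]}"')
            for r in rows]

# rule1 and rule2 from my summary above, processed independently.
triples = apply_rule(rows, ":jumps", "MARK") + apply_rule(rows, ":scores", "SCORE")
for t in triples:
    print(t)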

But that brings us back to question 1): you should explain more explicitly what *rules* are, and how you generate them from the maps.

Review #3
By Kai Eckert submitted on 22/Sep/2023
Suggestion:
Accept
Review Comment:

This is a revision of a previously submitted paper; my review focuses on the points I raised about the earlier version.

(1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided).

The paper is still relevant, as indicated.

(2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool.

The current version shows major improvements over the former one. The overall layout is much easier to parse, and the description of the algorithm with a running example is now much easier to comprehend. I would still prefer "triple map" over "triples map", but as the latter is also part of the ontology, it is understandable that the authors stick with it.

All in all, the critical points here have been addressed.

Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess

(A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data,

Yes, the repository is constantly updated with new stable Zenodo URIs for releases.

(B) whether the provided resources appear to be complete for replication of experiments, and if not, why,

The link to the documentation works now and I could follow the example workflows.

All in all, I think the paper can be accepted for publication now.