Empowering the SDM-RDFizer Tool for Scaling Up to Complex Knowledge Graph Creation Pipelines

Tracking #: 3489-4703

Authors: 
Enrique Iglesias
Maria-Esther Vidal
Samaneh Jozashoori
Diego Collarana
David Chaves-Fraga

Responsible editor: 
Guest Editors Tools Systems 2022

Submission type: 
Tool/System Report
Abstract: 
The significant increase in data volume in recent years has prompted the adoption of knowledge graphs as valuable data structures for integrating diverse data and metadata. However, this surge in data availability has brought to light challenges related to standardization, interoperability, and data quality. Knowledge graph creation faces complexities arising from factors such as large data volumes, data heterogeneity, and high duplicate rates. This work addresses these challenges by focusing on scaling up declarative knowledge graph creation specified using the RDF Mapping Language (RML). We propose SDM-RDFizer, a two-fold solution designed to address these complexities. Firstly, we introduce a reordering approach for RML triples maps, prioritizing the evaluation of the most selective maps first to reduce memory consumption. Secondly, we employ an RDF compression strategy, along with optimized data structures and novel operators, to prevent the generation of duplicate RDF triples and to optimize the execution of RML operators. We evaluate the effectiveness of SDM-RDFizer using established benchmarks, which demonstrate its superiority over existing state-of-the-art RML engines, highlighting the tangible benefits of our proposed techniques. Furthermore, the paper presents real-world projects in which SDM-RDFizer has been utilized, providing insights into the advantages of declaratively defining knowledge graphs and efficiently executing these specifications using SDM-RDFizer.
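
To make the two ideas in the abstract concrete, the following sketch (in Python, the language SDM-RDFizer is implemented in) illustrates how triples maps could be reordered by an estimated selectivity before execution and how a hash-based index can suppress duplicate RDF triples before serialization. The TriplesMap structure, the selectivity estimate, and the function names are illustrative assumptions made for this sketch, not the engine's actual internals or API.

    from dataclasses import dataclass

    # Hypothetical, simplified view of an RML triples map used only for this sketch;
    # the real engine operates on parsed RML mapping documents and logical sources.
    @dataclass
    class TriplesMap:
        name: str
        rows: list        # records of the logical source, already loaded
        project: object   # callable mapping a row to a (subject, predicate, object) triple

        def estimated_selectivity(self) -> float:
            # Crude estimate: ratio of distinct projected triples to input rows.
            # A lower value means the map is more selective.
            distinct = {self.project(r) for r in self.rows}
            return len(distinct) / max(len(self.rows), 1)

    def materialize(triples_maps):
        """Evaluate the most selective triples maps first and avoid emitting duplicates."""
        seen = set()      # hash-based structure standing in for the engine's internal indexes
        output = []
        for tm in sorted(triples_maps, key=lambda t: t.estimated_selectivity()):
            for row in tm.rows:
                triple = tm.project(row)
                if triple not in seen:   # duplicate suppression before serialization
                    seen.add(triple)
                    output.append(triple)
        return output

The ordering and the selectivity estimate are deliberately schematic; the point is only to show where reordering and duplicate suppression sit in a materialization loop, not how SDM-RDFizer implements them.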
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Dominik Tomaszuk submitted on 08/Jul/2023
Suggestion:
Minor Revision
Review Comment:

I would like to express my gratitude to the authors for addressing my comments and suggestions. I am pleased with the revisions made in the new version. As the authors emphasized, this is primarily a tool report paper, and taking that into consideration, I am now inclined to accept it. However, I have noticed that some of my comments regarding the Related Work section have been overlooked or dismissed by the authors without any explanation. I still believe that the Related Work could be improved, and it strikes me as odd that the authors did not acknowledge or incorporate several suggestions that were provided.

Review #2
By Antoine Zimmermann submitted on 13/Jul/2023
Suggestion:
Accept
Review Comment:

The new version of the paper addresses the concerns I had before. This is an improved version; although I still find the technical development a bit hard to follow, I don't see anything that is truly objectionable.

Review #3
Anonymous submitted on 30/Jul/2023
Suggestion:
Major Revision
Review Comment:

Unfortunately, the paper still reads like a report on incremental tool development, not as a self-contained, reflective, and complete research paper.

The quality of the paper has not really improved since the last submission.

There is no consistency in the description of the challenges, and the methodology is not clear. The requirements are generic, and it is not clear from where they were derived. It is also not clear which use cases and scenarios need this work. In the introduction, an ESWC 2023 event benchmark is mentioned, while the next section already uses the "EU H2020 funded iASiS project" as a basis.
Who really needs this tool and what for?
The research questions themselves are mentioned for the first time in Section 5 (!). What were the authors doing till then - hacking?

Essential details are lacking in the description of the approach. Why, for example, was RML chosen in the first place?
Or: "Three state-of-the-art RML engines, i.e., RMLMapper v6.08, Morph-KGC v2.1.19, and SDM-RDFizer v3.2, are utilized to create this portion of the KG; the engines timed out in five hours." -> In which environments, and on which machines and with which technical settings, were the experiments run?

There is no clear comparison to the state of the art in light of the research questions. Instead, there are unsubstantiated (sometimes self-praising) statements, e.g., "Since its first release, the SDM-RDFizer has caught the attention of practitioners and knowledge engineers due to its good results w.r.t. other RML engines." -> What are "good results" - where and on what? Reference? "has caught" is also another example of informal language use; see also below.

The fact that the tool was "sold" in several EU projects does not mean that much. One cannot just name-drop project names without a detailed explanation of what exactly the tool improved there and how. And no such details are provided here (nor for the evaluation of the problems, which are also not clearly described).

The text should be improved.
For example, KGs, DISs, and RML are introduced in the paper multiple times. Acronyms should be introduced once and then used throughout the rest of the paper, not introduced over and over again.
There are sentences with unclear grammar, e.g., this one right in the abstract: "We evaluate the effectiveness of SDM-RDFizer using established benchmarks, which demonstrate its superiority over existing state-of-the-art RML engines, highlighting the tangible benefits of our proposed techniques." -> What is the subject of "demonstrate" here - the benchmarks?
Title of Section 4: why does the word "tool" not start with a capital letter?