Optimising the ShExML engine through code profiling: from turtle's pace to state-of-the-art performance

Submitted by Herminio Garcia... on 08/13/2024 - 01:55

Tracking #: 3736-4950

Authors:

Herminio Garcia-Gonzalez

Responsible editor:

Guest Editors KG Construction 2024

Submission type:

Full Paper

Abstract:

The ShExML language was born as a more user-friendly approach for knowledge graph construction. However, a recent study has highlighted that its companion engine suffers from serious performance issues. Thus, in this paper I undertake the optimisation of the engine by means of a code profiling analysis. The improvements are then measured as part of a performance evaluation whose results are statistically analysed. Upon this analysis, the effectiveness of each proposed enhancement is discussed. Moreover, the optimised version of ShExML is compared against similar engines, delivering a comparable performance to its alternatives. As a direct result of this work, the ShExML engine offers a much more optimised version which can cope better with users' demands.

Full PDF Version:

swj3736.pdf

Previous Version:

Improving the ShExML engine through a profiling methodology

Tags:

Reviewed

Long-term Stable Link to Resources:

https://doi.org/10.5281/zenodo.13305712

Decision/Status:

Solicited Reviews:

Review #1

By Jakub Klimek submitted on 06/Sep/2024

Suggestion:
Accept

Review Comment:

This is a revised version of the paper, in which the author reports on how they investigated bottlenecks in their software, the ShExML engine, through software performance profiling techniques, and how the bottlenecks found were addressed in different versions of the software.

Compared to the previous version, this version is a major improvement. It has also been reshaped into a full paper instead of the "Reports on tools and systems" type submission and significantly extended. Now it seems more self-contained. It describes and quantifies the improvements made to the ShExML engine, which could be of interest to all users and implementers of similar software and related benchmarks. The research is original, significant and the quality of writing has improved.

Also, the evaluation seems replicable and the results are published with a long-term stable link.

My comments from the previous version were addressed and now I am happy to recommend accepting the paper.

Review #2

By Ana Iglesias-Molina submitted on 20/Oct/2024

Suggestion:
Accept

Review Comment:

This review is based on a revised version of the paper, which presents how the ShExML engine is optimized to improve performance in terms of KG materialization time and resource usage. I would like to commend the author for taking into consideration the suggestions from my previous review and those of the other reviewers, as I consider the paper to have greatly improved since the first version.

Section 2 reads much better now; however, I believe that “rmlmapper” should be written as “RMLMapper”. Moreover, creating a new section to describe the ShExML language was a good decision in my opinion: the following sections are clearer and can be followed more easily with this context established. I’d suggest adding line numbers to the mapping listing, so that particular parts of the mapping can be referenced from the text when describing the example.

Section 4 has improved and reads better now, but it is still not clear what the inputs of the pipeline are (data, mapping, both?), or the different components. Sections 5 and 6 also look better with Table 1, the violin plots and the replication of the SPARQL-Anything experiments. As a small suggestion, I’d advise separating the groups among versions, so that they can be more easily differentiated. Additionally, it is not clear why the EHRI data is not tested in the first version.

In general, I’m quite content with the changes and modifications made for this version; I have only minor remarks to improve the paper further.

Review #3

By Nuno Lopes submitted on 21/Oct/2024

Suggestion:
Accept

Review Comment:

The revision addressed my concerns, and the additional benchmarks significantly improved the paper's contributions.
