Optimising the ShExML engine through code profiling: from turtle's pace to state-of-the-art performance

Tracking #: 3736-4950

Authors: 
Herminio Garcia-Gonzalez

Responsible editor: 
Guest Editors KG Construction 2024

Submission type: 
Full Paper
Abstract: 
The ShExML language was born as a more user-friendly approach for knowledge graph construction. However, a recent study has highlighted that its companion engine suffers from serious performance issues. Thus, in this paper I undertake the optimisation of the engine by means of a code profiling analysis. The improvements are then measured as part of a performance evaluation whose results are statistically analysed. Upon this analysis, the effectiveness of each proposed enhancement is discussed. Moreover, the optimised version of ShExML is compared against similar engines, delivering a comparable performance to its alternatives. As a direct result of this work, the ShExML engine offers a much more optimised version which can cope better with users' demands.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
By Jakub Klimek submitted on 06/Sep/2024
Suggestion:
Accept
Review Comment:

This is a revised version of the paper, in which the author reports on how they investigated bottlenecks in their software, the ShExML engine, using software performance profiling techniques, and how the bottlenecks found were addressed in different versions of the software.

Compared to the previous version, this version is a major improvement. It has also been reshaped into a full paper instead of the "Reports on tools and systems" type submission and significantly extended. Now it seems more self-contained. It describes and quantifies the improvements made to the ShExML engine, which could be of interest to all users and implementers of similar software and related benchmarks. The research is original, significant and the quality of writing has improved.

Also, the evaluation seems replicable and the results are published with a long-term stable link.

My comments from the previous version were addressed and now I am happy to recommend accepting the paper.

Review #2
By Ana Iglesias-Molina submitted on 20/Oct/2024
Suggestion:
Accept
Review Comment:

This review is based on a revised version of the paper, which presents how the ShExML engine is optimized to improve performance in terms of KG materialization time and resource usage. I would like to commend the author for taking into consideration the suggestions from my previous review and the other reviewers’, as I consider that the paper has greatly improved since the first version.

Section 2 reads much better now; however, I believe that “rmlmapper” is written as “RMLMapper”. Moreover, creating a new section to describe the ShExML language was a good decision IMO: the following sections are clearer and can be followed more easily with this context established. I’d suggest adding line numbers to the mapping listing, so that particular parts of the mapping can be referenced from the text when describing the example.

Section 4 has improved and reads better now; however, it is still not clear what the inputs (data, mapping, both?) of the pipeline, or of its different components, are. Sections 5 and 6 also look better with Table 1, the violin plots and the replication of the SPARQL-Anything experiments. As a small suggestion, I’d advise separating the groups among versions, so that they can be more easily differentiated. Additionally, it is not clear why the EHRI data is not tested in the first version.

In general, I’m quite content with the changes and modifications made for this version; I have only minor remarks to improve the paper further.

Review #3
By Nuno Lopes submitted on 21/Oct/2024
Suggestion:
Accept
Review Comment:

The revision addressed my concerns, and the additional benchmarks significantly improved the paper's contributions.