Editorial Board

Editor-in-Chief
Krzysztof Janowicz

Managing Editors
Cogan Shimizu
Eva Blomqvist

Editorial Board
Mehwish Alam
Claudia d’Amato
Stefano Borgo
Boyan Brodaric
Philipp Cimiano
Oscar Corcho
Bernardo Cuenca-Grau
Elena Demidova
Jerome Euzenat
Mark Gahegan
Aldo Gangemi
Anna Lisa Gentile
Rafael Goncalves
Dagmar Gromann
Armin Haller
Pascal Hitzler
Aidan Hogan
Katja Hose
Eero Hyvönen
Sabrina Kirrane
Agnieszka Lawrynowicz
Freddy Lecue
Maria Maleshkova
Raghava Mutharaju
Axel Polleres
Guilin Qi
Marta Sabou
Harald Sack
Christoph Schlieder
Stefan Schlobach
Oshani Seneviratne
Cogan Shimizu
Ruben Verborgh
GQ Zhang

Former Editors-in-Chief
Pascal Hitzler

Editorial Assistants
Michael McCain

Syndicate

An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines

Submitted by Umair Qudus on 06/12/2020 - 01:28

Tracking #: 2514-3728

A new version of this paper is available

Authors:

Umair Qudus

Muhammad Saleem

Axel-Cyrille Ngonga Ngomo

Young-Koo Lee1

Responsible editor:

Ruben Verborgh

Submission type:

Full Paper

Abstract:

Finding a good query plan is key to the optimization of query runtime. This holds in particular for cost-based federation engines, which make use of cardinality estimations to achieve this goal. A number of studies compare SPARQL federation engines across different performance metrics, including query runtime, result set completeness and correctness, number of sources selected and number of requests sent. Albeit informative, these metrics are generic and unable to quantify and evaluate the accuracy of the cardinality estimators of cost-based federation engines. To thoroughly evaluate cost-based federation engines, the effect of estimated cardinality errors on the overall query runtime performance must be measured. In this paper, we address this challenge by presenting novel evaluation metrics targeted at a fine-grained benchmarking of cost-based federated SPARQL query engines. We evaluate five cost-based federated SPARQL query engines using existing as well as novel evaluation metrics by using LargeRDFBench queries. Our results provide a detailed analysis of the experimental outcomes that reveal novel insights, useful for the development of future cost-based federated SPARQL query processing engines.

Full PDF Version:

swj2514.pdf

Revised Version:

An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines

Previous Version:

An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines

Tags:

Reviewed

Decision/Status:

Minor Revision

Solicited Reviews:

Click to Expand/Collapse

Review #1

Anonymous submitted on 13/Jul/2020

Suggestion:
Major Revision

Review Comment:

I thank the authors for their reply letter and updated paper.
Most of my earlier comments have been addressed,
except for my main comment regarding the significance of the metrics.

I will re-iterate this earlier remark on the weakness of the similarity and q-error metrics:
According to the current experiments, correlation to query execution time has been shown to be quite small (R values).
Furthermore, the results for q-error are insignificant (p values) in most cases,
and for similarity in 50% of the cases under linear regression, and 25% using robust regression.
The authors correctly explain that this shows that query execution time is explained by a multitude of factors, where cardinality estimates are one.
However, given the outcomes of these experiments,
there is no proof that these metrics are significant enough to provide information about cardinality estimation,
which makes the value of these metrics questionable.
I find it unfortunate that the authors did not find a alternative method of proving the effectiveness of these metrics in a statistically significant manner,
as I do think that these metrics have their value in this domain.

Even after repositioning the paper, the introduction, evaluation, and discussion of these metrics up the majority of this submission.
Therefore, I find this work too weak in its current state to accept for a journal.

Typo:
* Page 2: "sate-of-the-art cost-based"

Review #2

By Stasinos Konstantopoulos submitted on 29/Jul/2020

Suggestion:
Minor Revision

Review Comment:

The submission proposes a similarity metric that evaluates the accuracy of the predictions that a query optimizer makes about the cardinalities of triples and joins. As these predictions are the primary input for estimating cost, erroneous cardinality predictions can lead the optimizer to sub-optimal query plans.

The proposed similarity metric is compared against q-error, a metric for evaluating cardinality predictions from the relational database literature. The basis of comparison is how strongly the evaluation metric correlates with actual execution times on a benchmark suite. That is to say, the proposed similarity metric is found superior to q-error in the sense that the new metric is a better predictor of overall query execution performance than q-error. Such a metric can be used (in combination with other metrics evaluating other steps of the query processing pipeline) in order to pin-point the step that causes ill-performance.

The article is well-written and the work described can potentially be a significant contribution to the federated query processing literature, and the submitted manuscript is a substantial contribution towards materializing this potential.

My comments regarding the analysis of the experiments' outcomes have been addressed to the degree that I only have minor editorial complaints. Specifically, I recommend that Section 6.3 (or thereabout) the authors provide a concrete example demonstrating how the new metric can be used in a way that q-metric alone would be insufficient or inadequate. This would evaluate the first contribution claimed by the authors: the new metric can give a fine-grained evaluation of query engines. I note here that in the previous round, the authors reacted to this recommendation with a rough sketch of such an example in their response, but nothing relevant was added to the manuscript.

Log in or register to post comments
3780 reads

Main menu

Editorial Board

Syndicate

An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines

Tracking #: 2514-3728

Reviewed Articles

Authors & Reviewers

Links

Recent blog posts

Accepted Articles

Search form

Main menu

Login

Editorial Board

Syndicate

An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines

Tracking #: 2514-3728

Reviewed Articles

Authors & Reviewers

Links

Recent blog posts

Accepted Articles