Robust Query Processing for Linked Data Fragments

Tracking #: 2888-4102

Authors: 
Lars Heling
Maribel Acosta

Responsible editor: 
Guest Editors Web of Data 2020

Submission type: 
Full Paper
Abstract: 
Linked Data Fragments (LDFs) refer to interfaces that allow for publishing and querying Knowledge Graphs on the Web. These interfaces primarily differ in their expressivity and allow for exploring different trade-offs when balancing the workload between clients and servers in decentralized SPARQL query processing. To devise efficient query plans, clients typically rely on heuristics that leverage the metadata provided by the LDF interface, since obtaining fine-grained statistics from remote sources is a challenging task. However, these heuristics are prone to potential estimation errors based on the metadata which can lead to inefficient query executions with a high number of requests, large amounts of data transferred, and, consequently, excessive execution times. In this work, we investigate robust query processing techniques for Linked Data Fragment clients to address these challenges. We first focus on robust plan selection by proposing CROP, a query plan optimizer that explores the cost and robustness of alternative query plans. Then, we address robust query execution by proposing a new class of adaptive operators: Polymorphic Join Operators. These operators adapt their join strategy in response to possible cardinality estimation errors. The results of our first experimental study show that CROP outperforms state-of-the-art clients by exploring alternative plans based on their cost and robustness.In our second experimental study, we investigate to what extent different planning approaches can benefit from polymorphic join operators and find that they enable more efficient query execution in the majority of cases.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Click to Expand/Collapse
Review #1
By Hala Skaf-Molli submitted on 25/Oct/2021
Suggestion:
Accept
Review Comment:

The paper "Robust Query Processing for Linked Data Fragments" is an extension of a previous work published at ISWC 2020. The new contributions are clearly defined. The current version of the paper is a revision of a previous submission to the Semantic Web Journal. In this revision, the authors have clearly answered my questions. I thank the authors for their efforts to address all my comments and suggestions. The changes made to the article make it clearer and easier to read. The theoretical and experimental scientific contributions are important. The article is ready to be published.

Review #2
By Stasinos Konstantopoulos submitted on 15/Nov/2021
Suggestion:
Accept
Review Comment:

The article presents two approaches for alleviating the efficiency degradation caused by join cardinality mis-estimations during query execution planning.

The first method adds the concept of robustness in the optimizer's cost function: plan A is considered superior to plan B when A is expected to suffer minor performance degradation in the face of mis-estimations, although B would be more efficient if all estimations prove to be accurate.

The second method presents "polymorphic operators", operators that can switch mid-execution between being bind-join or hash-join physical operators, if the observed selectivity indicates that such a switch would improve efficiency.

I found both methods interesting and the evaluation thorough and convincing. In general the submission is a demanding, but enjoyable, read describing interesting work.

All my comments have been addressed or rebuked satisfactorily. I feel, and I hope that the authors agree, that the article is now a smoother read.

I apologize to the authors for delaying the review.

Review #3
By Oscar Corcho submitted on 17/Nov/2021
Suggestion:
Accept
Review Comment:

Once I have checked the changes done in the paper as a result of my comments, I consider the paper ready for publication. I appreciate the inclusion of the definition of robust query processing upfront, and while I think that the relationship to federated query processing is not so relevant for the state of the art analysis, I consider it interesting to make the paper more complete, so no problem in leaving it inside.