Review Comment:
The submission proposes a similarity metric that evaluates the accuracy of the predictions that a query optimizer makes about the cardinalities of triples and joins. As these predictions are the primary input for estimating cost, erroneous cardinality predictions can lead the optimizer to sub-optimal query plans.
The proposed similarity metric is compared against q-error, a metric for evaluating cardinality predictions from the relational database literature. The basis of comparison is how strongly each evaluation metric correlates with actual execution times on a benchmark suite. That is to say, the proposed similarity metric is found superior to q-error in the sense that the new metric is a better predictor of overall query execution performance. Such a metric can be used (in combination with other metrics evaluating other steps of the query processing pipeline) to pinpoint the step that causes poor performance.
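For concreteness, here is a minimal sketch (my own illustration, not taken from the submission) of the standard q-error definition and of the kind of correlation-with-runtime comparison described above; all names and data are hypothetical:

```python
# Illustrative sketch only: the standard q-error of a cardinality estimate,
# and a toy correlation of per-query q-errors with observed execution times.
# The data below is hypothetical, not from the submission's benchmark.

def q_error(estimated: float, actual: float) -> float:
    """Symmetric multiplicative error: max(est/act, act/est), always >= 1."""
    if estimated <= 0 or actual <= 0:
        raise ValueError("cardinalities must be positive")
    return max(estimated / actual, actual / estimated)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Hypothetical per-query data: (estimated, actual) cardinalities and runtimes (s).
estimates = [(100, 90), (10, 1000), (500, 480), (3, 300)]
runtimes = [1.2, 45.0, 1.5, 30.0]

errors = [q_error(e, a) for e, a in estimates]
print(pearson(errors, runtimes))  # how well q-error tracks execution time
```

The same correlation could then be computed for the proposed similarity metric, and the two coefficients compared.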
This work can potentially be a significant contribution to the federated query processing literature, and the submitted manuscript is a substantial contribution towards materializing this potential.
Regarding the authors' reaction to my comment about taking index size into account: I do not agree with their rationale for not including this information as part of the metric (explanation follows), but it is their metric and their article, so as far as I am concerned the substance of the comment has been addressed, as they now report index size and index construction time.
For the sake of academic discussion only, and not to be understood as a review comment, let me explain my argument by reductio ad absurdum: there is nothing preventing a federation engine from copying the complete datasets and calling this copy the "index", internally trying out all sub-queries on all datasets, and then planning the "real" execution based on perfect cardinality "estimations". None of the engines tested does this, but in my opinion the metric is weakened by the fact that it can be gamed in such a straightforward way. In fact, although the default behaviour is the one correctly described in Section 5, the Semagrow index can be configured to also include multi-dimensional join cardinalities. The index extractor is given as a parameter how much space it is allowed to consume, and it decides how to use this space budget prudently, keeping only the most surprising selectivities. Naturally it is not advisable to use too large an index, as searching through the index would end up taking longer than executing even the most suboptimal plan, but doing so would game the cardinality estimation metric. So a metric that combines cardinality accuracy with index size would be a useful tool for tuning systems with a variable-size index.
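To illustrate what I mean by a budgeted extractor keeping the most surprising selectivities, here is a sketch of my own (not Semagrow's actual implementation; all names and data are hypothetical): an entry is "surprising" to the degree that its observed cardinality deviates from an independence-assumption estimate, and only the top entries within the space budget are retained.

```python
# Illustrative sketch only (not Semagrow's implementation): a budget-
# constrained index extractor that keeps the join-cardinality entries
# whose observed selectivity deviates most from an independence estimate.
import heapq
import math

def surprise(observed: float, expected: float) -> float:
    """Magnitude of multiplicative deviation from the expected cardinality."""
    return abs(math.log(observed / expected))

def select_entries(entries, budget: int):
    """entries: (key, observed, expected) triples; keep `budget` most surprising."""
    return heapq.nlargest(budget, entries, key=lambda e: surprise(e[1], e[2]))

# Hypothetical join statistics: (join pattern, observed, independence estimate).
stats = [
    ("?s p1/p2", 10.0, 1000.0),   # far fewer results than independence predicts
    ("?s p3/p4", 480.0, 500.0),   # close to the estimate: little information
    ("?s p5/p6", 5000.0, 50.0),   # far more results than predicted
]
kept = select_entries(stats, budget=2)
print([k for k, _, _ in kept])  # the two most surprising entries are retained
```

A metric combining cardinality accuracy with the size of such an index would then reward systems that spend their budget well.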
Regarding the authors' comment that index size depends on the format used to store the index, I am sure that "size" can be defined in a way that abstracts to information granules without being sensitive to the specific format.
My other comment has to do with the level of detail in the analysis of the experiments' outcomes. Although the second version of the article has become considerably more technically detailed and interesting, I think that some improvement is still in order: the empirical results do not go back to the excellent overview the authors now give in Section 5 to explain which characteristics of each of the five cardinality estimation strategies do or do not work well in the different dataset/query situations explored by the benchmark. In other words, the new metric should be compared against q-error not only statistically, but also by using the hints given by the statistics to go back and demonstrate what actual phenomena q-error fails to capture that the new metric does. This would be a blueprint for how to use the metric to discover and fix problems in federation engines, demonstrating its potential and increasing our confidence that it captures something that q-error does not.
Given the above, I find the authors' response to the relevant comment in the first review satisfactory, as they added a very interesting analysis of how better cardinality estimation leads to better query plans. What is missing, editorially rather than substantially, is a concrete demonstration of a use of the new metric for which q-error alone would be insufficient. This would evaluate the first contribution claimed by the authors: that the new metric can give a fine-grained evaluation of query engines.
Another minor editorial comment, on Section 4.1, Definition 1:
(r_1, ..., r_n) and (e_1, ..., e_n) are in R^n, not R.