Review Comment:
The work proposes an approach to the problem of continuous multi-query optimization for subgraph matching containing 3 core aspects. First, the technique merges the set of input queries into a single query referred to as the "annotated query graph" obtained by merging each of the input queries into one. Second, An auxiliary data structure that stores intermediate results and finally, a matching strategy which outputs corresponding new/deleted matches in one pass.
The manuscript communicate the ideas of the work clearly with a reasonable set of experiments which can be improved on.
I have comments on the work's assumptions, prior work, or experiments:
1) Assumptions: The work assumes that all of the queries are available at the start making it possible to merge all queries in what is referred to as an "annotated query graph". This assumption is quite strong and the common case would be that queries are added one at a time over time. Specifically, within an organization where different teams are interested in different newly matched or deleted subgraphs for their needs, their demand wouldn't come all at once and therefore the right problem statement would assume an order in which the queries should be added and optimized.
2) Prior work: I believe the paper omits a lot of the prior research. Subgraph matching is equivalent to multi-way self-joins between using an Edge table (from,to,label). While the relational techniques wouldn't scale in the context of subgraph queries, it is important to acknowledge that given a mapping between the relational and graph models, why would the relational techniques not work. one can start from the following reference [1] and do a forward sweep for more related work.
In the context of continuous subgraph queries, it is worth pointing that multiple continuous queries have been studied unlike the claim in the abstract: "Unfortunately, current approaches are mainly designed for a single query: optimizing and answering each query separately."
For instance, reference [2] evaluates multiple subgraph queries under single-edge insertion workloads. The thesis by Kankanamge also does the same [3]. More recent work was published *after* your submission [4]. I do not expect you to compare against reference [4] due to the recency and conflict between submission times but I believe it is worth adding to the related work as another technique to consider in this context. A major and more thorough review of existing literature is a must.
It is important to also communicate in the related work section how your auxiliary data structure differs from ones used both in the one-time and continuous subgraph matching cases.
3) Experimental setup:
3.1) Is it possible to showcase an experiment regarding the matching order? Specifically one to show empirically that the intuition regarding matching vertices shared by more queries first is truly beneficial?
3.2) The experiments provide run times on the aggregates, it might be worth adding single query experiments to establish very clear baselines for the 0% sharing case instead of a large suit of queries none of which is shared.
3.3) It is important to showcase limitations with regard to either dataset size or query size for the proposed technique. Specifically, when does the auxiliary data structure used lead to a degradation in performance?
Handling these comments would be important to make the work impactful and also acknowledge and reflect a better understanding of existing literature.
[1] Efficient and extensible algorithms for multi query optimization. SIGMOD 2000.
[2] Efficient multiview maintenance under insertion in huge social networks. TWEB 2014.
[3] Multiple Continuous Subgraph Query Optimization Using Delta Subgraph Queries. Thesis 2018.
[4] Optimizing One-time and Continuous Subgraph Queries using Worst-case Optimal Joins. TODS 2021.
|