Review Comment:
SUMMARY:
The paper addresses the problem of flexibly querying heterogeneous geospatial (GIS) data sources. To this end, the authors describe the system OnGIS, a semantic query broker, which is based on a framework for query answering (QA) over multiple GIS data sources. It relies on a set of prototypical queries, which are matched to the sources at query time. As a query language, they use a restricted version of GeoSPARQL without optionals (which would introduce non-monotonicity), but adopt DL-Lite as the ontology language instead of RDF(S) (which is the standard). For query decomposition and QA, they extend existing query containment techniques, i.e. the work of Horrocks [1], to (spatial) topological relations, which include RCC8, Egenhofer (DE-9IM), and simple relations. They provide a new method of calculating spatial containment based on "topological relation restriction", which establishes a hierarchy of spatial containment (and equality). Their "simple" algorithm for query evaluation uses query containments to build a lattice of related subqueries and splits the query at the first level of the lattice. As an optimization, they propose a pruning heuristic to eliminate redundant query subsets. Finally, they provide an example scenario using LinkedGeoData, GeoNames, and a custom KB to illustrate how their approach would be applied in practice.
The main contributions of the paper are:
- The authors give a good overview of GeoSPARQL and the related data model.
- They develop a geo-ontology used with GeoSPARQL, which puts the different topological relations (RCC8, Egenhofer, etc.) into a common framework.
- They give detailed insight into standard query containment techniques and suggest that they can be applied to DL-Lite.
- The technique for query containment is extended with spatial containment using topological relation restrictions. They give a good description of their technique, which is novel and interesting.
- They use their technique for an algorithm to calculate a lattice of sub-query containments based on an input query.
- The lattice is then used to evaluate the queries over the different data sources. Further, they refine the algorithm using pruning to eliminate redundant queries.
- Finally, they show by example how their approach would work in a "real-world" scenario.
ORIGINALITY:
The work is original in that the authors aim to lift existing algorithms for query containment to the spatial setting. Using a lattice of containments for query evaluation is not entirely new, but it is an interesting approach.
SIGNIFICANCE:
This work is surely significant, provided some details are clarified, since there are still not many approaches and frameworks that aim to combine DL-Lite, GeoSPARQL, and spatial relations with heterogeneous data sources.
QUALITY OF WRITING:
The paper is well structured and straightforward to read. However, the authors write mainly in the 3rd person, which sometimes makes it hard to read. Some minor issues are listed below.
TECHNICAL SOUNDNESS:
Correctness of the provided algorithms is only sketched, and we doubt that it has been thought through. In particular, the authors do not address the specific issues that arise when dealing with DL-Lite. They do not mention query rewriting using PerfectRef or similar algorithms, nor do they address how to deal with existentials (unknown individuals) on the right-hand side of inclusion assertions. Things get even more complicated if SPARQL and spatial relations are used for QA over DL-Lite.
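To make the concern about existentials concrete, consider the following textbook-style example. It is our own illustration, not taken from the paper, and the names A, R, and a are placeholders. The ABox contains no R-fact at all, yet a is a certain answer, because the inclusion assertion guarantees an unnamed R-successor. A procedure that only inspects the explicitly stored (grounded) facts would miss this answer, whereas a PerfectRef-style rewriting yields the union q' that retrieves it.

```latex
\begin{align*}
\mathcal{T} &= \{\, A \sqsubseteq \exists R \,\}, \qquad
\mathcal{A} = \{\, A(a) \,\}, \qquad
q(x) \leftarrow R(x, y) \\
\mathit{cert}(q, \langle \mathcal{T}, \mathcal{A} \rangle) &= \{\, a \,\} \\
q'(x) &\leftarrow R(x, y) \;\lor\; A(x)
\end{align*}
```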
MAJOR ISSUES:
The topic of the paper and the main ideas are appealing. We believe that this would make a good conference paper, but it is not yet mature enough to be published in SWJ. For acceptance, the following major issues need to be addressed (sorted by severity):
(1) Details in technical soundness.
(2) Since their approach is more on the practical side, we would expect at least a thorough experimental evaluation of the techniques and algorithms. The authors should also provide more details on the implementation itself. We believe scalability is an important challenge, since they deal with huge datasets, e.g., LinkedGeoData. We doubt that the proposed algorithms scale, since the entire KB has to be grounded to determine the containment of two sub-queries.
(3) Related work is not thoroughly covered. Besides the above-mentioned papers, recent work on QA with DL-Lite and RCC8 (see Möller [2]) and DL-Lite with spatial relations (see Eiter [3]) is not discussed. Besides Parlament, there are other GeoSPARQL engines that need to be discussed, e.g., Strabon [4].
(4) We believe that the title is somewhat misleading, since the authors do not focus on providing a broker framework for geospatial data, but mainly write about query containment and evaluation techniques. If the focus is on the broker framework, we would appreciate more details on the broker architecture, the matching of the query templates to the actual query, and the techniques to deal with distributed data sources (e.g., traditional GIS web services).
(5) Query containment is an important technique for query optimization, initially developed for relational databases (see Chandra [5]) and extended to DL-Lite (see Bienvenu [6]). However, there are other techniques available, e.g., tree decomposition (see Maier [7]), which are often used to split the input query into sub-queries and build an evaluation strategy (using heuristics). It might be interesting to compare the different techniques side by side.
(6) In the introduction, a section on the general setting and the motivation of this work is missing.
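Regarding issue (5), the classical containment test for conjunctive queries from [5] can be sketched as follows, as a baseline for any side-by-side comparison. This is a minimal illustration under our own assumptions, not the authors' algorithm: it ignores the TBox and the spatial relation hierarchy entirely, and the function name contained_in as well as the predicates Road, crosses, and River are hypothetical. A query q1 is contained in q2 exactly when there is a homomorphism from q2 into q1 that maps head to head.

```python
from itertools import product


def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def contained_in(q1, q2):
    """Return True if q1 is contained in q2 (plain conjunctive queries).

    A query is a pair (head, body): head is a tuple of terms, body is a
    list of atoms (predicate, args). The terms of q1 are treated as frozen
    constants; we search for a mapping of q2's variables onto them.
    """
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    # All terms of q1 form the candidate images for q2's variables.
    terms1 = sorted({t for _, args in body1 for t in args} | set(head1))
    vars2 = sorted(
        {t for _, args in body2 for t in args if is_var(t)}
        | {t for t in head2 if is_var(t)}
    )
    atoms1 = {(pred, tuple(args)) for pred, args in body1}
    # Brute-force search over all assignments (exponential, illustration only).
    for image in product(terms1, repeat=len(vars2)):
        h = dict(zip(vars2, image))

        def apply(term):
            return h.get(term, term)  # constants of q2 map to themselves

        if tuple(apply(t) for t in head2) != tuple(head1):
            continue
        if all((p, tuple(apply(t) for t in args)) in atoms1 for p, args in body2):
            return True
    return False


# Hypothetical example queries:
# q_specific(x) :- Road(x), crosses(x, y), River(y)
q_specific = (("?x",), [("Road", ("?x",)), ("crosses", ("?x", "?y")), ("River", ("?y",))])
# q_general(x) :- crosses(x, z)
q_general = (("?x",), [("crosses", ("?x", "?z"))])
```

Here contained_in(q_specific, q_general) holds, while the converse does not, since q_general has no Road atom to map onto. The search is exponential in the number of variables of q2, which is one reason a comparison with decomposition-based strategies would be informative.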
MINOR ISSUES:
- Some sentences sound "unconventional", for example:
- "we use technique requiring ..." -> "Our technique requires"
- "The most important is the data itself, given in a form of serialization" -> "Serialized data is important, ..."
- "One part of supporting semantics of GeoSPARQL" -> "The semantics of GeoSPARQL is lifted to ..."
- "if it is semantically equivalent to a query", please clarify;
- They introduce the definition of DL-Lite, but miss the definition for the value-domains (see DL-LiteA);
- Please explain why filters do not follow the open-world assumption (OWA).
- How does TR(OP) relate to the encoding matrix of the DE-9IM?
- The title "Lattice" seems a bit short;
- Fig. 5 needs more explanation;
- Listings 2-4 could go into the Appendix.
REFERENCES:
[1] Horrocks,I., Sattler,U., Tessaris,S., Tobies,S.: How to decide query containment under constraints using a description logic. Logic for Programming and Automated Reasoning, pp. 326–343 (2000)
[2] Özçep,Ö.L., Möller,R.: Scalable geo-thematic query answering. In: Proc. of ISWC 2012. LNCS, pp. 658–673 (2012)
[3] Eiter,T., Krennwallner,T., Schneider,P.: Lightweight spatial conjunctive query answering using keywords. In: Proc. of ESWC 2013. pp. 243–258 (2013)
[4] Kyzirakos,K., Karpathiotakis,M., Koubarakis,M.: Strabon: A semantic geospatial DBMS. In: Proc. of ISWC 2012. pp. 295–311 (2012)
[5] Chandra,A.K., Merlin,P.M.: Optimal implementation of conjunctive queries in relational data bases. In: Proc. of STOC 1977 (1977)
[6] Bienvenu,M., Lutz,C., Wolter,F.: Query Containment in Description Logics Reconsidered. In: Proc. of KR 2012 (2012)
[7] Maier,D.: The Theory of Relational Databases. Computer Science Press (1983)