Sound, Complete and Minimal UCQ-Rewriting for Existential Rules

Tracking #: 569-1775

Authors: 
Melanie Konig
Michel Leclere
Marie-Laure Mugnier
Michaël Thomazo

Responsible editor: 
Guest Editors RR2013 Special Issue

Submission type: 
Full Paper
Abstract: 
We address the issue of Ontology-Based Data Access, with ontologies represented in the framework of existential rules, also known as Datalog+/-. A wellknown approach involves rewriting the query using ontological knowledge. We focus here on the basic rewriting technique which consists of rewriting the initial query into a union of conjunctive queries. First, we study a generic breadth-first rewriting algorithm, which takes as input any rewriting operator, and define properties of rewriting operators that ensure the correctness of the algorithm. Then, we focus on piece-unifiers, which provide a rewriting operator with the desired properties. Finally, we propose an implementation of this framework and report some experiments.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Click to Expand/Collapse
Review #1
By Giorgio Orsi submitted on 15/Jan/2014
Suggestion:
Minor Revision
Review Comment:

The paper addresses the problem of conjunctive query answering under Datalog rules with existential quantifiers, also known as tuple-generating dependencies (TGDs). The technique adopted for query answering is by backward-chaining rewriting of the conjunctive queries given as input w.r.t. TGDs of the theory. The class of TGDs considered has the property of being a finite-unification sets (FUS), while the target language for the rewriting is that of unions of conjunctive queries (UCQs).

The problem of conjunctive query answering under constraints is a well-known and long-standing problem in knowledge representation and database theory. It is suited for the semantic-web journal.

In my opinion, the main contributions of this work are:

- A sound and complete rewriting algorithm for conjunctive query answering under FUS of TGDs.
- The formal definition of the concept of rewriting operator and associated properties such as prunability.
- The formal definition of some useful properties of a rewriting sets such as covering and minimality.
- The proof that the rewriting operator (based on piece unifiers) used in the proposed algorithm guarantees soundness, completeness, and minimality.

General Comments:

The paper is, in my opinion technically sound and the results are relevant to the semantic-web and database communities. I believe that the paper can be accepted provided that the following comments are addressed.

In the discussion of the related work at page two, some related work is missing. Among the forward-chaining approaches, the work of [2] is not discussed. Also, among the Datalog rewriting techniques [3] is missing. I would substitute [VSS12] with [4].

Page 2, 3rd paragraph. “While previously cited work focuses on specific rule sublanguages, we consider general existential rules.” —> The work of [5,6,GOP11] also work for general existential rules. In particular, in [6,GOP11] the optimisation techniques are specific to a restricted class of TGDs while the rewriting algorithm is not. The foundational work of [5], also has a notion of minimality and works for a generic class of TGDs and equality-generating dependencies (EGDs).

It would be really helpful to understand the relationship between fus and fo-rewritability. It would be an interesting result to know whether they are equivalent notions or not.

It would be helpful to relate the notion of minimality used in this paper with the one of [5].

it would be interesting to understand the relation between piece-unifiers and the notion of factorisation used, e.g., in [GOP11].

Page 3, contributions. “This algorithm is generic in the sense that it is not restricted to a particular kind of existential rules nor to a specific rewriting operators.” —> I don’t understand this. Assuming that the algorithm is sound, it has to be restricted to fus if you want completeness and termination right? Otherwise it means that that you can decide fus using your algorithm. Is there something I am missing here?

Page 4, Definition 3, 4. As far as I can see this is just containment and minimality of two UCQs. I would avoid to redefine the notions.

Page 18. After example 7. I would say again that u_1 and u_2 are substitutions.
Section 7 is called implementation but is only giving a list of properties, definition and an algorithm. I would expect an implementation description here. I suggest to move Section 7.1 into a section called something like “rewriting algorithm” and rename the section “Implementation” into “Experiments” unless the authors want to briefly describe the implementation of the system.

Section 7.2. This section is a bit disappointing. No comparison evaluation is provided with existing systems, although there’s plenty of implementations out there that can be used and that work for the ontologies of table A,S,U.V. I understand that a comparison with DL systems would provide little information due to the restrictions on the languages, but what about [GOP11]? Also, there’s no description about the characteristics of the system used for testing, e.g., memory, CPU, setting, etc. It would also be helpful to analyse the behaviour of the rewriting algorithm w.r.t. the characteristics of the ontologies and of the queries given as input. E.g., how does it scale with increasing numbers of rules in the theory and atoms in the queries? Does the type of axioms in these ontology affect how the search-space of rewritings is explored? I believe this section should be sensibly improved.

Minor comments:

Page 1, line 5: “[..] where data are incompletely represented [..]” —> This reads somehow weirdly. I would refer to the problem of query answering in incomplete databases and cite something like [1].

Page 2, first paragraph. “With respect to lightweight DLs, these decidable rule fragments are more powerful and flexible.” —> I would not be so drastic. There are pros and cons of DLs as there are for rule-based languages. E.g., You rarely need more than DL-like languages for conceptual modelling, i.e., UML and E/R diagrams.

Page 3, first paragraph: (join-)sticky —> sticky(-join).

Page 4. Last paragraph. “Any rule can be decomposed into an equivalent set of atomic-head rules by simply introducing a new predicate for each rule.” Indeed this is true but this normalisation also has a negative impact on the performance of the rewriting (as also noted in [PUHM09]), as it blows the number of rules and intermediate rewriting steps. This is why [PUHM09] and [3] try to avoid this.

Page 8. Second paragraph. “it is well known [..] by removing redundant atom” I would add a citation here.
Please make the bibliography uniform, e.g., the proceedings of KLMT12 and SM96

References:

[1] Andrea Cali, Domenico Lembo, and Riccardo Rosati. 2003. Decidability and Complexity of Query Answering Over Inconsistent and Incomplete Databases. In PODS. 260–271.

[2] Nicola Leone, Marco Manna, Giorgio Terracina, Pierfrancesco Veltri. Efficiently Computable Datalog∃ Programs. In KR 2012.

[3] Giorgio Orsi, Andreas Pieris: Optimizing Query Answering under Ontological Constraints. PVLDB 4(11): 1004-1015 (2011)

[4] Tassos Venetis, Giorgos Stoilos, Giorgos Stamou. Query Extensions and Incremental Query Rewriting for OWL 2 QL Ontologies. In JoDS, 2013.

[5] Alin Deutsch, Lucian Popa, and Val Tannen. 2006. Query reformulation with constraints. SIGMOD Rec. 35, 1 (March 2006), 65-73.

[6] Andrea Calì Georg Gottlob Andreas Pieris Query Rewriting under Non-Guarded Rules. AMW 2010.

Review #2
By Stijn Heymans submitted on 23/Jan/2014
Suggestion:
Minor Revision
Review Comment:

The paper's context is queries over existential rules. In particular, they divide this paper in 2 parts:
- a definition of a general algorithm depending on a so-called rewriting operator (rewriting of queries) that rewrites queries over existential rules and facts into queries over just facts. Moreover they define properties (soundness, completeness, and prunability) that have to be satisfied by such an algorithm to be correct.
- They give an example of a specific rewriting operator that satisfies the identified properties.

The paper is well-written and the topic is of interest. I checked most of the results and they seem correct. As such I'll recommend the paper for an accept with some revisions (as described below). My main quibble is probably with the Evaluation (see my last point before the Typos), and I'd like to see that clarified.

- The introduction would benefit from a small running example. The first example occurs only on page 4 after a significant amount of detail was given already (on the positive side, example 1 is very helpful)

- Paragraph after Def 2 (page 6): "then Q2 is useless". Can you clarify "useless" there, it's not helpful as part of an argument there.

- Paragraph before Corollary 1: "core (notion introduced for graphs)" -> please add a reference for that.

- 2nd Paragraph after Def 7 (starting with "Algorithm 1 performs"). Could you think of rewording (or introducing differently) the part on Q_C cup {q} and Q_c cup {q'} being covers..I think the confusing part here is that this comes quite abrupt and before the algorithm (it might be as simple as moving the text after the algorithm, then again that might make the algorithm harder to understand).

- Same place. "breadth-first" is made a prominent part of the naming of the algorithm "A generic breadth-first rewriting algorithm". Is there a depth-first variety that you have in mind? How would that be different. Alternatively, I think you can easily drop the "breadth-first" qualifier everywhere.

- Same place. Would it theoretically make sense to just have Q_i = cover(rew(Q_{i-1})) and to say that the algorithm computes a fixed point that will constitute the cover? Or in other words, do the queries that are already rewritten new thing? If they don't you can just take them with you in each Q_i? In practice what you wrote is an optimization then. And maybe it makes sense of having part of this discussion from a fixed point perspective (for example, that such a fixed point exists would depend on cover(few) being monotonic; that it terminate would depend on other conditions of the operator).

- Def 8 (prunable). The paragraph after just rehashes the definition in words; it might be valuable to give some more intuition on the prunable concept.

- Theorem 3. Regarding the fus rules. A crucial result you have is Theorem 3, so how widely applicable is this: how common are fus rules? You might want to point to the evaluation if you mention it there.

- Before Property 5 (the first half of page 13): I found the partition/addmissible partition rather confusing. You write that it simplifies "notions and proofs". However, does it mainly simplify proofs, because I do not have the impression that it simplifies notions. For example, Def 12 which takes about normal substitutions again is the first point where I feel it gets clear again (in general the paper does a good job explaining al these concepts though!)

- Nice result in Theorem 5!

- From Def 14 till the start of Section 7: this is all rather dense (e.g. Def 14, 15, 16 in a row) or the long proof for Property 10. I'd consider shortening and clarifying this and referring some stuff to the Appendix.

- Section 7.2: Experiments. I have the feeling this does not motivate the actual use of query rewriting for existential rules enough, especially since you use your existential rules from a translation from the DL DL-Lite_R. How does your rewriting technique for queries w.r.t existential rules compare to existing evaluations of queries w.r.t Dl-Lite_R (if there are any, I hope there are). If you are worse, you have to argue why you did not pick existential rules that cannot be expressed by DL-Lite_R such that you provide query answering for a more expressive KR framework than just DL-Lite_R.

TYPOS

- Def 2, page 6: "if for all fact" -> facts
- Paragraph before Def 10 "must processed together": must be processed
- Theorem 4: off there is Q' a piece-rewriting: something is off, maybe "there is *a* Q', a ..."
- Overfull box that disturbs in Example 6
- Paragraph before example 8 (page 18): "as the next examples show it" -> leave off "it"

Review #3
Anonymous submitted on 25/Jan/2014
Suggestion:
Major Revision
Review Comment:

The paper studies ontology-based data access (OBDA) in a general framework of existential rules (datalog+/-). Both theoretical and practical aspects are addressed. In particular, a generic algorithm based on the notion of a rewriting operator is presented and a particular rewriting operator (most general piece-unifier) is proven to satisfy condition required for sound and complete rewritings. One of the interesting results (which to my knowledge has never been stated before) is that, if a finite UCQ rewriting exists, then there is a unique minimal UCQ rewriting. Turning to practice, the authors then study specialized single-piece unifiers and their aggregations and present some preliminary evaluation of the approach.

This problem of OBDA is certainly relevant to the journal (and its special issue). Existential rules generalize light-weight description logic (which have become quite popular in recent years) and the obtained results are novel and interesting. The paper is very well-structured and mostly well-written (detailed comments can be found below).

There is, however, a concern that the rewriting operator based on aggregated unifiers may not be prunable. Example 9 has a single rule, $b(x) \to p(x,y)$. Consider now the query from Example 9 and two rules, $b_1(x) \to p(x,y)$ and $b_2(x) to p(x,y)$. Similarly to the argument of Example 9, applying two single-piece unifiers is not enough (because their results will be pruned out), while the simultaneous application of the unifiers for two rules will result in a CQ that cannot be pruned (and thus, should be included in the final rewriting).

general remarks
===============

The many different counters (examples, properties, theorems, corollaries, lemmas etc.) make it quite difficult to locate referenced fragments of text --- would the authors consider using fewer counters?

It is somewhat confusing (in particular, in Introduction) to use the term "rewritings" for individual elements of a UCQ (usually it is reserved for the whole UCQ). This usage of the term "rewriting" quite logically then leads to "a set of rewritings" (on p 2), which is then replaced by a "rewriting set" (for the whole UCQ) starting from Section 3. Would the authors consider consistently using "rewriting set" for the whole UCQ and "elements of the rewriting set" for individual CQs?

Notation for substitutions in Example 1 is not explained. And would it not be more intuitive to use a function-like notation, e.g., \{ u \mapsto x, v \mapsto y }, instead?

Using A \geq B to say that there is a homomorphism from A to B is a bit unexpected -- one would think that if A can be homomorphically mapped to B then A is smaller. I understand that it is related to \leq used similarly for covers, partitions, etc. (cf. footnote 3 on p 13), but I find it rather counter-intuitive.

The notion for sets of queries introduced after Definition 7 could be used to simplify Definitions 6 and 7 (one can always start from a set of queries).

The transition from a set of rules in Sections 1-4 to a treatment of a single rule in Section 5 is unclear -- in particular, the piece-based rewriting operator is never defined formally.

It might make more sense to define the aggregation of a set of rules (Definition 16) first (which is simple enough to understand) and then compatible piece-unifiers and aggregated unifiers.

The purpose of the remark in the last paragraph on p 20 is unclear.

The three items on the bottom of p 24 appear to compute the set of all subsets of U_1 -- why not write it then?

Reference to [GOP11] as a source of information on the ontologies used in the experiments is a bit strange: they all appeared in

H. Pérez-Urbina, B. Motik and I. Horrocks: A Comparison of Query Rewriting Techniques for DL-lite. Description Logics 2009

detailed comments
=================

abstract: "takes as input any rewriting operator" -> "takes any rewriting operator as a parameter"

p 1, par 1: are represented completely

p 1, par 2, facts --- or data --- and an ontology (a set of existential rules)

p 2, par 1: synthesis -> summary

p 2, par 2: schmematized -> illustrated

p 3, par 1: if the answer to Q_2 in F is positive *then* so is the answer to Q_1

p 3, par 1: sticky(-join) (and a missing space before)

p 3, par 2: We call rewriting operator ... -> A rewriting operator is a function

p 3, last line: unicity -> uniqueness

p 4, par 1: halting -> termination

p 4, par 2: We call piece a minimal subset -> A piece is a minimal subset

p 5, par 1: *An* interesting point

p 5, par 5: s.t. -> such that (and elsewhere, e.g., Definitions 2, 9)

p 6, Definition 2: no brackets are needed around R,F\models Q (twice)

p 6, Definition 2: what is the order of the components in a KB: is it F,R or R,F?

p 6, above Definition 3: "covers" -> "cover"

p 6, Example 2: move "See also Figure 2" to the end of the first sentence

p 7, Example 3, last sentence: than -> that

p 7, above Theorem 1: no matter of -> irrespective of

p 8, above Corollary 1: By the remark above and Theorem 1, we obtain

p 8, Section 4: this operator satisfy

p 8, footnote 2: The notion of a finite unification set

p 8, footnote 2: , as recalled in Section 5, -> (see Section 5)

p 8, footnote 2: the two definitions are equivalent

p 9, Definition 7: We denote the set of k-rewritings of Q by rew_k(W,R)

p 9, above Algorithm 1: subsequently specified -> specified below (or subsequently)

p 10, Proof of Lemma 2: wrong font for rew_i

p 12, above Definition 10: we call cutpoints the variables ... -> cut points are the variables ...

p 13, par 1: We can thus assign with u a partition -> Thus, u can be associated with a partition

p 13, par 1: Conversely, given a partition ..., we can consider a substitution u obtained by

p 13, par 1: let { e_1, \dots, e_k } be a -> if { e_1, \dots, e_k } is a class ... and e_i is a selected element then, for all ...

p 13, par 2: The sentence ending with "until stability" is unreadable

p 13, above Property 5: why is it "immediate"?

p 13, Definition 11: the "i.e" bit item 1 is a repetition of the definition (should be removed)

p 13, Definition 11, item 2: "note sep(Q'), the variables" -> denote by sep(Q') the variables

p 13, Definition 11, item 2: it would be better to place definitions of separating variables above (otherwise, too much to digest in a single definition)

p 13, Definition 11, item 3: no need to duplicate the definition of a substitution associated with a partition

p 13, Definition 11, after item 3: We call cupoitns, and note cutp(\mu), the variables -> Cutpoints of \mu, denoted by cutp(\mu), are the variables

p 15, Theorem 4: basically is too informal

p 15, proof of Property 6: not clear what "too" is meant to emphasize in the last sentence

p 16, item (ii): wrong font for terms()

p 18, par 1: less rewritings -> smaller rewritings (or rewritings with fewer components)

p 18, above Example 8: the next *two* examples

p 20, Definitions 14 and 15 share a large proportion -- should they not be merged?

p 22, above Property 11: remove "it holds that"; corollary of the *following* property

p 24, above Algorithm 2: if these atoms are in A *then* Q'

p 25, par 1: were led -> were performed

p 25, last par: can save much with respect to both -> can considerably reduce both

p 26, par 1: are currently lacking -> are currently not available

p 27, [LTW09]: el -> EL

p 27, [RMC12]: DL-lite -> DL-Lite