Query Answering over Contextualized RDF/OWL Knowledge with Forall-Existential Bridge Rules: Decidable Finite Extension Classes

Tracking #: 710-1920

Mathew Joseph
Gabriel Kuper
Till Mossakowski
Luciano Serafini

Responsible editor: 
Bernardo Cuenca Grau

Submission type: 
Full Paper
The proliferation of contextualized knowledge in the Semantic Web (SW) has led to the popularity of knowledge formats such as \emph{quads} in the SW community. A quad is an extension of RDF triple with the contextual information of the triple. We in this paper, study the problem of query answering over quads augmented with forall-existential bridge rules that enable interoperability of reasoning between the triples in various contexts. We call a set of quads together with such expressive bridge rules, a quad-system. Query answering over quad-systems is undecidable, in general. We derive decidable classes of quad-systems, for which query answering can be done using forward chaining. Sound, complete and terminating procedures, which are adaptations of the well known chase algorithm, are provided for these classes for deciding query entailment. Safe, msafe, and csafe class of quad-systems restrict the structure of blank nodes generated during the chase computation process to be directed acyclic graphs (DAGs) of bounded depth. RR and restricted RR classes do not allow the generation of blank nodes during the chase computation process. Both data and combined complexity of query entailment has been established for the classes derived. We further show that quad-systems are equivalent to forall-existential rules whose predicates are restricted to ternary arity, modulo polynomial time translations. We subsequently show that the technique of safety, strictly subsumes some of the well known, expressive techniques such as joint acyclicity and model faithful acyclicity used for decidability guarantees in the realm of forall-existential rules.
Full PDF Version: 

Minor revision

Solicited Reviews:
Click to Expand/Collapse
Review #1
By Michaël Thomazo submitted on 17/Sep/2014
Minor Revision
Review Comment:

The paper provides motivation for studying conjunctive query answering in a contextualized setting, and studies cases where this can be done by materialization. More specifically, after redefining the chase in this setting, the contributions are:

- the definition of safe, msafe and csafe quad systems;
- the study of the complexity of reasoning for such quad systems;
- how to recognize such quad systems;
- restriction of the problem to the case where bridge rules are range-restricted
- a comparison with similar acyclicity notions in the framework of existential rules (and a proof tha safeness is more general in a sense than MFA)

Compared to the previous version of paper, a lot of shortcomings have been addressed and the quality has largely improved. In particular, the complexity study of the introduced classes has been corrected, and a thorough comparison with similar criteria known in the literature has been added. There are still quite some minor improvements to be done, and two technical points that need correction/better explanation:

- p.17, definition of augmented rules: adding c_c:(y_j, descendantOf,y_j) is very surprising. It is stated in the right column that it is done in order to "maintain also the reflexivity of `descendantOf' relation". However:
- it is stated p.8 that this relation is not reflexive
- it seems to me that if the relation is reflexive, then all safeness test trivially fails as soon as there is a blank node generated (since one can take b = b')
Could you elaborate on this?

- p.20, Th.5.3: the reduction provided in Dantsin et. al is not adaptable here, since it uses predicates of arbitrary arity. It actually seems to me that the hardness result does not hold.

As for the significance for the paper, it could be improved by clearly answering the following questions: why are the introduced safeness notions important with respect to the zoology of previously defined acyclicity notions? Does safeness covers interesting cases that are not covered by MFA, in particular when dealing with contexts? Is csafeness easy to recognize?

The style of the paper could be greatly improved by making shorter sentences and using less commas. At some points (some of them mentioned below), it becomes really tricky to keep track of the sentence.

"Recently, as a solution the aforementioned problem, SW community, adopts the use of a quad, an extension of a triple, as the primary carrier of knowledge": this sentence is quite hard to parse. Maybe a more light use of commas may help?

"A quad c:(s,p,o), thus adds the fourth component": ungrammatical use of the comma

p.2, col1, l-9: "the latest Billion triple challenge datasets has been" --> "have been"

"While reasoning with knowledge...": this sentence is also hard to parse

p.3, col 1, item 1: "Studying conjunctive query answering." --> here a comma, instead of a point
p.3, col 1, item 4: \forall\exists rules: the terms "existential rules" or "tuple-generating dependencies" seem to be more widely accepted than \forall\exists

p.4, col 2 : I have not found any explanation of the acronym "LIR" before (or during) its introduction

p.7 : "Although, we now know how" : ungrammatical use of a comma

p.8, col.1, undecidability results: the original reference from Beeri-Vardi should be cited (A Proof Procedure for Data Dependencies, J.ACM, 84). Moreover, the provided proof is very similar to the proof detailed in the research report associated with: Markus Krötzsch, Sebastian Rudolph, Pascal Hitzler: Conjunctive Queries for a Tractable Fragment of OWL 1.1. ISWC/ASWC 2007: 310-323. I have not checked in detail the two pages and a half of proof.

p.8, col.2, l-14: lemma 3.3 --> Lemma 3.3 (references to specific figures, theorems, are capitalized)

p.8, Def 4.1: x[\mu^{ext(y)}]: this is actually the same thing as x[\mu], or am I mistaken? Moreover, please cut the sentence before the introduction of w. Moreover, announcing Example 4.3 just after the definition may be useful.

p.9, caption of Fig.1 : note -> not
p.10, Ex. 4.6: this example seems to be unnecessarily long at this point. The same result could be obtain by removing r_3,r_4 and r_5. Explicitly writing the origin vectors of b_1 and b_2 may also ease the understanding.

p.14, section 4.1: this proof is very similar to the one of 2-exptime hardness for weakly-acyclic rules (Towards more expressive ontology languages: The query answering problem, Cali, Gottlob, Pieris, Artificial Intelligence Volume 193, December 2012, Pages 87–128), Theorem 5.1

Recognition of the safety conditions: all the acyclicity notions so far have been provided with their complexity of recognition. Maybe adding it here as well would be beneficial

p.19, col.1, par.1, l-2: dChase iterations is --> are

Section 6: the comparison between acyclicity for existential rules and safeness for quad system is somewhat strange. Indeed, acyclicity criteria for existential rules are meant to be "data-independent". In other terms, it ensures that whatever the quads are, a set of bridge rules ensure termination of the chase. This is not the case (nor the intention) for the safeness condition that are proposed here. Maybe expressing clearly this distinction may be useful to the reader.

p.21, col.1,l-1: p_n(x,y) --> p_n(x,z)

p.21, proof of Th 5.5: DP: the use of this abbreviation is not justified by an intense use of the expression "decision problem", and hinders the reading
p.21and elsewhere : s.t. --> such that
p.21: of the from (2): this formula appears on page 4. It would ease the reading to repeat the formula here. Moreover, it would also be clearer to first define (in a separate definition) the transformation, and then to give its properties.
p.25: Bernardo et al. : Bernardo is the first name (it is correctly cited elsewhere)
p.26: "A deterministic algorithm.." : actually, the cited algorithm considers only a subclass of bts, called gbts.
p27: "recently, a more dynamic approach...": the approach is mainly used (at least in the proposed experiments) to detect a *data-independent* termination, by starting from the so-called critical instance.
p.31, proofs of Section 4: the proofs [...] is

Review #2
By Adila Krisnadhi submitted on 08/Oct/2014
Minor Revision
Review Comment:

This is the second version of the paper.

In this paper, the authors focuses on the query answering and entailment problem, through a forward chaining approach, over a collection of contextualized RDF triples (i.e., quads) whereby interoperability between different contexts is enabled through a set of bridge rules which are first-order rules with existential heads. The authors (1) extended the RDF/OWL semantics to a context-based semantics; (2) show some (un)decidability as well as complexity results for a number of variants of quad systems; (3) show the correspondence between quad systems and Datalog+/-.

Compared to the first version, changes in the current version include the following.
(1) Local semantics includes not just OWL-Horst, but also other RDF/OWL semantics that admit reasoning via forward-chaining
(2) Three classes, instead of one, of quad systems are defined for which the query entailment is decidable.
(3) 2ExpTime-completeness of the combined complexity of CCQ entailment, instead of 3ExpTime.
(4) Formal comparison with Datalog+-, which does not exist in the first version of the paper.

I believe the current version is stronger than the first version. The results are now parallel to the similar results in Datalog+-. The additional section providing the both-ways, polynomial reducibility with Datalog+- confirms this. Also, the safety notion superseds MFA from Datalog+-.

Overall, the results seem sound, including the corrections requested by the reviewers for the first version. With some minor corrections below, the paper is worthy of acceptance.

Minor comments/grammatical errors:

p2, left col, 2nd paragraph: In the sentence that starts with "A quad c:(s,p,o), thus adds ... " --> remove comma before the first 'thus' and remove the second 'thus'.

p2, left col, last paragraph: Another benefit --> Other benefits

p2, right col, last paragraph: RDF/OWL knowledge --> RDF/OWL knowledge bases?

p3: the sentence "we extend the standard RDF/OWL semantics ... " may need to be clarified because the OWL standards, as far as I know, define two kinds of semantics: the Direct semantics and the RDF-based semantics. Inferencing for these semantics is in general value-generating (not range restricted), so I recommend some explanation or clarification (both in the intro and/or narration in Section 2) whether the semantics as defined in Definition 2.2 and 2.3 would still apply to both the aforementioned semantics or not.

p7, proof of lemma 3.2, pint (4): there no --> there is no

p8, definition 4.1.: noted originRuleId --> denoted originRuleId

p8, I suggest rewording the phrase 'to _:b' in the definition of childOf since it may make the direction of the relation according to the definition misunderstood as the opposite direction of the way it is used in the rest of the paper.

p9, figure 1, caption: note shown --> are not shown