Query Answering over Contextualized RDF/OWL Knowledge with Expressive Bridge Rules: Decidable Classes

Tracking #: 471-1663

Mathew Joseph

Responsible editor: 
Bernardo Cuenca Grau

Submission type: 
Full Paper
The recent outburst of context-dependent knowledge on the Semantic Web (SW) has led to the realization of the importance of the quads in the SW community. Quads, which extend a standard RDF triple, by adding a new parameter of the `context' of an RDF triple, thus informs a reasoner to distinguish between the knowledge in various contexts. Although this distinction separates the triples in an RDF graph into various contexts, and allows the reasoning to be decoupled across various contexts, bridge rules need to be provided for inter-operating the knowledge across these contexts. We call a set of quads together with the bridge rules, a quad-system. In this paper, we discuss the problem of query answering over quad-systems with expressive bridge rules using a contextualized OWL-Horst semantics. We present various decidable classes of quad-systems on which query answering can be done using forward reasoning. Besides undecidability of the most general case, both data and combined complexity of query entailment has been established for the various classes derived.
Full PDF Version: 

Major Revision

Solicited Reviews:
Click to Expand/Collapse
Review #1
By Michaël Thomazo submitted on 23/Jul/2013
Review Comment:

This article considers the conjunctive query answering problem over contextualized RDF/OWL knowledge base. The inter-operability between different contexts is achieved thanks to bridge rules. The work is motivated by an increase of interest by the Semantic Web community for contextualized knowledge, and deals with an intensively studied problem (conjunctive query answering when taking ontologies into account). The problem is stated as follow: does a quad system (which is a pair of a set of quads and a set of bridge rules) entail a contextualized conjunctive query answering?

More specifically, it:
1) provides the reader with a semantics for conjunctive query answering over contextualized knowledge base, together with a procedure (a generalization of the standard chase) to reason with respect to that semantics
2) shows that conjunctive query answering is undecidable when considering arbitrary quad systems
3) propose the class of safe quad systems, for which dChase halts, analyses the complexity of conjunctive query answering with respect to this class, and proposes an algorithm for recognizing safe quad systems
4) restrict further the class of safe quad systems, defining Horn quad systems, and provides the corresponding complexity analysis

Some general comments about these contributions:
1) the proposed semantics is well explained and quite natural. A more detailed discussion about the choice of OWL-Horst would be very welcome.
2) this is not surprising at all, given that contextualized query answering with bridge rules as considered in the paper generalizes conjunctive query answering with Datalog+/- rules, as noted by the author. Note that [AIJ11] provides such a proof using only unary and binary predicates (a motivation of the proof provided by the author of the currently reviewed paper is that it uses at most ternary predicates)
3) - the author mentions that bridge rules can be ``seen as'' Datalog+/- rules. Does it mean that he has built a reduction from contextualized query answering with bridge rules to conjunctive query answering with Datalog+/- rules? Both cases, it would be interesting to compare more precisely safe quad systems and join-acyclicity, as well as model summarizing acyclicity [KR12]. I suspect MSA to be a more general criterion than safe quad systems (at least on the ``intersection'' between quad-systems and datalog+/- rules, if not definable on more expressive quad-systems).
- I believe the proof of Theorem 4.9 to be incorrect. See below for more precise comments
4) I find the name ``Horn quad-systems'' quite misleading, since other quad-systems can also be seen as Horn (Horn Description Logics support value invention)

Overall appreciation:

The paper considers a currently intensively studied problem in a more general framework, where context are added to triples. Most of the results seem sound -- but not very surprising either, given similar results on Description Logics or Datalog +/-. The paper will greatly improve its significance with a better comparison with this work: can results from these fields be porter to contextual query answering in a generic way? If yes, how? If not, why? Given the strong similarity between results and methods used in both cases, I think this is an important question to be tackled. The 3-EXPTIME-hardness proof for safe-quad systems should be corrected.

Detailed comments:

Abstract: quads [...] thus informs --> thus inform
Header: Joseph et al. --> there is a single author
P.3: left column, l.2 : the set C = U \cup B \cup L are --> is called
p.5: horn first-order formula --> Horn first-order formula
p.5: Let M be the set .... to the set of constants C: seems complicated to me, and \mu does not appear in the text anymore. Let M be set the of all functions from the set of variables X to the set of constants C
p.5 : missing parenthesis in the definition of r(Q_C)
p.6: Glim et al. --> Glimm et al.
p.6, left column, l.-4 : with out -> without
p.6., beginning of Section 4: "we define a more general fragment of quad-systems": more general than what?
p.8 : right column, bottom: descedant -> descendant
p.11: since the number of storage cells is doubly exponentially bounded, we first... : I do not see any link between the "since part" of the sentence and what comes after
p.11: right column, bottom: no two alpabets --> no two letters
p.12: simulation of a Turing machine. I believe that the quad system provided by the author is a correct simulation of a Turing machine. However, the author does not prove that this quad system is safe. In particular, it seems that the blank nodes created by applications of rules simulating transitions break the safe property: during the simulation of an execution of a Turing machine, a blank node b_1(of that appear only in context c_n) will be created. To simulate the next transition, x_0 will be mapped to this blank node b_1, creating a novel blank node b_2 in the same context. We will thus have b_2 descendant of b_1 created in the same context, which violates the safeness of the proposed quad system. Am I missing something?
p.16: drawbacks mentioned for acyclicity conditions on existential rules are also true for safe-quad systems (some unsafe quad systems still admit a finite dChase). It would thus be necessary to compare safe quad systems to known acyclicity conditions
p.16: there is a confusion between sets of rules enjoying the bounded treewidth property and first-order rewritable sets of rules. Algorithms for the former make a step of materialization, while exclusive query rewriting approaches are possible only for first-order rewritable sets.

[AIJ11] Jean-François Baget, Michel Leclère, Marie-Laure Mugnier, Eric Salvat: On rules with existential variables: Walking the decidability line. Artif. Intell. 175(9-10): 1620-1654 (2011)

[KR12] Bernardo Cuenca Grau, Ian Horrocks, Markus Krötzsch, Clemens Kupke, Despoina Magka, Boris Motik, Zhe Wang: Acyclicity Conditions and their Application to Query Answering in Description Logics. KR 2012

Review #2
By Adila Krisnadhi submitted on 28/Oct/2013
Minor Revision
Review Comment:

This paper focuses on the query answering and entailment problem, according to OWL-Horst semantics, over a collection of contextualized RDF triples (i.e., quads) whereby interoperability between different contexts is enabled through a set of bridge rules which are first-order rules with existential heads. The author adapted OWL-Horst semantics to these quad systems akin to the semantics of Distributed Description Logics. For this formalism, the author then:
(i) showed the undecidability of conjunctive query entailment (CQE) of unrestricted quad systems;
(ii) proposed a fragment called safe quad systems for which combined complexity is 3ExpTime-complete based on a certain safety criteria;
(iii) identified Horn fragment of safe quad systems for which combined complexity of CQE is ExpTime-complete and its more restricted variant whose combined complexity of CQE is NP-complete;
(iv) showed that data complexity is P-complete for safe quad systems and their fragments mentioned in (ii) and (iii) above.

This paper is definitely relevant to the journal. I also did not spot any problems in terms of correctness of the results. Discussion on related work is adequate. The results contain novel contribution for theoretical foundations of contextualized reasoning over RDF triples. In general, the paper merits an acceptance but some improvements are needed as discussed below.

One main issue which can be seen as a weakness of this paper is the safety criteria which is formulated as a procedural definition based on the procedure with which decidability is established. Essentially, a quad system is safe iff its dchase does not contain two different blank nodes whose origin-context is the same such that one is a descendant of the other. That is, the dchase would always terminate because dchase never performs a recursive generation of blank nodes using other blank nodes with the same origin-context. Then, the author employed dchase itself as a decision procedure for query answering over safe quad system. On a first glance, this would seem trivial since one can almost say that this is the same as saying that reasoning over a language that satisfies a conditon C is decidable where C is the statement "the reasoning procedure terminates". I think the author should clarify in the paper whether this is indeed trivial. Moreover, the author should also clarify that deciding safety is possibly not easy since it is done using the triple-exponential dchase procedure itself, and this would distinguish this framework from other earlier approaches using conditions on dependency graph which, I believe, are easier to check. It would also be much more useful if there is a discussion that clarify whether a safety criteria that is independent of the dchase procedure is very difficult or impossible to obtain.

I also suggest the author to include a discussion on the implication of the safety criteria from user's point of view. More precisely, if I am a user who wishes to model some domain knowledge using this contextualized framework, then I would most likely not know whether my knowledge base is safe or not in advance. Using the proposed idea from the author, the (only?) way to check whether my knowledge base is safe or not is by running dchase on it and there is no simpler guidance that I can use while developing the knowledge base to avoid unsafety in the first place.

On the other hand, I would also say that making the definition procedural like above does not necessarily mean that everything else does not have any merit because the author was able to obtain the remaining, more precise, complexity results (not just decidability). The author should also make a more pointed discussion on this aspect.

The next issue is regarding the complexity of the Horn-fragment. Isn't it the case that this fragment can be directly translated into Datalog (ground facts from the triples and rules from the bridge rules)? If yes, I suspect that this translation is sufficient to derive ExpTime-completeness of combined complexity and PTime-completeness of data complexity without going through analysis on dchase; (the author cited the relevant paper from Dantsin, etal, but only for ExpTime-hardness of combined complexity). If not, it would be better if the author clarifies why using only such a translation would not work.

Another issue is related to inconsistency handling. From the introduction, the author seemed to alluded that with his framework, knowledge can be fed into separate reasoner (based on context) so that it prevents inconsistencies which may happen if the reasoning is done on the union of all triples in all the contexts. First of all, "to prevent incosistency" might not be a correct term here (inconsistency is a property of the knowledge base, not the reasoning approach). I would think it is more appropriate to say "to ensure reasoning can still be performed although union of all triples is inconsistent". Moreover, there is no section in the paper that addresses this issue more precisely. From the presentation, it is rather difficult to see whether dchase procedure actually handles this.


Section 4, p13, right column, the paragraph between the statements about condition of unSafeTest(Q,R) and R^{safe}(Q_C) should be rewritten using shorter sentences (i.e., avoid using a long sentence with too many subsentence which makes it difficult to read).

Section 6, p16, right column, in the 2nd paragraph with heading "Data Integration"
"... employs the super weak-acyclicity to ensure decidability. It can straightforwardly noted that ..." ===> It can be straightforwardly noted that
"... This is because when a quad-system is unsafe, requires skolem blank-node ... " ===> This is because when a quad-system is unsafe, it requires Skolem blank-node/