Optimizing Tableau Reasoning: a Prolog-based Framework

Tracking #: 2202-3415

Authors: 
Riccardo Zese
Giuseppe Cota

Responsible editor: 
Bernardo Cuenca Grau

Submission type: 
Full Paper
Abstract: 
One of the foremost reasoning services for knowledge bases is finding all the justifications for a query. This is useful for debugging purpose and for coping with uncertainty. Among Description Logics (DLs) reasoners, the tableau algorithm is one of the most used. However, in order to collect the justifications, the reasoners must manage the non-determinism of the tableau method. For these reasons, a Prolog implementation can facilitate the management of such non-determinism. The TRILL framework contains three probabilistic reasoners written in Prolog: TRILL, \trillp and TORNADO. Each one of them uses different approaches for probabilistic inference and handles different DLs flavours. Our previous work showed that they can achieve sometimes better results than state-of-the-art (non-)probabilistic reasoners. In this paper we present two optimizations that improve the performances of the TRILL reasoners. In the first one, the reasoners build the hierarchy of the concepts contained in a knowledge base in order to quickly find connections among them during the expansion of the tableau. The second one modifies the order of application of tableau rules in order to reduce the number of operations. Experimental results show the effectiveness of the introduced optimizations. All systems can be tried online in the TRILL on SWISH web application at \url{http://trill-sw.eu/}.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Click to Expand/Collapse
Review #1
Anonymous submitted on 30/Jun/2019
Suggestion:
Major Revision
Review Comment:

The paper describes two optimizations of existing Prolog-based description logic reasoners for computing justifications and probabilistic inference. This "TRILL framework" was recently shown to outperform many classical DL reasoners on medium ontologies. The first optimization pre-computes parts of the class hierarchy of the ontology, while the second implements a queue-based approach for applying tableau rules on newly generated assertions in a specific order. The evaluation shows that this achieves a significant speedup for complex tasks.

The paper is quite well-written, but the optimizations are not very original (apart from the fact that they are implemented in Prolog), and the evaluation, while promising, requires improvement.

The assertion-queue optimization is a special case of one that was already described for Pellet. The pre-computation of the class hierarchy is also similar to what other reasoners do, although in the TRILL framework it is implemented in a more general way (for example, including complex in addition to atomic classes). Since the optimizations follow existing ideas, I don't think that it is necessary to present their implementation in Prolog in such detail (and the old version is anyway described in other papers). The listed code snippets refer to many undocumented predicates, and they raised a lot of questions that were not answered in the text (for details see below). As the paper can hardly contain the whole code, I think the better option is to show as little code as possible.

The evaluation was designed to be similar to the one in [14,17], which seems like a good idea. However, given the different scopes of these papers, I don't think that so many comparisons are necessary. For example, since the optimizations are the same for all three reasoners in the TRILL framework, and TORNADO is overall the most efficient, it would be enough to present only the results for TORNADO. Moreover, since the optimizations only modify the classical tableau reasoning, and the probability computations remain exactly the same between the two versions, Section 6.2 is also not necessary. Test 2 could also be removed, since it only shows that for small ontologies only TRILL^p (which is anyway the slowest overall) can profit from the optimizations. Test 3 is the only one in Section 6.1 that shows a real speedup, and only for artificially constructed ontologies. In general, I think this is OK, but it is better to use realistic ontologies; for this reason, it would be good to see a non-probabilistic version of Test 5, to show how the optimizations scale in "real" ontologies. Including more (larger) ontologies would of course be better.

The utility of Fig. 1 for people who do not already know the tableaux rules is questionable. Many terms such as "indirectly blocked", "S-neighbour", "safe" are not defined. In the \forall-rule, "\tau(S,)" may not be defined, since the connecting edge may be with label S^-. In the \forall^+-rule, "R\sqsubseteq S" may not be an axiom from the ontology, because the inclusion may hold due to an axiom "R^-\sqsubseteq S^-", or trivially in case S=R. In the \geq-rule, "\neq(b_i,b_j)" is missing a dot, in the definition of \tau it should be "\geq n S.C" (two times), and "\tau(S,)" should be removed. In the \sqcup-rule, the mappings \tau in the different graphs G_i should also be distinguished by indices. Finally, it is not clear why the "choose"-rule from [18] is missing.

Smaller comments:

p2: SHIQ allows number restrictions only over simple roles.

p3: "T is initialized with a single completion graph G_0" -> This graph is called "G" two paragraphs earlier.

p3: "blocking [24]" -> The reference seems to be wrong.

p3: "This approach is correct and terminating for the DL SHI" should be reformulated. Lemma 2.4 in [16] is independent of a specific logic.

Section 2.3 could be removed, as it is not relevant for the paper. Otherwise, the definition of the formula f_K(X) would require more explanations. Moreover, the BDD representation does not belong in this section, as it is not probabilistic.

Fig. 3 can be removed, as this paper is not about user interfaces.

Fig. 4: The use of "modify_ABox" is not clear. How can the ABox be updated based only on A0, C, Ind, and Expl? Since "modify_ABox" is used in both the or_rule and the unfold_rule, it needs at least one additional parameter saying which rule is supposed to be applied to obtain the modified ABox.

Fig. 5: Why is "Ax" computed, but not used for expanding "Expl"?

Fig. 7 does not help to explain the new architecture. "KB Utilities" could be anything.

p9: Why is "classesName" needed? All class names also occur in the dictionary "classes".

p9/10: "explanations" seems to be a single list for the whole hierarchy. Shouldn't it contain a mapping from each edge to the explanations for this edge? What does "add_subclass_expl" do exactly?

Fig. 10: "are_subclasses_int(KB0,PC,PC1)" -> "PC" and "PC1" should be swapped

p10/11: The simple parts of the hierarchy initialization are given as code, but the interesting parts (merging nodes, adding links between existential restrictions) are only described in the text. How can the "classes" dictionary express two classes belonging to the same (merged) node?

Fig. 12: Why is "setting_trill" used here, while in Fig. 6 it was simply "[...]"? The content of this figure is nearly the same as before, except for the additional parameter "EA".

p13: "update_queue/3" -> "update_queue/2"?

p13: "(1) collecting all the newly added assertions" -> How can "update_queue" find this information without access to "Tab1" (Fig. 11)? Wouldn't it be easier if the rules would add the new assertions to the queue directly?

p14: If "V is the set of [...] concepts and roles", why does one need \rho, which "is a mapping between elements in V and concepts (roles)".

p14: "concepts to be both nominal and complex" is unclear.

p14: "TRILL systems TRILL systems"

p15: "these assertions are added at the end of the queue, and therefore possibly after other assertions that may match non-deterministic rules" -> This seems to contradict p13: "update_queue/3 distinguishes between concepts that matches with non-deterministic rules [...] and concepts matching with deterministic rules."

p15: I would suggest to simply say "old" and "new" instead of "prev." and "hier.".

p15: The link in footnote 7 is broken.

Were the reasoners checked for correctness, either by comparing the results with other reasoners, or verifying the returned justifications?

Table 1: "," -> "." (also in some other tables)

Table 1: All numbers should have the same precision.

p16: "effect of non-determinism in the choice of rules" is not clear. Only a single (deterministic) rule is applicable at all times (unfold).

p16: "timeout of 30 minutes on the fastest system ..." is not clear. Does this refer to TORNADO?

p18: "For the sake of completeness ..." -> If a timeout is fixed, only runtimes below the timeout should be reported. By increasing the timeout, one can include more measurements in the tables (where they belong).

Test 4: Why were only 50 axioms annotated with probabilities, and not all of them, or a fixed percentage?

Tables 9 & 10 are nearly exactly the same as Tables 1 & 8.

Tables 11,12,13: "Test 5" -> "Test 6"

The paper should be checked for correct grammar (singular/plural, articles).

Review #2
Anonymous submitted on 29/Oct/2019
Suggestion:
Minor Revision
Review Comment:

In principle, this paper provides optimization approaches for improving the performances of reasoners in TRILL framework. The main proposal within this paper is included in Chapter 4 describing their two improvement approaches: the first one scans and analyzes the knowledge base in order to build a concept hierarchy w.r.t. subsumption order, while the second one modifies the application order of the tableau rules.

My observations on the following three dimensions for their research contributions:
1. Originality: By looking at their explanations in Chapter 5 and after I briefly read the related work and their previous paper in TLP '19, to the best of my knowledge, this paper presents a new combination of optimization approaches and experiments in the area of tableau algorithms for (probabilistic) DLs. Nonetheless, there are some technical details that I will mention below and I need them to clarify those details.

2. Significance of the results:
Their proposed ideas to construct a concept hierarchy as a preprocessing step before tableau expansion as well as to modify the application order of tableau rules make this reasoner competitive enough in comparison with state-of-the-art DL reasoners. This competitiveness can be justified from the fact of their experiments which makes them better (in some tests) than the previous version of TRILL framework, and as shown in their previous paper in TLP '19 that each TRILL reasoner built over the unoptimized algorithm is also sometimes better than other well-known DL reasoners. So, transitively, their experiment results should confirm that these reasoners can be one of the good alternatives for implementation over DL ontologies.

3. Quality of writing:
In terms of writing flow from one chapter to another, this paper is very well written. Then, the presentations of examples and algorithm codes make every explanation quite complete so that the reader can visualize the narrative argument within this paper.

Summing up my observations above, I recommend this paper to be accepted in this journal by also considering the following technical questions and minor revisions:
1. Technical questions:
- In Subsection 2.3., you briefly define what DISPONTE is as a probabilistic DL. However, at line 46, you define a DNF boolean formula f_K and it's not defined explicitly what K is. I know that K is supposed to be the set of all explanations for a Query based on what I read in your TLP paper. But, I think, to make the definition more complete, it should be better to define what K is in this subsection.

- On page 13, you describe that the query expansion process is performed separately for the two cases: first, you look for the assertions that trigger deterministic rules iteratively until the expansion queue is empty. Then, you update the queue by considering the assertions triggering non-deterministic rules so that the application of the non-deterministic rules are performed as late as possible. But, what happens if, during the application of non-deterministic rules, more newly assertions are produced and, it turns out, that those assertions trigger deterministic rules? I think you mention this situation again on page 15 in line 7 in column 1 where these new assertions will be added at the end of the queue. But, if this is the case, then for me it seems that there will be a sequence of deterministic rules followed by a sequence of non-deterministic rules followed by deterministic rules again and so on during the tableau expansion. If my assumption is correct, then how many such iterations does this algorithm need to perform?

- In test 1 and test 4, it seems that your hier. version is always better than the prev. version in the KB BRCA. Why is it the case? How does this KB look like such that your optimized reasoners always look better than the old ones?

- I think I also need more explanations based on Table 3 where the hier version of TRILL^p only outperforms the old one where m = 7. Could you briefly give the reason for this situation?

- At line 47 column 2 page 7, you said that the order of rule application does not affect the completeness of finding all justifications. Could you give me any reason for this statement? Maybe you can cite some related work stating about or proving this statement.

2. Small typos or writing errors:
- Line 10 Column 2 Page 4: "implements" --> "implement"
- Remove one of "TRILL systems" at line 48 column 2 on page 14.
Put "comma (,)" after "when the running time increases" in line 34 column 1 on page 16.
Remove “is” at line 49 column 1 page 17 —> the more difficult the application of the rule, the higher the ratio.
- For me, it's better to not start writing a sentence with a symbol or the name of the method/predicate. For instance, in line 35 column 1 on page 7, you write "findClassAssertion" after "scan_or_list. " which somewhat makes the flow of your explanation not readable or understandable. Alternatively, you can somehow write "The predicate findClassAssertion ... "

Review #3
Anonymous submitted on 12/Nov/2019
Suggestion:
Reject
Review Comment:

The article describes optimizations for (Prolog-based implementations of) tableau algorithms for Description Logics with particular focus on the ability of providing explanations/justifications of derived consequences. The optimizations aim at improving the applications of expansion rules via a more sophisticated queue of those facts for which rules may have to be applied and via the utilization of subsumption relations between (complex) concepts/class term expressions that can easily be identified from the ontology axioms in a preprocessing step. In order to show the effectiveness of the proposed optimizations, the authors present several evaluations on real world ontologies as well as some constructed test cases, which compare their reasoners with older versions where the optimizations are not included.

In my opinion, the content of the article is in principle interesting to the semantic web community since it presents novel reasoning systems with included support for justifications as well as some form of probabilistic reasoning and are available online via a web interface, which makes experimenting with them easy. In addition, the Prolog-based realization of the tableau algorithm is interesting since it enables certain analysis and debugging possibilities (e.g., providing justifications/explanations). I further appreciate that the source code and some of the evaluation data is available, but it does not seem straightforward to reproduce all the results of this article since some test cases/datasets are missing (a script/docker image would be great that executes the tests and makes the results appropriately available). Nevertheless, I do not recommend acceptance at the moment due to a variety of reasons, which are listed and explained in the following:

Although the optimizations that are presented in this article make sense and can result in an increased reasoning performance, they neither seem to be very challenging nor particularly novel. In fact, most reasoning systems for expressive Description Logics maintain more or less sophisticated queues or other data structures for efficiently managing the rules that may have to be applied for (derived) facts in the completion graph. The most difficult part seems to be the realization in the Prolog-based implementation, which mainly seem to be chosen for easily managing/identifying the causes of derived facts/assertions via the Prolog backtracking mechanism in order to be able to provide justifications/explanations. Also, the utilization of subsumption relations between concepts in the reasoning process has already been investigated and has been shown to be quite effective. For example, the reasoning system Konclude saturates the concepts that may occur in the actual reasoning process with a simple (completion-based) procedure in order to derive many subsumer concepts in an effective way and uses them for the expansion of the completion graph (see, e.g., [1]). The proposed optimization in this article differs in the sense that only some subsumers of concepts are determined from a syntactical analysis of the ontology axioms. Evaluations with the Konclude reasoning systems have however shown that even quite sophisticated saturation procedures improve the performance for many ontologies. Of course, there are some adaptions required in order to be able to determine the explanations and this can be seen as one of the contributions of this article (but it is arguable how big of a contribution it is since most of it seems straightforward).

Unfortunately, the Prolog-based implementation of the tableau algorithm does not even seem to be complete w.r.t. determining all justifications/explanations. In fact, the proposed implementation prioritizes deterministic rules, but finding all justifications may only be possible if all rules are equally treated such that the entire "search space" (w.r.t. the traced axioms) is explored. For example, if we consider a knowledge base K1 with the axioms A \sqsubseteq F and F \sqsubseteq B, then it obviously holds that K1 \models A \sqsubseteq B. By querying (the online available version of) TRILL with sub_class('A','B',ListExpl), it correctly determines the axioms of K1 as the only justification. For a knowledge base K2 containing the axioms A \sqsubseteq B \sqcup C \sqcup D, A \sqsubseteq \atleast 5 r, C \sqsubseteq \atmost 4 r, and D \sqsubseteq \atmost 3 r, it obviously also holds that K2 \models A \sqsubseteq B and TRILL correctly determines the axioms of K2 as a justification. However, for the knowledge base K3 = K1 \cup K2, TRILL only determines the axioms of K1 as a justification for A \sqsubseteq B, although clearly also the axioms of K2 constitute a justification. It might be possible to fix this by treating all rules equally, but this would probably result in a much worse performance. Hence, the proposed implementation of the tableau algorithm also seems to require the Hitting Set Tree (or a similar) approach to determine all justifications. (The systems also seem to have a few other bugs, e.g., some axioms of the form \top \sqsubseteq ... are not correctly processed/handled.)

The optimizations are mainly presented via some Prolog (pseudo) code, which can make it quite difficult to understand them (for people that are less familiar with Prolog) and/or to transfer them to other systems implemented with other languages. In fact, a mathematical notation or some general pseudo code would be much more compact, generally easier to understand and, hence, much more suitable for presenting these optimizations. Since the research aspect should be the central part of full papers, it would also be much better, in my opinion, to separate the general idea/abstract algorithm of the optimizations from the concrete implementation in a system with a certain programming language/paradigm (such as Prolog). In general, it seems to me that the authors mixed the research paper with a system description and, in its current form, it does not seem to fit any paper category of SWJ (especially not full (research) paper).

Also, it is quite unclear why the probabilistic reasoning is so prominently presented/motivated. In fact, it seems that the probabilistic reasoning is simply reduced to the justifications, i.e., the tableau-based reasoning is used as a black-blox and not really integrated. As a consequence, every procedure that provides the required justifications can be used and it is clear that improvements in the underlying reasoning algorithm would also result in an improved probabilistic reasoning performance. Hence, one could simply evaluate the optimization in the context of probabilistic reasoning and it would be sufficient to introduce and discuss the probabilistic reasoning just in this section (as it is not a central aspect/requirement of the article).

I further wonder why the authors evaluate their optimizations on only four real-world ontologies although it is easily possible to find more ontologies that fall in the fragment that could be handled by their reasoners. The authors even conclude from the results of one ontology that "the overhead [of their optimization] is compensated by a faster management of the expansion", but it could also just be an anomaly of that ontology. Unfortunately, there are also a few unclear and hardly discussed decisions w.r.t. the evaluations. For example, for some ontologies only versions without individuals were used and it is unclear why they created instance queries for some ontologies but subsumption queries for others. Moreover, only queries with explanations were used, although it seems quite likely that also queries without explanations occur in practice (i.e., instance and subsumption queries that do not hold). Although the authors created several test cases, their evaluations are not particularly interesting or meaningful since these tests are designed such that the optimizations work well (but it is unclear whether or how often the tested cases occur for real-world ontologies). I think it would be much better to reduce the amount of artificial test cases and evaluate more real-world examples/ontologies.

The article mostly seems to be reasonably written, but contains several errors and typos (I listed a few below). There is certainly also some room for improving the (presentation of the) article, e.g., by improving examples, making sentences more clear, adding line numbers to algorithms, etc.

Some minor comments (by far not all):
- Section 2.1: You may want to mention that complex roles are not allowed in number restrictions (otherwise decidability cannot be guaranteed).
- Example 1: I wonder whether it is indeed a good example for an axiom to state that a human with a pet is a nature lover. I think that most humans have pets in order to have some company or entertainment and this does not necessarily imply that they love nature.
- Page 4, left-hand side: "Unfortunately, classical tableau" -> "Unfortunately, a classical tableau...". Also, the explanation of the HST algorithm is unclear to me (line 25-32). Shouldn't it also check for justifications by removing other axioms from the knowledge base (i.e., other axioms than E_1) even if K' = K \setminus E_1 does lead to new justifications?
- Example 2 should probably be earlier (at the end of Section 2.2.1).
- Figure 1: Shouldn't a tableau algorithm for SHIQ contain a choice rule that (non-deterministically) decides whether a neighbour node is an instance of C or \neg C for an at-most restriction of the form \atmost r.C? Also, wouldn't it be simpler to mark the rules that are additionally required for SHIQ (instead of marking the rules that are only required for SHI)?
- Is it really necessary to have three different reasoners? Wouldn't it be simpler for the user to just have some options/setting (for the reasoners, the user has to know what the differences are since they are not clear from the names)?
- Page 7: It is mentioned that "the order of rule application does not affect the completeness of the covering set of justifications", but I do not understand this. In fact, it seems to be quite relevant. For example, if we have the axioms A \sqsubseteq C and B \sqsubseteq C, then only one unfolding is possibly performed and, consequently, only one axiom is added to the tracing function, i.e., for determining the explanations/justifications it seems to be relevant in which order the rules are applied.
- Figure 5: Shouldn't the variables D or Ax be used somewhere for the modify_ABox function in the unfold_rule?
- Page 9, right-hand side: Does it really make sense to represent a (concept) hierarchy in form of a tree? Do you need several nodes then that represent bottom? Are all these parts of the kb dictionary really necessary to present the optimizations (e.g., how is the information about the datatypes required for understanding these optimizations)?
...

[1] Andreas Steigmiller, Birte Glimm: Pay-As-You-Go Description Logic Reasoning by Coupling Tableau and Saturation Procedures. J. Artif. Intell. Res. 54: 535-592 (2015)