A Divide and Conquer Approach for Parallel Classification of OWL Ontologies

Tracking #: 599-1807

Authors: 
Kejia Wu
Volker Haarslev

Responsible editor: 
Guest Editors RR2013 Special Issue

Submission type: 
Full Paper
Abstract: 
Description Logic (DL) describes knowledge using entities and relationships between them, and TBox classification is a core DL reasoning service. Over more than two decades many research efforts have been devoted to optimizing TBox classification. Those classification optimization algorithms have shown their pragmatic effectiveness for sequential processing. However, as concurrent computing becomes widely available, new classification algorithms that are well suited to parallelization need to be developed. This need is further supported by the observation that most available Web Ontology Language (OWL) reasoners, which are usually based on tableau reasoning, can only utilize a single processor. Such an inadequacy often leads to frustrated users working in ontology development, especially if their ontologies are complex and require long processing times. In this paper we present a novel algorithm that uses a divide and conquer strategy for parallelizing OWL TBox classification, a key reasoning task. We discuss some interesting properties of our algorithm, for example, its suitability for distributed reasoning, and present an empirical study using a set of benchmark ontologies, where a speedup of up to a factor of four has been observed when using eight workers in parallel.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Click to Expand/Collapse
Review #1
By Raghava Mutharaju submitted on 17/Feb/2014
Suggestion:
Major Revision
Review Comment:

The paper presents a divide and conquer approach to parallelize TBox classification. The given ontology is partitioned and assigned to separate threads to be classified and later on they are merged.

My comments on the presentation, evaluation and related work are given below.

Presentation:

1) I think it would be much better if the presentation of the algorithms is example driven. The example given in Section 3.3 can be mentioned lot earlier and after each algorithm is explained, the output of applying that algorithm on the example can be discussed in as much detail as possible.
2) For certain algorithms, there is no explanation at all. Since there is no paper limit, even for algorithms that were used from literature (Top Search, Bottom Search), it would be good to have some explanation.
3) It would be better to discuss ontology partitioning before discussing classification since I couldn't understand the terms heuristic and even partitioning mentioned in Section 3.1.
4) In Algorithm 1, line 2, how do we know whether the ontology has been "divided_enough"?
5) In Algorithm 3, when variable green is empty, it is not clear why ?
6) In Proposition 3, instead of , shouldn't it be ?
7) In Algorithm 7, line 6, is it P <-- P U {n}?

Evaluation:

1) I didn't find any code in the provided github url. There is only a jar file and test ontologies. Was that intentional?
2) It is good to have a section (either in Evaluation section or outside of it) on implementation challenges and design decisions taken. Issues like race conditions, deadlock etc can also be discussed there.
3) The number of workers were limited to only 8 although the test machine supports 32 cores. I would expect an evaluation starting from 1 worker to more than 32 workers. This allows us to check the overhead and performance over a wider range.
4) Why weren't any bigger ontologies such as NCI, FMA and SNOMED considered? The difference in performance with the usage of more workers could be more prominent here.
5) It is confusing whether the performance mentioned in Figures 15, 16 are for even-partitioning scheme or heuristic partitioning scheme. I did not find any mention of this while describing those two figures.
6) It was mentioned that heuristic partitioning improved the performance but there were no graphs or evaluation numbers to support that claim.
7) The actual runtime numbers were not mentioned anywhere. It would be good to have them along with the speed ups. I see that authors focus on the parallelization using divide-conquer strategy, but, however much may be the speed up, if the total runtime taken is very high, parallelization might not bring it down by much, especially with 4-5 workers.
8) In almost all cases, highest speed up of 4 is achieved when using 5 workers. In fact, from the figures, with 8 workers I don't notice a speed up of 4 for any ontology as was claimed by the authors.
9) Can there be any heuristics for determining the optimum number of workers for a given ontology? (Assume that the number of cores can be determined at runtime, which is generally true).
10) In Section 6.2, it is mentioned that large number of GCIs could be the reason for decrease in performance of merge-classification approach. But, isn't it the case that in most ontologies there would be more GCIs than other type of axioms?

Related Work:

1) In was mentioned in the paper that there is a discussion on the applicability of this approach to distributed reasoning. I did not find this anywhere except a line in Section 7.2 where it is mentioned that this approach is also suitable for distributed implementation. Either there should be more discussion -- can be it be implemented as is? any problems that you foresee compared to shared memory implementation etc -- or remove that statement altogether.
2) Considering that ELK is the fastest parallel implementation of ontologies that it can support, I think there should be more discussion than just a passing remark on the similarities of your work with ELK. Although it was mentioned that the procedure ELK follows is different, it is similar to the extent that ELK is also a shared-memory parallel implementation of a classification procedure. Comparison of speed ups for the same ontologies that ELK was tested on (on your test machine obviously) and the actual runtimes taken would be interesting. If they are not close then is their something that can be learnt from the way parallelization works in ELK?

Due to these reasons, I felt this paper needs some major revisions, after which, it should be reviewed again.

Review #2
By Frantisek Simancík submitted on 20/Feb/2014
Suggestion:
Reject
Review Comment:

SUMMARY

This paper presents a concurrent divide and conquer algorithm for classification of OWL ontologies. It comprises the following results:

Sections 3-4 : Parallel merge sort algorithm for sorting partially ordered sets.
Section 5: Heuristic partitioning scheme based on told subsumptions.
Section 6: Experimental evaluation of the approach showing speedups for varying number of concurrent workers using the JFact reasoner.

I like the underlying idea of experimenting with merge sort for parallelizing classification. The authors show that on some ontologies this can lead to up to 4x speedups, which is competitive with other state of the art concurrent OWL reasoners. This result can serve well as a short workshop/conference paper - indeed, a shorter version of this work has already been published. However, I don’t think this work meets the standards of a journal publication, neither in terms of contributions nor in quality of presentation.

Firstly, the contributions are overstated. A big majority of the paper (pages 4-19) focuses on developing this merge sort for arbitrary partially ordered sets, only to end up with a fairly straightforward recursive procedure for merging two hierarchies into one. I wonder if the authors are reinventing the wheel here. Unfortunately, there is no discussion on any related work on general poset sorting, and I am not an expert on this topic either. Quick searching online finds a number of papers, such as

“On Poset Merging” by Chen, Ding, and Seiden
“Sorting and Selection in Posets” by Daskalakis, Karp, Mossel, Riesenfeld, and Verbin

that seem to present more sophisticated merging procedures and even discuss optimal number of comparisons. I would therefore not consider this part of the paper to be a novel contribution.

Secondly, the paper is technically weak and not presented very well. Readers are often referred to study pseudocode. Intuitive explanations, when given, are imprecise and sometimes even misleading. Mathematical notation and terminology is used sloppily. Space can be used better: On one hand, long passages (Propositions 1-8, Figs 3-6) are devoted to trivial properties of transitive relations. On the other hand, identification of equivalent concepts, which seems to be an essential part of the algorithm, is dismissed in a footnote for “sake of conciseness”.

MORE DETAILS AND SUGGESTIONS

It was very hard for me to understand the relationship between your algorithms T_merge and T_merge-. On page 7 you say this about T_merge- “However, this method does not make use of the so-called told subsumption… For example, it is unnecessary to test… Therefore, we designed a novel algorithm T_merge”. This sounds to me as if T_merge- was already a correct sorting algorithm and you were now optimizing it to somehow benefit from using told subsumptions. Only after carefully diving into the pseudocode I realized that T_merge- is actually incomplete, and T_merge is not an optimization but simply a fix to make the algorithm correct. Please state this explicitly as the main motivation if you really want to go this way. But this makes me wonder why even present the incorrect algorithm in the first place? Would it be possible to just directly show T_merge as a single recursive procedure? Something like taking Algorithm 6 and instead of returning anything you could attach C under D in line 15.

Also, in the above, referring to these as “told subsumptions” does not help. I understand told subsumptions to be somehow easily derivable from looking at input axioms without full reasoning, and you actually use the term in this sense in Section 5. Here, however, you seem to mean the (complete) subsumption relationship over the subdomain that has been derived in the recursive call.

It is good that you decided to explicitly define the input and output of each procedure. But please, make this more formal and make sure that it is valid for every possible (even recursive) call of the procedure, not only for the topmost call. For example, let’s look at Algorithm 3. Firstly, the algorithm should also have “C subsumes D” as a precondition, since you return D as a parent of C without checking. Second, the procedure does not return the set of all parents of C, but only those that are descendants of D. Finally, using the notation {p | \in \leq_i} for the parents of C is incorrect because C is not in the hierarchy \less_i at all at this point.

On the same note, in Algorithms 4 and 5, what exactly is “output: the merged subsumption hierarchy \less_\alpha over \less_\beta”? Presumably this is not the complete hierarchy over the union of the two domains, because you still have to run bottom-merge to make it complete. It is not even a hierarchy in the sense you described in the preliminaries: it can have multiple bottoms, as in Fig. 10, right? It would help if you could define these datastructures more precisely. Also, do you maintain your hierarchies transitively closed or transitively reduced, or maybe neither? On one hand, you use the notation ( \in \less_\alpha) to mean that A occurs in the hierarchy, which suggests that the relationship is transitively closed. On the other hand, in line 4 of Algorithm 4 and 5, you only add where a is the immediate parent of B, so the “merged” hierarchy is not transitively closed anymore?

In Section 5, can you describe (precisely) what the algorithm does using words, rather than asking the reader to digest recursive functions with four parameters? I eventually understood the algorithm as follows: Iterate over all (direct) children of top. For each child c of top, add a new partition consisting of all told descendants of c that do not yet belong to any of the earlier partitions. Is this correct?

Algorithm 9 appears completely out of the blue. In your main concurrent Algorithm 1, there are no workers, no shared queue, and no possible deadlock or racing issues. Then suddenly in this one paragraph in Section 6 you start talking about this one rather small detail without describing anything else about the implementation first. I didn’t understand the motivation here, why mention this one particular problem? Worse, does your solution really work? What if there are only two items in the queue, and two workers repeatedly take one item each, and then return it back to the queue, forever? Or the same for n items and n workers.

As mentioned earlier, I would also like to see what exactly you do with equivalent concepts. Do you have to somehow interleave your algorithm with cycle detection?

What is that expression on line 3 of page 4? Sorry, this really doesn’t make any sense. Did you want to say something like “subsumption tests over all pairs of atomic concepts occurring in the TBox”?

On first reading of Section 3, I didn’t understand what you meant by the domain \Delta, it became clearer only later in Section 3.1. Perhaps, in the context of DLs, it is unfortunate to use the term “ domain” and even the symbol \Delta, since this is unrelated to the domain of an interpretation. Would “signature” be more appropriate?

Review #3
By Giorgos Stoilos submitted on 10/Mar/2014
Suggestion:
Major Revision
Review Comment:

The paper presents a parallelisation approach to the Enhanced Traversal algorithm for constructing the class hierarchy of ontologies. It is based on a divide and conquer approach where the classes of the ontology are partitioned into clusters which are then classified independently and finally merged. The paper presents in detail the search and the merge algorithms, it proves their correctness and termination, and also discusses partitioning approaches. Finally, it provides an experimental evaluation.

The topic of exploiting parallel architectures to speed up classification is indeed interesting and the approach taken in the paper looks sound. However, my most important concern is that at the current point it is not very clear whether the approach could indeed provide with very important and significant results. As the authors also comment in the paper the current paper could set a foundation for promising future results, but in my opinion this needs to become a bit stronger and even in the current submission. For example, it would be good to have an ontology where normal reasoners would take hours to terminate while the proposed approach would finish in a few minutes. Then, even though there are a few pathological cases, having a couple of ontologies where the new approach makes a large improvement would be a more convincing argument that indeed the approach could be very valuable once all pathological cases have been resolved. Although there are a couple of cases where a speed up of 4 was achieved in the experiments, it is not clear to me whether these are indeed cases where normal approach takes several minutes (hours) and the concurrent one a few seconds (minutes) or it is just seconds vs milliseconds. Hence, it would be good to try to find (perhaps challenging) ontologies that would show this.

Regarding the more technical points of the paper, I also feel that some more results on some of the open questions are important even for this submission. For example, providing even one additional divide scheme would make this submission much more complete. Moreover, it would also be good to provide a more thorough analysis on the cases where there are speed ups vs no (or marginal) speed up vs decrease in performance. Is it possible to predict when one is expected to have a speedup or not? Is it possible to predict the behavior of the concurrent algorithms by checking e.g., the size, modules, expressivity, told subsumption graph of the ontology? What are the factors that influence reasoning performance? Even though definitive answers to all questions might not be possible from a single submission I feel that some of them need to be looked a bit more and have at least some initial good intuition.

All in all I feel that the evaluation needs to be extended a bit with some additional ontologies that clearly show superiority of the proposed method (eg, hours vs minutes) and some additional details about open issues raised in sections 6.2 and the conclusions.

Editorial comments and suggestions:
In the abstract the phrase “key reasoning task” has been mentioned quite a few times.
Page 3:
“as an abbreviations” -> “as an abbreviation”
C\equiv D stands for the set {C\sub D, D\sub C}. This creates a notation bug since then according to this definition the set {A\sub B, C\equiv D} is {A\sub B, {C\sub D, D\sub C}} which is not a set. I think it suffices to say that C\equiv D is an abbreviation of the pair of axioms C\sub D and D\sub C.

The notions of subsumes and subsumer is used but is not defined. Although you refer elsewhere for an intro to DLs I think it would be good to at least define all the central notions that you use often.

It is not clear whether you assume that the relation \leq (end of page 4) is transitively reduced.

Page 5: “This divide and conquering operations” -> “These divide …”

The notions of significant and intermediate concepts are a bit vague. I think the issue in the first case is that the notion of explicitly declared concept has not been defined or explained before.

As far as I understand Algorithm3 is the normal top-search. If so then wouldn’t it be better to be moved to the preliminaries?

Remarks on Proposition 1 to 8.
It is continuously mentioned that \in \leq_\alpha (\in\leq_\beta), but it is not clear what is the role of mentioning this? Isn’t it trivial? As far as I can see these two are not used in the proofs. Similarly for some concept C \in \leq_\alpha is also mentioned a couple of times in algorithms and in text but it is not clear to me what is its purpose.

Some of the properties mentioned in the propositions have also been raised in [9]. It would be good to mention this in the paper before stating them.

The preamble in many propositions is common “When merging …” It would be good to group proposition 1 to 4 and 5 to 9 into one proposition with many cases. Eg.,
“When merging … the following conditions hold:
1) …
2) …

Actually, I am not sure whether these can be stated as propositions as in most cases their proofs are just a small example given mostly in a figure. Perhaps calling them properties is more appropriate.

Some of the figures used should probably go side-by-side to save length and have the text that is referring to these figures close to the actual figures. For example, proposition 1 refers to figure 3 but there is a two page gap between them showing algorithms 4 to 6. All these make it a bit difficult to read the paper as one has to constantly go back and forth two pages. Another idea to fix this issue would be to try and merge algorithms 4 and 5. As far as I can see the only difference between them is that the one calls the \top-search and the other the \top-search* algorithm. You can perhaps say that
“our modified top-merge algorithm replaces line 2 of algorithm 4 with line (give line of algorithm 5) which calls a modified top-search algorithm 6 (or something like this), instead of repeating the whole of algorithm 4 just because of a differences in a single line.

In pages 18 and 19 you provide lemmas and theorems for correctness of the \bottom-merge procedure. This is slightly weird since the details of these algorithms have not been provided (you mentioned that the approach in \bottom-merge is similar to \top-merge hence you will focus only on the latter). Perhaps these should be dispensed and only mention that soundness and completeness of \bottom-merge can be shown in a similar manner like before.