Optimizing and Benchmarking OWL2 RL for Semantic Reasoning on Mobile Platforms

Tracking #: 1666-2878

Authors: 
William van Woensel
Syed Sibte Raza Abidi

Responsible editor: 
Thomas Lukasiewicz

Submission type: 
Full Paper
Abstract: 
The Semantic Web has grown immensely over the last decade, and mobile hardware has advanced to a point where mobile apps may consume this Web of Data. This has been exemplified in domains such as mobile context-awareness, m-Health, m-Tourism and augmented reality. However, recent work shows that the performance of ontology-based reasoning, an essential Semantic Web building block, still leaves much to be desired on mobile platforms. Applying OWL2 RL to realize such mobile reasoning is a promising solution, since it trades expressivity for scalability, and its rule-based axiomatization easily allows applying axiom subsets to improve performance. At any rate, considering the current performance issues, developers should be able to benchmark reasoners on mobile platforms, using different process flows, reasoning tasks, and datasets. To that end, we developed a mobile benchmark framework called MobiBench. In an effort to optimize mobile ontology-based reasoning, we further propose selections of OWL2 RL rule subsets based on logical equivalence, purpose and reference, and domain relevance. Using MobiBench, we benchmark multiple OWL2 RL-enabled rule engines and OWL reasoners on a mobile platform. Results show drastic performance improvements by applying OWL2 RL rule subsets, allowing for performant reasoning for small datasets on mobile systems.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Nick Bassiliades submitted on 11/Jul/2017
Suggestion:
Minor Revision
Review Comment:

This paper presents a mobile benchmark framework for rule-based systems that is used to test various subsets of the OWL2 RL ruleset for optimizing rule-based ontology reasoning on mobile platforms. The paper extends the authors' previous publications. In particular, the benchmark framework is now described in a more generic way, and extensibility issues have been added. Furthermore, more rule engines have been added, although only a few are used in the benchmarks. It would be nice to have a clearer rationale in the paper about why only these engines have been used in the experiments.

In addition, the paper discusses various subsets of the OWL2 RL ruleset that could optionally be left out during rule-based reasoning in order to speed up OWL2 RL reasoning. This discussion is really helpful. However, it concerns rule-based efficiency in general and is not specific to mobile platforms; the same discussion would be valid for any computing environment, would it not? The authors could perhaps clarify this in the manuscript.

One remark about Section 3.1.3 and the treatment of the owl:sameAs relation between two properties: the solution in this section could be re-used for equivalent properties, or the other way around, i.e., the same solution followed for equivalent properties (Code 14) could be followed for owl:sameAs between properties. Is that right?
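
For concreteness, the relevant patterns in the W3C OWL 2 RL/RDF rule set are structurally identical, modulo the predicate used to state the property equality:

  eq-rep-p:  T(?p, owl:sameAs, ?p'), T(?s, ?p, ?o)                ->  T(?s, ?p', ?o)
  prp-eqp1:  T(?p1, owl:equivalentProperty, ?p2), T(?x, ?p1, ?y)  ->  T(?x, ?p2, ?y)
  prp-eqp2:  T(?p1, owl:equivalentProperty, ?p2), T(?x, ?p2, ?y)  ->  T(?x, ?p1, ?y)

so carrying the solution over in either direction seems plausible.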

Concerning the service matching experiment, I believe it is the weakest part of the paper. First of all, service matching has several levels of matching: Exact, Plugin, Subsume, and Sibling; your experiments cover the first three. However, these levels also have to be examined with regard to the inputs and outputs of the services. The most logically correct way to consider a match successful is this: the input concepts of the service should be equivalent to, or subclasses of, the query inputs, and the output concepts of the query should be equivalent to, or superclasses of, the service outputs. In your experiments, more general and more specific inputs/outputs are intermixed, which is a very relaxed matching criterion that could end up returning services with little semantic relatedness to the query. You should consider this in your manuscript.
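
To state the intended criterion precisely, in my own notation (writing C ⊑ D for "C is equivalent to, or a subclass of, D" with respect to the domain ontology, and assuming query and service parameters have been paired up):

  match(S, Q)  iff   i_S ⊑ i_Q  for every paired input  (i_S, i_Q)   -- service inputs equivalent or more specific
               and   o_S ⊑ o_Q  for every paired output (o_S, o_Q)   -- query outputs equivalent or more general

Anything weaker than this is not logically guaranteed to yield a usable service.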

Furthermore, the metric "total performance overhead per service match" does not make much sense. I would prefer to have a metric such as "average performance overhead per service query", since this would indicate the scalability of the rule-based matching.
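
In (hypothetical) symbols, for a query q with matching time T_subset(q) under rule-based reasoning and T_base(q) without it:

  avg. overhead per query  =  ( Sum over queries q of (T_subset(q) - T_base(q)) ) / #queries

Dividing instead by the number of returned matches conflates query cost with result-set size, which varies per query.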

Summarizing: concerning originality, the paper is based on the authors' previous work, which is extended to a satisfactory extent. Previous benchmarks have been done on ontology reasoning on mobile platforms, but this is the first one to test OWL2 RL reasoning (i.e., rule-based ontology reasoning). Furthermore, concerning significance of the results, I believe the results are very interesting because they provide insights on how OWL2 RL reasoning can benefit from using subsets of the original OWL2 RL rule set. These results are not restricted to mobile platforms alone but can be used for any computing platform. Finally, concerning quality of writing, I think that the authors have done an excellent job.

Review #2
Anonymous submitted on 02/Aug/2017
Suggestion:
Major Revision
Review Comment:

The paper studies the scalability limits of OWL 2 RL reasoners on mobile platforms.
Roughly, the claimed contribution consists of the following parts:
(1) identification of several subsets and extensions of the OWL 2 RL ontology language,
(2) implementation of a framework that allows one to run reasoners on mobile platforms and measure their performance,
(3) benchmarking two reasoners on the subsets using the framework.

Presentation.
The writing style of the paper, especially of its first, formal part, is quite ambiguous. It operates with various notions, such as 'rule', 'ontology', 'clause', 'assertion', 'data pattern', but never gives their definitions. So it is quite difficult to understand the precise meaning of what is written, and sometimes
it seems just wrong. In fact, the only definition in the paper, Definition 1, is broken because it is circular: IR is defined as a union of \alpha and \beta, while \alpha and \beta are defined in terms of IR. Another example is Codes 8 and 10: they use undefined notation (C_1, R_t, etc.), and it is not very clear what they mean. Code 2 and Code 4 are also strange: Code 2 does not have any variables, that is, it is a fact (or an 'axiom', if I understand the authors' terminology correctly), not a rule that is applicable to each annotation property, while the rule in Code 4 has variables in the head but nothing in the premise, so it generates an infinite number of triples (or a triple for each pair (?lt, ?dt) in the active domain, depending on the semantics). The second half of the paper, corresponding to parts (2) and (3) above, is written more clearly, but it is still not easy to understand because of the problems with the first part.
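
To illustrate the Code 4 issue: a rule whose head variables do not all occur in its premise is not range-restricted. Schematically,

  (empty premise)  ->  T(?lt, owl:sameAs, ?dt)

instantiates ?lt and ?dt with every pair of terms under an active-domain semantics, and is not finitely applicable at all under an open semantics. A safe formulation would have to bind both variables in the premise, e.g. (assuming some built-in sameValue comparison over literals occurring in the data):

  T(?s1, ?p1, ?lt), T(?s2, ?p2, ?dt), sameValue(?lt, ?dt)  ->  T(?lt, owl:sameAs, ?dt)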

Contribution.
As far as I can see, the contribution of the paper is rather limited. In particular, part (1), described above, is quite simple and not very well motivated. The declared aim of the search for subsets (and extensions) is to minimise the number of rules and optimise the performance of reasoning (in fact, I do not see any reason why these two should be related). However, the minimisation is just claimed, never shown; that is, it is not clear whether any of the resulting subsets are indeed minimal in any sense. Moreover, some of the subsets are not even equivalent to the original, so I do not see the point of comparing the performance of non-equivalent sets of rules; at the least, the word 'optimisation' is hardly applicable here. Another question is the presence of the 'domain-based' subsets: in essence, some rules are eliminated because they are not applicable to the particular dataset, and then the reasoning performance (in terms of time) of the original set and the subset are compared; but the elimination step, which is normally a data-dependent part of reasoning, is not counted (and is even done on a server), so the claim that the performance on the subset is better than on the whole set is vacuous. Part (2), that is, the implementation of the framework, probably took most of the time in this piece of research; however, it is just infrastructural, and can hardly be considered a contribution per se. Finally, part (3) could be moderately interesting, but it suffers heavily from the problems of part (1) (and does not contain any surprising results).
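
To spell the last point out: the 'domain-based' selection presumably amounts to a test of the form

  keep rule r  iff  every class/property term occurring in body(r) also occurs in the dataset D or its ontology O

which requires at least one full pass over D. Performing that pass off-device and excluding it from the measurements makes the subset timings incomparable to the full-rule-set baseline.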

Summary.
The presentational problems require quite essential rewriting of the first part of the paper, but they are probably fixable with a reasonable amount of effort. However, the main problems of the paper are a lack of good motivation, of technical difficulty (apart from the engineering effort in part (2)), and of a reasonable explanation of why the presented results are interesting and important. Without these, I cannot recommend acceptance.

Review #3
Anonymous submitted on 10/Oct/2017
Suggestion:
Minor Revision
Review Comment:

This work has two main parts: one dedicated to optimizing semantic reasoning on mobile devices by selecting OWL2 RL subsets, and one dedicated to a mobile benchmark framework. The text is well written and the English is fine.

The strongest point of this work is its originality. Although there exist other (not many) works focused on reasoning on mobile devices, they focus on OWL2 DL reasoning rather than OWL2 RL. The proposal is probably not perfect, but it covers a wide range of problems that arise in this context.

I include in the following some concerns that should be addressed before accepting it for publication:

I miss a (short) discussion of the pros and cons of reasoning with OWL2 RL versus OWL2 DL on mobile devices.

In Section 4.2 you talk about rule triggering, but you do not say anything about possible infinite loops, which are one of the dangers of dealing with any rule system.
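
For example, while the datalog-like OWL2 RL/RDF rules terminate over a finite vocabulary, a user-supplied rule that invents fresh terms does not; schematically (with a hypothetical ex: namespace and a fresh-node constructor new()):

  T(?x, rdf:type, ex:Person)  ->  T(?x, ex:hasParent, new(?x)), T(new(?x), rdf:type, ex:Person)

Each application creates a new ex:Person that re-triggers the rule, so naive forward chaining never terminates. A note on whether the benchmarked engines guard against this (e.g., by disallowing value invention or bounding iterations) would be welcome.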

The notions of so-called stable and volatile ontologies appear in several places in the text. However, you have not defined these terms precisely. The following works give more precise definitions that you could borrow:

Carlos Bobed, Fernando Bobillo, Sergio Ilarri and Eduardo Mena, "Answering Continuous Description Logic Queries: Managing Static and Volatile Knowledge in Ontologies", International Journal on Semantic Web and Information Systems, IGI Global, ISSN 1552-6283, volume 10, number 3, pp. 1-44, July 2014

Yuan Ren, Jeff Z. Pan, Isa Guclu, Martin J. Kollingbaum, "A Combined Approach to Incremental Reasoning for EL Ontologies", RR 2016: 167-183, 2016

In Section 4.4.4, you talk about memory consumption but not about energy consumption, nor do you even include a small reference to its importance.

Section 5.3 is perhaps too short for the reader to understand the features of those tools.

Section 6.5.1.1 is huge! The reading becomes difficult at times. You should consider restructuring parts of the text, perhaps creating new sections or subsections.

Minor comments:

* [62] has been extended and published in a more recent work that you may consider referencing:

Carlos Bobed, Roberto Yus, Fernando Bobillo and Eduardo Mena, "Semantic Reasoning on Mobile Devices: Do Androids Dream of Efficient Reasoners?", Journal of Web Semantics, ISSN 1570-8268, Elsevier, volume 35, number 4, pp. 167-183, December 2015

* Many times a text line begins with a numeric reference; please add non-breaking spaces. For instance, "Section 3.2" in the first paragraph of Section 3, Section 3.1.2 (between pages 5 and 6), and Section 3.1.3 in the first line of page 6, among many others. Please browse the whole text to fix this; just look for lines beginning with a number or a numeric reference.
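  (Assuming the manuscript is typeset in LaTeX, the fix is the tilde, which produces a non-breaking space:

    Section~3.2 describes ...        % instead of "Section 3.2 describes ..."
    as shown in~\cite{bobed2015}     % hypothetical citation key, instead of a bare "in [62]"

  so the number or reference can never be pushed to the start of a line.)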

* "rule- and data format", is that sentence correct? I dont understand what that hyphen means

* Please do not use bold fonts in the main text, for example in Sections 6.2.1.1 and 6.2.1.2, the conclusions, etc. Underline instead.

* In Figures 3-8, the values on the x-axis go in steps of 9. Why? 10 would be a more reasonable step, giving round values on the x-axis.

* "(Future work...", transform it into a footnote.