Enhancing the Scalability of Expressive Stream Reasoning via input-driven Parallelization

Tracking #: 1900-3113

Authors: 
Thu-Le Pham
Ali Intizar
Alessandra Mileo

Responsible editor: 
Guest Editors Stream Reasoning 2017

Submission type: 
Full Paper
Abstract: 
Stream reasoning is an emerging research area focused on providing continuous reasoning solutions for data streams. The exponential growth in the availability of streaming data on the Web has seriously hindered the applicability of state-of-the-art expressive reasoners, limiting their applicability to process streaming information in a scalable way. In this scenario, in order to reduce the amount of data to reason upon at each iteration, we can leverage advances in continuous query processing over Semantic Web streams. Following this principle, in previous work we have combined semantic query processing and non-monotonic reasoning over data streams in the StreamRule system. In the approach, we specifically focused on the scalability of a rule layer based on a fragment of Answer Set Programming (ASP). We recently expanded on this approach by designing an algorithm to analyze input dependency so as to enable parallel execution and combine the results. In this paper, we expand on this solution by providing i) a proof of correctness for the approach, ii) an extensive experimental evaluation for different levels of complexity of the input program, and iii) a clear characterization of all the algorithms involved in generating and splitting the graph and identifying heuristics for node duplication, as well as partitioning the reasoning process and combining the results.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Click to Expand/Collapse
Review #1
By David Bowden submitted on 19/Jul/2018
Suggestion:
Accept
Review Comment:

I think the authors’ amendments have satisfactorily addressed the suggestions from my review of the original paper, particularly in sections 1, 5 and 7. The paper proposes an innovating approach to the problem of scaling the reasoning of streaming data, which is a challenge for current production implementations. I would therefore recommend to the editors, that the paper is ready for publication.

Review #2
By Konstantin Schekotihin submitted on 25/Jul/2018
Suggestion:
Accept
Review Comment:

Some final comments after the authors response:

- Regarding terms, both Gebser et al. and Baral allow for function symbols. Therefore, in their definitions it is clear that variables may appear in a term, e.g., a variable X appears in the term f(X). Since you exclude the function symbols from your definition, as it is stated in footnote 2, the sentence sounds a bit odd for me.

- "We realize this sentence without context might be misleading". Therefore, I think, the paper would be more clear if you add this discussion to it.

- Fonts Figures 1, 2, and 6 is really small. Consider either using the larger fonts or placing the figures over the both columns.

- A part of the last par in Section 2.2 starting with "The logic program (or program) P ..." appears somehow unrelated to the other text. Please rewrite it or consider moving its parts to Section 3.1.

Review #3
Anonymous submitted on 13/Aug/2018
Suggestion:
Minor Revision
Review Comment:

The paper targets the scalability of stream reasoning, focusing on a fragment
of Answer Set Programming (ASP). The paper considers a conceptual architecture
in which a stream query processor filters the input data to reduce its volume,
and the results of this filtering phase are fed into a non-monotonic rule
engine based on ASP. It proposes a technique to partition the input to the
rule engine such that no answers are missed.

The paper is very interesting and the proposed approach appears to be
promising, and complementary to existing techniques that either partition the
set of rules or investigate incremental approaches.

The definitions and demonstrations are rigorous and correct (as far as I could
check), and all the issues of the first submission have been fixed in this
revision.

In my opinion, Section 3 still remains a bit difficult to read and understand,
probably because the definitions and examples are introduced without any
previous discussion on their meaning. My suggestion would be to add some
introductory text that explains the key ideas behind the definitions and
algorithms, before they are formally presented.

It would also be interesting to investigate the benefits of input partitioning
with respect to rule partitioning and incremental reasining, although this is
probably beyond the scope of the paper.

Minor comments:

Page 5: a parallel instantiation algorithms that generate -> a parallel
instantiation algorithm that generates