Continuous Top-k Query Answering over Streaming and Evolving Distributed Data

Tracking #: 1843-3056

This paper is currently under review
Shima Zahmatkesh
Emanuele Della Valle

Responsible editor: 
Oscar Corcho

Submission type: 
Full Paper
Continuously finding the most relevant answer of a query that joins streaming and distributed data is getting a growing attention in recent years. It is well known that, remaining reactive can be challenging, because accessing the distributed data can be highly time consuming as well as rate-limited. In this paper, we consider even a more extreme situation: the distributed data slowly evolves. The state of the art proposed two families of partial solutions to this problem: i) the database community studied continuous top-k queries [1] ignoring the presence of distributed datasets, ii) the Semantic Web community studied approximate continuous query answering over RDF streams and dynamic linked data sets [2] ignoring the specificity of top-k query answering. In this paper, extending the state-of-the-art approaches, we investigate continuous top-k query evaluation over streaming and evolving distributed dataset. We extend the data structure proposed in [1] and introduce Super-MTK+N list, which handle changes in distributed dataset while minimizing the memory usage. To address the query evaluation problem, first, we propose MinTopk+N algorithm, which is an extension of algorithm proposed in [1], in order to handle the changed objects in the distributed dataset and manage them as new arrivals. Then, considering the architectural approach presented in [2] as a guideline, we propose AcquaTop algorithm that keeps a local replica of the distributed dataset. In order to approximate the correct answer, we propose two maintenance policies to update the replica.We provide empirical evidence that proves the ability of the proposed policies to guarantee reactiveness, while providing more accurate and relevant result comparing to the state of the art.
Full PDF Version: 
Under Review