Abstract:
Data exploration and visualization systems are of great importance in the Big Data era, in which the volume and
heterogeneity of available information make it difficult for humans to manually explore and analyse data. Most traditional systems
operate in an offline way, limited to accessing preprocessed (static) sets of data. They also restrict themselves to dealing
with small datasets, which can be easily handled with conventional techniques. However, the Big Data era has brought
a great number and variety of large datasets that are dynamic in nature; most of them offer APIs or query endpoints
for online access, or deliver their data as a stream. Therefore, modern systems must address the challenge of on-the-fly
scalable visualizations over large dynamic sets of data, offering efficient exploration techniques, as well as mechanisms for
information abstraction and summarization. Further, they must take into account different user-defined exploration scenarios
and user preferences. In this work, we present a generic model for personalized multilevel exploration and analysis over large
dynamic sets of numeric and temporal data. Our model is built on top of a lightweight tree-based structure which can be efficiently
constructed on-the-fly for a given set of data. This tree structure aggregates input objects into a hierarchical multiscale
model. We define two versions of this structure that adopt different data organization approaches, each well suited to the exploration and
analysis context. In the proposed structure, statistical computations can be efficiently performed on-the-fly. Considering different
exploration scenarios over large datasets, the proposed model enables efficient multilevel exploration, offering incremental construction
and prefetching via user interaction, and dynamic adaptation of the hierarchies based on user preferences. A thorough
theoretical analysis is presented, illustrating the efficiency of the proposed model. The proposed model is realized in a web-based
prototype tool, called SynopsViz, that offers multilevel visual exploration and analysis over Linked Data datasets. Finally, we
provide a performance evaluation and an empirical user study employing real datasets.
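
As a rough illustration of the kind of tree-based aggregation described above, the following minimal sketch groups numeric values into a hierarchy of intervals and derives each node's statistics from its children. It is not the structure defined in this work: the names `Node` and `build`, the equal-width range partitioning, and the fixed `degree` and `depth` parameters are all assumptions made here for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One interval [lo, hi) of the hierarchy with aggregate statistics."""
    lo: float
    hi: float
    count: int = 0
    total: float = 0.0
    children: List["Node"] = field(default_factory=list)

    @property
    def mean(self) -> Optional[float]:
        # Mean is derived from the stored count and sum; None for empty nodes.
        return self.total / self.count if self.count else None


def build(values, lo, hi, degree=2, depth=2):
    """Hypothetical helper: recursively split [lo, hi) into `degree`
    equal-width intervals, `depth` levels deep (an illustrative assumption)."""
    inside = [v for v in values if lo <= v < hi]
    node = Node(lo, hi)
    if depth == 0:
        # Leaf: statistics are computed directly from the data objects.
        node.count = len(inside)
        node.total = float(sum(inside))
        return node
    width = (hi - lo) / degree
    for i in range(degree):
        child = build(inside, lo + i * width, lo + (i + 1) * width, degree, depth - 1)
        node.children.append(child)
        # Internal node: statistics are combined from the children,
        # without revisiting the underlying data.
        node.count += child.count
        node.total += child.total
    return node


root = build([1, 2, 3, 10, 12, 15], lo=0, hi=16, degree=2, depth=2)
for child in root.children:
    print(child.lo, child.hi, child.count, child.mean)
```

Storing a count and a sum per node is one simple way to let coarser levels report statistics by merging their children, which is the property that makes drill-down and roll-up cheap in a sketch like this one.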