OPTIMA: A Hybrid OBDA System for Efficiently Querying Large Heterogeneous Data

Tracking #: 3249-4463

This paper is currently under review
Dr. Abderrahmane Khiat
Nabil Keskes

Responsible editor: 
Guest Editors Tools Systems 2022

Submission type: 
Tool/System Report
Abstract: 
The current decade is witnessing a remarkable evolution in big data virtualization, in which data is queried on-the-fly against the original data sources without any prior data materialization. Ontology-Based Big Data Access solutions by design use a fixed model, e.g., TABULAR, as the only Virtual Data Model: a uniform schema that is built on-the-fly to load, transform, and join relevant data. However, other data models such as GRAPH or DOCUMENT are more flexible and can thus be more suitable for some common types of queries, such as join or nested queries. Which model suits a given query is, in many cases, hard to predict because it depends on many criteria, such as the query plan, data model, data size, and operations (e.g., join, filter). To address the problem of selecting the optimal virtual data model for various queries on large datasets, we develop OPTIMA. OPTIMA is a framework that (1) builds on the principle of ontology-based data access to enable the querying, aggregating, and joining of large heterogeneous data in a distributed manner using a single query language, SPARQL, and (2) uses a deep learning method to predict the optimal virtual data model from features extracted from SPARQL queries. OPTIMA currently leverages the state-of-the-art Big Data technology Spark, implements two virtual data models, GRAPH and TABULAR, and supports five data sources out of the box: Neo4j, MongoDB, MySQL, Cassandra, and CSV. Extensive experiments show that OPTIMA returns the optimal virtual model with an accuracy of 0.831, thus reducing the query execution time by over 40% for tabular model selection and by over 30% for graph model selection.


This paper was submitted for consideration in the Special Issue "Tools & Systems", and the "Submission type" should be Tool/System Report.

Thanks, this has been corrected.