RDF Graph Partitioning: Techniques and Empirical Evaluation

Tracking #: 2187-3400

This paper is currently under review
Adnan Akhter
Muhammad Saleem
Axel-Cyrille Ngonga Ngomo

Responsible editor: Guest Editors EKAW 2018

Submission type: Full Paper
Over the past few years, RDF data sources have grown enormously in both number and volume. As RDF datasets get bigger, the system's storage capacity comes under strain, and the need to improve the scalability of RDF storage and querying solutions arises. Partitioning the dataset is one solution to this problem. Various graph partitioning techniques exist. However, it is difficult to choose the most suitable partitioning (in terms of query performance) for a given RDF graph and application. To the best of our knowledge, no detailed empirical evaluation of the performance of these techniques exists. This paper presents an empirical evaluation of RDF graph partitioning techniques using real-world datasets and real-world benchmark queries selected with the FEASIBLE benchmark generation framework. We evaluate the selected RDF graph partitioning techniques in terms of query runtime performance, partitioning time, and partitioning imbalance. In addition, we compare their performance with centralized storage solutions, i.e., no partitioning at all. Our results show that centralized storage of the complete datasets (no partitioning) generally leads to better query runtime performance than partitioned storage. However, in specific cases partitioning improves performance over the centralized solution. Hence, general graph partitioning techniques may not lead to better performance when applied to RDF graphs. Therefore, clustered RDF storage solutions should take into account the properties of RDF and Linked Data as well as the expressive features of SPARQL queries when partitioning a given dataset among multiple data nodes.
Full PDF Version: Under Review