SpecINT: A framework for data integration over cheminformatics and bioinformatics RDF repositories

Tracking #: 1528-2740

This paper is currently under review
Branko Arsić
Marija Đokić-Petrović
Petar Spalević
Ivan Milentijević
Dejan Rančić
Marko Živanović

Responsible editor: 
Michel Dumontier

Submission type: 
Full Paper
Over the past decade, many research centers and medical institutions have accumulated huge amounts of diverse biological and chemical data, and this trend continues. Their information models, notions, areas of interest, units of measurement, and experimental parameters and conditions differ. Building on the Linked Data vision, many semantic applications have been developed for distributed access to these heterogeneous RDF (Resource Description Framework) data sources. Improvements to these applications have reduced intermediate results and optimized query execution plans. Still, many requests fail, timing out without producing any answer, and queries over different repositories with many data sources are not supported. In this paper, we propose SpecINT, a comprehensive hybrid framework for data integration and federation in semantic query processing over repositories. The novelty of the approach lies in using the coordinates of graph eigenvectors for query join ordering and for translating a directed graph into federated SPARQL queries, instead of data statistics and classical algorithms that are applicable to weighted graphs. With SpecINT, chemists and biologists could benefit greatly by creating a virtually distributed database as a resource for gaining new knowledge about chemical substances and compounds and their natural influence on the environment. In experiments, we demonstrate the potential of our framework on a set of heterogeneous and distributed cheminformatics and bioinformatics data sources.