Survey on semantic data integration approaches: Issues and directions

Tracking #: 2756-3970

This paper is currently under review
Maroua Masmoudi
Mohamed Hedi KARRAY

Responsible editor: Oscar Corcho

Submission type: Survey Article
The spread of new Web technologies has driven organizational transitions that are at the root of the digital revolution, and with them the generation of large amounts of heterogeneous data that use different vocabularies and conceptual schemas. As a result, data resides in many siloed systems and remains largely untapped for integrated operations, insights, and decision-making. To overcome this insufficient exploitation of data, a data integration system is crucial for breaking down data silos and creating a common information space in which data is semantically linked. Data integration is at the heart of the data value chain: it integrates the large amount of heterogeneous acquired data and prepares it for the exploitation phase. Specifically, semantic data integration enriches the integrated data with semantic meaning, which strengthens data exploration through artificial intelligence algorithms. It gives data analysts a comprehensive toolkit for exploring datasets and discovering the knowledge within them. This paper surveys the different generations (ETL, OBDA) of semantic data integration approaches and systems. Specifically, it reviews 29 works belonging to three categories of approaches: materialized, virtual, and hybrid. It aims to identify the most relevant aspects to consider when developing a semantic data integration approach, in order to help analysts and experts select the most appropriate approach for their needs.
Full PDF Version: 
Under Review