Empowering the SDM-RDFizer Tool for Scaling Up to Complex Knowledge Graph Creation Pipelines

Tracking #: 3246-4460

This paper is currently under review
Enrique Iglesias
Maria-Esther Vidal
Samaneh Jozashoori
Diego Collarana
David Chaves-Fraga

Responsible editor: 
Guest Editors Tools Systems 2022

Submission type: 
Tool/System Report
Data has grown exponentially in recent years, and knowledge graphs have gained momentum as data structures for integrating heterogeneous data and metadata. This explosion of data has created many opportunities to develop innovative technologies. Still, it draws attention to the lack of standardization in how data is made available, raising questions about interoperability and data quality. Data complexities such as large volume, heterogeneity, and high duplicate rates affect knowledge graph creation. This work addresses these issues to scale up knowledge graph creation guided by the RDF Mapping Language (RML). For that purpose, we present the SDM-RDFizer, a two-fold solution that addresses these sources of complexity. First, RML triples maps are reordered so that the most selective maps are evaluated first, while non-selective rules are considered last, reducing the number of triples kept in main memory. Second, an RDF compression strategy and novel operators are implemented to avoid generating duplicate RDF triples and to reduce the number of comparisons performed when RML operators are executed between mapping rules. We test our tool on two well-known benchmarks, where it outperforms state-of-the-art RML engines, demonstrating the benefits of the proposed techniques.
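The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the SDM-RDFizer implementation. It assumes that selectivity can be approximated by an estimated number of output triples per triples map, and that the compression strategy resembles dictionary encoding of RDF terms into integers. The names `order_by_selectivity` and `TripleDeduplicator` are hypothetical and introduced only for illustration.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def order_by_selectivity(estimated_sizes: Dict[str, int]) -> List[str]:
    """Return triples-map names, smallest estimated output first.

    Assumption: a map that is expected to produce fewer triples is
    treated as "more selective". Ties are broken by name so the
    ordering is deterministic.
    """
    return sorted(estimated_sizes, key=lambda name: (estimated_sizes[name], name))


class TripleDeduplicator:
    """Keep each triple once, storing integer IDs instead of full strings."""

    def __init__(self) -> None:
        self._term_ids: Dict[str, int] = {}
        self._seen: Set[Tuple[int, ...]] = set()

    def _encode(self, term: str) -> int:
        # Assign the next free integer the first time a term appears.
        return self._term_ids.setdefault(term, len(self._term_ids))

    def add(self, triple: Triple) -> bool:
        """Return True if the triple is new, False if it is a duplicate."""
        key = tuple(self._encode(term) for term in triple)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


# Hypothetical map names and size estimates, used only for this example.
plan = order_by_selectivity({"GeneMap": 1000, "DrugMap": 10, "SampleMap": 100})
dedup = TripleDeduplicator()
emitted = [
    t
    for t in [
        ("ex:d1", "rdf:type", "ex:Drug"),
        ("ex:d1", "rdf:type", "ex:Drug"),  # duplicate, filtered out
        ("ex:d2", "rdf:type", "ex:Drug"),
    ]
    if dedup.add(t)
]
```

In this sketch the duplicate check compares small integer tuples rather than long IRI strings, which is one plausible way that compression could reduce both memory use and comparison cost.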
Full PDF Version: 
Under Review