Dragoman: Efficiently Evaluating Declarative Mapping Languages over Frameworks for Knowledge Graph Creation

Tracking #: 3289-4503

This paper is currently under review
Samaneh Jozashoori
Enrique Iglesias
Maria-Esther Vidal

Responsible editor: 
Katja Hose

Submission type: 
Full Paper
In recent years, there have been valuable efforts to make the creation of RDF knowledge graphs traceable and transparent; extending and applying declarative mapping languages is one example. One challenging step is tracing the procedures that overcome interoperability issues, a.k.a. data-level integration. In most pipelines, data-level integration is performed by ad-hoc programs, which prevents traceability and reusability. Function-based declarative mapping languages such as FunUL and RML+FnO, however, provide formal frameworks with the needed expressiveness: data-level integration, whether data processing or entity alignment, can be defined as functions and integrated into the mappings that perform schema-level integration. Yet combining functions with mappings introduces a new source of complexity that can considerably increase the resources and execution time required. We tackle the problem of efficiently executing mappings with functions and formalize their transformation into function-free mappings. These transformations are the basis of an optimization process that performs an eager evaluation of function-based mapping rules, so that each function is executed once and its results are efficiently reused. The techniques are implemented in a framework named Dragoman, which makes it possible to plan the optimized execution of functions and the materialization of reusable functions. We demonstrate the correctness of the transformations, ensuring that the function-free data integration processes are equivalent to the original ones. The effectiveness of Dragoman is empirically evaluated on 230 testbeds composed of various types of functions integrated with mapping rules of different complexity. The outcomes suggest that evaluating function-free mapping rules reduces execution time in complex knowledge graph creation pipelines composed of large data sources and multiple types of mapping rules. The savings can be up to 75%, suggesting that eagerly executing functions in mapping rules makes these pipelines applicable and scalable in real-world settings.
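The idea of eager evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not Dragoman's implementation: the helper names (`materialize_function`, `to_function_free`, `emit_triples`), the `normalize_gene` function, the sample rows, and the `http://example.org/` IRIs are all hypothetical and chosen only for illustration. The sketch assumes a single-argument function applied to one source column; it runs the function once per distinct input value, stores the results, and then lets a plain, function-free rule read the precomputed column.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Triple = Tuple[str, str, str]


def materialize_function(rows: List[Row], input_column: str,
                         func: Callable[[str], str]) -> Dict[str, str]:
    """Run func once per distinct input value and keep the results."""
    distinct_values = {row[input_column] for row in rows}
    return {value: func(value) for value in distinct_values}


def to_function_free(rows: List[Row], input_column: str, output_column: str,
                     func: Callable[[str], str]) -> List[Row]:
    """Attach the precomputed function output to every row as a new column."""
    lookup = materialize_function(rows, input_column, func)
    return [{**row, output_column: lookup[row[input_column]]} for row in rows]


def emit_triples(rows: List[Row], subject_column: str, predicate: str,
                 object_column: str) -> List[Triple]:
    """A function-free rule: it only reads columns, it never calls a function."""
    return [
        (f"http://example.org/{row[subject_column]}", predicate, row[object_column])
        for row in rows
    ]


# Hypothetical data-level function that records how often it is called.
calls: List[str] = []


def normalize_gene(name: str) -> str:
    calls.append(name)
    return name.strip().upper()


rows = [
    {"id": "1", "gene": " brca1 "},
    {"id": "2", "gene": "tp53"},
    {"id": "3", "gene": " brca1 "},
]

extended = to_function_free(rows, "gene", "gene_norm", normalize_gene)
triples = emit_triples(extended, "id", "http://example.org/gene", "gene_norm")
```

In this toy input the value ` brca1 ` occurs twice, yet `normalize_gene` is called only twice in total (once per distinct value), and the three resulting triples are the same as those a row-by-row evaluation would produce.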