Abstract:
Recent advancements in declarative knowledge graph generation have led to the development of multiple mapping
languages, their various versions, and different mapping engines that can interpret these languages and execute the mapping
process. The field has progressed to the extent that current studies are now more focused on optimizing the knowledge graph
generation process. Although different mapping engines share the common functionality of generating knowledge graphs from
heterogeneous data sources, sharing the various optimization techniques and features of these engines remains challenging due to
the lack of formal operational semantics for the general mapping processes. A set of algebraic mapping operators can provide the
necessary operational semantics for general mapping processes, establish a theoretical foundation for mapping languages, and
facilitate the introduction and evaluation of a compliant implementation that is capable of interpreting and executing multiple
mapping languages. In this paper, we propose such an algebra based on the SPARQL algebra. This allows us to maximally reuse
established definitions, and further bridge the world of knowledge graph generation with query engines. To show that our
work is not limited to a single mapping language, we translated the mapping languages ShExML and RML into our mapping
plan composed of algebraic mapping operators. The results of our completeness evaluation show that our algebraic operators
cover the operational semantics of RML and partially cover those of ShExML. To fully cover ShExML, further analysis into ShExML’s
concise operational semantics is needed (e.g. for joining data from two input sources). In the performance evaluation, our proof-of-concept algebraic mapping engine showed a consistent memory usage of around 500 MB across the different workloads and
achieved second place in the Knowledge Graph Construction Workshop’s performance challenge. Algebraic mapping operators
decouple mapping engines from the mapping languages, enabling multilingual mapping engines. Furthermore, the mapping
plan can incorporate optimization techniques as a separate process from the mapping itself, allowing us to benefit from state-of-the-art mapping process optimizations. The proposed set of algebraic mapping operators will lay the foundation for future
studies on the theoretical analysis of complexity and expressiveness of mapping languages, and will provide consistency in the
execution semantics of mapping engines. In addition, the alignment of our algebra with SPARQL will enable further research
into advanced methods such as virtualization, enabling heterogeneous data querying.