Editorial Board

Editor-in-Chief
Krzysztof Janowicz

Managing Editors
Cogan Shimizu
Eva Blomqvist

Editorial Board
Mehwish Alam
Claudia d’Amato
Stefano Borgo
Boyan Brodaric
Philipp Cimiano
Oscar Corcho
Bernardo Cuenca-Grau
Elena Demidova
Jerome Euzenat
Mark Gahegan
Aldo Gangemi
Anna Lisa Gentile
Rafael Goncalves
Dagmar Gromann
Armin Haller
Pascal Hitzler
Aidan Hogan
Katja Hose
Eero Hyvönen
Sabrina Kirrane
Agnieszka Lawrynowicz
Freddy Lecue
Maria Maleshkova
Raghava Mutharaju
Axel Polleres
Guilin Qi
Marta Sabou
Harald Sack
Christoph Schlieder
Stefan Schlobach
Oshani Seneviratne
Cogan Shimizu
Ruben Verborgh
GQ Zhang

Former Editors-in-Chief
Pascal Hitzler

Editorial Assistants
Michael McCain

Incremental Knowledge Graph Construction from Heterogeneous Data Sources

Submitted by Dylan Van Assche on 12/17/2024 - 09:30

Tracking #: 3790-5004

This paper is currently under review

Authors:

Dylan Van Assche

Julian Rojas

Ben De Meester

Pieter Colpaert

Responsible editor:

Cogan Shimizu

Submission type:

Full Paper

Abstract:

Sharing real-world datasets that are subject to continuous change (creates, updates, and deletes) poses challenges for data consumers, e.g., reconciling historical versions and handling change frequency. This is evident for Knowledge Graphs (KGs) that are materialized from such datasets, where the graph can only be kept synchronized with the original datasets by frequently and fully regenerating the KG. However, this approach is time-consuming, loses history, and wastes computing resources by needlessly reprocessing data. In this paper, we present a KG generation approach that efficiently handles evolving data sources with different data change signaling strategies. We investigate the change signaling strategies observed in real-world data sources, propose corresponding algorithms to detect data changes, and introduce a declarative approach that relies on RML and FnO to materialize and characterize data changes for evolving KGs. Our approach can optionally and automatically publish detected data changes as a Linked Data Event Stream (LDES), relying on the W3C Activity Streams 2.0 vocabulary to describe changes semantically. This way, changes can be communicated to consumers over the Web. We implement our approach in the RMLMapper as IncRML (Incremental RML). We evaluate our approach functionally on a set of test cases, and quantitatively using a modified version of the GTFS Madrid Benchmark (taking change into account) and various real-world data sources (bike-sharing, transport timetables, weather, and geographical). On average, our approach reduces the storage and computing resources needed for generating and storing multiple versions of a KG (up to 315.83x less storage, 4.59x less CPU time, and 1.51x less memory), reducing KG construction time by up to 4.41x.
The performance gains are more evident for larger datasets, while for smaller datasets, the overhead of our approach partially offsets these gains. Our approach reduces the overall cost of publishing and maintaining KGs, which may contribute to the uptake of semantic technologies. Moreover, using LDES as a Web-native publishing protocol means that not only the KG publisher benefits from the concise and timely communication of changes, but also third parties, if the publisher chooses to make the stream public. In the future, we plan to explore end-to-end performance, possible optimizations of our change detection algorithms, and the use of windows to extend our approach to streaming data sources.
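
For readers unfamiliar with the idea of change detection between dataset versions, the following is a minimal illustrative sketch, not the IncRML implementation. It assumes the simplest case, where a source signals no changes itself and two full snapshots must be compared. The function name `detect_changes`, the record layout, and the sample station data are hypothetical; the labels Create, Update, and Delete mirror the activity types of the same names in the Activity Streams 2.0 vocabulary.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: each record is a flat mapping of field -> value,
# and each snapshot maps a unique record key to its record.
Record = Dict[str, str]


def detect_changes(
    previous: Dict[str, Record], current: Dict[str, Record]
) -> List[Tuple[str, str]]:
    """Classify each record key as Create, Update, or Delete between two snapshots."""
    changes: List[Tuple[str, str]] = []
    for key, record in current.items():
        if key not in previous:
            changes.append(("Create", key))  # key is new in this version
        elif previous[key] != record:
            changes.append(("Update", key))  # key exists but content differs
    for key in previous:
        if key not in current:
            changes.append(("Delete", key))  # key disappeared from the source
    return changes


# Illustrative data only (not taken from the paper's evaluation).
old = {"s1": {"name": "Central"}, "s2": {"name": "North"}}
new = {"s1": {"name": "Central Station"}, "s3": {"name": "South"}}
print(detect_changes(old, new))
```

Only the records reported here would need to be mapped again; unchanged records are skipped, which is the intuition behind avoiding a full regeneration of the KG.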

Full PDF Version:

swj3790.pdf

Previous Version:

IncRML: Incremental Knowledge Graph Construction from Heterogeneous Data Sources

Tags:

Under Review

Long-term Stable Link to Resources:

https://doi.org/10.5281/zenodo.14038823
