iLOD: Preserving the Linked Open Data Cloud through the Interplanetary File System

Tracking #: 2686-3900

This paper is currently under review
John McCrae
Jamal Nasir

Responsible editor: 
Guest Editors Advancements in Linguistics Linked Data 2021

Submission type: 
Full Paper
One of the main challenges that World Wide Web and Semantic Web applications face is infrastructural load in terms of availability, immutability, and security. The Linked Open Data (LOD) cloud is failing this challenge due to the brittleness of its decentralisation and a growing obsolescence of datasets. In this work, we present iLOD: a dataset linking technique and a peer-to-peer decentralized storage infrastructure built on the InterPlanetary File System (IPFS). As a dataset sharing system, iLOD leverages content-based addressing to support a resilient internet and can speed up the web by retrieving nearby copies. We empirically analyze and evaluate the availability limitations of the LOD cloud and propose a distributed system for storing and accessing linked datasets without requiring any centralized server. Of the approximately 200,000 datasets we pre-processed, more than 90,000 were added to iLOD together with their metadata. The dataset linking algorithm found 719,253 links between these datasets, and around 32% of the datasets are linked to ten or more other datasets. Our dataset clustering algorithm identified 12,324 clusters, with 67% of the clusters containing six or more datasets.
Full PDF Version: 
Under Review