Link maintenance for integrity in linked open data evolution: literature survey and open challenges

Tracking #: 2468-3682

Authors: 
Andre Regino
Julio Cesar dos Reis
Rodrigo Bonacin
Ahsan Morshed
Timos Sellis

Responsible editor: 
Oscar Corcho

Submission type: 
Survey Article
Abstract: 
RDF data has been extensively deployed to describe various types of resources in a structured way. Links between data elements described by RDF models form the core of the Semantic Web. The rising amount of structured data published in public RDF repositories, also known as Linked Open Data, demonstrates the success of the global and unified dataset proposed by the vision of the Semantic Web. Nowadays, semi-automatic algorithms build connections among these datasets by exploring a variety of methods. Interconnected open data demands automatic methods and tools to maintain its consistency over time. Updating linked data is considered a key process due to the evolving nature of such structured datasets. However, data-changing operations might affect well-formed links, which makes it difficult to maintain the consistency of connections over time. In this article, we present a thorough survey that provides a systematic review of the state of the art in link maintenance in the linked open data evolution scenario. We conduct a detailed analysis of the literature to characterise and understand methods and algorithms responsible for detecting, fixing and updating links between RDF data. Our investigation provides a categorisation of existing approaches and describes and discusses existing studies. The results reveal an absence of comprehensive solutions suited to fully detect, warn about and automatically maintain the consistency of linked data over time.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
By Mikel Emaldi Manrique submitted on 22/May/2020
Suggestion:
Accept
Review Comment:

(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.

In this paper, the authors present a survey of methods and algorithms responsible for detecting, fixing, and updating links between LOD datasets. They formalize the problem of broken links appropriately, and they present and categorize different works related to the topic following a well-explained methodology. In this resubmission, I consider that the authors have adequately addressed all the proposed suggestions.

(2) How comprehensive and how balanced is the presentation and coverage.

In this resubmission, the authors have performed a deeper analysis of each of the solutions covered by the survey. They have clearly explained the rules followed to include or exclude a work from the survey, and they have properly improved the discussion section of the paper.

(3) Readability and clarity of the presentation.

They have fixed all the major issues regarding the clarity and readability of the presentation. In Section 3.1, I would replace "papers noncompliant with international standards" with "papers noncompliant with academic best-practices", or include a reference to those "international standards".

(4) Importance of the covered material to the broader Semantic Web community.

This survey tackles a topic of interest to the community that has not yet been fully solved, as there appears to be no tool for detecting and/or fixing broken links in a fully automatic manner.

Review #2
Anonymous submitted on 26/Jun/2020
Suggestion:
Accept
Review Comment:

This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.

The authors have satisfactorily addressed almost all my concerns about the manuscript in their revision, through the addition of new sections, which makes the comparative analysis easier to follow.

The authors provided a convincing argument about the absence of an experimental analysis of the compared tools.

I have just some minor remarks on the revised manuscript:
- For the category of “Link management mechanisms”, it is not clear why the authors restricted the study to only 5 works (in Table 15), especially in the case of PARIS, which does not consider data evolution.
- I would suggest a minor reorganization of Section 4: keep subsection 4.1 in the section and rename it “4. Publication analysis”. Then create a new section, “5. Link maintenance approaches”, and move subsections 4.2 to 4.9 into it.
- In Section 5.1 (comparative analysis), I would suggest keeping the original names of the systems rather than using VERSIO-1, HYBRID-1, …, and adding the references to Table 18.
- In Section 4.6, in the paragraph beginning “Continuing the work in Vesse …”, there are two spelling errors to be fixed.

- I believe the manuscript can be accepted for publication, contingent on the authors making these minor corrections.