Instance Level Analysis on Linked Open Data Connectivity for Cultural Heritage Entity Linking and Data Integration

Tracking #: 3026-4240

Authors: 
Go Sugimoto

Responsible editor: 
Guest Editors KG Validation and Quality

Submission type: 
Full Paper
Abstract: 
In cultural heritage, many projects execute Named Entity Linking (NEL) through global Linked Open Data (LOD) references in order to identify and disambiguate entities from their local datasets. It allows users to obtain extra information and contextualise the data with it. Thus, the aggregation and integration of heterogeneous LOD are expected. However, such development is still limited partly due to data quality issues. In addition, analysis on the LOD quality has not sufficiently been conducted for cultural heritage. Moreover, most research on data quality concentrates on ontology and corpus-level observations. This paper examines the quality of the eleven major LOD sources used for NEL in cultural heritage with an emphasis on instance-level connectivity and graph traversals. Standardised linking properties are inspected for 100 instances/entities in order to create “traversal maps”. Other properties are also assessed for quantity and quality. The outcomes suggest that the LOD is not fully interconnected and centrally condensed; the quantity and quality are unbalanced. Therefore, they cast doubt on the possibility to automatically identify, access, and integrate known and unknown datasets. This implies the need for LOD improvement, as well as the NEL strategies to maximise the data integration.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
By Miel Vander Sande submitted on 08/Mar/2022
Suggestion:
Accept
Review Comment:

The paper's contents and presentation have significantly improved after revision by the authors. It reads a lot more fluently and feels less verbose, despite being as long as the previous version. The research objectives have been toned down to fit the analysis better and, as a result, the reported research is much more sound. The authors addressed the reviewers' comments and questions extensively and properly in the accompanying letter, although there are of course always things that are up for debate. All things considered, I think the paper can be accepted in its current form.

I do suggest that the author revise section 3.2. The research question is not clearly stated, and it is therefore not clear what the reported research's success criteria are/were. The narrative on "Who?", "What?", "When?", "Where?" is very vague and verbose (I think the ontology context is relevant, but not here). Just state your exact RQs (e.g. something like "Can agents be discovered using graph traversal in eleven major LOD sources?") or hypotheses. The RQs, or what are presented as such, are also neither clearly answered nor reprised in the conclusion.

There are also still some language errors and typos.
- "there is no doubt that we can automatically traverse graphs and aggregate LOD information ..." would be better as passive: "there is no doubt LOD information can be automatically traversed and aggregated"
- "Therefore, a close observation of instances are needed." -> "is needed"
- "3. Objectivities" -> do you mean objectives? Both are possible, but after reading the section, I'm not sure. Also, the term "validity" or "External validity" is more common I think.
- "...main entity with sting matching, ..." -> string matching
- You can remove the prefixes of the Turtle examples, or only state them once in the intro.
All in all, the author is a verbose writer - you can also tell from the response letter, so I think the text can always be made more concise (but it's ok now).

The author has understood my remark on instance similarity. If the end goal is to be able to fit everything in a SPARQL query (btw, it could help the scope if this was clearly stated in the paper), I can see why you need to have standard properties. I just wonder how realistic sticking to graph traversal is in such a scenario, but that's also what this paper unfolds.
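
To make the remark on graph traversal over standard properties concrete, the following is a minimal sketch and not the author's method. It assumes a hypothetical set of "standard" linking properties and a handful of made-up toy triples (the `src_*` identifiers are placeholders, not real LOD URIs), and follows only outgoing links breadth-first to build a small traversal map.

```python
from collections import deque

# Assumption: which properties count as "standard" linking properties.
LINK_PROPERTIES = {"owl:sameAs", "skos:exactMatch", "skos:closeMatch"}

# Toy triples with made-up identifiers, for illustration only.
TRIPLES = [
    ("src_a:mozart", "owl:sameAs", "src_b:Q1"),
    ("src_b:Q1", "skos:exactMatch", "src_c:n001"),
    ("src_c:n001", "rdfs:seeAlso", "src_d:x9"),   # not in LINK_PROPERTIES
    ("src_e:e42", "owl:sameAs", "src_a:mozart"),  # inbound link only
]


def traversal_map(start, triples, link_properties=LINK_PROPERTIES):
    """Follow outgoing linking properties breadth-first from `start`.

    Returns a dict mapping each reachable entity to its hop count.
    """
    # Index outgoing links, keeping only the chosen linking properties.
    outgoing = {}
    for subject, predicate, obj in triples:
        if predicate in link_properties:
            outgoing.setdefault(subject, []).append(obj)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in outgoing.get(node, []):
            if target not in hops:
                hops[target] = hops[node] + 1
                queue.append(target)
    return hops


reachable = traversal_map("src_a:mozart", TRIPLES)
```

In this toy graph, `src_d:x9` is never reached because it is attached with a non-standard property, and `src_e:e42` is never reached because it only links inward. Both are small-scale instances of the kind of gap that restricting traversal to standard properties can produce.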

Review #2
By Herminio Garcia-Gonzalez submitted on 18/Mar/2022
Suggestion:
Accept
Review Comment:

First of all, I would like to thank the author for taking the reviewers' comments and suggestions into consideration. After re-reading the article and looking at the author's answers, I think the paper is more understandable, clear and self-explanatory. In addition, the terminology used is now better supported and unambiguous, making the motivation and conclusions easier to follow. In particular, I really liked the take-home lessons for data consumers and producers, as they can be really helpful for new LOD graphs to be published.

That being said, I do not see further issues for its publication, and I only want to note some minor typos and suggestions that can be taken into account for the final version:

#Introduction

As Tomasuzuk and Hayland-Wood [12] indicate that RDF → As Tomasuzuk and Hayland-Wood [12] indicate, RDF

perhaps most effort has put into → perhaps most effort has been put into

As such, the real value of LOD → As such, the full potential of LOD

what level of information can we find? [...] and content patterns for different types of entities? (I would structure this as research questions, i.e., RQ1, RQ2, etc. Then you can refer to them when you answer them throughout the paper.)

Section 2 deals with related research → Section 2 explores the related research

#Objectivities and Methodology → Objectives and Methodology

the process of defining objectivities → the process of defining objectives

The Group 1 primary [...] (Groups are mentioned but it is not clear if they are used further and why they are grouped).

of the main entity with sting matching → of the main entity with string matching

#Linked Open Data Analysis

Traversability is better than other categories → Traversability for places is better than in other categories

It seems that it extracted a great deal of data from Wikpedia → It seems that it extracted a great deal of data from Wikipedia

#Conclusion → Conclusions (As the section has different subsections, I think the plural is better)

By removing obstacles found in this article, LOD traversing and date integration become more feasible for the end-users with help → By removing the obstacles found in this article, LOD traversing and data integration become more feasible for end-users with help