Review Comment:
The paper describes a study on the connectivity and traversability of linked open data published and/or used in the cultural heritage domain. The analysis is carried out by investigating ten prominent data sets, some of them generic (e.g. DBpedia, YAGO, Wikidata) and others more specific to cultural heritage (e.g. VIAF, the Getty vocabularies). The objective is to determine whether these datasets can be traversed in an automated manner - i.e. by following only explicitly defined links via owl:sameAs, rdfs:seeAlso, skos:exactMatch and schema:sameAs - in order to potentially create aggregated data views for instances included in different datasets, which is one of the premises of the linked data vision.
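For clarity, the kind of automated traversal the paper targets can be illustrated with a minimal sketch: starting from one entity URI, follow only explicitly declared equivalence/see-also links to collect co-referent URIs across datasets. All URIs and triples below are illustrative placeholders, not data taken from the paper.

```python
from collections import deque

# Link predicates the paper treats as traversable (prefixed names for brevity).
SAMEAS_PREDICATES = {"owl:sameAs", "schema:sameAs", "skos:exactMatch", "rdfs:seeAlso"}

# A toy in-memory triple set; in practice these triples would be obtained
# by dereferencing each URI over HTTP.
triples = [
    ("dbpedia:Rembrandt", "owl:sameAs", "wikidata:Q5598"),
    ("wikidata:Q5598", "skos:exactMatch", "viaf:64013650"),
    ("viaf:64013650", "rdfs:seeAlso", "getty:ulan500011051"),
]

def traverse(start, triples):
    """Breadth-first search over explicitly declared links, collecting
    every URI reachable from `start` - the basis for an aggregated view."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p in SAMEAS_PREDICATES and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen
```

If any dataset in the chain omits its outgoing link, the traversal silently stops there - which is exactly the connectivity gap the study measures.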
Positive points of this paper include:
- The subject of the study is timely and interesting. The linked data initiative and the corresponding standards have existed for some time, so it is interesting to examine the level and manner of their adoption in cultural heritage. On the other hand, there are still ongoing discussions on best practices for linked data publishing, so it is worthwhile to take a look at the current state of affairs.
- The methodology is described in an appropriate manner, including assumptions and limitations, in the first sections of the paper.
- The study made an effort to cover the subject from different perspectives: in addition to a statistical analysis of the connectivity of the datasets, a more qualitative, manual analysis at the instance level is also carried out.
Weak points:
- The English in the sections detailing most of the analysis is poor. I would definitely recommend thorough proof-reading. I assume that some strange terminology, such as "logical framework" and "reciprocal vector links", is also a result of this(?)
- The analysis is limited in coverage. Most importantly, it does not include cultural artifacts such as works of art, monuments and/or books. Although the author presents this as a conscious choice when describing the scope of the paper - in footnote 3 - and it is certainly impossible to cover every type of artifact, I still believe that the above artifacts are too important for cultural heritage to be completely ignored. Similarly, the absence of Europeana, a major source of linked data in the domain, is an important limitation - even if the corresponding linked data is only available in JSON or the API is in alpha version, which are mainly technical reasons.
Minor points:
- The first section of the paper has the title "Background - Linked Open Data Quality" - I would replace that with "Introduction".
- Section 2.1. "Related Work" is part of section 2 "Methodology". That sounds strange. I would make "Related work" a completely separate section.
Typos, etc. (just a few of them; a really thorough proof-reading is required):
***Section 1***
distributed data into connected global graphs -> into a connected global graph
which facilitate -> which would facilitate
logical framework:?? I don't understand this
One of the problems is the quantity and quality of data -> ?? Two of the most important issues are the limited quantity and low quality of data ??
***Section 2.1***
is the key for the users to navigate themselves in the network -> indicates whether users can navigate in the network
***Section 2.2***
This enables the users' automation of LOD traversals:?? I don't understand this
***Section 2.3***
Provide proper citations for Europeana EDM, CIDOC-CRM, FRBR, DCMI
***Section 2.4***
a various type of charts -> various types of charts
***Table 2***
4 dual identify data -> ??? 4 duplicated entities ???
10 dual identify data -> ??? 10 duplicated entities ???
***Section 3.2***
8 other links are found which bound for outside the 10 data sources -> ??? 8 other links are found to data sources that are not included in this study ???
and average (58.8) of the whole entities -> ??? and average (58.8) of the whole set of entities ???
understandable that Getty -> understandable since Getty
***Section 3.3***
Apart from DBpedia -> In addition to DBpedia
are not widely recognised -> are not widely used
no sources links -> no sources link
A little deviation -> A small deviation
Compare to -> Compared to
clearly exposes -> clearly illustrates
Unlike agents, Wikipedia is connected by YAGO -> Unlike for agents, Wikipedia is connected to YAGO
access detail information -> access detailed information
the semantic of rdfs:seeAlso is weak: ?? I don't understand this
A typical case of an entity -> A typical entity
***Section 3.4***
The economy of the creation of date entities may show serious issues: ?? I don't understand
***Section 3.5***
yet even more idiosyncratic than other entities: ?? I don't understand
***Fig. 5***
The amount of outgoing links to 10 data sources found in 20 agent entities -> The amount of outgoing links to the 10 data sources, found in the 20 agent entities examined in this study
***Section 3.7***
The lowest source VIAF still hold over 37.2% -> The lowest source in terms of links to the other datasets, VIAF, still holds...
The statistics indicate the closed and close connections of 10 data sources..: ?? I don't understand
[11] note -> Ding et al. [11] note
Overall percentage is expectedly low -> The overall percentage is, unsurprisingly, low
What is a content property???
***Section 4.1***
for the representative entities for humanities research -> for representative entities of humanities research
***Section 4.2***
the first choice should be given to the standard properties -> standard properties should be preferred
reciprocal links are needed with care -> reciprocal links need to be added with care
reciprocal vector links:?? what is this?