A smart data case study using Wikidata to expand access and discovery in the Schoenberg Database of Manuscripts

Tracking #: 3488-4702

L.P. Coladangelo
Lynn Ransom

Responsible editor: 
Guest Editors Wikidata 2022

Submission type: 
Tool/System Report
This case study explored the results and lessons learned from the initial contribution of over 10,000 name identifiers to Wikidata and considered the use of Wikidata for enhancement of data related to premodern manuscripts. Wikidata, as a Linked Open Data (LOD) repository and hub, was used in the semantic enrichment of a particular dataset from the Schoenberg Database of Manuscripts (SDBM) Name Authority, yielding unique insights only possible from linking data from Wikidata and the SDBM. Mapping named entity metadata related to premodern manuscripts from one context to another was also explored, with a particular emphasis on determining property alignments between the linked data models of the SDBM and Wikidata. This resulted in a workflow model for LOD management and enhancement of name authority data in library, archive, and museum (LAM) contexts to encourage the manuscript studies community to contribute further data to Wikidata. This research demonstrates how the application of smart data principles to an existing dataset can address knowledge gaps related to people traditionally underrepresented in the digital record and opens new possibilities for access and discovery.

Reject (Two Strikes)

Solicited Reviews:
Review #1
Anonymous submitted on 18/Jul/2023
Major Revision
Review Comment:

The authors have enriched the paper with a lot of information, and I appreciate the work they performed; the article benefited greatly from it. However, for future papers I suggest highlighting the changes in the manuscript itself and not only in the cover letter, since finding the revised passages in the paper was challenging.
Regarding Section 3.3, the authors added examples of other External IDs present in Wikidata as well as other similar Wikidata projects, but they have not clarified how their project is positioned relative to other projects that reached the same aim using different IDs or VIAF IDs. Moreover, the advantages of adopting the proposed approach should be described and highlighted in the manuscript.

This contribution is an example of integration between different KBs, and this work indubitably enriches Wikidata. At the same time, the proposed approach is basic from both a technical and theoretical point of view.
To sum up, if the authors add the requested information to the paper and properly answer all the questions of the other two Reviewers, and if the Editor believes that this paper fits the aims of the Special Issue, I'm willing to reconsider my first decision.

Review #2
Anonymous submitted on 13/Nov/2023
Review Comment:

Short Summary

This paper describes the activities undertaken to link a portion of an existing open access dataset to Wikidata. This dataset is called the Schoenberg Database of Manuscripts (SDBM), and it contains detailed information related to pre-modern manuscripts. The proponents facilitated the creation of identifier properties in Wikidata, P9756 (SDBM Agent ID) and P9757 (SDBM Place ID), to enable linking of SDBM entities. The paper contains descriptions of the reconciliation process, such as the selection of fields for reconciliation and their corresponding success rates. To illustrate the value of semantic enrichment, the proponents provided several SPARQL queries to answer questions which would not have been possible without explicit links to Wikidata entities.
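
For readers unfamiliar with this kind of query, the following is a minimal sketch of how items carrying both P9756 and a VIAF ID (Wikidata property P214) could be requested from the public Wikidata Query Service. It is not one of the paper's own queries; the function names and the choice of returned variables are assumptions made for illustration, and the code only builds the query and request URL without sending anything.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_sdbm_viaf_query(limit=10):
    """Build a SPARQL query for items that have both P9756 and P214 (VIAF ID).

    Hypothetical helper for illustration; the wdt: prefix is predefined
    by the Wikidata Query Service, so no PREFIX line is needed there.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT ?item ?sdbmId ?viafId WHERE {\n"
        "  ?item wdt:P9756 ?sdbmId .\n"
        "  ?item wdt:P214 ?viafId .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query):
    """Return a GET URL that asks the endpoint for JSON results."""
    return WIKIDATA_SPARQL_ENDPOINT + "?" + urlencode(
        {"query": query, "format": "json"}
    )


if __name__ == "__main__":
    print(build_sdbm_viaf_query(5))
    print(build_request_url(build_sdbm_viaf_query(5)))
```

The resulting URL can be fetched with any HTTP client; joining on further properties of the matched items is what would yield the cross-dataset answers the summary mentions.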

This paper was submitted as a Tool/System Report. However, the work provides neither a tool nor a system related to the Semantic Web. If the paper provided a thorough description of the entities and their properties in SDBM as an authority dataset, it could provide more value if submitted under Dataset Descriptions or Descriptions of ontologies.

Strong Points:
- Facilitated the creation of Wikidata properties relevant to a pre-modern manuscripts dataset.
- Provided SPARQL Queries to illustrate semantic enrichment.
- The text reads well.

Weak Points:
- The paper describes neither a tool nor a system.
- The literature review is lengthy but lacks focus. Wikidata is described rather thoroughly, but the literature pertaining to semantic enrichment of authoritative datasets is not sufficiently covered.
- The SDBM Data Model does not adhere to Semantic Web naming conventions (i.e., class names are not capitalized [sdbm:entries], and class names are used as prefixes to property names [sdbm:entries_*]).
- The ER Diagram is incomplete (i.e., cardinality notation is missing and relationships are not illustrated).

Minor Comments:
- Some parts of the text are repetitive.
- Layout of tables could be optimized to combine Tables 1-10; white space on page 9.

5: very high, 4: high, 3: good, 2: low, 1: very low

Quality (1-5): N/A (Neither a tool nor a system)
Importance (1-5): 2 (Marginal: it does not introduce a novel method. Reconciliation on identifier and type fields, such as VIAF ID, is trivial.)
Relevance (1-5): 4 (Entity reconciliation and dataset enrichment are very relevant in cultural heritage datasets.)
Impact (1-5): 2 (The paper’s contribution is limited to the creation of additional Wikidata properties relevant to SDBM.)
Readability (1-5): 3 (The paper reads well.)

Overall Impression Score: 44 (Total Score 11 = 2+4+2+3, multiplied by 100/25)
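
The arithmetic behind this score can be checked with a short sketch. The function name is hypothetical; it assumes, as the formula above does, that the N/A Quality rating is left out of the sum while the denominator stays at 25 (five criteria with a maximum of 5 each).

```python
def overall_impression(scores, max_total=25):
    """Scale the summed criterion scores to a 0-100 range.

    Hypothetical helper: `scores` holds only the criteria that were rated,
    while `max_total` keeps the full 5 x 5 denominator used in the review.
    """
    total = sum(scores)
    return total * 100 / max_total


if __name__ == "__main__":
    # Importance, Relevance, Impact, Readability (Quality is N/A).
    print(overall_impression([2, 4, 2, 3]))  # prints 44.0
```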

Reviewer's confidence: 4
4: (high) Quite sure. I tried to check the important points carefully and am familiar with related work. It’s unlikely, though conceivable, that I missed something that should affect my ratings.