Linking Women Editors of Periodicals to the Wikidata Knowledge Graph

Tracking #: 2845-4059

Katherine Thornton
Kenneth Seals-Nutt
Marianne Van Renmoortel
Julie M. Birkholz
Pieterjan De Potter

Responsible editor: Special Issue Cultural Heritage 2021

Submission type: Full Paper
Stories are important tools for recounting and sharing the past. To tell a story, one has to put together diverse information about people, places, time periods, and things. We detail here how a machine, through the power of the Semantic Web, can compile scattered and diverse materials and information to construct stories. Using the example of the WeChangEd research project on women editors of periodicals in Europe from 1710 to 1920, we detail how to move from an archive, to a structured data model and relational database, to Wikidata, and finally to the use of the Stories Services API to generate multimedia stories related to people, organizations and periodicals. As more humanists, social scientists and other researchers choose to contribute their data to Wikidata, we will all benefit. As researchers add data, the breadth and complexity of the questions we can ask about the data we have contributed will increase. Building applications that syndicate data from Wikidata allows us to leverage a general-purpose knowledge graph with a growing number of references back to scholarly literature. Using frameworks developed by the Wikidata community allows us to rapidly provision interactive sites that will help us engage new audiences. The process we detail here may be of interest to other researchers and cultural heritage institutions seeking web-based presentation options for telling stories from their data.

Solicited Reviews:
Review #1
By Lydia Pintscher submitted on 21/Aug/2021
Review Comment:

The paper describes the steps taken by the authors to add data about women editors of periodicals to Wikidata and then make use of that data in a web application called WeChangEd Stories.

The authors detail the steps taken from collecting the data, applying for a new property in Wikidata, getting approval for an import bot, importing data, augmenting and interconnecting the imported data and finally querying it to find interesting new information and make use of the data in their web application.

The paper stands out for two reasons.
First, the authors clearly explain the benefits of opening up research data in the humanities via Wikidata for Wikidata and the world at large. But equally, they highlight the benefits they themselves received from adding this data to Wikidata in the form of error correction by the Wikidata Community, wider reach of their research, augmentation of their data through a myriad of other data on Wikidata and being able to rely on an ecosystem of tools for further work with the data. This mutual benefit is often overlooked or not described with such clear examples as in this paper.
Second, the authors describe the individual steps of getting data into Wikidata, including often overlooked parts like using the EditGroups tool. This will make it easier for future researchers to follow similar processes for their own data.

Review #2
By Filip Ilievski submitted on 21/Aug/2021
Minor Revision
Review Comment:

Inspired by the underrepresentation of women in Wikidata and the richness of information in social science archives, this paper proposes to link women editors of historical periodicals to Wikidata. The method of aligning the two resources is described in detail. The paper then uses a web interface to allow non-semantic-web users to navigate the data, and makes a case for inclusion of other social science datasets into Wikidata, in line with Wikidata's development plan and a goal to increase diversity.

The strong points of the paper are that: 1) it addresses a real problem with representativeness of knowledge, in terms of both gender and time; 2) it establishes a natural two-way connection between Wikidata and social sciences, where each can benefit the other; 3) it sets an example for future projects that integrate social science and the semantic web; 4) the online demo is really nice.

This version is notably improved, and it addresses my main concerns, so I'd vote for acceptance. Suggestions for the final version of the paper:
* It would be good to indicate how well the OpenRefine software performed the alignment between the two sources, and how many human edits were needed during the validation stage.
* Tables 1-3 are informative; it would also help to indicate the number of statements for each property where this number differs across properties of the same type.
* Section 7 would benefit from a use case-driven (or question-driven) example. For instance, what questions might a user have that would benefit from exploring the page of Lady Mary Wortley Montagu?
* Note that table captions should be placed above the table. Figure captions should be placed below (the latter is done correctly).
* Typos: 'snipit' -> 'snippet', 'as seen in 4' -> 'as seen in Figure 4'

Review #3
Anonymous submitted on 18/Sep/2021
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

This review is an update to my earlier review, in which I recommended minor revisions. The paper is interesting and well written. I still find the emphasis on LOD unnecessary, but this is a minor issue and the other reviewers have not objected to it, so I will not insist on revising this aspect of the paper. The other minor issues have been corrected, and the paper is much improved in response to the changes requested in the other reviews.