Can you trust Wikidata?

Tracking #: 3538-4752

Authors: 
Veronica Santos
Daniel Schwabe
Sérgio Lifschitz

Responsible editor: 
Guest Editors Wikidata 2022

Submission type: 
Full Paper
Abstract: 
In order to use a value retrieved from a Knowledge Graph (KG) for some computation, the user should, in principle, ensure that s/he trusts the veracity of the claim, i.e., considers the statement as a fact. Crowd-sourced KGs, or KGs constructed by integrating several different information sources of varying quality, must be used via a trust layer. The veracity of each claim in the underlying KG should be evaluated, considering what is relevant to carrying out some action that motivates the information seeking. The present work aims to assess how well Wikidata (WD) supports the trust decision process implied when using its data. WD provides several mechanisms that can support this trust decision, and our KG Profiling, based on WD claims and schema, elaborates an analysis of how multiple points of view, controversies, and potentially incomplete or incongruent content are presented and represented.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Tianyi Li submitted on 14/Oct/2023
Suggestion:
Minor Revision
Review Comment:

The paper discusses the caveats in WD w.r.t. supporting a "trust layer" that determines whether a WD statement can be considered veracious and up-to-date. The authors analysed a range of resources that WD provides as potential inputs to a "trust layer", including explicit records of disagreement, data incompleteness, constraint violations w.r.t. required qualifiers, ranking, incongruences and provenance. They concluded with a warning that WD is underequipped to support this trust process without external assistance.

This paper has seen tremendous improvements over the previous draft, especially in terms of presentation clarity (e.g. added discussions at the beginning of section 2 and the end of section 3), although a few presentation issues remain, as listed below. Overall I believe the findings in this paper are helpful for the community; the paper could be further improved with a few changes and with answers to a few more questions (see "Questions to the authors").

List of presentation issues:
- page 3, line 27: delete "either"
- page 6, line 3: "controversy" is not the focus of this subsection; consider putting it in a footnote rather than in the first paragraph.
- page 11, line 6: where does 21.00% come from? Please further explain in text.
- page 11, line 14: the number of violations as a proportion of the claims WITH CONSTRAINTS would be more straightforward than as a proportion of ALL claims.
- page 17, line 47: "constraints violation" -> "constraint violations"
- page 18, line 9: "trusty layer" -> "trust layer"
- page 19, line 48: "chose" -> "choose"

Questions to the authors:
- page 2, line 46: you said the values could be context-dependent; does this mean the predicates themselves are not well defined? (e.g. biological mother vs. mother in a social sense)
- page 9, line 20: any idea why there are disproportionately many "astronomical filters" in WD?
- page 13, line 40: suppose these values are all true at the same time? Can you distinguish the mutually exclusive statements?
- page 16, line 42: any idea how we could make them more popular?

Review #2
By Elisavet Koutsiana submitted on 23/Oct/2023
Suggestion:
Accept
Review Comment:

The paper has radically improved. It reads better and is clear regarding the Wikidata background, methodology, results and discussion. The authors have addressed all my comments. I understand what the KG Profiling is. Furthermore, the narrative behind the study of incompleteness, incongruences, ranking, references, and provenance is straightforward. I recommend this paper for publication.

Review #3
Anonymous submitted on 30/Oct/2023
Suggestion:
Accept
Review Comment:

The minor corrections have been applied.
Regarding the text in blue, some modifications are needed:
- In line 31 of page 7: replace "approximate" with "approximately"
- I would suggest in the conclusion section, in blue, to remove the word "recap" which is informal (replace it with something more formal).

Overall, the other reviewers' corrections were also taken into account, and the paper has been modified accordingly.
The quality of the paper has therefore been improved.