Can you trust Wikidata?

Tracking #: 3376-4590

Authors: 
Veronica Santos
Daniel Schwabe
Sérgio Lifschitz

Responsible editor: 
Guest Editors Wikidata 2022

Submission type: 
Full Paper
Abstract: 
In order to use a value retrieved from a Knowledge Graph (KG) for some computation, the user should, in principle, ensure that they trust the veracity of the claim, i.e., consider the statement a fact. Crowdsourced KGs, or KGs constructed by integrating several information sources of varying quality, must be used via a trust layer. The veracity of each claim in the underlying KG should be evaluated with respect to what is relevant to carrying out the action that motivates the information seeking. The present work assesses how well Wikidata (WD) supports the trust decision process implied in using its data. WD provides several mechanisms that can support this decision, and our KG Profiling, based on WD claims and schema, analyzes how multiple points of view, controversies, and potentially incomplete or incongruent content are presented and represented.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Tianyi Li submitted on 03/Jul/2023
Suggestion:
Minor Revision
Review Comment:

The paper offers a detailed analysis of the resources available within Wikidata to support the acceptance and/or selection of its statements for purposes specific to users' needs. The authors take the user's selection criteria to be task-specific, and analyze only the resources available to aid such selections. They analyze three types of resources: qualifiers, community-based rankings, and references. They highlight the gaps in their coverage and advocate for efforts to enrich this information.

I have one broad comment about the trust layer described in the paper: in general, for the purpose of accepting or rejecting a statement in Wikidata, how trustworthy are the references and rankings crowdsourced from Wikidata itself? I would expect higher accuracy from a trust layer that consults expert resources external to Wikidata and grounds statements in those external resources. The authors' suggestion to improve Wikidata ontologies through expensive human participation could therefore be better justified by comparing it against information-retrieval-based methods that extract relevant external resources, and a trust layer based on Wikidata-internal information could be compared with one that checks Wikidata entries against their natural-language supporting texts.

Another comment: the qualifiers and rankings seem most relevant to selection when multiple values are associated with the same head-predicate pair. As such, it would be better to bring the multi-value discussion of Section 4.4 up front and to report the qualifier/ranking statistics as proportions of the number of multi-value head-predicate pairs, in addition to the total numbers of triples.
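For concreteness, here is a minimal sketch of how such multi-value head-predicate pairs could be counted for a single property via the public Wikidata Query Service. This is not the authors' method; the property P39 ("position held") and the User-Agent string are arbitrary illustrative choices.

    import requests

    # Count items that carry more than one statement for one property
    # (the "multi-value head-predicate pair" case). The p: prefix reaches
    # statement nodes of every rank, not just truthy values; very large
    # properties may exceed the query service's 60-second timeout.
    QUERY = """
    SELECT (COUNT(?item) AS ?multiValueItems) WHERE {
      {
        SELECT ?item (COUNT(?stmt) AS ?n) WHERE {
          ?item p:P39 ?stmt .
        } GROUP BY ?item
      }
      FILTER(?n > 1)
    }
    """

    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "trust-layer-sketch/0.1 (example)"},
        timeout=60,
    )
    resp.raise_for_status()
    row = resp.json()["results"]["bindings"][0]
    print("items with multiple P39 statements:", row["multiValueItems"]["value"])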

A few more comments in the details and presentation of the paper:
1) On page 9, lines 18-22 and 41-42, two concepts are mentioned, "mandatory required qualifier violation" and "mandatory required qualifier absence". Could the authors define these two terms more precisely and clarify the difference between them?
2) In Table 2, page 6 line 5, and below, the proportion percentages are rounded to different numbers of decimals in different rows. For clarity and consistency of presentation, it would be preferable to round them uniformly and to add trailing zeros that make the precision of the reported values explicit.
3) Section 4 of the paper feels crowded with tables, and the discussion in Section 5 sits far from the crucial numbers. It would be desirable to summarize the important values in more concise tables, to bring the discussion of Section 5 closer to the numbers, and to put the remaining detailed statistics in an appendix (if applicable).

Review #2
By Elisavet Koutsiana submitted on 13/Jul/2023
Suggestion:
Reject
Review Comment:

In my understanding, this paper proposes a methodology that uses a number of Wikidata "mechanisms" to assess whether we can trust the data in Wikidata. The authors use information related to statements (i.e., "A Statement is a piece of data about an item, recorded on the item's page." (https://www.wikidata.org/wiki/Wikidata:Glossary)). They present descriptive statistics for properties, including specific qualifiers, statement rank (normal, preferred, deprecated), and references. The authors argue in the end that these mechanisms do not sufficiently support trust and should be improved.
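As an aside, the statement fields this summary refers to can be inspected directly. Here is a minimal sketch using the MediaWiki wbgetclaims API to print the rank, qualifier count, and reference count of each statement for one item/property pair; Q42 (Douglas Adams) and P69 ("educated at") are arbitrary illustrative choices, not taken from the paper.

    import requests

    # Fetch the statements ("claims") for one item/property pair and
    # inspect the trust-related fields: rank, qualifiers, references.
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbgetclaims",
            "entity": "Q42",      # illustrative item
            "property": "P69",    # illustrative property
            "format": "json",
        },
        headers={"User-Agent": "trust-layer-sketch/0.1 (example)"},
        timeout=30,
    )
    resp.raise_for_status()
    for stmt in resp.json()["claims"].get("P69", []):
        print(
            stmt["id"],
            "rank:", stmt["rank"],  # preferred / normal / deprecated
            "qualifiers:", len(stmt.get("qualifiers", {})),
            "references:", len(stmt.get("references", [])),
        )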

The subject of this research, data quality, is essential and of great value to the field and to the Wikidata KG. This work proposes a new angle for examining whether we can trust the data, presents an overview of statistics related to properties, and suggests interesting future directions for research. However, I found this paper difficult to follow, and it is unclear to me whether I have understood the full methodology and the results correctly.

General comments:
- The authors may need to consider organising the paper differently. A simpler structure with standard sections, e.g., (1) Introduction, (2) Background, (3) Related work, (4) Data, (5) Methodology, (6) Results, (7) Discussion, and (8) Conclusion and future work, could improve the narrative of the paper and give a clear understanding of the methodology and results.
- The paper could also benefit from more cohesive paragraphs. In many places there are one-sentence paragraphs that really belong with the previous or the next paragraph; Section 5.1 (Discussion), for example, is full of them. Such fragmented paragraphs make it hard to follow the narrative of the sections. A good approach would be, for each section, to write down the topics the authors want to cover and then write one paragraph for each topic. Short, simple sentences would also improve understanding.
- It would be more transparent if the authors added more references to support their arguments, for example, the claim about applications (page, line 36) and page 2, lines 42-46.

Introduction - Data in Wikidata:
- I would appreciate it if the authors kept this section as "Introduction" and removed "Data in Wikidata". I believe the authors here aim to explain (1) why the subject of the paper is important, (2) what others have done about it, and (3) what the authors did in this paper. However, it is hard for the reader to follow the narrative. This section could benefit from better paragraph structure, as indicated above, and from some extra information about contributions and implications, so that the value of this work becomes clear.
- Lines 38-45 would be easier to understand if the authors first explained the Wikidata terminology. For example, Figure 1 shows an example of an item; the figure could benefit from annotations highlighting the item, claim, statement, property, qualifiers, etc. I suggest including a Background section that describes Wikidata practices such as the terminology and all the descriptions currently found in the section Motivation - Incongruences in Wikidata. After understanding how Wikidata works, the reader can follow the methodology and results.

Motivation - Incongruences in Wikidata:
- This section describes Wikidata. I suggest changing the title to "Background" and gathering here all the Wikidata practices currently described in multiple places in the paper (constraints, qualifiers, the property "disputed by", etc.). A clear example for every case, with highlighted Wikidata screenshots, would improve understanding.
- This section mixes Wikidata description with methodology. It would be clearer if the authors, for example, described what constraints are in the Background section, and how and why they use constraints in the Methodology section.
- The authors present three types of situations for investigating trust decisions: incompleteness, incongruences, and controversies. In my understanding, these three relate to qualifiers, but this is not made clear in the manuscript. This information may need to be described in the Methodology section.

The trust process using WD:
- This section starts to explain the methodology. It describes the information used: qualifiers, statement ranks, and references. It would be clearer if the authors said more about their methodology after the terminology on page 4, line 8. It would be useful to connect the qualifiers here with completeness, incongruences, and controversies.
- The authors mention here that they call this methodology "TrustLayer". However, in the Abstract and in Future Work they call it "KG profiling". I would suggest keeping one name, introduced in the Abstract and the Introduction.
- I would move the description of the property "disputed by" to the Background section.
- It would be very helpful to have examples for the terminology on page 4, lines 4-8.

WD support for the trust process:
The descriptive statistics presented in this section are very informative and provide useful information about property characteristics. However, it is not clear to me whether these are the results or an exploratory analysis. Assuming that these are the results, here are my suggestions:
- The authors start by describing the data. It would be clearer if they added a separate Data section. That section could describe what data the authors use and the descriptive statistics of pages 4-6 about the examined properties (predicates, qualifications, and qualifiers), the property constraints (could the authors explain what "none" means in Table 1?), the property qualifiers, and the claims. The rest of the data in this section could then form a "Results" section, with subsections 4.1, 4.2, 4.3, 4.4, and 4.5 as its subsections. I found it nice that they are split according to the five examined characteristics.
- Footnote 4 states that some data are missing from the initial dataset. I think this is an important piece of information that should be included in the main manuscript. The authors could be more detailed about what is missing and why, so that the reader is prepared when the authors mention this again on page 5, line 7.
- I would appreciate it if the authors could include labels for the y-axis in the Figures.
- It would be nice if the authors could explain the difference between "Frequency" and "Accumulated Frequency" in the tables.

Conclusions:
- It would be clearer if the authors promoted the subsections here to sections. After the Results section, Related Work could follow, then the Discussion, and finally Conclusions and Future Work.
- In the subsection Discussion, I expected to get a better understanding of the paper’s results and the authors’ arguments. However, it is still not clear to me how the results evaluate the data in Wikidata. In my opinion, a clear narrative in the Introduction and a methodology plan could really improve the understanding of the paper.
- In the end, the authors argue that "WD's support for trust decisions about its statements is low and could be improved". It would be interesting to read here whether the authors have any thoughts on such improvements.

Long-term stable URL for resources:
- The authors use a GitHub repository to share their data and scripts.
- The repository includes a README.md file with descriptions of the data.

Review #3
Anonymous submitted on 14/Jul/2023
Suggestion:
Accept
Review Comment:

The paper provides a comprehensive analysis of the veracity issues arising from the use of Wikidata (WD). The authors emphasize the controversies and incongruences that may arise from the information gathered in WD. Despite the existence of constraints, properties, and qualifiers in WD to increase the trustworthiness of the knowledge graph, the authors demonstrate that, in practice, the current WD exhibits various anomalies that do not fully respect the set of constraints; for example, only 3.6% of the required qualifier constraints are fulfilled. As a result, the authors recommend the use of a Trust Process to better identify and assess the information in WD, and to consider cross-information for further cross-validation and information proofing. The authors provide interesting statistics and examples that reveal the veracity gaps that exist in WD and the holes that need to be filled, such as the explicit disputes.
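For readers unfamiliar with how such constraints are expressed: required-qualifier constraints are themselves statements on property pages (property constraint P2302 with value Q21510856, "required qualifier constraint", and the expected qualifiers listed under the P2306 qualifier), so fulfillment can be checked programmatically. A minimal sketch, assuming P39 ("position held") as an example of a property declaring such a constraint:

    import requests

    # Read a property's constraint statements (P2302) and list which
    # qualifiers it declares as required (Q21510856 / P2306).
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={"action": "wbgetclaims", "entity": "P39",
                "property": "P2302", "format": "json"},
        headers={"User-Agent": "trust-layer-sketch/0.1 (example)"},
        timeout=30,
    )
    resp.raise_for_status()
    for stmt in resp.json()["claims"].get("P2302", []):
        value = stmt["mainsnak"].get("datavalue", {}).get("value", {})
        if value.get("id") == "Q21510856":  # required qualifier constraint
            required = [
                snak["datavalue"]["value"]["id"]
                for snak in stmt.get("qualifiers", {}).get("P2306", [])
            ]
            print("required qualifiers:", required)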

Regarding the writing, the article is written in clear and simple English, making it easy to understand. Although some sentences are too long, they do not impact the reading experience.

Minor corrections:
- Page 7, Line 17: "Fig. 4. An example of a dispute with multiple points of views." -> "Fig. 4. An example of a dispute with multiple points of view."
- Page 12, Line 41: Bullet points might be an option to list out statistics.
- Page 13, Line 19: "four qualifiers" -> "4 qualifiers" (to be consistent with previous formalism)
- Page 14, Line 20: "aren't" -> "are not"
- Page 15, Line 40: Remove space between the text and footnote 12.
- Page 16, Line 27: "insight into" -> "insight on"
- Page 17, Line 9: "which we already indicated is not provided in" -> "which we already indicated that it is not provided in"
- Page 17, Line 38: Remove "fully" in "fully 61,7%".
- Page 19, Line 26: "aided by" -> "helped by"
- Page 19, Line 34: "considered context" -> "considered as/a context"

Overall, this article provides valuable insights into the veracity issues that arise from WD and highlights the need for a Trust Process to improve the trustworthiness of the information in WD.