Psychiq and Wwwyzzerdd: Wikidata completion using Wikipedia

Tracking #: 3450-4664

Daniel Erenrich

Responsible editor: Guest Editors Wikidata 2022

Submission type: Tool/System Report

Despite its size, Wikidata remains incomplete and inaccurate in many areas. Hundreds of thousands of articles on English Wikipedia have zero or limited meaningful structure on Wikidata. Much work has been done in the literature to partially or fully automate the process of completing knowledge graphs, but little of it has been practically applied to Wikidata. This paper presents two interconnected practical approaches to speeding up the Wikidata completion task. The first is Wwwyzzerdd, a browser extension that allows users to quickly import statements from Wikipedia to Wikidata. Wwwyzzerdd has been used to make over 100 thousand edits to Wikidata. The second is Psychiq, a new model for predicting instance and subclass statements based on English Wikipedia articles. Psychiq’s performance and characteristics make it well suited to solving a variety of problems for the Wikidata community. One initial use is integrating the Psychiq model into the Wwwyzzerdd browser extension.

Solicited Reviews:
Review #1
By Fariz Darari submitted on 07/May/2023
Review Comment:

The manuscript is a revision over a previous version for which I was also a reviewer.

Overall, my previous review points have been addressed nicely by the author. Well done! I also agree that in general Wikipedia and Wikidata should be better aligned and Wwwyzzerdd (+ Psychiq) could be an enabler for it.

Please find below minor feedback for the revised manuscript:
- "As of November 2022, there are 359,581 items on Wikidata corresponding to English Wikipedia articles that have no P31 or P279 statements" -> I was wondering: of these roughly 360k items, how many were eventually enriched by Wwwyzzerdd & Psychiq?
- It would also be interesting to know how Wikipedia is edited right now. How does Wikipedia cope with the speed of information? Are Wikipedia edits mostly made by bots or by humans? Information on these matters is crucial, since Wikipedia serves as the data source for Wwwyzzerdd.
- Typo: "Kian was gameified into a human-in-the-loop" -> gamified
- "Good or productive edits tend not to be reverted. In total 0.69% of edits made using the tool were reverted. For comparison, on a typical day, January 1 2023, about 1.3% of all edits made using the Wikidata web UI were reverted." -> This one is interesting. What could be a plausible explanation for the lower reversion rate of Wwwyzzerdd?
- How does Wwwyzzerdd resolve ambiguity, for example when there are two possible properties between items X and Y?

Review #2
By Dimitris Kontokostas submitted on 30/May/2023
Review Comment:

The author addressed all the comments I made in my previous review. I still share the same concerns and impressions about this tool with respect to Wikidata policies, as this is something someone could easily do with DBpedia data.

Review #3
Anonymous submitted on 10/Jun/2023
Minor Revision
Review Comment:

I would like to thank the author for their replies to the comments; I am happy with the edits made in the new version. I am now more inclined to accept the paper, considering (as the author repeatedly mentioned) that it is a tool report paper. However, I can see that almost all my comments have been rebutted by the author in some way. That is fine and quite understandable, but I cannot accept excuses such as "I do this work in my spare time I have limited resources to expand this work", or "Frankly don’t want to bother my users with another poll just to justify publication". Such statements are not appropriate in a submission to a scientific journal (even in the cover letter, and even for a tool report). If you do not have the resources to perform a check requested by your reviewers, you need to state that as a limitation of your work and suggest it as future work "in the paper". If you cannot (or prefer not to) conduct a questionnaire, you still need to acknowledge that in the evaluation section. If you lack information about Linked Data quality, you need to say so in the paper, with something like: "The quality of the statements has been checked informally, and a further study should be conducted".

In this regard, I would like to see appropriate statements (about limitations or future work) added to the paper regarding:
1. Linked Data quality criteria (note that not all criteria are subjective; some criteria do not require human judgment to measure the quality of triples/facts/etc.)
2. The advantages of primary sources over secondary sources, with an explicit explanation of the advantages and disadvantages of using Wikipedia as the baseline.