RQSS: Referencing Quality Scoring System for Wikidata

Tracking #: 3326-4540

This paper is currently under review
Authors: 
Seyed Amir Hosseini Beghaeiraveri
Alasdair J G Gray
Fiona McNeill

Responsible editor: 
Maria Maleshkova

Submission type: 
Full Paper
Abstract: 
Wikidata is a collaborative, multi-purpose knowledge graph with the distinctive feature of allowing provenance data to be attached to item statements as references. About 73% of Wikidata statements have provenance metadata, yet few studies have examined referencing quality in this knowledge graph, and those that exist focus on relevance and trustworthiness. Although frameworks exist for assessing the quality of Linked Data, none focuses on reference quality. We define a comprehensive referencing quality assessment framework based on Linked Data quality dimensions, and we implement its objective metrics as the Referencing Quality Scoring System (RQSS). RQSS provides quantified scores with which referencing quality can be analyzed and compared, and its scripts can be reused to monitor referencing quality regularly. Due to the scale of Wikidata, we use well-defined subsets to evaluate the quality of its references with RQSS: three topical subsets, Gene Wiki, Music, and Ships, corresponding to three Wikidata WikiProjects, along with four random subsets of various sizes. The evaluation shows that RQSS is practical and provides valuable information that Wikidata contributors and project maintainers can use to identify quality gaps. According to RQSS, the overall referencing quality of the Wikidata subsets is 0.58 out of 1. The random subsets (representative of Wikidata) score 0.05 higher overall than the topical subsets, and Gene Wiki has the highest scores amongst the topical subsets. Across the referencing quality dimensions, all subsets score high in accuracy, availability, security, and understandability, but weaker in completeness, verifiability, objectivity, and versatility. Although RQSS is built on the Wikidata RDF model, its referencing quality assessment framework can be applied to knowledge graphs in general.
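To make statement-level provenance concrete: in the Wikidata RDF model, each statement node is typed `wikibase:Statement` and links to each of its reference nodes through `prov:wasDerivedFrom`. The minimal Python sketch below, which is not part of RQSS, computes the share of statements that carry at least one reference, the kind of coverage figure behind the reported 73%. The encoding of triples as prefixed-name strings, the function name `referenced_statement_ratio`, and the sample data are all hypothetical; a real analysis would run over a Wikidata dump or a SPARQL endpoint.

```python
# Illustrative sketch only, not RQSS code. Triples are (subject, predicate, object)
# strings using Wikidata RDF prefixes; SAMPLE below is hypothetical toy data.
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

STATEMENT_CLASS = "wikibase:Statement"  # rdf:type of every statement node
PROV_DERIVED = "prov:wasDerivedFrom"    # links a statement node to a reference node


def referenced_statement_ratio(triples: Iterable[Triple]) -> float:
    """Return the share of statement nodes with at least one prov:wasDerivedFrom edge."""
    triples = list(triples)
    statements = {s for s, p, o in triples if p == "rdf:type" and o == STATEMENT_CLASS}
    if not statements:
        return 0.0
    # A set counts each statement once, however many references it has.
    referenced = {s for s, p, _ in triples if p == PROV_DERIVED and s in statements}
    return len(referenced) / len(statements)


SAMPLE: List[Triple] = [
    ("wds:Q1-a", "rdf:type", "wikibase:Statement"),
    ("wds:Q1-a", "prov:wasDerivedFrom", "wdref:r1"),
    ("wds:Q1-b", "rdf:type", "wikibase:Statement"),
    ("wds:Q1-b", "prov:wasDerivedFrom", "wdref:r2"),
    ("wds:Q1-b", "prov:wasDerivedFrom", "wdref:r3"),
    ("wds:Q2-a", "rdf:type", "wikibase:Statement"),
    ("wds:Q2-a", "prov:wasDerivedFrom", "wdref:r1"),
    ("wds:Q2-b", "rdf:type", "wikibase:Statement"),  # unreferenced statement
    ("wdref:r1", "pr:P854", "https://example.org/source"),  # reference detail, ignored
]

if __name__ == "__main__":
    print(referenced_statement_ratio(SAMPLE))  # 3 of 4 statements referenced -> 0.75
```

Counting a statement once regardless of how many references it has (as `wds:Q1-b` shows) keeps the result a coverage ratio rather than a reference count; RQSS's own metrics go further and score the quality of the references themselves.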