Review Comment:
The authors have significantly improved upon their original paper, which demonstrated many of the common features of and differences between publicly available knowledge graphs. The number of measures used for analysis has been reduced by one, but a number of additional statistics and more in-depth explanations have been provided.
All of the metrics are now well described with explicit quantification, which should allow a reader to reproduce the results exactly and apply the 34 metrics to any new KG.
Metrics such as the ability to rank data, or the existence/support of, e.g., owl:disjoint, could easily skew the result in favour of a particular data set with many ‘features’ without regard for quality; of course, it would be up to the reader to choose a sensible weighting for such measures. This is a situation where the metrics now being valued continuously between 0 and 1, rather than simply true/false, allows for greater flexibility.
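To make the weighting concern concrete, here is a minimal sketch of how a reader might aggregate continuous per-metric scores with their own weights; the metric names, scores, and weights below are purely illustrative and are not taken from the paper.

```python
# Hypothetical sketch: combining per-metric scores (each in [0, 1]) with
# reader-chosen weights, so feature-existence metrics can be down-weighted
# relative to quality-oriented ones such as completeness.

def weighted_score(scores, weights):
    """Weighted average of metric scores; weights need not sum to 1."""
    total = sum(weights.values())
    return sum(scores[m] * w for m, w in weights.items()) / total

# Illustrative values only (not results from the paper):
scores = {"ranking": 1.0, "owl_disjointness": 0.5, "completeness": 0.8}
weights = {"ranking": 0.5, "owl_disjointness": 0.5, "completeness": 2.0}
print(round(weighted_score(scores, weights), 3))  # → 0.783
```

With equal weights the feature-rich data set would dominate; down-weighting the feature-existence metrics, as above, shifts the balance toward completeness.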
I feel that the gold standard is most at risk of being subjective, while the metrics derived from it (completeness) are among the more important values. I trust that the authors have chosen a suitable gold-standard set; their descriptions suggest so, and the resulting values appear fair.
I enjoyed the expanded discussions and evaluations of each metric, in addition to the summary table of references. I would be happy to use the information within this paper, able to back up each decision, and would only ask the authors to perform a final spell-check, e.g., p5: firectly => directly.
I thank the authors for answering my previous questions and for addressing my concerns and those of the other reviewers.