CaLiGraph: A Knowledge Graph from Wikipedia Categories and Lists

Tracking #: 3801-5015

Authors: 
Nicolas Heist
Heiko Paulheim

Responsible editor: 
Eva Blomqvist

Submission type: 
Dataset Description
Abstract: 
Knowledge Graphs (KGs) are increasingly used for solving or supporting tasks such as question answering or recommendation. To achieve a useful performance on such tasks, it is important that the knowledge modelled by KGs is as correct and complete as possible. While this is an elusive goal for many domains, techniques for automated KG construction (AKGC) serve as a means to approach it. Yet, AKGC has many open challenges, like learning expressive ontologies or incorporating long-tail entities. With CaLiGraph, we present a KG automatically constructed from categories and lists in Wikipedia, offering a rich taxonomy with semantic class descriptions and a broad coverage of entities. We describe its extraction framework and provide details about its purpose, resources, usage and quality. Further, we evaluate the performance of CaLiGraph on downstream tasks and compare it to other popular KGs.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
Anonymous submitted on 14/Mar/2025
Suggestion:
Accept
Review Comment:

Thanks for the revision. The paper looks good now.

Review #2
Anonymous submitted on 02/Apr/2025
Suggestion:
Accept
Review Comment:

This updated version of the paper presents the latest version of CaLiGraph, a knowledge graph automatically constructed from Wikipedia’s semi-structured content, with a specific focus on leveraging both categories and list pages. The authors have taken all reviewer feedback into account and successfully addressed the remarks, resulting in an even more polished and complete contribution.

Summary:
CaLiGraph stands out due to its emphasis on generating rich, axiom-enhanced taxonomies and capturing long-tail entities (with the help of lists). The authors describe a comprehensive pipeline for ontology construction (including taxonomy induction and axiom learning) and knowledge graph population (including named entity recognition, disambiguation, typing, and relation extraction). The key strength of the paper remains its ability to extract expressive semantic class descriptions and integrate a large number of entities into a coherent and usable KG already used by some applications.

Remark:
A minor remark: I would have liked a comparison with YAGO4.5 (or Wikidata?) in Section 5.3 "Evaluation via Downstream Tasks". As this comparison wouldn't have been fair in Table 2 due to the Wikidata vs DBpedia schema and construction differences, I believe the comparison would have been fair for downstream tasks.

Conclusion:
The revised manuscript is still very well-written, technically sound, and enriched with further clarity and detail. The discussions of limitations and quality assurance are transparent, and the authors have proposed realistic strategies for both improvement and sustainability (despite the webpage not being accessible at the time of the review).
In conclusion, this is a strong and valuable contribution to the Semantic Web and KG construction communities. The improvements have strengthened an already solid paper, and I fully support its acceptance.