Learning SHACL Shapes from Knowledge Graphs

Tracking #: 2681-3895

This paper is currently under review
Pouya Ghiasnezhad Omran
Kerry Taylor
Sergio Rodriguez Mendez
Armin Haller

Responsible editor: 
Guest Editors KG Validation and Quality

Submission type: 
Full Paper
Knowledge Graphs (KGs) have proliferated on the Web since the introduction of knowledge panels to Google search in 2012. KGs are large, data-first graph databases with weak inference rules and weakly constraining data schemas. SHACL, the Shapes Constraint Language, is a W3C recommendation for expressing constraints on graph data as shapes. SHACL shapes serve to validate a KG, to underpin manual KG editing tasks, and to offer insight into KG structure. We introduce Inverse Open Path (IOP) rules, a predicate logic formalism that represents specific shapes as paths over connected entities. Although IOP rules express simple shape patterns, they can be augmented with minimum cardinality constraints and can also serve as building blocks for more complex shapes, such as trees and other rule patterns. We define quality measures for IOP rules and propose a novel method to learn high-quality rules from KGs. We show how to build high-quality tree shapes from the IOP rules. Our learning method, SHACLearner, is adapted from a state-of-the-art embedding-based open path rule learner (OPRL). We evaluate SHACLearner on several large real-world KGs, including YAGO2s (4M facts), DBpedia 3.8 (11M facts), and Wikidata (8M facts). The experiments show that SHACLearner can effectively learn informative and intuitive shapes from massive KGs, and that the learned shapes are diverse both in structural features, such as depth and width, and in quality measures.
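To make the idea of a path-shaped constraint with a minimum cardinality concrete, the following is a minimal sketch in Python. It assumes a simplified reading of an IOP rule as "every instance of a target class starts a path P1, ..., Pn that reaches at least k distinct entities". The scoring function `evaluate_iop`, the toy triples, and the `ex:` names are all hypothetical illustrations. The score is a plain satisfaction ratio and is not the quality measures defined in the paper, and the sketch does not reproduce SHACLearner's embedding-based learning. The `to_shacl` helper renders such a rule as a SHACL node shape with a sequence path and `sh:minCount`.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]          # (subject, predicate, object)
Index = Dict[str, Dict[str, Set[str]]]  # predicate -> subject -> objects


def index_by_predicate(kg: Set[Triple]) -> Index:
    """Group triples by predicate so paths can be followed quickly."""
    idx: Index = {}
    for s, p, o in kg:
        idx.setdefault(p, {}).setdefault(s, set()).add(o)
    return idx


def path_ends(idx: Index, start: str, path: List[str]) -> Set[str]:
    """Return the distinct entities reachable from `start` via `path`."""
    frontier = {start}
    for pred in path:
        step = idx.get(pred, {})
        frontier = {o for s in frontier for o in step.get(s, set())}
        if not frontier:
            break
    return frontier


def evaluate_iop(kg: Set[Triple], target_class: str, path: List[str],
                 min_count: int = 1) -> Tuple[Set[str], float]:
    """Hypothetical score: share of target-class instances whose path
    reaches at least `min_count` distinct end entities."""
    idx = index_by_predicate(kg)
    targets = {s for s, p, o in kg if p == "rdf:type" and o == target_class}
    satisfied = {t for t in targets
                 if len(path_ends(idx, t, path)) >= min_count}
    ratio = len(satisfied) / len(targets) if targets else 0.0
    return satisfied, ratio


def to_shacl(shape_name: str, target_class: str, path: List[str],
             min_count: int) -> str:
    """Render the rule as a SHACL node shape (Turtle text)."""
    path_expr = path[0] if len(path) == 1 else "( " + " ".join(path) + " )"
    return (f"ex:{shape_name} a sh:NodeShape ;\n"
            f"    sh:targetClass {target_class} ;\n"
            f"    sh:property [ sh:path {path_expr} ; "
            f"sh:minCount {min_count} ] .")


# Toy KG (hypothetical data): carol's city has no ex:locatedIn fact.
kg = {
    ("alice", "rdf:type", "ex:Person"),
    ("bob", "rdf:type", "ex:Person"),
    ("carol", "rdf:type", "ex:Person"),
    ("alice", "ex:livesIn", "paris"),
    ("paris", "ex:locatedIn", "france"),
    ("bob", "ex:livesIn", "rome"),
    ("rome", "ex:locatedIn", "italy"),
    ("carol", "ex:livesIn", "atlantis"),
}
path = ["ex:livesIn", "ex:locatedIn"]
satisfied, ratio = evaluate_iop(kg, "ex:Person", path)
print(sorted(satisfied), round(ratio, 2))  # ['alice', 'bob'] 0.67
print(to_shacl("PersonShape", "ex:Person", path, 1))
```

Counting distinct end entities mirrors how SHACL's `sh:minCount` counts the distinct value nodes reached by a property path. Raising `min_count` to 2 in this toy graph leaves no instance satisfied, which shows how a cardinality constraint tightens a path shape.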