Inferring Resource Types in Knowledge Graphs using NLP analysis and human-in-the-loop validation: The DBpedia Case

Tracking #: 2722-3936

This paper is currently under review
Mariano Rico
Idafen Santana-Pérez

Responsible editor: Guest Editors KG Validation and Quality

Submission type: Tool/System Report
Defining proper semantic types for resources in Knowledge Graphs is one of the key steps in building high-quality data. Often, this information is either missing or incorrect, so it is crucial to develop means of inferring it. Several approaches have been proposed, including reasoning, statistical analysis, and the use of textual information associated with the resources. In this work we explore how textual information can be applied to existing semantic datasets to predict the types of resources, relying exclusively on the textual features of their descriptions. We apply our approach to DBpedia entries, combining several standard NLP techniques and exploiting complementary available information to extract relevant features for different classifiers. Our results show that this approach generates types with high precision and recall, above the state of the art, when evaluated both on the DBpedia dataset (94%) and on the LDH gold standard dataset (80%). We also discuss the utility of NLP4Types, the web tool we created for this analysis, which has been released as an online application to collect feedback from end users with the aim of enhancing the Knowledge Graph.
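The abstract describes predicting a resource's type purely from the text of its description. As a rough illustration of that general idea only, the sketch below trains a multinomial Naive Bayes classifier over bag-of-words features using just the Python standard library. The class name `NaiveBayesTyper`, the toy training sentences, and the choice of Naive Bayes are all assumptions made for illustration. The paper combines several NLP techniques and classifiers that this sketch does not reproduce, and the `dbo:` labels are used here merely as example type names.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep alphabetic tokens only; a deliberately simple
    # stand-in for richer NLP feature extraction (assumption).
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTyper:
    """Hypothetical multinomial Naive Bayes classifier that maps a
    resource's textual description to a single type label."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # Laplace smoothing constant
        self.class_counts = Counter()         # documents per type
        self.word_counts = defaultdict(Counter)  # token counts per type
        self.vocab = set()

    def fit(self, descriptions, types):
        for text, label in zip(descriptions, types):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            n_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + self.alpha) / (n_words + self.alpha * v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training descriptions (illustrative only, not DBpedia data).
train_texts = [
    "is an English footballer who plays as a striker for the club",
    "is a retired Brazilian footballer and coach",
    "is a city in northern Spain and the capital of the province",
    "is a town and municipality located in the region",
    "is a rock band formed in London whose debut album",
    "is an American band that released its first album",
]
train_types = [
    "dbo:SoccerPlayer", "dbo:SoccerPlayer",
    "dbo:Place", "dbo:Place",
    "dbo:Band", "dbo:Band",
]

model = NaiveBayesTyper().fit(train_texts, train_types)
print(model.predict("is a Spanish footballer who plays as a midfielder"))
```

Naive Bayes is chosen here only because it is short and dependency-free; a real pipeline would typically use richer features and stronger classifiers, and would return candidate types for human validation rather than a single label.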