Generalized neural embeddings for link prediction in knowledge graphs: training, evaluation, explanation

Tracking #: 2014-3227

This paper is currently under review
Asan Agibetov
Matthias Samwald

Responsible editor: Guilin Qi

Submission type: Full Paper
Link prediction is the task of finding missing or unknown links among interconnected entities in knowledge graphs. It can be accomplished with a classifier that outputs the probability of a link between two entities. However, the way in which entities and networks are represented strongly determines how well classification works. Recently, several works have successfully used neural networks to create entity embeddings that can be fed into binary classifiers. Moreover, it has been proposed in the literature that creating specialized embeddings separately for each relation type in the knowledge graph yields better classifier performance than training a single embedding for the entire knowledge base. The drawback of this specialized approach is that it scales poorly as the number of relation types increases. In this work we formalize a unified methodology for training and evaluating embeddings for knowledge graphs, which we use to empirically investigate whether, and when, generalized neural embeddings (trained once on the entire knowledge graph) attain performance similar to specialized embeddings. This new way of training neural embeddings and evaluating their quality is important for scalable link prediction with limited data. We perform an extensive statistical validation to empirically support our claims, and derive relation-centric connectivity measures for knowledge graphs to explain our findings. Our evaluation pipeline is made open source, and we aim to draw the community's attention to the important issue of transparency and reproducibility in the evaluation of neural embeddings.
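
To make the setting in the abstract concrete, the following is a minimal sketch of link prediction as binary classification over entity embeddings. It is not the authors' pipeline. The random vectors stand in for trained embeddings, the element-wise product used to combine two entity vectors is an assumed choice, and all function names (`edge_features`, `train_logistic`, `link_probability`) are hypothetical.

```python
import numpy as np


def edge_features(emb, pairs):
    """Combine the two entity vectors of each pair with an element-wise product.

    The element-wise product is an assumption of this sketch; other
    combinations (concatenation, difference) would fit the same pattern.
    """
    return emb[pairs[:, 0]] * emb[pairs[:, 1]]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression with plain batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)          # predicted link probabilities
        grad = p - y                    # gradient of the log loss w.r.t. logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def link_probability(emb, pairs, w, b):
    """Return the estimated probability of a link for each entity pair."""
    return sigmoid(edge_features(emb, pairs) @ w + b)


# Placeholder data: 6 entities with 4-dimensional random "embeddings".
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 4))

# Hypothetical training pairs: known links (label 1) and sampled non-links (label 0).
pos = np.array([[0, 1], [1, 2], [3, 4]])
neg = np.array([[0, 5], [2, 4], [1, 3]])
pairs = np.vstack([pos, neg])
labels = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

w, b = train_logistic(edge_features(emb, pairs), labels)
probs = link_probability(emb, pairs, w, b)
```

In the terms of the abstract, a specialized setup would train one embedding matrix per relation type, whereas a generalized setup would reuse a single `emb` for classifiers over all relation types.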