An Analysis of the Performance of Representation Learning Methods for Entity Alignment: Benchmark vs. Real-world Data

Tracking #: 3636-4850

This paper is currently under review
Ensiyeh Raoufi
Bill Gates Happi Happi
Pierre Larmande
Francois Scharffe
Konstantin Todorov

Responsible editor: Guest Editors OM-ML 2024

Submission type: Full Paper
Representation learning for Entity Alignment (EA) uses an embedding space to match, across two Knowledge Graphs (KGs), distinct entities that correspond to the same real-world object. The similarity of two entities is measured by the similarity of their learned embeddings, which serves as a proxy for the similarity of the real-world objects. Although many embedding-based models perform very well on certain synthetic benchmark datasets, benchmark overfitting limits their applicability in real-world scenarios, where the data are highly heterogeneous, incomplete, and domain-specific. While there have been efforts to create benchmark datasets that reflect real-world scenarios as closely as possible, there has been no comprehensive analysis comparing the performance of these methods on synthetic benchmarks and on real-world heterogeneous datasets. In addition, most existing models report their performance after excluding from the alignment candidate search space all entities that are not part of the validation data. This under-represents the knowledge and the data contained in the KGs and limits the ability of these models to find new alignments in large-scale KGs. We analyze models with competitive performance on widely used synthetic benchmark datasets, such as the cross-lingual DBP15K. We compare the performance of the selected models on real-world heterogeneous datasets beyond DBP15K and show that, because of the above-mentioned drawbacks, most current approaches cannot effectively discover mappings between entities in the real world. We compare the selected methods along several dimensions and measure the joint semantic similarity and profiling properties of the KGs to explain the models' performance drop on real-world datasets. Furthermore, we show how tuning EA models with the search space restricted to validation data affects their performance and causes generalization issues.
By addressing practical challenges in applying EA models to heterogeneous datasets and providing valuable insights for future research, we signal the need for more robust solutions in real-world applications.
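The effect of restricting the candidate search space can be illustrated with a minimal sketch. It is not the evaluation code of the paper: the function names, the toy two-dimensional embeddings, and the choice of cosine similarity with Hits@1 are assumptions made only for illustration. The sketch scores the same embeddings twice, once against candidates taken only from the evaluation links and once against every target entity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def hits_at_1(source_emb, target_emb, gold_pairs, candidates):
    """Fraction of gold pairs whose true target is the most similar candidate."""
    hits = 0
    for src, true_tgt in gold_pairs:
        best = max(candidates, key=lambda t: cosine(source_emb[src], target_emb[t]))
        hits += best == true_tgt
    return hits / len(gold_pairs)

# Toy embeddings with made-up values (hypothetical, for illustration only).
source_emb = {"s1": [1.0, 0.0], "s2": [0.0, 1.0]}
target_emb = {
    "t1": [0.8, 0.6],  # gold match of s1
    "t2": [0.1, 1.0],  # gold match of s2
    "t3": [1.0, 0.1],  # absent from the evaluation links, but close to s1
}
gold_pairs = [("s1", "t1"), ("s2", "t2")]

# Restricted search space: only targets that appear in the evaluation links.
restricted = [tgt for _, tgt in gold_pairs]
# Full search space: every entity of the target KG.
full = list(target_emb)

print("restricted:", hits_at_1(source_emb, target_emb, gold_pairs, restricted))
print("full:", hits_at_1(source_emb, target_emb, gold_pairs, full))
```

In this toy setting the restricted score is higher only because the distractor t3 is never considered. This is the kind of optimistic bias the abstract attributes to evaluations that exclude entities outside the validation data.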