Pushing the Boundaries: Classification of Entity Alignment from RDF Embeddings

Tracking #: 3638-4852

Authors: 
Bill Gates Happi Happi
Géraud Fokou Pelap
Danai Symeonidou
Pierre Larmande

Responsible editor: 
Guest Editors OM-ML 2024

Submission type: 
Full Paper
Abstract: 
Entity Alignment (EA) involves identifying entities across two knowledge bases that represent the same real-world entity. This task is crucial for the automated integration of multiple Knowledge Graphs (KGs), thereby enriching the combined knowledge. Recently, KG embedding methods have become predominant in EA techniques. These methods project entities into a lower-dimensional space and align them by evaluating their similarities. However, the classification and alignment of entities between two KGs remain complex. This article evaluates the performance of various classifiers across multiple aspects of entity embedding features, applicable to both source and target data, in binary classification processes for EA. Our experiments indicate a consistent range in F1-score and accuracy, particularly when dealing with imbalanced data and changes in embedding dimensions. This observation suggests that future research may need to focus on developing more robust classification algorithms.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
Anonymous submitted on 27/Mar/2024
Suggestion:
Reject
Review Comment:

In this article, the authors study different classifiers to predict alignments between entities of different RDF data sets. The input features of the classifiers are RDF Graph Embeddings learned using the RDF2Vec model.
The provided results show that performance measured with Accuracy, Precision, Recall, and F1-score is modest and does not vary much depending on the chosen classifier or the chosen dimension for the embeddings. The authors thus conclude that RDF embeddings may not always provide features that are sufficiently discriminative for this task, but that the underperformance may also depend on the quality of training data, class imbalance, or the difficulty of the task itself. This should motivate further work to build more robust classification algorithms.

The scope of the study is clear; however, I have several major concerns.

First, the study is limited to a single embedding model, RDF2Vec. Hence, the authors' conclusion that "RDF embedding may not always provide features that are sufficiently discriminative for this specific task" is of interest, but it requires more experiments, especially with other models, to determine whether a trend emerges. Additionally, even though it is the only model used, RDF2Vec is only very briefly described in the article, which makes the article not self-contained. Embedding dimensions are only tested from 10 to 50, in steps of 10. This choice should be further motivated, as it also limits the potential generality of the study. One could wonder whether the larger embedding dimensions found in the literature would overcome this inability to provide discriminative features for classifying entity alignments.

In the experimental section, the authors mention that "nine classifiers were selected", but 10 were presented in the related work section. Even though the discussion and conclusion explain that only the 9 best-performing classifiers were retained, this explanation deserves to be stated in the experimental section as well.

Complicated notations are also introduced, which hinder the readability of the article and raise the following issues:
- G is defined as the set of (s, p, o) where o \in L. Do you only consider triples where objects are literals?
- V_data is included in D: is V_data an independent RDF data source? (By definition of D)
- Why does a=b in the definition of PL? PL could simply be defined as a set of (s, t, 1) without constraints on the number. Or do the authors take into consideration a maximum number of existing alignments?
- To define PL_data and NL_data, you use E(s_a) + E(t_b), which corresponds to a vector addition. However, the authors mention concatenation. Which operation is actually used? Concatenation makes sense, but I am more doubtful about addition, which, if actually used, could also explain the modest results in this study (see the sketch after this list).
- When defining NL_data, the authors mention k != l, 0 <= k, l <= max. I do not understand the need for constraints on the number of negatives.
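
To make the distinction concrete, here is a minimal sketch (assuming NumPy; the vectors are hypothetical entity embeddings) of the two pairing operations:

    import numpy as np

    e_s = np.array([0.2, -0.5, 0.1])  # hypothetical embedding of source entity s_a
    e_t = np.array([0.4, 0.3, -0.7])  # hypothetical embedding of target entity t_b

    # Addition keeps dimension d but collapses the two entities into one vector,
    # so the classifier can no longer tell which values came from which KG.
    added = e_s + e_t                          # shape (3,)

    # Concatenation yields dimension 2d and preserves both embeddings.
    concatenated = np.concatenate([e_s, e_t])  # shape (6,)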

When discussing the results, the authors mention several times that the similar metrics among classifiers may be caused by their inability to handle imbalance. However, I think this claim should be further substantiated, either with metrics that explicitly take imbalance into account or with experiments on balanced data.
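
For instance, a minimal sketch (assuming scikit-learn; y_true and y_pred are hypothetical placeholder labels) of metrics that explicitly account for class imbalance:

    from sklearn.metrics import balanced_accuracy_score, matthews_corrcoef

    # Hypothetical imbalanced labels: 1 = aligned pair, 0 = non-aligned pair
    y_true = [1, 0, 0, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 0, 0, 0, 0, 1, 0]

    # Balanced accuracy averages recall over both classes
    print(balanced_accuracy_score(y_true, y_pred))
    # Matthews correlation coefficient remains informative under imbalance
    print(matthews_corrcoef(y_true, y_pred))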

Minor comments
- Introduction: "classifying Entity Alignments as referring to the same reality or not between two distinct RDF graphes remain a complex challenge". I agree with this sentence, but I think references could be provided to support it.
- Introduction: "by aggregating their meaning in mutual relationships". I am not sure I understand this definition of KG embeddings; I am not sure KG embeddings take "meaning" into account, as they do not consider semantics.
- Related work: "daily observed data volume". I am not sure why a temporal notion appears here; I may have missed an important aspect or application domain of the paper. The same remark applies to "it is clear that doubt could arise in situations of imminent decision-making and pose significant risks for a wrong choice".
- Related work: "Sometimes, entity alignment approaches use mechanisms to help classifiers produce a consistent and reliable model for the classification task". I advise adding details to make clear which mechanisms the authors are referring to.
- Related work: I do not understand how [29] relates to entity alignment and to the present work. Could additional details be added to explain this link?
- Section 3.2: "embeddings and vectorial representations". These are the same in my view; are you referring to something different?
- Section 4.1: "Among these datasets, we have chosen to work with various sources", I do not understand the link or difference between datasets and sources here.
- Section 4.1: "Spimbench": could you add more details about the domain of the data described in this dataset, similarly to other datasets?

To conclude, I think the overall idea of this study is definitely interesting. If this trend were confirmed across several RDF embedding models, with a strengthened evaluation taking into account class imbalance and the quality of training data (as mentioned by the authors in Section 5), such a study and its results could have a significant impact on the field. However, given the concerns described above, I recommend rejecting this paper.

Review #2
By Heiko Paulheim submitted on 30/Apr/2024
Suggestion:
Reject
Review Comment:

The paper analyzes a method for aligning two knowledge graphs by building a classifier on top of knowledge graph embeddings. Both the source and the target KG are embedded, the embeddings are concatenated, and a binary classifier is trained on top of the concatenated vector to detect correspondences. RDF2vec is chosen as the embedding method, and different classifiers are trained on top.
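
A minimal sketch of this pipeline on toy data (assuming NumPy and scikit-learn; all names and vectors are hypothetical stand-ins for the learned embeddings):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical 4-dimensional embedding lookups for the source and target KGs
    rng = np.random.default_rng(0)
    emb_src = {f"src:e{i}": rng.normal(size=4) for i in range(6)}
    emb_tgt = {f"tgt:e{i}": rng.normal(size=4) for i in range(6)}

    # Candidate pairs with binary labels (1 = same real-world entity)
    pairs = [(f"src:e{i}", f"tgt:e{j}") for i in range(6) for j in range(6)]
    labels = [int(s.split(":")[1] == t.split(":")[1]) for s, t in pairs]

    # Concatenate the two embeddings into one feature vector per candidate pair
    X = np.array([np.concatenate([emb_src[s], emb_tgt[t]]) for s, t in pairs])

    clf = RandomForestClassifier(random_state=0).fit(X, labels)
    print(clf.predict(X[:3]))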

There are multiple severe shortcomings about this paper.

First, there is no comparison against baselines or state-of-the-art approaches. Without any point of comparison, it is difficult to understand whether the results reported in the figures are good or not. However, there are some indications that they are not: on the anatomy dataset, a simple string equivalence comparison (without any training data!) yields an F1 score that is about 25 percentage points higher than the results reported in this paper. [1] For SPIMBENCH, the results in the paper seem at least closer to the state of the art. [2] For DOREMUS, it is unclear which subtask is chosen for evaluation, but the results again seem suboptimal compared to existing approaches (from eight years ago!). [3]
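
For illustration, such a baseline can be as simple as the following sketch (hypothetical label dictionaries; a real run would read rdfs:label values from the two ontologies):

    # Hypothetical mappings from entity IRI to its label in each KG
    labels_src = {"src:e1": "Thoracic vertebra", "src:e2": "Femur"}
    labels_tgt = {"tgt:a9": "thoracic_vertebra", "tgt:b3": "Tibia"}

    def normalize(label):
        return label.lower().replace("_", " ").strip()

    matches = [(s, t) for s, ls in labels_src.items()
                      for t, lt in labels_tgt.items()
                      if normalize(ls) == normalize(lt)]
    print(matches)  # [('src:e1', 'tgt:a9')]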

The second problem is the narrow contribution. Essentially, the authors run an experiment where they fix the embeddings and vary the binary classifier. If we assume for a moment that the problem is in principle solvable (i.e., the embeddings capture enough information to create an embedding space in which the two classes, matches and non-matches, are separable), the variance between reasonably sophisticated classifiers should be negligible (in fact, the results confirm that assumption). What would be more interesting, though, would be the behavior of different embedding approaches. For RDF2vec alone, a number of variants exist, each of which has been shown to have the ability to capture different graph constructs and notions of similarity and relatedness. [4] In particular, I assume that the standard RDF2vec variant has been used in this paper, which mixes relatedness and similarity in the embedding space. That being said, I think it is a suboptimal candidate for matching, since it will hardly be able to tell related from similar pairs of concepts, which, however, is crucial for KG matching. Moreover, it is known that textual information (entity labels and descriptions) and further literal attributes (e.g., identifiers, codes, latitude/longitude pairs for georeferenced entities) are essential for KG alignment, but RDF2vec does not capture those. Therefore, it is expected that literal-aware embedding methods [5] would yield better results.

Third, the presentation has to be improved. Capturing all results in four pages of figures is not ideal; tables would be more appropriate here. Normally, one would present the best-working configuration and provide an ablation study on the impact of different classifiers. Some more details on presentation issues are provided below.

Fourth, the paper lacks a clear take-home message, which is partly due to the research question being unclear. Section 6 is rather short. The authors claim that they experimented with different dimensionalities; the impact of that would be interesting to show as well (typically in another ablation study). Moreover, there are probably more relevant future research issues (different embeddings, negative sampling, ...) than testing yet another classification approach.

Further comments on the contents
* p.1: it is unclear what is meant by "the vectors [...] aggregating their meaning in mutual relationships"
* on p.2, there is a claim that the approach can be used for tasks such as document modeling etc., but it is not clear how EA would help for document modeling. Reference [15] also does not reveal that.
* p.2 mentions "the top 10 current classification approaches" - according to what criterion are they the top 10, and who says so?
* some of the claims about the classifiers in section 2 (e.g., decision trees are usually *not* thought of as sensitive to outliers) seem odd and should be underpinned with literature
* in section 2, it is unclear how noise and outliers play a role in the given setup. What would be noise and outliers in embeddings? Which embedding approach produces more outlying values? The conceptual relation to the problem at hand should be made clear here
* p.3: it is unclear how the work by Bujang et al. is relevant for the paper
* p.3: the definition of the KG is wrong; the objects should come from the union of R, B, and L, not just L (see the corrected formalization sketched after this list)
* p.3: according to the definition, only subjects of triples are candidates for alignments, which I think is not correct (if a resource only appears as an object, i.e., does not have any outgoing relations, it can still be aligned to another KG)
* p.3: The variable D is not introduced/defined
* p.5: The paper mentions "previously calculated sets of positive and negative alignments", but it's not clear how they are precalculated.
* p.6: The authors state that they use a sequence length of 1, which would not make RDF2vec learn anything (see the sketch after this list). Also, they should give the full specification of parameters (e.g., SG vs. CBOW, number of walks, etc.)
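
For reference, a standard formalization of an RDF graph consistent with the comment on p.3 above (using the paper's symbols, with R the IRIs, B the blank nodes, and L the literals):

    G \subseteq (R \cup B) \times R \times (R \cup B \cup L)

so that for every triple (s, p, o) \in G, the object o may be an IRI, a blank node, or a literal, not only a literal.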
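
To illustrate why a sequence (walk) length of 1 is degenerate, here is a minimal sketch of RDF2vec-style training (assuming gensim and precomputed random walks; the walks and token names are hypothetical): word2vec learns from a context window over each walk, so a one-token walk provides no context at all.

    from gensim.models import Word2Vec

    # Hypothetical random walks over a KG (alternating entity and predicate tokens)
    walks = [
        ["src:e1", "p:partOf", "src:e2", "p:locatedIn", "src:e3"],
        ["src:e2", "p:locatedIn", "src:e3", "p:partOf", "src:e4"],
    ]

    model = Word2Vec(
        sentences=walks,
        vector_size=100,  # embedding dimensionality
        window=5,         # context window taken over each walk
        sg=1,             # skip-gram; sg=0 would select CBOW
        min_count=1,
        epochs=10,
    )
    print(model.wv["src:e1"].shape)  # (100,)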

Further comments on language issues
* p.1: "referring to the same reality or not" - probably you mean "real-world entity"
* The way of referring to steps in Fig. 1 (e.g., "P2, Figure 1") is unusual, rather use "step 2 in Figure 1"

Summarizing, this paper does not meet the standards of SWJ.

[1] http://oaei.ontologymatching.org/2023/results/anatomy/index.html
[2] https://hobbit-project.github.io/OAEI_2022.html
[3] http://www.dit.unitn.it/~pavel/om2016/papers/oaei16_paper0.pdf
[4] Portisch, Paulheim: The RDF2vec Family of Knowledge Graph Embedding Methods. Semantic Web Journal, 2024
[5] Gesese et al.: A Survey on Knowledge Graph Embeddings with Literals: Which model links better Literal-ly? Semantic Web Journal 12(4), 2021

Review #3
By Ernesto Jimenez-Ruiz submitted on 26/Jun/2024
Suggestion:
Reject
Review Comment:

The paper deals with an interesting topic, but the conducted exercise is too limited to be considered for the Semantic Web Journal. I encourage the authors to revise the state of the art and keep working on the topic.

Important comments:

The definition of an RDF graph is not correct, as objects can also be IRIs and blank nodes; this hinders the subsequent definitions, as only subjects are aligned. In any case, the overall formalization of the problem needs an important revision.

The generation of negative samples is not described. Positive samples are assumed to come from the OAEI datasets, but this is not mentioned in Section 3.
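
For context, a common strategy such a description could cover is uniform negative sampling: corrupt each positive pair by replacing the target entity with a random non-matching one. A minimal sketch (all names hypothetical):

    import random

    random.seed(0)
    # Hypothetical positives, e.g., taken from an OAEI reference alignment
    positives = [("src:e1", "tgt:a1"), ("src:e2", "tgt:a2"), ("src:e3", "tgt:a3")]
    target_entities = ["tgt:a1", "tgt:a2", "tgt:a3", "tgt:a4", "tgt:a5"]

    # For each positive (s, t), sample a target t' != t to form a negative (s, t')
    negatives = [(s, random.choice([x for x in target_entities if x != t]))
                 for s, t in positives]
    print(negatives)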

The related work does not mention modern systems dealing with EA and KG embeddings, e.g., OntoEA (https://arxiv.org/abs/2105.07688).

The evaluation and take-home notes are not conclusive given the limited extent of the evaluation: (i) more sophisticated architectures are not evaluated, and (ii) more KG embedding approaches should be considered. The length of the vectors also seems limited (I would consider RDF2Vec vectors of at least size 100).

Additional comments:
- The paper is at some points not easy to follow.
- References are not always accurate, for example those in the "Evaluate" column of Tables 1 and 2.