Review Comment:
The paper analyzes a method for aligning two knowledge graphs by means of building a classifier on top of knowledge graph embeddings. Both the source and the target KG are embedded, the embeddings of candidate entity pairs are concatenated, and a binary classifier is trained on the concatenated vectors to detect correspondences. RDF2vec is chosen as the embedding method, and different classifiers are trained on top.
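For clarity, the pipeline under review can be sketched as follows. This is my own minimal reconstruction, not the authors' code; the embeddings are replaced by random toy vectors, the dimensionality and the choice of logistic regression are assumptions for illustration only.

```python
# Illustrative sketch of the reviewed pipeline (all names and numbers are
# hypothetical): each candidate pair (e_src, e_tgt) is represented by the
# concatenation of its two embedding vectors, and a binary classifier is
# trained to decide match vs. non-match.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 8  # assumed embedding dimensionality

# Toy stand-ins for RDF2vec embeddings: "matching" pairs are near-identical
# vectors, "non-matching" pairs are unrelated random vectors.
src = rng.normal(size=(100, dim))
pos = np.hstack([src, src + rng.normal(scale=0.05, size=src.shape)])
neg = np.hstack([src, rng.normal(size=(100, dim))])

X = np.vstack([pos, neg])
y = np.array([1] * 100 + [0] * 100)  # 1 = match, 0 = non-match

clf = LogisticRegression(max_iter=1000).fit(X, y)
preds = clf.predict(X)
```

Note that whether the two classes are separable in this concatenated space depends entirely on the embedding method, which is exactly the variable the paper does not explore.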
This paper has multiple severe shortcomings.
First, there is no comparison against baselines or state-of-the-art approaches. Without any baseline, it is difficult to judge whether the results reported in the figures are good or not. However, there are indications that they are not: on the Anatomy dataset, a simple string-equivalence comparison (without any training data!) yields an F1 score about 25 percentage points higher than the results reported in this paper [1]. For SPIMBENCH, the results in the paper seem at least closer to the state of the art [2]. For DOREMUS, it is unclear which subtask is chosen for evaluation, but the results again seem suboptimal compared to existing approaches (from eight years ago!) [3].
The second problem is the narrow contribution. Essentially, the authors run an experiment in which they fix the embeddings and vary the binary classifier. If we assume for a moment that the problem is in principle solvable (i.e., the embeddings capture enough information to create an embedding space in which the two classes, matches and non-matches, are separable), the variance between reasonably sophisticated classifiers should be negligible (in fact, the results confirm this assumption). More interesting would be the behavior of different embedding approaches. For RDF2vec alone, a number of variants exist, each of which has been shown to capture different graph constructs and notions of similarity and relatedness [4]. In particular, I assume that the standard RDF2vec variant has been used in this paper, which mixes relatedness and similarity in the embedding space. This makes it a suboptimal candidate for matching, since it will hardly be able to tell related from similar pairs of concepts, which, however, is crucial for KG matching. Moreover, it is known that textual information (entity labels and descriptions) and further literal attributes (e.g., identifiers, codes, latitude/longitude pairs for georeferenced entities) are essential for KG alignment, but RDF2vec does not capture them. Therefore, literal-aware embedding methods [5] can be expected to yield better results.
Third, the presentation has to be improved. Capturing all results in four pages of figures is not ideal; tables would be more appropriate here. Normally, one would present the best-performing configuration and provide an ablation study on the impact of the different classifiers. More details on presentation issues are given below.
Fourth, the paper lacks a clear take-home message, which is partly due to the research question being unclear. Section 6 is rather short. The authors claim that they experimented with different dimensionalities; the impact of that would be interesting to show as well (typically in another ablation study). Moreover, there are probably more relevant future research directions (different embeddings, negative sampling, ...) than testing yet another classification approach.
Further comments on the contents
* p.1: it is unclear what is meant by "the vectors [...] aggregating their meaning in mutual relationships"
* on p.2, there is a claim that the approach can be used for tasks such as document modeling etc., but it is not clear how EA would help for document modeling. Reference [15] also does not reveal that.
* p.2 mentions "the top 10 current classification approaches" - according to what criterion are they the top 10, and who says so?
* some of the claims about the classifiers in section 2 (e.g., decision trees are usually *not* thought of as sensitive to outliers) seem odd and should be underpinned with literature
* in section 2, it is unclear how noise and outliers play a role in the given setup. What would noise and outliers be in embedding vectors? Which embedding approach produces more outlying values? The conceptual relation to the problem at hand should be made clear here
* p.3: it is unclear how the work by Bujang et al. is relevant for the paper
* p.3: the definition of the KG is wrong; the set of possible objects should be the union of R, B, and L, not just L (i.e., triples should be elements of (R ∪ B) × R × (R ∪ B ∪ L))
* p.3: according to the definition, only subjects of triples are candidates for alignment, which I think is not correct: if a resource appears only as an object, i.e., has no outgoing relations, it can still be aligned to an entity in another KG
* p.3: The variable D is not introduced/defined
* p.5: The paper mentions "previously calculated sets of positive and negative alignments", but it's not clear how they are precalculated.
* p.6: The authors state that they use a sequence length of 1, which would not make RDF2vec learn anything. Also, the full parameter configuration should be specified (e.g., SG vs. CBOW, number of walks, etc.)
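To make the walk-length point concrete: in RDF2vec, random walks over the graph serve as the "sentences" for word2vec-style training, so a walk of length 1 consists of the start entity alone and provides no context tokens to learn from. The toy graph and walk procedure below are my own illustration (assuming "sequence length" means walk depth), not the paper's setup.

```python
import random

# Hypothetical toy KG as an adjacency structure: entity -> [(predicate, object)]
graph = {
    "ex:Berlin": [("ex:capitalOf", "ex:Germany")],
    "ex:Germany": [("ex:partOf", "ex:Europe")],
    "ex:Europe": [],
}

def random_walk(start, length, rng):
    """Walk up to `length` nodes from `start`, emitting predicate and object
    tokens as RDF2vec does; length 1 yields only the start node itself."""
    walk = [start]
    node = start
    for _ in range(length - 1):
        edges = graph.get(node, [])
        if not edges:
            break  # dead end: stop the walk early
        pred, obj = rng.choice(edges)
        walk += [pred, obj]
        node = obj
    return walk

rng = random.Random(0)
print(random_walk("ex:Berlin", 1, rng))  # ['ex:Berlin'] -- a one-token "sentence"
print(random_walk("ex:Berlin", 3, rng))  # a walk with actual context tokens
```

A one-token sentence gives the word2vec training step no (context, target) pairs at all, so the resulting vectors would be essentially the random initialization.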
Further comments on language issues
* p.1: "referring to the same reality or not" - probably you mean "real world entity"
* The way of referring to steps in Fig. 1 (e.g., "P2, Figure 1") is unusual, rather use "step 2 in Figure 1"
Summarizing, this paper does not meet the standards of SWJ.
[1] http://oaei.ontologymatching.org/2023/results/anatomy/index.html
[2] https://hobbit-project.github.io/OAEI_2022.html
[3] http://www.dit.unitn.it/~pavel/om2016/papers/oaei16_paper0.pdf
[4] Portisch, Paulheim: The RDF2vec Family of Knowledge Graph Embedding Methods. Semantic Web Journal, 2024
[5] Gesese et al.: A Survey on Knowledge Graph Embeddings with Literals: Which model links better Literal-ly? Semantic Web Journal 12(4), 2021