Benchmarking Embedding Techniques for Knowledge Graph Comparison

Tracking #: 2724-3938

Pieter Bonte
Sander Vanden Hautte
Filip De Turck
Sofie Van Hoecke
Femke Ongenae

Responsible editor: 
Guest Editors DeepL4KGs 2021

Submission type: 
Full Paper
Abstract:
Knowledge graphs (KGs) are gaining popularity and are being widely used in a plethora of applications. They owe their popularity to the fact that KGs are an ideal form to integrate and retrieve data originating from various sources. Using KGs as input for Machine Learning (ML) tasks allows predictions to be performed on these popular graph structures. However, KGs cannot be used directly as ML input; they first need to be transformed to a vector space through embedding techniques. As ML techniques are data-driven, they can generalize over unseen input data that deviates to some extent from the data they were trained upon. To fully exploit the generalization capabilities of ML algorithms, small changes in the KGs should result in small changes in the embedding space. Many graph embedding techniques exist; however, they have not been tailored towards KGs. We investigated how these whole graph embedding techniques can be used for KG embedding. This paper evaluates whether these existing techniques that embed the whole graph can represent the similarity between KGs in their embedding space. We compare the similarities between KGs in terms of changes in size, entity labels, and KG schema. We found that most techniques were able to represent the similarities in terms of size and entity labels in their embedding space; however, none of the techniques were able to capture the similarities in KG schema.

Solicited Reviews:
Review #1
Anonymous submitted on 09/May/2021
Minor Revision
Review Comment:

The paper evaluates existing graph embedding techniques for knowledge graph comparison and detection of knowledge graph differences.
An experimental evaluation is provided using benchmark RDF datasets and investigating if the embedding techniques can detect several kinds of changes (schema, entity, and subgraph changes).
The paper is well-written and easy to follow. However, my primary concern is the novelty and appropriateness as a "full paper" because this paper is limited to comparing graph embedding and kernel techniques that were already proposed.

Additional comments and questions:
- The authors said that they limit the scope to focus on techniques that output whole graph embeddings. However, is the evaluation not focused on subgraph embeddings?
- In section 4.2., it is said that each description of a molecule/research group is a different graph. Maybe an example would be beneficial, and the maximum and minimum number of triples per graph could be added in Table 2.
- Regarding the evaluation set-up (section 4.3.), how is the distance/similarity between embeddings measured?
- The README file in the GitHub repository could be more detailed to make it easier to reproduce the results.
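On the question of how embedding distance is measured: a common choice in such benchmarks is cosine (or Euclidean) distance between the whole-graph embedding vectors. A minimal sketch of cosine distance, my own illustration rather than anything taken from the paper, assuming NumPy vectors:

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance between two embedding vectors: 1 - cos(u, v)."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Identical embeddings are at distance 0; orthogonal ones at distance 1.
g1_emb = np.array([0.5, 1.0, -0.2])
g2_emb = np.array([0.5, 1.0, -0.2])
```

Whichever measure the authors used, stating it explicitly would make the reported distances interpretable.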

Review #2
Anonymous submitted on 04/Jun/2021
Review Comment:

This paper benchmarks the ability to compare knowledge graphs using whole graph embedding techniques as well as graph kernels. The comparison is evaluated by changing (1) the size of the KG, (2) the entity labels, and (3) the KG schema.
There have been numerous approaches proposed in the literature that find entity and relation embeddings or triple embeddings in a KG. However, this paper’s focus is on representing the entire graph in a low-dimensional space. More specifically, as this paper uses embedding techniques designed for homogeneous graphs rather than knowledge graphs, the authors first apply the conversion technique proposed by de Vries et al. (Journal of Web Semantics, 2015).

To evaluate the impact of the aforementioned changes on the embeddings, first the gradual changes algorithm is proposed. In this setting, two random graphs are chosen from the dataset and one is gradually converted into the other by performing deletions and additions of triples, all the while guaranteeing that connectivity is preserved. The paper states that the distance between the graphs is expected to gradually increase as more iterations are performed, but no explanation is given in this regard. I would expect the exact opposite: as the two graphs become more and more alike, the distance between their embeddings should decrease. Moreover, the pseudocode provided can get stuck in an infinite loop, despite what is mentioned in footnote 6, due to the main while loop.
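For concreteness, a minimal sketch of the gradual-change procedure as I understand it — my own reconstruction, not the paper's pseudocode, with graphs represented as sets of (subject, predicate, object) string tuples; the explicit no-progress guard against the infinite loop mentioned above is my addition:

```python
import random

def connected(triples):
    """Weak connectivity of the undirected projection of a set of triples."""
    nodes = {x for s, _, o in triples for x in (s, o)}
    if not nodes:
        return True
    adj = {}
    for s, _, o in triples:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj.get(n, ()))
    return seen == nodes

def gradual_change(g1, g2, rng=random):
    """Yield intermediate graphs while turning g1 into g2, one removal and
    one addition per step, only removing triples that keep g1 connected."""
    g, target = set(g1), set(g2)
    while g != target:
        progressed = False
        removable = [t for t in g - target if connected(g - {t})]
        if removable:
            g.remove(rng.choice(removable))
            progressed = True
        missing = sorted(target - g)
        if missing:
            g.add(rng.choice(missing))
            progressed = True
        if not progressed:
            break  # guard against the infinite loop discussed above
        yield set(g)
```

Without such a guard, a state where every remaining removal would disconnect the graph and no additions are left makes the main while loop spin forever.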

To see how the schema affects the embeddings, two scenarios are envisioned. It must be noted that both scenarios are much simplified with regard to what can be changed in the schema of a KG. First, the entity types are gradually replaced by their superclass. In this scenario, it is expected that the deviation in the embeddings will be small. In the second scenario, the entity types are replaced with a random class, and an increase in the distance between embeddings is expected. As shown in the results, none of the graph embedding techniques used for the experiments is able to capture the modifications that have been imposed. The question is: why would we expect in the first place that the used graph embedding techniques and kernels have such a capability? None of these techniques is designed for KGs, nor able to take into account the semantics of the graph.
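To make the two scenarios concrete, a small sketch of the type substitutions as I read them — my own illustration, using the same triple-set representation; `superclass_of` and `all_classes` are hypothetical inputs standing in for the dataset's schema:

```python
import random

RDF_TYPE = "rdf:type"

def generalise_types(triples, superclass_of):
    """Scenario 1: replace each entity type by its superclass."""
    return {(s, p, superclass_of.get(o, o) if p == RDF_TYPE else o)
            for s, p, o in triples}

def randomise_types(triples, all_classes, rng=random):
    """Scenario 2: replace each entity type by a randomly chosen class."""
    return {(s, p, rng.choice(all_classes) if p == RDF_TYPE else o)
            for s, p, o in triples}
```

Note that both operations leave the graph structure untouched and only alter type objects, which is precisely why a purely structural, semantics-agnostic embedding has no reason to distinguish them.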

Finally, to see how dependent the embedding techniques are on the name (label) of an entity, names are gradually substituted with other names. Given what is written in sections 4.3.3 and 4.6, the expected behavior is vague. At one point, the authors wish the embeddings not to change when the names of the entities change, and at another, they report it as a success if the embeddings have been able to capture this difference.
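The substitution itself is straightforward; a sketch (again my own illustration with triple sets, where `renaming` is a hypothetical mapping from old to new labels):

```python
def rename_entities(triples, renaming):
    """Substitute entity labels while leaving the graph structure untouched."""
    r = lambda x: renaming.get(x, x)
    return {(r(s), p, r(o)) for s, p, o in triples}
```

Since the structure is unchanged, a purely structural embedding should map the renamed graph to (almost) the same point — which is exactly why the expected behavior needs to be stated unambiguously before the results can be interpreted.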

The idea of seeing how prone to change the embeddings of a KG are with respect to changes in size as well as semantics is indeed interesting. However, it might be more meaningful to capture this in the context of works proposed for entity and relation embeddings in a KG. Indeed, the proposed scenarios are not convincing, and the datasets that are used are not a good representation of real KGs.

The paper lacks a thorough and broad analysis of the problem, as well as of state-of-the-art KG embedding techniques in its related work. The reported results are not convincing (see the paper "How Powerful are Graph Neural Networks?" by Xu et al., ICLR 2019, for a comparison of the expressiveness of graph embedding techniques). The writing in its current version, as well as the structure of the paper, can be improved.

Strengths:
- The idea of comparing supervised and unsupervised embedding techniques with respect to their capacity to take the schema into account is novel for KGs.
- The related work about existing approaches that can be used to compare labelled graphs (not KG) is well structured and covers many kinds of supervised and unsupervised approaches.

Weaknesses:
- Scenarios that are presented to argue on whole KG comparison are not convincing
- The expectations and the results are not clear and not well illustrated
- The considered schema changes are over-simplified
- The used datasets are small and are not good examples of real KGs

---- Some Minor remarks----
Related work section: Ontology alignment techniques can also be seen as methods that can lead to KG comparison (see OAEI competition results).

Table 2: The number of classes and properties that belong to the original schema, and to a graph on average, is not mentioned

Abstract: they have not been tailor towards → tailored
to some extend → extent

Section 4.3.1: removing the triples in G1 that are in G1 but not in G2 → removing the triples in G1 that are not in G2

Section 4: Subgraph changes: The KG grows or schrinks → shrinks

Section 4.7: because next to embeddings → in addition to embeddings

Section 5.6: bad performance → poor performance

Tables 3,4,5: Best results indicated in bolt → in bold

Review #3
By Tiphaine Viard submitted on 11/Jun/2021
Review Comment:

In their paper, the authors propose a benchmark of full graph embedding techniques applied to the task of knowledge graph comparison.

Unfortunately, I have multiple issues with this paper as it is. I will try to detail them. Most of my issues stem from a lack of justification and contextualization for the authors' work. Let me give some examples.

Little justification is given for the authors' choices. We do not know why this subset of embedding methods has been chosen for the benchmark (and there are hundreds of embedding methods to choose from); we do not know why the authors focus on full graph embedding, which is a less common task in the area.

In a similar train of thought, the gradual change algorithm deserves much more explanation; only removing triplets that do not break connectivity is very biased (as compared to removing triplets at random); this is not a bad thing per se, but the impact, or lack thereof, needs investigation. The algorithm itself is very verbose (and looks like a direct transcription from Python code) and could be made much more concise, which in turn would improve clarity.

Table 2, giving features of the two datasets, made me believe that each graph was rather small (fewer than 300 edges on average). This clashes with experimental results saying that methods such as SimGNN do not run when comparing graphs of this order of magnitude. It would immensely help to have more information, for example regarding the spread around the mean of the graph sizes in the datasets.

Furthermore, in many ways, the writing does not do justice to the authors' work, speaking in absolutes ("graphs studied in embeddings are not as complex as KGs", on page 1, among others) while ignoring multiple related fields. For example, the distinction between homogeneous, heterogeneous and knowledge graphs feels forced (and does not match my knowledge of the literature, where such definitions are not so strict and vary a lot depending on the context and the paper), and ignores works on multiplex/multilayer graphs and HINs (heterogeneous information networks), among others. Many times, the authors say that "x is not suited for ML tasks", which reads as a very bold statement given the variety of machine learning tasks on graphs.

While there is clearly work in the paper, it is a bit difficult to evaluate what its scientific contributions are; to be clear, I think this stems more from a redaction issue than a purely scientific one. The paper does not introduce a new task, a new benchmarking technique or a new dataset. Widely used datasets (such as YAGO or DBpedia) are mentioned but not used at all. I understand that these datasets are not directly a collection of graphs; however, it seems that not using them should at least be discussed: is it difficult to extract topical subgraphs from YAGO? Or is this a bad idea regarding the representativity of the extracted subgraphs?

There are some more minor points; for example, I was puzzled as to why "type" nodes (in Figure 3) are duplicated along edges, instead of having one "type" node that all entities point to. This would lead to a k-partite modelling of the knowledge graph, which is a bit more economical in terms of memory and has repercussions in terms of time complexity.
There are many typos throughout the paper; they do not directly mislead or hinder understanding, but I mention them because they are characteristic of my issues with this paper as is.

With significantly more redaction and further work, I think that this paper has the potential to be interesting to the community; however, in its current state, I cannot recommend it for publication. I would like to stress that there is a lot of work and results in the paper, and that improving the writing alone will significantly increase the paper's clarity and impact.