MADLINK: Attentive Multihop and Entity Descriptions for Link Prediction in Knowledge Graphs

Tracking #: 2960-4174

Authors: 
Russa Biswas
Harald Sack
Mehwish Alam

Responsible editor: 
Dagmar Gromann

Submission type: 
Full Paper
Abstract: 
Knowledge Graphs (KGs) comprise interlinked information in the form of entities and the relations between them in a particular domain and provide the backbone for many applications. However, KGs are often incomplete, as links between entities are missing. Link prediction is the task of predicting these missing links in a KG based on the existing links. Recent years have witnessed many studies on link prediction using KG embeddings, which is one of the mainstream tasks in KG completion. Most of the existing methods learn the latent representation of the entities and relations, whereas only a few of them consider contextual information as well as the textual descriptions of the entities. This paper introduces an attentive encoder-decoder based link prediction approach that considers both the structural information of the KG and the textual entity descriptions. A path selection method is used to encapsulate the contextual information of an entity in a KG. The model uses a bidirectional Gated Recurrent Unit (GRU) based encoder-decoder to learn the representation of the paths, whereas SBERT is used to generate the representation of the entity descriptions. The proposed approach outperforms most of the state-of-the-art models and achieves comparable results with the rest when evaluated on the FB15K, FB15K-237, WN18, WN18RR, and YAGO3-10 datasets.
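
As a rough illustration of the textual component mentioned in the abstract, the following is a minimal sketch of encoding entity descriptions with SBERT via the sentence-transformers library; the checkpoint name and the example descriptions are assumptions for illustration, not taken from the paper.

from sentence_transformers import SentenceTransformer

# Any pretrained SBERT checkpoint could be used; this particular one is an assumption.
sbert = SentenceTransformer("all-MiniLM-L6-v2")

# Illustrative entity descriptions (not from the paper's datasets).
descriptions = {
    "Berlin": "Berlin is the capital and largest city of Germany.",
    "Germany": "Germany is a country in Central Europe.",
}

# One fixed-size vector per entity description.
desc_vectors = {entity: sbert.encode(text) for entity, text in descriptions.items()}
print(desc_vectors["Berlin"].shape)  # (384,) for this checkpoint
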
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
Anonymous submitted on 29/Dec/2021
Suggestion:
Accept
Review Comment:

The authors present MADLINK, an attention-based encoder-decoder approach for link prediction that considers both structural and textual information for learning entity representations. The structural information, in the form of paths, is integrated via a GRU-based seq2seq model, while the textual information is encoded using SBERT. Both vectors are concatenated and fed, together with the learned relation vectors, into a DistMult scoring function. Extensive experiments are conducted on the standard benchmark datasets derived from Freebase, WordNet, and YAGO, and the results show comparable or superior performance compared to a wide range of baseline methods. The rationale for the model’s design is explained in detail, and ablation studies show the impact and relevance of each part of the model.
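
To make the scoring step concrete, here is a minimal sketch of DistMult applied to entity vectors formed by concatenating a structural and a textual part, as the summary above describes; the dimensions, the random stand-in vectors, and the helper name entity_vector are hypothetical.

import numpy as np

# Hypothetical dimensions for the structural (path-based) and textual (SBERT) parts.
d_struct, d_text = 100, 384
d = d_struct + d_text
rng = np.random.default_rng(0)

# Final entity representation: concatenation of both parts.
def entity_vector(struct_vec, text_vec):
    return np.concatenate([struct_vec, text_vec])

h = entity_vector(rng.normal(size=d_struct), rng.normal(size=d_text))  # head entity
t = entity_vector(rng.normal(size=d_struct), rng.normal(size=d_text))  # tail entity
r = rng.normal(size=d)  # learned relation vector (random stand-in here)

# DistMult scores a triple (h, r, t) with a trilinear product: sum_i h_i * r_i * t_i.
score = float(np.sum(h * r * t))
print(score)

In the actual model these vectors would be trained end to end; the sketch only shows the shape of the scoring function.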

Most comments from my previous review have been addressed in the new version of the paper. Some remaining comments, especially concerning the quality of writing, are listed below.
It is not clear whether the source code of the method will be made publicly available. For better reproducibility, it is highly recommended to add a reference to the final implementation.

Originality:
Most existing methods only consider 1-hop or n-hop information from a KG but do not include information from textual descriptions. However, there already exist some approaches that take multimodal data (such as text, images, dates, or geometries) into account for learning vector representations. MADLINK combines existing concepts and algorithms (seq2seq, SBERT, an attention layer, and DistMult) to form a new method for link prediction and triple classification. Here, paths in the KG are treated as sentences, which serve as input to the seq2seq-based model, as sketched below.
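
The snippet below encodes one toy path with a bidirectional GRU in PyTorch to illustrate the paths-as-sentences idea; the vocabulary, dimensions, and the example path are invented for illustration and do not reflect the paper's actual preprocessing.

import torch
import torch.nn as nn

# Toy vocabulary mixing entities and relations; purely illustrative.
vocab = {"<pad>": 0, "Berlin": 1, "capitalOf": 2, "Germany": 3, "locatedIn": 4, "Europe": 5}

emb = nn.Embedding(len(vocab), 32, padding_idx=0)
encoder = nn.GRU(input_size=32, hidden_size=64, bidirectional=True, batch_first=True)

# One path read as a sentence: Berlin -capitalOf-> Germany -locatedIn-> Europe.
path = torch.tensor([[vocab["Berlin"], vocab["capitalOf"], vocab["Germany"],
                      vocab["locatedIn"], vocab["Europe"]]])

outputs, hidden = encoder(emb(path))
# outputs holds per-token states that an attention layer could weight;
# hidden holds the final states of the forward and backward directions.
print(outputs.shape)  # torch.Size([1, 5, 128]), i.e. 2 * hidden_size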

Significance of the results:
The results for the link prediction and triple classification tasks show comparable or superior performance of MADLINK compared to a large variety of baseline methods. MADLINK takes textual information into account, but this is not sufficient to outperform all baselines, e.g., TuckER. The ablation studies (using only textual entity descriptions, only structural information, and with or without attention) show the importance of the different components, which could provide insights helpful for the model design of subsequent approaches.

Quality of writing:
The writing is clear, with some possibilities for improvement:
- The capitalization of the captions (figures/tables) is not consistent, and the same occasionally occurs in the text (e.g., “link Prediction” vs. “Link Prediction”).
- The use of American English and British English is mixed, e.g., optimize/optimise, vectorise, initialise/initialize.
- The different areas in the related work section are numbered, e.g., (1) translational models, (6)(a)(b)(c). Adding these numbers also to the corresponding paragraphs that follow could make it easier for the reader to find them.
- In the related work, some methods are already described rather formally. It might make sense to add a short paragraph about notation at the beginning of this section.
- p.3, l.30: For r_i, the index i is not described; it should probably be r. Also, there should be a comma after 1.
- p.3, l.38: What are g_u and g_v?
- p.5, l.45: Use mathematical notation for “l”; otherwise it looks like 1 (one).
- p.5, l.48: “Also, the cycles present in the KGs are straightened and considered as a flat path.” What does this mean?
- Eq. 1: It seems unnecessary to define π(r) as the set of relations, since R is already defined as the set of relations and the other terms in the equation depend only on a specific relation r.
- p.7, l.24: A = a_1a_2…a_n is bold, while a_t is not bold in the following.

Review #2
By Bassem Makni submitted on 30/Jan/2022
Suggestion:
Accept
Review Comment:

The authors successfully addressed most of my comments. I would like to see the following edits in the final version.

Rewording:

1. However, triple classification, i.e., the task of finding if a given triple is valid or not in a KG is also considered as link prediction as it determines the validity of links between two entities. -> triple classification can be used for link prediction but is not necessarily considered as a link prediction task.

Editing:
Page 5, line 50R: he predicate frequency -> the predicate frequency
Page 6, line 28R: on the top of it. -> on top of it
Page 9, line 23L: of the using -> of using
Page 9, line 26L: to focus in on -> to focus on
Page 9, line 30L: focuses of the -> focuses on the
Page 19, line 11R: whereas achieves -> whereas it achieves

Missing references: The reference for the STS dataset is missing.