Review Comment:
The authors present MADLINK, an encoder-decoder-based approach with attention for link prediction that considers both structural and textual information for learning entity representations. The structural information, in the form of paths, is integrated via a GRU-based seq2seq model, while the textual information is encoded using SBERT. Both vectors are concatenated and fed, together with the learned relation vectors, into a DistMult scoring function. Extensive experiments are conducted on the standard benchmark datasets derived from Freebase, WordNet, and YAGO (and subsets thereof), and the results show performance comparable or superior to a wide range of baseline methods. The rationale for the model's design is explained in detail, and ablation studies show the impact and relevance of each part of the model.
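For concreteness, the scoring step can be summarised as follows (a minimal sketch of my own, not the authors' code; the dimensionalities and variable names are assumptions for illustration only): the structure-based and description-based embeddings of each entity are concatenated and scored against a learned relation vector with DistMult.

```python
import numpy as np

def distmult_score(head_vec, rel_vec, tail_vec):
    """DistMult: score(h, r, t) = sum_i h_i * r_i * t_i."""
    return float(np.sum(head_vec * rel_vec * tail_vec))

# Hypothetical dimensions: a structural embedding (seq2seq/GRU over paths)
# and a textual embedding (SBERT over the entity description).
d_struct, d_text = 256, 384
rng = np.random.default_rng(0)

def entity_vector(struct_emb, text_emb):
    # Entity representation in the spirit of MADLINK: concatenation of the
    # structure-based and description-based vectors.
    return np.concatenate([struct_emb, text_emb])

h = entity_vector(rng.normal(size=d_struct), rng.normal(size=d_text))
t = entity_vector(rng.normal(size=d_struct), rng.normal(size=d_text))
r = rng.normal(size=d_struct + d_text)  # learned relation vector of matching size

print(distmult_score(h, r, t))  # higher score => triple (h, r, t) judged more plausible
```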
Most comments from my previous review have been addressed in the new version of the paper. Some remaining comments, especially concerning the quality of writing, are listed below.
It is not clear if the source code of the method will be made publicly available. For better reproducibility, it is highly recommended to add a reference to the final implementation.
Originality:
Most existing methods only consider 1-hop or n-hop structural information from a KG but do not include information from textual descriptions. However, there already exist some approaches that take multimodal data (such as text, images, dates, geometries) into account for learning vector representations. MADLINK combines existing concepts and algorithms (seq2seq, SBERT, an attention layer, and DistMult) to form a new method for link prediction and triple classification. Here, paths in the KG are treated as sentences, which serve as input to the seq2seq-based model.
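As a side note, the "paths as sentences" idea amounts to serialising a walk through the graph into a token sequence that a seq2seq encoder can consume. The following toy sketch is my own illustration, not taken from the paper; the KG, entity/relation names, and the function are invented:

```python
import random

# Toy KG as an adjacency map: head -> list of (relation, tail). Illustrative only.
kg = {
    "Berlin": [("capitalOf", "Germany")],
    "Germany": [("memberOf", "EU"), ("hasCapital", "Berlin")],
    "EU": [("hasMember", "France")],
}

def sample_path_sentence(kg, start, max_hops=3, seed=None):
    """Random walk from `start`, serialised as a "sentence" of alternating
    entity and relation tokens, suitable as input to a seq2seq encoder."""
    rng = random.Random(seed)
    tokens, node = [start], start
    for _ in range(max_hops):
        edges = kg.get(node, [])
        if not edges:
            break
        rel, nxt = rng.choice(edges)
        tokens += [rel, nxt]
        node = nxt
    return tokens

print(sample_path_sentence(kg, "Berlin", seed=1))
# e.g. ['Berlin', 'capitalOf', 'Germany', 'memberOf', 'EU', 'hasMember', 'France']
```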
Significance of the results:
The results for the link prediction and triple classification tasks show comparable or superior performance of MADLINK compared to a large variety of baseline methods. MADLINK takes textual information into account, but this is not sufficient to outperform all baselines, e.g., TuckER. The ablation studies using only textual entity descriptions, only structural information, and attention show the importance of the different components, which could provide insights helpful for the model design of subsequent approaches.
Quality of writing:
The writing is clear, with some possibilities for improvement:
- The capitalization of the figure and table captions is not consistent, and the same inconsistency sometimes occurs in the text (e.g., “link Prediction” vs. “Link Prediction”).
- American English and British English spellings are mixed, e.g., optimize/optimise, vectorise, initialise/initialize.
- The different areas in the related work section are numbered, e.g., (1) translational models, (6)(a)(b)(c). Adding these numbers to the corresponding paragraphs that follow would make it easier for the reader to locate them.
- In the related work, some methods are already described rather formally. It might make sense to add a short paragraph about notation at the beginning of this section.
- p.3, l.30: For r_i, the index i is not described; it should probably be r. There should also be a comma after 1.
- p.3, l.38: What are g_u and g_v?
- p.5, l.45: Use mathematical notation for “l”; otherwise it looks like 1 (one).
- p.5, l.48: “Also, the cycles present in the KGs are straightened and considered as a flat path.” What does this mean?
- Eq. 1: It seems unnecessary to define π(r) as the set of relations, since R is already defined as the set of relations and the other terms in the equation depend only on a specific relation r.
- p.7, l.24: A = a_1a_2…a_n is written in bold, while a_t is not bold in the following text.