Review Comment:
This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under "Long-term stable URL for resources". In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.
The authors propose an approach for knowledge graph embedding that combines the contextual structural information and the textual descriptions of the entities. The motivation of the work is well established, and the paper is clearly written. However, the baseline experiments need to include more relevant approaches, and the ablation study needs to be more thorough.
For the baseline experiments, I encourage the authors to compare their approach with other KG embedding approaches that incorporate textual information, such as LiteralE [1] and SSP [2]. The only text-aware approaches they compare against are DKRL and Jointly (ALSTM), and even this comparison is not carried out across all benchmarks.
In the ablation study, it is not clear why there is a discrepancy between the results of MADLINK in Table 4 and Table 6. I assume that the results in Table 6 do not include the path information (as hinted in the last sentence of the paragraph "Impact of Text"). If this is the case, the caption of the table should clarify it.
More importantly, the authors set the path-length hyperparameter to 5. I would highly encourage the authors to study the impact of the path information by varying the length of the paths.
The description of the approach does not clearly explain how the set of paths (P1 ... Pn in Fig. 2) is ordered before being encoded and fed to the neural network model. Nor do the authors address the strategy they use when the number of paths is less than the threshold they set at 1000. I assume the paths are ordered randomly, but this should be clearly stated.
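To make my assumption concrete, the following is a minimal sketch (in Python, with hypothetical names; not the authors' code) of the strategy I assume is used: the extracted paths of an entity are shuffled randomly and, when fewer than the 1000-path threshold are available, the set is padded by sampling with replacement.

```python
import random

MAX_PATHS = 1000  # threshold mentioned in the paper

def prepare_paths(paths, max_paths=MAX_PATHS, seed=None):
    """Hypothetical reconstruction of the path selection/ordering step.

    `paths` is the list of extracted paths (each a sequence of
    entity/relation tokens) for one source entity.
    """
    rng = random.Random(seed)
    paths = list(paths)
    if not paths:
        return []
    rng.shuffle(paths)            # assumption: the paths are ordered randomly
    if len(paths) >= max_paths:   # keep at most `max_paths` paths
        return paths[:max_paths]
    # Assumption: when fewer paths exist, sample with replacement up to the
    # threshold; padding with a dedicated "empty path" would be an alternative.
    extra = [rng.choice(paths) for _ in range(max_paths - len(paths))]
    return paths + extra
```

Whatever the actual strategy is, stating it explicitly (and whether the ordering affects the sequence encoder) would make the approach reproducible.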
I would highly encourage the authors to publish the code for their approach and experiments. It would be even better if the authors could implement their approach in one of the KG embedding frameworks, such as PyKEEN, DGL-KE, GraphVite, or others, in order to facilitate the comparison with other approaches.
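As an illustration of how such a framework simplifies comparison, the snippet below shows a standard PyKEEN pipeline run on one of the benchmarks; MADLINK is of course not part of the library, so this only sketches how a released implementation could be slotted into the same evaluation loop (the model and hyperparameter values here are placeholders).

```python
# Sketch only: a standard baseline run with PyKEEN's evaluation pipeline.
# A released MADLINK implementation could be compared in the same way by
# registering a custom model class (not shown here).
from pykeen.pipeline import pipeline

result = pipeline(
    dataset="FB15k237",                    # one of the paper's benchmarks
    model="TransE",                        # placeholder baseline model
    training_kwargs=dict(num_epochs=100),  # placeholder value
    random_seed=42,
)
print(result.metric_results.to_df())       # MRR, Hits@k, etc.
```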
Minor suggestions:
The authors could include a discussion about encoding the textual information of entities that have multiple textual literals, such as labels, descriptions, summaries, etc. Would a concatenation of the literals be enough, or do certain literals, such as the label, need to have higher weights? (A sketch of both options is given after these suggestions.)
Also, encoding the textual information of the relations using their labels can be added to the discussion/future work.
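Regarding the first suggestion above, a minimal (hypothetical) sketch of the two options, using an off-the-shelf sentence encoder rather than the authors' BERT setup, could look as follows:

```python
# Hypothetical sketch of two ways to combine several textual literals of an
# entity (label, description, summary, ...); the encoder choice is illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

literals = {
    "label": "Albert Einstein",
    "description": "German-born theoretical physicist ...",
    "summary": "Developed the theory of relativity ...",
}

# Option 1: concatenate all literals and encode them once.
concat_embedding = encoder.encode(" ".join(literals.values()))

# Option 2: encode each literal separately and combine the vectors with
# (possibly learned) weights, e.g. giving the label more importance.
weights = {"label": 0.5, "description": 0.3, "summary": 0.2}  # assumed values
weighted_embedding = np.sum(
    [w * encoder.encode(literals[k]) for k, w in weights.items()], axis=0
)
```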
Minor comments:
"BERT outperforms most of the SOTA results for a wide variety of tasks [36]." This statement is outdated.
"The transformer encoder reads the entire sequence of words at once which allows the model to learn the context of a word based on its surroundings, whereas the other models read the input sequentially." This statement is partially correct. Some models, like word2vec (skip-gram)- which read the input sequentially- can also learn the context of a word based on its surrounding.
"the nodes marked in green would have greater attentions than the ones marked in yellow" I would encourage the authors to visualize the attention weights for this example to confirm this claim.
Missing references:
[1] Incorporating Literals into Knowledge Graph Embeddings
Agustinus Kristiadi, Mohammad Asif Khan, Denis Lukovnikov, Jens Lehmann, Asja Fischer
[2] SSP: Semantic Space Projection for Knowledge Graph Embedding with Text Descriptions
Han Xiao, Minlie Huang, Lian Meng, Xiaoyan Zhu
[3] Utilizing Textual Information in Knowledge Graph Embedding: A Survey of Methods and Applications
Fengyuan Lu, Peijin Cong, and Xinli Huang
Editing:
Page 1:
Abstract: comprise of -> comprise
whereas -> , whereas
39: growing containing -> growing, containing
42: inter-connectivity -> interconnectivity
43: LOD -> the LOD
35: manually-curated -> manually curated
48: To-date -> To date,
Page 3:
20: whereas achieves -> , whereas it achieves
structure based representation -> structure-based representation
description based representation -> description-based representation
Page 4:
23: aforementioned models -> aforementioned models,
40: represents relation -> represents the relation
45: comprise of textual -> comprise textual
Page 5:
36: loose -> lose
36: domain specific -> domain-specific
37: fine tuned -> fine-tuned
Page 6:
25: fixed length -> fixed-length
25: called as -> called a
Page 8:
49: is -> are
Page 9:
Link Prediction can be defined by a mapping function which -> Link Prediction can be defined as a mapping function that
Page 10:
18: is already -> are already
Page 12:
34: very less information -> much less information
34: Therefore, the two research questions addressed in Section 3 is tackled as the use of contextual information plays a vital role in the link prediction task. -> This whole sentence is not clear and needs to be reformulated.