Review Comment:
The authors present a thorough analysis of the application of Graph Convolutional Networks (GCNs) to knowledge graph alignment. The technical approach and analysis appear to be sound. The application area of knowledge graph alignment is of significant practical importance. The work is original and the writing is of good quality.
I have two suggestions for improving the manuscript.
First, the main manuscript contains a large number of very detailed result tables. It is laudable that the authors present all these details, but I am concerned that the reader cannot see the forest for the trees: what are the main trends and practically significant findings that one can glean from the results? E.g., which models and parameters work best, and what general trends can be observed, if any? I would suggest that quite a few of the detailed data tables be moved to an appendix / supplementary material, and that a condensed table or figure summarizing these key insights be placed in the main manuscript text instead. Most readers will come to this manuscript with the question of how they can best apply the described methodology to their own problem, and the paper should make it as easy as possible to answer that question.
Second, the authors present these detailed results for their own datasets (biomedical data with a focus on pharmacogenomic information), but it is unclear to what extent the presented findings would generalize to other datasets, potentially in other knowledge domains. The authors should discuss in more depth how much (or how little) they expect the findings in this paper to generalize to other knowledge graphs, and what their intuitive expectations are for some other kinds of knowledge graphs. An even better addition would of course be to apply the algorithms to at least one other, independent knowledge graph, and to check whether common trends hold across different knowledge graphs -- but I leave it to the authors to decide whether this is worthwhile and feasible.
Minor comments:
The input layer of the GCN is a one-hot encoded vector -> I just want to remark that it is surprising that this approach scaled to such a large graph! Perhaps mention scalability issues or considerations, if there were any. Did you consider trainable dense vector embeddings?
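To illustrate this remark: feeding a one-hot vector into a dense first layer is mathematically equivalent to looking up a row of that layer's weight matrix, so the weight matrix already acts as a trainable embedding table, and the N x N one-hot input never needs to be materialized. A minimal numpy sketch of this equivalence (toy sizes of my own choosing, not the authors' actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

n_nodes, dim = 5, 3  # toy graph; real knowledge graphs have far more nodes

# One-hot input layer: an n_nodes x n_nodes identity matrix fed to the GCN.
X_onehot = np.eye(n_nodes)

# First-layer weight matrix of the GCN (randomly initialized here).
W = rng.normal(size=(n_nodes, dim))

# Multiplying one-hot inputs by W just selects rows of W ...
H_onehot = X_onehot @ W

# ... which is exactly an embedding lookup: W *is* the embedding table.
H_embed = W[np.arange(n_nodes)]

assert np.allclose(H_onehot, H_embed)
```

In other words, switching to trainable dense embeddings would change memory cost, not model expressiveness, which is why I ask whether the authors considered it.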
"We experimented this approach"
-> "We conducted experiments with this approach"
"somehow corresponding to"
-> "corresponding to"
"in the Resource Description Format"
-> "in the Resource Description Framework (RDF) format"
"and their predicate represent the"
-> "and their predicates represent the"
"Here, we use the results of a rule-based method [10] in a “knowledge graph as silver standard” perspective [11]."
-> I think you should elucidate what you mean by that.
"We experimented our work within the"
-> "We conducted experiments within the" (please also check the use of the word experiment in other sections of the document; I will cease highlighting it from here on)
"We propose different gold clusterings"
-> "We propose different gold clusterings (named C0 - C6)"
Figure 3: x axis labeling is so small as to be illegible, please fix.
"input layer consists in"
-> "input layer consists of"
Table 6: There should perhaps be a separator line between the G0 and G5 results to make the table more intuitive.