Review Comment:
The paper tackles the problem of knowledge graph (KG) completion using internal features, i.e., features that are computed using only the KG itself.
Specifically, the authors propose an approach (called CAFE) that evaluates candidate triples for KG completion using a set of neighborhood-aware features, i.e., features that consider the entities and relations surrounding any given pair of entities. This feature set is used to transform triples in the KG into feature vectors, which are then fed to neural models for predicting which of the candidate triples are correct and should be added to the KG.
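For concreteness, this is how I read the overall pipeline (a minimal toy sketch in Python; the specific features, relation names, and classifier configuration below are my own illustrative assumptions for checking my understanding, not the actual CAFE implementation):

```python
# Toy sketch: neighborhood-aware features for candidate triples, fed to a
# per-relation neural classifier. Feature definitions here are illustrative only.
from collections import defaultdict
from sklearn.neural_network import MLPClassifier

# Toy KG as a set of (source, relation, target) triples.
kg = {
    ("MatrixReloaded", "hasPrequel", "Matrix"),
    ("Matrix", "directedBy", "Wachowskis"),
    ("MatrixReloaded", "directedBy", "Wachowskis"),
    ("Inception", "directedBy", "Nolan"),
}

out_edges, in_edges = defaultdict(set), defaultdict(set)
for s, r, t in kg:
    out_edges[s].add((r, t))
    in_edges[t].add((r, s))

def features(s, t):
    """Simple neighborhood features for a candidate (s, ?, t) pair."""
    s_neigh = {x for _, x in out_edges[s]} | {x for _, x in in_edges[s]}
    t_neigh = {x for _, x in out_edges[t]} | {x for _, x in in_edges[t]}
    return [len(s_neigh), len(t_neigh), len(s_neigh & t_neigh)]

# One binary classifier per relation (here: "hasPrequel"), trained on
# positive and negative candidate pairs.
positives = [("MatrixReloaded", "Matrix")]
negatives = [("Inception", "Matrix")]
X = [features(s, t) for s, t in positives + negatives]
y = [1] * len(positives) + [0] * len(negatives)
model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=0).fit(X, y)
print(model.predict([features("MatrixReloaded", "Matrix")]))
```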
The evaluation over 4 ground-truth datasets shows that the proposed method outperforms (on average) 6 state-of-the-art approaches.
The paper is in general easy to read, well written, and technically sound, and it tackles a very interesting problem.
The major issue of the paper concerns the selection of the baselines. In the related work section, the authors state that the proposed method falls under the category of path-based works. However, it is not compared to any such related work, like [17], [24], [32], and [33]. Nor is it compared to more recent embedding-based works, like [27] and [28], which can also be considered state of the art. Is there a specific reason that prevented you from comparing your method with these more relevant and more recent works?
With respect to the evaluation results, precision is in general very low (<50%) for a large number of relations. Given that a knowledge graph should contain true facts, i.e., "knowledge", I am wondering about the usefulness and practical applicability of the proposed method and of the considered baselines. Why complete a KG using such methods if you know that half of the new triples are incorrect?
It is not clear to me why the proposed method "can be applied at any time as a KG grows with new entities and relations, without the need of a complete recomputation in opposition to embedding-based approaches" (this claim is repeated many times in the paper). If new entities and relations are added to the KG, then all feature values change, which means that new neural models need to be trained and evaluated for each relation based on the new feature vectors; otherwise, performance might be low. Isn't this true? What is different in your method compared to other works that rely on embedded representations? Couldn't they also make use of the same embeddings when new data are added to the KG?
Given that this is a journal paper, I would expect to see a more detailed evaluation, such as a detailed error analysis as well as a feature analysis that demonstrates the importance of each feature group. For example, I expect that the feature groups f1, f2, f5, and f6 are not important (I do not see the intuition of using them; more on this below).
About the title: I find it a bit misleading for two reasons:
i) the paper is actually about KG completion using internal features, not just about checking the validity of triples. So, I would expect to see "KG completion" in the title. Reading the title as it is now, I thought that the paper was about a method to validate the existing triples of a KG.
ii) "Fact checking" is widely used in the context of fake news and refers to verifying information in non-fictional text in order to determine its veracity and correctness (like claims that are fact-checked by PolitiFact, Snopes, etc.). Although I know that this term has been used in KG-related works, I would suggest that the authors use a different term, e.g., fact verification.
Other comments:
- Abstract: "highly relational nature of KGs" (mentioned also in the RW section) --> This is not clear to me. What is this "relational nature" of a KG?
- Section 1, last paragraph --> typo: *CAFEuses*
- Section 2: I would like to see a distinction between the related works that use internal features and those that use external features.
- Section 2: I would like to see a paragraph at the end explaining the similarities and differences between the proposed method and the mentioned previous works, in terms of the model used, the similarity of the considered features, etc. What features do the related works make use of?
- Section 3.1: The work seems to ignore literal properties, which are nonetheless very common in all KGs, e.g., properties pointing to dates, numbers, strings, etc. Is this true, or does "entity" also cover dates, numbers, etc.?
- Section 3.2: I would like to see the intuition behind some of the feature groups, in particular feature groups f1, f2, f5, and f6. For example, what is the intuition for considering the number of entities in the neighborhood subgraph of the source entity? How can this number help in deciding whether a triple should be included in the KG?
- Section 3.2 - Feature group f5: should "f5(2, hasPrequel)" be "f5(hasPrequel, 2)", or should "f5(r, n)" be "f5(n, r)"? (The same applies to feature group f6.)
- Section 4.5: "we first remove any individual features that have the exact same value..." --> Which ones did you remove in your experiments?
- Section 5.1: "Relations that make up for less than 5% of the total amount of triples in the graph have been removed." --> Why? How many relations and triples were removed?