Review Comment:
This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.
-------
The paper presents a survey of automatic approaches to refining knowledge graphs. I think this paper has the potential to be exactly what SWJ expects from survey papers, but it will require some major revisions for that to happen. Given that there are hardly any survey papers on knowledge graphs, and given that this survey could be quite broad, I sincerely hope that the author takes the time to address the revisions.
I think the discussion section is particularly strong; it is rare to see a survey paper that really focuses on analyzing the results rather than just presenting them.
There are a couple of major concerns that I have with the current version.
First, I think that the focus on automatic means only is somewhat artificial and diminishes the value of the survey. There are not that many non-automatic means (anything beyond crowdsourcing?), so why not include all types of refinement for a truly comprehensive survey? I would understand limiting the scope if expanding beyond automatic means would add many more works to consider and would really dilute the focus, but I don't think this is the case here.
Second, the author places great emphasis on the fact that the paper is about refinement vs. construction. I have to admit that I have trouble seeing where one ends and the other begins. In particular, the core part of refinement that the author considers at length, namely completion, comes very close to construction. I am not suggesting expanding the survey to construction methods as well, but I would like to see a clearer delineation.
Third, the paper never really defines what a knowledge graph is. I appreciate that there may not be a very straightforward definition; yet it is hard to argue that the survey is comprehensive without defining what it operates on. This problem is compounded by the fact that some of the works included in the survey operate on all kinds of artifacts (e.g., "several small ones" for [58], WordNet for [76], [54], etc.). Please define what a knowledge graph is -- and what it is not -- and then only include the works that operate on artifacts satisfying this definition.
Finally, the paper never mentions several large knowledge graphs -- the ones on which, admittedly, there has been much less research; yet a survey on knowledge graphs is incomplete without mentioning them: Facebook, Microsoft, and Yahoo! all have their own knowledge graphs and have published at least a little about them.
Yahoo:
Roi Blanco, B. Barla Cambazoglu, Peter Mika, Nicolas Torzec, Entity Recommendations in Web Search, ISWC 2013
Nilesh Dalvi, Ravi Kumar, Bo Pang, Raghu Ramakrishnan, Andrew Tomkins, Philip Bohannon, Sathiya Keerthi, and Srujana Merugu, A Web of Concepts, ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), 2009
Bing:
Thomas Lin, Patrick Pantel, Michael Gamon, Anitha Kannan, and Ariel Fuxman, Active Objects: Actions for Entity-Centric Search, WWW 2012
Facebook:
https://www.facebook.com/notes/facebook-engineering/under-the-hood-the-e...
Now, a few more detailed comments.
There is a claim at the very top of page 2 that decoupling knowledge graph construction and refinement allows for developing methods to refine arbitrary knowledge graphs. I would soften that statement, as this has clearly not been the case so far, and, frankly, I am not sure it is really a fact.
There is a claim towards the middle of the second column on p. 2 that the Google Knowledge Graph uses knowledge harvested from Google+. Please provide a citation (I don't believe this is actually true).
On page 3, there is a reference to a “genuine semantic web knowledge graph”. Can you explain what you mean by this?
Table 1:
please describe what the columns mean; there is no reference in the text to what "relationship types" are, for example. It is certainly not the same as "entity types", and I only realized later that it is the number of properties.
I was trying to understand where the data came from, and I think the footnotes on p. 3 do not account for it. For instance, it is unclear where the number of entity types and relationship types for the Google Knowledge Graph comes from.
The table introduces Knowledge Vault, which is hardly discussed or mentioned elsewhere.
Section 3, "Targeted kind of information": it took me a while to understand what this means. Even a simple re-phrasing to "Type of structures targeted" might help.
Knowledge graph as a silver standard: I could not quite understand the explanation. What is being compared in the end, in particular if the graph is not broken up into training and testing sets? Perhaps giving an example of how the errors are identified would help.
Maybe use "retrospective evaluation" instead of the less commonly used "ex post evaluation"?
I found the organization of Section 5 a bit odd. While you introduce the dimensions of comparison in Section 4, you use yet another dimension -- computational approach -- to organize the survey. At the very least, it seems this dimension should also be included in Section 4. But I also wonder whether it would be more useful to choose another dimension, such as the type of information targeted, for organizing this section.
Section 6.1.4: Neither of the two references mentioned deals with knowledge graphs -- at least not in the sense that the author seems to imply by the definition of a knowledge graph. Unless there are more relevant works that do, I think this section should be dropped.