Review Comment:
First of all, I would like to thank the authors for the changes they made in response to my review. I appreciate in particular the clarified contribution and the added Section 9.1 on the potential of the proposed method.
Regarding the review criteria of originality, significance of the results, and quality of writing, I now believe that the originality of the proposed end-to-end pipeline is good, and that the results are significant insofar as they have the potential to considerably enrich existing knowledge bases.
Major concerns, however, remain about the quality of writing:
1. As far as I can tell, the paper has not been properly proofread for language and presentation issues. This is not just a surface problem: the language currently used makes understanding difficult in several places, which is especially surprising given that the paper has 6 authors.
2. While I see improvements, I still find the problem definition (Section 3.2) convoluted.
3. In several parts, the evaluation only spells out numbers but does not provide the reader with insights or lessons learned.
I think addressing the abovementioned concerns requires a major effort, and I was on the fence between suggesting a major and a minor revision. I chose a minor revision because I think the concerns can be addressed without requiring changes to the methodology or new experiments, but I urge the authors to make a truly major effort at improving the writing.
Details:
1. Language and Presentation
- Intro: "to aid knowledge base augmentation", "to enable the exploitation of markup data", "for supporting KBA tasks" - Why these reservations? Are you not *doing* KB augmentation, not *exploiting* markup data?
- Intro: "the KBA problem" - is the term "problem" needed here? I am not aware of an established KBA problem (like "the travelling salesman problem"), so if needed please introduce the term, though I'd rather simply drop "problem" and refer to it by the established process name: "KBA".
- Figure 1 uses an unreadably small font size
- 3.3: "the KBA problem defined above, consists of two steps" -> remove comma
- 3.3: "The first step aims KnowMore_{match} at" - fix word order
- 3.3: "with respect to KB." - add article
- Table 1: "Notation" -> "Term"?
- Table 1: Some terms have descriptive names (F_class, F_ded), while others are opaque (F, F'). Why not give sensible names to all?
- 4.3: Similarity for dates: What exactly counts as "a conflict" - unsatisfiability of the values taken together, or a single disagreement? As it stands, it is not clear whether values other than 0 and 1 can occur at all.
- 5.1: Features: The author reply mentions that the text appears after Table 4 as further explanation, but in the present version the text appears first. Also, as it stands, the typesetting of the features is quite unfortunate: at first glance they appear to be part of references [1, 3]. How about using "t_1, ..., t_3" instead?
- "we follow the intuition that, if" - remove comma
- "the example facts #2 and #3, would be valid" - remove comma
- "S im(f_i,f_{KB})" - fix math typesetting
- "this does neither .... while improving ..." - fix language
- "for each type book and movie from Wikipedia" - fix language
- "preliminary analysis of the completeness" - relict of the old version
- "available ^{12}" - fix spacing
- Enumeration in 6.2.2: There seems to be an inconsistency in the item types: the label of the first item appears to be a step in the pipeline, while the others sound like metrics. Also, how about aligning the labels with the names of the steps as defined in Table 1?
- 6.2.2: "R - the percentage" -> "Recall R - the percentage"
- 6.3: "S V M" - fix typesetting
- 7.1: "While the step is a precondition, we provide evaluation results" - "As it is an important precondition"?
- 7.4: "entities, existing" - remove comma
2. Problem Definition
- The addition of the example is nice, but it still appears too late. Please give an example of an entity description immediately where you define it.
- Use of symbol "q" for entities is confusing. The symbol bears no similarity with the term "entity" and in the field is often used for queries. Why not use "e"?
- What is "E"?
- What is the "type" of Definition 1? "Problem definition"? "Task"?
- As it stands, Def. 1 can be satisfied by the empty set. You likely want not just "a set" but "the maximal set"?
- The problem definition is possibly overly complicated by the fact that there are actually two problems hidden in it, as also described in the approach section: 1) Find the entity descriptions that refer to the same entity, and 2) Find novel facts. Maybe the problem definition would benefit from being split into two, especially as you are addressing the problems independently and appear not to use joint inference? (A rough sketch of what I mean follows below.)
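To make the suggestion concrete, here is a sketch of the decomposition I have in mind; the notation is mine, purely illustrative, and not a claim about the paper's actual definitions:

  Matching:    M(e)  = { d \in E_markup | d describes the same real-world entity as e }
  Enrichment:  F'(e) = { f \in facts(M(e)) | f is valid and f \notin KB }

Requiring F'(e) to contain *all* such facts, rather than merely "a set" of them, would also rule out the trivial empty solution mentioned above.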
3. Missing Discussion/Lessons learned
- What do we learn from Naive Bayes outperforming the other methods? In response to my previous comment on this, you added more experiments showing THAT it outperforms the others; my point, however, was rather to understand WHY this happens and what we can learn from it for similar scenarios.
- Section 4.2: Still hard to understand due to the language. What is really happening here? On what basis are entities returned by the Lucene queries? How conservative or credulous is the string similarity used inside? Understanding this is crucial for judging how many false negatives this step may introduce, so I consider a discussion imperative (see the illustrative sketch after this list).
- 5.1: "We have experimented with several different approachs" - Why? And why the ones you chose?
- 7.1: The text only spells out the numbers in Table 7. Please rather provide 1-2 sentences of explanation as to why the main results are as they are (i.e., why is the best configuration the best, and why are the baselines worse?)
- 7.2: The same: "The baseline fails to recall a large amount of correct facts" - Why? What does your technique do better? What can future researchers learn from your method?
- 7.4: Coverage gain: I think the metric is reasonable, but as it is nonstandard, please explain briefly, before using it, why you consider it useful/interesting for the present problem.
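For coverage gain, even a one-line definition would suffice, e.g. something along the lines of (this is only my guess at the intended reading; the exact formula is of course yours to state):

  CoverageGain(p) = |{ e : KnowMore adds a value of predicate p for e that KB lacks }| / |{ e : KB has no value of p for e }|

Whichever variant you actually use, stating it explicitly before the results of Section 7.4 would make them much easier to interpret.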
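Regarding the candidate-retrieval question under Section 4.2 above: the level of detail I am missing could be conveyed in a few sentences or a schematic snippet. Purely for illustration, and explicitly not a claim about what KnowMore actually implements (the function name, the threshold and the use of difflib are my own assumptions):

  from difflib import SequenceMatcher

  def retrieve_candidates(kb_label, markup_names, threshold=0.8):
      # Keep every markup entity name that is sufficiently similar to the KB entity label.
      # A high threshold is conservative (few false positives, more false negatives);
      # a low threshold is credulous (the opposite trade-off).
      return [name for name in markup_names
              if SequenceMatcher(None, kb_label.lower(), name.lower()).ratio() >= threshold]

Even stating where on this conservative/credulous spectrum the Lucene query plus string similarity sits would let the reader judge how many correct candidates are lost before the classification step.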
Other
- Is Naive Bayes really a state-of-the-art classifier?