Review Comment:
This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.
The paper discusses knowledge base augmentation from structured web markup data. The problem seems to be an instance of data fusion. The idea of using web markup data for KB enrichment is interesting and, due to its potentially complementary nature with respect to the sources used for standard KB construction, could enable a substantial enrichment of existing knowledge bases. The paper focuses on two parts of the pipeline, namely entity resolution and data fusion.
Unfortunately, there are several serious weaknesses.
1) Novelty and significance:
- Contribution 1: While the contribution mentions a novel and specifically tailored data fusion pipeline, it remains largely unclear where the novelty lies and what is specifically tailored to the problem. Diversification techniques are claimed to be included, but they look just like deduplication techniques.
- Contribution 2: Similarly, it is claimed that a novel fusion approach is introduced, but I do not understand well which specific parts of it are novel. The same holds for the features. The paper claims "We propose and evaluate an original set of features", but I do not see any discussion or evaluation of the individual features.
- Contribution 3: This looks to me like the practically most interesting aspect of the work, but here, too, the paper does not substantiate the claims it makes, i.e., it does not contain a significant analysis of the enrichment potential.
3) Quality of writing: The paper is hard to read, as it suffers from a lack of examples and an unclear formalization, and is in many parts not self-contained. It gives the impression of a technical report that states what was done. I would expect much more discussion of options, reasons for choices, and consequences of choices.
Detailed Comments
Potential for Enrichment
The contribution of the paper would be considered huge if it were accompanied by the full dataset of all new statements derived by the proposed method. If that is practically difficult, I would at least like to see an extrapolation of the total number of new facts that can be expected, based on the small-scale experiments conducted in the paper. At the moment there is only a discussion for movies, but if we assume that all objects behave movie-like, how would that extrapolate?
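To make this concrete: even a back-of-envelope extrapolation of the following form would be informative (the symbols here are illustrative and not taken from the paper):

    N_total \approx (N_movies / E_movies) \times E_corpus

where N_movies is the number of new facts derived in the movie experiments, E_movies is the number of movie entities processed, and E_corpus is the total number of entities in the markup corpus, under the movie-like assumption stated above.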
Evaluation
The claim "On average, KnowMore populates 14.7% of missing statements in Wikidata, 11.9% in Freebase and 23.7% in DBpedia." appears to be wrong, as you do not know how many statements are missing in the first place. E.g., if a movie has no award in a KB, that does not mean that the KB is missing an award. Similarly, if a movie already has an award in a KB, you do not know how many more might be missing.
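Put differently, a claim of the form coverage = |newly added statements| / |missing statements| requires knowing |missing statements|, which cannot be observed from the KB itself; at best the paper can report the absolute number of added statements, or coverage with respect to an explicitly constructed gold standard.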
Diversity:
I was somewhat confused to see this term mentioned as a goal in the abstract, associating it at first with diversity as in search result diversification. I understand that something else is meant, and I also understand that this diversity can be measured in the output, but I do not see how Section 5.2 achieves higher diversity. The section seems to talk only about deduplication. In which way can deduplication influence diversity? Possibly by adjusting a confidence threshold, leading to a classical precision-recall tradeoff? This section needs a complete rewrite.
Presentation
I found large parts of the paper to be not self-contained or poorly explained. A significant amount of information is outsourced to references, or explained only in ways understandable to people involved in building this pipeline. Some examples are listed below:
- Please give an example of markup and an explanation of it early on; do not only mention that there is some dataset.
- Terms such as RDF-Quads, BM-25, blocking, and pay-level-domain should be explained briefly when they are introduced (e.g., using relative clauses like "BM-25, an entity ranking algorithm, ...").
- "Previous work only considers correctness by measuring the quality of the source. ..." - Explain difference better
- Section 3.1, first sentence: Explain, do not just reference.
- Section 3.1, last paragraph: Coherence. The preceding investigation is about movies and books, while the conclusion talks about highly volatile fields being present in markup data but not in KBs. Explain this step.
- Terminology in Section 3.2 is unclear. Are the e_s sets? Is q an entity or a subject? Types also seem mixed up in "a set of facts f_i \in F' from M"; f_i is probably not a set of facts but a single fact? F' is the set? Why F', and what is F? Suddenly there is a query q that was not mentioned before; what is its role? Please explain the whole of Definition 1 with an example. The same type error appears for "e_i \in E from M".
- 3.3 "name field" - unclear what that is/where that comes from? "attributes" means "criteria"?
- Section 4.1 is completely unreadable without consulting external literature. "Blocking" is used as a title without being explained beforehand, it is unclear what the search space is, co-references are not introduced, ...
- Section 4.2: First explain what you want to do and why, then add references.
- Section 4.3: Same problem as Section 4.1.
- "Features t_r^i \in [1,2,3]" and other features - use names, explain what they are before explaining why they are good
- "Given that our candidate sets contains many near-duplicates, we approch this problem through ..." - Which problem? The whole paragraph hangs in the air, the parts before and after look like an explanation of features, but the present paragraph is about clustering to solve an undefined problem.
- "We followed the guidelines laid out by Strauss [31]" - Guidelines for what? Why did you do that?
- Evaluation results should not only describe the obtained numbers, but also give reasons why something is better or worse than something else
- "experimented with different decision making parameters as discussed in [24]" - ?
Writing style, typos
- Abstract: "On the other hand" - fix style
- "facilitate" - "enable"?
- "aid KB augmentation" - "do KB augmentation"?
- Contribution makes excessive use of the term "novel"
- "setup; in that they"
- "Dong et al. ..." - fix grammar
- "classified into two classes" - style
- "refuse" - "fuse again"
- "selected...retrieve" - unify tense
- 3.2 first sentence: Fix grammar (relative clause)
- 4.3: "sim" - fix Latex typesetting
- "that, there"
- "terms[17]" - add a space before the citation
- "entities type" - add "of"
- "extracted respectively" - add comma
- "6 USD cents" - maybe "6 ct." or "$0.06"?
- "Baselines. We consider ..." - fix grammar
- "product, shows"
Other factual issues
- "KBs are still incomplete" - Not only "still"; for conceptual reasons, they will always be incomplete. See, e.g., the paper "But What Do We Actually Know?" (AKBC 2016).
- "usually there exist n \geq 0" - Tautology, thus it should be "always" instead of "usually" (though of course the whole statement is meaningless)
- SVM is known to be one of the best classifiers across an overwhelming range of applications. That Naive Bayes outperforms it makes me wonder whether something was done wrong in the experiments. Please explain this finding.
- "A fact is considered novel, if not a second time in our source markup" - Odd definition. If it is not already in the KB I would consider it novel, no matter whether it is once or 10 times in the markup dataset?