Review Comment:
The submission describes a multi-step approach to automatically generating facts about knowledge base entities from the categories of these entities. The approach integrates a frequency-based method with lexical analysis of labels and several additional techniques, such as bootstrapping, which overcomes the data sparsity problem.
The problem addressed is somewhat narrow, as it assumes a knowledge base with verbose categories encapsulating implicit facts; the practice of materializing such categories is probably rare beyond knowledge bases derived from Wikipedia categories. On the other hand, Wikipedia is such a prominent resource (on which DBpedia, YAGO, etc. are built) that the effort is absolutely worthwhile.
The related work is quite lean, which is likely influenced by the specific focus mentioned above. Of the 17 citations, 7 are by Suchanek and colleagues (as direct precursor projects), and most other citations are to indirectly related projects (such as the construction of DBpedia). A more thorough mapping of even more distantly related projects would be suitable for a full-blown journal paper. Partially relevant techniques can surely be found both within statistical relational learning and within lexical label decomposition. Beyond these areas, directly referred to in the paper, there is also research on ontology pattern learning [1, 2]. As regards the transformation between the category-centric and relationship-centric views of describing entities, possibly related is also the generation of tentative categories from property-centric ontology axioms [3], which is a kind of opposite approach.
[1] Eleni Mikroyannidi, Luigi Iannone, Robert Stevens, Alan L. Rector: Inspecting Regularities in Ontology Design Using Clustering. International Semantic Web Conference (1) 2011: 438-453
[2] Agnieszka Lawrynowicz, Jedrzej Potoniec, Michal Robaczyk, Tania Tudorache: Discovery of Emerging Design Patterns in Ontologies Using Tree Mining. Semantic Web, to appear, http://www.semantic-web-journal.net/content/discovery-emerging-design-pa....
[3] Vojtech Svátek, Ondrej Zamazal, Miroslav Vacura: Categorization Power of Ontologies with Respect to Focus Classes. EKAW 2016: 636-650
The methods used appear sound in principle. However, the whole workflow consists of numerous steps, and the impact of each of them is not individually tested. Therefore there is no empirical evidence that the approach as a whole is optimal, nor which of its components would be worth reusing elsewhere.
As regards the English of the article, careful proofreading is a must. The number of grammatical errors is overwhelming; examples (picked from the very first paragraph alone) include, but are not limited to:
- Mismatch of grammatical number (e.g., “This paper mainly focus”)
- Counting of uncountable words (e.g., “a lot of attentions”, “many researches”)
- Ill-structured sentences, such as those with a missing verb or connective (e.g., “despite of their rich information can be leveraged”).
Minor technical comments:
- I suggest replacing “predictability” (as a feature of a rule) with “predictiveness”. A rule is what predicts, not what is predicted.
- Fig. 1 (by the way, it is rather a listing…) is not completely intuitive. What is called “implied facts” rather looks like generalized patterns. However, can the “suicide” example indeed be generalized? (What other cause of death can one ‘commit’?) Why is ‘People’ generalized to ‘type’?
- “An association rule encodes an association among triples with some shared variables” Although intuitive, the term ‘variable’ has not been used before and its meaning in this context should be explained. More importantly, association rule mining is an established term reaching far beyond triple aggregation; you should introduce a specific term when talking about triples, and formalize what can appear on the LHS and RHS of the rule; see the sketch below.
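  A sketch of one possible formalization (my own notation, not the paper's: C is a set of category identifiers, L a set of literals, and hasCategory an assumed membership property):

  ```latex
  % one category-membership triple pattern on the LHS,
  % one relational fact pattern over the shared entity variable ?x on the RHS
  (?x,\ \mathit{hasCategory},\ c) \;\Rightarrow\; (?x,\ r,\ o),
  \qquad c \in C,\; r \in R,\; o \in E \cup L
  ```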
- “Let E be a set of entities; R, a set of relations” Only binary relations, if I understand correctly.
- “The problems to be dealt with can be described as follows. The first is…” Please use better labels than just “first, second, last”.
- Sec. 5.1 “We filter out knowledge triples whose object has a type frequency that is lower than an average object-type frequency of a relation encoded in the knowledge triples” The average is definitely not an ideal measure for setting the threshold, due to the possible presence of outliers (see the toy illustration below). Overall, the filtering is necessarily rough and thus very likely to filter out correct triples together with the erroneous ones. Since it is not a method of cleaning the KB as such but only of preparing a training set for subsequent mining, such over-filtering might not do much harm; the authors should, however, discuss it explicitly.
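  A toy illustration of the outlier issue (invented numbers, not the authors' data); a robust statistic such as the median would behave quite differently:

  ```python
  # Hypothetical object-type frequencies for one relation; a single outlier
  # inflates the mean so that a mean-based threshold discards nearly everything.
  import statistics

  freqs = [1, 2, 2, 3, 3, 4, 950]

  mean_t = statistics.mean(freqs)      # ~137.9
  median_t = statistics.median(freqs)  # 3 (robust to the outlier)

  print([f for f in freqs if f >= mean_t])    # [950] -- over-filtering
  print([f for f in freqs if f >= median_t])  # [3, 3, 4, 950]
  ```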
- Sec. 5.2 “have a name comprises just one word that is a concatenation of several words (e.g., causeOfDeath)” The term “word” is not adequate for the (surface) concatenation.
- Step 3 in Sec. 5.3.1 “pr does not have the same datatype” Up to now you deal with string decomposition, i.e., “2001” should still be an alphanumeric string, unless you consider a string-to-integer transformation; the same holds for the transformation of a string into a class (identifier). Although this step might be mostly straightforward, you cannot omit it in the description of your method.
- End of p. 6: “the proposed system uses words of similar meaning according to WordNet” Explain what kind(s) of WordNet relationships are used here.
- Sec. 5.3.2 The bootstrapping algorithm deserves a formal presentation, e.g., along the lines sketched below.
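  A skeleton of the kind of presentation I have in mind (the callables stand in for the paper's unspecified sub-procedures; this is my guess at the overall loop, not the authors' actual algorithm):

  ```python
  def bootstrap(kb_triples, mine_rules, apply_rules, confidence, theta, max_iters=10):
      """Iteratively mine rules and feed confident predictions back into training."""
      training = set(kb_triples)
      rules = set()
      for _ in range(max_iters):
          rules = mine_rules(training)              # mine rules from the current set
          predicted = apply_rules(rules, training)  # derive candidate facts
          new_facts = {t for t in predicted
                       if confidence(t) >= theta} - training
          if not new_facts:                         # fixpoint: nothing confident is new
              break
          training |= new_facts                     # augment the training set and repeat
      return rules, training
  ```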
- Sec. 5.4 “the standard confidence measure frequently used in the traditional ARM field” As mentioned above, “traditional ARM” does not necessarily deal with triples.
- Ibid.: The support formula has no reference to its original source, looks a bit odd in terms of notation (e.g., what is the Π symbol after the summation?), and the formulation “A support of a set of triples is defined…” is by itself odd: support relates to a rule/formula/itemset, and is calculated as the (usually, relative) cardinality of a set of objects/transactions (here, triples). It is also not completely clear what the support of {t_cat, t_fact} is, which would formally be a set of two triple sets. Does it mean the union of both sets?
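  For comparison, a conventional definition adapted to this setting might read as follows (my reconstruction, not the paper's formula):

  ```latex
  % absolute support: entities instantiating both the category triple pattern
  % and the fact triple pattern; relative support would divide by |E|
  \mathrm{supp}(t_{cat} \Rightarrow t_{fact}) =
    \bigl|\{\, e \in E : e \models t_{cat} \ \wedge\ e \models t_{fact} \,\}\bigr|
  ```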
- 5.4.2 “for example, in Figure 5.2, rule1 and rule2 have the different standard confidence values” – should be Figure 3.
- P.8: “and ^ and v denote the min and max functions, respectively” I would advise against using these symbols, as they could be easily confused with logical connectives.
- “prevents an abnormally high confidence values (e.g., standConf(rule1) in Figure 3” Should be ConfStand.
- Just to make sure: are all four theta thresholds meant to address the same C-K imbalance problem, but in different ways? (Controlling the bootstrapping vs. reducing the effects of the imbalance a posteriori?) It would be worth explaining these two strategies in combination and, ideally, measuring their effects.
- As regards the upper and lower bound of rule confidence: 1) what heuristic drives their setting? 2) in the shown example, is the effect of unifying the confidence of rule1 and rule2 due to the joint computation of confidence for the whole set of same-pattern rules (SP(rule)), or to the use of the thresholds?
- Regarding the use of WordNet in SALA, it should perhaps be explained why only synonymy and meronymy are used but not hypernymy.
- Is there empirical evidence of how often dSim (the use of antonyms) helps?
- “Finally, we define the (unnormalized) SALA confidence that ranges from -1.0 to 2.0” Depending on the setting of the w_sim and w_dSim weights, the range could probably be different?
- Sec. 5.5: An example of a group-based sampling session might be helpful; I sketch below what I would expect. “This approach needs far less human effort than the exhaustive approach” Actually, the indicated time saving is described only vaguely (“30 minutes would have been too short to estimate the precision of the entire 28,150 rules”) and suggests a decrease of only a few tens of percent.
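  What I would expect such an example to look like (the grouping key and the per-group sample size are illustrative assumptions):

  ```python
  import random
  from collections import defaultdict

  def group_sample(rules, group_key, k=5, seed=0):
      """Sample up to k rules per same-pattern group for manual precision checks."""
      rng = random.Random(seed)
      groups = defaultdict(list)
      for rule in rules:
          groups[group_key(rule)].append(rule)
      # with ~28,150 rules, annotating k rules per group is far cheaper
      # than inspecting the full list exhaustively
      return {g: rng.sample(members, min(k, len(members)))
              for g, members in groups.items()}
  ```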
- Sec. 6.2.1 “randomly sampled 30% of the mapping-based properties and types” The term “mapping-based” has not been explained.
- “AMIE+ is enabled to mine rules with a constant value.” Unclear – how?
- Sec. 6.2.2: How does the “hit ratio” differ from precision, and why are the values so low? The verbal description is odd, e.g., “indicates how large predictions are matched with answers” (maybe this is an issue of incorrect English?).
- “it is still reasonably fast to enrich KBs because it spends only few hours on the entire DBpedia dataset, not days or months.” Actually, only half an hour, according to Table 3?
- Sec. 6.3 Ideally, the effect of lexical and semantic adjustment in SALA confidence should be evaluated separately.
- Sec. 6.4.2 “We manually evaluate the precision of the triples predicted by the proposed system” Did you indeed manually evaluate the precision of *all* produced rules? Is it feasible?
- “YAGO3 evaluated their results by a Wilson center, we also use the Wilson center” What is the Wilson center? If it refers to the center of the Wilson score interval, this should be named and defined explicitly.
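  Presumably something like the following is meant (my assumption, to be confirmed by the authors):

  ```latex
  % center of the Wilson score interval for observed precision \hat{p}
  % over n evaluated facts, with z the normal quantile (z = 1.96 for 95%)
  \tilde{p} = \frac{\hat{p} + z^2/(2n)}{1 + z^2/n}
  ```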
- “Because DBpedia’s properties and YAGO’s properties have different names, we only compare those whose meanings are the almost exactly same” What is “almost exactly same”?
- The text in Sec. 6.4.3 does not seem to fully match Tables 4 and 5; please check, e.g., “in some relations (the top-line boldfaced relations: origin, birthPlace, genre, homeTown, writer, party, award, birthYear, and deathYear), the proposed system is more predictive than YAGO’s prediction system while in some relations (the bottomeline bold-faced relations: deathPlace, prize, director, foundationYear, and type), YAGO’s prediction system is more predictive than the proposed system”.
- Fig. 5 and 6 (actually, rather, listings again…) show the unnormalized SALA confidence – which is then difficult to compare with the standard confidence, since they have different ranges, if I understand well.
- Sec. 7: the caption of Table 7 refers to the Korean DBpedia; however, the text rather seems to refer to the English one – please check!
- Some bibrefs are incomplete, e.g., [1], [9], [13]. There is also a BibTeX decapitalization issue, e.g., in [4] (“amie+”).
To summarize along the main review axes:
1) The originality is medium-to-high. Adapted versions of known methods (ARM over RDF triples, lexical analysis of category names) are combined with a number of novel heuristics.
2) The significance of the results is potentially high, as the approach may help decently enrich DBpedia or similar resources. A weaker point is the unclear distinction between the contributions of the different heuristics combined in the approach and, generally, some missing details of the method.
3) The quality of writing is seriously compromised. Many aspects of the approach have to be read between the lines, due both to a lack of detail in some of the method descriptions and to the low quality of the English.