C2K: Acquiring Knowledge from Categories Using Semantic Associations

Tracking #: 1549-2761

Authors: 
Ji-Seong Kim
Dong-Ho Choi
Key-Sun Choi

Responsible editor: 
Guest Editors ML4KBG 2016

Submission type: 
Full Paper
Abstract: 
There are several RDF (Resource Description Framework) knowledge bases that store community-generated categories of entities as well as conceptual or factual information about those entities. These two types of information may be strongly associated; for example, entities categorized in "People from Korea" (categorial information) have a high probability of being a person (conceptual information) and of being born in Korea (factual information). Such associations can be used to extract new conceptual or factual information about entities. In this paper, we propose a prediction system that predicts new conceptual or factual information from the categories of entities. First, the proposed system uses a novel association rule mining (ARM) approach that effectively mines rules encoding associations between the categories of entities and the conceptual or factual information about entities contained in existing RDF knowledge bases. Our extensive experiments show that our ARM approach outperforms the state-of-the-art ARM approach in terms of the prediction quality and coverage of these associations. Second, the proposed system ranks and groups the mined rules by their predictability, using a novel semantic confidence measure computed with a semantic resource such as WordNet. The experiments show that our confidence measure outperforms the standard confidence measure frequently used in traditional ARM in discriminating the predictability of mined rules. Last, the proposed prediction system selects only predictive rules from the ranked and grouped rules and uses them to predict accurate new information from the categories of entities. The experiments show that the accuracy of the proposed prediction system is comparable to that of the state-of-the-art prediction system, while its prediction coverage is far greater.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Simon Razniewski submitted on 25/Jan/2017
Suggestion:
Major Revision
Review Comment:

This paper discusses the problem of learning rules to predict KB facts from Wikipedia categories. There are two main parts, the first concerns the learning of (possibly parameterized) category-to-fact rules (called C2K rules), the second concerns the grouping and ranking of these rules.
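(For concreteness, an illustrative example of my own, not taken verbatim from the paper: a C2K rule might have the shape

    categorizedIn(x, "People from Korea")  =>  type(x, Person), birthPlace(x, Korea)
    categorizedIn(x, "<Year> deaths")      =>  deathYear(x, <Year>)

where x ranges over entities and <Year> is a parameter extracted from the category label.)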

In my view, the first part contains a novel and very valuable contribution, which has the potential to significantly extend the scope of existing knowledge bases. However, the presentation of the paper shows weaknesses, and I particularly fail to understand large portions of the second part. As the experiments also do not sufficiently highlight the advantages of the presented approach, I recommend that this paper undergo a major revision.

For the revision, I would concretely recommend to:
- Completely rewrite Sections 5.4 and 5.5, motivating and explaining the idea of nontrivial grouping of rules, and explaining why sampling rules suffices when aiming to select rules with good predictability
- Relate the proposed new rule scoring metric to previous metrics
- Explain the annotation scheme in the experimental section, and add numbers that mention total new facts, and total new facts wrt. YAGO relations
- Significantly improve the writing in terms of presentation.

Below I detail my comments:

Part 1 on C2K rule mining (5.2 in the paper)

Learning rules using exact or partial matches based on linguistic similarity is an interesting and promising contribution. I would like to understand in more detail in which way WordNet is used. Which relations from WordNet are you using? Are you using directly related words only, or also transitively related ones?
Also, I would greatly appreciate a simple analysis that says how scalable this method is. E.g., in a random sample of 50 categories, for how many can you find WordNet matches at all? How correct do the matches appear based on a superficial look?
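For example, a first sanity check of the WordNet matching could be scripted roughly as follows (a sketch of my own using NLTK's WordNet interface; the sample labels and the naive head-word extraction are placeholders, not the authors' pipeline):

    from nltk.corpus import wordnet as wn  # requires the NLTK WordNet corpus to be downloaded

    # Placeholder sample; in practice, draw ~50 random category labels from the dump.
    categories = ["People from Korea", "Deaths by suicide", "American jazz pianists"]

    for label in categories:
        head = label.split()[-1].lower()     # naive guess at the head word
        synsets = wn.synsets(head)
        related = set()
        for s in synsets:
            related.update(l.name() for l in s.lemmas())   # synonyms
            for h in s.hypernyms():                        # directly related words (one hop)
                related.update(l.name() for l in h.lemmas())
        print(label, "->", "match" if synsets else "no match", sorted(related)[:5])

Reporting the match rate and a quick manual check of such a sample would already answer both questions.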

Part 2 on Ranking and Grouping (5.4 and 5.5 in the paper)

I have difficulty understanding these sections. It seems that Section 5.5 should appear before Section 5.4, in order to motivate the grouping, but even then, I do not really understand what is going on. It seems the goal of grouping is to speed up the manual rule review, is that correct? I do not understand, though, why semantically related rules should have similar predictability (except for lexically related rules, where that is trivial); is there empirical evidence for that? How big are the clusters created in 5.4? Also, in Section 5.5 I do not understand why sampling rules of a particular confidence bucket is sufficient, as again, I do not see why confidence and predictability should be strongly related. The group-divide-predict-select scheme would probably also benefit from an example, along the lines sketched below.
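To illustrate what I would expect such an example to convey, the procedure I currently picture is roughly the following (a hypothetical sketch; the names and the acceptance threshold are mine, not the paper's):

    import random
    from collections import defaultdict

    def manually_judge(rule):
        # Placeholder for the human-in-the-loop judgement of a single sampled rule.
        return rule.get("looks_predictive", False)

    def select_predictive_rules(rules, per_group_samples=10, accept_ratio=0.8, seed=0):
        """Group rules, manually judge only a small sample per group,
        and keep or discard each group as a whole based on that sample."""
        random.seed(seed)
        groups = defaultdict(list)
        for rule in rules:
            groups[rule["group_id"]].append(rule)  # e.g. same-pattern or same-confidence bucket
        selected = []
        for members in groups.values():
            sample = random.sample(members, min(per_group_samples, len(members)))
            hits = sum(manually_judge(r) for r in sample)
            if hits / len(sample) >= accept_ratio:
                selected.extend(members)  # the whole group inherits the sample's verdict
        return selected

If this is indeed the intended scheme, stating it in this form, together with evidence that predictability is homogeneous within a group, would resolve my confusion.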
Regarding the proposed new ranking metric, I am missing a comparison to other metrics. Standard confidence is by far not the only way to rank rules; see e.g. https://en.wikipedia.org/wiki/Association_rule_learning#Useful_Concepts for definitions of metrics like lift and conviction, or the PCA confidence used in AMIE+.
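For reference, the standard definitions I have in mind (for a rule X => Y, and AMIE's PCA confidence for a rule with body B and head r(x,y)) are:

    \mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)}

    \mathrm{conviction}(X \Rightarrow Y) = \frac{1 - \mathrm{supp}(Y)}{1 - \mathrm{conf}(X \Rightarrow Y)}

    \mathrm{conf}_{\mathrm{PCA}}(\mathbf{B} \Rightarrow r(x,y)) = \frac{\#\{(x,y) : \mathbf{B} \wedge r(x,y)\}}{\#\{(x,y) : \exists y' : \mathbf{B} \wedge r(x,y')\}}

Positioning the proposed SALA confidence against at least these would make the contribution much clearer.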

Experiments

Evaluating rule mining systems is difficult, due to the high incompleteness of knowledge bases. Table 3 convincingly shows that the proposed method can predict a significantly higher ratio of facts that were previously removed from a KB than the AMIE+ system. In contrast, the remaining evaluation leaves open the question of to what extent the proposed method can extend the scope of existing KBs.
- "Some C2k rules are manually evaluated as predictive or not predictive" -> How many? What was the annotation scheme?
- "We manually evaluated the precision of the triples predicted by the proposed system" -> Table 4 shows over a million predicted facts, which makes me doubt this claim
- I did not find a single mention of how many facts the proposed method learns in total, I only find the number of 4450 relations? How many new facts can you learn in total?
- Similarly, I would like to see how many new facts wrt. YAGO or DBpedia the method learns. According to Table 4 that is probably a lot, but one can only guess this. The real number is presumably a great selling point of the proposed approach, and should appear in the paper as early as possible, possibly already in the abstract!

Other
- Section 5.1 (Filtering) comes unexpectedly and looks a bit ad hoc. Is there evidence that this is needed: are there common-sense reasons why the proposed heuristic filtering methods are good, or is there experimental evidence supporting them? Also, while it is believable that DBpedia contains many erroneous triples, a claim like "contains mostly erroneous triples" should be supported by a concrete number or reference.
- What is the relation to "The Association Rule Mining System for Acquiring Knowledge of DBpedia from Wikipedia Categories", NLP & DBpedia @ ISWC, 2015?

Presentation
- General: On how to write text in math mode properly, see e.g. http://www.math.harvard.edu/texman/node20.html
- The readability of the paper would greatly benefit from proofreading for grammar and style issues. A few are mentioned below.
- Is the relation called "categorize in" or "categorized in"? I find both forms.
- P1
- If using an abbreviation (C2K) in the title, it might be good to introduce it in the abstract
- Abstract last sentence, please change grammar/reformulate
- First sentence: "attention" is uncountable and should be used in singular
- Second sentence: Same for "research". In contrast, "tables" and "sources of information" are countable and should appear in plural. Also, grammar should be fixed
- 2nd paragraph 1st sentence: "have been" -> "are"
- P2
- "predictability" - Explain what that means
- Outline is generic and uninformative
- P3
- C2K is used here, but the name is not explained
- P5
- Terms T_know, T_cat are used in Fig. 2, but not explained. Also, the role of humans is not visible in Fig. 2.
- P7
- A nonexistent Fig. 5.2 is referenced
- P14
- Fig. 5 and 6 seem to be tables, not figures?
- Left text column has very weird spacing
- References: All references containing system names appear misspelled

Possibly relevant references:
- Acquisition of instance attributes via labeled and related instances, Alfonseca et al., SIGIR 2010
- Decoding Wikipedia Categories for Knowledge Acquisition, Nastase and Strube, AAAI 2008

Review #2
Anonymous submitted on 10/Feb/2017
Suggestion:
Minor Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.
Very interesting framework to mine rules from category triples. The writing is clear and nice to read. The authors could change the y-axis range in the plot in Figure 4.

Review #3
By Vojtěch Svátek submitted on 25/Feb/2017
Suggestion:
Major Revision
Review Comment:

The submission describes a multi-step approach to automatically generating facts about knowledge base entities from categories of these entities. The approach integrates a frequency-based method with lexical analysis of labels and several additional techniques such as bootstrapping overcoming the data sparsity problem.
The problem addressed is somewhat narrow, as it assumes a knowledge base with verbose categories encapsulating implicit facts; the practice of materializing such categories is probably rare beyond knowledge bases derived from Wikipedia categories. OTOH, Wikipedia is such a prominent resource (on which DBpedia, YAGO, etc. are built) that the effort is absolutely worthwhile.
The related work is quite lean, which is likely influenced by the mentioned specific focus. Of the 17 citations, 7 are by Suchanek and colleagues (as direct precursor project(s)), and most other citations are to indirectly related projects (such as the construction of DBpedia). A more thorough mapping of more distantly related projects would be suitable for a full-blown journal paper. Partially relevant techniques can surely be found both within statistical relational learning and lexical label decomposition. Beyond these areas directly referred to in the paper, there is also research on ontology pattern learning [1, 2]. As regards the transformation between the category-centric and relationship-centric view of describing entities, possibly related is also the generation of tentative categories from property-centric ontology axioms [3], which is a kind of opposite approach.
[1] Eleni Mikroyannidi, Luigi Iannone, Robert Stevens, Alan L. Rector: Inspecting Regularities in Ontology Design Using Clustering. International Semantic Web Conference (1) 2011: 438-453
[2] Agnieszka Lawrynowicz, Jedrzej Potoniec, Michal Robaczyk, Tania Tudorache: Discovery of Emerging Design Patterns in Ontologies Using Tree Mining. Semantic Web, to appear, http://www.semantic-web-journal.net/content/discovery-emerging-design-pa....
[3] Vojtech Svátek, Ondrej Zamazal, Miroslav Vacura: Categorization Power of Ontologies with Respect to Focus Classes. EKAW 2016: 636-650
The methods used appear sound in principle. However, the whole workflow consists of numerous steps and the impact of each of them is not individually tested. Therefore there is no empirical evidence that the approach as a whole is the optimal solution, or which of its components would be worth reusing elsewhere.
As regards the English of the article, careful proofreading is a must. The number of grammatical errors is overwhelming; examples (picked from the very first paragraph alone!) include, but are not limited to:
- Mismatch of grammatical number (e.g., “This paper mainly focus”)
- Counting of uncountable words (e.g., “a lot of attentions”, “many researches”)
- Ill-structured sentences, such as a missing verb or connective (e.g., “despite of their rich information can be leveraged”).
Minor technical comments:
- I suggest replacing “predictability” (as a feature of a rule) with “predictiveness”. A rule is what predicts and not what is predicted.
- Fig. 1 (btw, it is rather a listing…) is not completely intuitive. What is called “implied facts” rather looks like generalized patterns. However, can the “suicide” example indeed be generalized? (What other cause of death can one ‘commit’?) Why is ‘People’ generalized to ‘type’?
- “An association rule encodes an association among triples with some shared variables” Although intuitive, the term ‘variable’ has not been used before and its meaning in this context should be explained. More important, association rule mining is an established term reaching far beyond triple aggregation; you should introduce a specific term when talking about triples, and formalize what can appear in the LHS and RHS of the rule.
- “Let E be a set of entities; R, a set of relations” Only binary relations, if I understand right.
- “The problems to be dealt with can be described as follows. The first is…” Please, use better labels than just “first, second, last”.
- Sec. 5.1 “We filter out knowledge triples whose object has a type frequency that is lower than an average object-type frequency of a relation encoded in the knowledge triples” The average is definitely not an ideal measure for setting the threshold, due to the possible presence of outliers. Overall, the filtering is necessarily rough and thus very likely to filter out correct triples together with the erroneous ones. Since it is not a method of cleaning the KB as such but only of preparing a training set for subsequent mining, such over-filtering might not harm so much; the authors should, however, discuss it explicitly (a more robust variant is sketched after this list).
- Sec. 5.2 “have a name comprises just one word that is a concatenation of several words (e.g., causeOfDeath)” The term “word” is not adequate for the (surface) concatenation.
- Step 3 in Sec. 5.3.1 “pr does not have the same datatype” Up to now you deal with string decomposition, i.e., “2001” should still be an alphanumeric string, unless you consider a string-to-integer transformation; the same holds for the transformation of a string to a class (identifier); although this step might be mostly straightforward, you cannot omit it in the description of your method.
- End of p.6: “the proposed system uses words of similar meaning according to WordNet” Explain what kind/s of WordNet relationships are used here.
- Sec. 5.3.2 The algorithm of bootstrapping would deserve a formal presentation.
- Sec. 5.4 “the standard confidence measure frequently used in the traditional ARM field” As mentioned above, “traditional ARM” is not necessarily that dealing with triples.
- Ibid: The support formula has no reference to its original source, looks a bit weird in terms of notation (e.g., what is the II symbol after the summation?), and the formulation “A support of a set of triples is defined…” is by itself odd: support is related to a rule/formula/itemset, and is calculated as the (usually relative) cardinality of a set of objects/transactions (here, triples). It is also not completely clear what the support of {t_cat, t_fact} is, which would formally be a set of two triple sets. Does it mean the union of both sets? (The conventional formulation is recalled after this list, for reference.)
- 5.4.2 “for example, in Figure 5.2, rule1 and rule2 have the different standard confidence values” – should be Figure 3.
- P.8: “and ^ and v denote the min and max functions, respectively” I would advise against using these symbols, as they could be easily confused with logical connectives.
- “prevents an abnormally high confidence values (e.g., standConf(rule1) in Figure 3” Should be ConfStand.
- Just to make sure, all the four theta thresholds are meant to address the same C-K imbalance problems, but in a different way? (Controlling the bootstrapping vs. reducing the effects of imbalance a posteriori?) It would be worth explaining these two strategies in combination and, ideally, measuring their effects.
- As regards the upper and lower bound of rule confidence: 1) what heuristics drives their setting? 2) in the shown example, is the effect of unifying the confidence of rule1 and rule2 due to the joint computation of confidence for the whole set of same-pattern rules (SP(rule)) or to the use of thresholds?
- Regarding the use of WordNet in SALA, it should perhaps be explained why only synonymy and meronymy are used but not hypernymy.
- Is there empirical evidence how often dSim (use of antonyms) helps?
- “Finally, we define the (unnormalized) SALA confidence that ranges from -1.0 to 2.0” Depending on setting the w_sim and w_dSim weights, the range could probably be different?
- Sec. 5.5: An example of a group-based sampling session might be helpful. “This approach needs far less human effort than the exhaustive approach” Actually, the indicated time saving is vaguely described (“30 minutes would have been too short to estimate the precision of the entire 28,150 rules”) and suggests only a decrease of a few tens of percent.
- Sec. 6.2.1 “randomly sampled 30% of the mapping-based properties and types” The term “mapping-based” has not been explained.
- “AMIE+ is enabled to mine rules with a constant value.” Unclear – how?
- Sec. 6.2.2 how does “hit ratio” differ from precision and why are the values so low? The verbal description is odd, e.g. “indicates how large predictions are matched with answers” (maybe it is an issue of incorrect English?)
- “it is still reasonably fast to enrich KBs because it spends only few hours on the entire DBpedia dataset, not days or months.” Actually, only half an hour, according to Table 3?
- Sec. 6.3 Ideally, the effect of lexical and semantic adjustment in SALA confidence should be evaluated separately.
- Sec. 6.4.2 “We manually evaluate the precision of the triples predicted by the proposed system” Did you indeed manually evaluate the precision of *all* produced rules? Is it feasible?
- “YAGO3 evaluated their results by a Wilson center, we also use the Wilson center” What is the Wilson center? (See the note after this list.)
- “Because DBpedia’s properties and YAGO’s properties have different names, we only compare those whose meanings are the almost exactly same” What is “almost exactly same”?
- The text in 6.4.3 does not seem to fully match Tables 4 and 5; please check, e.g., “in some relations (the top-line boldfaced relations: origin, birthPlace, genre, homeTown, writer, party, award, birthYear, and deathYear), the proposed system is more predictive than YAGO’s prediction system while in some relations (the bottomeline bold-faced relations: deathPlace, prize, director, foundationYear, and type), YAGO’s prediction system is more predictive than the proposed system”
- Fig. 5 and 6 (actually, rather, listings again…) show the unnormalized SALA confidence, which is then difficult to compare with the standard confidence, since they have different ranges, if I understand well.
- Sec. 7: the caption of Table 7 refers to Korean DBpedia, however, the text rather seems to refer to the English one – please, check!
- Some bibrefs are incomplete, e.g., [1], [9], [13]. There is also a bibtex decapitalization, e.g., in [4] (“amie+”).
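Regarding the Sec. 5.1 filtering comment above: a more robust variant of the same idea would threshold on the median rather than the mean object-type frequency, e.g. (a sketch with made-up data structures, not the authors' code):

    from collections import defaultdict
    from statistics import median

    def filter_triples(triples, object_type_freq):
        """Keep (s, relation, o) triples whose object-type frequency reaches at least the
        median (instead of the mean) object-type frequency observed for that relation."""
        per_relation = defaultdict(list)
        for s, r, o in triples:
            per_relation[r].append(object_type_freq.get(o, 0))
        threshold = {r: median(freqs) for r, freqs in per_relation.items()}
        return [(s, r, o) for s, r, o in triples
                if object_type_freq.get(o, 0) >= threshold[r]]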
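Regarding the support/confidence formula comment above: one conventional formulation for a rule body => head over a KB, which the authors could relate theirs to, is

    \mathrm{supp}(\mathit{body} \Rightarrow \mathit{head}) = \#\{x : \mathit{body}(x) \wedge \mathit{head}(x)\}

    \mathrm{conf}(\mathit{body} \Rightarrow \mathit{head}) = \frac{\mathrm{supp}(\mathit{body} \Rightarrow \mathit{head})}{\#\{x : \mathit{body}(x)\}}

where x ranges over the entities instantiating the shared variables.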
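Regarding the “Wilson center” question above: if this means the center of the Wilson score interval (which, as far as I know, is what the YAGO evaluations use), this should be stated and defined explicitly, e.g., for n judged samples with observed precision \hat{p} and confidence level z:

    \mathrm{center} = \frac{\hat{p} + \frac{z^2}{2n}}{1 + \frac{z^2}{n}}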
To summarize along the main review axes:
1) The originality is medium-to-high. Adapted versions of known methods (ARM over RDF triples, lexical analysis of category names) are combined with a number of novel heuristics.
2) The significance of results is potentially high, as the approach may help decently enrich DBpedia or similar resources. A weaker point is the unclear distinction of the contribution of the different heuristics combined in the approach, and, generally, some missing details of the method.
3) The quality of writing is seriously compromised. Many aspects of the approach have to be read between the lines due both to lack of detail in some methods’ descriptions and to low quality of English.