Plausibility Assessment of Triples with Instance-based Learning Distantly Supervised by Background Knowledge

Tracking #: 1546-2758

This paper is currently under review
Soon Hong
Mun Yong Yi

Responsible editor: 
Guest Editors ML4KBG 2016

Submission type: 
Full Paper
Building knowledge bases from text offers a practical solution to the knowledge acquisition bottleneck. However, this approach introduces another fundamental issue: a proliferation of erroneous triples of a kind rarely found in human expressions. Unlike human expressions, triples are not guaranteed to be plausible or even to make sense, because they are extracted by machines that have no semantic understanding of the given text. Triple validation is a way of pruning erroneous triples. Recent research, however, does not consider this fundamental difference and has three limitations. First, the difference between plausibility and truth has not been well understood. A true/false framework, which is more suitable for validating human expressions, has been used to validate triples. As a result, some researchers perform plausibility assessments but jump to conclusions about truth or correctness. Second, most researchers use contrived negative training data because it is difficult to define what “negative” means for plausible or true triples. Third, the synergy of combining a Web-driven approach with a knowledge base-driven approach has not been explored. This paper reviews the process of triple validation from a different perspective, improving upon the knowledge base building process. It conceptualizes triple validation as a two-step procedure: a domain-independent plausibility assessment, followed by a domain-dependent truth validation applied only to plausible triples. It also proposes a new plausible/nonsensical framework overlaid with a true/false framework. This paper focuses on the plausibility assessment of triples and addresses the limitations of existing approaches. It attempts to build both positive and negative training data consistently, using distant supervision from DBpedia and Wikipedia. It adopts instance-based learning to skip the generation of pre-defined models, which have difficulty dealing with the variable expressions of triples.
The experimental results support the proposed approach, which outperformed several baselines. The proposed approach can be used to filter out both newly extracted nonsensical triples and nonsensical triples already present in knowledge bases, and even to learn semantic relationships. It can be used on its own or to complement existing truth-validation processes. Extending the structured and unstructured background knowledge remains for future investigation.