Fact Checking in Knowledge Graphs by Logical Consistency

Tracking #: 2721-3935

This paper is currently under review
Ji-Seong Kim
Key-Sun Choi

Responsible editor: 
Guest Editors KG Validation and Quality

Submission type: 
Full Paper
Misinformation spreads across media, community, and knowledge graphs in the Web by not only human agents but also information extraction systems that automatically extract factual statements from unstructured textual data to populate existing knowledge graphs. Traditional fact checking by experts is increasingly difficult to keep pace with the volume of newly created information in the Web. Therefore, it is important and necessary to enhance the computational ability to determine whether a given factual statement is truthful or not. In this paper, our goal is to 1) mine weighted logical rules from a knowledge graph, 2) to find positive and negative evidential paths in a knowledge graph for a given factual statement by the mined rules, and 3) to calculate a truth score for a given statement by an unsupervised ensemble of the found evidential paths. For example, we can determine the statement "The United States is the birth place of Barack Obama" as truthful since there is the positive evidential path (Barack Obama, birthPlace, Hawaii) ∧ (Hawaii, country, United States) in a knowledge graph, and it is logically consistent with the given statement. On the contrary, we can determine the factual statement "Canada is the nationality of Barack Obama" as untruthful since there is the negative evidential path (Barack Obama, birthPlace, Hawaii) ∧ (Hawaii, country, United States) ∧ (United States, ≠ , Canada) in a knowledge graph, and it is logically contradictory to the given statement. For evaluation, we constructed a novel evaluation dataset by labeling true or false labels on the factual statements extracted from Wikipedia texts by the state-of-the-art BERT-based relation extractor. Our evaluation results show that the proposed weighted logical rule-based approach outperforms the state-of-the-art unsupervised approaches significantly by up to 0.12 AUC-ROC, and even outperforms the supervised approach by up to 0.05 AUC-ROC not only in our dataset but also in the two publicly available datasets. The source code and evaluation dataset proposed in this paper is open-source and available at https://github.com/machinereading/KV-rule and https://github.com/machinereading/KV-eval-dataset each.
Full PDF Version: 
Under Review