MTab4DBpedia: Semantic Annotation for Tabular Data with DBpedia

Tracking #: 2609-3823

This paper is currently under review
Authors: 
Phuc Nguyen
Natthawut Kertkeidkachorn
Ryutaro Ichise
Hideaki Takeda

Responsible editor: 
Jens Lehmann

Submission type: 
Full Paper
Abstract: 
Semantic annotation for tabular data with knowledge graphs is the process of matching table elements to knowledge graph concepts. The annotated tables can then be used in downstream tasks such as data analytics, data management, and data science applications. Nevertheless, semantic annotation is difficult because tables often lack metadata or descriptions, and their headers and content can be ambiguous or noisy. In this paper, we present MTab4DBpedia, an automatic semantic annotation system designed for the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2019), which annotates table elements with DBpedia concepts. In particular, our system generates Cell-Entity Annotations (CEA), Column-Type Annotations (CTA), and Column Relation-Property Annotations (CPA). MTab4DBpedia combines joint probability signals from different table elements with majority voting to address the matching challenges of data noisiness, schema heterogeneity, and ambiguity. Results on SemTab 2019 show that our system consistently achieves the best performance on the three matching tasks, ranking 1st in all four rounds and on all three tasks of SemTab 2019. Additionally, this paper provides our reflections from a participant's perspective, together with an analysis and discussion of the general benchmark for tabular data matching.
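To illustrate the majority-voting idea mentioned in the abstract, here is a minimal sketch of how a column type (CTA) could be chosen from the types of the entities matched to that column's cells (CEA). This is not MTab4DBpedia's actual algorithm. The function name `vote_column_type`, the input format (one list of DBpedia types per cell), and the alphabetical tie-breaking rule are hypothetical assumptions. The joint probability signals described in the paper are omitted entirely.

```python
from collections import Counter
from typing import List, Optional


def vote_column_type(cell_types: List[List[str]]) -> Optional[str]:
    """Pick a column type by majority vote over per-cell entity types.

    Hypothetical sketch: each inner list holds the DBpedia types of the
    entity matched to one cell (i.e., a CEA result). Each cell casts at
    most one vote per type.
    """
    votes: Counter = Counter()
    for types in cell_types:
        # Deduplicate so a single cell cannot vote twice for the same type.
        votes.update(set(types))
    if not votes:
        # No matched cells, so no type can be proposed.
        return None
    # The highest vote count wins; ties are broken alphabetically (assumption).
    best_count = max(votes.values())
    return min(t for t, c in votes.items() if c == best_count)


if __name__ == "__main__":
    column = [
        ["dbo:City", "dbo:Place"],
        ["dbo:City", "dbo:Place"],
        ["dbo:Town", "dbo:Place"],
    ]
    print(vote_column_type(column))  # dbo:Place
```

In this example, `dbo:Place` gets three votes, `dbo:City` gets two, and `dbo:Town` gets one, so the most general type wins. Plain voting therefore tends to favour broad ancestor types. A practical annotator would likely need extra signals, such as type specificity or match confidence, to prefer a more precise type like `dbo:City`.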