MTab4D: Semantic Annotation of Tabular Data with DBpedia

Tracking #: 2894-4108

This paper is currently under review
Authors: 
Phuc Nguyen
Natthawut Kertkeidkachorn
Ryutaro Ichise
Hideaki Takeda

Responsible editor: 
Jens Lehmann

Submission type: 
Full Paper
Abstract: 
Semantic annotation of tabular data is the process of matching table elements with knowledge graphs. As a result, the table contents can be interpreted or inferred using knowledge graph concepts, making them useful in downstream applications such as data analytics and management. Nevertheless, semantic annotation tasks are challenging due to insufficient tabular data descriptions, heterogeneous schemas, and vocabulary issues. This paper presents MTab4D, an automatic semantic annotation system for tabular data that generates annotations with DBpedia in three annotation tasks: 1) Cell-Entity (CEA), 2) Column-Type (CTA), and 3) Column Pair-Property (CPA). In particular, we propose an annotation pipeline that combines multiple matching signals from different table elements to address schema heterogeneity, data ambiguity, and noise. Additionally, this paper provides analysis and extra resources on benchmarking semantic annotation with knowledge graphs. Experimental results on the original and adapted datasets of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2019) show that our system achieves strong performance on all three annotation tasks. MTab4D's repository is publicly available at https://github.com/phucty/mtab4dbpedia.
Tags: 
Under Review