Typology-based Semantic Labeling of Numeric Tabular Data

Tracking #: 2172-3385

This paper is currently under review
Ahmad Alobaid
Emilia Kacprzak
Oscar Corcho

Responsible editor: Guest Editors EKAW 2018

Submission type: Full Paper
More than 150 million tabular datasets can be found in the Google crawl of the Web. Semantic labeling of these datasets may help users understand and explore them. However, many challenges must be addressed before this can be done automatically. Numeric data are even harder to label because of possible differences in measurement accuracy, rounding errors, and even the frequency with which values appear (if they are treated as literals). Multiple approaches have been proposed in the literature to tackle the problem of semantic labeling of numeric values in existing tabular datasets, but they suffer from several shortcomings: they are closely coupled with entity linking, rely on table context, need to profile the knowledge graph, and require manual training of the model. Above all, they treat all kinds of numeric values in the same way. In this paper, we tackle these problems and test our hypothesis that treating different kinds of numeric columns differently yields a better solution.