One-shot HDT-based Semantic Labeling of Entity Columns in Tabular Data

Tracking #: 2474-3688

This paper is currently under review
Ahmad Alobaid
Oscar Corcho
Wouter Beek

Responsible editor: 
Guest Editors Web of Data 2020

Submission type: 
Full Paper
A lot of data are shared across organisations and on the Web in the form of tables (e.g., CSV). One way to facilitate the exploitation of such data and make their content easier to understand is to apply semantic labeling techniques, which assign ontology classes to tables (or parts of them) and properties to their columns. As a result of the semantic labeling process, such data can then be exposed as virtual or materialised RDF (e.g., by using mappings), and hence queried with SPARQL. We propose a one-shot semantic labeling approach that learns the classes to which the resources represented in a tabular data source belong, as well as the properties of entity columns. In comparison to some of our previous approaches, this approach exploits the fact that the knowledge base used as an input source is only available in the RDF HDT binary format. We evaluate our approach with the T2Dv2 dataset. The results show that our approach is competitive with state-of-the-art approaches without needing a full-fledged query language (e.g., SPARQL) or profiling of knowledge bases.
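To make the idea of labeling an entity column with only triple-pattern lookups more concrete, here is a minimal Python sketch. It is not the authors' implementation: the in-memory `TRIPLES` list stands in for an HDT file, the `search` helper is a hypothetical stand-in for the (subject, predicate, object) pattern lookups that HDT supports, and the coverage-only score in `label_column` is an assumption made for illustration.

```python
from collections import Counter

RDF_TYPE = "rdf:type"
RDFS_LABEL = "rdfs:label"

# Toy knowledge base (hypothetical data standing in for an HDT file).
TRIPLES = [
    ("ex:Madrid", RDFS_LABEL, "Madrid"),
    ("ex:Madrid", RDF_TYPE, "ex:City"),
    ("ex:Madrid", RDF_TYPE, "ex:Place"),
    ("ex:Paris", RDFS_LABEL, "Paris"),
    ("ex:Paris", RDF_TYPE, "ex:City"),
    ("ex:Paris", RDF_TYPE, "ex:Place"),
    ("ex:Amazon", RDFS_LABEL, "Amazon"),
    ("ex:Amazon", RDF_TYPE, "ex:River"),
    ("ex:Amazon", RDF_TYPE, "ex:Place"),
]


def search(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard.

    Hypothetical helper: only simple triple patterns, no joins or filters.
    """
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def label_column(cells):
    """Rank candidate classes by the fraction of cells they cover.

    Assumption: coverage alone is used as the score in this sketch.
    """
    votes = Counter()
    for cell in cells:
        classes = set()
        # Step 1: find entities whose label equals the cell value.
        for entity, _, _ in search(p=RDFS_LABEL, o=cell):
            # Step 2: collect the classes of each matched entity.
            for _, _, cls in search(s=entity, p=RDF_TYPE):
                classes.add(cls)
        # Each cell votes at most once per class.
        votes.update(classes)
    total = len(cells)
    ranked = [(cls, count / total) for cls, count in votes.items()]
    # Highest coverage first; ties broken alphabetically for determinism.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for cls, score in label_column(["Madrid", "Paris", "Amazon"]):
        print(cls, round(score, 2))
```

In this toy run the most general class covers every cell and therefore ranks first, which shows why a coverage-only score is too crude on its own: a practical scoring function would also need to account for how specific each candidate class is.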