Knowledge-Graph-Based Semantic Labeling: Balancing Coverage and Specificity

Tracking #: 2237-3450

This paper is currently under review
Ahmad Alobaid
Oscar Corcho

Responsible editor: Freddy Lecue

Submission type: Full Paper
A large amount of data is published on the Web in tabular formats (e.g., spreadsheets), particularly in open data portals maintained by public institutions. One of the main obstacles to the effective (re)use of these data is a widespread lack of semantics: column names are rarely standardized, and the meaning and content of columns are not always clear. Recently, some data and service providers have widely adopted knowledge graphs as a means to publish large amounts of structured data. These knowledge graphs use graph-based formats (e.g., RDF, graph databases) and often refer to lightweight ontologies. It is commonly understood that the reuse of tabular data may be improved by annotating them with the types used by the data available in knowledge graphs. In this paper, we present a novel approach that automatically types tabular data columns with ontology classes referred to by existing knowledge graphs, for those columns whose cells represent resources (and not just property values). In contrast with existing state-of-the-art proposals, our approach requires neither external linguistic resources nor annotated data sources for training, nor does it require building a model of the knowledge graph beforehand. We show that semantic annotation of entity columns can achieve good results compared to the state of the art by using the knowledge graph as a training set, without any context information, external resources, or human in the loop.
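The trade-off named in the title can be illustrated with a small sketch. This is not the authors' method: the class hierarchy, the linear scoring formula, the `alpha` weight, and all function names are hypothetical and chosen only to show how a column-level class might be picked when coverage (the fraction of cells whose entity belongs to a class) pulls toward general classes and specificity (here, depth in the hierarchy) pulls toward narrow ones.

```python
from collections import Counter

# Hypothetical toy hierarchy: child class -> parent class (None marks the root).
PARENT = {
    "Thing": None,
    "Agent": "Thing",
    "Person": "Agent",
    "Athlete": "Person",
    "Politician": "Person",
}


def ancestors_and_self(cls):
    """Return cls followed by all of its ancestors up to the root."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = PARENT[cls]
    return chain


def depth(cls):
    """Number of edges between cls and the root of the hierarchy."""
    return len(ancestors_and_self(cls)) - 1


def score_classes(cell_types, alpha=0.5):
    """Score candidate classes for one column.

    cell_types: one set of most-specific classes per cell, standing in for
        the result of a knowledge graph lookup (assumed, not shown here).
    alpha: weight given to coverage; (1 - alpha) goes to specificity.
    """
    counts = Counter()
    for types in cell_types:
        covered = set()
        for t in types:
            covered.update(ancestors_and_self(t))
        counts.update(covered)  # each cell counts at most once per class

    max_depth = max(depth(c) for c in PARENT)
    n_cells = len(cell_types)
    scores = {}
    for cls, hits in counts.items():
        coverage = hits / n_cells
        specificity = depth(cls) / max_depth
        scores[cls] = alpha * coverage + (1 - alpha) * specificity
    return scores


def best_class(cell_types, alpha=0.5):
    """Return the highest-scoring class for the column."""
    scores = score_classes(cell_types, alpha)
    return max(scores, key=scores.get)


# Three cells resolve to athletes, one to a politician.
column = [{"Athlete"}, {"Athlete"}, {"Athlete"}, {"Politician"}]
print(best_class(column, alpha=0.5))  # Athlete
print(best_class(column, alpha=0.9))  # Person
```

With an even weighting the sketch prefers the narrow class `Athlete` even though it covers only three of the four cells; weighting coverage more heavily moves the choice up to `Person`, which covers every cell.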