Abstract:
Semantic Table Annotation (STA) is a crucial process in data interpretation and knowledge extraction, especially in the context of big data and the Semantic Web. Tables, ubiquitous across domains from scientific literature to business reports, contain a wealth of structured information. However, this information remains largely untapped without effective methods for semantic annotation. STA enriches raw tables with semantic metadata such as entities, classes, and relations obtained from Knowledge Graphs (KGs). It thereby bridges the gap between raw tabular data and structured knowledge representation, enabling sophisticated data analytics, information retrieval, and data-driven decision-making. Automating this annotation, particularly for noisy tabular data, remains a formidable challenge. In this paper, we give a detailed overview of our STA system, JenTab. We developed and tested JenTab in the context of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) from 2020 to 2022, and we extend its evaluation beyond the scope of these challenges. JenTab is a core system for STA and has ranked among the top three participating systems throughout its years of development. In addition, we present a detailed evaluation of its individual components, an extensive discussion of JenTab and its limitations, and a demonstration of the system's configuration, execution, and output.