A Systematic Literature Review on RDF Triple Generation from Natural Language Text

Tracking #: 3650-4864

This paper is currently under review
Andre Regino
Anderson Rossanez
Ricardo da Silva Torres
Julio Cesar dos Reis

Responsible editor: Guest Editors KG Gen 2023

Submission type: Survey Article
We live in a big data era in which much of the available data is unstructured and expressed as natural language (NL) text. As the volume of text-based information grows, effective methods for encoding and extracting meaningful knowledge from it are of paramount relevance. A challenging task concerns transforming NL texts into structured and semantically rich data. Semantic Web technologies have revolutionized the way we represent and access structured knowledge. Resource Description Framework (RDF) triples serve as a fundamental building block for this purpose, allowing the integration of diverse data sources. This survey examines methods for RDF triple generation and Knowledge Graph (KG) enhancement from NL texts. This study area has wide-ranging applications encompassing knowledge representation, data integration, natural language understanding, and information retrieval. Our systematic literature review aims to understand, characterize, and identify the challenges and limitations of existing approaches to generating RDF triples from NL texts and including them in an existing KG. We retrieved, categorized, and analyzed 150 articles from several scientific databases. We provide a comprehensive overview of the field, identify research gaps, and outline directions for future research. We identified the most common study categories, considering in particular the domain, the targeted language, the public availability of datasets, and real-world applications. Our results reveal a growing trend in the last few years toward the use of transformer-based machine learning methods for triple generation. Our study also drives innovation by highlighting open research questions and providing a roadmap for future investigations.
Full PDF Version: Under Review