A Systematic Literature Review on RDF Triple Generation from Natural Language Text

Tracking #: 3805-5019

Authors: 
Andre Regino
Anderson Rossanez
Ricardo da Silva Torres
Julio Cesar dos Reis

Responsible editor: 
Guest Editors KG Gen from Text 2023

Submission type: 
Survey Article
Abstract: 
We live in a big data era in which much of the data is unstructured and expressed as natural language (NL) text. As the volume of text-based information grows, effective methods for encoding and extracting meaningful knowledge from this corpus are of paramount relevance. A challenging task is transforming NL texts into structured, semantically rich data. Semantic Web technologies have revolutionized the way we represent and access structured knowledge. Resource Description Framework (RDF) triples serve as a fundamental building block for this purpose, enabling the integration of diverse data sources. This investigation examines methods for generating RDF triples and enhancing Knowledge Graphs (KGs) from natural language texts. This area has wide-ranging applications, including knowledge representation, data integration, natural language understanding, and information retrieval. Our systematic literature review seeks to understand and characterize existing approaches to generating RDF triples from NL texts and inserting them into an existing KG, and to identify their challenges and limitations. We retrieved, categorized, and analyzed 150 articles from several scientific databases. We provide a comprehensive overview of the field, identify research gaps, and suggest directions for future research. We identified the most common study categories, particularly regarding the domain, the target language, the public availability of datasets, and real-world applications. Our results reveal a growing trend over the last few years toward the use of transformer-based machine learning methods for triple generation. Our study also highlights open research questions and provides a roadmap for future investigations.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 31/Mar/2025
Suggestion:
Minor Revision
Review Comment:

# Feedback to the revised version
The revised version resolved several weaknesses. The following weaknesses still remain in my opinion:

Weakness: If most of the 150 papers are filtered out, why are they still considered in Section 4? “We confirmed that not all the surveyed articles conformed to the specific focus of our study, which centered on the generation of triples from NL texts, and their insertion into an existing KG, adhering to predefined ontology specifications.” sounds like they were not in the scope of the study. Please clarify that.
Revision: Clarify the purpose of including the 150 papers. While you mention that they help answer RQ-05, the main scope is the remaining 15 papers. The role of the 150 papers should be explicitly stated.

Weakness: Why is entity linking not mentioned even once? Yes, NER is important, but if one wants to include new triples in a KG, identifying whether an encountered entity already exists in it is important as well.
Revision: While some of the reported methods use entity linking, it was not introduced in the paper.

# Comments/Suggestions/Typos
“Constributes” (should be “Contributes”) in line 42 on page 12

# Categories
(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.
The text lacks an introduction to entity linking. Otherwise, it is suitable as an introductory text.
(2) How comprehensive and how balanced is the presentation and coverage.
It is comprehensive.
(3) Readability and clarity of the presentation.
Overall readability is good. However, it is unclear why both the 150 papers and the 15 papers are analyzed when the former are out of scope. A clarifying statement is needed.
(4) Importance of the covered material to the broader Semantic Web community.
The covered material is important to the Semantic Web community.

# Summary
The revised version resolved most of the previously mentioned flaws. However, two main issues need clarification:
1. The rationale for analyzing both the 150 and 15 papers.
2. The omission of entity linking, which is crucial for RDF Triple Generation.

Given these points, I lean towards accepting with minor revisions.

Review #2
Anonymous submitted on 14/Apr/2025
Suggestion:
Accept
Review Comment:

This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

Review #3
By Ogerta Elezaj submitted on 17/Apr/2025
Suggestion:
Accept
Review Comment:

It should be accepted, as all the changes requested during the first review have been addressed.