Retrieval-Augmented Generation-based Relation Extraction

Tracking #: 3810-5024

Authors: 
Sefika Efeoglu
Adrian Paschke

Responsible editor: 
Guest Editors KG Gen from Text 2023

Submission type: 
Full Paper
Abstract: 
Information Extraction (IE) is a transformative process that converts unstructured text data into a structured format by employing entity and relation extraction (RE) methodologies. Identifying the relation between a pair of entities plays a crucial role within this framework. Despite the availability of various techniques for RE, their efficacy heavily depends on access to labeled data and substantial computational resources. To address these challenges, Large Language Models (LLMs) have emerged as promising solutions; however, they are prone to generating hallucinated responses due to the limitations of their training data. To overcome these shortcomings, this work proposes a Retrieval-Augmented Generation-based Relation Extraction (RAG4RE) approach, which offers a pathway to enhance the performance of RE tasks. We evaluate the effectiveness of our RAG4RE approach using different LLMs. By leveraging established benchmarks such as the TACRED, TACREV, Re-TACRED, and SemEval RE datasets, we aim to comprehensively assess the efficacy of our methodology. Specifically, we employ prominent LLMs, including Flan T5, Llama2, and Mistral, in our investigation. The results of our work demonstrate that RAG4RE outperforms traditional RE methods based solely on LLMs, with significant improvements observed on the TACRED dataset and its variations. Furthermore, our approach exhibits remarkable performance compared to previous RE methodologies across both the TACRED and TACREV datasets, underscoring its efficacy and potential for advancing RE tasks in natural language processing.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 06/Apr/2025
Suggestion:
Accept
Review Comment:

This paper presents a relation extraction method based on retrieval-augmented generation.
The central idea involves enhancing the prompt given to the
language model by appending a similar sentence retrieved from the training subset of the same dataset.
The results demonstrate that this strategy improves performance across three out of four datasets,
regardless of the language model used.
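The central idea the review describes — retrieving the most similar training sentence and appending it, with its gold relation, to the prompt — can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: it substitutes a simple bag-of-words cosine similarity for the dense sentence encoder used in the paper, and all names and prompt wording are assumptions.

```python
# Sketch of retrieval-augmented prompting for relation extraction:
# find the training sentence most similar to the query and prepend it
# (with its labeled relation) as an in-context example for the LLM.
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of words."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, train: list[tuple[str, str]]) -> tuple[str, str]:
    """Return the (sentence, relation) training pair most similar to the query."""
    q = Counter(query.lower().split())
    return max(train, key=lambda ex: cosine(q, Counter(ex[0].lower().split())))


def build_prompt(query: str, head: str, tail: str,
                 train: list[tuple[str, str]]) -> str:
    """Augment the query prompt with the retrieved example (hypothetical wording)."""
    sent, rel = retrieve(query, train)
    return (
        f"Example: '{sent}' -> relation: {rel}\n"
        f"Sentence: '{query}'\n"
        f"What is the relation between '{head}' and '{tail}'?"
    )


# Tiny illustrative training subset (invented examples, not dataset content).
train = [
    ("Steve Jobs founded Apple in 1976.", "org:founded_by"),
    ("Paris is the capital of France.", "loc:capital_of"),
]
prompt = build_prompt("Bill Gates founded Microsoft.", "Bill Gates",
                      "Microsoft", train)
print(prompt)
```

The resulting prompt would then be sent to the LLM in place of the plain query, which is the augmentation whose effect the reviews discuss.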

Most of the points from the review in the first round are addressed (e.g., more in-depth analysis; Table 4 contains a direct comparison to related work; related work is updated).
Still, it would be interesting to see how larger models (Llama2-13B or 70B) and more recent models (Llama3) would perform on these tasks, since in many cases they perform better, especially where more reasoning is required, as in the proposed tasks.
Another minor improvement would be keeping the figures readable when printed in black and white (Appendix C).

Review #2
By Garima Agrawal submitted on 20/Apr/2025
Suggestion:
Accept
Review Comment:

There is substantial improvement in the writing and structure of the paper. The augmentation method is explained well, and the examples and figures are illustrative. The GitHub repo has a detailed README, and the data and code are available and well organized.

Review #3
Anonymous submitted on 17/May/2025
Suggestion:
Accept
Review Comment:

I see a few typos; please review the finalized version for these. It would also be nice to see a discussion of the ethical aspects.

Review #4
Anonymous submitted on 19/May/2025
Suggestion:
Accept
Review Comment:

The paper "Retrieval-Augmented Generation-based Relation Extraction (RAG4RE)" presents a novel zero-shot approach to relation extraction that enhances prompt quality for large language models (LLMs) by integrating semantically similar sentences retrieved from training data. This RAG-based framework, evaluated on benchmark datasets such as TACRED, TACREV, Re-TACRED, and SemEval using models like Flan-T5, LLaMA2, and Mistral, demonstrates superior performance over simple query-based prompting and several state-of-the-art methods, particularly in reducing hallucinations and improving micro-F1 scores. The authors detail a well-structured pipeline consisting of retrieval, data augmentation, and generation modules, and support their claims with comprehensive ablation studies. However, the approach shows limited generalization to the SemEval dataset, likely due to its dependence on contextually inferable relations and the limitations of vanilla LLMs. While the method is robust and innovative, further refinements—such as improved domain adaptation and more accurate retrieval—could enhance its applicability across diverse relation extraction tasks.