Towards Intelligent Research Collaboration: A Hybrid AI Framework for Recommending Participants in Research Projects

Tracking #: 3870-5084

Authors: 
Piermichele Rosati
Emanuele Laurenzi
Michela Quadrini

Responsible editor: 
Guest Editors 2025 LLM GenAI KGs

Submission type: 
Full Paper
Abstract: 
The success of research project proposals largely depends on the quality of the consortium, which must possess strong expertise and experience aligned with the themes of the relevant funding calls, such as those under the EU’s Horizon Europe programme. However, forming such a consortium remains one of the most difficult tasks, as it involves identifying suitable research collaborators. Traditional approaches typically rely on social networks or citation metrics, but these have shown limited effectiveness. This paper introduces an Agentic Graph-based Retrieval-Augmented Generation (RAG) approach that delivers contextualized and explainable collaborator recommendations, tailored to researchers’ expertise and the relevance of proposed projects, offering improved performance over conventional methods. The approach integrates the strengths of Knowledge Graphs (KGs) and Large Language Models (LLMs), and has been developed using the Design Science research methodology. Its effectiveness was assessed using two of the top-performing LLMs currently available: Claude 3.5 Sonnet and GPT-4o.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Sheeba Samuel submitted on 04/Aug/2025
Suggestion:
Major Revision
Review Comment:

Summary:
This paper proposes a hybrid AI framework combining Knowledge Graphs and Large Language Models using a GraphRAG and AgenticRAG approach to recommend research collaborators and consortia for EU-funded projects. It leverages the EURIO ontology to semantically model data from the CORDIS dataset, enabling explainable and contextualized recommendations based on project descriptions and objectives. The system architecture includes retrieval, augmentation, and generation components, coordinated through intelligent agents that generate SPARQL queries and semantic embeddings. Evaluation using the RAGAs framework shows that GPT-4o outperforms Claude 3.5 Sonnet in generating relevant, accurate, and semantically aligned recommendations.

While the proposed concept is relevant and timely, especially for research collaboration in large funding schemes, the manuscript suffers from several conceptual, structural, and technical gaps that need to be addressed for clarity, reproducibility, and impact.

Strengths:
* The integration of KGs with Retrieval-Augmented Generation (RAG) and Agentic workflows demonstrates a novel and thoughtful combination of symbolic and neural AI techniques for research collaboration.
* Leveraging the EURIO ontology and CORDIS datasets provides strong domain relevance and grounding in real EU-funded research projects, enhancing the system’s practical applicability.
* The code and datasets are available via GitHub to support reproducibility and community engagement.

Weaknesses:
* The literature review presents several state-of-the-art recommender systems and KGs, but it does not clearly connect these works to the proposed system, clarify how it advances the field, or justify its novelty. Many referenced works, such as ORKG and the VIVO ontology, are mentioned without proper citations or an explanation of their relevance to the proposed approach. Furthermore, although the authors introduce Literature Requirements (LR1–LR4) and Application Requirements (APR1–APR7), these are not clearly defined or grounded in prior literature, and there is no comparative mapping showing how these requirements are (or are not) addressed by previous works.
* A critical omission in the paper is the lack of discussion or mitigation of bias. This is a serious concern given the paper’s goal of recommending collaborators and research consortia—tasks that can directly influence funding opportunities and academic visibility. The system is built upon the EURIO Knowledge Graph, which in turn draws from the CORDIS dataset—a database that primarily includes organizations and participants already funded under FP7 and Horizon 2020. This introduces a structural bias: the recommender system can only suggest collaborators who are already embedded in past EU-funded projects. As a result, early-career researchers, new organizations, researchers from underrepresented regions, and those who have not previously participated in funded consortia are invisible to the system. This kind of historical bias reinforces existing power structures and excludes new entrants, directly contradicting the EU’s stated goals of widening participation in research funding. Without strategies to incorporate external knowledge sources (e.g., Wikidata or ORCID) or mechanisms to promote fair exposure, the recommendations may amplify inequality in research access. Moreover, no attempt is made to evaluate the demographic or institutional diversity of recommended collaborators or consortia, nor is there any bias auditing or fairness metric applied during evaluation. The system’s dependence on LLMs also introduces a secondary layer of bias from the language models themselves, which is similarly unaddressed.
* The results section provides a description of the system architecture and its components but lacks clear separation between methodological details and actual experimental outcomes. Much of the content labeled as “results” (e.g., the use of the EURIO ontology, the agent workflow, and system design) is more appropriate for the methods section, as these represent design decisions rather than evaluation findings.
* The evaluation in the paper is minimal and insufficient to support the claims of effectiveness and reliability of the proposed system. The authors rely on a dataset of only 10 queries, which is too small to yield statistically significant or generalizable results. While the use of the RAGAs framework is appropriate for assessing LLM-based systems, the evaluation lacks depth—there is no baseline comparison (e.g., LLM-only vs. hybrid KG+LLM) and no error analysis to assess the impact of individual system components. Critically, no human-in-the-loop validation is conducted, meaning the recommendations are not tested for real-world usefulness by domain experts or potential users of EU funding systems. This is especially concerning for a system intended to support high-stakes decisions like research collaboration. The evaluation could be substantially improved by incorporating larger datasets, gathering qualitative feedback from users, and including comparisons with existing tools or baseline approaches.
* The authors mention that project participants are sometimes listed as semicolon-separated strings, but they fail to explain whether persistent identifiers (e.g., ORCID, organization IDs) were available and how inconsistencies were resolved. This is critical for data normalization and entity linking—especially in a KG-based system (a minimal normalization sketch follows this list).
* The evaluation focuses on projects related to Building Information Modeling (BIM) without providing a rationale for selecting this specific domain beyond vague relevance. The decision seems arbitrary unless BIM projects are particularly data-rich or representative. Additionally, links to the projects mentioned (e.g., BIMERR) are missing in the Datasets and Scenarios section, which hinders verification.
* The paper refers frequently to AgenticRAG but never explains what this term means for readers unfamiliar with it. There is also no reference for AgenticRAG and GraphRAG. There is no explanation of how AgenticRAG differs from traditional RAG or why it’s essential to the current system.
* Although the system uses agentic patterns, the internal decision-making processes, fallback mechanisms, and coordination between agents are described at a high level, leaving some ambiguity in understanding how the agents handle complex or ambiguous queries.
* Similarly, the use of EURIO as both a KG and an ontology is inconsistent. It’s unclear in several sections whether the authors are referring to the data structure (KG) or the conceptual schema (ontology). The first mention of EURIO lacks a direct link to the ontology, and the link later provided (https://op.europa.eu/en/) leads to the Publications Office of the European Union, not the ontology itself. The EURIO ontology is developed externally, and thus its mention in the Results section is inappropriate—it belongs in the Background or Methods section.
* The proposed architecture (Fig. 2) lacks clarity and flow. Figure 2 is also not directly explained in the text.
* Although the paper states that the system uses SPARQL queries for retrieving KG data, no examples of queries or prompt generation templates are provided (an illustrative query is sketched after this list). There is also no discussion of the accuracy, coverage, or limitations of these SPARQL queries, which are central to the proposed architecture.
* In the Technical Implementation, the authors mention the use of LlamaIndex and LangChain but do not justify why these frameworks were chosen over alternatives. There is also no discussion of the performance tradeoffs, limitations, or alternatives explored during implementation.
* The paper claims that a chatbot-style interface provides a user-friendly experience but does not demonstrate how it differs from general AI prompting tools. The claimed benefits of the UI are unsubstantiated without user feedback or comparative analysis.
* The future work section is limited and generic, offering little insight into concrete next steps for advancing the system. The authors do not outline any plans for improving core components, such as refining agent behavior, enhancing SPARQL query accuracy, addressing bias, or validating recommendations with real users.
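
On the participant-string normalization point above, the kind of step that should at least be documented looks roughly like this (a minimal sketch; the field names and identifier sources are my assumptions, since the paper does not specify them):

```python
import re

def normalize_participants(raw: str) -> list[dict]:
    """Split a semicolon-separated participant string into records.

    The identifier fields are placeholders: the paper should state whether
    persistent identifiers (ORCID iDs, organization/PIC numbers) were
    available in CORDIS and how unmatched names were handled.
    """
    records = []
    for name in raw.split(";"):
        name = re.sub(r"\s+", " ", name).strip()
        if not name:
            continue
        records.append({
            "name": name,
            "orcid": None,   # to be resolved against ORCID, if available
            "org_id": None,  # to be resolved against an organization registry
        })
    return records

print(normalize_participants("A. Smith ; B. Jones;;C.  Lee"))
```

On the SPARQL point, even one illustrative query would help. The sketch below shows the kind of query I would expect the paper to include; the EURIO namespace and the class/property names are guesses on my part, not taken from the paper:

```python
from rdflib import Graph

g = Graph()
g.parse("eurio_dump.ttl")  # hypothetical local dump of the EURIO KG

# The terms below are illustrative guesses at EURIO vocabulary; the paper
# should show the queries its agents actually generate.
query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX eurio: <http://data.europa.eu/s66#>
SELECT ?orgLabel (COUNT(?project) AS ?nProjects)
WHERE {
  ?project a eurio:Project ;
           eurio:hasParticipant ?org .
  ?org rdfs:label ?orgLabel .
}
GROUP BY ?orgLabel
ORDER BY DESC(?nProjects)
LIMIT 10
"""
for row in g.query(query):
    print(row.orgLabel, row.nProjects)
```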

Data and Code Availability:
The code is shared on GitHub via https://github.com/Piermuz7/MasterThesisProject.git. The repository includes a README file, which provides a high-level overview of the project, including its purpose, structure, and how to set up the environment. The presence of installation instructions, environment configuration (via requirements.txt), and usage guidance reflects good organization.
The code and resources are hosted on GitHub but are not archived in a long-term repository like Zenodo and therefore lack a persistent identifier (e.g., DOI) for stable citation and reproducibility.

Suggestions for Improvement:
While the paper addresses a relevant problem and proposes a potentially promising hybrid AI framework, several areas require further development to realize its full contribution.
* Consider strengthening the literature review by grounding it more firmly in existing research and providing a clearer comparative synthesis.
* It would be helpful to clarify and justify the methodological choices, particularly regarding dataset selection, framework adoption, and requirement definition.
* The system architecture and results could benefit from further elaboration. Including more detailed descriptions of key components, design decisions, and technical mechanisms would greatly aid reader understanding and highlight the novelty of the proposed framework.
* The evaluation could be significantly strengthened by larger datasets, qualitative user feedback, and comparisons to existing tools or approaches (see the RAGAs baseline sketch after this list).
* Ensure that citations and references are added appropriately throughout the paper, particularly where prior work, frameworks, or datasets are mentioned.
* Bias is completely unacknowledged, despite the high-stakes nature of the task. The paper would benefit significantly from a dedicated discussion on:
  * The sources and types of bias in the dataset and model outputs.
  * Strategies to detect, monitor, or mitigate those biases.
  * Consideration of how the system can promote inclusivity and fairness, particularly for underrepresented researchers.
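
On the evaluation point: a baseline comparison could reuse the RAGAs setup the authors already have, scoring an LLM-only configuration against the hybrid KG+LLM pipeline on the same questions. A minimal sketch, assuming the classic RAGAs column conventions (depending on the RAGAs version, the ground-truth column is `ground_truth` or `ground_truths`):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

def score(questions, answers, contexts, ground_truths):
    # Build a dataset in the column layout RAGAs expects.
    ds = Dataset.from_dict({
        "question": questions,
        "answer": answers,
        "contexts": contexts,          # list of retrieved-context lists
        "ground_truth": ground_truths,
    })
    return evaluate(ds, metrics=[faithfulness, answer_relevancy, context_precision])

# Same questions and ground truths, two system configurations:
# hybrid = score(qs, hybrid_answers, hybrid_contexts, gts)
# llm_only = score(qs, llm_only_answers, llm_only_contexts, gts)
# (For an LLM-only baseline without retrieval, only the answer-level
# metrics are meaningful.) The delta isolates the KG retrieval step.
```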

Minor comments:
* Add direct links to all referenced ontologies (e.g., ORKG, VIVO, the European Science Vocabulary, DC, DCAT, DINGO, FaBiO, FRAPO, etc.) to improve accessibility.
* Provide URLs or project pages for the EU-funded projects mentioned (e.g., BIMERR and the other Horizon 2020 projects) to allow readers to verify and explore the data sources. The five selected projects are: “BIM-based holistic tools for Energy-driven Renovation of existing Residences”, “Integrated and Replicable Solutions for Co-Creation in Sustainable Cities”, “New integrated methodology and Tools for Retrofit design towards a next generation of ENergy efficient and sustainable buildings and Districts”, “Proactive synergy of inteGrated Efficient Technologies on buildings’ Envelopes”, and “Adaptive Multimodal Interfaces to Assist Disabled People in Daily Activities”.
* Include citations for tools and frameworks such as Chroma, LlamaIndex, LangChain, and the LLMs used, which currently appear without proper attribution.

Review #2
Anonymous submitted on 10/Sep/2025
Suggestion:
Reject
Review Comment:

This paper presents an approach for recommending research collaborators based on user input. The approach builds on the EURIO knowledge graph and LLMs, leveraging GraphRAG and AgenticRAG in combination. It provides two main functionalities: recommending potential collaborators and consortia. The evaluation focuses on these two tasks using two datasets, each consisting of 10 user queries and 10 corresponding ground-truth answers. The authors also provide source code via GitHub, which enables reproducibility and helps readers better understand the approach.

The proposed approach is promising and relevant. However, the paper currently reads more like an LLM engineering prototype than a Semantic Web research contribution. Stronger semantic grounding, baselines, and more thorough evaluation are needed for a journal-level paper.

Major weaknesses:

- Literature review: The review is not well-motivated and lacks depth. It seems intended to identify design requirements (LR1–4) within a Design Science research methodology. However, the methodology itself is never clearly explained, and the role of the literature review in deriving the requirements remains vague.
- Architecture presentation: Figure 2 does not clearly represent the system architecture. For example, the generation and evaluation of SPARQL queries over the KG are not well illustrated.
- Evaluation: The evaluation is very limited, relying on only a small number of queries and ground-truth answers.
- Result discussion: The discussion mainly compares different LLMs, rather than positioning the approach against traditional recommender systems or other baselines.
- Research question: The paper does not provide a clear or convincing answer to its central research question.

Some detailed comments:

- In the Introduction, the sentence “LLMs and KGs can complement each other, providing contextual, explainable, and knowledge-grounded recommendations that reduce hallucinations and improve accuracy” appears to cite an incorrect reference.
- In the literature review, the Open Research Knowledge Graph is not cited.
- It is unclear why EURIO, EuroSciVoc, DINGO, and FRAPO are not discussed in the literature review.
- The word “selected” in the sentence “The following Literature Requirements (LRs) were selected from the relevant works in the literature” is misleading; a different term would be more appropriate.
- LR4 is phrased more as a problem statement than a requirement; it should be reformulated.
- On page 4, the purpose of listing application requirements is not explained.
- A schema diagram of EURIO would greatly help readers.
- On page 5, the authors use both e.g. and etc. in the same sentence, which is redundant.
- Figure 2 would be clearer if annotations were added to some of the arrows.

Review #3
By Ben De Meester submitted on 06/Oct/2025
Suggestion:
Reject
Review Comment:

I do not believe this paper is fit for publication. Contributions are unclear, descriptions are inconsistent in terms of detail and clarity, the evaluation does not seem to indicate good results, and the discussion gives no good insight into why that is the case. Set-up choices are not well documented (why CORDIS was not included, why certain relations were deemed more important, why certain LLM configurations were used, etc.), so it is hard to identify how they should be interpreted.

Below, I give further review details, feel free to contact me for follow-up questions (ben.demeester@ugent.be).

Research contributions, assessed along the usual dimensions:
(1) Originality: it's a fast-moving area, so I could give _some_ slack for the originality of this work; however, all set-up and evaluation are described at a very high level, so it is hard to understand what academic contributions the authors make: the evaluations only address the overall quality of the system (which, as far as I can tell, isn't great).
(2) Significance of the results: the results do not seem to indicate any impact, and no evaluation of the actual recommendations is given: yes, a ground truth is provided, but it is nowhere evaluated whether the results of the system have an actual impact on end-users.
(3) Quality of writing: it's a bit all over the place; some parts are very detailed while ultimately not very impactful on the work (e.g., the EURIO ontology, CORDIS).

Long-term stable URL for resources: https://github.com/Piermuz7/MasterThesisProject: the URL does not give a good impression with respect to the longevity of the solution: an individual's master thesis, where the individual is no longer working for that university, and apparently the work has not been forked by that university: how 'stable' can I expect this resource to be? The documentation is, however, very clear.

### Detailed review

#### Abstract

- "have shown limited effectiveness": citation needed (also not given in the introduction)

#### Introduction

- "While LLMs can better capture user preferences" --> citation needed
- "high-stakes tasks like research collaborator recommendations" --> why is this high stakes? No-one dies if you get a bad recommendation
- Hybrid AI gets 3 citations? I'd pick the most relevant one.
- You don't specify _why_ you used DSR.
- Also, the "problem awareness phase" is not mentioned in the DSR reference. Where did you get this from?

#### Literature review

- This section reads oddly: you mention all these related works, but it's unclear how relevant they are to your work, and why (not). It's also unclear which LRs originated from which related work. This currently reads a bit disjointed from the remainder of the paper.
- I would appreciate it if the actual LRs were stated more specifically, e.g., "Regarding the hop reasoning perspective, retrieving relevant information dynamically, using models such as GNNs, can require high computations on graphs, especially on large graphs (LR4)." --> what is the actual requirement here? Being able to handle high computations? How high?

#### Scenarios Analysis

- "Similarly, the report summaries, which include periodic or final publishable summaries. " --> rephrase
- You mention that CORDIS is only available in non-RDF format, so preprocessing is needed; however, at https://cordis.europa.eu/about/services, it is stated that CORDIS data can be queried via a SPARQL endpoint. How do you reconcile that? (A minimal endpoint query sketch follows this list.)
- I don't understand why you need APRs; these sound like application requirements, but wouldn't EURIO also cover those? Why (not)?
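
For reference, querying such an endpoint takes only a few lines with SPARQLWrapper; the endpoint URL below is a placeholder, and the actual one should be taken from the CORDIS services page:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder URL: take the actual endpoint from
# https://cordis.europa.eu/about/services
sparql = SPARQLWrapper("https://example.org/cordis/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding)
```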

#### Results
- I would have expected a mention of the EURIO Knowledge Graph (not only the EURIO ontology)
- It is very unclear that CORDIS is the dataset of FP7 and H2020; the paper should discuss lifting those datasets to the same data model as EURIO (which covers Horizon Europe)
- In "Recommendation Strategy" you suddenly talk only about the EURIO KG and don't even mention CORDIS anymore
- The EURIO ontology gets quite a lot of page space, while I do not understand how important that description is for the remainder of your paper. The impact of specific EURIO relations is nowhere mentioned later on.
- "we developed a mechanism to query and extract the most meaningful and useful relationships for our purpose" --> how did you came to that selection? What metrics were used? How reproducible is this?
- Your requirements coverage is a bit all over the place (roughly 80% of it sits in the ontology selection part), suggesting that you might not have identified the right requirements for your task: almost all APRs are satisfied merely by covering a certain ontology, while that actual ontology is barely used (only the 'most meaningful and useful relationships' are used, without argumentation)
- Figure 2 is way too small to read; why not put it over 2 columns?
- It's not clear how the 'Project information/Project participants' tools relate to the EURIO KG. Do they, or is that a separate database?
- There is no explanation of why these context windows and token limits were chosen: are they random, based on pricing, defaults, ...?

#### Experiments and Discussion

- Your benchmark consists of a ground truth of 10 samples each, which feels far too small to signify anything
- "the ground truth responses are not included in this thesis" -> copy-paste error
- "GPT-4o outperforms Claude 3.5 Sonnet in AR, AR, SS and AC, " --> one AR too much I assume
- "As a result of the Agentic Graph RAG approach, neither LLMs hallucinates." --> how do you quantify? as no organisations are invented? with a faithfulness of 32%, I wouldn't really call these approaches reliable either.
- The result numbers are barely discussed: yes, GPT is better than Claude, but with a CP of 7%, how useful is this? (It is touched upon in the conclusion.)
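
One way to quantify the "no hallucination" claim would be to check every recommended organisation name against the KG. A minimal sketch, assuming a local copy of the graph and assuming (on my part) that organisations carry rdfs:label:

```python
from rdflib import Graph, Literal

g = Graph()
g.parse("eurio_dump.ttl")  # hypothetical local copy of the KG

ASK = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
ASK { ?org rdfs:label ?label . FILTER(lcase(str(?label)) = lcase(str(?name))) }
"""

def is_grounded(org_name: str) -> bool:
    # Exact (case-insensitive) string match is a simplification;
    # fuzzy matching may be needed in practice.
    return g.query(ASK, initBindings={"name": Literal(org_name)}).askAnswer

recommended = ["Fraunhofer", "Some Invented Org"]  # example system output
grounded = sum(is_grounded(n) for n in recommended) / len(recommended)
print(f"{grounded:.0%} of recommended organisations exist in the KG")
```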

#### Conclusion

- I'm afraid the paper lacks an honest description of the usefulness of this system