Review Comment:
Summary:
This paper proposes a hybrid AI framework combining Knowledge Graphs and Large Language Models using a GraphRAG and AgenticRAG approach to recommend research collaborators and consortia for EU-funded projects. It leverages the EURIO ontology to semantically model data from the CORDIS dataset, enabling explainable and contextualized recommendations based on project descriptions and objectives. The system architecture includes retrieval, augmentation, and generation components, coordinated through intelligent agents that generate SPARQL queries and semantic embeddings. Evaluation using the RAGAs framework shows that GPT-4o outperforms Claude 3.5 Sonnet in generating relevant, accurate, and semantically aligned recommendations.
While the proposed concept is relevant and timely, especially for research collaboration in large funding schemes, the manuscript suffers from several conceptual, structural, and technical gaps that need to be addressed for clarity, reproducibility, and impact.
Strengths:
* The integration of KGs with Retrieval-Augmented Generation (RAG) and Agentic workflows demonstrates a novel and thoughtful combination of symbolic and neural AI techniques for research collaboration.
* Leveraging the EURIO ontology and CORDIS datasets provides strong domain relevance and grounding in real EU-funded research projects, enhancing the system’s practical applicability.
* The code and datasets are available via GitHub to support reproducibility and community engagement.
Weaknesses:
* The literature review presents several state-of-the-art recommender systems and KGs, but it does not clearly connect these works to the proposed system, clarify how it advances the field, or justify its novelty. Many referenced works, such as ORKG and the VIVO ontology, are mentioned without proper citations or explanation of their relevance to the proposed approach. Furthermore, although the authors introduce Literature Requirements (LR1–LR4) and Application Requirements (APR1–APR7), these are not clearly defined or grounded in prior literature, and there is no comparative mapping showing how these requirements are (or are not) addressed by previous works.
* A critical omission in the paper is the lack of discussion or mitigation of bias. This is a serious concern given the paper’s goal of recommending collaborators and research consortia—tasks that can directly influence funding opportunities and academic visibility. The system is built upon the EURIO Knowledge Graph, which in turn draws from the CORDIS dataset—a database that primarily includes organizations and participants already funded under FP7 and Horizon 2020. This introduces a structural bias: the recommender system can only suggest collaborators who are already embedded in past EU-funded projects. As a result, early-career researchers, new organizations, researchers from underrepresented regions, or those who have not previously participated in funded consortia are invisible to the system. This kind of historical bias reinforces existing power structures and excludes new entrants, directly contradicting the EU’s stated goals of widening participation in research funding. Without strategies to incorporate external knowledge sources (e.g., Wikidata, ORCID, etc.) or mechanisms to promote fair exposure, the recommendations may amplify inequality in research access. Moreover, no attempt is made to evaluate the demographic or institutional diversity of recommended collaborators or consortia, nor is there any bias auditing or fairness metric applied during evaluation. The system's dependence on LLMs also introduces a secondary layer of bias from the language models themselves, which is similarly unaddressed.
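To make the missing bias audit concrete, even a lightweight exposure check would reveal the structural problem. The following is a minimal sketch (the data shapes and organization IDs are hypothetical, purely for illustration) of how the authors could measure what share of recommended slots go to organizations already embedded in past funded projects:

```python
from collections import Counter

def exposure_audit(recommendations, prior_participants):
    """Toy audit of recommendation exposure (hypothetical data shapes).

    recommendations: list of lists of recommended organization IDs,
                     one inner list per query.
    prior_participants: set of organization IDs already present in past
                        funded projects (e.g., extracted from CORDIS).
    Returns the share of recommended slots going to already-funded
    organizations, plus a per-organization recommendation count.
    """
    flat = [org for recs in recommendations for org in recs]
    counts = Counter(flat)
    already_funded = sum(1 for org in flat if org in prior_participants)
    insider_share = already_funded / len(flat) if flat else 0.0
    return insider_share, counts

# Hypothetical example: every recommended organization is a past
# participant, so insider_share is 1.0 -- exactly the closed loop
# this review describes, since the KG contains only funded entities.
recs = [["org_a", "org_b"], ["org_a", "org_c"]]
funded = {"org_a", "org_b", "org_c"}
share, counts = exposure_audit(recs, funded)
```

A metric like this, reported alongside the RAGAs scores, would at least quantify how closed the recommendation loop is and whether any external knowledge sources succeed in surfacing new entrants.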
* The results section provides a description of the system architecture and its components but lacks clear separation between methodological details and actual experimental outcomes. Much of the content labeled as “results” (e.g., the use of the EURIO ontology, the agent workflow, and system design) is more appropriate for the methods section, as these represent design decisions rather than evaluation findings.
* The evaluation in the paper is minimal and insufficient to support the claims of effectiveness and reliability of the proposed system. The authors rely on a dataset of only 10 queries, which is too small to yield statistically significant or generalizable results. While the use of the RAGAs framework is appropriate for assessing LLM-based systems, the evaluation lacks depth—there is no baseline comparison (e.g., LLM-only vs. hybrid KG+LLM) and no error analysis to assess the impact of individual system components. Critically, no human-in-the-loop validation is conducted, meaning the recommendations are not tested for real-world usefulness by domain experts or potential users of EU funding systems. This is especially concerning for a system intended to support high-stakes decisions like research collaboration. The evaluation could be substantially improved by incorporating larger datasets, gathering qualitative feedback from users, and including comparisons with existing tools or baseline approaches.
* The authors mention that project participants are sometimes listed as semicolon-separated strings, but fail to explain whether persistent identifiers (e.g., ORCID, organization IDs) were available and how inconsistencies were resolved. This is critical for data normalization and entity linking—especially in a KG-based system.
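To illustrate why this matters: splitting and whitespace normalization alone do not resolve entities. A minimal sketch (with invented example names) shows that two surface forms of the same institution survive normalization as distinct strings—only persistent identifiers or a dedicated entity-resolution step would merge them:

```python
import re

def split_participants(raw):
    """Split a semicolon-separated participant string and normalize
    whitespace for matching. Note: without persistent identifiers
    (ORCID, organization IDs), variants such as 'TU Wien' vs.
    'Technische Universitaet Wien' remain distinct strings --
    normalization is not entity resolution.
    """
    parts = [p.strip() for p in raw.split(";") if p.strip()]
    return [re.sub(r"\s+", " ", p) for p in parts]

# Hypothetical raw field with inconsistent delimiters and spacing:
names = split_participants(
    "TU Wien ;  Technische Universitaet Wien;; ETH Zurich"
)
```

The authors should state which of these steps (if any) were performed and how ambiguous cases were handled before loading the data into the KG.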
* The evaluation focuses on projects related to Building Information Modeling (BIM) without providing a rationale for selecting this specific domain beyond vague relevance. The decision seems arbitrary unless BIM projects are particularly data-rich or representative. Additionally, links to the subprojects mentioned (e.g., BIMERR) are missing (in Datasets and Scenarios section), which hinders verification.
* The paper refers frequently to AgenticRAG but never explains the term for readers unfamiliar with it. Neither AgenticRAG nor GraphRAG is given a reference, and there is no explanation of how AgenticRAG differs from traditional RAG or why it is essential to the current system.
* Although the system uses agentic patterns, the internal decision-making processes, fallback mechanisms, and coordination between agents are described at a high level, leaving some ambiguity in understanding how the agents handle complex or ambiguous queries.
* Similarly, the use of EURIO as both a KG and an ontology is inconsistent. It’s unclear in several sections whether the authors are referring to the data structure (KG) or the conceptual schema (ontology). The first mention of EURIO lacks a direct link to the ontology, and the link later provided (https://op.europa.eu/en/) leads to the Publications Office of the European Union, not the ontology itself. The EURIO ontology is developed externally, and thus its mention in the Results section is inappropriate—it belongs in the Background or Methods section.
* The proposed architecture (Fig. 2) lacks clarity and flow. Figure 2 is also not directly explained in the text.
* Although the paper states that the system uses SPARQL queries for retrieving KG data, no examples of queries or prompt generation templates are provided. There’s also no discussion of the accuracy, coverage, or limitations of these SPARQL queries, which are central to the proposed architecture.
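Even a single query template would substantially improve reproducibility. The sketch below (the `eurio:` namespace URI and term names such as `eurio:Project` and `eurio:hasParticipant` are assumptions for illustration; the actual EURIO ontology terms may differ) shows the kind of minimal example the authors could include:

```python
def build_collaborator_query(topic_keyword, limit=10):
    """Illustrative SPARQL template over an EURIO-like schema.
    All class and property names here are hypothetical placeholders,
    not verified EURIO terms.
    """
    return f"""
PREFIX eurio: <http://data.europa.eu/s66#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?org ?orgLabel WHERE {{
  ?project a eurio:Project ;
           eurio:title ?title ;
           eurio:hasParticipant ?org .
  ?org rdfs:label ?orgLabel .
  FILTER(CONTAINS(LCASE(?title), LCASE("{topic_keyword}")))
}}
LIMIT {limit}
""".strip()

query = build_collaborator_query("building information modeling")
```

Publishing the actual templates, together with an analysis of how often agent-generated queries return empty or incorrect results, would let readers judge the retrieval component on its own merits.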
* In the Technical Implementation, the authors mention the use of LlamaIndex and LangChain but do not justify why these frameworks were chosen over alternatives. There is also no discussion of the performance tradeoffs, limitations, or alternatives explored during implementation.
* The paper claims that a chatbot-style interface provides a user-friendly experience but does not demonstrate how it differs from general AI prompting tools. The claimed benefits of the UI are unsubstantiated without user feedback or comparative analysis.
* The future work section is limited and generic, offering little insight into concrete next steps for advancing the system. The authors do not outline any plans for improving core components, such as refining agent behavior, enhancing SPARQL query accuracy, addressing bias, or validating recommendations with real users.
Data and Code Availability:
The code is shared on GitHub at https://github.com/Piermuz7/MasterThesisProject.git. The repository includes a README file, which provides a high-level overview of the project, including its purpose, structure, and how to set up the environment. The presence of installation instructions, environment configuration (via requirements.txt), and usage guidance reflects good organization.
The code and resources are hosted on GitHub but are not archived in a long-term repository like Zenodo and therefore lack a persistent identifier (e.g., DOI) for stable citation and reproducibility.
Suggestions for Improvement:
While the paper addresses a relevant problem and proposes a potentially promising hybrid AI framework, several areas require further development to realize its full contribution.
* Consider strengthening the literature review by grounding it more firmly in existing research and providing a clearer comparative synthesis.
* It would be helpful to clarify and justify the methodological choices, particularly regarding dataset selection, framework adoption, and requirement definition.
* The system architecture and results could benefit from further elaboration. Including more detailed descriptions of key components, design decisions, and technical mechanisms would greatly aid reader understanding and highlight the novelty of the proposed framework.
* The evaluation could be significantly strengthened by larger datasets, qualitative user feedback, and comparisons to existing tools or approaches.
* Ensure that citations and references are added appropriately throughout the paper, particularly where prior work, frameworks, or datasets are mentioned.
* Bias is completely unacknowledged, despite the high-stakes nature of the task. The paper would benefit significantly from a dedicated discussion on:
* The sources and types of bias in the dataset and model outputs.
* Strategies to detect, monitor, or mitigate those biases.
* Consideration of how the system can promote inclusivity and fairness, particularly for underrepresented researchers.
Minor comments:
* Add direct links to all referenced ontologies (e.g., ORKG, VIVO, the European Science Vocabulary, DC, DCAT, DINGO, FaBiO, FRAPO, etc.) to improve accessibility.
* Provide URLs or project pages for the EU-funded projects mentioned (e.g., BIMERR and the other Horizon 2020 projects) to allow readers to verify and explore the data sources. The five selected projects are: “BIM-based holistic tools for Energy-driven Renovation of existing Residences”, “Integrated and Replicable Solutions for Co-Creation in Sustainable Cities”, “New integrated methodology and Tools for Retrofit design towards a next generation of ENergy efficient and sustainable buildings and Districts”, “Proactive synergy of inteGrated Efficient Technologies on buildings’ Envelopes”, and “Adaptive Multimodal Interfaces to Assist Disabled People in Daily Activities”.
* Include citations for tools and frameworks such as Chroma, LlamaIndex, LangChain, and the LLMs employed, which are currently used without proper attribution.