Abstract:
Large Language Models (LLMs) have demonstrated remarkable capabilities in extracting knowledge from, and generating new content based on, a wide range of resources, particularly text-based ones. Beyond unstructured data, LLMs also perform well on structured yet semantically rich resources such as ontologies, schemas, and knowledge graphs. However, directly providing large-scale semantic artifacts as input to LLMs is constrained by prompt size and token limits. The state-of-the-art solution to this challenge is the use of Retrieval-Augmented Generation (RAG) systems.
In this work, we propose IndustrialGraphRAG, a novel graph-based RAG approach specifically designed for large semantic artifacts. Our method integrates LLM-based Named Entity Recognition (NER) and Entity Linking (EL) into a unified pipeline tailored to semantically complex resources. Within this framework, we implement three use cases that combine LLM reasoning with our RAG system: (i) semantic artifact validation, (ii) information retrieval, and (iii) information model generation. The first two tasks translate natural language queries (NLQs) into executable SPARQL queries, whereas the third populates semantic artifacts from NLQ-driven instructions. Across all three use cases, the system performs strongly, confirming the effectiveness of the approach. Comparative experiments against two additional RAG baselines further show that IndustrialGraphRAG surpasses both in accuracy and contextual reasoning.
OPC UA serves as our primary data resource due to its breadth and semantic richness. To demonstrate generalizability, we additionally evaluate the system on the large-scale SAREF ontology, a structurally and semantically distinct artifact. Consistent performance across both resources indicates that the proposed system is not domain-specific and can be reliably applied to diverse semantic datasets.