Abstract:
Current Large Language Models (LLMs) can work with structured
information and even assist in developing program code, but can they
support working with Knowledge Graphs (KGs) as well? Which LLM
offers the best capabilities in the field of Semantic Web and
Knowledge Graph Engineering (KGE)? Is it possible to determine
this without manually checking many answers? The LLM-KG-Bench
framework is designed to answer these questions. It consists of an
extensible set of tasks for which the LLM answers are automatically
evaluated, and covers different aspects of working with semantic
technologies.
This article describes the LLM-KG-Bench framework, its
main concepts, and the tasks implemented. A benchmark run with the
framework produced a comprehensive dataset evaluating more
than 40 contemporary open and proprietary LLMs on 26 benchmark
tasks, resulting in interaction logs and evaluations of roughly 45 000
LLM task dialogues. Finally, this dataset is used for an analysis of the
SPARQL-related capabilities of the LLMs tested.