Knowledge Engineering with Large Language Models: A Capability Assessment in Ontology Evaluation

Tracking #: 3852-5066

This paper is currently under review
Authors: 
Stefani Tsaneva
Guntur Budi Herwanto
Majlinda Llugiqi
Marta Sabou

Responsible editor: 
Guest Editors 2025 LLM GenAI KGs

Submission type: 
Full Paper
Abstract: 
Advancements in large language models (LLMs) offer opportunities for automating challenging and time-intensive Knowledge Engineering (KE) tasks. Constructing an ontology is a complex process, particularly when logical restrictions are modeled or when the development is performed by novice knowledge engineers or domain experts with limited training in KE. Consequently, developed ontologies often contain modeling errors, undermining the success of ontology-based applications and hindering subsequent KE tasks. It is therefore important to investigate how LLMs can support KE tasks such as ontology evaluation, which involves the detection and correction of errors in knowledge-based resources. However, challenges remain in systematically evaluating LLM performance and comparing different models in terms of their capabilities to perform concrete KE tasks. Moreover, there is a lack of comprehensive, task-specific benchmarks needed for such LLM capability assessments. As a result, selecting the right LLM to effectively support knowledge engineers is a nontrivial problem. To fill these gaps, this study investigates how and to what extent LLMs can support four concrete ontology evaluation sub-tasks: the detection, classification, explanation, and possible correction of modeling issues in ontologies, focusing on the use of existential, universal, and cardinality constraints. To this end, we construct a benchmark dataset based on student-built ontologies and experimentally assess the performance of four LLMs (GPT-4o, Claude Sonnet, DeepSeek V3, and Llama 3.3) on these four KE sub-tasks. Additionally, we demonstrate how an annotation framework for the qualitative evaluation of LLM outputs can be defined and perform a comparative analysis of each model's capabilities. Our findings reveal notable differences in model behavior and task-specific strengths, underscoring the importance of selecting the most appropriate model to support concrete KE tasks.
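As a minimal illustration of the three OWL constraint types named in the abstract, the sketch below uses the owlready2 Python library on an invented toy ontology; all class and property names (Pizza, Topping, has_topping) are hypothetical examples and are not taken from the benchmark dataset described in the paper.

```python
# Minimal sketch (invented names): existential, universal, and cardinality
# constraints expressed with the owlready2 Python library.
from owlready2 import Thing, ObjectProperty, get_ontology

onto = get_ontology("http://example.org/demo.owl")

with onto:
    class Pizza(Thing): pass
    class Topping(Thing): pass
    class CheeseTopping(Topping): pass

    class has_topping(ObjectProperty):
        domain = [Pizza]
        range = [Topping]

    # Existential restriction: every Pizza has at least one Topping.
    Pizza.is_a.append(has_topping.some(Topping))

    class MargheritaPizza(Pizza): pass

    # Universal restriction: a MargheritaPizza may only have CheeseToppings.
    # Note: "only" alone does not assert that any topping exists, a well-known
    # source of modeling errors when it is confused with "some".
    MargheritaPizza.is_a.append(has_topping.only(CheeseTopping))

    # Cardinality restriction: a MargheritaPizza has exactly one CheeseTopping.
    MargheritaPizza.is_a.append(has_topping.exactly(1, CheeseTopping))
```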
Tags: 
Under Review