LLM4Schema.org: Generating Schema.org Markups with Large Language Models

Tracking #: 3867-5081

This paper is currently under review
Authors: 
Minh-Hoang Dang
Thi Hoang Thi Pham
Pascal Molli
Hala Skaf-Molli
Alban Gaignard

Responsible editor: 
Guest Editors KG Construction 2024

Submission type: 
Full Paper
Abstract: 
Integrating Schema.org markup into web pages has resulted in the generation of billions of RDF triples. However, around 75% of web pages still lack this critical markup. Large Language Models (LLMs) present a promising solution by automatically generating the missing Schema.org markup. Despite this potential, there is currently no benchmark to evaluate the quality of markup produced by LLMs. This paper introduces LLM4Schema.org, an innovative approach for assessing the performance of LLMs in generating Schema.org markup. Unlike traditional methods, LLM4Schema.org does not require a predefined ground truth. Instead, it compares the quality of LLM-generated markup against human-generated markup. Our findings reveal that 40–50% of the markup produced by GPT-3.5 and GPT-4 is invalid, non-factual, or non-compliant with the Schema.org ontology. These errors underscore the limitations of LLMs in adhering strictly to structured ontologies like Schema.org without additional filtering and validation mechanisms. We demonstrate that specialized LLM-powered agents can effectively identify and eliminate these errors. After applying such filtering to both human- and LLM-generated markup, GPT-4 shows notable improvements in quality and outperforms humans. LLM4Schema.org highlights both the potential and the challenges of leveraging LLMs for semantic annotations, emphasizing the critical role of careful curation and validation in achieving reliable results.
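For readers unfamiliar with the task, the following is a minimal sketch (not taken from the paper) of the kind of Schema.org JSON-LD markup an LLM would be asked to generate for a web page, together with a toy compliance check in the spirit of the filtering the abstract describes. The `RECIPE_PROPERTIES` vocabulary here is a hand-picked, hypothetical subset; the paper's actual agents validate against the full Schema.org ontology.

```python
import json

# Illustrative Schema.org JSON-LD markup, as might be embedded in a web page.
markup = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Tomato Soup",
    "recipeIngredient": ["tomatoes", "basil"],
}

# Hypothetical allowed vocabulary for the Recipe type (tiny subset for illustration).
RECIPE_PROPERTIES = {"name", "recipeIngredient", "cookTime", "author"}

def non_compliant_properties(doc: dict) -> set:
    """Return properties of `doc` that fall outside the allowed vocabulary.

    Keys starting with '@' (JSON-LD keywords) are skipped.
    """
    return {k for k in doc if not k.startswith("@")} - RECIPE_PROPERTIES

# An empty result means every property is compliant with the toy vocabulary.
print(non_compliant_properties(markup))
print(json.dumps(markup, indent=2))
```

A markup fragment containing an invented property (a common LLM error mode reported in the abstract) would be flagged by the same check, e.g. `non_compliant_properties({"name": "x", "hallucinatedProp": 1})` returns `{"hallucinatedProp"}`.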