LLM4Schema.org: Generating Schema.org Markups with Large Language Models

Tracking #: 3793-5007

Authors: 
Minh-Hoang Dang
Thi Hoang Thi Pham
Pascal Molli
Hala Skaf-Molli
Alban Gaignard

Responsible editor: 
Guest Editors KG Construction 2024

Submission type: 
Full Paper
Abstract: 
Integrating Schema.org markup into web pages has resulted in the generation of billions of RDF triples. However, around 75% of web pages still lack this critical markup. Large Language Models (LLMs) present a promising solution by automatically generating the missing Schema.org markup. Despite this potential, there is currently no benchmark to evaluate the markup quality produced by LLMs. This paper introduces LLM4Schema.org, an innovative approach for assessing the performance of LLMs in generating Schema.org markup. Unlike traditional methods, LLM4Schema.org does not require a predefined ground truth. Instead, it compares the quality of LLM-generated markup against human-generated markup. Our findings reveal that 40–50% of the markup produced by GPT-3.5 and GPT-4 is invalid, non-factual, or non-compliant with the Schema.org ontology. These errors underscore the limitations of LLMs in adhering strictly to structured ontologies like Schema.org without additional filtering and validation mechanisms. We demonstrate that specialized LLM-powered agents can effectively identify and eliminate these errors. After applying such filtering for both human and LLM-generated markup, GPT-4 shows notable improvements in quality and outperforms humans. LLM4Schema.org highlights both the potential and challenges of leveraging LLMs for semantic annotations, emphasizing the critical role of careful curation and validation in achieving reliable results.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 09/Feb/2025
Suggestion:
Accept
Review Comment:

I appreciate the authors' efforts in addressing my previous comments and improving the manuscript. The revisions significantly enhance the clarity and depth of the paper, particularly the discussion on alternative Schema.org formats, the explanation of the prompting strategy, and the additional details on chunking and evaluation. The revised captions and conclusion also strengthen the overall readability and impact of the work.

While the improvements are substantial, a brief mention of why an OpenAI model comparison was not pursued could add further clarity. Additionally, a short summary of the prompting strategy directly in the methodology section would improve readability. These are minor points and do not affect my overall positive impression of the paper.

Regarding my previous concerns, I believe the manuscript is now suitable for acceptance.

Review #2
Anonymous submitted on 20/Feb/2025
Suggestion:
Minor Revision
Review Comment:

I want to thank the authors for their efforts in addressing the feedback from the first round of reviews. The original submission has evidently been refined, with notable improvements.

A shortfall in the resubmission is the absence of a detailed critique of SHACL limitations, despite Reviewer #1’s explicit call for discussion of validation tool constraints and Reviewer #4’s query about SHACL shape validation. Section 3.3 describes SHACL’s role and generation process, stating it is “semantically imperfect and not formally strict” to align with Schema.org’s design, but stops short of analyzing its practical limitations.

The authors have added a short paragraph discussing the generalization issue. However, this paragraph raises more questions. What is well documented? How many positive examples are sufficient? It sounds as if the approach is applicable to other ontologies without adaptations. Is that really the case? From my understanding, generalizing LLM4Schema.org to other ontologies would require decoupling its components from Schema.org-specific assumptions.

A final round of revisions should secure acceptance.

Review #3
Anonymous submitted on 23/Mar/2025
Suggestion:
Minor Revision
Review Comment:

The authors have addressed almost all issues successfully; however, one important topic has not been addressed. The experiment to evaluate the accuracy of the MeMR in section 4.5 is carried out over only 18 pages. An experiment with at least 30 pages should be feasible.

Minor

Page 2 "In this paper, we propose a novel approach." --> This sentence could be expanded by adding a brief description of the approach.

Some sentences stand alone after paragraphs but should be part of them. For example, on page 4, "In this paper, we focus on the JSON-LD format." could be merged into the previous paragraph.