Schema-Miner Pro: Agentic AI for Ontology Grounding over LLM-Discovered Scientific Schemas in a Human-in-the-Loop Workflow

Tracking #: 3953-5167

Authors: 
Sameer Sadruddin
Jennifer D'Souza
Eleni Poupaki
Alex Watkins
Bora Karasulu
Sören Auer
Adrie Mackus
Erwin Kessels

Responsible editor: 
Guest Editors 2025 LLM GenAI KGs

Submission type: 
Full Paper
Abstract: 
Scientific processes are often described in free text, making it difficult to represent and reason over them computationally. We present schema-miner pro, a human-in-the-loop framework that automatically extracts and grounds structured schemas from scientific literature. Our approach combines large language models for schema extraction with an agent-based system that aligns extracted elements to external ontologies through interpretable, multi-step reasoning. The agent leverages lexical heuristics, semantic similarity, and expert feedback to ensure accurate grounding. We demonstrate the framework on two semiconductor manufacturing workflows, Atomic Layer Deposition (ALD) and Atomic Layer Etching (ALE), mapping process parameters and outputs to the QUDT (Quantities, Units, Dimensions, and Types) ontology. By producing ontology-aligned, semantically precise schemas, schema-miner pro lays the groundwork for machine-actionable scientific knowledge and automated reasoning across disciplines.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Andrea Mannocci submitted on 27/Oct/2025
Suggestion:
Accept
Review Comment:

Having reviewed the paper in its previous iteration, I reckon that the manuscript has remarkably improved, as all comments provided by the reviewers have been carefully addressed by the authors.
The flow is much better now and many details have been polished out.
The SW repository on GitHub is now well-organised, and therefore it will be easier for the community to reuse the framework and its components.
Therefore, I do not have any reservations in accepting the paper for publication in SWJ.
Well done.

Review #2
By Antonello Meloni submitted on 02/Nov/2025
Suggestion:
Minor Revision
Review Comment:

The authors have addressed most of the previous review points thoroughly. The manuscript is now much clearer, with improved repository organization, workflow explanation, consistent terminology, and better figure/table presentation. The documentation and tutorials make the framework accessible and usable.

I remain concerned about the claims regarding domain-agnostic generalization of SCHEMA-MINERpro. The manuscript provides a detailed conceptual discussion and illustrative examples from biomedical, chemical, and engineering domains, but these examples involve relatively structured texts and do not provide empirical evidence of successful application to heterogeneous or less-structured scientific documents. I recommend that these sections be explicitly framed as potential extensions or prospective applications, rather than as demonstrated generalization. A brief qualitative example or small-scale test from a less-structured domain would strengthen the argument if feasible.

Originality:
The multi-agent, human-in-the-loop ontology grounding framework remains a novel and practically relevant extension of prior schema extraction pipelines.

Significance of Results:
The empirical results in the ALD/ALE domains are solid and clearly demonstrate the workflow’s effectiveness. Claims of domain-agnostic generalization should be presented as potential rather than proven applicability.

Quality of Writing:
The manuscript is well-written, clearly structured, and easy to follow. Figures and tables have been improved and terminology standardized.

Data and Resources:
The GitHub repository and documentation are comprehensive, well-organized, and accessible.

Review #3
By Angelo Salatino submitted on 25/Nov/2025
Suggestion:
Accept
Review Comment:

The authors have responded positively to my feedback and incorporated my suggestions in the new version of the manuscript.