On General and Biomedical Text-to-Graph Large Language Models

Tracking #: 3642-4856

This paper is currently under review.

Authors: 
Lorenzo Bertolini
Roel Hulsman
Sergio Consoli
Antonio Puertas Gallardo
Mario Ceresa

Responsible editor: 
Guest Editors KG Gen from Text 2023

Submission type: 
Full Paper
Abstract: 
Knowledge graphs and ontologies represent symbolic and factual information that can offer structured and interpretable knowledge. Extracting and manipulating this type of information is a crucial step in complex processes such as human reasoning. While Large Language Models (LLMs) are known to be useful for extracting and enriching knowledge graphs and ontologies, previous work has largely focused on comparing architecture-specific models (e.g. encoder-decoder only) across benchmarks from similar domains. In this work, we provide a large-scale comparison of the impact of LLM features (e.g. model architecture and size) and task-learning methods (fine-tuning vs. in-context learning (iCL)) on performance across text-to-graph benchmarks in two domains, namely the general and the biomedical domain. Experiments suggest that, in the general domain, small fine-tuned encoder-decoder models and mid-sized decoder-only models used with iCL reach comparable overall performance, with high entity and relation recognition and moderate yet encouraging graph completion. Our results further tentatively suggest that, independently of other factors, biomedical knowledge graphs are notably harder to learn and are better modelled by small fine-tuned encoder-decoder architectures. Pertaining to iCL, we analyse hallucination behaviour related to sub-optimal prompt design and suggest an efficient alternative to prompt engineering and prompt tuning for tasks with structured model output.
Full PDF Version: 
Under Review