Polyglot Persistence with Large Language Models

Tracking #: 3999-5213

This paper is currently under review
Authors: 
J. de Curtò
de Zarzà
Carlos T. Calafate

Responsible editor: 
Guest Editors ML and KR 2025

Submission type: 
Full Paper
Abstract: 
Modern data-intensive applications demand intelligent systems capable of managing heterogeneous, highly interconnected data across multiple specialized storage backends. Yet accessing such systems typically requires expertise in multiple query languages (e.g., SQL, Cypher, MongoDB query syntax), limiting accessibility for non-technical users. This paper presents a comprehensive architecture that integrates polyglot persistence, combining document stores, graph databases, key-value caches, and relational data warehouses, with Large Language Models (LLMs) to provide natural language query interfaces. Our implementation compares two Google Gemini model variants, Gemini 3 Pro and Gemini 3 Flash, for translating natural language queries into structured operations across PostgreSQL data warehouses, MongoDB document stores, Neo4j graph databases, and Redis caches. Experimental evaluation across 39 queries spanning six categories reveals a clear accuracy-latency trade-off: Gemini 3 Pro achieves 82.1% fully correct translations with an average latency of 12.9-26.5 seconds, while Gemini 3 Flash achieves 76.9% accuracy at a significantly lower latency of 9.0-11.5 seconds (approximately 1.76x faster). Both models achieve 100% combined accuracy (correct plus partial) with zero incorrect translations. Cross-domain validation comparing a Traffic/BI (warehouse-centric) application with a Social Network (graph-centric) application demonstrates that translation accuracy improves from 60.0% to 82.1% when moving to structured dimensional schemas, and that the architecture adapts effectively across fundamentally different workload patterns. Performance analysis reveals that LLM translation time dominates overall latency (>99%), while database execution remains negligible (<55 ms), highlighting opportunities for optimization through caching and prompt engineering.
This work contributes a generalizable framework for LLM-powered polyglot persistence systems, comprehensive evaluation methodology for natural language database interfaces, and empirical insights into model selection and domain adaptation trade-offs.
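To make the described pipeline concrete, the following is a minimal sketch of the translation flow the abstract outlines: a natural-language query is mapped to one of the four backends (PostgreSQL, MongoDB, Neo4j, Redis) and then translated into that backend's query language. All names here are illustrative, and a hypothetical keyword-based router stands in for the Gemini model call so the sketch runs without API access; the paper's actual system delegates both routing and translation to the LLM.

```python
# Hypothetical sketch of an NL-to-polyglot-query pipeline.
# A keyword table substitutes for the LLM's routing decision.

BACKENDS = {
    "postgresql": ["revenue", "total", "average", "warehouse"],
    "neo4j": ["friend", "follows", "path", "connected"],
    "mongodb": ["document", "profile", "review"],
    "redis": ["cache", "session", "latest"],
}


def route(nl_query: str) -> str:
    """Pick a storage backend from cue words in the query.

    In the paper's architecture this decision is made by the LLM;
    the keyword lookup here is a stand-in for illustration only.
    """
    words = nl_query.lower()
    for backend, cues in BACKENDS.items():
        if any(cue in words for cue in cues):
            return backend
    return "postgresql"  # default to the relational warehouse


def translate(nl_query: str) -> dict:
    """Return the chosen backend plus a placeholder for the query
    that the LLM would generate (SQL, Cypher, MongoDB, or Redis)."""
    backend = route(nl_query)
    return {"backend": backend, "query": f"<LLM-generated {backend} query>"}
```

For example, a graph-flavored question such as "Who are Alice's friends of friends?" routes to Neo4j, while an aggregate question like "Total revenue per region" routes to the PostgreSQL warehouse; in the full system the placeholder query string would be the model's Cypher or SQL output.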
Tags: 
Under Review