Abstract:
Verbalising structured data benefits several applications. In the context of knowledge graphs (KGs), transforming RDF triples into natural language facilitates tasks such as KG documentation and offers alternative ways of exploring a graph for users with different needs. While significant progress has been made on the English verbalisation of KGs, Spanish remains under-represented in this task due to the lack of suitable resources, which hinders the development and evaluation of models capable of generating high-quality Spanish verbalisations. To tackle this problem, we create a Spanish adaptation of the WebNLG dataset, a benchmark consisting of over 45,000 verbalisations paired with DBpedia triple sets. To our knowledge, this is the first formal attempt to provide such a dataset in Spanish; it not only serves data verbalisation but can also potentially support the automated generation of RDF triples from text. We leverage this dataset to conduct a comprehensive evaluation of resource-efficient models on the Spanish triple-to-text task, employing two learning approaches: in-context learning (zero-shot, one-shot, and few-shot settings) and supervised learning through partial fine-tuning. Our results highlight the challenges of generating fluent and accurate Spanish text and demonstrate that partial fine-tuning of the evaluated models significantly improves performance.