A Knowledge Graph of Medieval and Renaissance Geographical Works

Tracking #: 3876-5090

Authors: 
Valentina Bartalesi
Nicolò Pratelli
Emanuele Lenzi

Responsible editor: 
Guest Editors 2025 OD+CH

Submission type: 
Full Paper
Abstract: 
Geographical works from the Middle Ages and Renaissance offer crucial insights into the cultural and intellectual landscapes of their time. However, digital scholarship in this domain remains fragmented, with key historical sources scattered across various physical and digital repositories. The Index Medii Aevi Geographiae Operum (IMAGO) project, conducted from 2020 to 2024, addresses this gap by building a semantically enriched, interoperable knowledge base focused on Latin geographical literature from the 6th to the 15th centuries. By combining expertise in medieval studies, philology, and Digital Humanities, IMAGO employs Semantic Web technologies and a dedicated ontology extending CIDOC CRM, FRBRoo, and LRMoo. The project facilitates data integration and reuse through Linked Open Data (LOD) principles, enhancing the discoverability and interoperability of cultural heritage data. A key goal of the project was to analyse the collected data. Six main knowledge extraction targets were defined, and corresponding SPARQL queries were developed to retrieve relevant information from the knowledge graph. These queries showcase the IMAGO infrastructure’s potential for data retrieval and deeper scholarly analysis. Finally, a user-friendly web application further enables access to the knowledge base via interactive maps, dynamic tables, and exportable formats.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 28/Aug/2025
Suggestion:
Major Revision
Review Comment:

This paper presents original research which is of considerable significance, and is well-written. The relationship to previous Semantic Web work related to medieval and Renaissance manuscripts is well set out, and the new contribution made by the Imago project is clearly explained. No "data file" was provided, however, making it difficult to replicate the work reported in the paper.

I was able to test the public interface to the knowledge graph (at imagoarchive.it) as far as browsing and searching were concerned, and to examine the data as presented on the Web (though not as raw triples).
I attempted to run the six SPARQL queries given in Appendix A but these returned various errors (see below).
I did not attempt to test the annotation tool, the triplifier, or the linking tool, nor to validate the ontology using the Openllet reasoner.

The following amendments are recommended:

* Fix the SPARQL queries in Appendix A: none of the queries works correctly. For Q1, Q2, Q3, and Q5: the hyperlinks in the answer set return “not found” errors. For Q4: the “manifestation” variable returns a “parse error”. For Q6: line 31 returns a “lexical error”.

* Make the software tools available in a consistent location. The “Java-based triplifier” (p. 4) should be documented in the paper. The linking tool and the annotation tool are in two different GitHub repositories (p. 4), neither of which has a README file. A more recent version of the annotation tool is in the /prate91 repository, which does have a README. There should be a single repository with up-to-date copies of all three tools.

* Make the project data available (in a triple format like Turtle) in a standard repository like GitHub or Zenodo, preferably with a copy of the ontology too. The data should include the IRIs derived from Wikidata and Mirabile, as well as those created by the Imago project.

* Provide a table showing the size of the dataset: the knowledge graph contains 142,047 triples (page 4), but how many works, authors, places, libraries, and manuscripts are included?

* Explain whether the vocabulary for places can handle variant versions of names, especially those used in medieval works, e.g., Dubrovnik and Ragusa.

* Comment on the long-term stability of the imagoarchive.it site.

Review #2
Anonymous submitted on 17/Dec/2025
Suggestion:
Accept
Review Comment:

This paper presents the IMAGO project: an ontology-driven knowledge graph for Medieval and Renaissance Latin geographical works, designed to address fragmentation of relevant sources by providing an interoperable, semantically enriched resource grounded in Linked Open Data principles. The authors describe (i) the IMAGO ontology extending well-established cultural-heritage and bibliographic standards (CIDOC CRM, FRBRoo/LRMoo, plus geospatial standards), (ii) a semi-automatic workflow and tooling for ontology population with alignment to external authorities (e.g., Wikidata and domain archives), and (iii) a public KG and web application enabling exploration via interactive maps/tables and reusable exports.

The main strengths are the strong commitment to interoperability through the careful reuse of community standards, a clear end-to-end pipeline from expert data entry to triplification and publication, and a convincing demonstration of scholarly utility via six representative knowledge-extraction targets implemented as SPARQL queries. The paper also reports basic validation and consistency checks (e.g., OWL DL reasoning) and provides concrete examples of queries and outputs, making the contribution replicable and useful for both DH and Semantic Web audiences.

Below are a few observations. It would be useful to briefly mention in the conclusions the long-term maintenance plan for identifiers and mappings (especially where custom IRIs are minted), and to include a short note on how the query set and user interface will evolve (e.g., support for partial/fuzzy search beyond exact matches). These are small additions that do not detract from the overall quality.

The work is solid, well-motivated, standards-aligned, and demonstrably useful; I recommend acceptance.

Review #3
By Miguel Ceriani submitted on 27/Jan/2026
Suggestion:
Major Revision
Review Comment:

The manuscript describes a knowledge graph with information about a corpus of geographical works of the past, specifically from the medieval and renaissance period.

The authors appropriately motivate and describe such resource, created in the context of a dedicated project.
This is, in my opinion, a valuable contribution to the journal, if considered in the "Dataset Descriptions" category (referring to https://www.semantic-web-journal.net/reviewers#types).
The paper was submitted in the "Full Paper" category, though.
It must thus be evaluated as a research contribution.
Anyway, in order to reduce the back and forth, I split the description of the main issues below into two sections.
While the first section addresses the relevance as a research contribution, the second, longer one addresses aspects that are relevant even if the category changes.

# Issues as a Research Contribution

The main contribution to the field, according to what the authors state, is the creation and publication of the KG, which in my view does not amount to a research contribution in itself.
Specifically, a research contribution should clearly state one or more research questions and describe an experiment that can address some aspect of those.
One way forward, if the authors want to pursue this type of contribution, would be to focus on some aspects of the pipeline/methodology that they consider novel and of general interest.
They should detail why this is the case and evaluate those aspects in multiple use cases.
That means more experimentation and an extensive rewriting of the paper.

# General Issues

## 1. Paper Organisation

The section names "Ontology Population" and "Knowledge Graph Creation" are in my opinion misleading: the first one is about the tool used to generate the instances, not the terminology; the second one describes technical details of the implementation and validation.
I would suggest having only a "Knowledge Graph Population" section, reorganised (see 4.) and possibly containing dedicated subsections (e.g., "Data Entry Tool", "Implementation", "Validation").

The section name "Data Analysis" is very generic. Maybe it should be called "Evaluation", since that is the purpose of the section.

The GUI proposed to access the information in the KG, mentioned in the abstract, Introduction, Discussion, and Conclusion, is only described in the Discussion.
This is an odd choice: one would expect the description of the GUI to be placed before the evaluation ("Data Analysis" section).

## 2. Ontology Design

The ontology design process is described and appears sound.
Nevertheless, as there are many established ontology design methodologies, it would be good to mention whether a specific existing one was adopted or, if not, the reasons for proceeding otherwise.
Furthermore, the authors do not refer to any documentation of intermediate results (minutes/details of the interviews, scenarios, competency questions).

## 3. Ontology/KG Description

### 3.1 Missing Diagrams

The description of the ontology itself is quite minimal and devoid of diagrams.
This is partly justified by the reuse of existing models.
Nevertheless, it would be helpful to see visually the main classes and how they are related (focusing on the main relations and the terms that are used when populating the knowledge graph).

### 3.2 Ontology/KG Structure

There is no description of the modular organisation of the ontology, although the OWL code shows that the main ontology module imports two other ontologies:
- , an OWL representation of LRMoo, including also CIDOC CRM;
- a geographical thesaurus.

Similarly, the KG available at the SPARQL endpoint is organised in three named graphs, but the manuscript neither mentions nor explains this organisation.
The three named graphs are the following ones:
- , with 127,899 triples;
- , with 6,430 triples;
- , with 7,932 triples.

The total number of triples tallies with the number given in the paper (142,047), so this organisation is presumably not novel.
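
Per-graph triple counts such as those listed above can be obtained with a single aggregate query. The following is a minimal sketch, assuming an endpoint that implements the standard SPARQL 1.1 Protocol; the endpoint URL and the helper name `build_count_request` are hypothetical placeholders, and the request is only prepared, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Standard SPARQL 1.1 aggregate: one result row per named graph with its triple count.
COUNT_PER_GRAPH = """
SELECT ?g (COUNT(*) AS ?triples)
WHERE { GRAPH ?g { ?s ?p ?o } }
GROUP BY ?g
ORDER BY DESC(?triples)
"""


def build_count_request(endpoint: str) -> Request:
    """Prepare (but do not send) a SPARQL Protocol GET request for the count query."""
    url = endpoint + "?" + urlencode({"query": COUNT_PER_GRAPH.strip()})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Hypothetical endpoint URL, used only for illustration.
request = build_count_request("https://example.org/sparql")
```

Passing `request` to `urllib.request.urlopen` would return a JSON result set whose per-graph counts can be summed and compared with the total reported in the paper.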

Based on the name, the third one probably comes from the Mapping Manuscript Migrations (MMM) project.
The manuscript mentions that the MMM dataset *can* be integrated.
Has (a part of) it already been integrated? If so, how exactly?

### 3.3 Inconsistencies between Description and Implementation

Many classes and properties listed in Tables 1 and 2, respectively, are not defined in the ontology.
Specifically, all the terms stated as equivalent to existing ones (in CIDOC CRM or FRBRoo/LRMoo) are actually neither defined in the ontology nor used in the KG.
For those cases, the KG directly adopts the original terms from CIDOC CRM or FRBRoo/LRMoo.
While this choice is totally understandable and welcome for the purpose of favouring interoperability, the description in the paper is highly misleading.
The authors should just state the extensions they made to the existing ontologies and then describe (again, diagrams would help) how the data have been modeled: i.e., using terms defined by them as well as some terms from existing ontologies.

In addition to what has already been mentioned, Table 2 has two issues:
- property names are not shown, only domain and range;
- the stated equivalencies (which, again, are not actually represented in the ontology) are formally incorrect because they are between properties that have different domains/ranges.

From the Mapping Manuscript Migrations Metadata Schema (http://ldf.fi/schema/mmm/) a single class (Source) and a single property (data_provider_url) are used in the KG, for MMM. This is not documented.

### 3.4 Evolution/Maintenance

The manuscript does not specify if there is any planned method to update the ontology and the KG.
Specifically, while the code of the software tools is published in public repositories (on GitHub), the authors do not mention if there is a repository to track the evolution of the ontology.

### 3.5 Reporting Guidelines

Finally, it would be good to explicitly refer to existing best practices and guidelines for the documentation of ontologies and KGs.
One example, for the ontology, is the MIRO guidelines [1], which are quite detailed and broadly adopted.

## 4. Knowledge Graph Population

### 4.1 Description and Motivation of the Adopted Population Process

Before going into the technical details, the process of KG population should be described in more general terms.
It is unclear what the input of the process is and how the experts use the annotation tool.
How has the corpus of manuscripts been selected?
Do the experts analyse each manuscript and then fill in the fields in the application?
Is any pre-existing knowledge (metadata already associated with a manuscript) used?
Is any automatic annotation system adopted, even if just for suggestions? If not, why?

### 4.2 Validation

The second paragraph of "Knowledge Graph Creation" mentions the validation of the KG using a dedicated tool (Openllet).
The authors assert that the KG successfully passed four validation tasks:
(i) logical consistency;
(ii) correspondence between "the class hierarchy" and "the structure defined by the IMAGO ontology";
(iii) data integrity;
(iv) ability to support complex SPARQL queries.

Apart from the first one, which is straightforward, the other tasks would require a more detailed description.
What is the meaning of task (ii)? Are those two (the class hierarchy in the KG and the one in the IMAGO ontology) not the same thing?
How is data integrity validated in task (iii)?
What exactly does task (iv) do? Does it execute a set of predefined queries? Are they the same as those shown in the paper? Does it generate queries somehow? How are the results checked?

Specifically, with respect to data integrity, ontological formalisation alone is of little use for imposing constraints on a dataset.
For that purpose shape languages, like SHACL, are often employed.
Have you considered implementing shapes-based validation?
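
To illustrate the kind of constraint meant here: under OWL's open-world semantics a missing property value is not an inconsistency, whereas a shape can require it. The sketch below is a plain-Python stand-in for what SHACL would express with `sh:minCount 1` and `sh:maxCount 1`; it is not a SHACL implementation, and the class, property, and instance names (`ex:Manuscript`, `ex:has_title`, `ex:m1`, `ex:m2`) are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


RDF_TYPE = "rdf:type"


def check_exactly_one(
    triples: Iterable[Triple], target_class: str, required_property: str
) -> List[Tuple[str, int]]:
    """Return (subject, value_count) for every instance of target_class that
    does not have exactly one value for required_property."""
    triples = list(triples)
    instances = {t.s for t in triples if t.p == RDF_TYPE and t.o == target_class}
    violations = []
    for subject in sorted(instances):
        count = sum(1 for t in triples if t.s == subject and t.p == required_property)
        if count != 1:
            violations.append((subject, count))
    return violations


# Hypothetical toy data: ex:m2 is a manuscript without a title.
data = [
    Triple("ex:m1", RDF_TYPE, "ex:Manuscript"),
    Triple("ex:m1", "ex:has_title", "Title A"),
    Triple("ex:m2", RDF_TYPE, "ex:Manuscript"),
]
violations = check_exactly_one(data, "ex:Manuscript", "ex:has_title")
```

An OWL DL reasoner would report this toy dataset as consistent, while the closed-world check flags `ex:m2`.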

Finally, as for other aspects of the design/implementation process (see 4.1), it would be good if the authors shared the associated data, in this case the configuration and output of the validation tool.

## 5. Ontology/KG Availability

Both the ontology and the KG are publicly available online. Nevertheless, there are some issues that should be addressed.

### 5.1 Long-Term Persistence of URLs

The ontology modules, the SPARQL endpoint, and the individuals of the KG all use a project-related namespace (https://imagoarchive.it/).
URLs of this kind are at risk of breaking if the project stops being maintained or if there is some organisational change.
The authors should consider using w3id (https://w3id.org/) or similar redirection services to decouple the namespaces adopted for URLs/URIs from the servers currently holding the implementation/data.

### 5.2 Long-Term Persistence of Datasets

The ontology and the KG are only available on the aforementioned project-related host.
It is highly advisable to upload snapshots of those resources to public repositories like Zenodo or Figshare, especially considering that neither the ontology (see 3.4) nor the KG can be generated if the project host becomes unavailable.
Furthermore, using persistent identifiers for specific versions of the resources makes it possible to track their history and to associate the paper with specific versions.

### 5.3 Availability of KG Dumps

Currently there is no way to directly download the full KG dataset as a dump (this can be done with a few CONSTRUCT queries on the SPARQL endpoint, but that would become trickier if the size increases).
The use of public repositories to store the dumps, as recommended in 5.2, would also address this issue.
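
The CONSTRUCT-based workaround mentioned above can be sketched as follows. This is an illustration only: the graph IRI and the function name `paged_construct_queries` are hypothetical, and the page size is an arbitrary choice. The `ORDER BY` clause is there because `LIMIT`/`OFFSET` paging is not guaranteed to be stable without a defined ordering.

```python
from typing import Iterator


def paged_construct_queries(
    graph_iri: str, total_triples: int, page_size: int
) -> Iterator[str]:
    """Yield CONSTRUCT queries that together cover one named graph page by page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for offset in range(0, total_triples, page_size):
        yield (
            "CONSTRUCT { ?s ?p ?o }\n"
            f"WHERE {{ GRAPH <{graph_iri}> {{ ?s ?p ?o }} }}\n"
            "ORDER BY ?s ?p ?o\n"
            f"LIMIT {page_size} OFFSET {offset}"
        )


# Hypothetical graph IRI; 7,932 is the size reported above for the smallest-but-one graph.
queries = list(paged_construct_queries("https://example.org/graph/a", 7932, 5000))
```

With these values two queries are generated; the number of round trips grows linearly with the size of the graph, which is why a downloadable dump scales better.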

### 5.4 URI Dereferenceability

It is good practice to use dereferenceable URIs, employing content negotiation to respond with either a human-tailored description of the resource (a web page) or a machine-readable description (some RDF serialization).
The URIs used in this KG are instead not dereferenceable, neither individuals (e.g., https://imagoarchive.it/ontology/resources/manifestation/manuscript/mm-626) nor ontology terms (e.g., https://imagoarchive.it/ontology/has_curator).
In the case of the ontology modules, the URI of the ontology as a whole is dereferenceable (e.g., https://imagoarchive.it/ontology/) but only as a machine-readable resource (in RDF Turtle).
To get the human-readable documentation of the main ontology, a different URL must be used (https://imagoarchive.it/doc/index-en.html).
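
A dereferenceability check of this kind amounts to requesting the same URI with different `Accept` headers. The sketch below only prepares the two requests and does not contact any server; the helper name `negotiation_requests` is hypothetical, and `text/html` and `text/turtle` are just one reasonable pair of media types to test.

```python
from typing import Dict
from urllib.request import Request


def negotiation_requests(uri: str) -> Dict[str, Request]:
    """Prepare one request per representation for the same URI (nothing is sent)."""
    return {
        "html": Request(uri, headers={"Accept": "text/html"}),
        "turtle": Request(uri, headers={"Accept": "text/turtle"}),
    }


# Ontology term URI quoted above.
requests_by_format = negotiation_requests("https://imagoarchive.it/ontology/has_curator")
```

Sending each request with `urllib.request.urlopen` and comparing the returned `Content-Type` headers would show whether the server negotiates content; a URI that is not dereferenceable raises `urllib.error.HTTPError` instead.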

## 6. Ontology/KG Implementation

In the ontologies and the KG, multiple incompatible namespaces are used for both CIDOC CRM and FRBRoo.

Namespaces for CIDOC CRM:
- , used in ontology modules;
- , used in the KG, for archive and toponyms;
- , used in the KG, for MMM.

Namespaces for FRBRoo:
- , used in ontology modules and in the KG, for archive and toponyms;
- , used in the KG, for MMM.

The datatype property has_reprint_date has xsd:string as its range. Is there any reason to prefer xsd:string to xsd:date or xsd:dateTime?
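
The practical difference can be sketched as follows: a typed date can be validated and sorted chronologically, while a plain string accepts anything. The parser below handles only the simplified `YYYY-MM-DD` form (full xsd:date also allows time zones and other year forms), and the sample values are invented for illustration, not taken from the KG.

```python
import re
from datetime import date
from typing import Optional

# Simplified lexical form only; full xsd:date is more permissive.
_SIMPLE_XSD_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_simple_xsd_date(lexical: str) -> Optional[date]:
    """Return a date for a well-formed, valid YYYY-MM-DD string, else None."""
    match = _SIMPLE_XSD_DATE.fullmatch(lexical)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. 30 February
        return None


# Invented sample values: two valid dates, one free-form string, one impossible date.
values = ["1501-03-02", "1499-11-20", "ca. 1501", "1501-02-30"]
typed = sorted(d for d in map(parse_simple_xsd_date, values) if d is not None)
rejected = [v for v in values if parse_simple_xsd_date(v) is None]
```

Free-form values such as approximate dates are rejected by the typed form, which may be one motivation for keeping strings; that is a guess rather than something the paper states.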

For the entity representing the whole ontology in https://imagoarchive.it/ontology/ (the one with rdf:type owl:Ontology) a blank node is used, instead of a URI.

The thesaurus included alongside the ontology (https://imagoarchive.it/Thes) has labels only in Italian and no definitions (using rdfs:comment or similar properties).
These limitations may hinder reusability of the dataset, especially by people who do not understand Italian.
Even Italian-speaking experts may not be able to guess the precise meaning of a topic if it has no accompanying definition.

## 7. Evaluation

The evaluation of the KG is based on its ability to perform six types of "knowledge extraction targets" with corresponding SPARQL queries.
Although not explicitly called that, these have the role of competency questions (CQs) in designing and evaluating the ontology/KG.

While CQs are often adopted as a means to evaluate ontologies and KGs, there should be other forms of evaluation, involving users/experts not involved in the design process, and possibly multiple usage contexts.
The authors mention that they are doing a user-based evaluation of the GUI. That could be included as a form of "in-use evaluation" of the ontology/KG (albeit mediated by that specific UI).

Furthermore, it would be good to discuss the role of the described queries in the context of higher-level tasks performed by the experts (e.g., researching a topic) and draw comparisons with existing query methods and repositories.

# References

[1] Matentzoglu, Nicolas, et al. "MIRO: guidelines for minimum information for the reporting of an ontology." Journal of Biomedical Semantics 9.1 (2018): 6.