RomanOpenData: An Application of Semantic Technologies to Roman Amphora Epigraphy

Tracking #: 2387-3601

Authors: 
Xavi Giménez
Alessandro Mosca
Jordi Pérez
José Remesal
Guillem Rull

Responsible editor: 
Christoph Schlieder

Submission type: 
Tool/System Report
Abstract: 
The romanopendata.eu portal is the culmination of the knowledge representation efforts conducted during the "EPNet: Production and distribution of food during the Roman Empire" project. The aim of the portal is to allow access to the project's epigraphic data on Roman amphorae to both the project members and the community in a standard and interoperable way. To this end, it provides two main interfaces. The core interface is a virtual knowledge graph, which is presented in RDF, conforms to an ontology, and is queryable through a SPARQL endpoint. The second interface is a visual query system, built on top of the SPARQL endpoint, that allows non-technical users to explore the data without having to worry about writing queries.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Martin G. Skjaeveland submitted on 04/May/2020
Suggestion:
Reject
Review Comment:

This manuscript was submitted as 'Tools and Systems Report' and should
be reviewed along the following dimensions: (1) Quality, importance,
and impact of the described tool or system (convincing evidence must
be provided). (2) Clarity, illustration, and readability of the
describing paper, which shall convey to the reader both the
capabilities and the limitations of the tool.

----

The paper presents the romanopendata.eu web portal which uses an
ontology-based data access (OBDA) system to provide data about Roman
amphora epigraphy. The portal has two interfaces: a SPARQL endpoint
and a visual query interface built on top of the SPARQL endpoint. The
visual query interface is the most prominent part of the portal as it
appears to the user. It provides keyword and faceted search and
results are nicely presented using maps and timelines and a gallery
view. The ontology, which is created by the project, is also presented
in the paper. The OBDA system used is Ontop.

I have three main concerns about the paper. First, the paper is
submitted as a 'Tools and Systems Report'. A much better fit would be
"Application reports – short papers describing deployed applications
of Semantic Web technologies". The portal seen from a system's
viewpoint looks to be an assembly of pre-existing systems like Ontop
-- perhaps apart from the visual query interface. (It is not clear how
much of the visual query interface is developed by the project and how
much, if any, is developed by others.) Also, as the title of the paper
is "An Application of Semantic Technologies to Roman Amphora
Epigraphy", I assume that the authors are not aware that this
submission type exists.

My second concern is the lack of insight, lessons learnt and detail
presented in the paper, regardless of whether the paper had been
submitted as a different paper type. There is little new transferable knowledge to
learn for the semantic web researcher or practitioner. Most of the
presentation is either specific to this application and has limited
transfer value to other use cases, e.g., the description of the
ontology and the query templates for fetching data, or is generic and
already well-known to the community, e.g., the example driven
explanation of OBDA query rewriting and optimisation. While reading
the paper I often found myself asking for more details and for the
answers to the questions or challenges that the authors themselves
pose. I will give examples below.

My third concern is that not all parts of the OBDA system are
described nor available for review (the mappings and the database
schema). The parts that are available have quality issues.

Because of these concerns I suggest rejecting the manuscript.

In the following, I will treat the submission as an application
report.

* Impact

SWJ defines impact as "the demonstrable uptake of your work by the
research community, industry, governments, or the general public."
http://www.semantic-web-journal.net/faq#q20

The paper makes no attempt to provide evidence for the impact of
the application. There are statements in the paper that indicate
what the impact of the application should be, but these are not
discussed further in the paper:

page 1, introduction: "The main objective [of the overall
project of which the portal is a part] was to create an
interdisciplinary experimental laboratory for the exploration,
validation and falsification of existing theories, and for the
formulation of new ones." Has the portal helped in meeting this
objective? In which way? How have semantic web technologies been
the enabler -- or not?

page 1, abstract: "The aim of the portal is to allow access to
(...) data (...) to both the project members and the community
in a standard and interoperable way. (...) that allows
non-technical users to explore the data without having to worry
about writing queries." Has the aim been met, and to what
extent? Has the project measured the effect through user
studies? How many users are there, and are they getting the answers
they look for? What role do semantic web technologies have in
this -- both positive and negative experiences?

* Quality

The main contributions of the paper are according to the authors
the presentation of the ontology, the example illustrating OBDA
query rewriting and optimisation, and the visual query interface.

** Ontology

The presentation of the ontology describes the four axes which
are used to characterise the use case data. The presentation
gives a description of each axis and the (core?) classes and
properties in each axis. There is no discussion of the
modelling choices or methodology used to build the ontology, or
comparison with other ontologies.

page 3: "A first version of the ontology (...) extended the
well-known CIDOC CRM and the FaBiO ontology for bibliographic
data. Since then, the ontology has evolved and become more
compact. The goal was to facilitate the writing of queries, so
they can be make shorter, less verbose and computationally more
efficient." It would be interesting to know more about this;
what was done and to what effect? It seems that CIDOC CRM is no
longer used, why is that?

The presentation of the ontology does not provide a link to the
ontology, only a reference to the diagram of the ontology,
which, upon inspection, contains the ontology source.

The ontology source has many undocumented oddities. This makes
me question its quality. Here is a sample:

- The IRI of the ontology is
http://www.semanticweb.org/ontologies/2015/1/EPNet-ONTOP_Ontology#
This IRI does not resolve to the ontology, but gives a 404 error.

- The ontology seems to use many other existing ontologies:
e.g., SKOS, the collections ontology and Fabio, but rather
than importing them, seems to include them verbatim. Why
is this?

- The ontology contains many classes and properties that are
"isolated" in the ontology, i.e., they have no annotations
nor are they used in any axiom.

- The ontology includes SWRL rules. What are they used for? How
does this work with OBDA and query rewriting?

- There are many modelling choices which seem odd, but without
any textual and little axiomatic definition it is hard to
judge:

- Should not the classes SameFamilyFinding,
SameTranscriptionFinding be object properties?

- There are many classes called ...Type and ...Class, that
seem to contain resources which are classes, such as
HandleType which contains the resource #Handle-grooved.

- There are two person classes:
http://www.semanticweb.org/ontologies/2015/1/EPNet-ONTOP_Ontology#Person
and http://purl.org/vocab/frbr/core#Person

The ontology contains relatively few axioms compared to its number of
classes and properties. It would be interesting to know if,
when and how reasoning has played/plays a role in the
development and use of the system.

** Mapping example

This part of the paper explains, using an example, how the OBDA
system Ontop performs query rewriting and optimisation. The
example material is not taken directly from the application,
but a simplified sample from it. This has therefore limited
value beyond being a simple example-driven explanation of query
answering with OBDA, which is well-known in the literature.

page 2: "A crucial task in the development of any OBDA/I
information system is the writing of a set of mapping rules
(...)." In OBDA, of the relational data source, the ontology
and the mappings, the latter are the most immature and
least-studied artefact. There are fewer tools available to help
develop and maintain a set of OBDA mappings, than for
relational sources and ontologies. It would be interesting to
know more about the mappings, such as: how many mappings are
there, how complex are they, how were they developed and if and
how will they be maintained? It would also be appropriate to
give more information about the relational database, such as:
how big the schema is and how complex, size of the data, is the
schema of the database fixed or was it possible to adapt the
database to fit the mappings and/or the ontology?
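
For readers less familiar with the artefact in question: a mapping rule pairs a source SQL query with a template that turns each result row into RDF triples. The following is only a minimal Python sketch of that idea, not Ontop's mapping syntax and not the project's actual mappings. The namespace `http://example.org/epnet#`, the property `hasType`, and the column names `id` and `type_id` are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace and table layout; not taken from the EPNet system.
EX = "http://example.org/epnet#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def apply_mapping(rows: Iterable[Dict[str, object]]) -> List[Triple]:
    """Turn each relational row into RDF triples, as one mapping rule would."""
    triples: List[Triple] = []
    for row in rows:
        # IRI template built over the (assumed) primary key column.
        subject = f"{EX}Amphora-{row['id']}"
        triples.append((subject, RDF_TYPE, f"{EX}Amphora"))
        triples.append(
            (subject, f"{EX}hasType", f"{EX}AmphoraType-{row['type_id']}")
        )
    return triples


# Two illustrative rows standing in for the result of a source SQL query.
sample_rows = [{"id": 1, "type_id": 20}, {"id": 2, "type_id": 7}]
virtual_triples = apply_mapping(sample_rows)
```

Counting such rules, and the joins hidden in their source queries, would give a first measure of the mapping complexity asked about above.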

** Visual query interface

The visual query interface seems to work well. It is easy to
construct complex queries by setting many facets in the
advanced search. It is also possible to perform complex queries
by overlaying multiple queries on the same map. However, it is
difficult to judge the presentation of the results, since I
have no domain knowledge.

The data is not made available following linked data
principles. In the visual query interface, there are no links,
e.g., one could imagine that one could click the name of an
author in the bibliography and get all the entries associated
with this author, but this is not possible. The results of the
SPARQL endpoint suffer from the fact that the data is not made
available through a linked open data front-end. The namespace
used for the data is the same as for the ontology:
http://www.semanticweb.org/ontologies/2015/1/EPNet-ONTOP_Ontology#
This seems to be a lost opportunity to support easy exploration
of the data for non-technical users.

The data does not seem to connect to other datasets where this
is appropriate, e.g., the IRI for the country United Kingdom is
http://www.semanticweb.org/ontologies/2015/1/EPNet-ONTOP_Ontology#Place-...
It is not connected to other datasets such as, e.g., geonames
(https://sws.geonames.org/2635167/). In fact the dataset
contains no owl:sameAs relationships. This goes somewhat
against the claim the authors make that the data is made
available in an interoperable way.
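
As an illustration of the kind of link that is missing, the sketch below serialises an owl:sameAs statement as an N-Triples line. It is a minimal sketch under stated assumptions: the local IRI `http://example.org/epnet#Place-UnitedKingdom` is a hypothetical placeholder, because the actual Place IRI is truncated above; only the GeoNames IRI is taken from the text.

```python
from typing import Dict, List

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical local IRI: the real Place IRI is truncated in the review.
links: Dict[str, str] = {
    "http://example.org/epnet#Place-UnitedKingdom":
        "https://sws.geonames.org/2635167/",
}


def same_as_ntriples(mapping: Dict[str, str]) -> List[str]:
    """Serialise local-to-external links as N-Triples lines."""
    return [
        f"<{local}> <{OWL_SAME_AS}> <{external}> ."
        for local, external in sorted(mapping.items())
    ]


lines = same_as_ntriples(links)
```

A curated table of such pairs, published alongside the data, would already let external users join the places with other datasets.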

I wonder about the maintainability of the visual query
interface: is it the case that the facets are fixed or are they
driven by the contents of the ontology---meaning if the
ontology is extended will the visual interface be automatically
updated or not? Have the authors tested other existing
approaches where the visual query interface is completely
driven by the ontology?

Also, if the facets are fixed, would it not be easier and
faster (both in development time and query time) just to fix a
set of SQL queries directly over the relational database? It
would be interesting to know more about the pros and cons here.

* Importance

One of the main tasks of application reports of semantic web
applications is to show when and how semantic web technologies
work and when they do not---preferably contrasted to other
technologies.

The paper presents many of the standard benefits of semantic
technologies and OBDA systems without providing any evidence or
arguments if these are realised or exploited by this
application.

There is no real discussion or motivation for why semantic
technologies and the OBDA architecture were chosen, and there is
no comparison of the chosen system architecture with others. This
could be both other semantic technologies, e.g., materialised
OBDA, or "traditional", e.g., relational databases.

The authors claim that OBDA helps hide the complexity of the
relational database. This can be correct, but it is difficult to
judge without knowing the complexity of the database. Also there
is the cost of developing and maintaining the ontology and
mappings. A discussion of this trade-off would be
interesting.

From what I understand, only the ontology is published. There
are few real (in the sense not synthetic) and complete (the
source schema and data, the ontology and mappings -- and
preferably also real queries) OBDA systems available for the OBDA research
community to study. Is it possible to publish the complete OBDA
system?

* Clarity and readability

The paper is readable and the presentation is clear.

Review #2
By Torsten Hiltmann submitted on 25/May/2020
Suggestion:
Major Revision
Review Comment:

(1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided).

The paper under review presents the portal "RomanOpenData", which aims to publish and make available for analysis information on epigraphic inscriptions on Roman amphorae.
Apparently, the portal is based on the work of the CEIPAC Working Group and was developed within the framework of the ERC funded project EPNet, which in turn is dedicated to research on the production and distribution of food in the Roman Empire. It collects inscriptions on amphorae (mostly ceramic vessels used for the transport of goods including food), which can be dated in the time of the Roman Empire (from about 27 BC to about 480 AD).

*Quality of the described tool or system*

The article claims on page 3 that a first version of the ontology was based on Cidoc CRM and that the ontology has become more compact since then. Looking at the various examples given in the article, it seems that the project has completely abandoned the use of Cidoc CRM, at least in the visible part of the ontology. If this is the case, it would mean that the project has given up on keeping the data, e.g. the information about the objects or their chronology, interoperable. It would be very helpful to know whether the current ontology is still mapped to Cidoc CRM or not.
While testing the system, I tried to follow the links defining a given resource, e.g. a municipality (http://www.semanticweb.org/ontologies/2015/1/EPNet-ONTOP_Ontology#Place-...), but this attempt only ended with a time-out. At least when I was trying, the ontology and its resources were not accessible. So, I was able to verify neither the degree to which they are integrated into the Linked Open Data Cloud by referring to other resources, nor whether they can be used for federated queries.
For the chronological information, the description in the article (p. 3: “This is expressed as either a range of years, a textual description, or by indicating a specific time period such as the rule of a particular emperor”) does not suggest that this information is represented in a standardized form which would allow the dates to be computed (which a historian, of course, would be interested in). In the present state, based on the information provided by the paper, one must assume that the inscriptions cannot really be sorted by date or, more importantly, that the information contained in this database cannot be compared with information from other sources.
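To make concrete what a computable representation could look like, the sketch below normalises every dating to an inclusive range of years, which makes sorting and overlap tests trivial. It is a minimal sketch, not the project's data model: the class `YearRange`, the inscription identifiers, and the year values are hypothetical, and negative values stand for years BC (a simplification that ignores the absence of a year zero).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True, order=True)
class YearRange:
    """Inclusive range of years; negative values denote years BC."""
    start: int
    end: int

    def overlaps(self, other: "YearRange") -> bool:
        """True if the two ranges share at least one year."""
        return self.start <= other.end and other.start <= self.end


# Hypothetical datings; an exact year is a range with start == end.
datings: Dict[str, YearRange] = {
    "inscription-a": YearRange(140, 160),
    "inscription-b": YearRange(-20, 30),
    "inscription-c": YearRange(150, 150),
}

# order=True compares (start, end), so sorting is chronological by start year.
sorted_ids = sorted(datings, key=lambda key: datings[key])
```

A textual description or an emperor's reign would need a lookup table that resolves it to such a range; without one, these datings remain uncomputable.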
Also, some of the decisions made during the creation of the ontology are, at least for me, not completely comprehensible. While no reference is made to the classes and properties of Cidoc CRM anymore, the project uses the Dublin Core Terms ontology, a metadata ontology from the library domain, and applies the property dcterms:title (see page 8, lines 5 and 20) to refer to what actually are the labels of the resources. At least for me, this use of dcterms:title is a bit unexpected here and could be reconsidered.
From the point of view of a historian, but this is just a little remark, I would also expect some information on how the project dealt with uncertainties, whether in dating or in assigning the type of amphora. At least the fact that the attribution of the amphora type is a 1:n relationship seems to allow the conclusion that different types can be attributed to the same amphora.

In the visual search, I cannot distinguish between the place of production and the finding spot. Thus, one is not able, for instance, to identify those amphorae which have been found at the same spot where they were produced or those which have been found very far away from it. Besides, while I was testing the portal, it froze several times, but this could also have been just bad luck.

*Importance*
The portal described in the paper adds more information on ancient objects and sources to the existing resources. This is very necessary and very welcome. However, one of the main problems of the portal seems to be its lack of interoperability. Compared to other domains in historical research, ancient history is very well developed when it comes to the use of Semantic Web technologies and corresponding resources, such as Pleiades or the Nomisma ontology and e.g. the Online Coins of the Roman Empire (OCRE), which is based on it. Because of the methodological and technical decisions taken in the project, the portal, as it is presented here, is not able to validly fit into this part of the Linked Data cloud and to have its data analysed together with the data from other resources. For some of them (Pleiades, EDH and ADS), possible ways of integrating data have been tested. The article, however, is not clear about whether these data have eventually been integrated or not. But if they have, then only by integrating data dumps and not by linkage, since its technical solution is based only on relational databases and explicitly not on RDF.
The name "Roman OpenData" suggests that the portal comprises much more than just epigraphic inscriptions on amphorae. I think this choice is a bit unfortunate and may overstate the importance of this database. It would only make sense if the authors could demonstrate how they can integrate or interlink other data beyond epigraphic inscriptions on amphorae in order to make these data collectively searchable.
The article introduces the portal on page 1 as an “interdisciplinary experimental laboratory for the exploration, validation and falsification of existing theories, and for the formulation of new ones” and distinguishes it explicitly from other projects that would be mostly focused on the creation of digital collections by means of the aggregation of content from heterogeneous sources. In its current state, the article fails to show how this portal exceeds the current solutions and offers new possibilities for research. In the state described in the paper, the tool is limited to the data from one research project, which can be studied by using a faceted search. It would be important to demonstrate, e.g. by means of examples, that it not only gives access to new data but, by the way it is construed, also opens up new paths for research which aren’t possible yet using the existing solutions. Given its current state, the article fails to prove the importance of the portal as a new tool or system (not as data).

*Impact*
Finally, the name of the portal would also presuppose that not only the results of the queries but also the data itself would be available for download, which seems currently not to be the case.
Although the data provided by this portal is certainly important, I am not sure whether the technical solutions implemented in the current state of the portal can be considered good examples of best practice. To be more convincing, the decisions leading to the current state of the portal and their pros and cons should be discussed in much more detail. It will certainly have some impact in its field. But perhaps rather for the data than for the infrastructure and new research possibilities, at least as they have been described in the paper.

(2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool.

In its current version, the paper lacks clarity and doesn’t sufficiently discuss the capabilities and limitations of the portal.

Reading the article, I found it difficult to understand the relationships between the portal and the various projects and working groups and to contextualize it in this way. It might help to put the short project history currently given in the conclusion at the beginning and to sharpen it a bit. Some more detailed information about the data to be found there (for what period, for what geographical area) could also provide a better understanding of the portal to those who are not so familiar with ancient history, as well as to historians and classicists, who would be happy to have this information. For the moment, there is no proper description of the origin and the extent of the data. For instance, have the amphorae included here been found on the soil of the former Empire, or did they rather originate from the “Roman Empire”, which would explain why India is also included in the database?

A first version of the information system has already been described in previous articles in the years 2015-2017. The current article refers to them but does not really point out the changes made between the two versions. A short paragraph about the differences could help to better understand the current version of the portal and the underlying ontology.
The article claims on page 3 that a first version of the ontology was based on Cidoc CRM and that the ontology has become more compact since then. Looking at the various examples given in the article, it seems that the project has completely abandoned the use of Cidoc CRM. If this is the case, I think it would be important to explain these decisions and to give the reasons behind them.

Regarding the physical characteristics of the amphorae, as they are described as part of the ontology (p. 3), there is no reference to the origin of the nomenclature used here to denote the different types of amphorae, while on page 6 it is said that the description of the types is based on the ADS Roman Amphorae Repository. Page 1 mentions that ADS was added as an additional data source, but without giving any further specification. It would be helpful if the extent and the origins of the data and the vocabulary of the ontology were described in the same place.
The same is true for the data on places taken from Pleiades. It is stated that the data on places is based on the data from Pleiades, but since the data itself is only linked to a specific ontology which is not reachable, this cannot be verified or even used for research.

It is also not really clear to me to what extent the data from the Epigraphic Database Heidelberg has been integrated. On page 2 line 14 and page 6 line 28 the article says that this was “tested” with a snapshot of the data. On page 9 line 28f it says that the CEIPAC corpus was the largest openly accessible repository of Latin amphora inscriptions even before “the integration” with the Epigraphical Database Heidelberg. A clear indication of whether and in what form the data was finally integrated (or not) could be helpful.

The contribution is also a little bit unbalanced in its structure. It becomes clear that the paper doesn’t really focus on the ontology when the description of the ontology is incorporated in the chapter about OBDA, although the text itself announces on page 2 that it will offer two different chapters for them. It would be good to correct this.
I can understand that for reasons of efficiency it may be desirable to store the data within the portal in a relational database. However, a much more detailed discussion of this choice, including the advantages and disadvantages, would be desirable. Especially since this approach seems to strip away the essential advantages of Semantic Web technologies like interoperability. So, apparently, the portal can only be linked to other relational databases, not to semantic web resources. Besides that, it also seems to be more error-prone. It would be very important to discuss also the consequences of those decisions and thus the capabilities and limitations of the tool which come with them.

Minor observations: There are also some typos on page 5, line 8 (TiulusTranscription instead of TitulusTranscription) and page 6, line 14 (Pleides instead of Pleiades).

Although the data provided by this portal is certainly important, I am not sure that the technical solutions implemented in the current state of the portal can be considered an example of best practice, nor are they properly discussed in this paper. In my opinion, the paper could only be published with major revisions.