Ce qui est écrit et ce qui est parlé. CRMtex for modelling textual entities on the Semantic Web

Tracking #: 2374-3588

Achille Felicetti
Francesca Murano

Responsible editor: 
Special Issue Cultural Heritage 2019

Submission type: 
Full Paper
This paper presents the new developments of CRMtex, an ontological model based on CIDOC CRM created to describe ancient texts and other semiotic features appearing on inscriptions, papyri, manuscripts and other similar supports. The model is also designed to describe in a formal way the phenomena related to the production, use, conservation, study and interpretation of textual entities. CRMtex was originally meant to detect the close relationship linking ancient texts with the physical objects they are carried by, the tools and writing systems used for their production, the various scientific investigations and readings carried out on the text by modern scholars. It eventually evolved to provide researchers with the fundamental concepts for the correct and complete rendering of textual objects, the events representing their history and the cultural and social environments in and for which they were created. The full compatibility of CRMtex with the CIDOC CRM ontology and its extensions ensures persistent interoperability of data encoded by means of its entities with other semantic information produced in cultural heritage and digital humanities. The new entities presented in this paper deal more closely with textual and intertextual structures and try to deepen the close relationships existing between fragments of text or sequences of signs and the underlying meaning they were originally intended to convey.

Major Revision

Solicited Reviews:
Review #1
By Holly Wright submitted on 03/Jan/2020
Minor Revision
Review Comment:

This manuscript communicates the results of a useful and significant piece of original research, and represents a practical contribution to the family of CIDOC CRM extensions for use across cultural heritage domains. It sets out the features of the extension and potential implementations in a clear way, with excellent examples. Key aspects of the ontology are well illustrated.

Suggested revisions:
Existing standards (such as Nomisma) are mentioned, and the relationship to EPNet is well explained, but it would be useful to have a fuller discussion of why this new extension is needed. CIDOC CRM is a complex ontology, requiring investment in expertise which is not always available within cultural heritage, so it would be helpful to better understand under what circumstances that investment is appropriate. Is CRMtex appropriate for all levels of expertise and complexity of datasets, or are there scenarios where a less complex, existing solution would work just as well? What is the return on investment for choosing to use CRMtex over another solution? Is there potential to create mappings between CRMtex and other existing standards?

The writing is perfectly understandable as it is, but a native English edit would be welcome.

Review #2
By Jouni Tuominen submitted on 16/Jan/2020
Minor Revision
Review Comment:

The paper presents the latest developments of CRMtex, CIDOC CRM's extension for modeling ancient texts and events related to them, such as their production, study, and interpretation. The work reported is incremental in relation to the authors' previous articles on CRMtex (referenced as [7] and [8] in the paper), namely presenting new modeling primitives (support for written text segments and new classes for representing glyphs and graphemes). The work on CRMtex is still in progress, as the model is being actively developed. CRMtex is an important contribution in the field of cultural heritage, enabling interoperability of epigraphic contents and other, accompanying sources of information.

The paper is mostly well written (especially the presentation of the data model in Sections 3-5 is easy to follow), but some sections could be improved:

- Introduction: more discussion on motivation and background could be added - Why is CRMtex needed? What problems does it solve? What are the anticipated use cases?

- The relation between EpiDoc and CRMtex is discussed at a rather abstract level, with examples of how certain information can be modeled with either of the models. It would be interesting to have a more concrete discussion of how the two models can be used simultaneously, complementing each other. It might not be sensible to model the full textual content of a manuscript as RDF/CRMtex (every individual word/character as an entity, generating lots of triples); instead, the text could be encoded as XML (e.g. EpiDoc), with CRMtex used for modeling the needed semantic aspects.

- The related work section could be extended, with more references to scientific articles.

- A more thorough evaluation of the model would improve the quality of the paper. The authors give some examples of using the model, but its applicability to different kinds of datasets, use cases, or systems is not investigated in detail.

In the example RDF snippets on pages 5 and 6, concerning the triple:
<...> crm:P44_has_condition crm:P43_has_dimension "erasure"
you might consider using a URI for the object instead of the literal value "erasure" (in Fig. 2, the class "E3 Condition State" is used for objects).
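
The suggestion above can be sketched as follows. This is only an illustration under stated assumptions: the `http://example.org/` URIs, the helper function names, and the use of `P2_has_type` to classify the condition state are hypothetical and not taken from the paper. Only the CIDOC CRM namespace and the terms `P44_has_condition`, `P2_has_type` and `E3_Condition_State` are existing CRM identifiers. The sketch prints N-Triples lines using plain string formatting, so that it runs without an RDF library.

```python
# CIDOC CRM namespace; the example.org URIs are hypothetical placeholders.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def literal(value):
    """Format a plain string literal (no escaping; fine for simple words)."""
    return f'"{value}"'


def triple(s, p, o):
    """Serialise one triple as an N-Triples line."""
    return f"{s} {p} {o} ."


def condition_as_literal(text_uri):
    # Literal object: the condition is just a string and cannot be
    # typed or linked to from other datasets.
    return [triple(uri(text_uri), uri(CRM + "P44_has_condition"), literal("erasure"))]


def condition_as_uri(text_uri, state_uri, type_uri):
    # URI object: the condition becomes a node of class E3 Condition State,
    # classified here (an assumption) via P2 has type with a vocabulary term.
    return [
        triple(uri(text_uri), uri(CRM + "P44_has_condition"), uri(state_uri)),
        triple(uri(state_uri), uri(RDF_TYPE), uri(CRM + "E3_Condition_State")),
        triple(uri(state_uri), uri(CRM + "P2_has_type"), uri(type_uri)),
    ]


lines = condition_as_uri(EX + "inscription/1", EX + "condition/1", EX + "type/erasure")
for line in lines:
    print(line)
```

With the URI variant, the hypothetical `type/erasure` term could be shared across records or aligned with an external vocabulary, which a bare literal does not allow.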

Minor language remarks:

- Discussion on EPNet (in 2.1. Ontologies and application profiles: a work in progress):
The style of the language could be adjusted a bit, e.g. rephrasing the following expressions:
"very interesting", "very promising", "excellent results" into more neutral style

- Fig. 2: "P2 hase type" -> "P2 has type"

- Page 7: "Figure 3" -> "Fig. 3"

- Page 7: "CRMtex native ability" -> "CRMtex's native ability" / "The native ability of CRMtex"

Review #3
Anonymous submitted on 19/Mar/2020
Major Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

The paper addresses the interesting and rather novel topic of creating an ontology for describing ancient texts and semiotic features on inscriptions, papyri etc. The paper first gives a nice introduction with related ontology and mark up work on the topic. Then the paper focuses on recent developments of CRMtex, a CIDOC CRM extension that has been accepted in the family of its extensions.

(1) Originality

CRMtex has originality as an alternative to the state-of-the-art model in use, the EpiDoc model based on TEI markup, and to other related approaches. Representing text as linked data seems like a useful idea.

(2) Significance

The significance of the results is a bit difficult to evaluate. The model seems to be well thought out and has been accepted in the CIDOC CRM extension family, suggesting high quality and compatibility with the parent model, which is an ISO standard.

The paper presents application scenarios that illustrate the underlying ideas of the data model, but fails to document how much the model is actually used and what kind of experiences there have been in using the system. Perhaps this information can be found in the earlier related papers of the authors? In any case, this information should be briefly restated here, too, in order to better convince the reader of the significance of the model.

(3) Quality of writing

This is a well-written and polished paper. I noticed only a minor issue in the quote [18]: may chose -> may choose

However, as SWJ is a technical journal, I would have expected to see a more rigorous formal representation of the data model in sections 3-4, which is now described only informally. For example, the paper does not say what classes and properties there actually are in the ontology. Figures 1 and 2 probably show this, but they are not explained to the reader at all! Are there other classes and properties than those in the figures? A more precise technical description of the ontology is needed. Is there a LOD model available, or just the conceptual model, and if so, where?

Furthermore, the font in the figures is too small and difficult to read. Please, make it larger.

One question is whether this paper was submitted in the wrong category. The paper describes an extension to the CIDOC CRM ontology, with a focus on its recent developments, and I wonder whether it would fit better as an "ontology description paper". There the criteria in SWJ are:

"Descriptions of ontologies – short papers describing ontology modeling and creation efforts. The descriptions should be brief and pointed, indicating the design principles, methodologies applied at creation, comparison with other ontologies on the same topic, and pointers to existing applications or use-case experiments. It is strongly encouraged, that the described ontologies are free, open, and accessible on the Web. If this is not possible, then the ontologies have to be made available to the reviewers. For commercial ontologies, exceptions can be arranged through the editors. These submissions will be reviewed along the following dimensions: (1) Quality and relevance of the described ontology (convincing evidence must be provided). (2) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology."

It is a good idea to keep these criteria in mind when revising the paper.

Review #4
By Oyvind Eide submitted on 02/Apr/2020
Major Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

The topic of this paper is interesting and relevant, and I hope to see a paper published. The need for a model such as the one described here is clearly there, and the model itself is an interesting attempt to solve a number of problems. However, there are some fundamental questions that in my opinion need to be clarified before the paper is ready for publication.

The paper describes CRMtex, ”created to describe ancient texts and other semiotic features appearing on inscriptions, papyri, manuscripts and other similar supports.” But it ventures to do this without taking into consideration a tradition that is at least hundreds of years, if not millennia, old: scholarly editing and textual studies. This, combined with an at best simplified definition of the core concept of text, leads to a number of unfounded claims about the object of study. There is also a clear bias towards the tradition of epigraphy at the cost of the traditions of manuscript studies, including but not limited to palaeography. I will give some examples of the consequences of this below.

The problem outlined here is not easy to solve in one paper, given the vast body of research one would potentially have to study and the lack of agreed-upon definitions of ’text’. I would suggest a strategy based on two pillars.

I. Update the definition of text, and the relationship between oral and written text, to include more recent literature, and try to focus on the literature that is seen as central in the disciplines themselves. Put special focus on digital palaeography and digital scholarly editing.

II. Refrain from claiming that the model is based on what text _is_ and instead suggest a _pragmatic_working_definition_ of text. That is, rather than saying ”text is xxx, therefore…” I would suggest saying ”given that we assume text to be xxx, we suggest that…”

Some specific claims that I see as highlighting the problems outlined above:

Page 1:

”the fundamental concepts for the correct and complete rendering of textual objects” The idea that the textual object can ever be rendered correctly and completely runs counter to significant research traditions in textual studies at large. See Jerry McGann and Elena Pierazzo for recent examples in literary and textual studies and scholarly editing.

Page 3:

In order to clarify the relationship between oral and written text, one text from 1972 is quoted, seemingly also to prove that speech has priority over text. There is a huge literature discussing this, including claims for the opposite (see, e.g. Derrida).

As for semiotics, only Saussure is quoted. There is no reason to question the historical importance of his work, but more than 100 years of research has taken place since 1916 which is not mentioned at all here.

As for the complexity of defining text at all, I would here advise the authors to look into Patrick Sahle’s seminal work ”Digitale Editionsformen” from 2013 where he suggests that any serious descriptive definition of ’text’ needs to contextualise the definition into at least six different traditions, as expressed in his so-called text wheel: http://computerphilologie.tu-darmstadt.de/jg08/media/fischer-6.png

For clarifying the relationship between what is sometimes called text and token, digital palaeography, for instance as researched by Arianna Ciula and Peter Stokes, would be a good starting point.

Page 4:

”It is evident, in this perspective, that the study of ancient texts typically starts from the analysis of the physical characteristics of the text itself before moving to the investigation of their archaeological, paleographic, linguistic and historical features.”

I think this is far from evident and I would like to see this claim supported by scholarly evidence from central works in the disciplines in question.

To end this review on a more positive note (my claim above that I want to see an article along the lines of this one published is truthful), I would say that the model as presented here seems quite good and has significant potential. Following up on the now 15-20-year-old tradition of integrating TEI and TEI-based standards with ontologies is important. I would suggest the following for the future development of the model, and for a future version of this article:

1. To include domain experts and to take into consideration the research literature in all the central areas to be covered by the ontology (and maybe leave some traditions out for now, to come back to them later — focussing on epigraphy could be part of a solution).

2. To accept once and for all that a simple definition of ’text’ is not possible and try to find a workable way to establish the concept without claiming that it covers all meanings across several disciplines.

Any solution, in my opinion, has to include connecting the research to recent central literature in each discipline.