OntoPoetry: Postdata Ontology for poetry domain

Tracking #: 3007-4221

Authors: 
Elena González-Blanco García
Omar Khalil
Salvador Ros
Mirella de Sisto
Laura Hernández
Javier de la Rosa
Alvaro Pérez
Oscar Corcho

Responsible editor: 
Stefano Borgo

Submission type: 
Ontology Description
Abstract: 
The idiosyncrasy of literary studies has been an obstacle to its technological improvement for years, especially to represent their knowledge in a machine-readable format. The richness, variety, and different perspectives of study that scholars find in their studies make this task a highly complex challenge. This complexity is even more noticeable in the poetry genre, where each poetic tradition has independently developed its analytical terminology and methodology. In this work, we have addressed the construction of a poetry ontology to express the scholars' knowledge spread out in isolated databases or works. Ontopoetry ontology has been developed following Neon methodology, and it has been structured in three modules: a) core, b) poetic analysis and c) transmission, covering the essential aspects in a poetry literary study. Ontopoetry core module has been aligned with FRBRoo ontology guaranteeing its interoperability. This paper is focused on the description of the core module, its classes and relationships and the design decisions taken during the process. We also describe the proposed controlled vocabularies for this module and their relationship with the remaining modules.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
Anonymous submitted on 04/Feb/2022
Suggestion:
Reject
Review Comment:

The paper discusses a Semantic Web ontology for the representation of poetry. In particular, it addresses how the core module of the ontology, called OntoPoetry Core, is aligned to CIDOC-CRM and FRBRoo, given the wide use of the latter two in the digital humanities.

I think that the topic is interesting. I'm not an expert of literature or poetry, therefore my review considers the clarity of the presentation, and the robustness of the research from an applied ontology perspective. From these views, I'm not convinced that the paper can be published in its current state for two main reasons.

First, it is not clear why the authors **align** their ontology to CRM/FRBRoo rather than **re-engineer** it. This is surprising considering that, browsing the OWL file of the ontology (which I found at: https://github.com/linhd-postdata/OntoPoetry/tree/master/Core), one finds natural language annotations - for classes declared to be logically equivalent to either CRM's or FRBRoo's classes - explicitly saying that these classes have been "cloned" from CRM/FRBRoo.

Hence, the authors first developed their own ontology by "cloning" instead of simply reusing CRM/FRBRoo; then, they developed a formal alignment. This choice leads to the duplication of several modeling elements. For instance, the modeling pattern shown in Fig. 17 duplicates what already exists in CRM without adding any novelty. I think that, by simply reusing CRM/FRBRoo, even from a logical perspective, the OntoPoetry Core module would be much cleaner and simpler, without all the equivalence declarations.

I see two possibilities for the authors. The first option is to re-engineer OntoPoetry by specializing CRM/FRBRoo with elements relevant in the scope of the presented research while avoiding duplicate classes. This is a standard approach when working with top-level ontologies and specializing them for specific domains. The second option is to motivate the needs and benefits of the adopted approach, namely, to explain why "cloning" existing ontologies and developing alignments is a better strategy than reusing them.

Second, there is no conceptual attempt at clarifying the introduced core notions, above all the notion of (literary) "work". There is a huge literature on this topic in the humanities and applied ontology (see references below). FRBRoo is largely used in the digital humanities; it remains however highly ambiguous about what a work is. Indeed, on the one hand, a FRBRoo work seems to be an idea in the author's mind (see, e.g., FRBRoo v.2.4, 2015, p. 27); on the other hand, it seems to be an entity relevant for cataloging purposes (see, e.g., the notion of F15 Complex Work). I'm wondering whether the authors are aware of this ambiguity, which seems to apply to their proposal, too; e.g., they talk of works sometimes as ideas, sometimes as abstract concepts, but it is not clear what these terms mean.

Consider two simple sentences: "The cat is on the mat" and "El gato está en la alfombra" (consider them as two poetic verses). Would the authors claim that they are two expressions, in two different languages, for the same "work"? If this is the case, isn't the notion of "work" related to that of "meaning"? I know that this discussion raises foundational questions together with a critical attitude towards FRBRoo. However, if the authors wish to bring a research contribution for the ontological characterization of (poetic) works, something about "what a work is" must be said, especially in the light of a larger view on the state of the art not limited to CRM and FRBRoo. In the applied ontology literature, the authors can find some references on this looking for the notion of "information entity" (see, e.g., Gangemi and Peroni 2016 for an approach based on ontology design patterns; the authors can also find papers related to this topic in the Applied Ontology journal).

Other remarks:

- Please clarify which version of CRM is used. Note that the latest release of the ontology includes relevant differences with respect to previous versions (e.g., P78, P87, etc. have been deprecated).

- At p. 6 the authors say that they reuse content ontology design patterns. However, it is not clear in the paper how these patterns were adopted.

- I think that the paper could be reduced in length without the loss of relevant information. This would facilitate reading. For instance, there are sections that, to the best of my knowledge, do not add new material with respect to the state of the art but simply say how some elements of CRM/FRBRoo have been used (e.g., 5.2.2, 5.3, 5.4, section about data properties for quantities). I would recommend that the authors focus on presenting those aspects of the ontology which are novel with respect to the state of the art and relevant in the context of application for poetry. This would help the reader to better appreciate the authors' contribution. Some paragraphs are also redundant; e.g., footnote 15 is also part of the main text.

- The authors need to introduce the modeling elements of CRM/FRBRoo reused in the paper. Otherwise the reader cannot properly follow the discussion. E.g., what is the difference between individual and complex work in FRBRoo?

- The graphical notation used in figures 5-7 etc. is not clear. What do the arrows stand for? I strongly recommend that the authors use a well-known notation such as UML Class Diagrams.

- Section 5.2.3. The pattern for the representation of agent roles is not clear. If I understand correctly, the authors treat the relation "PC14 carried out by" as a class in order to state that a person participates in an event with a certain role, that is, the relation is reified. This is a common move to represent n-ary relations (n>2) in Semantic Web languages. However, what does it mean that the class AgentRole is a subclass of "PC14 carried out by" (btw, in the OWL file, AgentRole is *equivalent* to PC14)? Intuitively, PC14 is still a relation but formally treated as a class; its instances should have three arguments, e.g., arg1 the event, arg2 the actor, and arg3 the actor's role. By contrast, instances of AgentRole are **not** relations; they stand for roles played by agents when they participate in events. Looking at Fig. 23, if I understand what the authors mean to do, AgentRole should be in the place of skos:Concept.

- Section "Datatype properties related to appellations". Looking at Fig. 29, it seems that "p102 has title" is both a data property (related to xsd:string) and an object property (related to E35 Title). This needs clarification.

- It is a common practice in the development of domain or application specific ontologies, to drive the development through the use of experts' requirements (sometimes represented as competency questions). These allow us to understand whether the resulting ontology fits experts' needs. I think that it would be valuable to write a section in this direction; the authors may present a case study exploiting the ontology, possibly showing how it matches experts' requirements.
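The distinction drawn in the remark on Section 5.2.3 can be made concrete with a minimal Python sketch. It is not the authors' model and does not use CRM or OntoPoetry identifiers: `RoleConcept`, `CarriedOutBy`, `roles_of` and the sample individuals are hypothetical names chosen for illustration. The sketch only assumes what the remark states, namely that a reified "carried out by" statement has three arguments (event, actor, role), while a role is a separate, concept-like entity.

```python
from dataclasses import dataclass

# All names below are hypothetical; none is taken from the OntoPoetry OWL file.


@dataclass(frozen=True)
class RoleConcept:
    """A role such as 'editor' (the kind of thing a skos:Concept would denote)."""
    label: str


@dataclass(frozen=True)
class CarriedOutBy:
    """One reified instance of the ternary relation: event, actor, role."""
    event: str
    actor: str
    role: RoleConcept


def roles_of(actor, statements):
    """Return the sorted labels of the roles an actor plays in the statements."""
    return sorted({s.role.label for s in statements if s.actor == actor})


editor = RoleConcept("editor")
translator = RoleConcept("translator")

statements = [
    CarriedOutBy("edition_1", "person_a", editor),
    CarriedOutBy("translation_1", "person_a", translator),
    CarriedOutBy("edition_2", "person_b", editor),
]

print(roles_of("person_a", statements))
```

In this sketch a role is never an instance of the reified relation: `editor` is shared by two `CarriedOutBy` statements, which is the reading under which AgentRole would take the place of skos:Concept rather than be a subclass of PC14.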

I strongly encourage the authors to pursue this research and present a new version of their paper. The authors may actually consider submitting a new version of it to a journal specialized in the digital humanities, where readers likely know more about CRM/FRBRoo and can better appreciate the presented contribution with respect to the state of the art. I would also suggest assuming a critical attitude towards CRM and FRBRoo, and exploring the state of the art more thoroughly.

Some references

Eggert, P. (2019). The Work and the Reader in Literary Studies. Cambridge University Press.

Gangemi, A., & Peroni, S. (2016). The information realization pattern. In Ontology Engineering with Ontology Design Patterns (pp. 299-312). IOS Press.

Pierazzo, E. (2016). Digital scholarly editing: Theories, models and methods. Routledge.

Thomasson, A. L. (2015). The ontology of literary works. In The Routledge Companion to Philosophy of Literature (pp. 349-358). Routledge.

Review #2
Anonymous submitted on 17/Feb/2022
Suggestion:
Major Revision
Review Comment:

The problems that the study of poetry and its various facets pose to scholars are well presented by the paper, and the need to create an ontology to standardise information is very well motivated by the authors. The methodology employed to define the user requirements and to build a test dataset by assembling information coming from repertoires significant for this research field was carried out in an optimal way. However, the resulting ontology is very complex and the paper does not always make it easy to understand.

Some comments and suggestions to the authors:

I recommend using FRBRoo version 3.0 instead of 2.4, and CIDOC CRM version 7.1 which is the current official version of the model and contains many improvements over version 6.0 used for OntoPoetry, which is now almost 7 years old.

Figures 5, 6, 7, 8, 9 and the other similar ones are not very clear: what do the yellow arrows represent? Do they indicate properties or mark subclasses? And in the latter case, do the recursive arrows returning to the same entity mean that a class can be a subclass of itself? It is advisable to better explain the relationships between the entities represented. It would also be useful, for each class, to report the names in full, including numbers and labels, in the figures, in the text and in Annex 1 as well. The use of adequate namespaces would improve the quality of the paper.

There are many imprecisions in naming the components of the ontology. For instance, in paragraph 5.1.1 mention is made of a pdc:Work class, but the class defined in the ontology is called pdc:PoeticWork, as also shown in figure 10 and in the text following it (again with a different format, i.e. pdc:poeticwork, without capital letters). A “Work” class is mentioned in Annex 1 but no other information is provided about it. Many other similar cases of name ambiguity are present in the paper. For instance, the E33 class of CIDOC CRM occurs in 3 different forms: “E33 LinguisticObject”, “E33_Linguistic Object”, “E33 Linguistic Object”. The same happens for many other class and property names. I think it would be paramount to align the names of the classes throughout the whole paper to increase its readability.

Observation concerning the described entities: in paragraph 5.2.1 the authors state that “in Ontopoetry Ontology, we identified two principal types of events: a) certainty b) death and birth”. I wonder why other types of events such as the conception of the work, the writing, editing, publication, distribution of its manifestations, etc. and the related dates have not been taken into consideration at this level of the ontology. Are they perhaps part of the other modules? In this case, this should be specified somewhere, in this paragraph or elsewhere.

Minor typos:
Ch. 4, line 2, “NeOn” and not “Neon” (I assume)
Same typo in paragraph 4.1, line 1.
Pay attention to how “CIDOC CRM” is written throughout the paper (“CIDOC-CRM” vs “CIDOC CRM”, the latter being the correct form, without a dash in between)

In conclusion
The paper adequately exposes the need for an ontology capable of describing all the phenomena related to the domain of poetry, a tool that is certainly lacking at the moment.
However, due to its complexity, the way in which the ontology is presented makes it very difficult for the reader to fully grasp its structure and the general logic that governs it.
I believe the paper requires an extensive review and alignment of class and property names to solve the issue of the same entities written in different forms throughout the entire text and in Annex 1. I also recommend defining adequate namespaces to enable the immediate identification of the entities of external models (e.g. crm: for CIDOC CRM entities, frbr: for FRBRoo etc.). This would tremendously improve the readability of the paper.

Review #3
By Stefano De Giorgis submitted on 13/Apr/2022
Suggestion:
Major Revision
Review Comment:

Submitted by: Stefano De Giorgis
Recommendation: Major Revision

Detail Comments

Summary:

The paper describes OntoPoetry, an ontology for the poetry domain developed following the NeOn methodology and aligned to ontological standards like FRBRoo and CIDOC-CRM.
The paper is well written and clearly expresses the modelling choices made by the authors.
However, some ontological refinements are necessary, and reference to and comparison with previous relevant work in the cultural heritage knowledge and meaning negotiation area is lacking. Such work could shape some modelling choices while contributing to the creation of an even bigger knowledge base of Linked Open Data.

Quality and relevance:

The paper is well written, and the many diagrams and images make it a pleasant read. Furthermore, the modularisation structure and the reuse of and alignment to ontological resources and standards follow good modelling practices.

However, some points could be improved, in particular:

1. The paper declares that ODPs are reused, and they are correctly shown in the OWL ontological module, but no specific ODP is mentioned in the paper sections; it would be useful to have them mentioned when and where they are reused. (E.g. in Section 5.1 a mention of the possible mereological relations and choices + the PartOf ODP would be appreciated.)

2. Although the formalisation, reuse and modelling of classes and properties seem to me sound and clear, there is no mention of previous works in the cultural heritage domain such as the ArCo ontology [1], which is aligned to FRBR and CIDOC-CRM as well, and has faced the same issues (also introducing ODPs which could be reused in the OntoPoetry Core Module). In particular:
2a) the mereological aspects of pdf:Ensamble.
2b) to better express the "certainty" class and property, and regarding the description of the item's cultural heritage context, I suggest looking at the ArCo Core module (and extension 0.3) and Daquino's paper [2]. In particular, some parts are a bit unclear, e.g. the "pdc:isIntendedFor" property: based on which source? According to whom? Being able to express this information could improve the whole work.
2c) Furthermore, if, as it seems, you are willing to inject some form of semiotics in separating meaning from physical support and textual information (and if not here, I think it is going to be necessary in the Poetic Analysis module), I suggest starting to introduce some references to Amie Thomasson [3] and referring to Sanfilippo's works about textual meaning formalization [4].

3. At the very end of Section 2 it is written "...this new version fully aligned to the foundational ontologies": which ones? As a follow-up to this point, Section 5 then states: "CIDOC-CRM is a foundational ontology...", but I would disagree with saying that CIDOC-CRM is a foundational ontology. I link here [5] a very useful contribution by Maria Keet, in which some main distinctions among the most well-known foundational ontologies are listed, described and compared.

4. There are some inconsistencies in the ontology: the class :ExactDateExpression seems to be inconsistent due to a restriction of the data property :stringContent, which takes only :ApproximateDateExpression as domain, but at the same time this class is disjoint with :ExactDateExpression. A suggestion here is either to relax the restriction and set it to :DataExpression, or to include both classes with an OR, to avoid inconsistencies.

5. I would also suggest considering the Time indexed participation ODP, Time indexed person role, Time indexed PartOf (to describe e.g. poems which have been part of some :Ensemble only for a specific amount of time) and Time indexed situation.

6. The properties :editorOf and :hasEditor are inconsistent; also, they are subproperties of all other object properties, which seems implausible.
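The clash described in point 4 can be illustrated with a toy check in Python. This is a sketch under stated assumptions, not an OWL reasoner and not a reading of the actual OWL file: it assumes the restriction behaves like a domain axiom on :stringContent, and that some exact date individual carries a :stringContent value. The helper names `inferred_types` and `is_consistent` are hypothetical; the class and property names are the ones quoted in point 4.

```python
# Toy check of the clash described in point 4; it is not an OWL reasoner.
# Axioms as reported in the review (names without the ontology prefix).
DOMAIN = {"stringContent": "ApproximateDateExpression"}
DISJOINT = {frozenset({"ExactDateExpression", "ApproximateDateExpression"})}


def inferred_types(asserted_types, used_properties, domain=DOMAIN):
    """Asserted types plus the types entailed by the domains of used properties."""
    return set(asserted_types) | {domain[p] for p in used_properties if p in domain}


def is_consistent(types, disjoint=DISJOINT):
    """Return False if the individual falls into two classes declared disjoint."""
    return not any(pair <= types for pair in disjoint)


# Assumption: an exact date individual that carries a string value.
types = inferred_types({"ExactDateExpression"}, {"stringContent"})
print(is_consistent(types))

# Relaxing the domain to the superclass suggested in point 4 removes the clash.
relaxed = {"stringContent": "DataExpression"}
types_relaxed = inferred_types({"ExactDateExpression"}, {"stringContent"}, relaxed)
print(is_consistent(types_relaxed))
```

Under these assumptions the first check reports an inconsistency because the domain axiom forces the individual into both disjoint classes, while the relaxed domain does not. The alternative fix from point 4, a domain that is the union of both classes, is not modelled here.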

An additional question:

7. How do you represent "nicknames" and "pseudonyms"? E.g. the Italian writer "Italo Svevo", whose real name is Aron Ettore Schmitz, and whose pseudonym is taken from his double Italian and German origin.

[1]: https://link.springer.com/chapter/10.1007/978-3-030-30796-7_3

[2]: https://www.researchgate.net/profile/Francesca-Tomasi-2/publication/2817...

[3]: Thomasson, A. L. (1999). Fiction and metaphysics. Cambridge University Press.

[4]: http://ceur-ws.org/Vol-2969/paper61-FOUST.pdf

[5]: https://eng.libretexts.org/Bookshelves/Computer_Science/Programming_and_...(Keet)/07%3A_Top-Down_Ontology_Development/7.02%3A_Foundational_Ontologies

Clarity and Provided Data:

The paper is well written, concepts are properly explained, the narrative is convincing and data are available on a stable location. Also, the controlled vocabulary provided and the already populated knowledge base show that this does not seem a purely theoretical ontology floating in the vacuum.

Other Suggestions:

Having already reused many ODPs and the main cultural heritage core ontologies, consider the possibility of aligning to broader established resources like the ArCo ontology, and consequently to the DOLCE foundational ontology, mainly to avoid spending energy on "reinventing the wheel" when some previous work is available to support this good work.
The "major" revision is mainly due to the ontological inconsistencies and some semiotic flaws, while the project seems sound and well organised.

Typos and Writing Suggestions:

- Abstract Section: "Neon" --> "NeOn"
- "Ontopoetry" and "OntoPoetry" are both used in the paper; which one is correct? I would suggest the second one, which is also the one used on the Postdata website.
- Section 5.1 "pdf:hasCommentarty" --> I think "pdf:hasCommentary"
- Section 5.2.2 page 13 "The properties supported in the ontology and their alignment with FRBR are in ": the sentence breaks off here.