Pattern-based design applied to cultural heritage knowledge graphs

Tracking #: 2378-3592

Authors: 
Valentina Anita Carriero
Aldo Gangemi
Maria Letizia Mancinelli
Andrea Giovanni Nuzzolese
Valentina Presutti
Chiara Veninata

Responsible editor: 
Special Issue Cultural Heritage 2019

Submission type: 
Full Paper
Abstract: 
Ontology Design Patterns (ODPs) have become an established and recognised practice for guaranteeing good quality ontology engineering. There are several ODP repositories where ODPs are shared as well as ontology design methodologies recommending their reuse. Performing rigorous testing is recommended as well for supporting ontology maintenance and validating the resulting resource against its motivating requirements. Nevertheless, it is less than straightforward to find guidelines on how to apply such methodologies for developing domain-specific knowledge graphs. ArCo is the knowledge graph of Italian Cultural Heritage and has been developed by using eXtreme Design (XD), an ODP- and test-driven methodology. During its development, XD has been adapted to the needs of the CH domain, e.g. by gathering requirements from an open, diverse community of consumers; a new ODP has been defined and many have been specialised to address specific CH requirements. This paper presents ArCo and describes how to apply XD to the development and validation of a CH knowledge graph, also detailing the (intellectual) process implemented for matching the encountered modelling problems to ODPs. Relevant contributions also include a novel web tool for supporting unit-testing of knowledge graphs, a rigorous evaluation of ArCo, and a discussion of methodological lessons learned during ArCo development.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 20/Dec/2019
Suggestion:
Major Revision
Review Comment:

This paper describes the choices made, the methodology followed, and the lessons learned in the ArCo project. The outcome of this project is a knowledge graph of the Italian cultural heritage. The paper perfectly fits the CfP of the Special Issue Cultural Heritage 2019. Furthermore, ArCo seems to be a quite mature project with good and well-refined design and test strategies.

I have to start by saying that my global evaluation "Major revisions" is mainly motivated by the fact that the paper lacks a deep ontological analysis of the proposed modeling choices and a clear comparison with other ontologies, two quite penalizing flaws in my view (see my main comments (3) and (2)). Furthermore, I don't really see the scientific novelty of this work (main comment (1)), and the quality of the writing has to be improved (main comment (4)). That said, from an engineering perspective the ArCo project introduces and follows some interesting methodologies that help in building and evaluating complex ontological resources. If this aspect is considered fundamental for this special issue, I'm ready to change my evaluation to "Minor revisions".

--------------
MAIN COMMENTS:
--------------

I have four main criticisms.

(1) I understand the *applicative* significance of the project, but I don't see its *scientific* novelty. The methodology followed in the project (the eXtreme Design methodology), the evaluation criteria for the developed knowledge graph, and most of the design patterns used in the paper have already been presented and discussed in published papers. Sure, all these general tools have been used, in the specific domain of cultural heritage, to build the ArCo knowledge graph, but honestly I don't see any general impact on them due to the analysis carried out in this work.

(2) Section 6 describes in detail the evaluation of the ArCo knowledge graph in terms of the structural, logical, and functional dimensions. I appreciate this effort; however, this analysis provides an evaluation in absolute terms, i.e., one knows how ArCo performs according to a given evaluation scale, but how other ontologies perform with respect to the same dimensions is not reported. The only comparative analysis concerns the terminological coverage, where ArCo is compared with EDM and CIDOC-CRM. But EDM and CIDOC-CRM are reference models that, *by design*, contain only a few general notions. Thus this comparison is not really informative. More importantly, a deep analysis of the different ontological choices adopted by these ontologies (e.g., in terms of the classes and properties taken into account) is almost entirely lacking. The high-level discussion in section 3 makes it possible to understand neither the main conceptual/ontological disagreements - at the level of granularity of CIDOC-CRM and EDM - nor the adequacy of these choices for representing cultural heritage data (the authors consider only a few examples, for instance, the representation of the change of location of a cultural property). It is unclear whether ArCo, EDM and CIDOC-CRM are consistent: the authors claim that "ArCo is aligned to both CIDOC-CRM and EDM", p.7, but they provide no details on this alignment; actually it is even unclear whether ArCo totally complies with the standards considered by ICCD (the Central Institute for Catalogue and Documentation) or whether some adjustments are required.

(3) Several *fundamental* notions are not analyzed in detail. The notion of "Cultural property", probably the main notion in ArCo, lacks any ontological analysis. In section 5 (and figure 14) the authors claim that fossils and scientific devices can be classified under (appropriate specializations of) CulturalProperty. At the same time they claim (sect. 5.1) that tangible cultural properties are physical objects. CulturalProperty then seems to be a sort of *role*: before being discovered and classified, fossils are physical objects that are not yet cultural properties, and a scientific device becomes a cultural property only when its role in the history of science is recognized as important. In sect. 4.3.2 the authors claim that cultural properties can be "commissioned". This seems to suggest that, before their realization, some commissioned cultural properties do not have a physical counterpart. For instance, a statue intended as a cultural property could exist without any physical realization. However, the link between a cultural property and its (physical) realization(s) is not considered at all in the paper, even though the way this link may be represented is a notoriously delicate modeling aspect that can impact the whole system.
The notion of Situation itself, re-used several times in ArCo, is not clarified. In particular I do not understand if situations are just reifications of propositions, e.g., the reification of a statement like R(a,b,c,d) (where the arguments can belong to given classes, e.g., they can be temporal arguments), or if they are linked to more complex mechanisms. I often have the feeling that Situations are introduced mainly to overcome the OWL limitation about n-ary predicates (with n>2), but the authors are not explicit on this point. In this case, what happens for binary predicates: are the propositions R(a,b) also systematically represented via Situations? Indeed, in sect. 4.3.1, the authors claim: "Dynamic concepts, such as situations that change over time, are present in every domain". They then seem to suggest that the same situation can change across time. This merits a better explanation.
Sometimes the authors introduce complex notions without any explanation. For instance, in fig. 11, they refer to technical characteristics as "arguments" of a situation, but their ontological nature is not discussed. Other times the authors are quite confusing. Let us take, for instance, the new notion of recurrent event. At the beginning they seem to claim that recurrent events are events, then they claim that they have different events as instances (end of p.14), while in figure 12 they turn out to have events as members, i.e., they are a sort of collection of events, and they are also situations. This surely does not help the reader to understand the modeling choices adopted in ArCo.
Sometimes the description of the adopted model is very partial. For instance, fig. 8 illustrates how it is possible to represent several versions of the same catalogue record, but it is not discussed how a given version of the record is linked to the data contained in the record. Furthermore, the nature of the hasCatalogueRecordVersion relation is not analyzed. Is it a simple parthood? A composition? The same applies to the member relation discussed above in the case of recurrent events. In general, the relations considered in the conceptual schemas are never linked to more abstract relations that, as stated by the authors, are present in the Core module (at least they refer to parthood) and, I imagine, in the ODPs.
Personally, I would be interested in a deeper conceptual/ontological analysis.

(4) The paper is not well written. First, its structure can be improved.

(4.1) Section 2 and Section 6 are quite verbose.
(4.2) Section 4 mixes methodological, conceptual, and implementation aspects. I would really prefer these aspects to be separate. Integrating the description of the conceptual/ontological choices (design patterns) in section 4 with the examples in section 5 would make these conceptual choices easier to understand. Furthermore, I find section 5 quite heavy and not very informative: (a) the SPARQL queries are quite trivial and do not add anything; (b) there are several examples, each one touching a different representation problem, whereas I would really prefer a single (more complex) example touching all or several modeling problems; (c) the OWL definitions are machine-readable but not really human-readable, and graphical definitions would improve the readability of the paper (as done for the conceptual schemas).
(4.3) Section 3 suffers from problems analogous to those highlighted for Section 4.
(4.4) Section 7 reports quite expected lessons, and again mixes different aspects.

Furthermore, in several places the language seems quite imprecise to me, which makes several sentences hard to understand (see the comments below).

------------------
MINOR COMMENTS:
------------------

(p.2) LOD is not introduced before

(p.2) "The semantics emerging from ArCo"
--> what does it mean?

(p.2) "by distinguishing knowledge of cultural entities and their context, versus the knowledge dynamically assembled in catalographic records. The first type creates a CH ontology unprecedented in its depth and latitude, while the second allows to trace the epistemological aspects of CH catalogues"
--> very confusing sentence

(p.3) "the epistemological perspective on cultural properties as opposed to their ontological perspective".
--> this distinction is really badly explained along the whole paper.

(p.3) "In order to govern a knowledge graph development process able to address requirements from a diversity of potential consumers, to provide a rich expressivity, and to preserve high quality and easiness of reuse, ArCo follows a pattern-based ontology design methodology named eXtreme Design (XD) [9, 10], and has contributed to extend and improve it"
--> this comes at the end of section 1 after discussing some examples, why is it needed at this point?

(p.3) "the model redundancy by representing same concepts as both n-ary and bi-nary relations"
--> I don't really understand this remark, it is not explicitly taken into account in the rest of the paper

(p.4) XSD is not introduced

(p.4) "Cataloguing cultural heritage is the process of identifying and describing, through metadata, information resources"
--> is the process of cataloguing really reducible to identifying and describing information resources through metadata? Are there no links to the (physical) object?

(p.4) "(see Figure 2."
--> (see Figure 2).

(p.5) "Many parts of these standards are similar, and a recent effort to map these sections was made"
--> which sections?

(p.6) Concerning fig. 4, for me, the fusion of MSTL+MSTD+MSTS into MSTL is not the most interesting aspect. The fact that MSTS shifted from "sede espositiva" in v3.00 to "note" in v4.00 is more interesting and has a big impact on the data. Do the developers have a motivation for this shift?

(p.6) "Along with the publication of LOD collections, ontologies representing the CH domain are being modelled"
--> what does it mean to model an ontology?

(p.7) "EDM has an object-centric approach, where the cultural property is directly connected to its features, hence reducing the possibility to express temporal and contextual information."
--> is this due to the limitation of DL-languages that allow only binary relations? I think this is an interesting aspect that has not been discussed enough in the paper. It seems to me that a lot of conceptual choices depend on this limitation.

Actually a similar comment appears also at the end of (p.7): "Nevertheless, the adopted ontologies only capture a subset and a simplified encoding of the available information about a cultural property because they prefer a lightweight modelling i.e. based on binary relations, as opposed to more complex predicated, e.g. n-ary relations."
--> (1) (probably) predicated -> predicates; (2) if I correctly understand, all the *predicates* (belonging to the logical language) are at most binary, the difference concerns the possibility to reify into the domain of quantification assertions concerning n-ary relations which are linked to their components (involved objects, time, location, etc.) by means of binary predicates
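To make the point in (2) concrete, here is a minimal sketch of what I mean by reification: an n-ary assertion R(a,b,c,d) becomes a fresh individual in the domain, linked to each of its components by a binary predicate. All the `ex:` names, the `reify` helper, and the example values are hypothetical illustrations of mine; they are not taken from ArCo.

```python
from itertools import count

_situation_ids = count(1)  # hypothetical counter used to mint fresh situation identifiers


def reify(situation_type, roles):
    """Reify one n-ary assertion as binary triples about a fresh situation node."""
    node = f"ex:situation{next(_situation_ids)}"
    # The situation itself is an individual in the domain of quantification.
    triples = [(node, "rdf:type", situation_type)]
    # Each argument of the n-ary relation is attached by one binary predicate.
    triples.extend((node, role, value) for role, value in roles.items())
    return node, triples


# R(a, b, c, d): a cultural property, a place, a start date and an end date.
node, triples = reify(
    "ex:LocationSituation",
    {
        "ex:involvesCulturalProperty": "ex:statue1",
        "ex:atPlace": "ex:museumA",
        "ex:startsAt": "1950-01-01",
        "ex:endsAt": "1975-12-31",
    },
)

for triple in triples:
    print(triple)
```

Every predicate in the output is binary; only the reified node carries the n-ary content. Whether ArCo's Situations amount to exactly this, or to something richer, is what I would like the authors to state.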

(p.7) "CIDOC-CRM is a richer model than EDM and has an event-centric approach, where many of the features expressed as object properties in EDM are modelled using an event"
--> please make clear what this means, maybe through an example. In addition, it would be interesting to understand how the CIDOC-CRM notion of "event" differs from the notion of "situation" endorsed in ArCo

(p.7) "ArCo needs to satisfy a significant number of modelling issues, overlooked by other ontologies so far, such as the diagnosis of a paleopathology and (...)"
--> are these classical problems? If yes, why not consider them later in a separate subsection, comparing how ArCo addresses them with what is done in other ontologies?

(p.8) "CQs and constraints represent the project’s ontological commitment"
--> in which sense do CQs represent ontological commitments? For instance, consider "CQ7: When a cultural property has been located in a place?" This is a NL sentence that must be translated into a sort of formula of the language adopted in the project. However, there are several possible translations that may presuppose completely different ontological commitments. For instance, one could assume a 3d/endurantist view, where cultural properties are objects that are wholly present at different times, or one could assume a 4d/perdurantist view, where cultural properties have different temporal slices at different times and where these temporal slices, rather than the whole 4d-worm, are the subjects of location.
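As a rough illustration, the two readings of CQ7 could be rendered as follows for a cultural property $c$. The predicate names are hypothetical and are not taken from ArCo or from any specific foundational ontology; the only point is that the same question yields formulas with different commitments.

```latex
% 3d/endurantist reading: location is a ternary relation indexed by time,
% and the whole object c is what is located.
\mathrm{CQ7}_{3d}(c) = \{\, (p, t) \mid \mathrm{LocatedIn}(c, p, t) \,\}

% 4d/perdurantist reading: location is binary and holds of a temporal slice s of c.
\mathrm{CQ7}_{4d}(c) = \{\, (p, t) \mid \exists s\, (\mathrm{SliceOf}(s, c) \wedge \mathrm{ExistsAt}(s, t) \wedge \mathrm{LocatedIn}(s, p)) \,\}
```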

(p.8) "The key aspect of XD, and in general of pattern-based design, is the ability to match CQs to ODPs."
--> the methodology for this matching is not described in the paper, i.e., it is not clear at all how, and on the basis of which considerations, the ODPs discussed in the paper have been chosen. I think this is an important aspect that needs to be stressed because these choices clearly impact the success of the whole project.

(p.8) KG --> the abbreviation has not been introduced before

(p.8) "(iv) a set of user stories"
--> I think that a clarification about what kind of user stories have been considered in the project would be useful to better understand the input

(p.9) "Catalogue records have their own ontological characteristics and relevance into the CH domain. They are about cultural properties, hence they encode their epistemic perspective."
--> I'm not sure I understand this reference to the "epistemic perspective". First, all the data concerning an object can be regarded as epistemological: they always pertain to what we know about the object, even though these data can be collected in a more subjective or inter-subjective way. The authors seem to assume that "intrinsic aspects such as length, weight, materials, and conservation status" are somewhat objective, but the weight of a small statue can be approximately and qualitatively evaluated in the field by an archeologist without the use of a scale, or by means of a scale with a very rough resolution. The weight of this statue can subsequently be re-evaluated with a more precise scale, etc. Vice versa, according to the authors, dating is among the subjective qualities, but carbon-14 methods are quite standardized. I therefore don't see why there is a different representation for these two kinds of qualities of objects (denotative description vs. context description).
Second, it seems to me that the authors are mixing knowledge about the object and knowledge about the record. For instance, assume that a new version of a record is introduced by agent A at time T. This information does not concern the cultural property or the data about the cultural property. For instance, the information represented by the new version may be relative to the state of the cultural property at time T' (different from T) according to agent A' (different from A).

(p.10) "A catalogue record is an entity that describes a cultural property. As it denotes a real-word object, it can be defined as an information object"
--> are "describe" and "denote" used here as synonyms?

(p.10) "The content of a catalogue record, i.e. the description of a cultural property, can change: “information about the creation of a catalogue record and possible following computerisation, update and corrections”."
--> is it the content of a record or the record itself that changes (see previous comment)? Second, what is computerisation? Intuitively, it seems to me that this aspect concerns the realization of the record not its content.

(p.11) "The Time Interval ODP is implemented to represent the temporal validity of each version"
--> implemented?

(p.12) "For an immovable cultural property (e.g. a monumental park), this place overlaps with the area occupied by the cultural property, and to which it is fixed."
--> it seems to me that the same holds for a movable object

(p.12) spacial --> spatial

(p.12) "express the site (intended as a physical building) and the geographical entity involved in the situation"
--> what is the exact difference between a site and a geographical entity?

(p.12) "Each situation defines a contextual relation between the cultural property and the other entities involved."
--> in which sense does a situation represent only *contextual* relations?

(p.14) "For example, a artwork technical description may be defined as the relation between constituting material, employed technique, and shape."
--> please explain, in which sense a description is a relation?

(p.17) "This step focuses in checking the inferences caused by the ontologies"
--> caused?

(p.19) "As depicted in Figure 14, the concept of :CulturalProperty is modelled as a partition of two classes: :TangibleCulturalProperty and :IntangibleCulturalProperty"
--> I don't see how fig. 14 depicts this *partition*

(p.23) Is the evaluation of the structural dimension of the KG also done during the development process, as for the logical and functional dimensions?

(p.24) "Figure 15 shows the top-50 ranked classes based on the number of individuals they have in the knowledge graph"
--> why is this relevant?

(p.25) "a high number of to leaf classes"
--> delete "to"

(p.26 l14) "the the value"
--> then the value

(p.27) "on a scalse"
--> on a scale

(p.29) "Additionally, using available ontologies as input to generate new ontologies is a difficult process, far from being automated [34]"
--> [34] has been published more than twenty years ago

Review #2
Anonymous submitted on 18/Feb/2020
Suggestion:
Minor Revision
Review Comment:

The authors propose an ontological knowledge base for Italian Cultural Heritage (CH), called ArCo. ArCo has been developed by exploiting the eXtreme Design (XD) methodology, which is inspired by the Extreme Programming (XP) approach, a software development paradigm. ArCo's main data sources come from the General Catalogue (GC) of Italian Heritage, which is maintained by the Ministry of Cultural Heritage and Activities (MiBAC) and built upon the SiGECweb platform, the latter hosting a relational database of catalogue records (schede di catalogo).
ArCo ontologies are aligned to EDM and CIDOC-CRM, but they extend them both in variety and in granularity of concepts. The ontology models cultural entities in several fashions such as technical descriptions, status, catalogue records, and so on, by exploiting well-known ontological patterns. Additionally, the paper presents many examples and case studies.
The paper extends a previous work by a) presenting how XD has been extended to the CH context and exploited to produce ArCo, and b) describing the design process, design choices, testing methods, evaluation of ArCo, and the lessons learned during the development phase.

Overall evaluation

The paper is well-written and provides a clear presentation of the ontology, which is very interesting. Moreover, the topic fits the scope of the journal. I am in favour of accepting this work, but I strongly recommend the authors to address the following points.
Major remarks
How ArCo extends EDM and CIDOC-CRM in variety and in granularity of concepts is not clear. The authors should explain the differences between ArCo and EDM, and between ArCo and CIDOC-CRM. A short introduction to EDM and CIDOC-CRM in the preliminary section is highly recommended, followed by a clear explanation of how ArCo extends EDM and CIDOC-CRM.
My major concern regards the proposed definition of eXtreme Design (XD). The relationship between XD and XP is not clear at all. How is XD inspired by XP? What does XD inherit from XP, and what does it not? In which respects does XD differ from XP, and in which does it not? Why is XD necessary, and why is XP not enough in the context of ontology design? The authors should answer those questions in a dedicated section that clearly illustrates the relationships between XP and XD. Moreover, there is no indication of how to apply XD to the design phase of an ontology, which represents the main difference between the development process of ontology-based software and that of entity-relation (ER) database-oriented software. In fact, the authors limit themselves to presenting the ontology patterns adopted in the implementation of ArCo without mentioning how to identify them: applying well-known patterns is a standard practice in software engineering and does not depend on the paradigm adopted. Without such clarification the reader has the feeling that XD is simply an à la "Spiral" approach (as seems to be confirmed in Section 4.6) and that the “test driven” approach involves only the implementation (coding) of the ontology and not its design process.
Finally, how XD has been extended to meet the needs of the presented domain is missing from the paper. This point should also be clarified.

Minor comments:

-Page 2, footnote 2. By visiting the repository, I have noticed that you are using Virtuoso, but there is no mention in the paper and no explanation about this choice. Please, motivate.

-Page 3, Column 2, Line 43-46, "database(s) where the data are stored and maintained, and that are used as main sources for feeding some presentation interface". Are you stating that databases exist only to present data with some visualization tools? Please, clarify.

-Page 5, Fig. 3. The figure is in Italian and annotated in English. For a non-Italian reader, the image is understandable but quite hard to figure out. The situation is analogous in Fig. 4 on page 6. Maybe the text inside the pictures should be translated.

-Page 7, Column 1, starting at Line 40. I am not sure about this statement. I suppose that location type and temporal validity may be expressed in CIDOC by means of the E55 Type and E52 Time Span concepts, respectively. Please, clarify this aspect.

-Page 10, I guess you may use some widespread ontologies for physical locations and geographical coordinates (e.g., LinkedGeoData). Investigate and clarify this aspect.

-Page 11. Column 1, line 38. May the "old version of an information object" (sometimes) be considered outdated (i.e., the information is no longer valid)? If yes, this situation should be modelled.

-Page 12. Section 4.3. May "Situations" be replaced with "Events"? For example, the situation of "coin issuance" can be replaced with the event of "coin issuance". It seems to me that most "situations" exist only due to the occurrence of some events, but such a relationship is not modelled. I guess that in ArCo the two concepts overlap; see, for instance, the example in Section 5.4. The situation of authorship exists only when the event "an author is attributed to a cultural property" occurs, namely, an "attribution event". The same holds for the example in Listing 4. This point should be clarified.

-Page 15. Column 1, Line 8. I guess that "hasUnifyingFactor" carries an improper meaning. In this case I would use "common factors", "common purpose", or analogous terms.

-Page 19, Column 2, line 4. Only at this point the reader discovers that you provided an automatic tool that converts ICCD sheets in ArCo ontologies. This information should be given in the introduction.

-Page 20. Is there any particular motivation for introducing the RDF annotations in the example of Listing 1? Do you make any use of such annotations (for example, in some query)? I know that you are using OPLa for ontology metadata purposes, but I suppose that this is a different situation. Please, clarify.

-Page 23. Is “Technical status” limited to physical features? Can it change over time? I guess the answer is Yes in the first case and No in the second one. In such a configuration, a refactoring of the model may be necessary. In fact, status changes over time and this fact should be modelled. Please, clarify.

-Page 28, Section 7. "eXtreme Design is a methodology that encourages the reuse of Ontology Design Patterns". In computer science, reusing patterns is standard practice. I would remove such an assertion.

Review #3
Anonymous submitted on 31/Mar/2020
Suggestion:
Minor Revision
Review Comment:

Summary

The paper presents a case study on the application of the eXtreme Design (XD) methodology to develop ArCo, a knowledge graph on the cultural heritage (CH) domain. ArCo has been developed in collaboration with the Central Institute for Catalogue and Documentation, a public Italian organization that is responsible for managing a catalogue that aggregates Italian cultural heritage data and that develops standards for the representation and exchange of such data.

The paper contains: (i) a discussion of how the authors applied XD to develop the ArCo ontology, focusing on the processes of applying ontology design patterns, testing the ontology, and involving potential users in the requirements elicitation phase; (ii) a description of some fragments of ArCo, illustrated with examples; (iii) an evaluation of the ArCo ontology in terms of its structural, logical, and functional dimensions; and (iv) a reflection on the lessons learned in the ArCo project.

================

Originality

The paper reports on the application of an existing ontology engineering methodology and reports some lessons learned. In that respect, the contributions are not really novel.

The ontology created through the application of this methodology is not the first in the CH domain. It is, however, the first with such a broad scope and detail as far as I could assess.

================

Significance of the results

ArCo, as an ontology network, can play an important role in stimulating the publication of linked open data in the CH domain, both in Italy and in the world, in particular because it can be directly reused by organizations that need to publish CH data but have no budget to develop their own ontologies.

ArCo, as a knowledge graph, can be reused by a broad audience and support data-driven innovation, as well as research.

The case study report in itself could be useful for organizations that want to follow a similar data publication process, i.e., expose their legacy data using a knowledge graph.

The only insight from the case study that is relevant to the scientific community is the idea of the Early Adoption Program.

================

English

The paper is generally well written. I only found a few issues:

Page 2, line 7: “[…] the whole world CH […]” => the world’s CH
Page 4, line 12: “[…] making publicly available data on, cultural heritage.” => There is no need for a comma there.
Page 11, CQ 11: “When a cultural property has been created?” => When has a cultural…
Page 11, CQ 11: “When a cultural property has been located in a place?” => When has a cultural…

================

Please find specific comments below, organized by section:

SECTION 1

In the introduction, ArCo is called many things. It is a resource, a project, a knowledge graph, a methodology, a data source, an ontology, a set of ontologies. Could that be clarified?

The authors claim that “ArCo as a project can contribute to push the state of the art in knowledge graph engineering, with special focus on the CH domain, by sharing its “behind the scene””. While I agree that this case study contributes to the maturity of ontology engineering, I cannot see any specific contributions to the engineering of CH ontologies specifically.

Page 2, line 14: “In [7] we introduce ArCo […]”. ArCo seems to be an acronym. Maybe the authors could explain it?

Page 2, line 20: “The semantics emerging from ArCo addresses directly the CH domain, by distinguishing knowledge of cultural entities and their context, versus the knowledge dynamically assembled in catalographic records.” These two aspects are explained later in the paper. However, I suggest the inclusion of a brief example at this point to help readers understand what

Page 2, line 43, footnote 1: Why adding the URL https://doi.org/10.1007/978-3-030-30796-7_3 instead of just citing paper [7]?

I appreciate that the authors made ArCo available through a Docker container, including the ontology documentation and a SPARQL endpoint ready to be queried. This is a fantastic way for people to test the ontology and experiment with its data.

SECTION 2

Figures 3 and 4 depict fragments of ICCD cataloguing standards. I guess they are screenshots of the original documents enriched with translations into English. I honestly do not see the benefit of depicting fragments of the original document. It just makes things more complex for readers. I suggest simply reproducing the original document structure and translating the content.

SECTION 3

Page 7, right column, line 23: “[…] a cultural institution makes a first relevant choice: it can either publish Linked Open Data by building and using its own infrastructure, or give its data to a cultural heritage data aggregator such as Europeana. A third case is of an institution that invests in infrastructure for publishing its data as well as in the whole process for producing them, by using the ontology model of an aggregator.” This fragment could be improved. The text initially suggests that institutions have two options, and then a third option is mentioned.

Page 8, left column, line 6: “In our opinion, when possible, it is preferable that […]”. The authors’ position is not well justified in the text.

SECTION 4

Page 8, left column, line 27: “After the ontology project initiation, each iteration […]”. The methodological step is mentioned out of the blue.

Page 8, left column, line 27: XD is explained in a single and very long paragraph.

The sub-section on eXtreme Design should be extended. The case study is about the application of the methodology, including practical tips, opportunities for improvement, new tooling, and customizations made just for the ArCo project. So, I would like to see a more comprehensive presentation of the methodology instead of the quarter of a page available now. It could even be a section on its own, so the research baseline is separated from the contributions.

Additionally, the authors mention the use of “textual stories” to capture their initial ontology requirements. I would like to see at least one example in the paper, as many questions came to mind while reading about them. For instance: do they follow a specific pattern as in software development (e.g. “As a [persona], I [want to], [so that].”)? How close are they to competency questions?

In any case, I wonder how useful the stories were in defining ArCo’s scope (and the competency questions as well, for that matter). It is mentioned that the designers used the ICCD standards as their main input and that the goal of the project was to make the data in the GC available as a knowledge graph. Isn’t this enough to delimit the scope of the ontology the authors needed to design?

Page 8, right column, line 3: There is no need to explain the structure of the remainder of section 4. I would simply remove this paragraph.

The modularization process, reported in Section 4.2, is not properly explained. First, why did the authors opt for an ontology network instead of a monolithic ontology? Does XD say anything about modularization?

The authors should further elaborate on ArCo’s core module. Is it just a reflection of the legacy core XSD schema ICCD provided them? If not, what else was added to it (and under which condition)? For instance, if an element (e.g. an object property, class) was used by at least two modules, would it be placed there?

Moreover, it is mentioned that the core module contains general concepts, such as part-whole relations. Is this a suggestion that a top-level ontology was used (e.g. DOLCE, BFO, gUFO) in the project? Since this is a methodological paper, discussing the adoption of a foundational ontology (or the explicit decision not to adopt one) is very relevant.

Page 8, Fig. 6: Adding the prefixes to the ontology modules could improve understandability.
Page 8, Fig. 6: A caption is not a paragraph.

Page 10, left column, line 14: “[…] including events that recur over time (cf. Section 4.3.3) […]”. This is the only forward reference made.

IMHO, section 4.3 should be integrated with section 5. The ODPs are very abstract and hard to follow without concrete examples of how they are used in the ontology and how they end up being instantiated. Additionally, many of the examples shown in section 5 actually refer to ODPs, which may require users to go back and forth in the paper.

The argument made in section 4.4.1 for the value of annotating reused patterns is not convincing at all. The authors state that it “simplifies future reuse of ArCo by third parties as well as matching to other resources.” I fail to see how this would help users of the ontology if they are not extremely knowledgeable on the ODP catalogue used by the authors (which contains a significant number of patterns, some of which are not properly documented and exemplified).

I appreciate the test-driven aspect of XD and the tool presented by the authors. Having a good test suite is fundamental to assert the quality of a software tool. This practice should certainly be more widespread in ontology engineering and this paper makes important progress in that direction.

If possible, I would include a screenshot of TESTaLOD in the paper, just to give readers an idea about what it looks like. A link to its source code could be added to the paper as well.

I would also like to see in the paper a more in-depth discussion of the authors’ experience with testing in the ArCo project. For instance, what did they consider a good test suite? Did they isolate specific fragments of the ontology (e.g. patterns, modules) to run tests? How much of their ontology was covered by the tests?
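
One way to make the coverage question concrete is sketched below. This is only an illustration under my own assumptions, not a description of how TESTaLOD works: the term names, the test cases, and the `term_coverage` helper are all hypothetical, and each test is reduced to the set of ontology terms its query mentions.

```python
# Hypothetical sketch: share of ontology terms exercised by at least one test.
# All names below are invented for illustration; they are not taken from ArCo.

def term_coverage(ontology_terms, tests):
    """Return (coverage ratio, sorted list of untested terms).

    ontology_terms: set of class/property names declared by the ontology.
    tests: dict mapping a test name to the set of terms its query uses.
    """
    used = set()
    for terms in tests.values():
        used |= terms
    # Ignore terms that tests mention but the ontology does not declare.
    covered = used & ontology_terms
    ratio = len(covered) / len(ontology_terms) if ontology_terms else 0.0
    return ratio, sorted(ontology_terms - covered)


ontology_terms = {"CulturalProperty", "CatalogueRecord", "hasAuthor", "hasLocation"}
tests = {
    "cq_author_of_property": {"CulturalProperty", "hasAuthor"},
    "cq_record_describes": {"CatalogueRecord", "CulturalProperty"},
}

ratio, untested = term_coverage(ontology_terms, tests)
print(f"coverage: {ratio:.0%}, untested: {untested}")
```

A figure of this kind, reported per module or per pattern, would already tell readers which parts of the ontology no competency question touches.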

SECTION 5

The code snippets used to illustrate instances of the ontology are simply awful to understand, particularly because of the constant line breaks. It would be a kind gesture by the authors towards readers to replace them with simple diagrams (as done in section 4) or at least make listings that span both columns.

The connection between a catalogue record version and the cultural property it describes is clear. It is not clear, however, how a record version relates to the actual data it contains. If this relationship does not exist, I believe the authors should explain why and motivate the inclusion of the records in the knowledge graph.

SECTION 6

I do not agree that the structural assessment made by the authors says much about the quality of the ontology, in particular the metrics listed in Table 3. Several do not even make sense to me. For instance, on what basis can one claim that an axiom/class ratio of 39.5 is good? What is the point of measuring how often classes have multiple parents? That does not say anything about how well an ontology represents a domain or how well it serves the purpose for which it has been designed. The same could be said for the number of leaf classes or that of root classes.
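
To illustrate why I consider these figures descriptive rather than evaluative, the sketch below computes them for a toy hierarchy. Everything in it is my own assumption (the `structural_metrics` helper, the class names, the axiom count of 20); none of it comes from the paper or from the tool the authors used.

```python
# Hypothetical toy hierarchy: class name -> set of direct superclasses.
# The classes and the axiom count are invented; they are not ArCo figures.

def structural_metrics(parents, axiom_count):
    """Compute simple structural counts from a class -> superclasses map."""
    classes = set(parents)
    has_child = set()
    for supers in parents.values():
        classes |= supers      # superclasses are classes too
        has_child |= supers    # anything used as a superclass has a child
    roots = {c for c in classes if not parents.get(c)}
    leaves = classes - has_child
    multi = {c for c in classes if len(parents.get(c, ())) > 1}
    return {
        "axiom_class_ratio": axiom_count / len(classes) if classes else 0.0,
        "root_classes": len(roots),
        "leaf_classes": len(leaves),
        "multi_parent_classes": len(multi),
    }


toy = {
    "Thing": set(),
    "Object": {"Thing"},
    "Event": {"Thing"},
    "Painting": {"Object"},
    "Exhibition": {"Event", "Object"},
}
metrics = structural_metrics(toy, axiom_count=20)
print(metrics)
```

The function only counts nodes and edges. Nothing in it consults the domain or the requirements, so two ontologies of very different adequacy can yield exactly the same numbers.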

The metrics in Table 1 and Figure 15 are informative for users to understand the size and complexity of the ArCo knowledge graph.

===

Are the authors aware that there is another ontology called ArCo? See Villazón, R., Bravo, G., & Cifuentes, D. (2008). ArCo: An ontology for architectural concepts in construction. Knowledge Generation, Communication and Management: KGCM-15, Orlando.