Review Comment:
This paper describes the design choices made in, the methodology followed in, and the lessons learned during the ArCo project. The outcome of this project is a knowledge graph of the Italian cultural heritage. The paper perfectly fits the CfP of the Special Issue Cultural Heritage 2019. Furthermore, ArCo appears to be a fairly mature project with good, well-refined design and testing strategies.
I have to start by saying that my overall evaluation, "Major revisions", is mainly motivated by the fact that the paper lacks a deep ontological analysis of the proposed modeling choices and a clear comparison with other ontologies, two quite serious flaws in my view (see my main comments (2) and (3)). Furthermore, I do not really see the scientific novelty of this work (main comment (1)), and the quality of the writing has to be improved (main comment (4)). That said, from an engineering perspective the ArCo project introduces and follows some interesting methodologies that help in building and evaluating complex ontological resources. If this aspect is considered fundamental for this special issue, I am willing to change my evaluation to "Minor revisions".
--------------
MAIN COMMENTS:
--------------
I have four main criticisms.
(1) I understand the *applicative* significance of the project, but I do not see its *scientific* novelty. The methodology followed in the project (the eXtreme Design methodology), the evaluation criteria for the developed knowledge graph, and most of the design patterns used in the paper have already been presented and discussed in published papers. Granted, all these general tools have been used, in the specific domain of cultural heritage, to build the ArCo knowledge graph, but honestly I do not see how the analysis carried out in this work has any general impact on them.
(2) Section 6 describes in detail the evaluation of the ArCo knowledge graph in terms of the structural, logical, and functional dimensions. I appreciate this effort; however, this analysis provides an evaluation in absolute terms, i.e., one knows how ArCo performs according to a given evaluation scale, but not how other ontologies perform with respect to the same dimensions. The only comparative analysis concerns terminological coverage, where ArCo is compared with EDM and CIDOC-CRM. But EDM and CIDOC-CRM are reference models that, *by design*, contain only a few general notions, so this comparison is not really informative. More importantly, a deep analysis of the different ontological choices adopted by these ontologies (e.g., in terms of the classes and properties taken into account) is almost entirely lacking. The high-level discussion in section 3 allows the reader to understand neither the main conceptual/ontological disagreements (at the level of granularity of CIDOC-CRM and EDM) nor the adequacy of these choices for representing cultural heritage data (the authors consider only a few examples, for instance, the representation of the change of location of a cultural property). It is unclear whether ArCo, EDM and CIDOC-CRM are consistent: the authors claim that "ArCo is aligned to both CIDOC-CRM and EDM" (p.7), but they provide no details on this alignment. Indeed, it is even unclear whether ArCo fully complies with the standards considered by ICCD (the Central Institute for Catalogue and Documentation) or whether some adjustments are required.
(3) Several *fundamental* notions are not analyzed in detail. The notion of "Cultural property", probably the main notion in ArCo, lacks any ontological analysis. In section 5 (and figure 14) the authors claim that fossils and scientific devices can be classified under (appropriate specializations of) CulturalProperty. At the same time they claim (sect. 5.1) that tangible cultural properties are physical objects. CulturalProperty then seems to be a sort of *role*: before being discovered and classified, fossils are physical objects that are not yet cultural properties, and a scientific device becomes a cultural property only when its importance in the history of science is recognized. In sect. 4.3.2 the authors claim that cultural properties can be "commissioned". This seems to suggest that, before their realization, some commissioned cultural properties have no physical counterpart. For instance, a statue intended as a cultural property could exist without any physical realization. However, the link between a cultural property and its (physical) realization(s) is not considered at all in the paper, even though the way this link is represented is a notoriously delicate modeling aspect that can affect the whole system.
The notion of Situation itself, re-used several times in ArCo, is not clarified. In particular, I do not understand whether situations are just reifications of propositions, e.g., the reification of a statement like R(a,b,c,d) (where the arguments can belong to given classes, e.g., they can be temporal arguments), or whether they are linked to more complex mechanisms. I often have the feeling that Situations are introduced mainly to overcome the OWL limitation concerning n-ary predicates (with n>2), but the authors are not explicit on this point. If so, what happens to binary predicates: are the propositions R(a,b) also systematically represented via Situations? Indeed, in sect. 4.3.1, the authors claim: "Dynamic concepts, such as situations that change over time, are present in every domain". They thus seem to suggest that the same situation can change over time. This merits a better explanation.
Sometimes the authors introduce complex notions without any explanation. For instance, in fig. 11, they refer to technical characteristics as "arguments" of a situation, but their ontological nature is not discussed. At other times the authors' explanations are quite confusing. Take, for instance, the new notion of recurrent event. At first they seem to claim that recurrent events are events; then they claim that recurrent events have different events as instances (end of p.14); while in figure 12 recurrent events appear to have events as members, i.e., they are a sort of collection of events, and they are also situations. This surely does not help the reader understand the modeling choices adopted in ArCo.
Sometimes the description of the adopted model is very partial. For instance, fig. 8 illustrates how several versions of the same catalogue record can be represented, but it is not discussed how a given version of the record is linked to the data contained in the record. Furthermore, the nature of the hasCatalogueRecordVersion relation is not analyzed. Is it simple parthood? Composition? The same applies to the member relation discussed above in the case of recurrent events. In general, the relations in the conceptual schemas are never linked to the more abstract relations that, as stated by the authors, are present in the Core module (at least they mention parthood) and, I imagine, in the ODPs.
Personally, I would be interested in a deeper conceptual/ontological analysis.
(4) The paper is not well written. First, its structure can be improved.
(4.1) Section 2 and Section 6 are quite verbose.
(4.2) Section 4 mixes methodological, conceptual, and implementation aspects. I would really prefer these aspects to be kept separate. Integrating the description of the conceptual/ontological choices (design patterns) in section 4 with the examples in section 5 would make these choices easier to understand. Furthermore, I find section 5 quite heavy and not very informative: (a) the SPARQL queries are quite trivial and add nothing; (b) there are several examples, each touching a different representation problem, whereas I would really prefer a single (more complex) example touching all/several modeling problems; (c) the OWL definitions are machine-readable but not really human-readable, and graphical definitions would improve the readability of the paper (as done for the conceptual schemas).
(4.3) Section 3 suffers from problems analogous to those highlighted for Section 4.
(4.4) Section 7 reports rather predictable lessons and, again, mixes different aspects.
Furthermore, in several places the language seems quite imprecise to me, which makes several sentences hard to understand (see the comments below).
------------------
MINOR COMMENTS:
------------------
(p.2) LOD is used without being introduced first
(p.2) "The semantics emerging from ArCo"
--> what does it mean?
(p.2) "by distinguishing knowledge of cultural entities and their context, versus the knowledge dynamically assembled in catalographic records. The first type creates a CH ontology unprecedented in its depth and latitude, while the second allows to trace the epistemological aspects of CH catalogues"
--> very confusing sentence
(p.3) "the epistemological perspective on cultural properties as opposed to their ontological perspective".
--> this distinction is poorly explained throughout the paper.
(p.3) "In order to govern a knowledge graph development process able to address requirements from a diversity of potential consumers, to provide a rich expressivity, and to preserve high quality and easiness of reuse, ArCo follows a pattern-based ontology design methodology named eXtreme Design (XD) [9, 10], and has contributed to extend and improve it"
--> this comes at the end of section 1 after discussing some examples, why is it needed at this point?
(p.3) "the model redundancy by representing same concepts as both n-ary and bi-nary relations"
--> I don't really understand this remark, it is not explicitly taken into account in the rest of the paper
(p.4) XSD is not introduced
(p.4) "Cataloguing cultural heritage is the process of identifying and describing, through metadata, information resources"
--> is the process of cataloguing really reducible to identifying and describing information resources through metadata? Is there no link to the (physical) object?
(p.4) "(see Figure 2."
--> (see Figure 2).
(p.5) "Many parts of these standards are similar, and a recent effort to map these sections was made"
--> which sections?
(p.6) Concerning fig.4, in my view the merging of MSTL+MSTD+MSTS into MSTL is not the most interesting aspect. The fact that MSTS shifted from "sede espositiva" (exhibition venue) in v3.00 to "note" in v4.00 is more interesting and has a big impact on the data. Do the developers have a motivation for this shift?
(p.6) "Along with the publication of LOD collections, ontologies representing the CH domain are being modelled"
--> what does it mean to model an ontology?
(p.7) "EDM has an object-centric approach, where the cultural property is directly connected to its features, hence reducing the possibility to express temporal and contextual information."
--> is this due to the limitation of description logic (DL) languages, which allow only binary relations? I think this is an interesting aspect that has not been discussed enough in the paper. It seems to me that many conceptual choices depend on this limitation.
A similar remark also appears at the end of p.7: "Nevertheless, the adopted ontologies only capture a subset and a simplified encoding of the available information about a cultural property because they prefer a lightweight modelling i.e. based on binary relations, as opposed to more complex predicated, e.g. n-ary relations."
--> (1) "predicated" should probably be "predicates"; (2) if I understand correctly, all the *predicates* (belonging to the logical language) are at most binary; the difference concerns the possibility of reifying assertions about n-ary relations into the domain of quantification, where they are linked to their components (involved objects, time, location, etc.) by means of binary predicates.
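To make point (2) concrete, here is a toy sketch in Python (not OWL syntax), under the assumption that assertions are stored as (subject, predicate, object) triples. The relation and role names ("LocationSituation", "involvesProperty", "atPlace", "hasStart", "hasEnd") are hypothetical and are not ArCo's vocabulary. A 4-ary assertion is turned into a new individual that is linked to each of its components by a binary predicate only.

```python
# Hypothetical illustration: names below are invented, not ArCo terms.

def reify(relation, situation_id, roles):
    """Turn one n-ary assertion into binary triples about a new individual."""
    triples = {(situation_id, "rdf:type", relation)}
    for role, value in roles.items():
        # each component is attached to the situation by one binary predicate
        triples.add((situation_id, role, value))
    return triples

# R(a, b, c, d): "statue_1 was located in museum_X from 1950 to 1980"
triples = reify(
    "LocationSituation",
    "situation_1",
    {"involvesProperty": "statue_1", "atPlace": "museum_X",
     "hasStart": "1950", "hasEnd": "1980"},
)

# Recover the components of the situation from the binary triples.
components = {p: o for s, p, o in triples
              if s == "situation_1" and p != "rdf:type"}

for t in sorted(triples):
    print(t)
```

The point of the sketch is that the language never contains a 4-ary predicate: the n-arity lives in the domain, as an individual, which is exactly what the paper should state explicitly.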
(p.7) "CIDOC-CRM is a richer model than EDM and has an event-centric approach, where many of the features expressed as object properties in EDM are modelled using an event"
--> please clarify what this means, perhaps through an example. In addition, it would be interesting to understand how the CIDOC-CRM notion of "event" differs from the notion of "situation" endorsed in ArCo.
(p.7) "ArCo needs to satisfy a significant number of modelling issues, overlooked by other ontologies so far, such as the diagnosis of a paleopathology and (...)"
--> are these classical problems? If so, why not consider them later in a separate subsection, comparing how ArCo addresses them with what has been done in other ontologies?
(p.8) "CQs and constraints represent the project’s ontological commitment"
--> in what sense do CQs represent ontological commitments? For instance, consider "CQ7: When a cultural property has been located in a place?" This is a natural-language sentence that must be translated into a formula of the language adopted in the project. However, there are several possible translations that may presuppose completely different ontological commitments. For instance, one could assume a 3D/endurantist view, where cultural properties are objects that are wholly present at different times, or a 4D/perdurantist view, where cultural properties have different temporal slices at different times and where these temporal slices, rather than the whole 4D worm, are the subjects of location.
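To illustrate, the following toy Python sketch (hypothetical names and data, not ArCo's encoding) answers CQ7 under both readings. The answers coincide, but the subject of the location relation differs: the whole object in the first case, a temporal slice in the second. This is precisely the ontological commitment that the CQ alone leaves open.

```python
# Hypothetical data; names are invented for illustration, not ArCo terms.

# 3D/endurantist reading: the whole object is the subject of location,
# and time is an additional argument of the location relation.
located_3d = {("statue_1", "museum_X", 1950), ("statue_1", "museum_Y", 1990)}

# 4D/perdurantist reading: temporal slices are the subjects of location;
# each slice belongs to one object (the 4D worm) and is indexed by a time.
slice_of = {"statue_1@1950": ("statue_1", 1950),
            "statue_1@1990": ("statue_1", 1990)}
located_4d = {("statue_1@1950", "museum_X"), ("statue_1@1990", "museum_Y")}

def when_located_3d(obj, place):
    """CQ7 under the endurantist encoding."""
    return sorted(t for o, p, t in located_3d if o == obj and p == place)

def when_located_4d(obj, place):
    """CQ7 under the perdurantist encoding: go through the object's slices."""
    return sorted(slice_of[s][1] for s, p in located_4d
                  if p == place and slice_of[s][0] == obj)

print(when_located_3d("statue_1", "museum_X"))  # [1950]
print(when_located_4d("statue_1", "museum_X"))  # [1950]
```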
(p.8) "The key aspect of XD, and in general of pattern-based design, is the ability to match CQs to ODPs."
--> the methodology for this matching is not described in the paper, i.e., it is not at all clear how, and on the basis of which considerations, the ODPs discussed in the paper have been chosen. I think this is an important aspect that needs to be stressed, because these choices clearly affect the success of the whole project.
(p.8) KG --> the abbreviation has not been introduced before
(p.8) "(iv) a set of user stories"
--> I think that a clarification about what kind of user stories has been considered in the project would be useful to better understand the input
(p.9) "Catalogue records have their own ontological characteristics and relevance into the CH domain. They are about cultural properties, hence they encode their epistemic perspective."
--> I am not sure I understand this reference to the "epistemic perspective". First, all the data concerning an object can be understood as epistemological: they always pertain to what we know about the object, even though these data can be collected in a more subjective or inter-subjective way. The authors seem to assume that "intrinsic aspects such as length, weight, materials, and conservation status" are somewhat objective, but the weight of a small statue can be evaluated approximately and qualitatively in the field by an archeologist without a scale, or with a scale of very rough resolution. The weight of this statue can subsequently be re-evaluated with a more precise scale, etc. Conversely, according to the authors, dating is among the subjective qualities, but carbon-14 methods are quite standardized. I therefore do not see why there are different representations for these two kinds of qualities of objects (denotative description vs. context description).
Second, it seems to me that the authors are mixing knowledge about the object and knowledge about the record. For instance, assume that a new version of a record is introduced by agent A at time T. This information concerns neither the cultural property nor the data about the cultural property. Indeed, the information represented by the new version may concern the state of the cultural property at time T' (different from T) according to agent A' (different from A).
(p.10) "A catalogue record is an entity that describes a cultural property. As it denotes a real-word object, it can be defined as an information object"
--> are "describe" and "denote" used here as synonyms?
(p.10) "The content of a catalogue record, i.e. the description of a cultural property, can change: “information about the creation of a catalogue record and possible following computerisation, update and corrections”.
--> is it the content of a record or the record itself that changes (see previous comment)? Second, what is computerisation? Intuitively, it seems to me that this aspect concerns the realization of the record not its content.
(p.11) "The Time Interval ODP is implemented to represent the temporal validity of each version"
--> implemented?
(p.12) "For an immovable cultural property (e.g. a monumental park), this place overlaps with the area occupied by the cultural property, and to which it is fixed."
--> it seems to me that the same holds for a movable object
(p.12) spacial --> spatial
(p.12) "express the site (intended as a physical building) and the geographical entity involved in the situation"
--> what is the exact difference between a site and a geographical entity?
(p.12) "Each situation defines a contextual relation between the cultural property and the other entities involved."
--> in what sense does a situation represent only *contextual* relations?
(p.14) "For example, a artwork technical description may be defined as the relation between constituting material, employed technique, and shape."
--> please explain: in what sense is a description a relation?
(p.17) "This step focuses in checking the inferences caused by the ontologies"
--> caused?
(p.19) "As depicted in Figure 14, the concept of :CulturalProperty is modelled as a partition of two classes: :TangibleCulturalProperty and :IntangibleCulturalProperty"
--> I don't see how fig. 14 depicts this *partition*
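For clarity, a partition requires two conditions: the subclasses are pairwise disjoint, and together they cover the superclass (in OWL 2 this is what owl:disjointUnionOf expresses). The toy Python sketch below checks both conditions over extensional sets; the individuals are hypothetical and not ArCo data. A figure claiming a partition should convey both conditions.

```python
# Hypothetical extensions (sets of individuals); not data from ArCo.

def is_partition(whole, parts):
    """True if the parts are pairwise disjoint and their union equals the whole."""
    seen = set()
    for part in parts:
        if seen & part:          # overlap with an earlier part breaks disjointness
            return False
        seen |= part
    return seen == set(whole)    # coverage: nothing in the whole is left out

cultural_property = {"statue_1", "fossil_7", "festival_3"}
tangible = {"statue_1", "fossil_7"}
intangible = {"festival_3"}

print(is_partition(cultural_property, [tangible, intangible]))                 # True
print(is_partition(cultural_property, [tangible, {"festival_3", "statue_1"}]))  # False (overlap)
print(is_partition(cultural_property, [tangible]))                             # False (no coverage)
```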
(p.23) Is the evaluation of the structural dimension of the KG also carried out during the development process, as it is for the logical and functional dimensions?
(p.24) "Figure 15 shows the top-50 ranked classes based on the number of individuals they have in the knowledge graph"
--> why is this relevant?
(p.25) "a high number of to leaf classes"
--> delete "to"
(p.26 l14) "the the value"
--> then the value
(p.27) "on a scalse"
--> on a scale
(p.29) "Additionally, using available ontologies as input to generate new ontologies is a difficult process, far from being automated [34]"
--> [34] was published more than twenty years ago