Review Comment:
The manuscript addresses the integration of heterogeneous data from different sources in the domain of Cultural Heritage (CH). This domain presents particular challenges, such as the need for machine-understandable rights statements for digital artifacts and the preservation of contextual information during data integration. According to the authors, the manuscript presents two main contributions: (i) CACAO, an ontology extending CIDOC-CRM for rights and context, along with an accompanying rights vocabulary; and (ii) ARTKB, a knowledge graph that utilizes CACAO. The CACAO ontology aims to serve as a shared data model among different CH institutions.
The work is situated in a relevant context, namely the construction of a central digital infrastructure for Cultural Heritage based on the digitization of collections from Galleries, Libraries, Archives, and Museums. This aligns with important initiatives such as the European Collaborative Cloud for Cultural Heritage. In this context, the need for data integration and the establishment of policies for use, reuse, and access naturally arises.
According to the authors, although conceptual models such as the Europeana Data Model and the CIDOC Conceptual Reference Model (CIDOC-CRM) exist, they tend to be abstract in their definitions and often require specialization for specific domains. Furthermore, the conditions for (re)use of digital artifacts are typically expressed in plain text, which may lead to misinterpretation and prevents machine interpretation.
Thus, the work aims to integrate data from REEVALUATE CH institutes (and external initiatives) into a central knowledge base. CACAO is intended to represent contextual information associated with artifacts as well as terms for rights statements, while ARTKB serves as the central knowledge base for REEVALUATE.
Despite the contributions of the manuscript, some issues need to be addressed:
1) Regarding Section 3.1, its role in the manuscript is unclear. None of the described top-level ontologies appears to have been directly used in the work. Therefore, it is not evident why they are presented in detail. This section could likely be removed without loss to the manuscript.
2) Section 3.6 appears to describe methodological steps of the ontology engineering process, particularly the identification of ontologies for reuse. Such content is typically part of the early stages of ontology development and would be more appropriately placed in a methodology section rather than under “Related Work.”
3) Similarly, Section 3.7 also seems to address methodological aspects and therefore may not belong in the “Related Work” section.
4) In my opinion, the methodological aspects related to the CACAO development process need to be strengthened. The description could be structured according to the activities and expected outputs of the Linked Open Terms (LOT) methodology:
4.1) The issues mentioned in points 2 and 3 above could be incorporated into a dedicated methodology section.
4.2) The development process could be illustrated with the explicit inputs and outputs of each activity carried out during the CACAO development process.
4.3) In the first paragraph of Section 4, the authors state that “CH experts were consulted to formulate both functional and non-functional requirements.” However, the four requirements presented are not clearly classified into these categories. What are the non-functional requirements? Additionally, it is somewhat surprising that only four requirements were identified after consulting domain experts and ontology engineering specialists. One would expect a larger set of requirements. What explains this limited number?
4.4) The process of reusing CIDOC-CRM and ODRL, as well as the alignment/mapping effort between them, is not clearly described as a step of the CACAO development process. The ontology engineering literature provides established approaches for ontology alignment and mapping that could support and better justify how CACAO was constructed from these sources.
4.5) Regarding the alignment/mapping effort, most relationships appear to be modeled as subClassOf. Were other types of relations considered (e.g., equivalence, sameAs, overlap, covering)? If not, are they unnecessary in this context?
4.6) In addition to diagrams, mapping tables could be useful. These are commonly used to document correspondences between ontologies’ concepts and often serve as the basis for diagram construction.
5) Regarding Table 1:
5.1) Should not all competency questions (CQs) be related to requirements R3 and R4? These requirements appear to have a transversal nature, unlike R1 and R2. It is unclear whether R3 and R4 should be associated with all CQs or only some. Additionally, in cases such as CQ7 and CQ8, can R3 and R4 alone adequately capture the requirements? Could these be considered non-functional requirements?
5.2) To clarify the conceptual coverage achieved during the CACAO development process, an additional column could be added to Table 1 indicating which ontologies (CIDOC-CRM, ODRL, or CACAO) address each high-level concept. This would show how each ontology contributes to answering each CQ.
6) The authors could also discuss the benefits and drawbacks/difficulties of reusing and aligning/mapping ontologies for designing a third one. Besides the benefits, such processes often involve trade-offs and may introduce unintended side effects.
7) The design process of ARTKB is an important aspect of the work and, given its relevance, deserves more detailed treatment. In particular, the process of linking Wikidata entities to CACAO concepts and relations should be better explained. Was this process manual, semi-automatic, or automated? The manuscript describes what was done, but says little about how it was done.
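To make points 4.5 and 4.6 concrete, the sketch below shows one way a mapping table could be kept in machine-readable form and summarized by relation type. It is only an illustration of the suggestion, not a description of how CACAO was built: every row in `MAPPINGS` is a hypothetical correspondence, the `cacao:` class names are invented, and the helper names `relation_profile` and `to_table` are my own.

```python
from collections import Counter

# Hypothetical correspondences, for illustration only. The cacao: class names
# and the pairings themselves are invented and are NOT taken from the manuscript.
MAPPINGS = [
    {"source": "cacao:RightsStatement", "relation": "rdfs:subClassOf",
     "target": "odrl:Policy"},
    {"source": "cacao:ContextNote", "relation": "rdfs:subClassOf",
     "target": "crm:E73_Information_Object"},
    {"source": "cacao:Artifact", "relation": "owl:equivalentClass",
     "target": "crm:E22_Human-Made_Object"},
]


def relation_profile(mappings):
    """Count how often each mapping relation is used in the table."""
    return Counter(row["relation"] for row in mappings)


def to_table(mappings):
    """Render the correspondences as a plain-text mapping table."""
    header = "source | relation | target"
    rows = [f'{m["source"]} | {m["relation"]} | {m["target"]}' for m in mappings]
    return "\n".join([header] + rows)


if __name__ == "__main__":
    print(to_table(MAPPINGS))
    print(dict(relation_profile(MAPPINGS)))
```

A table of this kind would let readers see at a glance whether relations other than subClassOf were used, and could serve as the source from which the alignment diagrams are derived.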
In summary, the manuscript presents a valuable contribution in a relevant context and is generally well presented. However, in my opinion, some issues need to be addressed.