Review Comment:
The paper reports and reflects on the experience of the authors in applying the eXtreme Design (XD) method to develop ArCo, a knowledge graph in the cultural heritage (CH) domain. ArCo has been developed in a homonymous project in cooperation with the Italian Central Institute for Catalogue and Documentation.
The paper discusses how XD was tailored for the ArCo project and how these adjustments helped the authors overcome some of the method's limitations, such as the lack of test automation tools and the lack of guidelines for defining an ontology's architecture. A big chunk of the paper presents how certain ontology patterns were chosen to meet some of the requirements defined for the ArCo ontology, while discussing the respective implications.
This revised version of the manuscript has addressed the most relevant concerns identified in the first round of reviews. The structure of the paper makes a lot more sense now and many previously overlooked issues have been addressed, such as the ontological foundations of the ArCo ontology, the contextualization of the large number of metrics provided in the evaluation section, and a clear analysis and comparison with related work. Additionally, it is a lot easier to understand the rationale for selecting and applying the ODPs, since all patterns are now presented alongside examples and shown in graphical form. All that being said, I must also recognize that the paper is a little tough to digest, particularly because of its length (43 double-column pages make for quite a lengthy paper).
My general assessment is that this is a very good paper for practitioners, as it discusses a lot of methodological aspects often overlooked/hidden in papers that describe an ontology or a knowledge graph. It is even more useful for institutions that are considering or starting to embark on a linked data journey, who are potential "customers" of the ArCo ontology.
My main criticism regards the use of two DOLCE variants, DOLCE UltraLite and DOLCE-Zero, as the foundation for the ArCo ontology, as they go in the opposite direction of what state-of-the-art foundational ontologies (e.g. BFO, UFO, the original DOLCE) advocate. DOLCE-Zero, in particular, simply throws away useful ontological distinctions by creating union classes of core disjoint classes. Moreover, the argument in favor of co-predication is weak and based on the assumption that DBpedia has a good ontology design, which it simply does not. While most discussions and lessons learned in this paper are very useful for practitioners, I think this aspect is actually a disservice. It seems that the authors conflate conceptual ontological concerns with the practical implementation limitations of RDF/OWL.
Please find my arguments for this criticism and other related issues below.
Page 13, Line 30. “For example, the Uffizi in Florence can be categorized as a Building (physical object), a Museum (a social object), and a relative Location (a spatial region)”. I strongly disagree with the authors’ argument for supporting co-predication, particularly with the provided example. The Uffizi as an organization is not the same entity as the Uffizi as a building. This is a typical example of systematic polysemy, i.e. a word being used with different (but often interconnected) meanings. Good ontological modeling advocates exactly the opposite, i.e. disentangling the different concepts collapsed into a single term. It surprises me quite a lot that the authors see these distinctions but intentionally choose to merge them. The fact that DBpedia would generate millions of inconsistencies is not an argument in favor of their modeling choice, but evidence that DBpedia is not properly modeled. In sum, the authors started with a strong foundational approach with DOLCE, but are throwing its value out of the window with DOLCE-Zero. If ambiguous definitions and minimal ontological commitment are what the authors are looking for, why use a foundational ontology in the first place? Please note that I realize that one of the authors of this paper is one of the creators of DOLCE, which only makes this approach even more astonishing to me.
Page 14, Line 7. I find the authors’ strategy of using core:Situation to capture dynamically changing information that needs to be time-indexed perfectly reasonable and useful. However, their explanation of the ontological nature of their notion of situation is quite confusing and should certainly be revisited. First, the argument based on the same cognitive structure of n-ary relations and events (and event types, actions…) is a little cryptic. How can an event and an event type have the same cognitive structure? For one, the former is an individual and the latter is a type. Is the authors’ proposal to treat both an event and an event type as a situation? If so, how can an event type be a situation? Second, core:Situation is said to be equivalent to d0:Eventuality, which in turn is the union of dul:Event and dul:EventType. It seems that the authors use core:Situation to mean some sort of stative event, i.e. a kind of static event that holds throughout a certain period: for instance, the state in which I am a father (e.g. playing the role of Father), or the state in which the Mona Lisa is located at the Louvre. Thus, I fail to see why the authors made core:Situation equivalent to the union of dul:Event and dul:EventType, instead of simply making it equivalent to dul:Event. The second paragraph of Section 4.4 is further evidence for my point here. All the concepts mentioned, namely E4 Period, E5 Event, E3 Condition State, and E2 Temporal Entity, seem good candidates to specialize dul:Event, but certainly not dul:EventType.
Page 15, Line 15. “... ArCo situations do not commit to the distinction between objects and events as applied in DOLCE”. I’m not sure what is meant by this phrase, but if the authors mean that they do not want to make the distinction between events and objects (or perdurants and endurants; occurrents and continuants), why have they picked DOLCE in the first place? I’m not convinced by the argumentation in the paper that it makes sense to pick a foundational ontology that adheres to a 3D view of the world and then adopt this constructivist approach (which seems similar to a 4D stance).
Page 15, Line 39. “Once an entity is recognized as being part of cultural heritage, it never stops being a :CulturalProperty. For example, a commissioned artwork is not an instance of ArCo’s :CulturalProperty, unless or until it is officially recognized as such. Hence, according to the definition by [47], being a cultural property is an essential characteristic of all instances of :CulturalProperty”. When Guarino and Welty [47] say that a rigid property is essential to all its instances, they mean that it holds for them at every possible point in time, not only after the individual instantiates the property for the first time. It is not like the addOnly constraint that existed in a previous version of UML. If things need to be recognized as cultural heritage, then there must necessarily be some point in time at which they are not cultural heritage, which makes the property anti-rigid. Take the example of a photograph P (given on page 16). It is essential for P to be a photograph from the moment it comes into existence to the moment it is destroyed. Thus, being a photograph is a rigid property. However, no photograph comes into existence already as a cultural property because, as the authors explain, that status requires external recognition. Thus, being a cultural property is an accidental property of P, which makes it an anti-rigid property. If the authors want to implement :CulturalProperty in their OWL model as a rigid class, that is a design choice, but it does not make the concept of cultural property rigid.
============
Please find some minor comments below, like typos and simple suggestions for improving the manuscript.
Page 1, Line 36. "lessons learned during ArCo development" => ArCo's development
Page 2, Line 16. "a resource that contributes to this vision by..." It is not clear which vision the authors are referring to. Do they mean the recent trend of cultural institutions publishing open data?
Page 2, Line 25. "liked data projects" => linked data projects
Page 2, Line 30, Left. Please consider using a bullet list to improve legibility.
Page 3, Line 31. "Section 8 discusses relevant related work and Section 7 summarises the lessons learned..." The sections are mentioned out of order.
Page 4, Line 36. "The quality of the database..." Quality in which sense? Good design, accuracy, completeness? The same vague term is used at the beginning of section 2.2.
Page 4, Line 25. "a PDF document that contains, as shown in Figure 2: a table listing..." => as shown in Figure 2, a table
Page 4, Line 46. "For each of the 30 typologies..." => the 30 types
Page 6, Line 29. “Their adoption guarantees a high level of the overall ontology quality, and favor its re-usability [21]”. Claiming that their adoption guarantees quality is too strong (and not proven). The cited paper indicates a positive correlation between the use of ODPs and some aspects of ontology quality.
Page 6, Line 45. “A very recent and promising contribution to fill this gap is CoModIDE [23]…”. Has this tool been used in the ArCo project? It seems like something that would make sense. However, if it hasn’t, I don’t see the point of mentioning its details.
Page 6, Line 11. “Experiments have proved its positive impact on ontology engineering and ontology quality [8, 23].” This claim is also exaggerated. How about “indicated”, “suggested”, or “demonstrated”?
Page 6, Line 32. “…each involving one or more teams: a customer team,…”.
- Please consider using bullet points to improve readability.
- Are members of these teams expected to be from the “customer side”?
- Does the XD method say something about people participating in multiple teams? E.g. a person who moves between the customer and the design teams. Is it encouraged, discouraged, forbidden?
Page 7, Fig 4. Not all steps reported in the figure are properly explained in the text. I’m left curious, for instance, about what happens during project initiation. Is there something in particular that is done with the domain experts? The steps that are not covered are project initiation, data production, release, and versioning.
Page 7, Line 28. “An example of simple user story is…”. I would appreciate some further details on what a user story itself should look like in XD in general, and what it looked like in this project in particular. The example in Fig. 5 is a description of an individual that should be handled by the ontology. It looks very different, however, from how user stories are commonly used in software engineering, which is “As a <role>, I want <goal> [so that <benefit>]”. In fact, on page 10 we can see a story that is quite close to this template.
Page 8, Line 14. “Testing and Integration”. In software development, developers usually write their own unit tests. In test-driven development, in particular, developers are even encouraged to write tests before coding. Do the authors have anything to report on their experience in this project regarding the separation between designers and testers? Is it actually beneficial? I mean, how do the designers know their model satisfies the requirements if they don’t write tests for it? They could even break previous tests when making changes to the ontology.
Page 8, Line 34. “… make the test positive”. I may be ignorant of this terminology, but why not simply say the test broke or failed?
Page 9, Footnote 26. “All unit tests passed so far.” I don’t get this footnote. What are the authors referring to? That all tests must have passed, or that no unit tests were broken during regression testing? By the way, it may be useful to add a sentence explaining what regression tests are for lay readers (i.e. those without a computer science background).
Page 9, Line 9. “… and possible additional unit tests, on the whole ontology, after integrating the new piece”. Where would additional tests come from if all the others come from user stories?
Page 9, Line 16. “So far, we have described eXtreme Design (XD) according to [8, 9, 26].” Consider starting a new subsection here. Something like “Limitations of the eXtreme Design methodology”.
Page 9, Line 27. “(i) we opened the process of requirements collection in the style of open-source projects”. Could the authors elaborate on this? I don’t think most readers will know what requirements collection style they are referring to. At least a reference to a publication discussing this style would be appreciated.
Page 9, Line 41. “Furthermore, proposals for improvement and bugs can be submitted GitHub issues”. -> “via GitHub issues” or “as GitHub issues”
Page 10, Line 1. “A story is a non-structured text of maximum 250 characters…” I suggest rephrasing this to make it clear that this is the size adopted in the ArCo project, so that it is consistent with what was said before.
Page 13, Line 15. What does D&S stand for?
Page 14, Line 5. “While apparently this is a representation problem, …”. I would change this to “While this is a representation problem, …”
Page 14, Section 4.3. Isn’t this approach of duplicating information risky? First, it may cause inconsistencies in the data if only part of the data is entered. Second, if a functional property, like locatedAt, is derived twice from two situations that identify distinct locations of a cultural property at t1 and t2, would we generate a contradiction? Or are all the constraints on object properties removed? I recommend that the authors further motivate this strategy and argue why it was a good solution for their project.
Page 14, Line 30. "CIDOC CRM E5 Event, subclass of E4 Period, is defined as ….”. What do E4 and E5 before the class names mean?
Page 18, Line 1. Reflection: Did the authors really need to port the concept of a cataloging record to the linked data world? Wouldn’t it be enough to bring in the data it contains? The record itself reflects an old way of keeping track of information about cultural properties.
Page 20, Fig 10a. Show an rdfs:subClassOf arrow from cis:Site to clvapit:Feature to help the reader realize that onSite is a subPropertyOf atLocation (as stated in the text).
Page 21, Fig 12. Isn’t Interpretation missing a relation with the agent who made it (as mentioned in the text)?
Page 22, Fig 13. The coin example made me wonder: Do the authors mean that the specific coin has Calandra Davide as an author or that this is an instance of coin designed by him? For instance, when we say that the Euro coins were designed by Luc Luycx, we mean the different types of coins, not each specific coin that has been produced.
Page 22, Line 23. “… the file format for a digital photograph (e.g. “.gif”, “.jpeg”),…” The authors first state that technical statuses only apply to physical cultural properties. Then, they provide an example in which a technical status is a digital file format applying to a digital photograph, which is not a physical cultural property.
Page 24, Line 4. "entity is neglected in literature". => in THE literature
Page 25, Fig. 15. Some of the relations used in the example are not shown in the pattern, namely: hasTimePeriod, hasImmediatedPreviousSituation, hasImmediatedNextSituation, hasTimePeriodBeforeNextSituation.
Page 24, Line 40. "Annotating reused patterns supports the identification of ontology alignments" and "...ODP annotations may ease the process to understand and explore an ontology". Could the authors provide any evidence for these claims? Or at least discuss in this section how this annotation was useful within the ArCo project (e.g. Did it help ontology designers, testers, or users?) Otherwise, if there is nothing to be said about this, maybe the authors could consider removing this section from the paper.
Page 26, Line 36. "...by means of indicators that might suggest quality weaknesses or strength". There is something off with how this was phrased.
Page 27, Line 29. "when missing, over test data generated using Fuseki" Fuseki is a SPARQL server, so how does it generate data? If the authors mean the testers manually created artificial data using Fuseki, I suggest they rephrase this. Otherwise, please provide an alternative URL for Fuseki's data generation feature.
Page 27, Footnote 71. I tried the testalod demo with the default values, but I got an application error. I simply clicked GO! and then TEST! on the COMPETENCY QUESTIONS feature. The same thing happened with the CONSISTENCY test. Maybe it's a Heroku problem?
Page 28, Line 7. "This is of utmost important to assess whether ArCo addresses its intended use, i.e. compliance to expertise." Compliance with expertise is a requirement, not a usage.
Page 29, Line 5. The keywords you listed are in Italian, while ArCo is in English (as far as I could tell), as are CIDOC-CRM and the Europeana Data Model. How was this analysis performed?
Page 31. Table 3. For the metrics that should be considered relative to another one (e.g. NoR and #Classes), I suggest showing the ratios instead, as it is done for NoC in the text (page 33).
Page 35, Section 7.1. Although I highly appreciate pattern-based methods and tools for ontology engineering, the authors' insistence on the need to annotate an ontology with the patterns used is unjustified. The benefits of such an annotation provided in the paper are either speculative or abstract.
Page 36, Line 7. "Effort are also being made" => Efforts are
Page 38, Line 34. "... changes of the physical location of a cultural property are represented by move events...". This has already been partially explained on the previous page. I suggest merging these two passages.
Page 39, Line 45. "In making this choice, a cultural heritage..." This paragraph is way too big.