Semantic models and services for conservation and restoration of cultural heritage: a comprehensive survey

Tracking #: 2647-3861

Authors: 
Efthymia Moraitou
Yannis Christodoulou
George Caridakis

Responsible editor: 
Special Issue Cultural Heritage 2021

Submission type: 
Survey Article
Abstract: 
Over the last decade, the Cultural Heritage (CH) domain has gradually adopted Semantic Web (SW) technologies for organizing information and for tackling interoperability issues. Several semantic models have been proposed which accommodate essential aspects of information management: retrieval, integration, reuse and sharing. In this context, the CH subdomain of Conservation and Restoration (CnR) exhibits an increasing interest in SW technologies, in an attempt to effectively handle the highly heterogeneous and often secluded CnR information. This paper investigates semantic models relevant to the CnR knowledge domain. The scope, development methodology, conceptualization aspects and expressive features of each model are described and discussed. Furthermore, the deployment of each model as part of a SW system is examined, with focus on the types and variety of services provided to support the CnR professional. Through this study, the following research questions are investigated: To what extent the various aspects of CnR are covered by existing CnR models? To what extent existing CnR models incorporate models of the broader CH domain and of relevant disciplines (e.g., Chemistry)? In what ways and to what extent services built upon the reviewed models facilitate CnR professionals in their various tasks? Finally, based on the findings, fields of interest that merit further investigation are suggested.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Vincenzo Lombardo submitted on 07/Feb/2021
Suggestion:
Major Revision
Review Comment:

The paper surveys ten selected projects of Semantic Web approaches to Conservation and Restoration (CnR) and aims at investigating three issues (coverage, re-use of existing models, deployment), concluding with the wish for an encompassing model geared to support decision making.
In general, the exchange of data between different CH institutions is a long-term challenge due to the complexity of managing CH data, and it is not limited to retrieval, integration, reuse and sharing of data.

The major comment is that the paper does not seem mature enough for journal publication.
The first problem is the selection of models: it should come out of specified criteria, now just sketched as "novelty". It can be demonstrated in the paper, e.g. by listing the models that followed the selected one in the field.

Second, the descriptions of the models are overly long and heterogeneous; either the reader already knows the projects (as is the case for me with very few of them) or it is difficult to understand the rationale behind the works presented. Indeed, though referring to different subdomains of cultural heritage, descriptions can be easily split between common features (that can span several subdomains) and subdomain-specific features, also discussing how such specificities constrain the whole design process.
In particular, the diversity of the terminologies makes the paper difficult to read. For example, workflows and projects are explained all together; sometimes it is difficult to differentiate the project from the related model; maybe each project could be structured in a different way, such as by highlighting similar and different components. Alternatively, figures and schemas could help the reader understand the workflow, especially for image-related projects such as ODPA-3DR.

Third, the paper should give more weight to its actual contribution, namely the analysis at the end of the paper. Also, this analysis should be linked better with the rest of the paper.
For example, in the first section of the paper, the authors briefly explain tangible heritage and mention the CH assets worth conserving. Later, in the conclusion section, the differentiation of movable, immovable and tangible heritage is at the centre of the discussion. In order to understand the differences between the terms, it is necessary to extend the definitions and to explain, at the beginning of the research, why the authors chose these specific issues. Maybe also examine UNESCO’s definition of Cultural Heritage (http://www.unesco.org/new/en/culture/themes/illicit-trafficking-of-cultu...)
Also, some interesting features can be put at the end of each section.

My suggestion is to revise the paper in the following way: introduce the knowledge about CnR at the beginning, with the possible subdomains and related subtasks; describe how the selection was carried out and from what corpus (to be briefly summarized); describe the selected models in a homogeneous way; analyze the model designs, by splitting between CnR general issues and subdomain-specific developments and including whether the design was successful and why.

Minor typos:

In general, the bulleted lists - i) ii) iii) - are sometimes hard to read when within sentences

p.1
classified in four -> classified into four

p.2
a SW system -> a SW-compliant system ?
cf. p.13 SW-enabled systems ???

p.7, 3.4. DOC-CULUTURE -> DOC-CULTURE

Section 3.5, the last paragraph needs to be referenced or may be linked to the project website to complete and ease the understanding of the reader.

3.7: please refer to the website of the GRAVITATE platform.

p.12: link of Footnote 22 doesn’t work.

p.16
were similarly employed -> were similarly employed.

Fig.2: dots/lines are very confusing

Review #2
By Andreas Vlachidis submitted on 11/Feb/2021
Suggestion:
Minor Revision
Review Comment:

The paper presents an extensive review of semantic models relevant to the Conservation and Restoration knowledge domain. It is well-written, well-motivated and serves as useful reference material on the development and use of the semantic models in the particular domain. Ten separate works have been reviewed and analysed according to a set of well-defined conceptualisation aspects: administration, material and technology, alteration, investigation, and intervention. The analysis is thorough and consistent across the set of reviewed works, and the study neatly summarises the findings using tables and illustrations before delivering conclusions on coverage of CnR aspects, re-use of existing models and provision of services. Therefore, I support its publication in the special issue of the Semantic Web journal.

Major Remarks

From the outset, the paper highlights how significant this study is for understanding the contribution of the models to conservation practice in relation to the various aspects (tasks) of CnR (page 1). These aspects are clarified in the methodology and become even more apparent on page fifteen and Fig. 1. It would be useful to name these aspects in the abstract and to add a pointer from page one to the methodology (page 3), to improve the cohesion of the discussion around the aspects and their contribution to the analysis as it unfolds on page 15. Most importantly, it would be useful to connect these aspects to the set of procedures that you mention on page 1 (i.e. research, investigation, CnR interventions, etc.). How do these procedures map to the aspects used in the paper, and what is their relation? Do the aspects concern the whole practice (procedures) or part of it? It would be useful to clarify the above to help readers unfamiliar with the domain appreciate the context better.

It is not clarified whether the authors have followed a systematic literature review or another research method for discovering the reviewed works. Did the process involve looking into journals and conferences, using keywords on search engines, or browsing academic databases? It would be useful to clarify this, as the paper delivers a useful review of a significant set of works but needs to defend the claim of being a “comprehensive survey”. Why should it be regarded as comprehensive? In addition, the selection criteria can be specified better. Do the models (works) satisfy all criteria or any of the three? It should be made clear that all criteria explicitly refer to the CnR domain. It reads as if criteria ii and iii are applicable to any domain.

Figure 1 on page 15 provides a very useful view of the findings in relation to the five conceptualisation aspects. However, it is not very transparent how the study arrived at these values. For example, Intervention receives 6 models (works), but on page 18 the study refers to 5 models (i.e. MDO, OP-PRA, CRMcr, CPM and HERACLES). Is this a typo or a more substantial flaw of the method? Perhaps it would be useful to provide a matrix (table) where you explicitly state which models contribute to which aspect.

Finally, it seems that seven out of the ten models have re-used some aspects of the CIDOC-CRM models. It would be very useful for the aims of this study, its contribution and impact to summarise in a paragraph or two any useful observations about how CIDOC-CRM is being reused in the domain. Are there any common entities and properties that seem to be more useful and attractive to the domain? Has it been used as the main conceptual baseline of the reviewed models, or has it simply been used in an ad hoc manner? It would be useful to have this insight, as other domains might follow similar practices when re-using CIDOC-CRM.

Minor Remarks

[8] presents 3.2c , better to mention the name of the author

[49] presents 3.5

It is a bit awkward to commence a new sentence and paragraph with a citation. Instead, use the name of the model/project if possible.

Review #3
By Johan Oomen submitted on 12/Feb/2021
Suggestion:
Minor Revision
Review Comment:

(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.

It is highly suitable. I would suggest also providing an introduction to the “SW-enabled systems” (Section 4.3) in the introduction. Now the paper mentions “a set of procedures are applied which can be classified in four categories” (pages 1-2), but it's unclear how this relates to the services provided (Fig. 3). More generally, I would expect the introduction also to mention how semantic technologies can be used for presentation/exploration.

(2) How comprehensive and how balanced is the presentation and coverage.

It is comprehensive and balanced. However, more detail needs to be provided in two areas:
On what basis have the research questions (page 2) been selected? Why are these the most urgent/impactful questions?
Section 3, “Models review”, “Deployment of the proposed model as part of a system”: more detail needs to be provided about (i) whether the deployment was evaluated by the projects, and the findings; (ii) the adoption of the models in practice. It's not clear at the moment what happened to the projects after their completion, and whether/where the works are currently being used. The column “Deployment” in Table 1 doesn’t provide sufficient clarity.

(3) Readability and clarity of the presentation.

The structure is clear. Spelling and grammar are good.

(4) Importance of the covered material to the broader Semantic Web community.

It's an important contribution for both the SW and Cultural Heritage communities. The literature is also excellent. I would advise the authors to add a Section to Chapter 4, to also evaluate the uptake/adoption of the models in CH practices. See above.

Review #4
By Valentina Carriero submitted on 27/Mar/2021
Suggestion:
Major Revision
Review Comment:

This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.

(1) The paper presents a survey on semantic models that are relevant to the Conservation and Restoration (CnR) Cultural Heritage subdomain. The authors address 3 research questions related to (i) the coverage of the domain by existing models, (ii) the reuse of existing models, (iii) the support to reuse and research through the services built upon the selected models.
IMO, the authors should address some issues in order to make this survey publishable.

(2) The authors selected 10 works, from 2011 up to today, that propose semantic models in the CnR domain and have also been used in SW systems/services.

- I find this last criterion of selection not totally convincing. I understand that the focus of the authors is also on services that support e.g. CnR professionals in their activities but, even so, I would have also included possible relevant ontologies that have not been used in any CnR-related service, but are still available online and reusable, and thus worth mentioning and describing to the reader. As a result of this criterion, for example, more general ontologies on the CH domain, which also model some concepts related to e.g. cultural object conservation interventions, have been excluded. I would at least devote a separate paragraph to these ontologies.
- For selecting relevant works, the authors have used Semantic Scholar, Springer Link, Science Direct and AATA Online. I wonder if they also used ontology repositories such as LOV, and, if not, I would like to know why.

- Links to ontologies, their namespaces when reporting some classes/properties, and links to online services are notably absent. The reader should not be made to find URIs themselves, even more so if papers are not openly available. I believe that in the context of a survey about ontologies and online services, links are as important as the references. As an example, for the ontology described in 3.1, I had to find the cited paper online, only to discover that the link in that paper (http://www.20thcpaint.org/oppra-owl/) can't be reached anymore. Another example: the section related to COSCHKR here (I found this link by googling) https://i3mainz.pages.gitlab.rlp.net/forschung/cosch/coschstillgelegt/# seems not to be accessible.
Thus, an important question to the authors: did they check whether the ontologies they selected based on the literature are still available online? I do not think that ontologies that are no longer available are worth mentioning here (or, at least, this should be clarified or addressed in a separate section).
- Related to the previous comment, the authors do not discuss whether the reviewed ontologies respect the FAIR principles: I believe that this is a relevant issue to consider, as the reader should at least be able to find and reuse the presented models.

- Each model is clearly described, wrt the project in the context of which it has been developed, the main concepts/areas modelled (including concepts reused from external ontologies), the related service/system.

- I find it very interesting that the authors identify the main "thematic clusters" of the ontologies, but I would like them to explain the criteria by which they "split" an ontology into these clusters (e.g. with respect to the level of granularity), unless the clusters have been presented by the authors of the ontologies themselves, as I understand is the case at least for the MDO ontology (3.2).

- The Discussion could be deeper. The authors could elaborate more on the motivations behind the results. It would be interesting to know "to what extent", with what granularity, each model addresses the cited aspects (Fig. 1). Moreover, the limits of the ontologies are not discussed in the respective paragraphs. In the Discussion, the authors say how many models address the main areas of the domain, while in each individual paragraph they say what is possible to model with each ontology, but they do not talk about clear limits related to e.g. how some concepts are modelled/important concepts that are not modelled.

(3) The paper is well written and very clear. The structure of the paper makes it easily readable, the sections describing the 10 models follow the same structure, which is a bit repetitive but has the advantage of making the models easily comparable and the paper "reader-friendly".

(4) The paper addresses an interesting topic, presenting ontologies, thesauri and online services related to a specific subfield of cultural heritage, thus potentially supporting those who need to reuse models or do research in this domain.

-- In the following, more specific comments, questions and recommendations to the authors. --
- Provide a clear definition of what tangible, movable and immovable CH are.
- The issues on page 2 ("However, up to now [...] in diverse ways") are issues of CH in general; I would state that explicitly.
- (sec 2) explain what exactly you mean by "first attempts".
- (sec 3.1) "According to [...] context of the 20thCPaint Project": this paragraph is a list with nested lists and is not easily readable.
- (sec 3.1) "This combination aims to be a representation that is both understandable for conservators and consistent for the Material Science and/or Chemistry community" --> do you mean that they are redundant by modelling the same info in 2 different ways? If so, this is interesting.
- (sec 3.1) "The system allows [...] the uploading of the experimental data to the knowledge base" --> who can upload the data? Is uploaded data validated?
- (sec 3.2) "after being validated" --> by whom? Wrt what?
- first paragraph of sec 3.4: I would split it and put "(Development of an integrated information environment for assessment and documentation of conservation interventions to cultural works/objects with nondestructive testing techniques)" out of parentheses
- (sec 3.5) "Argumentation": is this a kind of interpretation process?
- (sec 3.5) it is not clear to me what you mean by "can be described using the DescriptionConcept class (e.g., Architectural-Component shapeByUsing BuildingTechnique, BuildingTechnique hasMaterial Material)"
- (sec 3.6) is the PARCOURS system an aggregator?
- (sec 4.1) why are Administration and Material&Technology not discussed?
- (sec 4.3) "semantic search and data integration are the most popular among the services provided, while visualization follows" --> why does visualization follow, if it has the same value as data integration (6/10)?
- (sec 4.3) I would also discuss "semantic annotation", since it is implemented in 5 projects (only one fewer than data integration and visualization)

-- minor comments --
I would use either italic or quotes, not both of them (see e.g. footnote 12)
Page 2: "CnR data can be found in various forms structured (e.g." --> forms, structured (e.g.
Page 2: (e.g., 59, 79) --> (e.g., [59, 79])
Footnote 2: As [78] mention the term fresco --> As [78] mentions, the term fresco
Page 4: "(implemented in OWLIM (currentGraphDB)" --> I would not use nested parentheses
Title of section 3.4 DOC-CULUTURE --> DOC-CULTURE
Footnote 22: the link is broken