Cultural Heritage Information Retrieval: Data Modelling and Applications

Babak Ranjgar
Abolghasem Sadeghi-Niaraki
Maryam Shakeri
Soo-Mi Choi
Fatema Rahimi

Mehwish Alam

Survey Article
Knowledge organization and the development of better information retrieval techniques have been of great importance since very early in human history. The need for such systems has grown with the advent of digitization and the web era. Computer systems and the web have offered easier retrieval of information in almost no time. However, as the amount of data increased, these systems were not able to work well in terms of accuracy and precision of retrieval. The Semantic Web concept was introduced to overcome this issue by converting the web of documents to a web of data. Semantic Web technologies make data machine-understandable so that information retrieval can be more precise and accurate. The Cultural Heritage (CH) community is one of the first domains to adopt Semantic Web recommendations and technologies, which can provide interoperability between various organizations by creating a shared understanding in the community. Data in the CH domain varies widely in type and format. Also, many organizations and experts from various fields interact through different processes within this community. Due to these needs, the CH community employed Semantic Web technologies step by step along its evolution process for better knowledge management and a uniform understanding among the community. In this paper, we presented this process from its initial steps and the various challenges faced to the latest developments in CH information retrieval. The CH domain has the goal of preserving and disseminating historical information to people and society. Therefore, by making data machine-readable and achieving data interoperability, and thus better information retrieval, there is a wide set of opportunities to develop smart applications based on rich CH information as a form of interactive, user-friendly, and context-aware dissemination of information to users.
In this paper, we also reviewed intelligent applications and services developed in the CH domain after the establishment of semantic data models and Knowledge Organization Systems. Finally, challenges and possible future research directions are discussed. The findings revealed that GLAMs (Galleries, Libraries, Archives, and Museums) are excellent and comprehensive sources of CH information. The CH community has put in a lot of time and effort to develop data models and knowledge organization tools; now it's time to use this valuable resource to construct smart applications, which are still in their early phases. This could benefit the CH industry even more.
Solicited Reviews:
Review #1
By Aline Deicke submitted on 05/Aug/2022
Minor Revision
The paper aims to give an overview of developments in the application of knowledge organization and information retrieval techniques to the Cultural Heritage (CH) domain. In doing so, the authors cover a wide range of topics from early KOSs to the potential of intelligent applications and services for the CH community.

The paper presents itself as a good and very comprehensive starting point for researchers already familiar with the CH domain and its data, challenges and politics to delve deeper into the techniques and aspects outlined above. Yet, in my opinion, it would benefit from addressing more closely the specific challenges and demands of the CH domain in general but especially its diverse subfields, and incorporating these in more detail into the discussion of the mentioned standards and techniques. For example, the authors neglect to define how they understand the term „Cultural Heritage“ despite a long history of discussion around its meaning and inclusiveness (cf. Baker 2013; Brumann 2015; UNESCO World Heritage Centre 2021; Vecco 2010) – which types of objects and entities (material or immaterial) they consider part of it, which geographical areas they focus on (for a claim to a global scale, Europe is very much overrepresented), which of its functions they see as most relevant to the topic at hand, or which actors they consider (the paper addresses initiatives from the field of [government sponsored] research as well as commercial ones but not consistently balanced in every chapter). In the same vein, while the paper addresses „needs“ of the CH domain and gives examples scattered throughout the text, the specific challenges of CH data in its many and diverse forms are not summarily addressed (e.g., in the introduction). Instead, an overview of „[f]unctions in the CH domain“ is given very late in the text in chap. 4.3. Finally, it might contribute to the clarity of the argumentation to differentiate more carefully from which fields of the CH domain the mentioned examples stem. Disciplines such as archaeology, art history or architecture deal with very different objects and questions which in turn lead to different approaches to and demands of knowledge modelling that cannot be put next to each other without addressing their different contexts.

Related, this might pose limitations for the methodology outlined for the selection of the reviewed literature. As the keywords seem to be focused on combinations of technical terms with „cultural heritage“, it seems entirely possible that projects might not have been found that deal with subfields of CH but did not use the term. This in itself does not invalidate the methodology used, in my opinion, yet a more detailed critical reflection of this possibility might improve the paper in parts. For example, on p. 26 the authors state that „visualization is quiet [sic] young in the CH domain“ (also addressed by reviewer 1, comment 7, and reviewer 2, comment 13) and give 2004 as a starting point in fig. 5. Yet, as a random example, Marc Levoy and his students have been working on 3D-digitizations of objects of art since the 1990s (

Further room for improvement might exist with regard to chap. 5.1.2 and 5.1.3 (the second one added at the behest of reviewer 1, comment 6). I acknowledge that the legal framework of data reuse is a wide field and a thorough treatment might take away from the actual topic of this paper. Yet, if addressed, a few aspects seem to be missing such as the use of Creative Commons licenses to ensure reuse of data; the FAIR- (Wilkinson et al. 2016) and CARE-principles (Carroll et al. 2021); and the challenge of different national legal frameworks regarding intellectual property in the international environment of the web.

As for the readability and clarity of the presentation, I have to stress again the comments made by reviewer 2 (comment 2 and 15). The paper contains numerous grammatical and typing errors. The language is often vague or imprecise to the point of being at times incomprehensible or misleading. I would recommend having the text proofread by a native speaker to alleviate these issues. Additionally, many statements regarding the characteristics, social functions, or purposes of CH management practices remain decidedly vague, e.g. in chap. 4.3.: „[CH] Information is usually discrete and lacks the consistency that exists in other disciplines, such as geology“ – here, the statement might be improved by one or more examples of what exactly the authors mean by „lack of consistency“ and how this differs from information in geology. The same issue concerns many other passages of the text. Also, the language used is often very absolute; statements such as „the actual reuse of it is the missing part“ (p. 28) should be phrased in a more nuanced way, as there are many examples of data reuse in research related to aspects of CH, if maybe not as widespread as could be.

Nevertheless, the authors have taken on a monumental task in summarizing and reviewing knowledge management practices and their potential in the global CH community. This must be acknowledged, and the paper will prove to be a fruitful resource for scholars interested in such standards and applications, be it from the side of computer science and the Semantic Web community or from the arts and (digital) humanities.


Review #2
By Victor de Boer submitted on 04/Sep/2022
I would like to thank the authors for the effort of very clearly addressing my comments to the earlier version. I feel these comments were taken very seriously and the changes made to the paper are significant and well-identified.

- The comment concerning challenges is well-addressed in the updated Critical discussion section.
- Grammar and language seem improved (based on rereading a selection of the document).
- The rephrasing of the figure 1 makes sense.
- The added sentences to the abstract in response to my request for stating the results of the study are quite generic. I would expect here for example a list of the most important challenges identified or surprising, interesting results. But this might also be a matter of taste.
- Other smaller issues were adequately addressed.

All in all, I would argue for accepting the paper.

Review #3
By Alessio Antonini submitted on 04/Nov/2022
The paper provides a historical perspective on the modelling and application of information systems for and within the field of cultural heritage. The paper approaches this challenge with a simple structure: 1) models and 2) applications, each further broken down from a historical perspective. This approach does not pay off in terms of legibility, with long sections 3 and 4 and a similarly unstructured (topic-based) discussion. Overall, the paper avoids addressing any crucial and specific challenge of cultural heritage, preferring to address this topic mostly through the selection of examples of models and applications. However, this light-touch approach does not prevent the feeling of reading an essay that uses CH applications as an example to explore the history of the semantic web.

The contribution avoids mentioning the field of Digital Humanities, where the mentioned models and applications come from. The kernel of the paper is dated, and the selected contributions make its theoretical kernel focus on information systems in general, with nothing that is specific to the Digital Humanities. For instance, archives and digital curation are missing from the list of applications.

Overall, I believe that this work is not suitable for publication as a survey in a research journal. I suggest the authors consider publishing it as a primer for students.

(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.

The introduction takes the topic from far far away, starting with general statements on the Humanities. This would be a problem if the authors would then dive into the day-to-day challenges of, e.g., working with digitized artefacts, reconstructing digital manuscripts, or any other specific activity involving cultural entities and organizations.

(2) How comprehensive and how balanced is the presentation and coverage.

The paper spends most of its weight on issues of the late 1990s and early 2000s that are not specific to cultural heritage.

(3) Readability and clarity of the presentation.

The lack of a framework makes the paper hard to appreciate. The authors did significant work in terms of reviewing a massive bibliography. But this work seems lacking in terms of direction and depth of analysis. This is the crucial problem of this paper and, I am afraid, of this endeavour, which directly reflects on its presentation.

Review #4
By Mehwish Alam submitted on 02/Dec/2022
Major Revision
The authors started the paper with a very broad problem definition. It should be more specific to the Cultural Heritage domain. Open problems from the perspective of this particular domain would also demonstrate the expertise of the authors in the field.

The authors should also acknowledge the diverse subfields of cultural heritage.

The methodology of how the studies were chosen is still missing. The authors should mention inclusion/exclusion criteria. Otherwise, it looks more like a random choice of papers.

In many places, the paper seems to simply dump descriptions of the systems instead of motivating a storyline or a classification system (for grouping the papers) and then discussing the specific works.

In general, the paper still lacks the maturity to be published as a survey in a journal of good quality.