Move Cultural Heritage Knowledge Graphs in Everyone's Pocket

Tracking #: 2912-4126

Authors: 
Maria Angela Pellegrino
Vittorio Scarano
Carmine Spagnuolo

Responsible editor: 
Special Issue Cultural Heritage 2021

Submission type: 
Full Paper
Abstract: 
Recent years have witnessed a shift from the potential utility of digitization to a crucial need to enjoy cultural activities virtually: while before 2019 data curators already recognised the value of digitizing data, during the COVID-19 lockdowns no one could enjoy Cultural Heritage in person, and substantial investment in remotely offered activities was required to keep culture alive. The Cultural Heritage community has invested heavily in digitization campaigns, mainly modeling data as Knowledge Graphs, making it one of the most successful application domains of Semantic Web technologies. Despite this vast investment in Cultural Heritage Knowledge Graphs, the syntactic complexity of RDF query languages, e.g., SPARQL, negatively affects data exploitation and risks leaving this enormous potential untapped. Thus, we aim to support the Cultural Heritage community (and everyone interested in Cultural Heritage) in querying Knowledge Graphs without requiring technical competencies in Semantic Web technologies. We propose an engaging exploitation tool accessible to all, without losing sight of developers' technological challenges. Engagement is achieved by letting the Cultural Heritage community leave the passive position of visitor and actively create their own Virtual Assistant extensions to exploit proprietary or public Knowledge Graphs in question answering. Accessible to all underlines that we propose a software framework freely available on GitHub and Zenodo under an open-source license. We do not lose sight of developers' technical challenges, which are carefully considered in both the design and the evaluation phases. This article first analyzes the effort invested in publishing Cultural Heritage Knowledge Graphs, to quantify the data on which developers can rely when designing and implementing data exploitation tools in this domain, and points out data aspects and challenges that developers may face in exploiting them in automatic approaches. Second, it presents a domain-agnostic Knowledge Graph exploitation approach based on virtual assistants, which naturally enable question-answering features where users formulate questions in natural language directly on their smartphones. Then, it discusses the design and implementation of this approach as an automatic, community-shared software framework (a.k.a. generator) of virtual assistant extensions, and its evaluation on a standard benchmark of question-answering systems. Finally, following a taxonomy of the Cultural Heritage field, it presents a use case for each category to show the applicability of the proposed approach in the Cultural Heritage domain. In reviewing our analysis and the proposed approach, we point out the challenges a developer may face in designing virtual assistant extensions to query Knowledge Graphs, and we show the effect of these challenges in practice.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Jacco van Ossenbruggen submitted on 17/Nov/2021
Suggestion:
Minor Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

The paper presents a software framework to automatically generate extensions for Alexa that enable it to answer questions over a SPARQL endpoint, and provides this software framework with use case data on GitHub and Zenodo. It evaluates the framework by comparing the manually configured and auto-configured skills against other systems on the QALD question sets. I also appreciate the overview of currently available KGs in the CH domain in section 3.
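To make the summary concrete: the core task such a framework automates is filling a SPARQL template with a recognized entity and sending the query to an endpoint. The minimal Python sketch below assumes DBpedia's public endpoint and an invented "who painted X" template; it is illustrative only and is not taken from the authors' implementation.

# Minimal sketch of the question-answering core: fill a SPARQL template
# with a recognized entity and query a public endpoint. The endpoint and
# template are assumptions for illustration, not the authors' code.
from string import Template
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://dbpedia.org/sparql"  # assumed endpoint for this example

# Hypothetical template for questions like "Who painted <artwork>?"
QUERY = Template("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>
SELECT ?author WHERE {
  ?artwork rdfs:label "$label"@en ;
           dbo:author ?author .
}
""")

def answer(label):
    # Build the query from the template and run it against the endpoint.
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(QUERY.substitute(label=label))
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    return [b["author"]["value"] for b in bindings]

print(answer("Mona Lisa"))  # e.g. ['http://dbpedia.org/resource/Leonardo_da_Vinci']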

With respect to originality, the authors did not address my question from the first round of reviews concerning the comparison with [30]: "If you decide to resubmit, you still need to motivate why CH users need your system: for which common CH tasks is a system as proposed by [30] insufficient and why?". The argument that you are providing an extension generator instead of an extension is not sufficient on its own, unless you come up with concrete use cases where other generic extensions fail.

With respect to significance, I find it a weak point that your main technical contribution, the framework, seems not to have specific features that make it more applicable to CH than other approaches. This also affects the evaluation in 7.1, where none of the performance tests seem to be related to CH.
Things improve with the new section 7.2 (and likewise for 7.3 and 7.4), where the user experiment is performed using tasks that are clearly CH-related and with participants from the CH domain.
About 7.3: people do not "spontaneously join a survey"... You need to have advertised the survey somewhere (where?). Or do you mean your participants did not get paid? If so, please rephrase, and you still need to explain how you recruited your participants.

Quality of writing: I think the addition of sections 7.2-7.5 is welcome, but the paper has become long and harder to read. Minor: if accepted, please have your paper checked by a native speaker before submitting the final version.

Thanks for making your code/data available on GitHub/Zenodo.

Review #2
Anonymous submitted on 17/Nov/2021
Suggestion:
Minor Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

-----

The new version of the paper has been revised and extended substantially in several places after the reviews of the first round.

In section 2, the relationships between the proposed approach and related work have been clarified.

There are clarifying edits here and there in the substantially revised section 3.

In section 4, the contributions of the proposed virtual assistant approach are now better highlighted, as are its limitations in subsection 4.3.

The main concerns for my "major revision" recommendation in the 1st evaluation round were:

"In Section 5, the idea of mapping Amazon Alexa QA system intent to SPARQL templates is presented. ... This section remains a bit generic as no real examples of using the system are given."

Section 5 now provides some more details on how the virtual assistant generator operates.
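As an aside, one plausible shape of such an intent-to-template mapping is sketched below in Python: each Alexa intent name keys a parameterized SPARQL template, and the handler fills in the slot values Alexa extracts from the user's utterance. Intent names, slot names, and templates here are invented for illustration; the generator's actual templates live in the paper's GitHub repository.

# Hypothetical intent-to-SPARQL-template mapping in the style the
# review describes. All names here are invented for illustration.
from string import Template

INTENT_TO_TEMPLATE = {
    # "Who is the author of <entity>?"
    "AuthorIntent": Template(
        'SELECT ?o WHERE { ?s rdfs:label "$entity"@en ; dbo:author ?o . }'
    ),
    # "When was <entity> completed?"
    "DateIntent": Template(
        'SELECT ?o WHERE { ?s rdfs:label "$entity"@en ; dbo:completionDate ?o . }'
    ),
}

def intent_to_sparql(intent_name, slots):
    # Fill the template for the recognized intent with its slot values.
    return INTENT_TO_TEMPLATE[intent_name].substitute(slots)

# e.g. Alexa resolves "who painted the Mona Lisa" to:
print(intent_to_sparql("AuthorIntent", {"entity": "Mona Lisa"}))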

"Section 6 presents an evaluation of the systems using three questions. The code is available in Gibhub. The most important question, whether the system is fit for its purpose from an end user point of view is, however, is not addressed or evaluated."

The evaluation is now in section 7. It has been extended substantially with lots of novel content.

" (1) originality. The paper has some originality in its attempt to map Alexia intents to SPARQL templates. However, the paper should document in more detail how this kind of system was actually implemented, a reference to Github is not enough."

More details related to the implementation can now be found in the paper.

"(2) significance of the results. The wide analysis of CH linked datasets available through SPARQL endpoints was interesting. However, it remains unclear how useful the QA system would actually be to end users. How well does the template approach generalize to free questions posed by the public? A more focused study on one particular dataset would be useful and would give the reader a deeper understanding of how the system actually works. I have doubts about how well the system would perform in real life, as there are several deep challenges in using linked data even when SPARQL is used by professional users. Cf. e.g. the papers in JASIST and ISWC 2021 regarding the MMM data used as a case study, and the Semantic Web Journal paper on the WarSampo knowledge graph, used as example datasets in the evaluation. Even if many QALD questions can be transformed into the eight basic queries, it is not clear how well this can be done for free end-user questions in a real use case; real Digital Humanities questions are far more complex than those used in the paper. However, NL understanding is an important topic even if challenging, and in this sense the paper has some significance."

This concern has been addressed by the new evaluation section.

"(3) quality of writing. In general the paper is well written, but there were typos etc."

Some further minor comments can be found below.

In my mind, this paper could be accepted for the special issue with the minor corrections explained below. It would be good to proofread the new parts of the text once again.

Minor comments / suggestions:

p 2/10
for Alexa -> for, e.g., Amazon Alexa [add reference here],

p 2/19
Add a reference for Google Assistant

p 2/21
syntactically and semantically -> syntactical and semantical

p 2/ 39
We summarize ... ->
If you refer to a figure you must also explain the figure to the reader in detail. Refer only to Fig 1 and explain it or remove all figure references here.

p 3/28
experience -> experience evaluation

p 3/33
follows -> are:

p 3/50
Finally, it -> Finally, the paper

p 18/3
As an example, the consequences of missing labels attached to resources -> unclear sentence. What impacts what?

p 32/50
Reference [41]. The MMM knowledge graph and system have been published/explained in this JASIST journal paper https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24499 and in the ISWC 2021 proceedings by Springer https://link.springer.com/chapter/10.1007/978-3-030-88361-4_36

p 33/12
Reference [51]. The WarSampo knowledge graph has been published/documented in this Semantic Web Journal paper, which could be referred to, too: https://content.iospress.com/articles/semantic-web/sw200392