Security approaches for electronic health data handling through the Semantic Web: a scoping review

Tracking #: 2909-4123

Authors: 
Vinicius Costa Lima
Filipe Andrade Bernardi
Domingos Alves
Rui Pedro Charters Lopes Rijo

Responsible editor: 
Sabrina Kirrane

Submission type: 
Survey Article
Abstract: 
Integration of health information systems is crucial to advance the effective delivery of healthcare for individuals and communities across organizational boundaries. Semantic Web technologies may be used to connect, correlate, and integrate heterogeneous datasets spread over the internet. However, when working with sensitive data, such as health data, security mechanisms are needed. A scoping review of the literature was undertaken to provide a broad view of security mechanisms applied to, or along with, Semantic Web technologies that could allow its use with health data. Searches were conducted in the most relevant databases for the scope of the present work. The findings were classified according to the main objective and features presented by each solution. Twenty studies were selected for the review. They introduced mechanisms that addressed several security attributes, such as authentication, authorization, integrity, availability, confidentiality, privacy, and provenance. These mechanisms support access control frameworks, semantic and functional interoperability infrastructures, and privacy compliance solutions. The findings suggest that the application and use of Semantic Web technologies is still growing, with the healthcare area being particularly interested. The main security mechanisms for Semantic Web technologies, the key security attributes and properties, and the main gaps in the literature were identified, helping to understand the technical needs to mitigate the risks of handling personal health information over the Semantic Web. Also, this research has shown that complex and robust solutions are available to successfully address several security properties and features, depending on the context in which the electronic health data is being managed.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 11/Oct/2021
Suggestion:
Reject
Review Comment:

Summary:
This manuscript presents a survey of 20 studies on security of EHR data using semantic web technologies. It covers security attributes including authentication, authorization, integrity, availability, confidentiality, privacy, and provenance. The study shows the objectives and advantages of each study and classifies them into access control, interoperability, and privacy compliance classes.

This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions:

(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.
The manuscript presents 20 studies and their advantages and limitations in a table. As an introductory scoping paper, it is essential to first give readers background knowledge, terminology, and a basic introduction. However, these are missing in this manuscript. Between Introduction and Materials and Methods, it would be better to add a background, definition, or terminology section to describe authentication, authorization, integrity, availability, confidentiality, privacy, and provenance (7 features) in detail. In the current version, those terms are presented only briefly in the results section. Access control, interoperability, and privacy compliance are not explained in detail either. People who are new to the field need to know why there are three classes, how the papers were classified into them, why these 7 features are being looked at, and whether they are standard security evaluation metrics. Therefore, I think the current version of the manuscript is not sufficiently suitable as an introductory text.

(2) How comprehensive and how balanced is the presentation and coverage.
The selected studies are discussed in a comprehensive manner. However, a few studies are covered in detail while the others are not. In the results section, the manuscript shows several individual studies’ methods, but more technical details are needed. The current version only summarizes what these individual studies do in 1-2 sentences, which can be found in the abstract of each study. The results and discussion of a survey need to go beyond summarizing. Additionally, I don’t know whether the search strategy covers forward and backward searching. Have the references and citations of the selected papers been looked at?

(3) Readability and clarity of the presentation.
The manuscript is well-written and clearly presented. The structure is easy to follow and logical. However, the review result table needs to be restructured. First, the selected papers in the table need to be linked with the references. Second, the full result table, with its many columns and long text, can be stored in a repository, but the one presented in the paper should be informative and precise. For example, the authors, publication types, and year are not the key messages in the manuscript. At least, this information does not provide discussion points. Removing these columns and refining “Results and security mechanisms” will increase the readability of the results table. Furthermore, the text in the columns “Results and security mechanisms” and “advantages and features” is too long. It can be presented in points or short phrases. Lastly, a comparison of these studies needs to be indicated in the table. Are there any relations between these studies, such as similarity or extension?

(4) Importance of the covered material to the broader Semantic Web community.
The summaries and findings of this manuscript are important for the broader Semantic Web community. However, the importance and significance need to be elaborated in the manuscript. This is missing in the current version. In addition, related work (if similar surveys have been done by others) needs to be presented in the paper.

Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess

(A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data,
No. I expect the authors to publish their original search results and review results in an accessible repository to make them FAIR (findable, accessible, interoperable, reusable).

(B) whether the provided resources appear to be complete for replication of experiments, and if not, why,
Not completely. The search term in the article can retrieve the initially identified records, but the exclusion was not well explained. Other researchers are not able to re-filter the studies based on the method description (Study Selection) in the paper.

(C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and
The review results are not published in any repository.

(D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.
Data artifacts are not complete.

Review #2
Anonymous submitted on 29/Oct/2021
Suggestion:
Major Revision
Review Comment:

This scoping review provides an overview of the security mechanisms applied to, or along with, semantic web technologies that could allow its use with health data and tries to identify possible research gaps in existing literature.

Well-structured, easy-to-read paper, focusing on an area of some interest to the broader Semantic Web community.

The methodology followed is solid, although I would expect a broader search to cover more works in the field. Further, the paper seemed somewhat too short and skin deep, with opportunities for improvement in presentation and clarity as well.

Please see below for more detailed comments.

--Introduction.--

I would expect the introduction to include a comparison with other existing similar reviews or surveys. It would surprise me if nothing similar existed.

The introduction should also be updated to mention for whom this paper is appropriate (e.g. PhD students, practitioners etc.).

I would also expect a short paragraph at the end of the introduction describing how the document is structured.

There are also specific workshops dedicated to semantic web and health data handling such as https://sites.google.com/view/swh2021/.

--Methods.--

I understand why the authors selected the specific terms. However, note that in many cases the term “semantic web” might be missing or masked. See [x1], [x2] for example. The “Limitations” section at the conclusion should be a bit extended to reflect this.

--Results.--

A table with the definitions Authentication, Authorization etc. would be useful.

Table 2 can be moved to an appendix. However, I would expect a more compact table within the article with the ref, the categories, the applicability, a few bullets/technologies/focus for the results, plus a few bullets for the advantages/disadvantages, so that the most important parts are easily extractable and the various works are easier to compare. Currently, the textual descriptions in the table do not help in quickly extracting an overview of each solution.

Also, the advantages/disadvantages column needs some clarity, as it is not always obvious from the text there exactly which is the advantage and whether there are also some disadvantages.

The table could also be structured so that works in similar categories are grouped together.

Personally, I found the paper a bit skin deep and I would like to see more details on the surveyed papers.

Further, in the discussion section, I believe it would be interesting to clearly separate for each category a) the drawbacks of the existing works, b) opportunities for further research, and c) perhaps some recommendations to the interested reader who would like to dive into the topic.

--References.--

Some papers to be considered as well are the following:

[x1] Kondylakis, H., et al. (2015, December). Flexible access to patient data through e-Consent. In Proceedings of the 5th EAI international conference on wireless mobile communication and healthcare (pp. 263-266).
[x2] Papakonstantinou, V. et al. (2014, May). Securing access to sensitive RDF data. In European Semantic Web Conference (pp. 455-460). Springer, Cham.

Review #3
By Oshani Seneviratne submitted on 23/Nov/2021
Suggestion:
Accept
Review Comment:

This article presents a scoping review of the usage of security approaches grounded in the Semantic Web in electronic health. Semantic Web technologies have been applied in various access control mechanisms (including authentication and authorization protocols). It was refreshing to see summaries of many research articles presented in this survey paper. The paper is written well (except for minor typos). It is very suitable as an introductory text for someone venturing out into research at the intersection of the semantic web, security/privacy, and electronic health. The authors have clearly articulated the survey methodology and have categorized various papers as security mechanisms, applicability/related security properties, advantages, features, and limitations. The coverage of the articles selected is adequate given the niche scoping. However, as the authors pointed out in the limitations section, it would have been good to discuss other semantic web based security/privacy modeling approaches outside the electronic health space, perhaps as a comparison. This paper is probably not of interest to the broader Semantic Web community. But as I mentioned earlier, it may be of great interest to interdisciplinary researchers interested in understanding the application of semantic web technologies for implementing security solutions in electronic health.

Regarding clarity of presentation, I request the authors to clarify the following point and fix some of the typos/spelling/grammar errors in the article.

In pg 2: In determining the eligibility criteria of the articles, what are the security-related terms used? It would be helpful if you provided the list.

Some typos noted in Table 2:
Ref 8 -- "personal health personal data ..."
Ref 9 -- "an pre-authorized ..."
Ref 14 -- there is a character encoding issue