ciTIzen-centric DatA pLatform (TIDAL): Sharing Distributed Personal Data in a Privacy-Preserving Manner for Health Research

Tracking #: 3220-4434

Authors: 
Chang Sun
Marc Gallofré Ocaña
Johan van Soest
Michel Dumontier

Responsible editor: 
Guest Editors SW Meets Health Data Management 2022

Submission type: 
Full Paper
Abstract: 
Developing personal data sharing tools and standards in conformity with data protection regulations is essential to empower citizens to control and share their health data with authorized parties for any purpose they approve. This can be, among others, for primary use in healthcare, or secondary use for research to improve human health and well-being. Ensuring that citizens are able to make fine-grained decisions about how their personal health data can be used and shared will significantly encourage citizens to participate in more health-related research. In this paper, we propose a ciTIzen-centric DatA pLatform (TIDAL) to give individuals ownership of their own data, and connect them with researchers to donate the use of their personal data for research while being in control of the entire data life cycle, including data access, storage and analysis. We recognize that most existing technologies focus on one particular aspect such as personal data storage, or suffer from executing data analysis over a large number of participants, or face challenges of low data quality and insufficient data interoperability. To address these challenges, the TIDAL platform integrates a set of components for requesting subsets of RDF (Resource Description Framework) data stored in personal data vaults based on SOcial LInked Data (Solid) technology and analyzing them in a privacy-preserving manner. We demonstrate the feasibility and efficiency of the TIDAL platform by conducting a set of simulation experiments using three different pod providers (Inrupt, Solidcommunity, Self-hosted Server). On each pod provider, we evaluated the performance of TIDAL by querying and analyzing personal health data with varying scales of participants and configurations.
The reasonable total time consumption and a linear correlation between the number of pods and variables on all pod providers show the feasibility and potential to implement and use the TIDAL platform in practice. TIDAL facilitates individuals to access their personal data in a fine-grained manner and to make their own decision on their data. Researchers are able to reach out to individuals and send them digital consent directly for using personal data for health-related research. TIDAL can play an important role to connect citizens, researchers, and data organizations to increase the trust placed by citizens in the processing of personal data.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
By Vassilis Kilintzis submitted on 03/Oct/2022
Suggestion:
Accept
Review Comment:

minor comment

P5L40: "The permissions that granted to TIDAL" -> "that were granted", or "granted" without "that".

Thank you for the acknowledgement; please correct the name to Kilintzis instead of Kilitntzis.

Review #2
By Dimitrios Karapiperis submitted on 12/Oct/2022
Suggestion:
Accept
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

Review #3
By Pavlos Fafalios submitted on 20/Oct/2022
Suggestion:
Minor Revision
Review Comment:

First, I would like to thank the authors for the detailed responses to all my review comments.
The authors have addressed almost all the raised issues and the paper has been significantly improved.

Please find below some final comments for further improvements:

Introduction: "To achieve this objective, this study takes one step and addresses the research question that what the technical feasibility, efficiency, and scalability of ..."
=> This sentence does not read well, please revise (especially the part "addresses the research question *that what* the technical feasibility").

Section 4, second paragraph: "General users are not required to have any knowledge of Solid specifications, linked data, or writing RDF triples."
=> Again, this does not seem to be true. As shown in Fig. 1 (b), users need to provide URIs (such as https://schema.org/name). How are they supposed to know what URI to provide? Wouldn't it be preferable for the users to select a name (not a URI) from a pre-defined list and then provide the value (and also be able to easily get additional information about the properties they can provide)? In general, as it is now, the platform does not seem to be operable by non-expert users (citizens).

About the queries in each step of the pipeline: The provided "queries" are not queries; they are procedures (code). Since all data are in RDF format, I was expecting to see plain SPARQL queries (e.g., what is the SPARQL query for retrieving the signature and verification key, what is the SPARQL query for getting all data from participants' pods, etc.). Please revise the text (wording) accordingly.

Section 5. The section starts by stating "Given the main design idea of TIDAL that is to **facilitate researchers to access and analyze data from an amount of individual Solid pods,** we designed experiments with three following objectives: ... ..."
This contradicts the experimental setup. The conducted experiments do not show whether researchers are **facilitated to access and analyze data...** (e.g. through a user study). They just show whether the platform is efficient and scales well. Please revise the text accordingly.

Evaluation: a user study with real users (researchers) is needed for evaluating the usability of the platform and the feasibility of the whole idea. IMO, this will reveal a lot of usability problems and will allow improving the user interface of the platform. However, I understand that this cannot be done in the context of this paper.