Context-aware & privacy-preserving homecare monitoring through adaptive query derivation for IoT data streams with DIVIDE

Tracking #: 3129-4343

Authors: 
Mathias De Brouwer
Bram Steenwinckel
Ziye Fang
Marija Stojchevska
Pieter Bonte
Filip De Turck
Sofie Van Hoecke
Femke Ongenae

Responsible editor: 
Guest Editors SW Meets Health Data Management 2022

Submission type: 
Full Paper
Abstract: 
Integrating Internet of Things (IoT) sensor data from heterogeneous sources with domain knowledge and context information in real-time is a challenging task in IoT healthcare data management applications that can be solved with semantics. Existing IoT platforms often have an issue with preserving the privacy of patient data. Moreover, the configuration and management of context-aware stream processing queries in semantic IoT platforms requires much manual, labor-intensive effort. Generic queries can deal with context changes but often lead to performance issues caused by the need for expressive real-time semantic reasoning. In addition, query window parameters are part of the manual configuration and cannot be made context-dependent. To tackle these problems, this paper presents DIVIDE, a component for a semantic IoT platform that automatically and adaptively derives and manages the queries of the platform’s stream processing components in a privacy-preserving, context-aware and scalable manner. By performing semantic reasoning to derive the queries when context changes are observed, their real-time evaluation does not require any reasoning. The results of an evaluation on a homecare monitoring use case demonstrate how activity detection queries derived with DIVIDE can be evaluated in on average less than 3.7 seconds and can therefore successfully run on low-end IoT devices.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Fotis Aisopos submitted on 01/Jun/2022
Suggestion:
Major Revision
Review Comment:

This manuscript presents a semantic healthcare IoT data management platform called DIVIDE, adaptively managing different streaming data queries in a privacy-preserving and context-aware way.

A good introduction on an interesting problem is provided. The quality of writing and presentation of the work is quite satisfactory.

However, the article seems to focus on many different and technical/practical topics - privacy, context awareness and semantic reasoners, stream processing components, query efficiency, which makes it difficult to understand the specific scientific contributions of this work.
Is it the semantic adaptation of queries on the integrated IoT data of patients, based on contextual changes?
The creation of specific semantic rules solves clear technical problems in a heuristic way, but is not an obvious research contribution that progresses beyond the SotA.

Related Work: the authors mainly discuss Semantic Web and Stream Reasoning technologies, while they should have mostly focused on semantic healthcare IoT data management-related research papers.

Use Case: The use case example introduction and sensors description is too detailed.
The usefulness of some IoT data signals (e.g., light intensity or a television sensor) is not obvious, nor is the integration and interpretation of such measurements; please provide some examples.
Does the "Knowledge Base" refer to an RDF-based Knowledge Graph? Please clarify.

In general the article is too long and technically oriented, similar to a detailed project report or a book chapter, and not a research paper.
Thus the authors need to summarize or remove some parts (e.g. too many details on sparql queries are provided, which could be moved to Appendices) and focus more on specific research contributions, results and conclusions.

Evaluation: Evaluating the time performance of the DIVIDE system is quite interesting.
However, a validation of query translation and action interpretation "accuracy" should also exist, in order to verify the correctness of contextual data processing. For example, the patient could also try to mislead the sensing system (creating some false positives of showering actions) to stress test the correct application of rules.
Concerning the evaluation scenarios, perhaps it would be interesting to include actions related to patient health in a more straightforward way, like exercise time or meals' duration/number per day.

Lastly, concerning availability, I am not sure whether the code of the DIVIDE system is open source.
The authors have provided various configuration files and semantic rules within the document.
Ideally, the Java code of DIVIDE should also be shared on GitHub, or at least the imec-UGent HomeLab simulation data, to allow replication of their experiments.

Minor comments:
1. A few typos and syntax errors have to be corrected (e.g. Section 3.1, line 16 "healthcare actors such their General Practitioner...")
2. The description of the remaining sections at the end of Section 3.2 has already been provided in the Introduction.

Review #2
Anonymous submitted on 15/Jun/2022
Suggestion:
Minor Revision
Review Comment:

This piece of research presents DIVIDE, an environmentally conscious semantic platform, which employs its interconnected IoT devices with a view to spontaneously generate and supervise versatile stream processing queries. The authors, interestingly, placed great emphasis on a particular case of using home care surveillance, demonstrating the flow of queries and the security of sensitive content throughout the sequential IoT component layout, while their system evaluation illustrates robustness and stability of performance compared to other baseline approaches through a series of experimental examinations.

In addition, it is worth mentioning that the authors provide their data files, which consist of the extended ontology they used, the evaluation material and the EYE reasoner implementation files along with the relevant README files. All aforementioned files are well organized in a GitHub repository.

Overall, it is obvious that the authors made some effort to accomplish this work, exploiting a combination of existing tools to provide a methodology greater than the sum of its parts, on an interesting issue, that could aid vulnerable groups of people living mostly remotely. The introductory chapter gets the reader into the spirit of this article, the motivation behind the labor and the description of contribution are clear, the placement of the work is of interest, since it is related to Semantic Web Journal topics, while the experimental methodology is persuasive. I liked the problem, the perspective and the presentation of this paper, however there are some points that prevent me from suggesting a solid accept, and I will try to mention them below in order of importance:

* The Related Work chapter provides a series of stream processing and semantic reasoning approaches that warm up the reader for what comes next. However, the privacy-preserving segment, which is bulleted first in the list of research objectives of the Introductory section, is not reinforced by the appropriate literature in the field of privacy-preservation. I would suggest enriching this chapter with similar system security solutions that can support this work.

* In a similar spirit to the comment above, adding some encryption to the information sent via the network to the main reasoner on the central server would strengthen the privacy-preservation part of this research study and would keep sensitive content more secure from outside threats that could exploit these data. For example, by taking advantage of these data, a malicious person could figure out if the patient is in or out of the apartment.

* Although the presentation and general writeup are good, there are places where the readability of the article could be further improved. Such cases are extensive sentences (covering many lines) that employ identical terms over and over again, while there are some acronyms which are never explained, forcing the reader to investigate the relevant literature to understand them. For instance, CEP (standing for Complex Event Processing), or VKG (meaning, perhaps, Virtual Knowledge Graph), or even IRI (Internationalized Resource Identifier) are not defined. A reader without an ontology-oriented background would not understand what IRI stands for. I would recommend rewording or shortening long sentences and describing the unexplained acronyms used.

* In my view, it is very important to avoid personal pronouns (e.g., he or she) when composing a formal document. One such case that appears on this paper is: "In the running example of the use case scenario described in Section 3, there is one RSP query that actively monitors the patient's location in the home, and one query that detects when the patient is showering if he or she is located in the bathroom." (page 14, lines 15-17). I would advise rephrasing such sentences. For the given example a better expression might be: "In the running example of the use case scenario described in Section 3, there is one RSP query that actively monitors the location of the patient in the home, and one query that detects a showering condition when the patient is located in the bathroom.".

* Regarding the employed dataset, although the data collection process is mentioned (using HomeLab's and wearable sensors that generated around 670K observations), showing the structure of a sample of the acquired dataset would help readers understand the type of information obtained by the observed IoT devices.

* In the presented use case scenario, the "Intermediate queries" and the "Context enrichment mode" are not supported. It would be better to present an alternative application example that covers all types of queries that DIVIDE's query parser supports.

* The References chapter is divided into two parts. It is separated by two boxplot distribution schemes of Appendix C. I would propose moving the entire citation list to the end of the paper (before the appendix sections).

* Minor typos and comments:
- [page 4, lines 38-39] only recent attempt has laid the first fundamentals on realizing the full vision of cascading reasoning with Streaming MASSIF [41]. -> only *a* recent attempt has laid the first fundamentals on realizing the full vision of cascading reasoning with Streaming MASSIF [41].
- [page 5, line 31] Z-Plus has helped us with designing the rules. -> Z-Plus helped us design the rules.
- [page 12, lines 49-50] and (ii) *and* the core of DIVIDE which is the query derivation. -> and (ii) the core of DIVIDE which is the query derivation.
- [page 17, lines 22-23] that it is instantiated by the semantic reasoner *reasoner* if the rule is applied in the proof during the query derivation. -> that it is instantiated by the semantic reasoner if the rule is applied in the proof during the query derivation.
- [page 18, lines 7-9] The direct consequences of a sensor observation matching the WHERE clause in lines 59–*59* would be the fact that an ongoing activity of the given type is detected for the given patient. -> The direct consequences of a sensor observation matching the WHERE clause in lines 59–69 would be the fact that an ongoing activity of the given type is detected for the given patient.
- [page 24, lines 5-6] This process can execute independently for each RSP engine and can therefore be parallelized by DIVIDE. -> This process can *be executed* independently for each RSP engine and can therefore be parallelized by DIVIDE.
- [page 24, lines 15-16] and will also be used as the running example in this section to illustrate the query derivation process *in this section*. -> and will also be used as the running example in this section to illustrate the query derivation process.
- [page 24, line 35] For every step, the inputs and *and* outputs are detailed on the figure. -> For every step, the inputs and outputs are detailed on the figure.
- [page 27, lines 49-50] depending on whether the substituted value is *a* IRI or a literal -> depending on whether the substituted value is an IRI or a literal
- [page 28, lines 5-6] This substitution is performed based on the generic RSP-QL query body that *is referred to in* the output of the query extraction in Listing 12. -> This substitution is performed based on the generic RSP-QL query body that refers to the output of the query extraction in Listing 12.
- [page 31, line 10] The tasks of this final step are the following: construction the actual RSP-QL query string, -> The tasks of this final step are the following: construction *of* the actual RSP-QL query string,
- [page 33, line 2] This is the preferred option when deploying new systems -> This is the preferred option when deploying new systems*.* [punctuation - full stop at the end of the sentence]
- [page 34, lines 21-22] In the other case, the queries are translated to N3 rules which are then applied on the set of triples and, if reasoning is enabled, ontology rules. -> In the other case, the queries are translated to N3 rules which are then applied *to* the set of triples and, if reasoning is enabled, *to* ontology rules.
- [page 34, lines 36-37] General information about the collected data, the ontology and context, and activity rules used for these evaluations is presented in Section 8.1. -> General information about the collected data, the ontology and context, and activity rules used for these evaluations *are* presented in Section 8.1.
- [page 35, line 21] the state of windows *and* doors and blinds, and others. -> the state of windows, doors and blinds, and others.
- [page 36, lines 29-30] This section evaluates the real-time performance of evaluating these DIVIDE queries on the C-SPARQL RSP engine [15]. -> This section compares the real-time performance of evaluating these DIVIDE queries on the C-SPARQL RSP engine [15].
- [page 37, line 2] RFDox -> RDFox
- [page 38, lines 27-28] Figure 8 shows similar results of the comparison of the real-time evaluation *with* DIVIDE with the real-time reasoning approaches, but for the *toileting* query. -> Figure 8 shows similar results of the comparison of the real-time evaluation of DIVIDE with the real-time reasoning approaches, for the brushing teeth query.
- [page 38, lines 28-29] The properties of the graph are similar to those of the graph presenting the results for the *brushing teeth* query. -> The properties of the graph are similar to those of the graph presenting the results for the toileting query.
- [page 44, lines 11-12, Figure 9 caption] The results show the total execution time distribution over the engine’s runtime and multiple runs, for *both the toileting and brushing teeth* DIVIDE queries. -> The results show the total execution time distribution over the engine’s runtime and multiple runs, for both the toileting and showering queries, as well as for the brushing teeth DIVIDE query.
- [page 47, lines 15-16] where the activity can be detected by a single independent sensor in the room that *crosse* a defined value threshold. -> where the activity can be detected by a single independent sensor in the room that crosses a defined value threshold.

Review #3
Anonymous submitted on 07/Jul/2022
Suggestion:
Major Revision
Review Comment:

The paper proposes DIVIDE, a component for a semantic IoT platform that generates queries of the platform’s stream processing components in a privacy-preserving, context-aware and scalable manner. The authors validate the proposed approach using real and simulated data from a smart home scenario. In Section 1, the authors present a proper contextualization of the proposed approach considering other state-of-the-art works.

In general, the article is well written and organized. The authors investigate a relevant problem, the proposed approach is sufficiently detailed throughout the sections and the experimental evaluation is convincing.

Regarding the long-term stable URL for resources, the provided source code is sufficiently organized and appears to be complete for replication. Also, the chosen repository is appropriate for discoverability.
My main concerns regarding this paper are related to the problem definition as well as to the privacy-preserving capabilities of the proposed approach.

Main Comments:

1) Although the term "privacy-preserving" is used in the title, the privacy aspect of the solution is not properly addressed in the paper. In fact, the privacy concern is addressed in a single paragraph in Section 3.1, in which the authors argue that, for preserving privacy, “the activity recognition service should largely run in-home, so that only the actual outputs such as detected activities are being sent. Obviously, data that not contained in the HomeLab should always be sent over a secure encrypted connection.” In my opinion, this explanation is definitely not enough to support the privacy-preserving guarantees of the approach. For instance, which state-of-the-art privacy attacks could be used in the solution environment? What strategies are employed by other state-of-the-art real-time IoT approaches to handle their privacy aspects? These questions should be properly answered in a specific privacy sketch section in the paper. Also, no experimental evaluation regarding the privacy capabilities of the solution is presented (in Section 10, the claim “the processing of all privacy-sensitive data can be done locally, which means that it should not ‘leave the home environment’” basically repeats the explanation presented in Section 3.1 regarding the privacy strategy). It is also worth noting that privacy-preserving IoT approaches have not been discussed in the Related Work section (Section 2). There are a number of works that should be properly discussed in this section. Clearly, the authors should put more effort into convincing the reader of the privacy capabilities of the presented solution. Finally, I strongly recommend that the authors draw clearer distinctions between the strategies adopted to improve the privacy and the security of the approach (since they are distinct aspects).

2) The second weakness of this paper is related to the lack of formalization in the problem definition. In fact, the problem is basically described as a use case. The authors should include a section in order to properly (formally) define the investigated problem. The presented use case description may help the reader to understand the paper context, but it is not ideal to define the investigated problem. For instance, the authors could provide formal definitions regarding the following concepts: rules, conditions, property, observable property, sensor, entity, entity type, environment, condition threshold, condition combination, location dependency.

3) Listing 7 is an overwhelmingly complex specification for a fairly simple activity rule. I wonder whether the specification of a complete monitoring system, which would involve more complex rules, would require a massive implementation effort. Have the authors considered an approach to automate the specification of activity rules? For example, the specification code could be automatically derived from a simple specification following a specific BNF-like grammar.

4) On many occasions, the running example presents a relatively simple scenario (checking if the sensor data exceeds a predefined threshold). Would it be possible to provide a more interesting scenario to illustrate the queries?