Ontology of active and passive environmental exposure

Tracking #: 3478-4692

Authors: 
Csilla Vamos
Simon Scheider
Tabea Sonnenschein
Roel Vermeulen

Responsible editor: 
Cogan Shimizu

Submission type: 
Full Paper
Abstract: 
Exposure is a central concept in the health and behavioural sciences, needed to study the influence of the environment on the health and behaviour of people within a spatial context. While an increasing number of studies measure different forms of exposure, including the influence of air quality, noise, and crime, the influence of land cover on physical activity, or the influence of the urban environment on food intake, we lack a common conceptual model of environmental exposure that captures its main structure across all this variety. Against the background of such a model, it becomes possible not only to compare different methodological approaches systematically, but also to better link and align the content of the vast number of scientific publications on this topic. For example, an important methodological distinction is between studies that model exposure as an exclusive outcome of some activity and those in which the environment acts as a direct, independent cause (active vs. passive exposure). Here, we propose an information ontology design pattern that can be used to define exposure and to model its variants. It is built around causal relations between concepts including persons, activities, concentrations, exposures, environments, and health risks. We formally define environmental stressors and variants of exposure using Description Logic (DL), which allows automatic inference from the RDF-encoded content of a paper. Furthermore, concepts can be linked with the data models and modelling methods used in a study. To test the pattern, we translated competency questions into SPARQL queries and ran them over RDF-encoded content. The results show how study characteristics can be classified and summarized in a manner that reflects important methodological differences.
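To make the active/passive distinction concrete, here is a toy sketch in plain Python (no RDF library). It stores a few triples, lists exposures in the spirit of a SPARQL SELECT, and classifies each exposure by its direct causes. The property name causedBy, the instances, and the classification rule are hypothetical illustrations, not the authors' DL axioms or queries. Unlike a DL reasoner, the sketch also assumes a closed world.

```python
# Toy triple store of (subject, predicate, object); every identifier is hypothetical.
TRIPLES = {
    ("exp1", "rdf:type", "Exposure"),
    ("exp1", "causedBy", "walking1"),
    ("walking1", "rdf:type", "Activity"),
    ("exp2", "rdf:type", "Exposure"),
    ("exp2", "causedBy", "no2_1"),
    ("no2_1", "rdf:type", "EnvironmentalFactor"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def exposures():
    """Rough analogue of SELECT ?e WHERE { ?e rdf:type :Exposure }."""
    return sorted(s for (s, p, o) in TRIPLES if p == "rdf:type" and o == "Exposure")


def classify(exposure):
    """Closed-world rule (assumed): passive if an environmental factor is a direct
    cause, active if the direct causes are typed only as Activity, else unclassified."""
    cause_types = set()
    for cause in objects(exposure, "causedBy"):
        cause_types |= objects(cause, "rdf:type")
    if "EnvironmentalFactor" in cause_types:
        return "passive"
    if cause_types == {"Activity"}:
        return "active"
    return "unclassified"


result = {e: classify(e) for e in exposures()}
print(result)  # {'exp1': 'active', 'exp2': 'passive'}
```

In an actual OWL/DL setting, the absence of a causedBy link to an environmental factor would not by itself entail that an exposure is active. That inference is why the closed-world shortcut here is only an approximation.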
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 15/Jul/2023
Suggestion:
Major Revision
Review Comment:

Thanks for the authors’ response.
Most of my comments have been addressed; however, one remains unclear. It concerns the modeling of EnvironmentalFactor (and its comment). Although the authors present the modeling of various environments as future work or a long-term goal, they seem to choose EnvironmentalFactor over Environment (while still referring to both in the rdfs:comment) without clearly presenting the reasons (Section 4.1.2). What would the separate work on ontological models of environmental factors involve? Is the choice of EnvironmentalFactor restricted by the selected articles?

In the updated Section 3.2, the authors present their intentions behind the selection of articles and state that it is aligned with the Exposome NL project. I think the authors could elaborate on this project background to better motivate the selection of articles. For example: What scenarios are modeled in the project? What is that standard literature? Why do you need to make sure that papers covering the same risk factors have different underlying exposure concepts? (What do you mean by exposure concepts here: active/passive exposure or general exposure-related concepts?)

The authors explain each exposure in Section 3.2. Are these different exposures defined in any existing or related work?

Information ontology: in several places, the authors use the term "information ontology". I think it would be sufficient to say ontologies or, more specifically, domain ontologies or application-oriented ontologies.

What are the guidelines for writing the comments for each exposure concept?

Table 2 and Table 3 could be updated for better readability. For instance, the prefix (exp) is not necessary here; the subscript in "O_3 exposure" should be typeset properly (O₃ exposure); content spanning multiple rows could be vertically center-aligned. For the hearsay violent crime exposure and the victim of violent crime exposure, why is the result of query 3 non-violent crime? In Table 3, it should be explained why some datasets are None or include None (e.g., roads, None).

Some other small comments (the current manuscript needs proofreading to make the writing consistent):
Biomedicine/bio-medicine
Knowledge based tools/Knowledge-based tools
modeled/modelled/modelling

Review #2
Anonymous submitted on 17/Jul/2023
Suggestion:
Accept
Review Comment:

Thank you for the changes and edits you added to the paper.
As far as I can see, the reviews point out several times that the "participant case" is not handled. I think that, for both active and passive exposure, participation changes the causality and effects of exposures. It therefore affects your ability to assess risks correctly, which seems to be a use case in your paper. So I disagree with your reasoning for excluding aspects related to active participation.
However, this is a valid model and the use cases are better explained with a number of different minor to major changes to the text.

Review #3
Anonymous submitted on 15/Aug/2023
Suggestion:
Minor Revision
Review Comment:

Since I reviewed the first round of this paper, I stand by my original comments on Originality and Significance.
1) Originality - This paper aims to develop an ontology design pattern that defines exposure specifically in the context of environmental exposure, the health impacts on people, and the surrounding environmental influence of the exposure. Overall, this work is very interesting and much needed for several domains. The related work section shows that there is a significant lack of reusable ontologies for exposure, or of any ontologies that use standardized definitions. The authors' approach of using terms and information extracted from scientific articles with NLP to drive the development of their ontology design pattern is great, although it might also be beneficial to see how standardized definitions for some of the concepts can be adopted.
2) Significance of the results - The development of an ontology such as this one, which represents the various aspects of exposure (including the agent or substance, the target, the environment of exposure, the duration and frequency of exposure, and the level or dose of exposure), is significant in several contexts, such as health and safety and disaster management. This ontology can be useful for performing more sophisticated analyses and reasoning about the relationship between exposure and health outcomes. Furthermore, such an ontology can facilitate the development of computational tools and applications that rely on exposure data, such as exposure assessment models or decision support systems for risk management, ultimately contributing to a better understanding of the potential health impacts of environmental exposures.
3) Quality of writing – The authors have taken into consideration the feedback that was provided previously and have addressed most issues. The readability and clarity of the paper are significantly better now. Having said that, there are still some minor language issues and some modeling concerns, which I present below:

General issues (formatting/language related):
1. Language inconsistencies: e.g., usage of both "modeled" and "modelled"
2. Throughout the paper, the opening quotation marks must be fixed (e.g., pg 2 line 28). If you are using LaTeX, you may want to use the backtick (left quote) for opening quotes.
3. Be consistent with references. In some places, the authors’ names are used (e.g., pg 3 line 32, "... ontology by Zeshan and Mohamad [23]..."), whereas in others simply the reference number is used (e.g., pg 3 line 33, "... ontology by [24]..."). Adopting the latter format consistently might be better.
4. Pg 4 line 18: address --> addresses
5. Pg 4 line 43: sem-iautomated --> semi-automated
6. Pg 5 line 7: missing comma before "which"
7. Pg 8, line 32: including a reference for EPO:exposure class might be helpful
8. Pg 8, line 35: DCAT and PROV are introduced in the paper for the first time here, so mentioning their references would be good
9. Pg 11: Since Fig 2 seems to have a lot of white space around it, it might be helpful to add the real-world examples described in the text next to each exposure model.
10. Pg 12 line 36: Should the concept "Environments" be "EnvironmentalFactor"? Both Fig. 3 and Fig. 4 only mention the latter as a class
11. Pg 13 line 20-21: missing parenthesis in the two axioms
12. Pg 14 line 35: what is the axiom 3 that is mentioned here? Perhaps numbering all the axioms might be helpful?
13. Pg 15 line 30: missing space after "These"
14. Pg 26-27: It might be helpful to include references for the papers in these tables. The Results section uses reference numbers, not the paper names listed in these tables, so comparison is hard for the reader.
There are many more grammar/punctuation issues. Maybe using grammar software such as Grammarly (which is free) might be helpful in identifying and addressing these issues.
Modeling questions:
1. The notion of "temporal information" seems important for exposure, dose, and health risk. The authors discuss it at length on pg 11-13 (where they describe these concepts in detail), but the ontology does not include any temporal modeling, nor is it clear whether this information is present in any of the datasets (articles) used.
2. On page 3, the authors make the following claim: "For every exposure there is a unique person who is exposed." Further, the axioms in lines 20-21 state that an exposure is always caused by only one activity and that the activity is always caused by only one person. In my opinion, this is too restrictive. Exposure can result from a combination of several activities (e.g., the dumping of toxic waste at a landfill site and a person's decision to live nearby). In this example, there is no possible causal link between the dumping activity and the person's decision; the exposure is rather a consequence of both activities. The only alternative reading is that, for a given exposure, there is a primary, dominant, or most significant activity causing it, but is it even possible to determine this? Next, activities can be carried out by more than one person or even by organizations (e.g., the dumping of toxic waste at a landfill site by a waste-removal company). Likewise, an entire community, not just an individual, can be exposed due to such an activity. This brings me to another question about Query 2 on page 17: "Is the person who causes the activity always the one exposed?" I cannot agree with this; there are many real-world scenarios that contradict it.
Overall, I find this axiomatization too restrictive and possibly unrepresentative of many real-world scenarios.
3. I do not think the axiom for "Active" does what the authors say it does; it is confusing.
4. Pg 19 line 4: What do the authors mean by "brute implementation"? Is it done manually, by using a reasoner, or by inferencing within a graph database?
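The landfill counterexample in modeling question 2 can be checked mechanically. Below is a minimal sketch that assumes a hypothetical dictionary encoding of the scenario. The names caused_by, performed_by, and exposed_persons are illustrative and are not the paper's properties. The sketch counts fillers per subject, as a max-cardinality-1 restriction would under a closed-world reading, and then evaluates a Query-2-style question. In OWL itself, two differently named fillers of a functional property would be inferred to be the same individual unless they are declared different. The closed-world check is therefore only an approximation.

```python
# Hypothetical encoding of the reviewer's landfill scenario; names are illustrative only.
caused_by = {"landfill_exposure": ["dumping_toxic_waste", "deciding_to_live_nearby"]}
performed_by = {
    "dumping_toxic_waste": ["waste_removal_company"],  # an organization, not a person
    "deciding_to_live_nearby": ["resident_a"],
}
exposed_persons = {"landfill_exposure": ["resident_a", "resident_b"]}  # a community


def exceeds_max_one(mapping):
    """Subjects with more than one distinct filler (closed-world max-1 violations)."""
    return sorted(subject for subject, fillers in mapping.items() if len(set(fillers)) > 1)


def actor_is_exposed(exposure):
    """Query-2-style check: is whoever performs each causing activity also exposed?"""
    exposed = set(exposed_persons[exposure])
    return {act: bool(set(performed_by[act]) & exposed) for act in caused_by[exposure]}


print(exceeds_max_one(caused_by))        # ['landfill_exposure']
print(exceeds_max_one(exposed_persons))  # ['landfill_exposure']
print(actor_is_exposed("landfill_exposure"))
# {'dumping_toxic_waste': False, 'deciding_to_live_nearby': True}
```

Under this encoding, both the "one activity per exposure" and "one person per exposure" constraints are violated. The Query 2 answer also differs per activity, which matches the reviewer's objection.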