Ontology of active and passive environmental exposure

Tracking #: 3356-4570

Authors: 
Csilla Vamos
Simon Scheider
Tabea Sonnenschein
Roel Vermeulen

Responsible editor: 
Cogan Shimizu

Submission type: 
Full Paper
Abstract: 
Exposure is a central concept of the health and behavioral sciences, needed to study the influence of the environment on the health and behavior of people within a spatial context. While an increasing number of studies measure different forms of exposure, including the influence of air quality, noise, and crime, the influence of land cover on physical activity, or of the urban environment on food intake, we lack a common conceptual model of environmental exposure that captures its main structure across all this variety. Against the background of such a model, it becomes possible not only to systematically compare different methodological approaches, but also to better link and align the content of the vast number of scientific publications on this topic. For example, an important methodological distinction is between studies that model exposure as an exclusive outcome of some activity and those where the environment acts as a direct independent cause (active vs. passive exposure). Here, we propose an ontology design pattern that can be used to define exposure and to model its variants. It is built around causal relations between concepts including persons, activities, concentrations, exposures, environments and health risks. We formally define environmental stressors and variants of exposure using Description Logic (DL), which allows automatic inference from the RDF-encoded content of a paper. Furthermore, concepts can be linked with the data models and modelling methods used in a study. To test the pattern, we translated competency questions into SPARQL queries and ran them over RDF-encoded content. The results show how study characteristics can be classified and summarized in a manner that reflects important methodological differences.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 01/Apr/2023
Suggestion:
Major Revision
Review Comment:

This paper presents an ontology that models domain knowledge about exposures, environmental factors, persons, doses, and risks, and how these concepts relate to each other. The paper is well motivated and contributes to the field. However, it has weaknesses in four areas: how its development follows established ontology development methodologies, some modeling decisions, unclear parts of the ontology description, and the limited evaluation of the ontology.

Therefore, my advice for this paper is Major Revision. My detailed comments are shown below.

In the abstract, the authors write “[…] systematically compare different methodological approaches, but also to better link and align […] scientific publications […]”. Reading through the paper, it is clear that the proposed ontology can link and align the content of scientific publications. However, the benefits of comparing different methodological approaches are not clear. The six articles the authors chose come from different domains, such as food and air quality. Would it not be more reasonable to compare two methodological approaches from the same domain, to see whether they model exposure in the same way?

Section 2:
Competency questions 1 and 5 ask for provenance information, such as articles and datasets. However, your ontology (ExposureBasis.ttl) does not contain the semantics needed to answer these two questions. This information is only partially captured in the RDF encodings of your six selected articles. It should be captured explicitly in your ODP, for instance by specifying how provenance information is represented for datasets and environmental factors.

In your “Helbich_2016.ttl” file, there are triples such as:
_:accident a dcat:Dataset, expB:EnvironmentalFactor;rdfs:comment "accidents".
_:accidentdensity prov:wasDerivedFrom _:accident.
_:accidentdensity a expB:EnvironmentalFactor, dcat:Dataset ;rdfs:comment "accident density".
It is not clear why an instance can be both a dataset and an environmental factor at the same time. Competency question 5 depends on this modeling, so it should be taken into account during ontology development. In particular, the ontology should define concepts such as dataset and environmental factor, as well as the relationship between them.
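For illustration, one possible way to separate the two concepts is sketched below in Turtle. This is a sketch under stated assumptions, not the authors' ontology. The expB namespace IRI is a placeholder, since the review does not give it. The property expB:isRepresentedBy is hypothetical and does not exist in ExposureBasis.ttl. Only the dcat:, prov: and rdfs: terms are real vocabulary.

```turtle
# Placeholder IRI: the real expB namespace is not given in this review.
@prefix expB: <http://example.org/expB#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# The environmental factor and the dataset describing it are separate resources.
_:accident a expB:EnvironmentalFactor ;
    rdfs:comment "accidents" ;
    expB:isRepresentedBy _:accidentData .          # hypothetical property

_:accidentData a dcat:Dataset .

_:accidentdensity a expB:EnvironmentalFactor ;
    rdfs:comment "accident density" ;
    expB:isRepresentedBy _:accidentdensityData .   # hypothetical property

# Provenance is stated between the datasets rather than between the factors.
_:accidentdensityData a dcat:Dataset ;
    prov:wasDerivedFrom _:accidentData .
```

Attaching prov:wasDerivedFrom to the datasets, the factors, or both is a modeling decision the authors would need to make explicit in the pattern.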

Section 3:
In the text, the authors mention “six articles on exposure to food, air quality, crime, active transport, and physical activity”. However, Table 1 lists more specific exposure types, such as neighborhood social norms and urban green space. These specific topics should be explained in detail and grouped under the five topics mentioned.

The authors state that the development follows the idea of pattern development [6, 11], but they do not explain how. For instance: To what extent do you follow [6, 11]? How are the steps in Figure 1 aligned with the development guidelines in [6, 11]? Did you have to make any adaptations when following [6, 11]?

The description of Figure 1 could be improved. It currently lacks an overview of the different steps in Figure 1, and some sentences are unclear. For instance, you reused existing ontologies to develop your ontology, but this reuse step is not shown in Figure 1. It is also unclear which step in Figure 1 (or whether a missing step) the sentence “We then filled the slots of the pattern with examples manually extracted from exposure articles.” refers to.

Section 4:
The authors state: “Our ontology pattern can be used across many domains […]”. These domains should be named explicitly in the text.

Although Section 4.1.3 summarizes active and passive exposure, the explanation could be improved. Active and passive exposure seem to be distinguished by two different causal configurations. As a reader, I would like to learn directly and precisely at the outset what these two causal configurations are and what their causal chains look like. After this overview, examples such as food intake and noise exposure would be helpful. Currently, the modeling approach and the examples are intertwined, which makes the section difficult to understand. In addition, referring to an example or configuration as the "latter case" or the "second causal configuration" makes the text hard to follow.
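As an illustration of the kind of upfront overview requested here, the two configurations could be sketched as RDF instance data in Turtle. This is a sketch under assumptions. The chains follow the abstract's distinction: in active exposure, the exposure is an outcome of an activity, which the paper describes as caused by a person; in passive exposure, the environment is a direct cause. Class and property names are those used in these reviews. The expB IRI and all ex: instances are placeholders.

```turtle
@prefix expB: <http://example.org/expB#> .   # placeholder IRI
@prefix ex:   <http://example.org/study#> .  # hypothetical instance namespace

# Active configuration: Person -> Activity -> Exposure
ex:person1 a expB:Person .
ex:eating a expB:Activity ;
    expB:causedBy ex:person1 .
ex:foodExposure a expB:Exposure ;
    expB:causedBy ex:eating .

# Passive configuration: EnvironmentalFactor -> Exposure
ex:trafficNoise a expB:EnvironmentalFactor .
ex:noiseExposure a expB:Exposure ;
    expB:causedBy ex:trafficNoise .
```

The link from each exposure to the exposed person is omitted because the reviews do not name the property the ontology uses for it.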

Figure 2 could be explained in more detail and referred to when active and passive exposure are described. What do the labels "active" and "passive" actually mean in Figure 2? Does Figure 2 imply that an exposure involving an environmental factor is always a passive exposure? The meaning of these arrows should be explained precisely. Otherwise, Axiom 3 and Definition 4 will be ambiguous and confusing to readers.

Is Environment in Figure 2 the same as EnvironmentalFactor? The terminology should be unified, including in the ontology file.

In addition, in the ontology file, the "EnvironmentalFactor" concept has an rdfs:comment “Environment playing a role in some exposure. Can be conceptualized in different ways (see core concepts)”. What are these different ways of conceptualizing EnvironmentalFactor?

For Axiom 1, why are (Person AND Dose), (Person AND Risk), and (Dose AND Risk) not included in the left-hand side of the axiom?
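Suppose Axiom 1 is a disjointness axiom whose left-hand side is a union of pairwise conjunctions subsumed by the bottom concept. The review does not reproduce the axiom, so this form is an assumption. Under it, the three missing pairs would read:

$$
(\mathit{Person} \sqcap \mathit{Dose}) \sqcup (\mathit{Person} \sqcap \mathit{Risk}) \sqcup (\mathit{Dose} \sqcap \mathit{Risk}) \sqsubseteq \bot
$$

Equivalently, each pair can be stated as a separate axiom, such as $\mathit{Person} \sqcap \mathit{Dose} \sqsubseteq \bot$.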

Why are there no axioms regarding the connection between Activity and EnvironmentalFactor? Can an activity be caused by an EnvironmentalFactor?

Minor issues:
Page 1: (cf. [1]. -> missing right parenthesis
Page 2: “ontology design challenge [9, 10]” -> it is unclear why [9] and [10] are cited here.
Page 7: “) Furthermore, it can also be […]” -> missing period.
Page 8: NO2 -> $\mathrm{N}\mathrm{O}_\mathrm{2}$
Ontology: It would be better to declare domains and ranges for some object properties (such as causedBy), so that people who reuse the ontology can understand it better.
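A sketch of such declarations for causedBy is given below in Turtle. It rests on two assumptions. First, "X causedBy Y" reads "X is caused by Y". Second, the causes and effects are only those mentioned in these reviews: activities caused by persons, and exposures caused by activities or environmental factors. The actual ontology may need further classes, such as Dose or Risk, in the domain or range. The expB IRI is a placeholder.

```turtle
@prefix expB: <http://example.org/expB#> .   # placeholder IRI
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# "X causedBy Y": X is the effect (domain), Y is the cause (range).
expB:causedBy a owl:ObjectProperty ;
    # Effects mentioned in the reviews: activities and exposures.
    rdfs:domain [ a owl:Class ;
                  owl:unionOf ( expB:Activity expB:Exposure ) ] ;
    # Causes mentioned in the reviews: persons, activities, environmental factors.
    rdfs:range  [ a owl:Class ;
                  owl:unionOf ( expB:Person expB:Activity expB:EnvironmentalFactor ) ] .
```

Using a union matters here. Several separate rdfs:domain triples are interpreted as an intersection, so every subject of causedBy would then be inferred to be both an Activity and an Exposure.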

Review #2
By Hande Kucuk McGinty submitted on 18/Apr/2023
Suggestion:
Major Revision
Review Comment:

In Section 2.2 and in the methodology section, the authors do not mention more modern ontology development methodologies, such as KNARM and MoMO, even though they follow steps from these approaches. Similarly, they do not mention ongoing ontology efforts such as ExO and ENVO, among others. I think all of these works should be cited.
Even though the authors do not focus on vocabulary generation, I think their sub-language analysis (competency questions, etc.) can benefit greatly from reviewing those efforts.

On page 9, line 10: I am not sure that “exposure to food intake” can be classified as exposure. It is rather an intentional act or process in which the person decides to take in food, not one of “exposing” oneself to food. I think there is a difference, and the model refers to unintentional exposure, not to intentional acts and processes.

I think active and passive exposure are possible, but I do not think buying and eating food (no matter how many restaurants are nearby) can be modeled as exposure. From that perspective, this example should be replaced with one that clearly illustrates what active and passive exposure can be.

The papers selected for this study cover a broad range of topics, but it is unclear how each of them contributed to the modeling.

The model does not differentiate between a person participating in an activity/process and a person causing it. I think the examples do not make this distinction clear for the pattern.

The future work section mentions using this pattern for predictions and other machine learning or statistical approaches. However, it is unclear how this could be achieved, since the authors did not evaluate the model against the various available vocabularies that might be used for this purpose.

Review #3
Anonymous submitted on 18/Apr/2023
Suggestion:
Major Revision
Review Comment:

1) Originality - This paper aims to develop an ontology design pattern that defines exposure specifically in the context of environmental exposure, its health impacts on people, and the surrounding environmental influences. Overall, this work is very interesting and much needed in several domains. The related work section shows that there is a significant lack of reusable ontologies for exposure, and of ontologies that use standardized definitions. The authors use terms and information extracted from scientific articles with NLP to drive the development of their ontology design pattern. This approach is great, although it would also be beneficial to see how standardized definitions could be adopted for some of the concepts.

2) Significance of the results - This ontology represents the various aspects of exposure: the agent or substance, the target, the environment of exposure, the duration and frequency of exposure, and the level or dose of exposure. Developing such an ontology is significant in several contexts, such as health and safety and disaster management. It can support more sophisticated analyses of, and reasoning about, the relationship between exposure and health outcomes. Furthermore, it can facilitate the development of computational tools and applications that rely on exposure data, such as exposure assessment models or decision support systems for risk management. Ultimately, this contributes to a better understanding of the potential health impacts of environmental exposures.

3) Quality of writing - Several parts of the paper do not seem coherent, which may be because it has several authors. Overall, the paper needs significant rewriting for several reasons, some of which are listed below under general comments. There are also several modeling issues: inconsistent figures and schema diagrams, incorrect axioms, and an ontology design description that is very hard to follow. The documentation of the ontology also needs significant work before the ontology can be considered FAIR.

4) General comments:
a) The authors mention spatial aspects of environmental exposure, which seem quite important, but these aspects are absent later in the paper. For instance, on page 1, line 9, the authors mention spatial effects. What are spatial effects? Again, in line 45, the authors mention "distance". Why is that relevant or important there? The authors are clearly referring to some spatial aspect, but it is not clearly articulated. Moreover, the ontology modeling section does not connect environmental exposure to any of these spatial aspects or effects.
b) Page 1, Line 47 - When you say information ontologies, do you mean simply ontologies?
c) Page 2, Line 8 - What methodology? Do you mean ontologies?
d) Page 4 - What exactly is the limitation of ORBM+?
e) Page 6 - Fig. 1: What is the difference between "RDF encoding of articles" (ellipse) and "Encoded article content" (rectangle)? This figure is unnecessarily complicated. Your methodology figure should be crisp and clear where you discuss it (Section 3). The reader should not have to flip ahead many pages to understand the difference between "Run inferences" and "Run".
f) Table 1 says these six articles were used in the "development" of the ontology, but the methodology figure shows no connection between "articles" and "design classes" or "Ontology Pattern". From the figure, it almost seems that the articles were only used to populate the ABox, which is apparently not the case. This is a major disconnect in the methodology figure.
g) While the Ontology Design section is comprehensive, reading it is extremely tedious. Some simple rewriting could effectively improve readability:
i) In the informal description, use a different font to highlight ontology concepts, as you do later in the axioms. This way, the reader does not have to guess whether the actual concept in the ontology is Exposure or Environmental Exposure.
ii) Use schema diagrams (for each module, if possible). There must be a clear connection between the informal description and the concepts in the schema diagram. For example, where in Fig. 2 is the causal relation you mention on page 7, line 42?
h) The axioms on page 10 use the term EnvironmentalFactor, but Fig. 2 uses Environment.
i) I have concerns about the semantics of the causal relation in this paper, specifically the statement on page 8, line 28: "Activities are caused by persons". This may conflict with DOLCE-Lite, which you use and which relates activities and agents through a specific participation relation. This also makes me wonder why you do not align Activity with dul:Activity (see page 10, line 30). If there are specific reasons, they should be stated explicitly.
j) Page 12 - What is the difference between "Active" and "ActiveExposure"? Based on your description, they seem to be the same, so why do they have different axioms and different class names? Do you mean that Active is an active Activity? If so, the first axiom under Definition 4 is incorrect. It should use an existential restriction, since the range is already constrained.
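The point about existential versus universal restrictions can be made precise with a generic role $r$ and class $C$. These are placeholders, since the review does not reproduce Definition 4. Declaring the range of $r$ as $C$ is equivalent to a universal restriction on $\top$. A universal restriction in a class definition therefore adds nothing, whereas an existential one genuinely constrains the class:

$$
\begin{aligned}
&\top \sqsubseteq \forall r.C && \text{(range of } r \text{ is } C\text{)}\\
&\models\; \mathit{Activity} \sqcap \forall r.C \equiv \mathit{Activity} && \text{(the universal form adds nothing)}\\
&\mathit{Active} \equiv \mathit{Activity} \sqcap \exists r.C && \text{(requires at least one } r\text{-successor in } C\text{)}
\end{aligned}
$$

A universal restriction is also satisfied vacuously by individuals with no $r$-successor at all, which is rarely what a class definition like this intends.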
k) Page 12 - Line 8 says "Environmental Stressors", but the axiom on the next line says "Stressor". Please use the terms consistently.

5) Formatting and grammatical issues:
a) References must be used consistently. For example, on page 1, line 40, "cf." is used, but later it is not. On page 3, lines 26 and 32 suddenly append author names to the references.
b) Page 2, lines 11 and 34 - should the colon be a period?
c) Page 2, line 32 - missing space between text and reference.
d) Use a comma after "e.g." consistently throughout the paper; some instances have it and some do not.
e) When mentioning an ontology, cite that ontology directly rather than a review paper that mentions it, e.g., the disease, vaccine and symptom ontologies on page 3, line 33.
f) Use double quotes consistently: for instance, why is "exposure ratio" (line 34) quoted, but "exposure" (line 31) is not?
g) Emphasize words and phrases consistently. Many phrases are emphasized for no apparent reason (e.g., "competency questions" on page 4, line 35).
h) Page 10, line 39 - isPromoteddBy --> isPromotedBy

6) Resource/Data file - The GitHub repository has an empty README. There are several TTL files, and I had to guess which one is the actual ontology. The ontology has some unresolved imports and several concepts that are not mentioned in the paper (e.g., Bearer). There is no information about the graph the queries were run against to produce the results shown in Tables 2 and 3. The GitHub repository would also benefit from schema diagrams describing the core classes and relations of the ontology.