Modelling Digital Health Data: the ExaMode Ontology for Histopathology

Tracking #: 2964-4178

Authors: 
Dennis Dosso
Manfredo Atzori
Svetla Boytcheva
Francesco Ciompi
Giorgio Maria Di Nunzio
Fabio Giachelle
Stefano Marchesin
Filippo Fraggetta
Niccolò Marini
Henning Muller
Todor Primov
Gianmaria Silvello

Responsible editor: 
Rafael Goncalves

Submission type: 
Ontology Description
Abstract: 
Histopathology is the gold standard for cancer diagnostics and its digital health counterpart – Computational Pathology – is gaining traction in the clinical practice for unlocking innovative approaches for patient care. In this context, data processing and learning are key aspects for the advancement of the field and ontologies are needed to model the domain of interest, standardize terminology and make methods and services interoperable and reusable. This paper presents the ExaMode ontology defining the classes and relationships concerning diagnosing four largely diffused and studied histopathology diseases: colon cancer, lung cancer, uterine cervix cancer, and celiac disease. The ontology holistically models the classes and relations concerning these diseases and the medical context in which they are diagnosed. The concepts are divided into semantic areas about different aspects of the diseases and the diagnosis process: the patient and the clinical trial to which s/he might participate; the outcome of the diagnosis; the anatomical location, i.e., where the disease was found or from where tissues were taken; the procedures used to take the samples; the tests performed on the patients; and possible annotations containing further information about the disease. The ontological modeling was based on real-world anonymized clinical reports provided by two hospitals in Italy and The Netherlands. Overall, the ExaMode ontology has been employed to develop automatic methods to extract pathological concepts from medical reports, which are used to annotate medical images associated with the records themselves. These are then used to train prediction algorithms aimed to improve clinical decision systems. Moreover, the presented ontology is currently used to improve systems for clinical support and triaging.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
Anonymous submitted on 18/Jan/2022
Suggestion:
Reject
Review Comment:

Summary:

The work begins by motivating the digital mechanisms involved in processing digital pathology images, highlighting problems related to the availability of information on pathological observations as free text and annotations.

To deal with these limitations, convolutional neural networks (CNN) and natural language processing (NLP) are presented as common solutions to the problems pointed out. However, these approaches require extensive data, which brings other problems to the surface because of data variability, as well as extensive testing.

Ontologies are motivated as a feasible solution for digital pathology representation in these tasks, given that they are used together with other techniques such as NLP and Deep Learning. Thus, the use of ontologies for histopathology is motivated as a way to support terminological standardization in the field.

A new ontology is proposed to represent four pathological processes of interest: colon cancer, uterine cervix cancer, lung cancer and celiac disease. Its scope incorporates patient data, data on the clinical trial in which the patient might participate, the outcome of the diagnosis, the anatomical location, tests performed on patients and further annotations about the disease.

Major problems:

• Why have both a related work section and a background section? Why not merge them into a single section? This makes the article quite long and delays the relevant parts: readers first have to go through almost 8 pages of introduction, motivation, related work, and background.

• Multiple motivations for the ExaMode project are given, with similar content.

• Reuse of entities is not described in detail in section 4. It is recommended to look at MIREOT (https://content.iospress.com/articles/applied-ontology/ao087) to properly refer to external ontologies.

• It is unclear how sections/modules/classes and object properties/relations were reused from other ontologies.

• On what were all the classes and model representations based?

• The “Disease or disorder” class, from a representational perspective, was previously described in an article by S. Schulz and collaborators (https://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-2-S2-S6) and may be seen as problematic here.

• Why use DOID for representing “Disease or Disorder” when (for instance) we have SNOMED CT as a generally accepted representation for this?

• No reference is made to which ontology engineering methodology (NeOn, Methontology, among others) and principles (OBO Principles) were employed.

• Ontology Language? Expressivity?

• Several ontologies were put together to support the ExaMode Ontology. However, it is unclear to which top-level ontology these “imported” classes refer. They could be mapped to Basic Formal Ontology v.2 (BFO2), since most of the referenced ontologies and classes come from the biomedical domain and refer to OBO ontologies. Classes taken from multiple ontologies should be harmonized under a common ontological reference (for instance, BFO2, BTL2 for the biomedical domain, UFO, GFO, among others). For the presented ontology, unfortunately this seems not to be the case.

• There are no clear criteria for how ontologies were selected for reuse.

• The explanations of ExaMode cases, Diagnosis and other sections, which consist of listing class names and explaining them, are unproductive. The authors present their hierarchy as an illustration, but reading a full explanation of how to read the image is not ideal. It would be more relevant to present the complex modelling issues with motivating explanations, accompanied by images and clear axioms.

• The representation of Annotation may seem odd, given the authors' explanation of it. Entities from regular ontologies are used and declared directly in database entries (for instance, UniProt uses GO for Annotations) or in clinical texts (such as reports) to replace free text entities with proper ontological ones. With this explanation in mind, why model Annotation? NLP tasks are going to seek “cervix cancer”, “polyp” among other specific histopathology classes that must already be represented in the ExaMode Ontology.

• Foundational Model of Anatomy (FMA) is regularly used for representing anatomical entities.

• In the *.owl file, why are pathological processes modelled as instances?

• Who reviewed the ontology?

• How could the authors guarantee that the whole representation is adequate to the usage envisioned and ontologically correct (w.r.t. an ontological top-level)?

Minor problems:

• The paragraph organization in the introduction can be improved. For instance, the second and third paragraphs should be merged. Both contain single sentences that are complementary;

• SNOMED CT is already a brand. It does not need to be expanded as Standardized Nomenclature of Medicine – Clinical Terms;

• What do the authors mean by “… holistically models…”? Unclear. Be precise.

• “… production was expected to be over 2k exabytes in 2020;…” We are in 2022.

• “The lack of labelled data, which are expensive and time-consuming to produce…” Could you clarify whether you are saying that unlabelled data are time-consuming and expensive to produce?

• To reuse ontology sections (also called modules) directly in the ontology, the authors may want to take a look at the ontology modularity approach available here (https://sites.google.com/site/ontologymodularity/). It may seem old, but it still works with certain Protégé versions (Java 8+, I guess);

• Marsh-Oberhuber and Corazza-Villanacci classification systems were not referenced in p8, l19-20;

• Visualization of clinical report with ExaNet is not completely clear. It may look better if only class names are used.

Recommendations for the authors:

• Organize the methods section considering proper ontology development methodologies and approaches frequently used for biomedical ontologies, as previously cited;

• Organize presentation of the ontology main representation aspects, pointing out the complex axioms and how it contributes to the histopathology landscape;

• This article seems to focus mainly on presenting an ontology, not on NLP tasks or visualization. I suggest splitting this article into two: (i) the ontology (which is the main subject of the current version), and (ii) putting the ontology into use for histopathology.

Reasons for rejection:

• The article requires a new, deeper evaluation of the approach with respect to the proper use of known ontology development methodologies;

• Reuse of ontological entities unclear;

• Lack of clear validation mechanisms;

• Lack of clear explanation on representational choices.

• It may be of interest for publication for the biomedical audience; however, it currently lacks a basic methodological foundation.

Review #2
By James Overton submitted on 14/Feb/2022
Suggestion:
Minor Revision
Review Comment:

In this paper the authors describe the ExaMode ontology and some of its applications. ExaMode includes a few hundred terms for disease, diagnosis, patients, tests, test results, procedures, etc. focused on histopathology, and specifically on four diseases: colon cancer, lung cancer, uterine cervix cancer, and celiac disease. The system has been used to automatically extract annotations from case reports in Italian and Dutch, and the results can be visualized as a graph.

This work is valuable and difficult. The authors are to be commended for building a working system from many disparate parts, and for making good use of existing ontologies to improve interoperability of their systems and data. They provide a good ontology description in this paper, detailed ontology documentation, and open code. I encourage the authors to continue this work.

# Specific review requirements:

(1) Quality and relevance of the described ontology: I believe ExaMode is relevant, and suitable for purpose, although I have concerns about quality discussed below.

(2) Clarity of the paper: Good overall, with specific points below.

The attached Zenodo archive includes `examode.owl` and conversion to several other formats. (A) It includes a description but not a README per se. (B) The files are complete ontology files. (C) Zenodo is appropriate for archiving. (D) No other data artifacts are provided.

# General concerns

I do, however, have a number of concerns about how the ontology is built, which I hope will lead to clarifications and improvements of the ontology itself. These criticisms may also help with revisions of the paper.

This project seems to be squarely aimed at annotation, and I am persuaded that it is suitable for that purpose. When I look at the details of the OWL file, I see several problems. While it is good to reuse existing terms, ExaMode includes quite a mixture of terms from different source ontologies with different modelling strategies. For example, UBERON logically defines 'endometrium' as equivalent to "mucosa and part of some uterus", but ExaMode asserts that 'endometrium' is `owl:partOf` NCIt's 'mucosa' term. The OWL specification does not include `owl:partOf`, but more importantly this should be a subclass relation, and I do not understand why NCIt's 'mucosa' term was used instead of UBERON's 'mucosa' term. Term reuse aids with interoperability, but inconsistent term reuse undermines that goal.
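A check for terms such as `owl:partOf` could be automated. The following is only a minimal sketch, assuming the triples have already been extracted from the OWL file as plain (subject, predicate, object) tuples. The set `KNOWN_OWL_PROPERTIES` is a deliberately small, non-exhaustive subset of the OWL 2 vocabulary chosen for illustration, and the function name and `example.org` IRIs are hypothetical placeholders.

```python
OWL_NS = "http://www.w3.org/2002/07/owl#"

# Illustrative subset of property names defined by OWL 2 (NOT exhaustive;
# a real check should use the full vocabulary from the specification).
KNOWN_OWL_PROPERTIES = {
    "equivalentClass", "disjointWith", "sameAs", "differentFrom",
    "inverseOf", "onProperty", "someValuesFrom", "allValuesFrom",
    "intersectionOf", "unionOf", "complementOf", "imports", "versionInfo",
}


def undefined_owl_predicates(triples):
    """Return predicates in the OWL namespace whose local name is not in the known set."""
    found = set()
    for _subject, predicate, _object in triples:
        if predicate.startswith(OWL_NS):
            local_name = predicate[len(OWL_NS):]
            if local_name not in KNOWN_OWL_PROPERTIES:
                found.add(predicate)
    return sorted(found)


# Hypothetical triples with placeholder IRIs, mimicking the pattern described above.
triples = [
    ("http://example.org/endometrium", OWL_NS + "partOf", "http://example.org/mucosa"),
    ("http://example.org/A", OWL_NS + "equivalentClass", "http://example.org/B"),
]
print(undefined_owl_predicates(triples))
```

With the placeholder data, only the `partOf` predicate is reported, since `equivalentClass` is in the known set.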

It was not clear to me from the paper, but looking at the OWL file in Protege I was quite surprised that UBERON and MONDO classes had been "demoted" to owl:Individuals. This was not always done consistently, so MONDO:0002271 'Colon Adenocarcinoma' is an owl:Class, but MONDO:0002032 'colon carcinoma' is an owl:Individual. Figure 9 suggests that there were two 'Resection' procedures, but I suspect there was only one. For the outcome on the left we have 'Mild Colon Dysplasia' as the `rdf:type`, but on the right the same term is used as the object of an `exa:hasDysplasia` predicate, and 'Moderate Colon Dysplasia' is also specified.
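The class/individual inconsistency could be surfaced with a simple report. This is a minimal sketch, assuming each term has been exported as a (CURIE, kind) pair. The function name is hypothetical, `owl:NamedIndividual` stands in for what I call owl:Individual above, and the `EX:` entry is an invented placeholder; the two MONDO entries are the ones named in this review.

```python
from collections import defaultdict


def mixed_typing_by_prefix(typed_terms):
    """Return the prefixes whose terms are declared with more than one OWL kind."""
    kinds = defaultdict(set)
    for curie, kind in typed_terms:
        prefix = curie.split(":", 1)[0]
        kinds[prefix].add(kind)
    # Keep only prefixes that mix kinds, e.g. both classes and individuals.
    return {prefix: sorted(k) for prefix, k in kinds.items() if len(k) > 1}


typed_terms = [
    ("MONDO:0002271", "owl:Class"),            # 'Colon Adenocarcinoma'
    ("MONDO:0002032", "owl:NamedIndividual"),  # 'colon carcinoma'
    ("EX:0000001", "owl:Class"),               # hypothetical placeholder term
]
print(mixed_typing_by_prefix(typed_terms))
```

A report of this kind only flags source ontologies that are typed inconsistently; it does not decide whether classes or individuals are the right choice.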

More generally and more subjectively, ExaMode's four "semantic areas" seem to each have quite different modelling approaches, which become apparent when looking at the OWL file as a whole. For example, an 'Onset' is not a subclass of 'Annotation', although a record may be annotated with information about an onset.

Although the authors do make good reuse of many existing ontology terms, I believe that there are good candidate terms in OBO and elsewhere for many of the terms that they do create in ExaMode. I would encourage the authors to expand the scope of their collaboration, and grow ExaMode toward closer integration with the larger open ontology community.

# Specific points about the manuscript

- Abstract and elsewhere: "four largely diffused and studied histopathology diseases"; "diffused" seems a strange word choice to me (as a native English speaker)
- Page 2 line 3: "complexity increment" is another strange word choice; "increase in complexity"?
- Page 2 line 17: "is subjected to" -> "is subject to"
- Page 4 line 27: I do not understand how the NCIt is "More than an ontology", when the rest of the sentence spells out what a good ontology should be
- Page 5 line 20: EBI's Ontology Lookup Service is not limited to OBO (flatfile) format, it supports OWL format
- Page 10 figure 1:
  - `doid:Disease` should be `doid:4`
  - it seems strange to use one term 'patient' from IDOMAL when there are candidates in ontologies you are already using, such as OAE
  - there are existing ontologies covering gender (from multiple perspectives); the `examode.owl` file actually uses NCIt 'gender'
- Page 11 line 44: `exa:NegativeOutcome` is elsewhere referred to as `exa:NegativeResult`
- Page 12 line 1: `oae: 0001850` should not have a space
- Page 12 figure 3: HP defines 'Onset' as "The age group in which disease manifestations appear." I don't see how this can be a *subclass* of `exa:Annotation`, or a sibling to `exa:SemanticArea`.
- Page 13 line 25: HPV is referred to earlier, but this is the first time that abbreviation is spelled out.
- Page 14 figure 4: The OWL specification does not define `owl:partOf`. UBERON logically defines 'endometrium' as equivalent to "mucosa and part of some uterus", so I believe that 'endometrium' cannot be part of 'mucosa'.
- Page 15 line 4: `uberon:0001052` has label 'rectum', but ExaMode seems to be relabelling it "rectum, Not Otherwise Specified (NOS)". If this is the case, I would prefer that the authors emphasize the changes they are making. This applies to several other "NOS" terms and any other label changes.
- Page 18 table 2: There is no comparison to a "gold standard" annotation of the reports, that I can see. Figure 7 below shows that not all relevant terms are in ExaMode.
- Page 18 line 43: The fact that clinical reports were translated to English is only mentioned briefly here, but I think it requires more discussion. How effective was this translation?
- Page 19 figure 7: Not all the highlighted words are present in ExaMode -- I would like to see a discussion of this.
- Page 19 figure 8: The text of the WebVOWL graph is illegible.
- Page 20 figure 9: I am quite confused by this figure. Were there really two 'Resection' procedures? I suspect there was only one. I do not understand how 'Mild Colon Dysplasia' is the type of one outcome while being a property of the other, or how one outcome has both mild and moderate colon dysplasia.
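Some of the identifier problems listed above, such as the stray space in `oae: 0001850`, could be caught before submission. This is a minimal sketch, assuming a simplified CURIE shape (a prefix, a colon, and a local part with no whitespace); the pattern and the function name are my own assumptions, not a full implementation of the CURIE syntax.

```python
import re

# Assumed shape for this sketch: prefix, colon, local part without whitespace or colons.
_CURIE = re.compile(r"[A-Za-z][A-Za-z0-9_.-]*:[^\s:]+")


def malformed_curies(curies):
    """Return the strings that do not match the assumed CURIE shape."""
    return [c for c in curies if _CURIE.fullmatch(c) is None]


print(malformed_curies(["oae: 0001850", "oae:0001850", "doid:4"]))
```

Note that such a check is purely syntactic: it flags `oae: 0001850`, but it would not detect that `doid:Disease` is the wrong identifier.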

# Specific points about the examode.owl file

- I do not understand why the various anatomical classes from UBERON and NCIt have been changed to owl:Individuals, and likewise for MONDO disease classes.
- I do not understand why MONDO:0002271 'Colon Adenocarcinoma' is an owl:Class, but MONDO:0002032 'colon carcinoma' is an owl:Individual.
- `https://hpo.jax.org/app/browse/term/HP:0003584` is not the correct identifier for an HPO term. It should be `http://purl.obolibrary.org/obo/HP_0003584`
- The paper refers to `exa:Gender` but the OWL file uses `NCIT:C17357`.
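The HPO identifier fix mentioned above can be applied mechanically. This is a minimal sketch, assuming HPO identifiers have seven digits and appear in the input as `HP:` or `HP_` followed by those digits; the function name is hypothetical.

```python
import re

OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"
# Matches "HP:0003584" or "HP_0003584" anywhere in the input string.
_HPO_ID = re.compile(r"HP[:_](\d{7})(?!\d)")


def hpo_purl(text):
    """Extract an HPO identifier from a CURIE or URL and return its OBO PURL."""
    match = _HPO_ID.search(text)
    if match is None:
        raise ValueError(f"no HPO identifier found in {text!r}")
    return f"{OBO_PURL_BASE}HP_{match.group(1)}"


print(hpo_purl("https://hpo.jax.org/app/browse/term/HP:0003584"))
```

For the browser URL quoted above, this prints `http://purl.obolibrary.org/obo/HP_0003584`.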