DeepOnto: A Python Package for Ontology Engineering with Deep Learning

Tracking #: 3499-4713

Authors: 
Yuan He
Jiaoyan Chen
Hang Dong
Ian Horrocks
Carlo Allocca
Taehun Kim
Brahmananda Sapkota

Responsible editor: 
Eva Blomqvist

Submission type: 
Tool/System Report
Abstract: 
Applying deep learning techniques, particularly language models (LMs), in ontology engineering has attracted widespread attention. However, deep learning frameworks like PyTorch and TensorFlow are predominantly developed for Python programming, while widely-used ontology APIs, such as the OWL API and Jena, are primarily Java-based. To facilitate seamless integration of these frameworks and APIs, we present DeepOnto, a Python package designed for ontology engineering. The package encompasses a core ontology processing module founded on the widely-recognised and reliable OWL API, encapsulating its fundamental features in a more "Pythonic" manner and extending its capabilities to include other essential components including reasoning, verbalisation, normalisation, projection, and more. Building on this module, DeepOnto offers a suite of tools, resources, and algorithms that support various ontology engineering tasks, such as ontology alignment and completion, by harnessing deep learning methodologies, primarily pre-trained LMs. In this paper, we also demonstrate the practical utility of DeepOnto through two use cases: Digital Health Coaching at Samsung Research UK and the Bio-ML track of the Ontology Alignment Evaluation Initiative (OAEI).
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 10/Aug/2023
Suggestion:
Minor Revision
Review Comment:

The paper introduces the DeepOnto Python package and is submitted to the tool report track.
Since more and more research is done in Python, packages that can parse and process ontologies
in this programming language are sorely needed.
Overall the paper is nicely written and easy to follow.
In the introduction and design principles section, it could be explained in more detail why the parsing step is not implemented in the library itself
and instead still relies on Java/the OWL API in the background. It is also unclear why mOWL is not used as a dependency but is partially
reimplemented in DeepOnto (is functionality missing?).

To allow better reusability of the package, it would make sense to focus on one task.
At present, it looks like a combination of various approaches (from the authors) such as alignment, completion, probing, etc.
Nevertheless, it is good to have a working implementation of those approaches, but this should already be the case for
each paper corresponding to the approaches (e.g. BERTMap and BERTSubs).
Thus a Python package that focuses on one (or a few) task(s) and also includes approaches/datasets from various researchers would
make the library even more valuable.
This also means that not only one OAEI track but others as well should be implemented in the library.

Another dimension that could be added to the paper is the scalability of the introduced approaches,
as well as of the parsing (I assume this depends heavily on the scalability of the OWL API).

One important aspect of the tool track is to describe the tool's impact.
Given the FAQ of SWJ [1], I see mainly an "impact within your own range of influence" through its use in the Digital Health Coaching project.
For the OAEI, DeepOnto is used to create the track, but from what is given in the paper,
no other researchers are using the library so far to, e.g., build upon or modify other systems.
In case the authors have evidence that the package is used by other people, they should include it in the paper.

Even though the package documentation is really good and useful, a small study about the usability of the package would help,
e.g. a small survey after participants solve some tasks with the library.

I also tried out the package; the following are some (minor) suggested improvements:
- Upload the package to conda-forge [2] as well
- Provide an option to select GPUs (although this can be done outside the package)
- Documentation page introduction/installation [3]:
  - A requirements section would be nice (Python >= 3.8; Java? or is it shipped with the package? etc.)
  - Either choose pip or pip3
  - Is torchvision/torchaudio a requirement? If not, remove it (maybe link to the PyTorch installation page)
- Documentation page Tutorial Verbalise [4]:
  - `verbaliser.verbalise_class_expression(complex_concepts[0])` does not work because `complex_concepts` is a set (use `next(iter(complex_concepts))` instead);
    maybe convert it to a list to make it more consistent with `get_subsumption_axioms`
  - In case graphviz is not installed (for visualisation), a hint would be useful (e.g. with Anaconda the command is `conda install graphviz`)
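The set-indexing issue flagged above can be illustrated in plain Python. Note this is a minimal sketch with stand-in string data, not DeepOnto's actual OWL class-expression objects; the DeepOnto method names are only referenced in comments.

```python
# Stand-in for the set of complex class expressions that the DeepOnto
# verbaliser tutorial works with (real elements are OWL class expressions).
complex_concepts = {"A and (r some B)", "C or D"}

# Indexing a set fails, which is why the tutorial snippet
# `verbaliser.verbalise_class_expression(complex_concepts[0])` raises an error:
try:
    complex_concepts[0]
except TypeError as err:
    print(f"TypeError: {err}")  # sets are unordered and not subscriptable

# Workaround used in the review: take an arbitrary element via an iterator...
first = next(iter(complex_concepts))

# ...or convert to a list for index-based access, consistent with the
# list returned by `get_subsumption_axioms`.
concepts = list(complex_concepts)
first = concepts[0]
```

Converting the returned collection to a list in the tutorial (or in the library itself) would make the two APIs behave consistently for index-based access.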

The long-term stable link is okay.

For the revision:
- Include evidence that the library is used in the community
- Provide a user study or something similar
- Update documentation
- Show limitations

[1] https://www.semantic-web-journal.net/faq#q20
[2] https://conda-forge.org/
[3] https://krr-oxford.github.io/DeepOnto/
[4] https://krr-oxford.github.io/DeepOnto/verbaliser/

Review #2
By Daniel Garijo submitted on 14/Aug/2023
Suggestion:
Minor Revision
Review Comment:

This paper describes DeepOnto, a Python toolkit designed to assist with various ontology completion, pruning, and alignment tasks. The authors showcase the usefulness of the tool through two use cases: one in the ontology alignment track of an evaluation campaign and another in digital health.

The paper reads well, and it is relevant to the Semantic Web Journal audience and the Semantic Web community. I appreciate the involvement of the authors in preparing ontology alignment datasets (Bio-ML), which showcase the usefulness of the tool and help the community push forward the state of the art. The tool is openly available (with 100+ stars), documented, and released under an open license. The authors seem to have made a great effort to integrate state-of-the-art models to help in ontology alignment tasks, and the result is a promising tool for the community. Therefore, I lean towards accepting this paper.

That said, I found the following limitations, which should be addressed in the manuscript:

- There is no mention of the long-term sustainability of the tool. Deep learning frameworks and models are known to be associated with a software stack that updates rapidly. It would be nice to indicate whether there is a community using the tool besides the authors with their own use cases. Is there any evidence of the tool being used in other tasks or projects besides the ones reported in the paper?

- The motivation in the abstract and introduction for the tool is quite weak. It looks like the gap addressed by the tool is a programming language issue when using Java libraries from Python. But the tool seems to cover a wide range of tasks that go beyond a Python-Java integration. The authors should motivate the existence of the tool based on the tasks that it supports, rather than the programming languages it supports.

- The authors refer to several tasks in pages 1-5 without defining them first, which is confusing for following the paper properly. For example, it was not clear why ontology verbalisation and projection are needed until page 5 defines them.

- Figure 1 is confusing. What do different arrows and colors mean? Why do boxes overlap? I believe this should be reviewed.

- Shouldn't completion be based on requirements and competency questions? There may be interest in only part of the domain. This does not seem to be covered in the task.

- "This standard approach is effective for storing or exchanging an ontology, but its utility for ontology visualisation or applying graph-based algorithms, such as Random Walk and Graph Neural Networks, is limited" -> Is there evidence for this claim? If so, please include it.

- When explaining the subsumptions, adding an example would help following the difference between existing approaches.

- In Table 2 (human evaluation results), the agreement between evaluators is not clear. Is there any? I find it interesting that substring matching has such high precision. Could this be specific to this evaluation task? I am not sure this translates to other evaluation datasets; please add a comment on the generalizability of this type of evaluation.

- What about the errors that deep learning introduces and that have no explanation? For example, some alignment matches proposed by the tool may be difficult to understand without better context or explanations.

Review #3
By Andrea Giovanni Nuzzolese submitted on 09/Oct/2023
Suggestion:
Minor Revision
Review Comment:

This article addresses the challenge of harmonizing prevalent Java-based ontology APIs with deep learning techniques, which are primarily developed in Python. It introduces DeepOnto, a Python package tailored for ontology engineering.

Starting from the premise that deep learning methodologies have gained substantial traction across many research landscapes, the paper shows through examples how these methodologies have demonstrated notable superiority over traditional ontology engineering tools. DeepOnto emerges as a venture to bridge the existing gap by offering a robust, Python-compatible package to aid deep-learning-oriented ontology engineering, notable for its flexibility and extensibility towards additional implementations. Nevertheless, the literature review provided does not overtly illustrate the importance and advantages of merging these two paradigms, an aspect which is only later provided in the use cases section. The frameworks proposed by Pan et al. (2023; https://arxiv.org/abs/2306.08302) effectively highlight how large language models are often black-box models and may fail in capturing and accessing factual knowledge.

DeepOnto encompasses a core ontology processing module, exposing the features of the OWL API, such as accessing ontology entities, querying concepts, deleting entities, modifying axioms, and retrieving annotations. Furthermore, the Ontology class houses several crucial sub-modules spanning reasoning to projection, which are described in depth in the Architecture section of the article, supported by illustrative figures. While the term "user-friendly" and claims of lower verbosity compared to existing solutions recur throughout the text, and the package's flexibility for easy updates is noted, a more in-depth explanation motivating these statements would benefit the narrative.

Moreover, DeepOnto is equipped with various tools and resources aimed at ontology engineering tasks like ontology matching and ontology completion, all of which are elaborated in depth with supporting links provided. A mention of how linguistic resources such as Framester (Gangemi et al., 2016; https://framester.github.io/) could potentially improve ontology matching alongside BERT-based models could be an interesting addition.

In demonstrating DeepOnto's practical utility, the authors describe two use cases: Digital Health Coaching at Samsung Research UK, and the Bio-ML track of the Ontology Alignment Evaluation Initiative (OAEI). This second part of the paper underscores the inherent challenges of benchmarking in ontology matching research, particularly within biomedical ontologies. It critiques existing datasets for their inadequacy in accommodating machine learning-based models and for their focus solely on matching equivalent concepts, while possessing incomplete ground truth mappings. Bio-ML, on the other hand, aims to utilize human-curated mappings as ground truth, employing a broader spectrum of ranking metrics, and includes subsumption matching. The paper thoughtfully articulates different data split configurations and an intention to optimize computational resources. The results reflect promising scores concerning performance, although with an identified issue in matching medical concepts, where certain non-disease concepts are erroneously matched to disease concepts due to lexical overlaps. Although unaddressed in the paper, this issue could bear significant implications in digital health coaching scenarios. It is suggested to mention possible solutions for improving these cases.

Towards the end, the paper touches on how modules in DeepOnto support the prompt learning paradigm, a foundational aspect of large language models like ChatGPT. However, in the conclusion, the authors suggest expanding the existing toolset with newer models such as the GPT series, creating ambiguity regarding what has been implemented versus what is projected for future integration.

In conclusion, the paper is well-written with good support from figures and extensive resources, along with a solid bibliography. It could benefit from more detailed explanations on certain design choices and a clearer distinction between current implementations and future plans.