EmEL-V: EL++ Ontology Embeddings for Many-to-Many Relationships

Tracking #: 3229-4443

Biswesh Mohapatra
Arundhati Bhattacharya
Sumit Bhatia
Raghava Mutharaju
G. Srinivasaraghavan

Responsible editor: 
Guest Editors NeSy 2022

Submission type: 
Full Paper
Knowledge Graph (KG) embeddings provide a dense, low-dimensional representation of entities and relations in a Knowledge Graph and are used successfully for various applications such as reasoning, missing link prediction, question answering, and search. However, most existing KG embeddings consider only the network structure of the graph and ignore the semantics and characteristics of the underlying ontology, which provides crucial information about relationships between entities in the KG. Recent efforts in this direction involve learning embeddings for a description logic named EL++ (a logical underpinning for OWL 2 ontologies). However, such methods consider all the relations defined in the ontology to be one-to-one, which severely limits their performance and applications. We provide a simple and effective solution, named EmEL-V, that overcomes this shortcoming by allowing such methods to consider many-to-many relationships while learning embedding representations. Experiments conducted using three EL++ ontologies and one benchmark generated ontology on a reasoning task (class subsumption prediction) show substantial performance improvement over five baselines. Our proposed solution also paves the way for learning embedding representations for even more expressive description logics such as SROIQ. The source code and the instructions to run it are available at https://github.com/kracr/el-embeddings. The ontologies used in the evaluation are available at https://doi.org/10.5281/zenodo.7023568.

Major Revision

Solicited Reviews:
Review #1
By Maxat Kulmanov submitted on 30/Oct/2022
Review Comment:

The authors present a method for embedding ontologies that accounts for many-to-many relations between ontology classes. They build on top of the EL-Embeddings method and modify its optimization function by using an additional variance function for relations. They perform thorough evaluations and show that their model outperforms the baseline in different tasks, including many-to-many relations. In general, I think the manuscript is well written, the claims are supported by evidence, and all the source code and data are available.

Review #2
By Md Kamruzzaman Sarker submitted on 28/Jan/2023
Minor Revision
Review Comment:

This paper proposes a method for EL++ ontology embeddings focusing on many-to-many relations. The work is built on top of the existing EmEL and EmEL++, but its focus is many-to-many relations. The authors' contribution is to take the uncertainty/variance of relations into account, which existing works did not do. The authors also evaluated their work against a substantial number of existing works. Though the results reported in the paper are not always better than those of existing works, the method often outperforms them.

Strong points:
* Idea of taking the uncertainty/variance of relations into account is well motivated given the complex nature of real-world ontologies.
* The authors provide a good amount of comparison with the existing state of the art.
* The source code and the data sources are provided, which helps in reproducing the results.
* Paper is well written and understandable.

Weak points:
* The claim to "show substantial performance improvement over five baselines" does not seem justified by the experimental results. In some experiments the newly proposed method is not the best one. Taking a more measured stance would be better.

Some minor typographical mistakes:
In Table 2 and Table 3, the relation types are variously written as many-to-many, Many-to-Many, and One-to-Many. The capitalization should be consistent.

Review #3
Anonymous submitted on 09/Mar/2023
Major Revision
Review Comment:

When we consider the state of the art in Knowledge Graph Embedding (KGE) methods, the focus is mostly on graph structures rather than on the ontologies used to “generate” the graph, which define the semantics of the relations that connect the graph nodes. However, recent work has started to address this second, less explored area by investigating one-to-one relations. In the current paper, the authors try to take an additional step forward and introduce EmEL-V, a method to address many-to-many relationships. This method is based on geometric models, building upon the approach underlying EmEL and EmEL++. Overall, I think that the paper addresses an important problem and shows incremental results with respect to the existing literature.


- There’s a general lack of examples throughout the paper. Although Galen, GO and SNOMED are well-known ontologies, it’s not immediately clear, even to the experienced scholar, which many-to-many relations are used to test the EmEL-V method. Adding at least one example per ontology would be helpful. Also related to this point: Sections 6.2.3-6.2.5 provide an overview across models and types of relations, but it’d be informative to look at positive and negative examples related to the performance.

- In terms of evaluation metrics (section 6.1.3), the choice of subsumption over link prediction is well motivated. I wonder, though, and ask the authors: is it sufficient? This is a general problem, not an issue peculiar to this contribution: in testing a model’s capability of leveraging ontological information, how do we factor in semantic relationships (part-of, causality, etc.) beyond IS-A? This is instrumental to being able to claim that the proposed ontology embedding method is more than “just” a taxonomy embedding method. Since EmEL-V “lives” within the expressivity space of EL++, can other features of the language be considered for semantic relations (e.g., existential quantification)?

Overall, the paper is clear, technically sound and contextualized through an appropriate related work section. However, being quite derivative in nature, I’d not consider this contribution particularly original. The associated repository is well documented, and reproducibility of experiments seems to be straightforward.

My final assessment is accepting with major revisions, summarized as follows:
- Include examples of many-to-many relations that can guide the reader throughout the paper.
- Add an Error Analysis section.
- Expand the discussion on evaluation metrics, in section 6.1.3 or 7 (Conclusion and Future Work)

Review #4
Anonymous submitted on 18/Apr/2023
Review Comment:

The paper presents a method that translates an OWL ontology into a latent space, using a methodology that is similar to Knowledge Graph Embeddings (KGEs). The problem with KGEs is that they capture only the topology of the graph without considering the semantics associated with the underlying ontology.

To this end, several approaches that target OWL ontologies have been presented. This paper considers one of them, which the authors call EmEL. It addresses an important limitation of EmEL, namely that it is unable to efficiently represent one-to-many and many-to-many relations. The proposed solution consists of adding a special parameter (\sigma) to every relation. This new parameter moves away from a point-based representation, since \sigma can be used to encode an area in the latent space, and this allows the method to represent one-to-many and many-to-many relations.
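The intuition can be illustrated with a minimal sketch. It is not the loss function from the manuscript: it assumes a TransE-style translation (head + relation ≈ tail) and a simple hinge on |\sigma|, and the names `translation_gap`, `point_loss`, and `region_loss` are hypothetical. It only shows why a per-relation tolerance lets one translated point match several distinct tails.

```python
import math


def translation_gap(head, relation, tail):
    """Euclidean distance between (head + relation) and tail."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def point_loss(head, relation, tail):
    """Point-based view: any offset from the translated point is penalized."""
    return translation_gap(head, relation, tail)


def region_loss(head, relation, tail, sigma):
    """Region-based view (assumed form): no penalty within |sigma| of the translated point."""
    return max(0.0, translation_gap(head, relation, tail) - abs(sigma))


# One head related to two different tails through the same relation (one-to-many).
head = (0.0, 0.0)
relation = (1.0, 0.0)
tails = [(1.0, 0.3), (1.0, -0.3)]

# A single translated point cannot coincide with both tails, so both losses stay positive.
point_losses = [point_loss(head, relation, t) for t in tails]

# With a tolerance of 0.5, both tails fall inside the region and incur no loss.
region_losses = [region_loss(head, relation, t, sigma=0.5) for t in tails]

print(point_losses, region_losses)
```

In this toy setting the point-based loss can never reach zero for both tails at once, while the region-based variant can, which is the behaviour the \sigma parameter is meant to enable.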

Although the extension is conceptually simple, the empirical evaluation shows that it leads to an improvement in performance. This makes the presented contribution potentially interesting. Unfortunately, there are several problems with the current version of the manuscript, to the point where I am not sure that someone could reproduce the method.

First of all, a general problem of the paper is that the quality of the writing is not always good. For instance, sentences are often not well constructed (e.g., a common problem seems to be an incorrect usage of the comma, starting with "but", and so on). There are also many typos and many concepts are not properly introduced. It appears to me that the paper was not properly proof-read.

Moreover, the introduction does not describe what the actual contribution is, which is unusual in scientific publications. It also mentions terms like "connectionist" or "translation operator" without introducing them.

The related work section is very short, almost half of a page, which is surprising considering the amount of publications around this topic.

The preliminaries section fails to describe clearly the basic notions that are necessary to understand the paper. I believe that the paper can be understood only by a reader that is familiar with OWL and its terminology. To make it more understandable, an example would be extremely helpful to understand the notions of concepts, roles, and individuals. Also the chain operator could be better explained with an example. Also terms like ABox and TBox should be introduced.

Notice that initially the preliminaries section uses OWL terms like concepts, roles, etc. Then it starts using terms like entities, classes, and relations, but these are not defined. I think that section 3.2 could also be substantially improved by adding an example.

Section 4 has some important problems. Figures 1a and 1b are shown two pages later, which hinders readability. The term "R" keeps getting redefined. First it is defined using a grammar (page 3, line 37), then as a member of N_R (page 4, line 27), then as a member of the real numbers! (page 5, line 11). The definition of the loss in eq 1 does not seem to be correct, since the function is defined for C and D, which suggests that there is a separate loss function for every (C,D) pair. Notice that terms like C and D are not always typeset in math mode. Another undefined concept is that of n-balls. Finally, Equation 7 contains a new term, called e_v, for which I could not find a definition.

Section 5 reports the actual contribution of the paper. Notice that this section is quite short, not even a full page. This section suffers from the same problems mentioned above. On page 6, line 45, it defines R as a member of \mathbb{R}', which I don't know what it is. The authors mention that they use the absolute value of \sigma in the regularization part of the loss, but in equation 9 there is no trace of the modulus of \sigma. At that point I realized that I would not be able to reproduce this method, since too many things are missing or incorrectly defined. Fortunately, the authors do release the code, although I think that a paper should be understandable without checking the related source code.

The evaluation reports many experiments but a general conclusion that I drew after reading it was that this method is not so much better than OWL2Vec (notice that the section mentions Table 6.1.1, which is not mentioned anywhere while Table 3 does not seem to be referenced in the text).

To conclude, if I take into account the problems related to the presentation and the relatively little gain reported in the experiments, I am not sure whether this work meets the minimum bar for acceptance in this journal. It is true that with a significant revision the paper could improve substantially, but it would require another extensive reviewing round and there would still be the problem of the relatively little gain compared to other state of the art methods.