Focused Categorization Power of Ontologies: General Framework and Study on Simple Existential Concept Expressions

Tracking #: 3196-4410

Vojtěch Svátek
Ondřej Zamazal
Viet Bach Nguyen
Jiří Ivánek
Ján Kluka
Miroslav Vacura

Responsible editor: 
Axel Polleres

Submission type: 
Full Paper
When reusing existing ontologies for publishing a dataset in RDF (or developing a new ontology), preference may be given to those providing extensive subcategorization for important classes (denoted as focus classes). The subcategories may consist not only of named classes but also of compound class expressions. We define the notion of focused categorization power of a given ontology, with respect to a focus class and a concept expression language, as the (estimated) weighted count of the categories that can be built from the ontology’s signature, conform to the language, and are subsumed by the focus class. For the sake of tractable initial experiments we then formulate a restricted concept expression language based on existential restrictions, and heuristically map it to syntactic patterns over ontology axioms (so-called FCE patterns). The characteristics of the chosen concept expression language and associated FCE patterns are investigated using three different empirical sources derived from ontology collections: first, the concept expression pattern frequency in class definitions; second, the occurrence of FCE patterns in the Tbox of ontologies; and last, the ‘meaningfulness’ of class expressions generated from the Tbox of ontologies (through the FCE patterns), as assessed by different groups of users, yielding a ‘quality ordering’ of the concept expression patterns. The complementary analyses are then compared and summarized. To allow for further experimentation, a web-based prototype was also implemented, which covers the whole process of ontology reuse from keyword-based ontology search through the FCP computation to the selection of ontologies and their enrichment with new concepts built from compound expressions.

Minor Revision

Solicited Reviews:
Review #1
By Dörthe Arndt submitted on 09/Nov/2022
Minor Revision
Review Comment:

The paper has significantly improved since its former version and most of my questions have been clarified.

However, I still see some problems:

1. I still do not understand why the authors are not using the OWL axioms directly corresponding to the DL axioms they introduce and instead go for approximations (e.g. domain-expressions). The reason is explained on page 9, and the second point, the idea of constructing new classes, is at least understandable to me. But I do not see the problem with the first point; maybe the authors could be more concrete here? What is the concrete problem? Why would we need punning in that case?

2. Even though the authors explain in detail how the two concepts of CE patterns and FCE patterns (and thereby also FCP and ^FCP) are related, I still fail to see that relationship. Domain declarations are simply not the same as existential expressions, and the argument that we can construct one from the other does not really convince me.

3. As stated in my last review, I think that in some cases the reuse of ontologies as described by the authors can be problematic, especially when we construct OWL classes out of RDFS ontologies, and even if that does not affect the calculations done in the paper, it might have an impact in practice. Maybe the authors could at least add a comment that the combination and re-use of ontologies also needs to take these kinds of logical considerations into account.

Having said that, I am aware that my second point is very hard to address and that it is rather an opinion than a concrete suggestion for improvement. Since I think that we do not need to agree here (on the contrary, science lives from discussion), I would be interested in your argumentation in the next response letter but I would not impose any changes on the paper in that regard. I am still looking forward to your answer.
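For reference, the distinction raised in point 2 can be written out in standard DL notation. This is only a sketch of the textbook reading of a domain declaration versus an existential restriction; the symbols P, C and D are illustrative placeholders and are not taken from the paper's definitions.

```latex
\begin{align*}
  &\exists P.\top \sqsubseteq C
    && \text{domain declaration of $P$ (an axiom)} \\
  &\exists P.D
    && \text{existential concept expression (a class, not an axiom)} \\
  &\exists P.D \sqsubseteq \exists P.\top \sqsubseteq C
    && \text{given the axiom, $\exists P.D$ is subsumed by $C$}
\end{align*}
```

Under this reading the domain axiom does not state that any instance of the existential class exists; it only guarantees that such a class, once built, falls under C.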

Detailed comments:

page 8: in this paper we will considering… ->consider

page 9: you talk about the relation between CE patterns and FCE patterns but according to Definition 4, FCE patterns contain CE patterns. I guess you mean the last four elements of the 5-tuple?

page 9: argument against using the OWL DL expression directly: in the first point you say that the classes, properties and individuals in the CE patterns need to be strictly disjoint, while on page 6 you say that OWL 2 DL allows them to be non-disjoint (which is correct). Could you please clarify? What is the concrete argument here?

page 13: “The existence of a property assertion [...] can be easily checked on the Abox…” This wording is a little bit misleading: either a property assertion is present in the Tbox or it is not, this cannot be checked on the Abox. I guess the authors mean that it is possible to find instances of the class defined by the property assertion by querying the Abox. That can indeed only be done for existential restrictions but not for universal restrictions.
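The point about querying the Abox can be illustrated with a small sketch. It assumes a toy Abox held as plain (subject, property, object) triples, with hypothetical individual and property names, and it ignores subclass reasoning; it is not the procedure used in the paper.

```python
# Toy Abox as (subject, property, object) triples; all names are hypothetical.
ABOX = {
    ("margherita", "hasTopping", "mozzarella"),
    ("margherita", "hasTopping", "tomato"),
    ("napoli", "hasTopping", "anchovy"),
    ("mozzarella", "type", "Cheese"),
    ("anchovy", "type", "Fish"),
}


def instances_of_some(abox, prop, filler=None):
    """Return individuals provably in (exists prop.filler).

    filler=None stands for owl:Thing, i.e. any object value qualifies.
    Only asserted types are consulted (no subclass reasoning).
    """
    found = set()
    for subject, predicate, obj in abox:
        if predicate != prop:
            continue
        if filler is None or (obj, "type", filler) in abox:
            found.add(subject)
    return found


# One asserted triple is enough to confirm membership in an existential class.
pizzas_with_topping = instances_of_some(ABOX, "hasTopping")
pizzas_with_cheese = instances_of_some(ABOX, "hasTopping", "Cheese")
```

No analogous lookup confirms a universal restriction: under the open-world assumption, further unasserted property values may exist, so the asserted triples alone cannot show that all values belong to the filler class.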

Review #2
Anonymous submitted on 10/Dec/2022
Review Comment:

This revision addresses all issues I raised in my original review. I therefore recommend accepting the paper. Below I give a detailed analysis of all changes, and my reasons for the recommendation. I commend the authors for the great work on the revision (particularly the changes to point 3 and point 5 below are very effective - the other points as well of course, as detailed below).

In more detail, this submission is the revision of submission #3008-4222, which was a significantly updated resubmission of submission #2406-3620.

In my original review, I made the following summarizing comments: The paper

> (*) has invested clear and high effort in a relevant aspect of ontologies
> (*) undersells this by sometimes overly complex exposition [...] - the paper could transport clearer contribution, novelty and impact in shorter exposition

In particular I raised seven points that the authors have addressed in this revision, leading to - in my opinion - a great final result. I give the details below.

> (1) General readability.
> The manuscript, while significantly improved, is not yet at the readability and polish required for a journal publication. I want to explicitly commend the authors' effort in improving this, but remark that a number of items are still not of the quality required. In particular this includes general clarity of narrative (the intended meaning behind many paragraphs is at times vague and could be made crisper), colloquial use of punctuation symbols (e.g., brackets opening paragraphs, symbols within words), etc. There are many paragraphs that, to the best of my understanding comparing both versions, could still be improved in that respect. There are some new paragraphs (e.g., the "pizza" example from the introduction) that are not clear without context. I do add that more examples are definitely good for accessibility of this paper.

The authors have significantly improved the general readability and polish.

I also checked the changes on the "pizza" example, which now gives additional narration and introduction, and should make it easier for readers not yet familiar with the example to understand the context and in particular the numbering of the archetypes.

The additional examples in the new Section 1.3 are certainly going to help readers appreciate the contribution more.

> (2) More "upfront" narrative and examples.
> This also relates to some comments from earlier reviews (Reviewer #2 from the earlier version stating "The paper would be inaccessible to the general audience. This is supposed to be a journal publication, but I don’t think that an ordinary PhD student or a young PostDoc would be able to learn much from this paper.") I think this has significantly improved, but still needs a revision to make sure that notions and goals are introduced "upfront". Let me give an example: Compound concept expressions and their corresponding description logics suddenly appear in the (nice) new "working assumptions" subsection. However, for a reader, it is at a point where it is still hard to understand the context.

I re-read the sections, and I think that this has reached a good balance:

The work on the section on "working assumptions" (Section 1.1) is now at a level where a reader has enough information, or at least enough pointers to literature, to gain value from the paper even if their background is limited.

The additional improvements in that section (also mentioned in the above point (1)) together with the further improvements make the introduction balanced in terms of accessibility - see also the next point.

> (3) Clarity of concepts.
> The introduction is quite vague at concepts (making it hard for the reader to grasp what it aims for) while the formal definitions are relatively abstract, with very little connection between the two. The examples, while very nice, are not enough to fully transport this. I think there are multiple solutions for this (also depending on the opinion of the other reviewers on this). One of them is
> (3.1) Extend the introduction by introducing the concepts one-by-one, including examples, so that a reader gets an understanding of them, then following it by the formal definitions in Section 2 as is.
> (3.2) Making the introduction more high-level, and providing Section 2 in a more accessible way, interlacing concepts, intuitions and examples.
> I think (3.1) would work better - but any version of the paper that makes clearer the combination of concepts, their intuitions and examples would help here.

The authors chose option (3.1) of the ones above, introducing the new Section 1.3.

I think that the new Section 1.3 is extremely effective for connecting the high-level introduction and the technical content that follows.

Together with the other changes to the introduction, I find this a very convincing overall improvement of the early parts of the paper.

I commend the significant efforts that the authors put into improving this part of the paper.

> (4) Some notions and intentions unclear.
> Some notions are used before it is clear what they are aiming for, e.g., "heuristic linking" is used without context in the introduction, and then referenced in the formal definition section. In the introduction that is to a certain extent ok, but in the formal definitions section, a reader does require some context. With regard to intentions, in some parts of the paper (to give one example: Section 2.5) a lot of effort is used to transport some very particular computation, but it is unclear what the intention is, the meaning of the specific choices, etc. Making the intentions clearer could make such sections shorter and crisper (see also points (2) and (3) above).

The authors have addressed this point through

(i) adding the new Section 1.3 and
(ii) reformulating Section 2.5.

I see a clear added value to the reader through these coordinated changes in both sections.

> (5) Some sections need a better "roadmap".
> I give as an example Section 2, which Section 3 later refers to as "[having defined the framework]": it has a good initial summary, but then loses the reader in sections, where interesting topics are discussed, but the overall structure becomes unclear. Perhaps a more elaborate introduction of such a section could help here, or better statement for each subsection (*) what was shown so far (*) what needs to be discussed next and (*) how do the two connect. Perhaps both would make it a more convincing section.

The authors addressed this through very effective introduction paragraphs to the subsections.

This greatly improved the understanding a reader gets through the elaborate construction that is given in Section 2.

I also commend the new Section 2.9, it is very effective at catching the reader (having reached page 12) and giving a roadmap for the further steps.

> (6) Some sections need a clearer contribution.
> I commend Sections 4 and 5 for the good examples. I think they should stay if at all possible within the space - it is great. What these sections do not transport in enough detail is the contribution they make scientifically to the overall contribution of the paper. The results explained in Section 5.3 need to be better connected to the framework, and the scientific contribution and its novelty highlighted more.

This is solved elegantly through the new Sections:

- Section 4.4
- Section 5.4 (expanded)
- Section 6.4

I find them very effective at highlighting the scientific contribution and novelty of the work.

> (7) Experiments need more explanation of impact.
> The cognitive experiments are very interesting and a valuable addition to the field, as are the others. At the moment, I believe the impact of the results is not described enough, especially given how much effort was involved in performing, in particular, the cognitive experiments. I think highlighting more of the impact would make the effort more valued.

Similarly to the above, this was addressed through:

- Section 7.3

The extended introduction to the code works very well.

In total, all seven points were addressed with high effort and quite effectively, leading to a clear recommendation for acceptance.