Repairing $\mathcal{EL_\perp}$ Ontologies Using Debugging, Weakening and Completing

Tracking #: 3641-4855

Authors: 
Ying Li
Patrick Lambrix

Responsible editor: 
Stefan Schlobach

Submission type: 
Full Paper
Abstract: 
The quality of ontologies in terms of their correctness and completeness is crucial for developing high-quality ontology-based applications. Traditional debugging techniques repair ontologies by removing unwanted axioms, but may thereby remove consequences that are correct in the domain of the ontology. In this paper we propose an interactive approach to mitigate this for $\mathcal{EL_\perp}$ ontologies by axiom weakening and completing. We present the first approach for repairing that takes into account debugging, removing, weakening and completing. We show different combination strategies, discuss the influence on the final ontologies and show experimental results. We show that previous work has only considered special cases, that there is a trade-off between the amount of validation work for a domain expert and the quality of the ontology in terms of correctness and completeness, and how to deal with this trade-off. We also present new algorithms for weakening and completing.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 27/Mar/2024
Suggestion:
Major Revision
Review Comment:

Review of: "Repairing EL_$\bot$ Ontologies Using Debugging, Weakening and Completing"

The paper deals with the subject of repairing ontologies. It addresses the combinations of different variants of fundamental operations that constitute a repairing process. The paper introduces an interactive approach for repairing ontologies, and discusses the quality of the obtained repairs in terms of their correctness and completeness with respect to their entailments. Additionally, this work includes both an implementation of the discussed approach and an experiment.

I find the topic of this paper to be of great relevance to the journal.

The paper presents a framework for repairing ontologies that is general enough to characterize previous works on the subject. Given an EL bottom TBox and a set of wrong axioms, the framework allows the combination of debugging, removing, weakening and completing operations to obtain a repaired TBox that does not entail any of the wrong axioms. The framework also incorporates an oracle (user) in order to validate axioms during the entire process of repairing. The paper presents multiple variants of those operations, and discusses their trade-offs in terms of completeness and correctness of the repaired TBoxes, where completeness and correctness concern the entailments of the resulting TBoxes. The paper also introduces new algorithms for weakening and completing. Furthermore, it describes two systems: an implementation of the framework as an extension of the EL version of RepOSE, and a Protégé plugin for a specific instantiation of the framework. Additionally, an experiment evaluating the discussed approach is also included. The authors also provide access to implementations and resources used in this work.

This paper extends the authors' previous work. Here, the considered Description Logic is EL bottom instead of EL. Wrong axioms can be any entailed axiom instead of only asserted axioms. Additionally, the debugging operation has been integrated into the framework.

Overall, the paper is well structured and interesting to read. However, I am not sure whether the provided extensions constitute a sufficiently significant delta. Furthermore, there are some issues that require clarification.

The authors assert that all proposed algorithms generate repairs as defined in Definition 1. However, I have concerns about this claim based on several observations. Notably, in the authors' previous work, debugging was not included in the repairing pipeline, and the set W of wrong axioms was assumed to be complete in a way that, once removed from a TBox T, T would not entail any axioms in W. The authors make this assumption in this paper as well. However, since debugging is now part of the framework, the set W of wrong axioms is not exactly the set D of axioms that need to be removed, which now has to be computed instead of being given as part of the input.

Furthermore, the paper mentions that the oracle used in the approach has no restrictions and may provide incorrect or inconsistent answers. This raises questions about the correctness of the set D, which is computed based on the oracle validations, as illustrated by Algorithm C14 for example. The possibility of oracle errors suggests that the set D might not be comprehensive, potentially affecting the reliability of the repair output. For example, if T = {A <= B, B <= C} and W = {A <= C}, an erroneous oracle could lead to a scenario where A = {A <= B, B <= C} and D = {}, resulting in an output T_r identical to the input T, which still entails the incorrect axiom.
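To make this failure mode concrete, here is a toy Python reconstruction of the scenario (the axiom encoding, the entailment check and the always-approving oracle are my own illustrative simplifications, not the paper's Algorithm C14):

```python
# Toy reconstruction of the counterexample above; an axiom (X, Y) stands for X <= Y.
from itertools import product

def entails(tbox, axiom):
    """Entailment over atomic subsumptions via transitive closure (illustration only)."""
    closure = set(tbox)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return axiom in closure

T = {("A", "B"), ("B", "C")}            # asserted axioms A <= B and B <= C
W = {("A", "C")}                        # wrong (entailed) axiom A <= C

erroneous_oracle = lambda axiom: True   # wrongly validates every asserted axiom as correct

D = {axiom for axiom in T if not erroneous_oracle(axiom)}   # stays empty
T_r = T - D                                                  # identical to T

print(entails(T_r, ("A", "C")))          # True: the wrong axiom is still entailed
```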

I understand the reason behind not requiring the oracle to always be correct, as this mirrors the reality where domain experts can make mistakes. However, it should be clearly stated in the text and the algorithms that generating a repair is not always guaranteed.

I tested the previous example using the provided Protégé plugin. I noticed that, aside from the absence of warnings indicating the TBox has not been repaired, the plugin also requires the axiom A <= C to be asserted, which should no longer be a requirement.

Additionally, here are the remaining points that require further explanation.

- Preliminaries:

- individual names are introduced but they are never used in the paper, as the footnote also suggests. So what is the reason behind introducing them?
- P and Q are used here as arbitrary concepts, but later in the paper they are used as concept names. The authors do make this clear; still, using different letters to differentiate between concept names and complex concepts might benefit the reader.

- Problem Formulation

- Definition 1: Let W be a finite set of TBox axioms in T ... -> The footnote on the first page indicates that, unlike in the authors' previous work, the wrong axioms in this paper do not need to be explicitly asserted. However, in this definition, "a set of TBox axioms in T" suggests that these axioms are indeed asserted.
- line 6: ... a repair for Debug-Problem DP(T,Or,W) ... -> Symbols like "Or" and "(A,D)" have been introduced before the definition but "DP(T,Or,W)" was not. It may benefit the reader to be introduced to all components used in the definition beforehand.

- Basic operations - debugging, removing, weakening and completing

- line 46: N_C^T and N_R^T -> These symbols are used here but introduced later in Definition 4. It would be easier for the reader if the meaning of these symbols were clarified when they are first introduced.
- line 24 - 26: ... where P, Q, R \in N_C^T and r \in N_R^T ... -> The authors note here that fresh concept names might be introduced. However, after normalization, for example, added axioms might contain concept names that are not in N_C^T; the same holds for a weakening that introduces new names.

- Combination strategies

- In Table 2, it is not clear to me what is the purpose of operations "R-none" and "AB-none". If no axioms should be removed (added), then why would one need a special process that removes (adds) nothing?
- page 10 line 20: T_1 \sqsubseteq T_2 -> Subsumption between TBoxes is not defined anywhere in the paper. I assume the authors imply T_2 \models T_1 with this notation. But it is unclear to me why entailment is not simply used?
- line 20: Der(T_1) \sqsubseteq Der(T_2) -> Why not use \subseteq instead of \sqsubseteq?
- line 24: ... sets of wrong asserted axioms D_1 and D_2 such that D_1 \sqsubseteq T ... -> Again, why not use \subseteq instead of \sqsubseteq?
- page 11 lines 10 - 14: I'm not clear on the argument presented here. Which computation is being referred to here? It would be helpful if this could be explained further.
- line 17: ... updating after each wrong axiom is the same ... -> Does this refer to the update after each axiom is weakened or removed, or to something else? This requires more clarification.
- I believe discussing the impact of various operation variants and their sequence on the input size for the next operation in the pipeline, as mentioned in this section, is crucial. However, have the authors also considered how the order in which axioms are weakened affects the quality of the repaired ontology (and similarly for completing), as well as the overall quality of the repairing process?

- Appendix

- In Algorithm 11, what is the difference between the if statements in lines 8 and 11?
- Also in Algorithm 11, what happens when concepts Q, R or P are complex concepts? How are these concepts normalized?

- Resources

- Instructions on how to use the provided tools, as well as the data used in the experiment, are made available. However, I could not find any scripts or instructions for reproducing the results.

Finally, I've noted some typos and minor wording issues that can be improved.

- Introduction

- page 1 line 40: ... and the repairing. -> ... and their repair.
- page 2 line 2: In this paper we mitigate these effects of removing wrong axioms by, in addition to removing those axioms,... -> This might be a bit redundant, as it is already understood that certain axioms will be removed.

- Preliminaries:

- line 22: ... during the repairing and ... -> ... during repairing or during the repair process ...

- Problem Formulation

- line 6: ... a repair for Debug-Problem DP(T,Or,W) ... -> ... a repair for a Debug-Problem ...
- line 15: ... that formalize these intuitions, respectively. -> ... that respectively formalize these intuitions.
- reconsidering the line breaks in all definitions, particularly in Definitions 2 and 3, could yield a cleaner format.

- Basic operations - debugging, removing, weakening and completing

- The title of the previous section uses a title case style, whereas the rest of the titles use a sentence case style. Consistency across all section titles would enhance the document's appearance.
- line 43: ... simple complex concept set for a TBox T, ... -> ... simple complex concept set for a TBox T, denoted by SCC(T), ...
- line 14: It might look better if "T" were to fit in the previous line.

- line 31: ... removed from the ontologies ... -> ... removed from the ontology ...
- line 32: ... Figure 1 derived wrong axiom ... -> ... Figure 1, the derived wrong axiom ...
- Hitting Set: sometimes "Hitting set" is used, and other times "hitting set".
- line 38: ... for computing the justifications ... -> ... for computing all justifications ...
- Figure 1: Completion: wanted axiom \alpha2 \sqsubseteq \beta2 is replaced by correct axiom ... -> The term "replace" is somewhat misleading. It implies that an asserted axiom is swapped with another, whereas, in reality, the axiom is not and will instead be entailed as a result of completing.
- page 7 line 16: a similar issue to the one previously mentioned.
- Algorithm 1: Generate the justifications ... -> Generate all justifications ...
- In line 3 of the algorithm, the spacing in "GenerateJustifications" looks a bit off; maybe using something like $\mathit{GenerateJustifications}$ could solve the problem.
- "Or" is italicized in line 6 but appears in a non-italicized format in the algorithm's input section (this issue appears in all algorithms).
- The titles of most of the algorithms provided in this paper describe at length what the algorithms actually do, which results in some lengthy titles, for example Algorithm C15.

- Combination strategies

- line 5: In this section -> In this section,
- line 6 and table 2: one at the time -> one at a time
- line 8: ... the influence of using different choices ... -> ... the influence of using different combinations ...
- line 13: ... between the choices for different combination strategies ... -> ... between the choices of different combination strategies ...
- Table 2: operations are not italicized, whereas in the main text, they are.
- line 47: ... to generate asserted wrong axioms ... -> ... to extract asserted wrong axioms ...
- line 19: ... as soon as one is computed ... -> ... as soon as it is computed ...
- line 23: Similarly as for weakening, ... -> Similarly, as with weakening, ...
- line 34: Figure 2b -> Figure 2c
- line 36: Figure 2c -> Figure 2d
- line 36: ... one at a time completing ... -> ... one at a time completing strategy ...
- Figure 2: It seems that a different font is used for the node labels which makes the style of operation names inconsistent with the one in the text.
- page 10 footnote: I think this should be moved to the main text.
- page 11 line 8: If one wrong axiom at the time is removed ... -> If one wrong axiom is removed at a time ...
- line 11: ... then they will be added back at the end or not. -> ... then they might be added back.
- line 17: First, we note that updating immediately ... -> First, we note that updating the TBox immediately ...
- line 26: When completing one axiom at a time ... -> When completing for one axiom at a time ...
- line 42: ... transformed into the sequence of operators of a second algorithm, ... -> ... transformed into a sequence of operators of another algorithm, ...
- pages 12, 15, 19, etc.: The document contains a significant amount of white space. It may enhance readability if the space were managed more efficiently.

- Implemented systems

- line 4: ... for repairing based ... -> ... for repairing ontologies based ...
- line 11 and 12: ... left/right hand concepts ... -> ... left/right hand side concepts ...
- The paper contains many long sentences; for example, the sentence spanning lines 8 to 12, or the one from lines 46 to 49, just to point out a few. I believe breaking such sentences into shorter ones would enhance the paper's readability.
- line 18: ... that focused on completing ... -> ... that focuses on completing ...
- line 22: ... and save ... -> ... and saving ...
- line 35: After loading the ontology, ... -> After loading an ontology, ...
- All examples in this section are labeled "Example." Numbering them, for instance, might improve their presentation.
- page 22 line 29: For the completion step ... -> for the completing step ...
- line 33: Figure2c -> Figure 2d
- line 35: ... we also implemented the function that the user can remove the specified wrong asserted axioms ... -> ... we also implemented a function that allows users to remove specified wrong asserted axioms ...

Review #2
By Patrick Koopmann submitted on 30/Apr/2024
Suggestion:
Reject
Review Comment:

The paper is concerned with repairing ontologies wrt. a set of
unwanted consequences, where repairing may involve not only removing
axioms, but also adding axioms, weakening existing axioms, and
strengthening existing axioms. Furthermore, it is assumed that an
oracle is available that can confirm or reject whether a given axiom is
correct or not. In practical applications, this oracle could
correspond to a domain expert that is asked questions during the
computation of the repair. Rather than presenting one algorithm for
computing repairs, the authors present a collection of operations that
can be used to compute a repair, and then analyze how different
combinations of these operations affect the repair result.

The different operations are: 1) removing unwanted axioms directly, 2)
removing axioms that can be used to entail an unwanted axiom (as in
classical repairs), 3) replacing an axiom by a set of logically weaker
ones, and 4) replacing an axiom by a set of logically stronger
ones. In each case, the oracle is used to determine whether only wrong
axioms are removed and only correct axioms are added. To
ensure that Operation 3 and 4 have only finitely many options, the
authors put the restriction that only concepts are used that are
concept names, conjunctions of two concept names, or a role
restriction with a concept name. Further variations are introduced by
distinguishing how operations are combined, e.g. whether always all
instances of an operation are applied at once, or whether the
instances are applied one after the other in parallel with another
operation. This leads to a total of 20 repair algorithms that are all
shown in algorithm environments in the paper (Table 8 on Page 30 gives
an overview).
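As a made-up illustration of the effect of this shape restriction (my own example, not taken from the paper, assuming the TBox contains Child ⊑ Person):

```latex
% Hypothetical weakening step under the stated shape restriction (illustrative only).
\[
  \underbrace{\mathit{Dog} \sqsubseteq \exists \mathit{hasOwner}.\mathit{Child}}_{\text{wrong axiom}}
  \;\rightsquigarrow\;
  \underbrace{\mathit{Dog} \sqsubseteq \exists \mathit{hasOwner}.\mathit{Person}}_{\text{weaker, allowed shape}}
\]
% A nested filler such as \exists hasOwner.(\exists livesIn.City) lies outside the allowed
% shapes and is therefore never generated as a candidate, which keeps the search space finite.
```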

The different algorithms are compared wrt. whether
the resulting ontologies are "more complete" or "less incorrect",
which the authors visualize using Hasse diagrams. The authors argue
why these relations hold (which can be shown by analyzing subset
relations between the generated ontologies). This leads to
insightful observations such as: if my operations are "remove wrong
axioms" and "add wrong axioms back to the ontology", how is the
correctness of the ontology affected based on whether I compute a
hitting set of just one wrong axiom at a time vs. computing a
justification for all wrong axioms? The authors
additionally confirm their theoretical observations using an
experimental evaluation on
a small set of small ontologies. Furthermore, the authors discuss a
graphical user interface and a Protege plugin of the approach.

The paper has a range of limitations and is in my opinion not good enough
for a journal publication. There are some issues with the presentation
that should be improved - I give a list further down to
help the authors with future publications. The main issue however lies
in the contribution: the observations made in the paper in great
detail are often quite obvious and not so surprising, and it is not
clear what one can gain from this detailed analysis. Furthermore,
compared to other works on gentle repairs or abduction (which
corresponds to the "completion" operation in the paper), the setting
considered here is quite limited, since the authors put an explicit
bound on the complexity of concepts introduced. Finally, the
experimental evaluation can be improved both in presentation and
setup. I think however that the Protege plugin could be useful in
practice.

* Resources

The resources are provided via FigShare, which should be fine. I
downloaded it and could access the ontology files. There are also the
protege plugin and the graphical tool discussed in the paper. However,
it is not described how to reproduce the experiments, unless this is
supposed to be done manually with the graphical frontends.

* Comments to the authors:

** Style

Listing 20 algorithms in algorithm environments, which just differ in
the order in which operations are applied, and which anyway are
discussed in the paper, does not make a lot of sense. A good paper
should avoid redundancy, or only use it when it helps in understanding
the content better, for instance through intuitive explanations. There
are also other places where text is unnecessarily extended: The best
example is probably Section 4.3, which is dedicated to explaining the
operation of removing axioms from the ontology, which really just
consists of applying set difference, for which it manages to spend
several lines and even a reference to a previous section. In total, 18
pages of the appendix are all dedicated to listing algorithms (12
pages), and tables with lists of axioms (6 pages). This is not a good
use of space.

Some general advice:
- avoid pixelated, fuzzy images as in Page 10, Fig. 2. Such diagrams
should use vector graphics
- lists of abbreviated axioms as in Figure 3 are not human readable
and do not add much to the text. Use a better formatting with align,
but generally avoid examples that involve so many axioms. The idea
of examples in the text is to illustrate ideas to the reader, not to
provide ontologies used in an experimental evaluation in full
detail. For this, you should use an online repository such as
zenodo.
- don't overuse footnotes - if an explanation is central to the text,
it should be in the main text
- never use past tense when describing your design decisions for your
framework.
- make sure all used notations are defined, and are defined before
they are used

** Method

Some aspects don't seem well thought-through: for instance, the
concept refinement method may introduce fresh concept names due to a
normalization step, as the authors also point out themselves. But
since those fresh names will never be part of the "correct" ontology,
the oracle would always filter them out.

Alg 1: in the text you describe the hitting set algorithm, but the
algorithm here does something very different and not really friendly
to the oracle: it goes over all axioms of all justifications
(potentially visiting the same axiom more than once), and validates
each of them with the oracle!
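A minimal sketch of this point (illustrative Python, not the paper's Algorithm 1): wrapping the oracle with a cache would at least guarantee that each distinct axiom is validated only once, no matter how many justifications it occurs in.

```python
# Hypothetical cached oracle: repeated questions about the same axiom are answered from a cache.
from functools import lru_cache

def make_cached_oracle(oracle):
    @lru_cache(maxsize=None)
    def cached(axiom):
        return oracle(axiom)
    return cached

calls = []                                                   # log of actual oracle invocations
raw_oracle = lambda axiom: (calls.append(axiom), True)[1]    # toy oracle that approves everything
oracle = make_cached_oracle(raw_oracle)

justifications = [("A <= B", "B <= C"), ("A <= B", "A <= D"), ("B <= C", "A <= D")]
for justification in justifications:
    for axiom in justification:
        oracle(axiom)            # repeated axioms hit the cache instead of the oracle

print(len(calls))                # 3 distinct oracle calls instead of 6
```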

You exclude Top and Bottom from the set SCC(T) of concepts to be used
by weakening and completing. You argue that this is done to avoid
tautological axioms being generated when weakening---however this
affects you in the completing step: what if the correct completion
would have Top on the left-hand-side or Bottom on the right-hand side?

On the bottom of Page 7, you argue for an optimization that avoids the
introduction of axioms that make concepts equivalent - but again, what
if the correct ontology contains such equivalences?

** Evaluation

In general, when doing an experimental evaluation of some prototypical
implementation, one would also be interested in the run time of the
experiments, for which it would also be good to know about the
hardware configuration of the computer used.

Apart from this, your description leaves some important questions open:
- What are the versions of the ontologies used, and where did you take
them from? For instance, it feels strange that your version of NCI
only has 3304 concept names and one role name. So mentioning which
version you used and where you took it from would help.
- You say that you remove parts of axioms that are not in EL---how
exactly was that done? Did you replace subconcepts by a fresh
concept name or by TOP or something like that? What one sees more
commonly is that entire axioms are removed if they are not
supported. I like the more fine-grained approach you used, but you
should be more explicit in how you did it.
- Tables 4-7 are not that informative: first, the reader cannot
understand what the algorithm numbers stand for, especially since
the list of algorithms is only shown in the appendix. Second,
listing the axioms explicitly leaves a lot of work for the
reader. More interesting would be metrics: how many axioms were
removed, added? What are the relationships between the different
outcomes in terms of additional/different axioms? Which proportion
of the wrong axioms was removed, and which proportion of missing
axioms added? In the appendix, more tables like that are shown with
all the different axioms that are involved by different algorithms
(6 pages in total). I don't think this is so useful and fruitful -
rather, I would supply a resource with the actual files that were
generated.
- I understand that for each ontology, you manually changed some
axioms or marked them as wrong. This really calls for an automated,
more principled approach, rather than a manual selection. Since the
modifications are so simple, it would both save you work with the
experiment, and give you more data to evaluate, which in turn would
add significance to your results.
- Finally, I don't understand why you used HermiT as reasoner - all
your ontologies are restricted to EL+, so I would use the ELK
reasoner which is much faster on EL ontologies.

** Detailed comments

Preliminaries
- if all roles are atomic, there is no use in naming them explicitly
"atomic roles"
- "the interpretation function is straight-forwardly extended to
complex concepts" -- how? (refer to table?)
- "Note that P and Q are arbitrary concepts. In the remainder we often
use P and Q for atomic concepts." -- this is unecessarily
confusing. Just stick with one form of notation How about A,B for
atomic concepts and C,D for complex concepts, as most papers do it
- Footnote 3: if you do not use individuals, then don't introduce
them
- Footnote 4: this should really be in the main text

Section 3:
- Page 3, Line 33 "We have not required/We did not require" --> "We do
not require"
- Page 4, Line 6/18/23: the layout in your definitions is inconsistent
- Page 4, Line 9: what is an asserted axiom? (Do you just mean axioms in T?
But then this formulation is redundant)
- Page 4, Line 16: What is the point in having a notion of ontologies
O1 and O2, which are represented as TBoxes T1 and T2? Is there
anything additional in O1 and O2? In the context of your paper, an
ontology is simply a TBox, and discussing O1 and O2 adds nothing to
the text but additional confusion
- Page 4, Line 28 (and later): this hyphen should be an M-dash
- Normalization of EL axioms is quite standard and I don't think there
is a need for providing an algorithm directly in the paper
- SCC(T) is used before it is introduced. In general, the paragraph
before Definition 4 is not really easier to understand than the
definition itself - I would just leave it out.
- I do not get how you get to that formula in Footnote 7: It is
clearly n^2+n+tn -- n^2 binary conjunctions, n concepts, tn role
restrictions. Why do you divide by 2?
- Page 5: Definition 5 is not a definition.
- Page 5, line 27: Something doesn't quite work here: N^T_C and N^T_R
are, according to Def 4, atomic concepts and roles that occur in the
TBox. How can you then introduce new atomic concepts? At the same
time, introducing atomic concepts when computing repairs is a bit
questionable, since the resulting axioms can never be correct (how
can the algorithm know the correct names?), and consequently would
be filtered out by the algorithm. In general, it
is not clear why this is even needed: the only interesting case
where you would introduce new atomic concept names is for an axiom
of the form "exists r.P sqsubseteq exists s.Q", but wouldn't it then
be easier to just allow these axioms in your normal form?
- Section 4.3 really shouldn't be a section

Page 6:
- Fig 1 is not so clear - I think just showing the axioms would be
much more insightful

Page 7:
- sup(alpha) / sub(beta) is not defined (or is a T missing?)
- Line 43: the point of the names "source" and "target" only becomes
clear in the evaluation - then introduce them there. At this point,
this notation is confusing, and in the evaluation section, readers
may have forgotten what they stand for.

Page 8:
- what is the point of the AB-operations? Why would one want to add
wrong axioms back into the ontology??

Page 10:
- Fig 2: Resolution! Also: what is D^*?
- the operation "\sqsubseteq" (square subset) on TBoxes is never
defined, and I have also never seen it before. Do you mean
entailment between TBoxes? Then you should use "\models", and also
define it in the preliminaries? Do you mean the subset relation?
Then use the subset symbol!

Page 17:
- Use a reference to the paper for a tool if you are using it

Page 19:
- why is there so much space?

Page 24:
- Line 45: "these approaches" comes again and again, but which
approaches has not been mentioned yet
- " In our approach we assume that when removing axioms from the
ontology, the wrong axioms cannot be derived anymore."
--> this is a very strong and in practice unrealistic assumption,
and has not been mentioned before!

Review #3
Anonymous submitted on 19/Oct/2024
Suggestion:
Major Revision
Review Comment:

This submission studies combinations of methods for (i) finding wrong axioms, (ii) weakening them, and (iii) adding correct consequences, as a pipeline for debugging ontologies in the inexpressive description logic ELbottom (from now on called EL in this review).

The overarching goal is primarily to remove from an ontology some consequences that have been labelled as erroneous, but the authors also allow adding other new (correct) consequences to avoid losing information previously implicitly included in the ontology. Following existing approaches from the area of axiom pinpointing, the authors explain how to detect the potentially wrong axioms; they also consider a simple (very limited) variant of axiom weakening which avoids the problems of other axiom weakening proposals by restricting the shape of the possible weakenings; and finally they detect from the class of consequences those that should be restored in the repaired ontology. The main contribution is an analysis of strategies for obtaining the final ontology in terms of correctness, completeness, and human effort, along with an empirical study associated with it.

The general work is meaningful, although not extremely novel. It falls within the scope of the journal, and has potential, but there are issues of presentation and formalisation that should be fixed before it can be published.

First of all, repairs are defined as pairs of sets of axioms (A,D) where A are the axioms to add, and D the axioms to delete from the ontology. But the rest of the paper ignores this definition. In particular, the algorithms and experiments do not compute repairs in this sense.

The whole work is based on two properties that the authors call "less incorrect" and "more complete". These refer, in essence, to false positives and true positives from the ontology. Why are no analogous notions for false negatives and true positives used?

Definition 5 is quite strange. It defines the super- and sub-concepts of a concept name, which will be used during the weakening phase. Yet, these are defined w.r.t. the *original* TBox, which we know contains errors. This yields lots of superfluous elements, which need to be verified later.

After the definition (before Section 4.2), the authors contradict themselves in successive sentences. They say that axioms can only use concept names appearing in the TBox, before noting that atomic concepts not in the ontology can be used. It is either one or the other.

In Section 4.3, the authors refer to "Removing", which is "performed by applying Remove-axioms(T,D) as defined in Section 4.1". Going back to Section 4.1, "Remove-axioms" is just the set difference T\D. What is the point of all this meandering? Things should be easy to understand.

Algorithm 1 is extremely inefficient because it requires computing all the (potentially exponentially many) justifications first and then validating each axiom in each justification. This means that (i) if an axiom appears in many justifications, it is validated several times, and (ii) the structure of the justifications is actually irrelevant. Finding just the *union* of the justifications would be much more efficient.
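A sketch of this alternative (illustrative Python with a hypothetical oracle, not the authors' implementation): validate each axiom of the union of all justifications exactly once.

```python
# Validate each axiom in the union of all justifications once, instead of
# walking every justification separately and re-validating shared axioms.
justifications = [
    frozenset({"A <= B", "B <= C"}),
    frozenset({"A <= B", "A <= D", "D <= C"}),
]

union_of_justifications = set().union(*justifications)

# hypothetical oracle: it marks every axiom mentioning D as wrong
oracle_says_correct = lambda axiom: "D" not in axiom

wrong_axioms = {ax for ax in union_of_justifications if not oracle_says_correct(ax)}
print(sorted(wrong_axioms))      # one oracle call per distinct axiom
```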

At the end of Section 5.1 the authors promise to prove some relationships in Section 5.2, but in reality they only give a high level argument that does not formally prove any of the claims.

The names of the algorithms (C14, C9, etc.) have no intuitive meaning. Where do these come from? Would it be possible to provide more meaningful names? Also, why are they presented out of order?

The experiments are simply presented through tables without a real analysis of what they mean. The results just show what the resulting ontology is, but that information is not meaningful without knowing other parameters like the resources used to compute them, or the dependency on the order of the axioms chosen. Also, Tables 5-7 have a second row with sequences of numbers that are never explained.

Minor comments:
- Definition 2: in this setting ontologies are TBoxes (as ABoxes were said to be ignored for the work), so it does not make sense to speak about "ontologies O represented by TBoxes T". The definition can be greatly simplified. Same for Definition 3, of course.
- Section 4: I was surprised not to see references to much work developed by Baader and his colleagues on debugging description logic ontologies and in particular on EL. Specifically:
[1,2] explain how to find all justifications and use this to correct errors in EL; [3] presents an overview on the problem; [4] suggests a different (more efficient) strategy to reduce the search space; and [5] shows that it is not necessary to compute all justifications and then all the hitting sets, but the sets of diagnoses can be found directly
- page 9 line 34: the text for figure 2b actually refers to 2c; similarly in line 35, the text refers to figure 2d.
- in Section 5.2, all the \sqsubseteq should be standard set inclusions

= References:

[1] Franz Baader, Rafael Peñaloza, Boontawee Suntisrivaraporn: Pinpointing in the Description Logic EL+. KI 2007: 52-67
[2] Franz Baader, Boontawee Suntisrivaraporn: Debugging SNOMED CT Using Axiom Pinpointing in the Description Logic EL+. KR-MED 2008
[3] Franz Baader, Rafael Peñaloza: Axiom Pinpointing in General Tableaux. J. Log. Comput. 20(1): 5-34 (2010)
[4] Zhangquan Zhou, Guilin Qi, Boontawee Suntisrivaraporn: A New Method of Finding All Justifications in OWL 2 EL. Web Intelligence 2013: 213-220
[5] Michel Ludwig, Rafael Peñaloza: Error-Tolerant Reasoning in the Description Logic EL. JELIA 2014: 107-121

Review #4
Anonymous submitted on 27/Oct/2024
Suggestion:
Minor Revision
Review Comment:

(In)coherence:
p.3, sect.2, l.-5 of: "...a TBox is incoherent if it contains an unsatisfiable concept": As per definition, a TBox contains concept inclusions, not concepts. The constant ⊥, present in the language, is always unsatisfiable. Is ⊥ not considered a concept? Clarify.
Oracles & experts:
p.3, sect.3, l.7 of: "we did not require that an oracle always answers correctly...": It seems that some assumption(s) on oracle behaviour are desirable, like perhaps an oracle gives a correct answer every once in a while, for otherwise it is not clear why an oracle is any better than a random bit generator. It is also not clear whether an oracle should always give the same answers for two identical queries.
The difference, if there is any, between validating an axiom and an oracle query should be briefly explained, as well as any assumptions on validated axioms — are these supposed to always be correct?
Axioms & concept inclusions:
The authors use three terms — asserted axiom, axiom, and GCI — for what appears to be two concepts: an element of a given TBox, and a concept inclusion outside that TBox. There is a tendency to call an arbitrary concept inclusion an axiom. This is somewhat strange — unless this terminology follows an established tradition, in which case the authors should clarify their use of the terms, particularly that of "axiom". "Derived axioms" are typically called theorems.
Normalization and ⊥:
From " ⊤ and ⊥ are not in SCC(T)" (p.4, l.-2) I conclude that ⊤ and ⊥ are not elements of N_C. With this understanding, the claim "Every EL⊥ TBox can ... be transformed into a normalized TBox..." is untrue: The inconsistent TBox { ⊤ ⊑ ⊥ } cannot be transformed into a normal TBox because any normal (as defined in 4.1) TBox is consistent — interpret all elements of N_C by Δ, and all elements of N_R by Δ×Δ.
Algorithm A11 is apparently only supposed to handle EL rather than EL⊥ TBoxes. It is unclear what the normalization of, say, the consistent { P ⊓ Q ⊑ ⊥ } should look like.
The authors should probably spell out (some) reasons for restricting to normal TBoxes.
p.4, Defs.2&3: Would it not make sense to identify ontologies with TBoxes for the purposes of the paper in order not to duplicate the T/O notation?
p.6, Algorithm 1: Is it OK that the algorithm may well ask to validate a given axiom more than once?
p.7, Algorithms 2 and 3: It is unclear whether α ⊑ β is supposed to be in or outside T.
It is also unclear what ((sb ⊑ sb' ⋀ sp' ⊏ sp) ⋁ ...) means, as this is not a statement — did you forget to prefix this with " T ⊨ "? or perhaps with " T ∖ { α ⊑ β } ⊨ "?
p.11, sect.5.2: "the building blocks can be used to compare different combination algorithms...": That's like saying "flowers can be used to compare different bouquets".
p.25, sect.9: "We also introduced a way to compare combination strategies...": All conclusions have the form "Combination A is more (or equally) complete and incorrect than Combination B", or state that more validation work benefits correctness. The authors do not state, let alone apply, any criteria for determining which completeness/correctness tradeoffs are sweeter than others.
English usage, misprints etc.:
p.3, sect.3, l.3 of: "few ... information" ↦ "little ... information" (twice)
p.5, ll.8–9: "more ... or equally complete (...), than the ontology...": delete comma.
p.5, l.-9: "⋃_{S∈\cal S} S" ↦ "\bigcup \cal S"
p.7, l.-7: "the amounts of concepts" ↦ "the number of concepts"
p.8, sect.5; ff: "at the time" ↦ "at a time" (multiple occurrences): the expression "at the time" carries a different meaning, as in "It seemed like a good idea at the time".
p.10, sect.5.2: " ⊑ " ↦ " ⊆ " (multiple occurrences): these are set inclusions, not concept inclusions.