An Abduction-based Method for Explaining Non-entailments of Semantic Matching

Tracking #: 3794-5008

Authors: 
Ivan Gocev
Georgios Meditskos
Nick Bassiliades

Responsible editor: 
Stefano Borgo

Submission type: 
Full Paper
Abstract: 
Explainable Artificial Intelligence (XAI) attempts to give explanations for decisions made by AI systems. Despite the fact that for knowledge-based systems this is perceived as inherently easier than for black-box AI systems based on Machine Learning, research is still required for computing satisfactory explanations of reasoning results. In this paper, we focus on explaining non-entailments as a result of semantic matching in EL⊥ ontologies. In the cases where the result of semantic matching is an entailment, the already established methods of justifications and proofs provide excellent results. On the other hand, the cases in which the result of semantic matching presents in the form of a non-entailment demand an alternative approach. Inspired by abductive reasoning techniques, we present a method for computing subtree isomorphisms between graphical representations of EL⊥ concept descriptions, which are then used to construct solutions to abduction problems, i.e. explanations, for semantic matching non-entailments in EL⊥ ontologies. We then illustrate our method with an example scenario and discuss the results.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Click to Expand/Collapse
Review #1
Anonymous submitted on 10/Feb/2025
Suggestion:
Minor Revision
Review Comment:

I am for the most part satisfied with the corrections to the previous version of the paper.

Some comments:

* Page 6, line 9: I think it's best to state somewhere that \pi is the transitive closure of the edge relation E

* Page 10, Definition 9: It is not clear to me why the definitions of the plugin and subsume isomorphisms follow the opposite 'directions' than the definition of plugin and subsume matches of page 7. Shouldn't they be \sqcap \gamma_{V_C} \sqsubseteq \sqcap \gamma_{V_D} for the former and \sqcap \gamma_{V_C} \sqsupseteq \sqcap \gamma_{V_D} for the latter, much like the previous version of the paper (with the typo corrected)?

* Page 10, Theorem 1: now the statement of the theorem talks only about the \equiv case, and not also of the plugin and subsume cases. It is fine that the proof only discusses the \equiv case and says that the other two are similar; but the statement should say that if there is an exact isomorphism then D \equiv C, if there is a plugin one then C \sqsubseteq D, et cetera.

* Page 13, definition 2: likewise, it is not clear to me why in the plugin and subsume cases we have S2 before S1 rather than the other way around. It works either way, I think - after all, it is an isomorphism between subtrees - but I find it a little confusing that in Definition 5 a plugin match proved that C \sqsubseteq D where here a plugin subtree isomorphism is a way to prove that D \sqsubseteq C.

* Page 17: Proposition 4 and Corollary 5 are well known and don't need a proof, I think (it's harmless to keep them, but in my opinion they could be removed).

* Page 18: Same for Proposition 6.

* Page 21, line 1: it is not enough to prove that P(\phi_i) is true for \phi_i = \phi_0 \cup m_i: this would just show that the statement works for the root and one node below it. What we need is that if P(\phi) is true and m \in M(v', w') where one of the leaves of \phi associates v' to m' then P(\phi \cup m) is also true.

The idea of the proof is correct; but I think that the presentation still needs improvement (for example, in line 30 it is stated that P(\phi_i) is true, but this is not the same \phi_i mentioned on top of the page but another expression).

* Page 27, table 1: I find it interesting that, when it comes to the number of solutions #H, the median seems nearly constant (1 or 2) while the mean grows with the number of concepts: it would appear that, no matter how many concepts we have, most abduction problems have one or few solutions, but that a few problems have a high number of solutions. Why is that the case? Do you have any intuition to offer about this?

* Page 29, figure 8: it is not clear what the shaded area in the graph represents. Also, I'm not sure that I would say that figure 8 presents a convincing positive correlation between the Jaccard indexes and the number of solutions: yes, most examples in which there are many solutions have a somewhat high index, but in Scenarios 2 and 3 it almost seems that if the index is too high there are few or no solutions...

* Page 29: "Since we are dealing with isomorphisms, the ratio of the average branching factors is calculated as the lower value divided by the larger value". It is not clear to me what this means. It cannot mean that we take the average branching factors of the two trees and we divide the smaller by the larger, because then the ratio would be always <= 1 (and this is not what I'm seeing in Figure 9...)

Review #2
Anonymous submitted on 13/Feb/2025
Suggestion:
Minor Revision
Review Comment:

The quality of the paper has greatly improved, and many of the points raised have been sufficiently addressed.
However, some points are still missing and some additions are not fully convincing:

1) Especially the experimental part is still not convincing: although there is now an application to a real dataset, it is still artificial in the sense that random concepts are aligned, which is not a classical use case. Especially the runtime comparison is not expressive under these circumstances.
Therefore, it would be necessary to at least determine the trivial cases and exclude them from the runtime analysis. For statements comparing this approach with others, a more in-depth analysis is necessary: Have these experiments been performed on the same data? What are the numerical results? The statements "it's faster" and "more concise solutions" alone are not convincing.
In addition, the quality of the results should be considered in more detail: is the explanatory power really higher with this approach than with the others?

2) In the abstract and the introduction, the experimental result is not stated on the spot, instead of "discuss the result" it should be "improve existing results by..." or similar.

3) The section on non-entailments in semantic matching is still too long and contains too many references. It should be more focused on the intuition of using it as a motivation for the non-entailment problem.

4) p.6: pi is defined after it is used, and the definition of rooted trees could be improved in terms of structure.

5) (9) still seems to be wrong, possibly no quantification over v_i\in V

6) Theorem 1: The other two cases are not mentioned in the theorem, only in the proof.

7) Definition 11, Oberservation 1: T[S] is not a tree, as S could contain vertices of T that are not connected.

8) In (21): The overloading of i is confusing

Overall, although the quality of the paper has improved, especially the theoretical part, the experimental part is still not convincing and needs improvement. In addition, the entire paper would benefit from proofreading, especially for minor inaccuracies (some of which are noted below).

Minor errors:
- p.1,l.41: Subsymbolic -> subsymbolic
- sometimes knowledge based systems, sometimes knowledge-based systems, sometimes knowledge base systems
- p.4,l.42: generate
- p.7,l.28: $match_\sqcap$
- p.8,l.39: B_1 instead of A_2
- e.g., p.8,l.50: sometimes N_\mathcal{C,R}, sometimes N_C,R
- p.11,Fig.1: A_2 -> A_5 and r- and s-arrows not in blue for better readability
- p.14,l.51+p.15,l.1: \rho_\subseteq?
- p.22: Apendix ??
- p.22,l.16: delete "currently"
- p.24, Fig.5: change v_4 and v_3
- p.28, Fig.6: legend for colors of scenarios 1,2,3 missing
- p.28, Fig.7: 1), 2), 3)
- p.30, Fig.9: the scale of the graphs makes reading hard, the outliers are too extreme.