Review Comment:
###Summary of the paper###
This paper considers the problem of explaining the behaviour of machine learning models. In particular, the authors propose a framework to explain the behaviour of a black-box classifier, which exploits semantic annotations on the sample data. The meaning of such annotations is given by an underlying ontology formulated in a Description Logic (DL).
The general idea of the framework is as follows.
- Input: a classifier F, a set of data samples annotated with semantic descriptions from a vocabulary V, and a DL ontology defined over V. For each data sample of interest, the assertional component of this ontology (the ABox) names the sample with an individual constant, states its classification according to F, and contains assertions expressing its semantic annotations.
- Goal: to produce a set of rules expressed over V that explain the behaviour of F in an understandable and meaningful way. More precisely, a rule produced by the framework is of the form:
$R_C = \mathit{Body}(x, x_1, \ldots, x_n) \rightarrow C(x)$,
where Body is a conjunction of atoms over unary and binary predicates from V, with arguments in $\{x, x_1, \ldots, x_n\}$, and C is a unary predicate representing one of the classes of F. Intuitively, a rule expresses sufficient conditions for an item to be classified as C.
Besides the proposed framework, the other main contributions of the paper are:
- A suite of algorithms to compute (approximate) explanation rules. These algorithms are based on the fact that finding a correct rule R_C can be reduced to finding a conjunctive query Q_C whose certain answers w.r.t. the ontology are all positive instances of C.
- An experimental evaluation of the quality of the queries produced by the algorithms, in terms of how accurately they represent the behaviour of the classifier.
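The reduction mentioned in the first item can be illustrated with a minimal sketch. It assumes an empty TBox, so that the certain answers of a CQ coincide with evaluating it directly over the ABox; the toy ABox and the names `answers` and `is_correct_rule` are hypothetical and are not the authors' implementation.

```python
from itertools import product

# Hypothetical toy ABox. Unary assertions: (concept, individual);
# binary assertions: (role, subject, object). C(img1) records that the
# classifier F assigns img1 to class C.
CONCEPTS = {("Dog", "o1"), ("Cat", "o2"), ("Ball", "o3"), ("C", "img1")}
ROLES = {("depicts", "img1", "o1"), ("depicts", "img1", "o3"),
         ("depicts", "img2", "o2")}


def individuals():
    """All individual names occurring in the ABox."""
    inds = {a for _, a in CONCEPTS}
    for _, a, b in ROLES:
        inds.update((a, b))
    return inds


def answers(body, answer_var="x"):
    """Answers of the CQ {answer_var | body}, by brute-force assignment of
    ABox individuals to variables. Atoms are ("A", v) or ("r", v, w).
    Assumes an empty TBox, so these coincide with the certain answers."""
    variables = sorted({t for atom in body for t in atom[1:]})
    if answer_var not in variables:
        raise ValueError("the answer variable must occur in the query body")
    inds = sorted(individuals())
    result = set()
    for values in product(inds, repeat=len(variables)):
        sub = dict(zip(variables, values))
        if all(((atom[0], sub[atom[1]]) in CONCEPTS) if len(atom) == 2
               else ((atom[0], sub[atom[1]], sub[atom[2]]) in ROLES)
               for atom in body):
            result.add(sub[answer_var])
    return result


def is_correct_rule(body, cls="C"):
    """Body(x, ...) -> cls(x) is correct if every answer is a positive instance of cls."""
    positives = {a for c, a in CONCEPTS if c == cls}
    return answers(body) <= positives


# Body: depicts(x, y) AND Dog(y)
print(answers([("depicts", "x", "y"), ("Dog", "y")]))          # {'img1'}
print(is_correct_rule([("depicts", "x", "y"), ("Dog", "y")]))  # True
print(is_correct_rule([("depicts", "x", "y")]))                # False: img2 is also an answer
```

Brute-force enumeration is exponential in the number of variables; it is only meant to make the correctness condition on $Q_C$ explicit.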
###General Evaluation###
This paper is a revised version of a submission that I have previously reviewed. It is concerned with the topic of eXplainable Artificial Intelligence (XAI), which has recently drawn considerable attention in AI research, and it is definitely relevant for this journal.
Overall, the paper has been considerably improved w.r.t. previous versions. I believe the results are of value and interest for the Semantic Web community. There is only one issue, minor but nevertheless important, which I describe below. Once it is addressed, I would recommend the paper for publication.
###Minor issue###
One of the strategies used to merge queries in the proposed algorithms is to compute the Query Least Common Subsumer (QLCS). The existence of such a query relies on the assumption that every conjunctive query (CQ) contains an atom of the form TOP(x), where TOP is the well-known top concept of DLs. However, this assumption is not entirely consistent with the definition of CQs:
- By definition, a CQ cannot have an empty body or an atom of the form TOP(x), since TOP is not a concept name. Therefore, it is wrong to assume that every CQ contains such an atom. In addition, this makes the use of the empty query as a shorthand for {x | TOP(x)} ill-defined.
One way to achieve the desired effect could be to add the GCI $TOP \sqsubseteq A$ to the TBox, where $A$ is a fresh concept name, and then to assume that every CQ contains the atom $A(x)$. Since $A$ is fresh and the GCI makes every individual an instance of $A$, this does not change the set of certain answers, and it should not be a problem for the query subsumption partial order.
Perhaps this is what the authors meant in the first place. In any case, it must be explained carefully in the paper.
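The workaround suggested above can be illustrated with a small sketch. It assumes that, apart from the added GCI, the TBox is empty, so that applying $TOP \sqsubseteq A$ amounts to asserting $A(a)$ for every named individual $a$; the toy ABox and the function names are hypothetical.

```python
from itertools import product

# Hypothetical toy ABox: (concept, individual) and (role, subject, object).
CONCEPTS = {("Dog", "o1"), ("Cat", "o2")}
ROLES = {("depicts", "img1", "o1"), ("depicts", "img2", "o2")}
FRESH = "A"  # hypothetical fresh concept name, with TOP ⊑ A added to the TBox


def individuals(concepts, roles):
    """All individual names occurring in the ABox."""
    inds = {a for _, a in concepts}
    for _, a, b in roles:
        inds.update((a, b))
    return inds


def saturate(concepts, roles):
    """Apply TOP ⊑ FRESH: every named individual becomes an instance of FRESH.
    Assumes the TBox is otherwise empty."""
    return concepts | {(FRESH, a) for a in individuals(concepts, roles)}


def answers(body, concepts, roles, answer_var="x"):
    """Evaluate the CQ {answer_var | body} by brute force over the named individuals."""
    variables = sorted({t for atom in body for t in atom[1:]})
    if answer_var not in variables:
        # An empty body (or one not mentioning x) is not a well-formed CQ here.
        raise ValueError("the answer variable must occur in the query body")
    inds = sorted(individuals(concepts, roles))
    result = set()
    for values in product(inds, repeat=len(variables)):
        sub = dict(zip(variables, values))
        if all(((a[0], sub[a[1]]) in concepts) if len(a) == 2
               else ((a[0], sub[a[1]], sub[a[2]]) in roles)
               for a in body):
            result.add(sub[answer_var])
    return result


saturated = saturate(CONCEPTS, ROLES)
# {x | A(x)} is a proper CQ playing the role of the "empty" query: it returns everything.
print(sorted(answers([(FRESH, "x")], saturated, ROLES)))
# Adding A(x) to a CQ does not change its answers.
q = [("depicts", "x", "y"), ("Dog", "y")]
print(answers(q, CONCEPTS, ROLES), answers(q + [(FRESH, "x")], saturated, ROLES))
```

Here `answers([], ...)` raises an error, mirroring the point that a CQ cannot have an empty body, while {x | A(x)} is a well-formed query that subsumes every other CQ with answer variable x.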
###Some typos###
- p.4, l.38: ...expressivity of a *knowledge base* (instead of $\mathcal{K}$).
- p.7, l.8: ... is *the* main strength...
- p.10, l.16: please check the calligraphic font used in $$.
- p.11, l.41: there should be a space after the comma in *D,C*.
- p.11, l.44: ...knowledge *base*...
- p.12, l.23: ...but only *by* a set...
- p.17, l.33: remove space before the comma at *...in Alg.1 , and...*
- p.17, l.34: a comma is missing after $a_1$.