InterpretME : A Tool for Interpretations of Machine Learning Models Over Knowledge Graphs

Tracking #: 3237-4451

Disha Purohit
Yashrajsinh Chudasama
Philipp Rohde
Julian Gercke
Maria-Esther Vidal

Responsible editor: 
Guest Editors Tools Systems 2022

Submission type: 
Tool/System Report
In recent years, knowledge graphs have been regarded as pyramids of interconnected data enriched with semantics for complex decision-making. The potential of knowledge graphs and the demand for interpretability of machine learning models in diverse domains (e.g., healthcare) have gained more attention. The lack of model transparency negatively impacts the understanding and, consequently, the interpretability of the predictions made by a model. Data-driven models should be empowered with the knowledge required to trace their decisions and the transformations made to the input data, thereby increasing model transparency. In this paper, we propose InterpretME, a tool for fine-grained representations, in a knowledge graph, of the main characteristics of trained machine learning models. These include data-based characteristics (e.g., features' definition and SHACL validation) and model-based characteristics (e.g., relevant features, and interpretations of prediction probabilities and model decisions). InterpretME allows for the definition of a model's features over knowledge graphs; SHACL states domain integrity constraints. InterpretME traces the steps of data collection, curation, integration, and prediction, and documents the collected metadata in a knowledge graph. InterpretME is publicly available as a tool; it includes a pipeline for enhancing the interpretability of machine learning models, and a knowledge graph and an ontology to describe the main characteristics of trained machine learning models.

Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 16/Oct/2022
Review Comment:

In this paper, the authors present InterpretME. This tool aims at improving the interpretability of machine learning models by tracing all decisions and steps of a machine learning pipeline (e.g., data engineering, model building, exploitation, data exploration). The paper clearly motivates the need for such a tool and highlights the interest of relying on knowledge graphs and associated technologies for increased interpretability (e.g., SHACL validation, storing metadata in a knowledge graph). In particular, the benefits of this tool are presented for each of the various steps of predictive modeling frameworks. A motivating example from the biomedical domain is also provided and discussed to illustrate features of InterpretME. Additionally, an empirical evaluation assesses four research questions related to KGs for enhanced interpretability of ML models and to the InterpretME tool. The tool and examples are provided via GitHub, Zenodo, and a Python package available on

I have two major concerns regarding this paper.

First, the tool does not seem mature enough to me. As indicated in the paper, it is currently used in the context of three collaborative projects (CLARIFY, ImProVIT, and EraMed) in which some of the authors are involved. The authors also state that they expect an improvement of the evaluation of InterpretME "once [it] starts to be used in real-world applications" (p11 line 46). The provided GitHub repository has been neither starred nor forked. The Zenodo page only shows 57 views and 4 downloads.

Second, some important features / limitations of the tool are unclear to me. For example:
- Are the original data given to the machine learning model in the form of a KG, or are they tabular? Page 2 line 4 mentions "data collected from KGs", Page 4 line 46 mentions "input KG", but the biomedical example involves tabular data.
- There is an alignment issue that is not clear to me. Page 8 line 41 states that "Entities in the InterpretME KG and the input KGs are aligned". If the original data are in the form of a KG, why not reuse their URIs instead of defining new ones? This need for several URIs is also mentioned elsewhere in the paper but not motivated.
- Independent and dependent variables are not defined.
- LIME and SHAP are mentioned in the paper, but it seems that only LIME is implemented in InterpretME (page 12 line 28). This limitation could be more clearly stated at the beginning of the paper.
- Is there only one InterpretME KG, or could each run of the tool generate a new and separate InterpretME KG?
- Regarding the empirical evaluation (section 4):
  - RQ1: could you motivate that an increase in the degree of an entity leads to a better interpretability of this entity?
  - RQ4: would it be possible to provide additional explanations for the cause of the additional time? Especially as it does not seem negligible.

Minor comments:
- page 9 line 28: I would have expected more details about the extension of the KG
- page 3 line 36: please expand the EHR acronym

Potential additional reference:
Petar Ristoski, Heiko Paulheim: Semantic Web in data mining and knowledge discovery: A comprehensive survey. J. Web Semant. 36: 1-22 (2016)

Review #2
Anonymous submitted on 10/Nov/2022
Major Revision
Review Comment:

Summary of submission:

InterpretME is an analytical tool that traces the behaviour of predictive models built over data collected from KGs, in support of their explainability.

Using SHACL, the tool implements a set of integrity constraints that provide a meaningful description of a target entity of a prediction model.

Here it is unclear what "target entity" refers to. It might be clearer to describe the type of target entity: is it a hyperparameter, or a feature? It seems to be an instance from the training, validation, or test set.

The tool is targeted to predictive modelling (forecasting future outcomes based on past data) with data from KGs.

The tool's focus is automation assistance: InterpretME captures metadata from the input KG (features, target classes) and from the model, and records SHACL constraints for data validation. InterpretME traces the optimised hyperparameters and estimated features’ relevancy, and records the model performance metric outcomes (e.g., precision) for a particular run. Moreover, SHACL validation reports are stored. Tracing the metadata collected from input KGs will help to provide explanations about the predictions made by the predictive models.

**This manuscript was submitted as 'Tools and Systems Report' and should be reviewed along the following dimensions:**

**(1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided).**

The authors argue that, of three types of automation (mechanisation: algorithms that can run by themselves; composition: sequences of tasks that can be performed; and assistance: support for the user in algorithm and output interpretation), the least work has been done on automated assistance. The authors see great potential in knowledge graphs for aiding with this assistance. -> *I really like this idea of using semantic web technologies for easy integration between an input dataset and metadata produced in the machine learning pipeline for better interpretability, although the impact could be made clearer in my opinion: how interpretability is aided, and what the added benefit of doing this with KGs is, possibly with some references.*

With a use case in which cancer patient features should predict lung cancer biomarkers, they report *five questions* oncologists would still have after using well-known tools (e.g., LIME and SHAP) for model interpretability, and that should be answerable with the InterpretME KG. -> *Where do these five questions come from? Do they come from real oncologists? Do these five questions cover the interests of domain experts after performing a predictive task on their KG data?*

Section 4.1: I am not entirely sure I agree that a higher node degree necessarily means that an entity is more *human-interpretable*, given that there is no example to illustrate how the extra node information helps an oncologist or other domain expert interpret the results. Maybe add some examples of the information that is added for nodes and what an oncologist or other expert can learn from that additional information.
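To make the degree argument concrete, the following minimal sketch counts the degree of one entity before and after tracing metadata is attached to it. All triples, prefixes, and predicate names here are hypothetical illustrations and are not taken from the InterpretME KG or its ontology; degree is taken to be the number of triples in which a node occurs as subject or object.

```python
from collections import Counter


def node_degrees(triples):
    """Count, for each node, the triples in which it occurs as subject or object."""
    degrees = Counter()
    for subject, _predicate, obj in triples:
        degrees[subject] += 1
        degrees[obj] += 1
    return degrees


# Hypothetical input KG: one patient described by two clinical facts.
input_kg = [
    ("ex:patient1", "ex:hasGender", "ex:Female"),
    ("ex:patient1", "ex:hasSmokingHabit", "ex:NonSmoker"),
]

# Hypothetical metadata that a tracing tool might attach to the same entity.
trace_metadata = [
    ("ex:patient1", "ex:hasPrediction", "ex:ALK"),
    ("ex:patient1", "ex:hasValidationResult", "ex:Valid"),
    ("ex:patient1", "ex:usedInRun", "ex:run1"),
]

before = node_degrees(input_kg)["ex:patient1"]
after = node_degrees(input_kg + trace_metadata)["ex:patient1"]
print(before, after)
```

The degree rises simply because more triples mention the entity. Whether those extra triples make the entity easier for a domain expert to interpret depends on what they say, which is the point the comment above raises.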

**(2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool. Please also assess the data file provided by the authors under “Long-term stable URL for resources”.**

The paper does not contain many spelling errors, but the text could use some grammar checks: some sentences are missing an article here and there. Some sentences are also a bit unclear and could be elaborated on.

The motivating example makes clear what common methods miss. It is, however, not so clearly described what the limitations and requirements are for using this tool. Can you use it with any given KG, or do the KG and the training examples need to have a certain shape? What is the benefit of using this semantic technique over other non-semantic ones, and specifically, what is the benefit of mapping the data collected from the predictive models to RDF using RML? Does the user need to write their own SHACL shape constraints, and are there recommendations for doing so? It is unclear from Figure 4b what these constraints encode; maybe a more illuminating example would help. The motivating example mentions the ‘great potential of integrating knowledge graphs with predictive modelling frameworks’ -> some references could be added here for clarification.

What is ‘the target entity’ referred to in the motivating example and also later in the text? Are these entities in the test dataset?

The three forms of automation from Bie et al. could be explained a bit better; the three short sentences are a bit unclear (I had to look them up in the original paper to understand them).

In section 3.2 and image 3, as well as in the running example, it is at times unclear which steps are done automatically by InterpretME and which steps are facilitated by InterpretME but remain a task for the user.

**4** Empirical evaluation: ‘Each of the SHACL constraints validates a person’ -> could you give an example of a SHACL constraint here? That would make this easier to understand.

**4.1** -> It is unclear why degree distribution is a heading here; how does it relate to any of the RQs? A clearer mapping between the RQs and the headings that follow would help.

*WithInterpretME* -> I assume that this relates to the degree distribution over the input KG plus the InterpretME KG, but this is not defined anywhere.

‘The execution of queries 1 and 4’ -> what do these queries query for?

**4.2** -> ‘in terms of 20’: in terms of 20 what?

Some minor things:

Line 49 ‘entities of the target classes, e.g., HasSpouse’ -> is this not a relation?

Figure 4: the text is very small, and og:1501042 -> eg:1501042? The part on entity alignment, ‘entity alignment is performed to trace original entity of input KG with SHACL validation results and predictive modeling pipeline,’ is unclear to me. Why is entity alignment necessary and how is it done? The constraints could be more clearly described.

**(A) assess whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete.**

URI for resources: the GitHub repository contains an elaborate README and some worked-out examples that can be run on the fly, making it easy for the user to replicate the experiments mentioned in the paper. The files are well organised. A brief description of what the example queries do would be clarifying.

Review #3
Anonymous submitted on 01/Dec/2022
Minor Revision
Review Comment:


In this paper, the authors introduce a new tool named InterpretME which includes a pipeline to enable the interpretability of predictive machine learning models along with a knowledge graph and an ontology to support the process of describing the characteristics of trained ML models.


- Well motivated.

- It is an important tool, as it could be used in many domains to interpret the results of various ML models and also to facilitate their predictive improvement.

- All resources are made publicly available for reproducibility.


- The paper is not well organized; some concepts/components, such as the InterpretME KG, need to be defined formally.

- A detailed discussion of the results is necessary.

Detailed comments:

- Section 2.2 is well-formed by motivating the topic with an example.

- Since LIME and SHAP are some of the core components of the pipeline, they should be discussed in detail.

- On page 3 line 34, it is mentioned that the input lung cancer dataset is collected from an RDF KG. It would be clearer if the paper also explained how the extraction was performed.

- What kind of constraints are considered? Does the tool support any type of constraint?

- The InterpretME KG is first mentioned only in passing on page 5, section 3.2. It should instead be introduced properly with a formal definition.

- The results of the evaluation of the tool on the two datasets are not compared.


- What would be the quality of the tool in interpreting a fine-tuned model?

- If the results of the interpretation are used to improve accuracy by re-training the models, how would you interpret the re-trained model?

- How could the tool perform bias-free interpretation?