InterpretME: A Tool for Interpretations of Machine Learning Models Over Knowledge Graphs

Tracking #: 3404-4618

Yashrajsinh Chudasama
Disha Purohit
Philipp Rohde
Julian Gercke
Maria-Esther Vidal

Responsible editor: 
Guest Editors Tools Systems 2022

Submission type: 
Tool/System Report
In recent years, knowledge graphs have been considered pyramids of interconnected data enriched with semantics for complex decision-making. The potential of knowledge graphs and the demand for interpretability of machine learning (ML) models in diverse domains (e.g., healthcare) have gained more attention. The lack of model transparency negatively impacts the understanding and, in consequence, the interpretability of the predictions made by a model. Data-driven models should be empowered with the knowledge required to trace down their decisions and the transformations made to the input data to increase model transparency. In this paper, we propose InterpretME, a tool for fine-grained representations, in a knowledge graph, of the main characteristics of trained machine learning models. They include data-based characteristics (e.g., features' definition and SHACL validation) and model-based characteristics (e.g., relevant features and interpretations of prediction probabilities and model decisions). InterpretME allows for defining a model's features over knowledge graphs (KGs) and relational data in various formats, including CSV and JSON; SHACL states domain integrity constraints. InterpretME traces the steps of data collection, curation, integration, and prediction; it documents the collected metadata in the InterpretME KG. InterpretME is publicly available as a tool; it includes a pipeline for enhancing the interpretability of ML models, the InterpretME KG, and an ontology to describe the main characteristics of trained ML models.
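The SHACL validation step mentioned in the abstract can be pictured with a minimal sketch. The shape, property names, and patient records below are hypothetical and only mimic, in plain Python, the kind of per-entity integrity check that InterpretME delegates to SHACL over RDF data; the actual pipeline validates KG entities against real SHACL shapes and records the resulting True/False report as metadata.

```python
# Toy integrity check in the spirit of SHACL validation.
# REQUIRED and ALLOWED_BIOMARKERS stand in for a (hypothetical) shape's
# property and value constraints; they are not InterpretME's real schema.

REQUIRED = {"age", "biomarker", "smoking"}      # properties every patient must have
ALLOWED_BIOMARKERS = {"ALK", "EGFR", "Others"}  # permitted values for one property

def validate(entity: dict) -> bool:
    """Return True if the entity satisfies the toy domain constraints."""
    if not REQUIRED <= entity.keys():           # a required property is missing
        return False
    return entity["biomarker"] in ALLOWED_BIOMARKERS

patients = [
    {"id": "p1", "age": 63, "biomarker": "ALK", "smoking": False},
    {"id": "p2", "age": 57, "biomarker": "XYZ", "smoking": True},   # invalid value
    {"id": "p3", "age": 70, "smoking": True},                       # missing property
]

# Analogous to a validation report: entity identifier -> valid or not.
report = {p["id"]: validate(p) for p in patients}
print(report)  # {'p1': True, 'p2': False, 'p3': False}
```

In the real tool this report is materialized as RDF and linked to the target entities, so a prediction can later be traced back to whether its input entity satisfied the domain constraints.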
Decision: Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 21/May/2023
Minor Revision
Review Comment:

I would like to thank the authors for thoroughly addressing my remarks. I think the paper is now much clearer in presenting the tool, its features, advantages and limits.

I only have the following minor remarks:

- p2/l.3: "over data collected from KGs" and from CSV/JSON files, if I am not mistaken
- p2/l.4: "target entity" is used without being defined. I know it is defined later in the paper but it would help readers if it is defined at its first usage.
- p3/l.37-38-45: what are V_i and T_i(v)? Specifically, I am wondering about the need for the subscript i, which is not used anywhere else
- p3/Fig.1: "InterpretME KG shows an instance of a target entity". I find it strange to instantiate an entity.
- p4/l.24: "input KG" or CSV/JSON file?
- p4/l.33-34: "pipeline for predictive modeling are unable to generate human- and machine-readable decisions to assist users and enhance their efficiency". I think it would be great to motivate here why LIME or SHAP do not offer such features / are not sufficient in your opinion.
- p7/l.27-32: The description of the RML mapping (especially the part about the predicate intr:hasDefinition) is a bit hard to understand here, without an example that is provided at the end of the paper (and not mentioned here). Either mention the example or provide the explanation when the example is described in the paper.
- p9/Fig.6: I am wondering what the entity "eg:165883" in the Figure is. Indeed, it seems to be a description of the datasets (has important features, has support, precision, recall), a run (has run to itself), but also a patient (sameAs to some patient 1501042), but also a list of patients (has the entity 1501042). I am wondering whether this is a mistake in the Figure or if the modeling could be detailed a bit further to explain these points.
- p9/l.12-13: I cannot find the prediction probabilities (0.65 and 0.35) in the described Figure
- p10/l.40: "independent variables". As you mention the target class, shouldn't this be a dependent variable?
- p11/Fig.7: same comment as for Fig.6. How can an entity (113315856) that represents a patient have hyperparameters and precision (which are more dataset-level elements)?
- p13/Fig.9: Fig. 9b is not mentioned in the text; it is a bit hard to understand from the caption alone, without accompanying text.
- p13/l.42-23: "Since, the InterpretME KG covers a new domain of structured data, the eloquence of several suggested criteria is still relatively low". I am not sure to understand what you mean with this sentence. Could you provide additional details?
- p15/l.42-46: I think these lines do a great job at summarizing the contribution. They should also be in the introduction of the paper to better clarify what the outputs are.

Typos / incomplete sentences:
- Only one author has the asterisk for equal contribution
- p2/l.10: "in input data is collected from files"
- p4/Fig.2: "Figure 2b and Figure 2c showing"
- p5/Fig.3: "shows that such frameworks still lack in terms of interpretability and traceability"
- p5/l.34-35: "For instance, 'If a patient is given a drug for the treatment, then he/she is cured'"
- p6/Fig.4: "Tracing Metadata depicts traced metdata"
- p6/l.15-22: please split this sentence as it is very lengthy.
- p6/l.25-26: "The structure of the query enables entities in the input KGs can be aligned to the identifiers"
- p6/l.30: "This validation reports state"
- p6/l.32: "where True represents the particular entity is valid, inversely represented by False"
- p6/l.36: "binary features usable for predictive models"
- p6/l.37-38: "The Lung Cancer KG..." [I don't understand the purpose of this sentence here, so I think maybe it should be elsewhere?].
- p6/l.39: "Sampling strategies [are also received to] reduce data imbalance"
- p7/l.28: "RML triple map; it is composed"
- p7/l.41: "which is hard to interperet and understand the characteristics of the target entity"
- p7/l.48: "The prerequisites to run an example of the French Royalty KG with InterpretME is available". I also find that this sentence is a bit odd as it differs from the running example. You could maybe mention that it is another example that can be found online.
- p8/l.42-43: "patient with target class ALK is 872 instances while a ptient with Others as target class is 432"
- p9/l.3: "to trace [the] original entity of [the] input KG"
- p9/l.5: "to perform [the] predictive task"
- p9/l.28: "appendix 7"
- p9/l.39: "how all the properties that characterize the entity."
- p10/l.34: "For instance, in the predictive task about the patient positive for biomarker ALK."
- p10/l.37-38: "While InterpretME tries to increase more contextual edges of a patient node via annotating the other contextual information and behavior about the patient (e.g., Validation) in the pipeline."
- p10/l.39: "not only human-understandable easier but also machine-readable."
- p13/l.37: "Representational category quantities the form how information is available"
- p13-14/l.51-1: "comprises RDF triples 31,599"
- p14/l.23-24: "The possible answers are: i) the impact of knowledge graphs interpretability, ii) without knowledge graph is enough, iii) Maybe."
- p15/l.6: "cross-validation and classification report are"
- Please check ref [3] Hern_`andez-Orallo

Review #2
Anonymous submitted on 27/May/2023
Minor Revision
Review Comment:

This manuscript was submitted as 'Tools and Systems Report' and should be reviewed along the following dimensions:

(1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided).
Although a more extensive user study could be completed, first results indicate an increase in explainability of the retrieved results over existing tools (LIME). However, some details are missing from the user study: the questions about the notion of interpretability for LIME are omitted, as well as more details about how the study was performed (had the users ever used LIME or other tools before, did they work with the tools for a while, did they just get the output of LIME vs. the output of the InterpretME KG, and in which order?).
I appreciate that the tool has been used in a course to understand better how such a tool can be used in practice, and the fact that it is developed with domain experts makes it also more likely to be picked up by the community.

(2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess

There are still some minor grammatical errors, but not many. Overall, the readability has significantly improved due to additions over the previous version (e.g., new schematics).

(A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data,

The GitHub repository is well organized and contains a README.

(B) whether the provided resources appear to be complete for replication of experiments, and if not, why,

Yes, the provided resources appear complete.

(C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

Yes, GitHub and Zenodo are used.

I like this idea of using semantic web technologies for easy integration between an input dataset and metadata produced in the machine learning pipeline for better interpretability, and the authors have made the impact of the tool clearer by including a first user study. Also, the questions I had in the first round have been adequately addressed. If more details for the user study are included, as requested, I would recommend the paper for publication.