---
title: "Explanations in natural language"
author: "Adam Izdebski"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Explanations in natural language}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "#>",
  fig.width = 7,
  fig.height = 3.5,
  warning = FALSE,
  message = FALSE
)
```

# Introduction

We address the problem of insufficient interpretability of explanations for domain experts. We solve this issue by introducing the `describe()` function, which automatically generates natural language descriptions of explanations generated with the `ingredients` package.

# ingredients Package

The `ingredients` package allows for generating prediction validation and prediction perturbation explanations. They allow for both global and local model explanation.

The generic function `describe()` generates a natural language description for explanations generated with the `feature_importance()` and `ceteris_paribus()` functions.

To show how automatic descriptions are generated, we first load the data set and build a random forest model that classifies which of the passengers survived the sinking of the Titanic. Then, using the `DALEX` package, we generate an explainer of the model. Lastly, we select a random passenger whose prediction should be explained.

```{r message=FALSE, warning=FALSE}
library("DALEX")
library("ingredients")
library("ranger")

# Random forest that returns survival probabilities
model_titanic_rf <- ranger(survived ~ ., data = titanic_imputed, probability = TRUE)

# Column 8 is passed as the target `y` and dropped from the explainer's data
explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic_imputed[,-8],
                              y = titanic_imputed[,8],
                              label = "Random Forest")

# A randomly selected passenger whose prediction will be explained
passanger <- titanic_imputed[sample(nrow(titanic_imputed), 1) ,-8]
passanger
```

Now we are ready to generate various explanations and then describe them with the `describe()` function.

## Feature Importance

Feature importance explanation shows the importance of all the model's variables.
As it is a global explanation technique, no passenger needs to be specified.

```{r}
importance_rf <- feature_importance(explain_titanic_rf)
plot(importance_rf)
```

The `describe()` function states which variables are the most important. The argument `nonsignificance_treshold` sets the level above which variables become significant. For a higher threshold, fewer variables will be described as significant.

```{r}
describe(importance_rf)
```

## Ceteris Paribus Profiles

Ceteris Paribus profiles show how the model's prediction changes with the change of a specified variable.

```{r}
perturbed_variable <- "class"
cp_rf <- ceteris_paribus(explain_titanic_rf,
                         passanger,
                         variables = perturbed_variable)
plot(cp_rf, variable_type = "categorical")
```

For a user with no experience, interpreting the above plot may not be straightforward. Thus we generate a natural language description to make it easier.

```{r}
describe(cp_rf)
```

Natural language descriptions should be flexible in order to provide the desired level of complexity and specificity. Thus various parameters can modify the description being generated.

```{r}
describe(cp_rf,
         display_numbers = TRUE,
         label = "the probability that the passenger will survive")
```

Please note that `describe()` can handle only one variable at a time, so it is recommended to specify which variable should be described.

```{r}
describe(cp_rf,
         display_numbers = TRUE,
         label = "the probability that the passenger will survive",
         variables = perturbed_variable)
```

Continuous variables are described as well.

```{r}
perturbed_variable_continuous <- "age"
# Profiles for all variables of the selected passenger
cp_rf <- ceteris_paribus(explain_titanic_rf, passanger)
plot(cp_rf, variables = perturbed_variable_continuous)
describe(cp_rf, variables = perturbed_variable_continuous)
```

Ceteris Paribus profiles are described only for a single observation. If we want to assess the influence of a variable across more than one observation, we need to describe dependence profiles.
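
To make concrete what a Ceteris Paribus profile computes, the following is a minimal base-R sketch, not the implementation used by `ingredients`. It assumes the usual definition: one observation is copied, a single variable is varied over a grid, and the model is asked for a prediction at each grid value. The built-in `mtcars` data, the logistic regression, and the helper `cp_profile()` are illustrative stand-ins that do not appear in the package.

```r
# Illustrative sketch only: built-in mtcars data and a logistic regression
# stand in for the Titanic data and the random forest used in this vignette.
model <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Hypothetical helper (not part of ingredients): a Ceteris Paribus profile
# for one observation, varying a single variable over a grid of values.
cp_profile <- function(model, observation, variable, grid) {
  # Repeat the observation once per grid value ...
  profile <- observation[rep(1, length(grid)), , drop = FALSE]
  # ... and overwrite only the perturbed variable; all others stay fixed.
  profile[[variable]] <- grid
  data.frame(
    value = grid,
    prediction = as.numeric(predict(model, newdata = profile, type = "response"))
  )
}

observation <- mtcars[1, ]
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 5)
cp <- cp_profile(model, observation, "wt", grid)
cp
```

Each row of `cp` pairs a value of the perturbed variable with the predicted probability for the otherwise unchanged observation, which is the kind of relationship a Ceteris Paribus description puts into words.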
## Partial Dependence Profiles

```{r}
pdp <- aggregate_profiles(cp_rf, type = "partial")
plot(pdp, variables = "fare")
describe(pdp, variables = "fare")
```

```{r}
pdp <- aggregate_profiles(cp_rf, type = "partial", variable_type = "categorical")
plot(pdp, variables = perturbed_variable)
describe(pdp, variables = perturbed_variable)
```
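
As a rough illustration of what a partial dependence profile aggregates, here is a minimal base-R sketch. It assumes the standard definition of partial dependence as the pointwise average of Ceteris Paribus profiles over a set of observations, and it is not the code behind `aggregate_profiles()`. The `mtcars` data, the logistic regression, and the helper `pd_profile()` are illustrative stand-ins.

```r
# Illustrative sketch only: built-in mtcars data and a logistic regression
# stand in for the Titanic data and the random forest used in this vignette.
model <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Hypothetical helper (not part of ingredients): a partial dependence profile
# computed as the average prediction when one variable is set to the same
# value for every observation.
pd_profile <- function(model, data, variable, grid) {
  averaged <- vapply(grid, function(value) {
    perturbed <- data
    perturbed[[variable]] <- value  # same value for every row
    mean(predict(model, newdata = perturbed, type = "response"))
  }, numeric(1))
  data.frame(value = grid, prediction = averaged)
}

grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 5)
pd <- pd_profile(model, mtcars, "wt", grid)
pd
```

Averaging over observations is what turns the local, single-observation view of a Ceteris Paribus profile into a global summary of how the model responds to one variable.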