---
title: "Explanations in natural language"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Explanations in natural language}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "#>",
  fig.width = 7,
  fig.height = 3.5,
  warning = FALSE,
  message = FALSE
)
```

## Introduction

We address the problem of the insuficient interpretability of explanations for domain experts. We solve this issue by introducing `describe()` function, which automatically generates natural language descriptions of explanations generated with `iBrakDown` package. 

## iBreakDown Package

The `iBreakDown` package allows for generating feature attribution explanations. Feature attribution explanations justify a model's prediction by showing which of the model's variables affect the prediction and to what extent. It is done by attaching to each variable an importance coefficient, which sum should approximate the model's prediction.

There are two methods used by `iBreakDown` package: `shap()` and `break_down()`.

Function `shap()` generates a SHAP explanation, that is, the function assigns Shapley values to each variable. Function `break_down` uses break_down algorithm to generate an efficient approximation of the Shapley values. We show how to generate both explanations on an easy example using titanic data set and explainers from `DALEX` package. 

First, we load the data set and build a random forest model classifying which of the passengers survived the sinking of the Titanic. Then, using `DALEX` package, we generate an explainer of the model. Lastly, we select a random passenger, which prediction's should be explained. 

```{r message=FALSE, warning=FALSE}
library("DALEX")
library("iBreakDown")
library("randomForest")
titanic <- na.omit(titanic)

model_titanic_rf <- randomForest(survived == "yes" ~ .,
                                 data = titanic 
                                 )

explain_titanic_rf <- explain(model_titanic_rf,
                            data = titanic[,-9],
                            y = titanic$survived == "yes",
                            label = "Random Forest")

passanger <- titanic[sample(nrow(titanic), 1) ,-9]
passanger
```

Now we are ready for generating `shap()` and `iBreakDown()` explanations. 

```{r}
bd_rf <- break_down(explain_titanic_rf,
                    passanger,
                    keep_distributions = TRUE) # distributions should be kept
shap_rf <- shap(explain_titanic_rf,
                passanger)

plot(bd_rf)
plot(shap_rf)
```

## Describing an explanation

The displayed explanations, despite their visual clarity, may not be interpretable for 
someone not familiar with iBreakDown or shap explanation. Therefore, we generate a simple natural language description for both explainers. 

```{r}
describe(bd_rf)
describe(shap_rf)
```


## Parameters of describe() function

Natural language descriptions should be flexible enough to generate a description with the desired level of specificity and length. We describe the parameters used for describing both explanations. As both explanations have the same parameters, we turn our attention to describe the iBreakDown explanation. 

#### Adjusting nonsignificance treshold

The nonsignificance treshold controls which predictions are close to the average prediction. By setting a higher value, more predictions will be described as close to the average model prediction, and more variables will be described as nonsignificant.

```{r}
describe(bd_rf, nonsignificance_treshold = 1)
```
#### Adjusting label of the explanation

The label of the prediction could be changed to display more specific descriptions.

```{r}
describe(bd_rf, 
         label = "the passanger survived with probability")
```
#### Short descriptions

Generating short descriptions can be useful, as they can make nice plot subtitles. 
```{r}
describe(bd_rf, short_description = TRUE)
```

#### Displaying values

Displaying variable values can easily make the description more informative. 

```{r}
describe(bd_rf, display_values = TRUE)
```

#### Displaying numbers

Displaying numbers changes the whole argumentation style making the description longer. 
```{r}
describe(bd_rf, display_numbers = TRUE)
```


#### Distribution details

Describing distribution details is useful if we want to have a big picture about other instances' behaviour.

```{r}
describe(bd_rf, display_distribution_details = TRUE)
```

#### SHAP

Explanations generated by `shap()` functions have the same arguments expected from `display_shap`, what add an additional information, if the calculated variable's contributions have high or low variability. 

```{r}
describe(shap_rf, display_shap = TRUE)
```

#### Combining all the parameters

Of course, all the arguments can be set according to preferences allowing for flexible natural language descriptions.

```{r}
describe(shap_rf,
         label = "the passanger survived with probability",
         display_values = TRUE,
         display_numbers = TRUE,
         display_shap = TRUE)
```