---
title: "General introduction: iBreakDown plots for Sinking of the RMS Titanic"
author: "Przemyslaw Biecek"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{General introduction: iBreakDown plots for Sinking of the RMS Titanic}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "#>",
  fig.width = 7,
  fig.height = 3.5,
  warning = FALSE,
  message = FALSE
)
```

# Data for Titanic survival

Let's see an example of `iBreakDown` plots for the survival probability of Titanic passengers.
First, let's look at the data. A convenient version of the dataset ships with the `DALEX` package (originally from `stablelearner`).

```{r}
library("DALEX")
head(titanic)
```

# Model for Titanic survival

Now it's time to create a model. Let's use a random forest.

```{r}
# prepare model
library("randomForest")
# drop rows with missing values before fitting
titanic <- na.omit(titanic)
# the response survived == "yes" is logical, so the forest is fit in regression mode
model_titanic_rf <- randomForest(survived == "yes" ~ gender + age + class + embarked +
                                   fare + sibsp + parch,
                                 data = titanic)
model_titanic_rf
```

# Explainer for Titanic survival

The third step (optional but useful) is to create a `DALEX` explainer for the random forest model.

```{r}
library("DALEX")
# titanic[, -9] drops the 9th column (the target, survived) from the explainer data
explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic[, -9],
                              y = titanic$survived == "yes",
                              label = "Random Forest v7")
```

# Break Down plot with D3

Let's see the Break Down for the model prediction for an 8-year-old male passenger from 1st class who embarked at Southampton.

```{r}
new_passenger <- data.frame(
  class = factor("1st", levels = c("1st", "2nd", "3rd", "deck crew",
                                   "engineering crew", "restaurant staff",
                                   "victualling crew")),
  gender = factor("male", levels = c("female", "male")),
  age = 8,
  sibsp = 0,
  parch = 0,
  fare = 72,
  embarked = factor("Southampton", levels = c("Belfast", "Cherbourg",
                                              "Queenstown", "Southampton"))
)
```

## Calculate variable attributions

```{r}
library("iBreakDown")
rf_la <- local_attributions(explain_titanic_rf, new_passenger)
rf_la
```

## Plot attributions with `ggplot2`

```{r}
plot(rf_la)
```

## Plot attributions with `D3`

```{r}
plotD3(rf_la)
```

## Calculate uncertainty for variable attributions

```{r}
rf_la_un <- break_down_uncertainty(explain_titanic_rf, new_passenger,
                                   path = "average")
plot(rf_la_un)
```

## Show only top features

```{r}
plotD3(rf_la, max_features = 3)
```

## Force the x-axis to range from 0 to 1

```{r}
plotD3(rf_la, max_features = 3, min_max = c(0, 1), margin = 0)
```
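
## A sketch of how attributions are computed

To build some intuition for the numbers that `local_attributions()` reports, here is a minimal base R sketch of the sequential idea behind a break-down. It is not the `iBreakDown` implementation: it assumes a fixed variable order, whereas the package chooses the order itself, and it uses a toy linear model on the built-in `mtcars` data instead of the Titanic forest. The helper `sketch_break_down()` and all `toy_*` names are hypothetical and exist only for this illustration.

```{r}
# Toy model with an interaction, so the order of variables can matter.
toy_model <- lm(mpg ~ wt * hp + qsec, data = mtcars)
toy_data  <- mtcars[, c("wt", "hp", "qsec")]
new_obs   <- toy_data[1, ]  # the observation to explain

# Hypothetical helper: for a fixed variable order, set one variable at a time
# to the value it takes in new_obs (for every row of the data) and record how
# much the average prediction changes.
sketch_break_down <- function(model, data, new_obs, order = colnames(data)) {
  current  <- data
  baseline <- mean(predict(model, current))  # average prediction (intercept)
  previous <- baseline
  contribution <- numeric(0)
  for (variable in order) {
    current[[variable]] <- new_obs[[variable]]  # length-1 value is recycled
    updated <- mean(predict(model, current))
    contribution[variable] <- updated - previous
    previous <- updated
  }
  c(intercept = baseline, contribution, prediction = previous)
}

toy_bd <- sketch_break_down(toy_model, toy_data, new_obs)
toy_bd
```

By construction, the intercept plus the contributions add up to the model prediction for `new_obs`. Passing a different `order` to this sketch may shift the individual contributions when the model contains interactions, which is one reason to look at the uncertainty of attributions as well.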