---
title: "How to use shapper for regression"
author: "Alicja Gosiewska"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    toc_float: true
    number_sections: true
vignette: >
  %\VignetteEngine{knitr::knitr}
  %\VignetteIndexEntry{How to use shapper for regression}
  %\usepackage[UTF-8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, eval = FALSE)
```

# Introduction

`shapper` is an R package that ports the Python library `shap` to R. For details and examples see the [shapper repository on github](https://github.com/ModelOriented/shapper) and the [shapper website](https://modeloriented.github.io/shapper/).

SHAP (SHapley Additive exPlanations) is a method to explain predictions of any machine learning model. For more details about this method see the [shap repository on github](https://github.com/slundberg/shap).

# Install shapper and shap

## R package shapper

```{r }
library("shapper")
```

## Python library shap

To run shapper, the Python library shap is required. It can be installed from either Python or R. To install it through R, you can use the function `install_shap` from the `shapper` package.

```{r eval = FALSE}
shapper::install_shap()
```

# Load data sets

The example usage is presented on the `titanic` dataset from the R package `DALEX`.

```{r }
library("DALEX")
# Keep the response and the seven predictors used by the model
titanic_train <- titanic[, c("survived", "class", "gender", "age",
                             "sibsp", "parch", "fare", "embarked")]
titanic_train$survived <- factor(titanic_train$survived)
titanic_train$gender <- factor(titanic_train$gender)
titanic_train$embarked <- factor(titanic_train$embarked)
# Drop rows with missing values
titanic_train <- na.omit(titanic_train)
head(titanic_train)
```

# Let's build a model

```{r }
library("randomForest")
set.seed(123)
model_rf <- randomForest(survived ~ ., data = titanic_train)
model_rf
```

## Prediction to be explained

Let's assume that we want to explain the prediction for a particular observation (male, 8 years old, traveling 1st class, embarked at Cherbourg, without parents or siblings).

```{r }
# Factor levels must match those in the training data
new_passanger <- data.frame(
  class = factor("1st", levels = c("1st", "2nd", "3rd", "deck crew",
                                   "engineering crew", "restaurant staff",
                                   "victualling crew")),
  gender = factor("male", levels = c("female", "male")),
  age = 8,
  sibsp = 0,
  parch = 0,
  fare = 72,
  embarked = factor("Cherbourg", levels = c("Belfast", "Cherbourg",
                                            "Queenstown", "Southampton"))
)
```

## Here shapper starts

To use the `shap()` function (an alias for `individual_variable_effect()`) we need four elements:

* a model,
* a data set,
* a function that calculates scores (predict function),
* an instance (or instances) to be explained.

The `shap()` function can be used directly with these four arguments, but for simplicity we use the *DALEX* package here, which comes with pre-implemented predict functions.

```{r }
library("DALEX")
# Wrap the model, the predictors, and a 0/1 response in an explainer
exp_rf <- explain(model_rf, data = titanic_train[, -1],
                  y = as.numeric(titanic_train[, 1]) - 1)
```

The explainer is an object that wraps up a model and its metadata. The metadata consists of, at least, the data set used to fit the model and the observations to explain.

With the explainer in place, a single call is enough to generate SHAP attributions for the random forest model.

```{r }
library("shapper")
ive_rf <- shap(exp_rf, new_observation = new_passanger)
ive_rf
```

# Plotting results

```{r }
plot(ive_rf)
```
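
The four elements listed in "Here shapper starts" can also be passed directly, without a DALEX explainer. The chunk below is a minimal sketch rather than a definitive recipe. It assumes that `individual_variable_effect()` accepts the `data`, `predict_function`, `new_observation` and `nsamples` arguments as shown in the shapper README. `p_function` and `ive_rf_direct` are names introduced here for illustration. The data preparation and model fit are repeated so that the chunk runs on its own.

```{r }
library("DALEX")
library("randomForest")
library("shapper")

# Same data preparation as in "Load data sets"
titanic_train <- titanic[, c("survived", "class", "gender", "age",
                             "sibsp", "parch", "fare", "embarked")]
titanic_train$survived <- factor(titanic_train$survived)
titanic_train$gender <- factor(titanic_train$gender)
titanic_train$embarked <- factor(titanic_train$embarked)
titanic_train <- na.omit(titanic_train)

# Element 1: the model
set.seed(123)
model_rf <- randomForest(survived ~ ., data = titanic_train)

# Element 4: the instance to explain; levels are copied from the
# training data so that they match what the model was fitted on
new_passanger <- data.frame(
  class = factor("1st", levels = levels(titanic_train$class)),
  gender = factor("male", levels = levels(titanic_train$gender)),
  age = 8,
  sibsp = 0,
  parch = 0,
  fare = 72,
  embarked = factor("Cherbourg", levels = levels(titanic_train$embarked))
)

# Element 3: a predict function returning class probabilities
# (hypothetical helper name, one column per class)
p_function <- function(model, data) {
  predict(model, newdata = data, type = "prob")
}

# Element 2 is the data set without the response column.
# nsamples = 50 is an assumed setting taken from the README example.
ive_rf_direct <- individual_variable_effect(
  model_rf,
  data = titanic_train[, -1],
  predict_function = p_function,
  new_observation = new_passanger,
  nsamples = 50
)
```

The result can be printed and plotted in the same way as `ive_rf`. Writing the predict function by hand makes it explicit which scores are being explained, here the class probabilities of the random forest.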