Title: | Effects and Importances of Model Ingredients |
---|---|
Description: | Collection of tools for assessment of feature importance and feature effects. Key functions are: feature_importance() for assessment of global level feature importance, ceteris_paribus() for calculation of the what-if plots, partial_dependence() for partial dependence plots, conditional_dependence() for conditional dependence plots, accumulated_dependence() for accumulated local effects plots, aggregate_profiles() and cluster_profiles() for aggregation of ceteris paribus profiles, generic print() and plot() for better usability of selected explainers, generic plotD3() for interactive, D3 based explanations, and generic describe() for explanations in natural language. The package 'ingredients' is a part of the 'DrWhy.AI' universe (Biecek 2018) <arXiv:1806.08915>. |
Authors: | Przemyslaw Biecek [aut, cre] , Hubert Baniecki [aut] , Adam Izdebski [ctb] |
Maintainer: | Przemyslaw Biecek <[email protected]> |
License: | GPL-3 |
Version: | 2.3.1 |
Built: | 2024-12-30 06:22:54 UTC |
Source: | https://github.com/modeloriented/ingredients |
Accumulated Local Effects Profiles accumulate local changes in Ceteris Paribus Profiles.
Function accumulated_dependence calls ceteris_paribus and then aggregate_profiles.
accumulated_dependence(x, ...)

## S3 method for class 'explainer'
accumulated_dependence(
  x,
  variables = NULL,
  N = 500,
  variable_splits = NULL,
  grid_points = 101,
  ...,
  variable_type = "numerical"
)

## Default S3 method:
accumulated_dependence(
  x,
  data,
  predict_function = predict,
  label = class(x)[1],
  variables = NULL,
  N = 500,
  variable_splits = NULL,
  grid_points = 101,
  ...,
  variable_type = "numerical"
)

## S3 method for class 'ceteris_paribus_explainer'
accumulated_dependence(x, ..., variables = NULL)

accumulated_dependency(x, ...)
x | an explainer created with function |
... | other parameters |
variables | names of variables for which profiles shall be calculated. Will be passed to |
N | number of observations used for calculation of partial dependence profiles. By default, |
variable_splits | named list of splits for variables, in most cases created with |
grid_points | number of points for profile. Will be passed to |
variable_type | a character. If |
data | validation dataset. Will be extracted from |
predict_function | predict function. Will be extracted from |
label | name of the model. By default it's extracted from the |
Find more details in the Accumulated Local Dependence Chapter.
an object of the class aggregated_profiles_explainer
ALEPlot: Accumulated Local Effects (ALE) Plots and Partial Dependence (PD) Plots https://cran.r-project.org/package=ALEPlot, Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")

model_titanic_glm <- glm(survived ~ gender + age + fare,
  data = titanic_imputed, family = "binomial")

explain_titanic_glm <- explain(model_titanic_glm,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  verbose = FALSE)

adp_glm <- accumulated_dependence(explain_titanic_glm,
  N = 25, variables = c("age", "fare"))
head(adp_glm)
plot(adp_glm)

library("ranger")

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)

explain_titanic_rf <- explain(model_titanic_rf,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  label = "ranger forest",
  verbose = FALSE)

adp_rf <- accumulated_dependence(explain_titanic_rf, N = 200, variable_type = "numerical")
plot(adp_rf)

adp_rf <- accumulated_dependence(explain_titanic_rf, N = 200, variable_type = "categorical")
plotD3(adp_rf, label_margin = 80, scale_plot = TRUE)
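The idea of accumulating local changes in ceteris paribus profiles can be sketched in base R. This is a simplified illustration, not the package's implementation: the mtcars data, the lm model, the Gaussian weighting and the bandwidth value are all assumptions made for the example.

```r
# Illustrative model on a built-in dataset (not taken from the manual).
model <- lm(mpg ~ wt + hp, data = mtcars)

# Grid of values for the variable of interest.
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 21)

# Ceteris paribus profiles: one row per observation, one column per grid point.
cp <- sapply(grid, function(z) {
  altered <- mtcars
  altered$wt <- z
  predict(model, newdata = altered)
})

# Local changes of each profile between neighbouring grid points.
local_changes <- t(apply(cp, 1, diff))

# Gaussian weights: an observation contributes mostly near its own wt value.
# The bandwidth is an assumption made for this sketch.
bandwidth <- 0.25 * diff(range(mtcars$wt))
midpoints <- (head(grid, -1) + tail(grid, -1)) / 2
weights <- outer(mtcars$wt, midpoints, function(x, m) dnorm(x - m, sd = bandwidth))

# Weighted average local change per grid interval, accumulated from 0.
avg_change <- colSums(local_changes * weights) / colSums(weights)
accumulated <- c(0, cumsum(avg_change))
```

The resulting vector can be drawn with plot(grid, accumulated, type = "l"). For an additive linear model such as this one, the accumulated profile is a straight line with the slope of the wt coefficient.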
The function aggregate_profiles() calculates an aggregate of ceteris paribus profiles. It can be: Partial Dependence Profile (average across Ceteris Paribus Profiles), Conditional Dependence Profile (local weighted average across Ceteris Paribus Profiles) or Accumulated Local Dependence Profile (accumulated average local changes in Ceteris Paribus Profiles).
aggregate_profiles( x, ..., variable_type = "numerical", groups = NULL, type = "partial", variables = NULL, span = 0.25, center = FALSE )
x | a ceteris paribus explainer produced with function |
... | other explainers that shall be calculated together |
variable_type | a character. If |
groups | a variable name that will be used for grouping. By default |
type | either |
variables | if not |
span | smoothing coefficient, by default |
center | by default accumulated profiles start at 0. If |
an object of the class aggregated_profiles_explainer
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")
library("ranger")

head(titanic_imputed)

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)

explain_titanic_rf <- explain(model_titanic_rf,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  label = "ranger forest",
  verbose = FALSE)

selected_passangers <- select_sample(titanic_imputed, n = 100)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
head(cp_rf)

# continuous variable
pdp_rf_p <- aggregate_profiles(cp_rf, variables = "age", type = "partial")
pdp_rf_p$`_label_` <- "RF_partial"
pdp_rf_c <- aggregate_profiles(cp_rf, variables = "age", type = "conditional")
pdp_rf_c$`_label_` <- "RF_conditional"
pdp_rf_a <- aggregate_profiles(cp_rf, variables = "age", type = "accumulated")
pdp_rf_a$`_label_` <- "RF_accumulated"

plot(pdp_rf_p, pdp_rf_c, pdp_rf_a, color = "_label_")

pdp_rf <- aggregate_profiles(cp_rf, variables = "age", groups = "gender")
head(pdp_rf)

plot(cp_rf, variables = "age") +
  show_observations(cp_rf, variables = "age") +
  show_rugs(cp_rf, variables = "age", color = "red") +
  show_aggregated_profiles(pdp_rf, size = 3, color = "_label_")

# categorical variable
pdp_rf_p <- aggregate_profiles(cp_rf, variables = "class",
  variable_type = "categorical", type = "partial")
pdp_rf_p$`_label_` <- "RF_partial"
pdp_rf_c <- aggregate_profiles(cp_rf, variables = "class",
  variable_type = "categorical", type = "conditional")
pdp_rf_c$`_label_` <- "RF_conditional"
pdp_rf_a <- aggregate_profiles(cp_rf, variables = "class",
  variable_type = "categorical", type = "accumulated")
pdp_rf_a$`_label_` <- "RF_accumulated"

plot(pdp_rf_p, pdp_rf_c, pdp_rf_a, color = "_label_")

# or maybe flipped?
library(ggplot2)
plot(pdp_rf_p, pdp_rf_c, pdp_rf_a, color = "_label_") + coord_flip()

pdp_rf <- aggregate_profiles(cp_rf, variables = "class",
  variable_type = "categorical", groups = "gender")
head(pdp_rf)
plot(pdp_rf, variables = "class")

# or maybe flipped?
plot(pdp_rf, variables = "class") + coord_flip()
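The "average across Ceteris Paribus Profiles" that defines a partial dependence profile, and its grouped variant, can be illustrated with a minimal base R sketch. The mtcars data, the lm model and the grouping by am are assumptions chosen for the example; the package itself works on explainer objects.

```r
# Illustrative model on a built-in dataset (not taken from the manual).
model <- lm(mpg ~ wt + hp, data = mtcars)

# Grid of values for the variable of interest.
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 21)

# Ceteris paribus profiles: rows are observations, columns are grid points.
cp <- sapply(grid, function(z) {
  altered <- mtcars
  altered$wt <- z
  predict(model, newdata = altered)
})

# Partial dependence profile: plain average of the profiles at each grid point.
partial <- colMeans(cp)

# Grouped variant: one averaged profile per level of a grouping variable.
partial_by_am <- apply(cp, 2, function(column) tapply(column, mtcars$am, mean))
```

Here partial_by_am is a matrix with one row per group, which loosely mirrors what the groups argument is described as doing.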
This is an aesthetically efficient implementation of the grid.arrange function.
bind_plots(..., byrow = FALSE)
... |
( |
byrow |
( |
(gtable) A plottable object with plot().
library("DALEX")
library("ingredients")

titanic_glm <- glm(survived ~ gender + age + fare,
  data = titanic_imputed, family = "binomial")

explain_glm <- explain(titanic_glm,
  data = titanic_imputed,
  y = titanic_imputed$survived,
  verbose = FALSE)

pdp_numerical <- partial_dependence(explain_glm, N = 50, variable_type = "numerical")
pdp_categorical <- partial_dependence(explain_glm, N = 50, variable_type = "categorical")

# Bind plots by rows
bind_plots(plot(pdp_numerical), plot(pdp_categorical), byrow = TRUE)

# Bind plots by columns
bind_plots(plot(pdp_numerical), plot(pdp_categorical), byrow = FALSE)
Oscillations are proxies for local feature importance at the instance level. Find more details in the Ceteris Paribus Oscillations Chapter.
calculate_oscillations(x, sort = TRUE, ...)
x | a ceteris paribus explainer produced with the |
sort | a logical value. If |
... | other arguments |
an object of the class ceteris_paribus_oscillations
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")

titanic_small <- select_sample(titanic_imputed, n = 500, seed = 1313)

# build a model
model_titanic_glm <- glm(survived ~ gender + age + fare,
  data = titanic_small, family = "binomial")

explain_titanic_glm <- explain(model_titanic_glm,
  data = titanic_small[,-8],
  y = titanic_small[,8])

cp_rf <- ceteris_paribus(explain_titanic_glm, titanic_small[1,])
calculate_oscillations(cp_rf)

library("ranger")

apartments_rf_model <- ranger(m2.price ~ construction.year + surface + floor +
  no.rooms + district, data = apartments)

explainer_rf <- explain(apartments_rf_model,
  data = apartments_test[,-1],
  y = apartments_test$m2.price,
  label = "ranger forest",
  verbose = FALSE)

apartment <- apartments_test[1,]

cp_rf <- ceteris_paribus(explainer_rf, apartment)
calculate_oscillations(cp_rf)
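One way to read oscillations as an instance-level importance proxy is sketched below in base R. It assumes that the oscillation of a variable is the mean absolute difference between the ceteris paribus profile and the prediction for the unaltered observation; this definition, the mtcars data, the lm model and the helper name oscillation are assumptions for illustration, not a statement about the package internals.

```r
# Illustrative model and observation (not taken from the manual).
model <- lm(mpg ~ wt + hp, data = mtcars)
observation <- mtcars[1, ]
baseline <- predict(model, newdata = observation)

# Hypothetical helper: average absolute deviation of the profile from the
# prediction for the unaltered observation.
oscillation <- function(variable, n_points = 21) {
  grid <- seq(min(mtcars[[variable]]), max(mtcars[[variable]]), length.out = n_points)
  altered <- observation[rep(1, n_points), ]
  altered[[variable]] <- grid
  profile <- predict(model, newdata = altered)
  mean(abs(profile - baseline))
}

# One value per variable, largest first (similar in spirit to sort = TRUE).
oscillations <- sort(sapply(c(wt = "wt", hp = "hp"), oscillation), decreasing = TRUE)
oscillations
```

A variable whose profile stays close to the baseline prediction gets a small value, so it matters little for this particular observation.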
This function calculates individual variable profiles (ceteris paribus profiles), i.e. series of predictions from a model calculated for observations with a single altered coordinate.
calculate_variable_profile(
  data,
  variable_splits,
  model,
  predict_function = predict,
  ...
)

## Default S3 method:
calculate_variable_profile(
  data,
  variable_splits,
  model,
  predict_function = predict,
  ...
)
data | set of observations. Profile will be calculated for every observation (every row) |
variable_splits | named list of vectors. Elements of the list are vectors with points in which profiles should be calculated. See an example for more details. |
model | a model that will be passed to the |
predict_function | function that takes data and model and returns numeric predictions. Note that the ... arguments will be passed to this function. |
... | other parameters that will be passed to the |
Note that the calculate_variable_profile function is an S3 generic. If you want to work with non-standard data sources (such as H2O ddf or external databases), you should overload it.
a data frame with profiles for selected variables and selected observations
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
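The computation described above, predictions for copies of each observation with a single coordinate altered, might look as follows in base R. The function name sketch_variable_profile, the output column names yhat and vname, the mtcars data and the lm model are hypothetical choices for this sketch and do not reproduce the package's output format.

```r
# Hypothetical re-implementation of the idea, for illustration only.
sketch_variable_profile <- function(data, variable_splits, model,
                                    predict_function = predict) {
  profiles <- lapply(names(variable_splits), function(variable) {
    split_points <- variable_splits[[variable]]
    # Repeat every observation once per split point ...
    altered <- data[rep(seq_len(nrow(data)), each = length(split_points)), , drop = FALSE]
    # ... and overwrite only the selected variable with the split points.
    altered[[variable]] <- rep(split_points, times = nrow(data))
    altered$yhat <- predict_function(model, altered)
    altered$vname <- variable
    altered
  })
  result <- do.call(rbind, profiles)
  rownames(result) <- NULL
  result
}

model <- lm(mpg ~ wt + hp, data = mtcars)

# A named list of vectors: the points at which each profile is evaluated.
splits <- list(wt = c(2, 3, 4), hp = c(100, 200))

profiles <- sketch_variable_profile(mtcars[1:2, ], splits, model)
head(profiles)
```

With two observations, three split points for wt and two for hp, the result has 2 * 3 + 2 * 2 rows.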
This function calculates candidate splits for each selected variable. For numerical variables splits are calculated as percentiles (in general uniform quantiles of the length grid_points). For all other variables splits are calculated as unique values.
calculate_variable_split(
  data,
  variables = colnames(data),
  grid_points = 101,
  variable_splits_type = "quantiles",
  new_observation = NA
)

## Default S3 method:
calculate_variable_split(
  data,
  variables = colnames(data),
  grid_points = 101,
  variable_splits_type = "quantiles",
  new_observation = NA
)
data | validation dataset. It is used to determine the distribution of observations. |
variables | names of variables for which splits shall be calculated |
grid_points | number of points used for response path |
variable_splits_type | how variable grids shall be calculated? Use "quantiles" (default) for percentiles or "uniform" to get a uniform grid of points |
new_observation | if specified (not |
Note that the calculate_variable_split function is an S3 generic. If you want to work with non-standard data sources (such as H2O ddf or external databases), you should overload it.
A named list with splits for selected variables
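The difference between quantile-based and uniform grids can be shown with a small base R sketch. The function name sketch_variable_split and the toy data frame are hypothetical; only the general rule (quantiles or a uniform grid for numerical variables, unique values otherwise) is taken from the description above.

```r
# Hypothetical re-implementation of the idea, for illustration only.
sketch_variable_split <- function(data, variables = colnames(data),
                                  grid_points = 101,
                                  variable_splits_type = "quantiles") {
  splits <- lapply(variables, function(variable) {
    values <- data[[variable]]
    # Non-numerical variables: every unique value is a candidate split.
    if (!is.numeric(values)) return(sort(unique(values)))
    if (variable_splits_type == "uniform") {
      # Equally spaced points between the minimum and the maximum.
      seq(min(values, na.rm = TRUE), max(values, na.rm = TRUE), length.out = grid_points)
    } else {
      # Equally spaced probabilities; duplicated quantiles are dropped,
      # so fewer than grid_points values may remain.
      probs <- seq(0, 1, length.out = grid_points)
      unique(unname(quantile(values, probs = probs, na.rm = TRUE)))
    }
  })
  names(splits) <- variables
  splits
}

toy <- data.frame(age = c(1, 2, 2, 3, 50),
                  class = factor(c("a", "b", "a", "c", "b")))

quantile_splits <- sketch_variable_split(toy, grid_points = 5)
uniform_splits <- sketch_variable_split(toy, variables = "age", grid_points = 5,
                                        variable_splits_type = "uniform")
```

In this toy data the quantile grid for age follows the observed values and skips the empty region between 3 and 50, while the uniform grid spends most of its points there.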
This explainer works for individual observations. For each observation it calculates Ceteris Paribus Profiles for selected variables. Such profiles can be used to hypothesize about model results if a selected variable is changed. For this reason they are also called 'What-If Profiles'.
ceteris_paribus(x, ...)

## S3 method for class 'explainer'
ceteris_paribus(
  x,
  new_observation,
  y = NULL,
  variables = NULL,
  variable_splits = NULL,
  grid_points = 101,
  variable_splits_type = "quantiles",
  ...
)

## Default S3 method:
ceteris_paribus(
  x,
  data,
  predict_function = predict,
  new_observation,
  y = NULL,
  variables = NULL,
  variable_splits = NULL,
  grid_points = 101,
  variable_splits_type = "quantiles",
  variable_splits_with_obs = FALSE,
  label = class(x)[1],
  ...
)
x | an explainer created with the |
... | other parameters |
new_observation | a new observation with columns that correspond to variables used in the model |
y | true labels for |
variables | names of variables for which profiles shall be calculated. Will be passed to |
variable_splits | named list of splits for variables, in most cases created with |
grid_points | maximum number of points for profile calculations. Note that the final number of points may be lower than |
variable_splits_type | how variable grids shall be calculated? Use "quantiles" (default) for percentiles or "uniform" to get a uniform grid of points |
data | validation dataset. It will be extracted from |
predict_function | predict function. It will be extracted from |
variable_splits_with_obs | if |
label | name of the model. By default it's extracted from the |
Find more details in the Ceteris Paribus Chapter.
an object of the class ceteris_paribus_explainer.
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")

titanic_small <- select_sample(titanic_imputed, n = 500, seed = 1313)

# build a model
model_titanic_glm <- glm(survived ~ gender + age + fare,
  data = titanic_small, family = "binomial")

explain_titanic_glm <- explain(model_titanic_glm,
  data = titanic_small[,-8],
  y = titanic_small[,8])

cp_rf <- ceteris_paribus(explain_titanic_glm, titanic_small[1,])
cp_rf

plot(cp_rf, variables = "age")

library("ranger")

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)

explain_titanic_rf <- explain(model_titanic_rf,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  label = "ranger forest",
  verbose = FALSE)

# select a few passengers
selected_passangers <- select_sample(titanic_imputed, n = 20)

cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
cp_rf

plot(cp_rf, variables = "age") +
  show_observations(cp_rf, variables = "age") +
  show_rugs(cp_rf, variables = "age", color = "red")
This function calculates ceteris paribus profiles for a grid of values spanned by two variables. It may be useful to identify or present interactions between two variables.
ceteris_paribus_2d(explainer, observation, grid_points = 101, variables = NULL)
explainer | a model to be explained, preprocessed by the |
observation | a new observation for which predictions need to be explained |
grid_points | number of points used for response path. Will be used for both variables |
variables | if specified, then only these variables will be explained |
an object of the class ceteris_paribus_2d_explainer.
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")

model_titanic_glm <- glm(survived ~ age + fare,
  data = titanic_imputed, family = "binomial")

explain_titanic_glm <- explain(model_titanic_glm,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8])

cp_rf <- ceteris_paribus_2d(explain_titanic_glm, titanic_imputed[1,],
  variables = c("age", "fare", "sibsp"))
head(cp_rf)

plot(cp_rf)

library("ranger")
set.seed(59)

apartments_rf_model <- ranger(m2.price ~., data = apartments)

explainer_rf <- explain(apartments_rf_model,
  data = apartments_test[,-1],
  y = apartments_test[,1],
  label = "ranger forest",
  verbose = FALSE)

new_apartment <- apartments_test[1,]
new_apartment

wi_rf_2d <- ceteris_paribus_2d(explainer_rf, observation = new_apartment,
  variables = c("surface", "floor", "no.rooms"))
head(wi_rf_2d)

plot(wi_rf_2d)
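A two-dimensional profile of this kind can be sketched in base R with expand.grid. The mtcars data, the lm model with an interaction term and the 11-point grids are assumptions for the example; the package returns its own ceteris_paribus_2d_explainer object rather than a plain matrix.

```r
# Illustrative model with an interaction between the two variables.
model <- lm(mpg ~ wt * hp, data = mtcars)
observation <- mtcars[1, ]

# Grid spanned by two variables; the same number of points is used for both.
grid <- expand.grid(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 11),
  hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 11)
)

# Copy the observation once per grid cell and alter only the two variables.
altered <- observation[rep(1, nrow(grid)), ]
altered$wt <- grid$wt
altered$hp <- grid$hp
grid$yhat <- predict(model, newdata = altered)

# Predictions arranged as a surface: rows follow wt, columns follow hp.
surface <- matrix(grid$yhat, nrow = 11)
```

The surface can be inspected with image(surface) or contour(surface); contour lines that are not parallel point to an interaction between the two variables.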
This function calculates aggregates of ceteris paribus profiles based on hierarchical clustering.
cluster_profiles( x, ..., aggregate_function = mean, variable_type = "numerical", center = FALSE, k = 3, variables = NULL )
x | a ceteris paribus explainer produced with function |
... | other explainers that shall be plotted together |
aggregate_function | a function for profile aggregation. By default it's |
variable_type | a character. If |
center | shall profiles be centered before clustering |
k | number of clusters for the hclust function |
variables | if not |
Find more details in the Clustering Profiles Chapter.
an object of the class aggregated_profiles_explainer
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")

selected_passangers <- select_sample(titanic_imputed, n = 100)

model_titanic_glm <- glm(survived ~ gender + age + fare,
  data = titanic_imputed, family = "binomial")

explain_titanic_glm <- explain(model_titanic_glm,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8])

cp_rf <- ceteris_paribus(explain_titanic_glm, selected_passangers)
clust_rf <- cluster_profiles(cp_rf, k = 3, variables = "age")
plot(clust_rf)

library("ranger")

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)

explain_titanic_rf <- explain(model_titanic_rf,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  label = "ranger forest",
  verbose = FALSE)

cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
cp_rf

pdp_rf <- aggregate_profiles(cp_rf, variables = "age")
head(pdp_rf)

clust_rf <- cluster_profiles(cp_rf, k = 3, variables = "age")
head(clust_rf)

plot(clust_rf, color = "_label_") +
  show_aggregated_profiles(pdp_rf, color = "black", size = 3)

plot(cp_rf, color = "grey", variables = "age") +
  show_aggregated_profiles(clust_rf, color = "_label_", size = 2)

clust_rf <- cluster_profiles(cp_rf, k = 3, center = TRUE, variables = "age")
head(clust_rf)
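Aggregating ceteris paribus profiles by hierarchical clustering can be sketched with hclust and cutree from base R. The mtcars data, the lm model, the Euclidean distance and the default linkage are assumptions for this illustration and may differ from what the package uses internally.

```r
# Illustrative model with an interaction, so that profiles differ in shape.
model <- lm(mpg ~ wt * hp, data = mtcars)
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 21)

# Ceteris paribus profiles: rows are observations, columns are grid points.
cp <- sapply(grid, function(z) {
  altered <- mtcars
  altered$wt <- z
  predict(model, newdata = altered)
})

# Hierarchical clustering of whole profiles, cut into k = 3 clusters.
tree <- hclust(dist(cp))
cluster <- cutree(tree, k = 3)

# One aggregated profile per cluster, here with mean as the aggregate function.
cluster_means <- apply(cp, 2, function(column) tapply(column, cluster, mean))
```

Each row of cluster_means can be drawn with matplot(grid, t(cluster_means), type = "l"). Subtracting rowMeans(cp) from cp before clustering would group profiles by shape rather than level, which is the kind of effect the center argument is described as controlling.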
Conditional Dependence Profiles (aka Local Profiles) locally average Ceteris Paribus Profiles. Function 'conditional_dependence' calls 'ceteris_paribus' and then 'aggregate_profiles'.
conditional_dependence(x, ...)

## S3 method for class 'explainer'
conditional_dependence(
  x,
  variables = NULL,
  N = 500,
  variable_splits = NULL,
  grid_points = 101,
  ...,
  variable_type = "numerical"
)

## Default S3 method:
conditional_dependence(
  x,
  data,
  predict_function = predict,
  label = class(x)[1],
  variables = NULL,
  N = 500,
  variable_splits = NULL,
  grid_points = 101,
  ...,
  variable_type = "numerical"
)

## S3 method for class 'ceteris_paribus_explainer'
conditional_dependence(x, ..., variables = NULL)

local_dependency(x, ...)

conditional_dependency(x, ...)
x | an explainer created with function |
... | other parameters |
variables | names of variables for which profiles shall be calculated. Will be passed to |
N | number of observations used for calculation of partial dependence profiles. By default |
variable_splits | named list of splits for variables, in most cases created with |
grid_points | number of points for profile. Will be passed to |
variable_type | a character. If |
data | validation dataset, will be extracted from |
predict_function | predict function, will be extracted from |
label | name of the model. By default it's extracted from the |
Find more details in the Accumulated Local Dependence Chapter.
an object of the class aggregated_profiles_explainer
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")

model_titanic_glm <- glm(survived ~ gender + age + fare,
  data = titanic_imputed, family = "binomial")

explain_titanic_glm <- explain(model_titanic_glm,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  verbose = FALSE)

cdp_glm <- conditional_dependence(explain_titanic_glm,
  N = 150, variables = c("age", "fare"))
head(cdp_glm)
plot(cdp_glm)

library("ranger")

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)

explain_titanic_rf <- explain(model_titanic_rf,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  label = "ranger forest",
  verbose = FALSE)

cdp_rf <- conditional_dependence(explain_titanic_rf, N = 200, variable_type = "numerical")
plot(cdp_rf)

cdp_rf <- conditional_dependence(explain_titanic_rf, N = 200, variable_type = "categorical")
plotD3(cdp_rf, label_margin = 100, scale_plot = TRUE)
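Local averaging of ceteris paribus profiles can be illustrated with a short base R sketch. It assumes Gaussian kernel weights centred at each observation's own value, with a standard deviation equal to span times the range of the variable; this weighting scheme, the mtcars data and the lm model are assumptions made for the example.

```r
# Illustrative model on a built-in dataset (not taken from the manual).
model <- lm(mpg ~ wt + hp, data = mtcars)
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 21)

# Ceteris paribus profiles: rows are observations, columns are grid points.
cp <- sapply(grid, function(z) {
  altered <- mtcars
  altered$wt <- z
  predict(model, newdata = altered)
})

# Kernel weights: a profile counts mostly at grid points close to the value
# that its observation actually has.
span <- 0.25
bandwidth <- span * diff(range(mtcars$wt))
weights <- outer(mtcars$wt, grid, function(x, z) dnorm(z - x, sd = bandwidth))

# Conditional dependence: weighted average of profiles at each grid point.
conditional <- colSums(cp * weights) / colSums(weights)

# For comparison, the partial dependence profile uses equal weights.
partial <- colMeans(cp)
```

When wt and hp are correlated, as they are in mtcars, conditional and partial differ, because the local average only trusts each profile near the region where its observation was seen.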
Generic function describe generates a natural language description of ceteris_paribus(), aggregate_profiles() and feature_importance() explanations, which enhances their interpretability.
## S3 method for class 'partial_dependence_explainer'
describe(
  x,
  nonsignificance_treshold = 0.15,
  ...,
  display_values = FALSE,
  display_numbers = FALSE,
  variables = NULL,
  label = "prediction"
)

describe(x, ...)

## S3 method for class 'ceteris_paribus_explainer'
describe(
  x,
  nonsignificance_treshold = 0.15,
  ...,
  display_values = FALSE,
  display_numbers = FALSE,
  variables = NULL,
  label = "prediction"
)

## S3 method for class 'feature_importance_explainer'
describe(x, nonsignificance_treshold = 0.15, ...)
x | a ceteris paribus explanation produced with function |
nonsignificance_treshold | a parameter specifying a threshold for variable importance |
... | other arguments |
display_values | allows for displaying variable values |
display_numbers | allows for displaying numerical values |
variables | a character of a single variable name to be described |
label | label for model's prediction |
Function describe.ceteris_paribus() generates a natural language description of a ceteris paribus profile. The description summarizes the variable values that would change the model's prediction the most. If a ceteris paribus profile for multiple variables is passed, variables must specify a single variable to be described. Works only for a ceteris paribus profile for one observation. In the current version only categorical values are described. For display_numbers = TRUE the three most important variable values are displayed, while display_numbers = FALSE displays all the important variables, but without further details.
Function describe.ceteris_paribus() generates a natural language description of a ceteris paribus profile. The description summarizes the variable values that would change the model's prediction the most. If a ceteris paribus profile for multiple variables is passed, variables must specify a single variable to be described. Works only for a ceteris paribus profile for one observation. For display_numbers = TRUE the three most important variable values are displayed, while display_numbers = FALSE displays all the important variables, but without further details.
Function describe.feature_importance_explainer() generates a natural language description of a feature importance explanation. It reports the number of important variables, that is, those whose dropout loss differs significantly from that of the full model, as determined by nonsignificance_treshold. The description lists the three most important variables for the model's prediction. The current design of the DALEX explainer does not allow for displaying variable values.
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")
library("ranger")

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  label = "ranger forest",
  verbose = FALSE)

selected_passangers <- select_sample(titanic_imputed, n = 10)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
pdp <- aggregate_profiles(cp_rf, type = "partial", variable_type = "categorical")
describe(pdp, variables = "gender")

library("DALEX")
library("ingredients")
library("ranger")

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  label = "ranger forest",
  verbose = FALSE)

selected_passanger <- select_sample(titanic_imputed, n = 1, seed = 123)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passanger)
plot(cp_rf, variable_type = "categorical")
describe(cp_rf, variables = "class", label = "the predicted probability")

library("DALEX")
library("ingredients")

lm_model <- lm(m2.price~., data = apartments)
explainer_lm <- explain(lm_model, data = apartments[,-1], y = apartments[,1])

fi_lm <- feature_importance(explainer_lm, loss_function = DALEX::loss_root_mean_square)
plot(fi_lm)
describe(fi_lm)
This function calculates permutation-based feature importance: each variable is permuted in turn and the resulting change in the loss is recorded. For this reason the result is also called the Variable Dropout Plot.
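As a rough illustration of the idea (not the package's implementation), permutation importance can be sketched in base R: compute the loss on the intact data, then shuffle one column at a time and record how much the loss grows. The helpers rmse and permutation_importance and the simulated data are assumptions made for this sketch only.

```r
# Minimal base-R sketch of permutation-based importance.
# All names are illustrative; this is not the ingredients implementation.
set.seed(1)
n <- 500
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
df$y <- 3 * df$x1 + 0.5 * df$x2 + rnorm(n, sd = 0.1)

model <- lm(y ~ x1 + x2 + noise, data = df)

rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

permutation_importance <- function(model, data, y, variables, B = 10) {
  # loss of the model on the intact data
  loss_full <- rmse(y, predict(model, data))
  loss_permuted <- sapply(variables, function(v) {
    # average the loss over B independent shuffles of column v
    mean(replicate(B, {
      shuffled <- data
      shuffled[[v]] <- sample(shuffled[[v]])
      rmse(y, predict(model, shuffled))
    }))
  })
  c("_full_model_" = loss_full, sort(loss_permuted, decreasing = TRUE))
}

fi <- permutation_importance(model, df[, c("x1", "x2", "noise")], df$y,
                             variables = c("x1", "x2", "noise"))
print(fi)
```

A variable whose shuffling barely changes the loss (here noise) contributes little to the predictions, while a large increase (here x1) marks an important variable.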
feature_importance(x, ...)

## S3 method for class 'explainer'
feature_importance(
  x,
  loss_function = DALEX::loss_root_mean_square,
  ...,
  type = c("raw", "ratio", "difference"),
  n_sample = NULL,
  B = 10,
  variables = NULL,
  variable_groups = NULL,
  N = n_sample,
  label = NULL
)

## Default S3 method:
feature_importance(
  x,
  data,
  y,
  predict_function = predict,
  loss_function = DALEX::loss_root_mean_square,
  ...,
  label = class(x)[1],
  type = c("raw", "ratio", "difference"),
  n_sample = NULL,
  B = 10,
  variables = NULL,
  N = n_sample,
  variable_groups = NULL
)
x | an explainer created with function |
... | other parameters passed to |
loss_function | a function that will be used to assess variable importance |
type | character, type of transformation that should be applied for dropout loss. "raw" returns raw drop losses, "ratio" returns |
n_sample | alias for |
B | integer, number of permutation rounds to perform on each variable. By default it's |
variables | vector of variables. If |
variable_groups | list of vectors of variable names. This is for testing joint variable importance. If |
N | number of observations that should be sampled for calculation of variable importance. If |
label | name of the model. By default it's extracted from the |
data | validation dataset, will be extracted from |
y | true labels for |
predict_function | predict function, will be extracted from |
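The three type values can be read as simple transformations of the permuted losses relative to the full-model loss. The sketch below assumes the usual definitions (ratio = permuted loss divided by full-model loss, difference = permuted loss minus full-model loss); the helper transform_loss and all numbers are invented for illustration.

```r
# Hypothetical dropout losses; values are invented for illustration only.
loss_full <- 0.40
loss_permuted <- c(gender = 0.55, age = 0.46, fare = 0.42)

# Assumed reading of the `type` argument, not code taken from the package.
transform_loss <- function(loss_permuted, loss_full,
                           type = c("raw", "ratio", "difference")) {
  type <- match.arg(type)
  switch(type,
         raw        = loss_permuted,
         ratio      = loss_permuted / loss_full,
         difference = loss_permuted - loss_full)
}

transform_loss(loss_permuted, loss_full, "raw")
transform_loss(loss_permuted, loss_full, "ratio")
transform_loss(loss_permuted, loss_full, "difference")
```

Under these assumptions a ratio close to 1, or a difference close to 0, indicates a variable whose permutation leaves the loss essentially unchanged.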
Find more details in the Feature Importance Chapter.
an object of the class feature_importance
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")

model_titanic_glm <- glm(survived ~ gender + age + fare,
  data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8])

fi_glm <- feature_importance(explain_titanic_glm, B = 1)
plot(fi_glm)

fi_glm_joint1 <- feature_importance(explain_titanic_glm,
  variable_groups = list("demographics" = c("gender", "age"),
                         "ticket_type" = c("fare")),
  label = "lm 2 groups")
plot(fi_glm_joint1)

fi_glm_joint2 <- feature_importance(explain_titanic_glm,
  variable_groups = list("demographics" = c("gender", "age"),
                         "wealth" = c("fare", "class"),
                         "family" = c("sibsp", "parch"),
                         "embarked" = "embarked"),
  label = "lm 5 groups")
plot(fi_glm_joint2, fi_glm_joint1)

library("ranger")

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  label = "ranger forest",
  verbose = FALSE)

fi_rf <- feature_importance(explain_titanic_rf)
plot(fi_rf)

fi_rf <- feature_importance(explain_titanic_rf, B = 6) # 6 replications
plot(fi_rf)

fi_rf_group <- feature_importance(explain_titanic_rf,
  variable_groups = list("demographics" = c("gender", "age"),
                         "wealth" = c("fare", "class"),
                         "family" = c("sibsp", "parch"),
                         "embarked" = "embarked"),
  label = "rf 4 groups")
plot(fi_rf_group, fi_rf)

HR_rf_model <- ranger(status ~., data = HR, probability = TRUE)
explainer_rf <- explain(HR_rf_model, data = HR, y = HR$status,
  model_info = list(type = 'multiclass'))
fi_rf <- feature_importance(explainer_rf, type = "raw",
  loss_function = DALEX::loss_cross_entropy)
head(fi_rf)
plot(fi_rf)

HR_glm_model <- glm(status == "fired"~., data = HR, family = "binomial")
explainer_glm <- explain(HR_glm_model, data = HR, y = as.numeric(HR$status == "fired"))
fi_glm <- feature_importance(explainer_glm, type = "raw",
  loss_function = DALEX::loss_root_mean_square)
head(fi_glm)
plot(fi_glm)
Partial Dependence Profiles are averages from Ceteris Paribus Profiles. Function partial_dependence calls ceteris_paribus and then aggregate_profiles.
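The averaging step can be sketched in base R without the package: build one ceteris paribus profile per sampled observation on a common grid, then take the pointwise mean. The helper cp_profile and the simulated data are assumptions of this sketch, not the package's internals.

```r
# Base-R sketch: a partial dependence profile as the pointwise mean of
# ceteris paribus profiles. Names are illustrative, not the package API.
set.seed(2)
n <- 200
df <- data.frame(age = runif(n, 1, 80), fare = runif(n, 0, 100))
df$y <- 0.02 * df$age + 0.01 * df$fare + rnorm(n, sd = 0.05)
model <- lm(y ~ age + fare, data = df)

# Ceteris paribus profile: vary one variable on a grid, keep the rest fixed.
cp_profile <- function(model, observation, variable, grid) {
  newdata <- observation[rep(1, length(grid)), , drop = FALSE]
  newdata[[variable]] <- grid
  as.numeric(predict(model, newdata))
}

grid <- seq(min(df$age), max(df$age), length.out = 11)
sampled <- df[sample(n, 50), ]

# One row per sampled observation, one column per grid point.
cp_matrix <- t(sapply(seq_len(nrow(sampled)), function(i) {
  cp_profile(model, sampled[i, ], "age", grid)
}))

# Partial dependence: average the profiles at each grid point.
pd_profile <- colMeans(cp_matrix)
data.frame(age = grid, yhat = pd_profile)
```

For an additive linear model such as this one, every ceteris paribus profile is a parallel line, so the averaged profile has the same slope as the fitted coefficient of age.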
partial_dependence(x, ...)

## S3 method for class 'explainer'
partial_dependence(
  x,
  variables = NULL,
  N = 500,
  variable_splits = NULL,
  grid_points = 101,
  ...,
  variable_type = "numerical"
)

## Default S3 method:
partial_dependence(
  x,
  data,
  predict_function = predict,
  label = class(x)[1],
  variables = NULL,
  grid_points = 101,
  variable_splits = NULL,
  N = 500,
  ...,
  variable_type = "numerical"
)

## S3 method for class 'ceteris_paribus_explainer'
partial_dependence(x, ..., variables = NULL)

partial_dependency(x, ...)
x | an explainer created with function |
... | other parameters |
variables | names of variables for which profiles shall be calculated. Will be passed to |
N | number of observations used for calculation of partial dependence profiles. By default |
variable_splits | named list of splits for variables, in most cases created with |
grid_points | number of points for profile. Will be passed to |
variable_type | a character. If |
data | validation dataset, will be extracted from |
predict_function | predict function, will be extracted from |
label | name of the model. By default it's extracted from the |
Find more details in the Partial Dependence Profiles Chapter.
an object of the class aggregated_profiles_explainer
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")

model_titanic_glm <- glm(survived ~ gender + age + fare,
  data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  verbose = FALSE)

pdp_glm <- partial_dependence(explain_titanic_glm,
  N = 25, variables = c("age", "fare"))
head(pdp_glm)
plot(pdp_glm)

library("ranger")

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  label = "ranger forest",
  verbose = FALSE)

pdp_rf <- partial_dependence(explain_titanic_rf, variable_type = "numerical")
plot(pdp_rf)

pdp_rf <- partial_dependence(explain_titanic_rf, variable_type = "categorical")
plotD3(pdp_rf, label_margin = 80, scale_plot = TRUE)
Function plot.aggregated_profiles_explainer plots a partial dependence plot or an accumulated effect plot. It works in a similar way to plot.ceteris_paribus, but instead of individual profiles it shows average profiles for each variable listed in the variables vector.
## S3 method for class 'aggregated_profiles_explainer'
plot(
  x,
  ...,
  size = 1,
  alpha = 1,
  color = "_label_",
  facet_ncol = NULL,
  facet_scales = "free_x",
  variables = NULL,
  title = NULL,
  subtitle = NULL
)
x | a ceteris paribus explainer produced with function |
... | other explainers that shall be plotted together |
size | a numeric. Size of lines to be plotted |
alpha | a numeric between |
color | a character. Either name of a color, or hex code for a color, or |
facet_ncol | number of columns for the |
facet_scales | a character value for the |
variables | if not |
title | a character. Partial and accumulated dependence explainers have a default value. |
subtitle | a character. If |
a ggplot2 object
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")

model_titanic_glm <- glm(survived ~ gender + age + fare,
  data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  verbose = FALSE)

pdp_rf_p <- partial_dependence(explain_titanic_glm, N = 50)
pdp_rf_p$`_label_` <- "RF_partial"
pdp_rf_l <- conditional_dependence(explain_titanic_glm, N = 50)
pdp_rf_l$`_label_` <- "RF_local"
pdp_rf_a <- accumulated_dependence(explain_titanic_glm, N = 50)
pdp_rf_a$`_label_` <- "RF_accumulated"
head(pdp_rf_p)
plot(pdp_rf_p, pdp_rf_l, pdp_rf_a, color = "_label_")

library("ranger")

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  label = "ranger forest",
  verbose = FALSE)

selected_passangers <- select_sample(titanic_imputed, n = 100)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
cp_rf

pdp_rf_p <- aggregate_profiles(cp_rf, variables = "age", type = "partial")
pdp_rf_p$`_label_` <- "RF_partial"
pdp_rf_c <- aggregate_profiles(cp_rf, variables = "age", type = "conditional")
pdp_rf_c$`_label_` <- "RF_conditional"
pdp_rf_a <- aggregate_profiles(cp_rf, variables = "age", type = "accumulated")
pdp_rf_a$`_label_` <- "RF_accumulated"
head(pdp_rf_p)

plot(pdp_rf_p)
plot(pdp_rf_p, pdp_rf_c, pdp_rf_a)

plot(cp_rf, variables = "age") +
  show_observations(cp_rf, variables = "age") +
  show_rugs(cp_rf, variables = "age", color = "red") +
  show_aggregated_profiles(pdp_rf_p, size = 2)
This function plots What-If Plots for a single prediction / observation.
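To make the idea of a two-dimensional what-if plot concrete, the sketch below varies two variables of a single observation over a grid, keeps the remaining variable fixed, and draws the predictions as a raster with contours. It uses base R graphics and simulated data; the variable names and the model are assumptions of this sketch, and it does not reproduce ceteris_paribus_2d().

```r
# Base-R sketch of a two-dimensional what-if grid for one observation.
# Illustrative names only; not the ceteris_paribus_2d() implementation.
set.seed(3)
n <- 300
df <- data.frame(surface = runif(n, 20, 150),
                 floor   = sample(1:10, n, replace = TRUE),
                 rooms   = sample(1:6, n, replace = TRUE))
df$price <- 5000 - 10 * df$surface - 50 * df$floor + 100 * df$rooms +
  rnorm(n, sd = 20)
model <- lm(price ~ surface * floor + rooms, data = df)

observation <- df[1, c("surface", "floor", "rooms")]

# Every combination of the two varied variables.
xs <- seq(20, 150, length.out = 25)
ys <- seq(1, 10, length.out = 25)
grid <- expand.grid(surface = xs, floor = ys)

# Copy the observation once per grid cell; only surface and floor change.
newdata <- observation[rep(1, nrow(grid)), , drop = FALSE]
newdata$surface <- grid$surface
newdata$floor <- grid$floor
grid$yhat <- as.numeric(predict(model, newdata))

# expand.grid varies surface fastest, so rows of z index surface values.
z <- matrix(grid$yhat, nrow = length(xs))
image(xs, ys, z, xlab = "surface", ylab = "floor")
contour(xs, ys, z, nlevels = 3, add = TRUE)
points(observation$surface, observation$floor, pch = "+", cex = 2)
```

The raster, the contour lines and the "+" marker correspond loosely to the add_raster, add_contour and add_observation options of the plot method.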
## S3 method for class 'ceteris_paribus_2d_explainer'
plot(
  x,
  ...,
  facet_ncol = NULL,
  add_raster = TRUE,
  add_contour = TRUE,
  bins = 3,
  add_observation = TRUE,
  pch = "+",
  size = 6
)
x | a ceteris paribus explainer produced with the |
... | currently will be ignored |
facet_ncol | number of columns for the |
add_raster | if |
add_contour | if |
bins | number of contours to be added |
add_observation | if |
pch | character, symbol used to plot observations |
size | numeric, size of individual datapoints |
a ggplot2 object
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")
library("ranger")

apartments_rf_model <- ranger(m2.price ~., data = apartments)
explainer_rf <- explain(apartments_rf_model,
  data = apartments_test[,-1],
  y = apartments_test[,1],
  verbose = FALSE)

new_apartment <- apartments_test[1,]
new_apartment

wi_rf_2d <- ceteris_paribus_2d(explainer_rf, observation = new_apartment)
head(wi_rf_2d)

plot(wi_rf_2d)
plot(wi_rf_2d, add_contour = FALSE)
plot(wi_rf_2d, add_observation = FALSE)
plot(wi_rf_2d, add_raster = FALSE)

# HR data
model <- ranger(status ~ gender + age + hours + evaluation + salary,
  data = HR, probability = TRUE)

pred1 <- function(m, x) predict(m, x)$predictions[,1]

explainer_rf_fired <- explain(model,
  data = HR[,1:5],
  y = as.numeric(HR$status == "fired"),
  predict_function = pred1,
  label = "fired")

new_emp <- HR[1,]
new_emp

wi_rf_2d <- ceteris_paribus_2d(explainer_rf_fired, observation = new_emp)
head(wi_rf_2d)
plot(wi_rf_2d)
Function plot.ceteris_paribus_explainer plots Individual Variable Profiles for selected observations. Various parameters help to decide what should be plotted: profiles, aggregated profiles, points or rugs.
Find more details in the Ceteris Paribus Chapter.
## S3 method for class 'ceteris_paribus_explainer'
plot(
  x,
  ...,
  size = 1,
  alpha = 1,
  color = "#46bac2",
  variable_type = "numerical",
  facet_ncol = NULL,
  facet_scales = NULL,
  variables = NULL,
  title = "Ceteris Paribus profile",
  subtitle = NULL,
  categorical_type = "profiles"
)
x | a ceteris paribus explainer produced with function |
... | other explainers that shall be plotted together |
size | a numeric. Size of lines to be plotted |
alpha | a numeric between |
color | a character. Either name of a color or name of a variable that should be used for coloring |
variable_type | a character. If |
facet_ncol | number of columns for the |
facet_scales | a character value for the |
variables | if not |
title | a character. Plot title. By default "Ceteris Paribus profile". |
subtitle | a character. Plot subtitle. By default |
categorical_type | a character. How categorical variables shall be plotted? Either |
a ggplot2 object
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")

model_titanic_glm <- glm(survived ~ gender + age + fare,
  data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  verbose = FALSE)

cp_glm <- ceteris_paribus(explain_titanic_glm, titanic_imputed[1,])
cp_glm
plot(cp_glm, variables = "age")

library("ranger")

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  label = "ranger forest",
  verbose = FALSE)

selected_passangers <- select_sample(titanic_imputed, n = 100)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
cp_rf

plot(cp_rf, variables = "age") +
  show_observations(cp_rf, variables = "age") +
  show_rugs(cp_rf, variables = "age", color = "red")

selected_passangers <- select_sample(titanic_imputed, n = 1)
selected_passangers
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)

plot(cp_rf) +
  show_observations(cp_rf)

plot(cp_rf, variables = "age") +
  show_observations(cp_rf, variables = "age")

plot(cp_rf, variables = "class")
plot(cp_rf, variables = c("class", "embarked"), facet_ncol = 1)
plot(cp_rf, variables = c("class", "embarked"), facet_ncol = 1,
  categorical_type = "bars")

plotD3(cp_rf, variables = c("class", "embarked", "gender"),
  variable_type = "categorical", scale_plot = TRUE, label_margin = 70)
This function plots local variable importance plots calculated as oscillations in the Ceteris Paribus Profiles.
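One way to read "oscillations" is as the average distance between a ceteris paribus profile and the observation's own prediction: a flat profile gives a value near zero, a strongly varying one gives a large value. The base-R sketch below assumes that reading (mean absolute difference); the helper oscillation and the simulated data are hypothetical and may differ from what calculate_oscillations() actually computes.

```r
# Base-R sketch of oscillation-style local importance for one observation.
# Assumes oscillation = mean |profile - prediction for the observation|.
set.seed(4)
n <- 300
df <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
df$y <- 4 * df$x1 + 1 * df$x2 + rnorm(n, sd = 0.05)
model <- lm(y ~ x1 + x2 + x3, data = df)

observation <- df[1, c("x1", "x2", "x3")]
baseline <- as.numeric(predict(model, observation))

oscillation <- function(variable, grid_points = 101) {
  # ceteris paribus profile of `variable` for the chosen observation
  grid <- seq(min(df[[variable]]), max(df[[variable]]),
              length.out = grid_points)
  newdata <- observation[rep(1, grid_points), , drop = FALSE]
  newdata[[variable]] <- grid
  profile <- as.numeric(predict(model, newdata))
  mean(abs(profile - baseline))
}

vips <- sort(sapply(c("x1", "x2", "x3"), oscillation), decreasing = TRUE)
vips

# Horizontal bars, largest oscillation on top.
barplot(rev(vips), horiz = TRUE, las = 1, xlab = "oscillation")
```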
## S3 method for class 'ceteris_paribus_oscillations'
plot(x, ..., bar_width = 10)
x | a ceteris paribus oscillation explainer produced with function |
... | other explainers that shall be plotted together |
bar_width | width of bars. By default |
a ggplot2 object
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")
library("ranger")

apartments_rf_model <- ranger(m2.price ~., data = apartments)
explainer_rf <- explain(apartments_rf_model,
  data = apartments_test[,-1],
  y = apartments_test[,1],
  label = "ranger forest",
  verbose = FALSE)

apartment <- apartments_test[1:2,]
cp_rf <- ceteris_paribus(explainer_rf, apartment)
plot(cp_rf, color = "_ids_")

vips <- calculate_oscillations(cp_rf)
vips
plot(vips)
This function plots variable importance calculated as changes in the loss function after variable drops. It uses output from the feature_importance function, which corresponds to a permutation-based measure of variable importance. Variables are sorted in the same order in all panels. The order depends on the average dropout loss. In individual panels the variable contributions may therefore not look sorted if variable importance differs between models.
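The effect of a shared ordering can be seen in a small base-R sketch. The dropout losses below are invented, and the column names are assumptions for illustration; the point is only that an order derived from the average loss across models need not match the order within any single model.

```r
# Sketch: one shared variable order for several models, based on the
# average dropout loss. All numbers are invented for illustration.
fi <- data.frame(
  label        = rep(c("glm", "ranger forest"), each = 3),
  variable     = rep(c("gender", "age", "fare"), times = 2),
  dropout_loss = c(0.50, 0.35, 0.33,   # glm
                   0.42, 0.36, 0.40)   # ranger forest
)

# Average loss per variable across models defines the common order.
avg_loss <- tapply(fi$dropout_loss, fi$variable, mean)
shared_order <- names(sort(avg_loss, decreasing = TRUE))
shared_order

# In the glm panel age (0.35) outranks fare (0.33), yet the shared order
# puts fare before age, so that panel can look unsorted.
fi$variable <- factor(fi$variable, levels = shared_order)
fi[order(fi$label, fi$variable), ]
```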
## S3 method for class 'feature_importance_explainer'
plot(
  x,
  ...,
  max_vars = NULL,
  show_boxplots = TRUE,
  bar_width = 10,
  desc_sorting = TRUE,
  title = "Feature Importance",
  subtitle = NULL
)
x | a feature importance explainer produced with the |
... | other explainers that shall be plotted together |
max_vars | maximum number of variables that shall be presented for each model. By default |
show_boxplots | logical if |
bar_width | width of bars. By default |
desc_sorting | logical. Should the bars be sorted descending? By default TRUE |
title | the plot's title, by default |
subtitle | the plot's subtitle. By default - |
Find more details in the Feature Importance Chapter.
a ggplot2 object
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")

model_titanic_glm <- glm(survived ~ gender + age + fare,
  data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8])

fi_rf <- feature_importance(explain_titanic_glm, B = 1)
plot(fi_rf)

library("ranger")

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
  data = titanic_imputed[,-8],
  y = titanic_imputed[,8],
  label = "ranger forest",
  verbose = FALSE)

fi_rf <- feature_importance(explain_titanic_rf)
plot(fi_rf)

HR_rf_model <- ranger(status~., data = HR, probability = TRUE)
explainer_rf <- explain(HR_rf_model, data = HR, y = HR$status,
  verbose = FALSE, precalculate = FALSE)
fi_rf <- feature_importance(explainer_rf, type = "raw", max_vars = 3,
  loss_function = DALEX::loss_cross_entropy)
head(fi_rf)
plot(fi_rf)

HR_glm_model <- glm(status == "fired"~., data = HR, family = "binomial")
explainer_glm <- explain(HR_glm_model, data = HR, y = as.numeric(HR$status == "fired"))
fi_glm <- feature_importance(explainer_glm, type = "raw",
  loss_function = DALEX::loss_root_mean_square)
head(fi_glm)
plot(fi_glm)
Function plotD3.ceteris_paribus_explainer plots Individual Variable Profiles for selected observations. It uses output from the ceteris_paribus function. Various parameters help to decide what should be plotted: profiles, aggregated profiles, points or rugs.
Find more details in the Ceteris Paribus Chapter.
plotD3(x, ...)

## S3 method for class 'ceteris_paribus_explainer'
plotD3(
  x,
  ...,
  size = 2,
  alpha = 1,
  color = "#46bac2",
  variable_type = "numerical",
  facet_ncol = 2,
  scale_plot = FALSE,
  variables = NULL,
  chart_title = "Ceteris Paribus Profiles",
  label_margin = 60,
  show_observations = TRUE,
  show_rugs = TRUE
)
x |
a ceteris paribus explainer produced with function |
... |
other explainers that shall be plotted together |
size |
a numeric. Set width of lines |
alpha |
a numeric between |
color |
a character. Set line color |
variable_type |
a character. If "numerical" then only numerical variables will be plotted. If "categorical" then only categorical variables will be plotted. |
facet_ncol |
number of columns for the |
scale_plot |
a logical. If |
variables |
if not |
chart_title |
a character. Set custom title |
label_margin |
a numeric. Set width of label margins in |
show_observations |
a logical. Adds observations layer to a plot. By default it's |
show_rugs |
a logical. Adds rugs layer to a plot. By default it's |
an r2d3 object.
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")
library("ranger")

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)

explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic_imputed[,-8],
                              y = titanic_imputed[,8],
                              label = "ranger forest",
                              verbose = FALSE)

selected_passangers <- select_sample(titanic_imputed, n = 10)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)

plotD3(cp_rf, variables = c("age","parch","fare","sibsp"),
       facet_ncol = 2, scale_plot = TRUE)

selected_passanger <- select_sample(titanic_imputed, n = 1)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passanger)

plotD3(cp_rf, variables = c("class", "embarked", "gender", "sibsp"),
       facet_ncol = 2, variable_type = "categorical", label_margin = 100, scale_plot = TRUE)
Function plotD3.aggregated_profiles_explainer plots an aggregate of ceteris paribus profiles.
It works in a similar way to plotD3.ceteris_paribus_explainer but, instead of individual profiles,
shows average profiles for each variable listed in the variables vector.
Find more details in the Ceteris Paribus Chapter.
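To make "average profiles" concrete, the following base R sketch computes a partial-dependence style profile as the mean of per-observation ceteris paribus predictions. The model, the grid and all object names are assumptions chosen for illustration; the package computes its aggregates with aggregate_profiles().

```r
model <- lm(mpg ~ wt + hp, data = mtcars)
grid  <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 11)

# For every grid value, set wt to that value in all rows and predict.
# The result is a matrix with one row per observation and one column
# per grid value, i.e. one ceteris paribus profile per row.
cp_matrix <- sapply(grid, function(value) {
  modified <- mtcars
  modified$wt <- value
  as.numeric(predict(model, newdata = modified))
})

# Averaging over observations gives the aggregated (partial) profile.
pdp <- data.frame(wt = grid, average_yhat = colMeans(cp_matrix))
pdp
```

Plotting average_yhat against wt gives a single curve per variable, which is what the aggregated chart shows in place of the individual lines.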
## S3 method for class 'aggregated_profiles_explainer'
plotD3(
  x,
  ...,
  size = 2,
  alpha = 1,
  color = "#46bac2",
  facet_ncol = 2,
  scale_plot = FALSE,
  variables = NULL,
  chart_title = "Aggregated Profiles",
  label_margin = 60
)
x |
an aggregated profiles explainer produced with function |
... |
other explainers that shall be plotted together |
size |
a numeric. Set width of lines |
alpha |
a numeric between |
color |
a character. Set line/bar color |
facet_ncol |
number of columns for the |
scale_plot |
a logical. If |
variables |
if not |
chart_title |
a character. Set custom title |
label_margin |
a numeric. Set width of label margins in |
an r2d3 object.
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")
library("ranger")

# smaller data, quicker example
titanic_small <- select_sample(titanic_imputed, n = 500, seed = 1313)

# build a model
model_titanic_rf <- ranger(survived ~., data = titanic_small, probability = TRUE)

explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic_small[,-8],
                              y = titanic_small[,8],
                              label = "ranger forest",
                              verbose = FALSE)

selected_passangers <- select_sample(titanic_small, n = 100)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)

pdp_rf_p <- aggregate_profiles(cp_rf, type = "partial", variable_type = "numerical")
pdp_rf_p$`_label_` <- "RF_partial"
pdp_rf_c <- aggregate_profiles(cp_rf, type = "conditional", variable_type = "numerical")
pdp_rf_c$`_label_` <- "RF_conditional"
pdp_rf_a <- aggregate_profiles(cp_rf, type = "accumulated", variable_type = "numerical")
pdp_rf_a$`_label_` <- "RF_accumulated"

plotD3(pdp_rf_p, pdp_rf_c, pdp_rf_a, scale_plot = TRUE)

pdp <- aggregate_profiles(cp_rf, type = "partial", variable_type = "categorical")
pdp$`_label_` <- "RF_partial"

plotD3(pdp, variables = c("gender","class"), label_margin = 70)
Function plotD3.feature_importance_explainer plots dropout losses for variables used in the model.
It uses the output of the feature_importance function, which corresponds to a permutation-based measure of feature importance.
Variables are sorted in the same order in all panels; the order depends on the average dropout loss.
As a result, variables may not appear sorted within a given panel if their importance differs between models.
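The idea behind a permutation-based dropout loss can be sketched in base R as follows: permute one variable at a time, predict again, and record the resulting loss. This is a simplified illustration under assumed choices (an lm model on mtcars, RMSE as the loss, 10 permutations); it is not the code used by feature_importance().

```r
set.seed(1313)

model <- lm(mpg ~ wt + hp + qsec, data = mtcars)
rmse  <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

# Loss of the model on unmodified data, used as the reference level
loss_full <- rmse(mtcars$mpg, predict(model, newdata = mtcars))

# Loss after permuting each variable, averaged over 10 permutations
dropout_loss <- sapply(c("wt", "hp", "qsec"), function(variable) {
  mean(replicate(10, {
    permuted <- mtcars
    permuted[[variable]] <- sample(permuted[[variable]])
    rmse(mtcars$mpg, predict(model, newdata = permuted))
  }))
})

# Variables ordered from the largest to the smallest dropout loss
sort(dropout_loss, decreasing = TRUE)
```

The further a variable's dropout loss lies above loss_full, the more the model relies on that variable; this gap corresponds to the length of the bar drawn for it.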
## S3 method for class 'feature_importance_explainer'
plotD3(
  x,
  ...,
  max_vars = NULL,
  show_boxplots = TRUE,
  bar_width = 12,
  split = "model",
  scale_height = FALSE,
  margin = 0.15,
  chart_title = "Feature importance"
)
x |
a feature importance explainer produced with the |
... |
other explainers that shall be plotted together |
max_vars |
maximum number of variables that shall be presented for each model.
By default |
show_boxplots |
logical if |
bar_width |
width of bars in px. By default |
split |
either "model" or "feature" determines the plot layout |
scale_height |
a logical. If |
margin |
extend x axis domain range to adjust the plot.
Usually value between |
chart_title |
a character. Set custom title |
an r2d3 object.
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")

lm_model <- lm(m2.price ~., data = apartments)
explainer_lm <- explain(lm_model,
                        data = apartments[,-1],
                        y = apartments[,1],
                        verbose = FALSE)

fi_lm <- feature_importance(explainer_lm,
                            loss_function = DALEX::loss_root_mean_square, B = 1)
head(fi_lm)
plotD3(fi_lm)

library("ranger")

rf_model <- ranger(m2.price~., data = apartments)
explainer_rf <- explain(rf_model,
                        data = apartments[,-1],
                        y = apartments[,1],
                        label = "ranger forest",
                        verbose = FALSE)

fi_rf <- feature_importance(explainer_rf, loss_function = DALEX::loss_root_mean_square)
head(fi_rf)

plotD3(fi_lm, fi_rf)
plotD3(fi_lm, fi_rf, split = "feature")
plotD3(fi_lm, fi_rf, max_vars = 3, bar_width = 16, scale_height = TRUE)
plotD3(fi_lm, fi_rf, max_vars = 3, bar_width = 16, split = "feature", scale_height = TRUE)
plotD3(fi_lm, margin = 0.2)
Prints Aggregated Profiles
## S3 method for class 'aggregated_profiles_explainer'
print(x, ...)
x |
an individual variable profile explainer produced with the |
... |
other arguments that will be passed to |
library("DALEX")
library("ingredients")

model_titanic_glm <- glm(survived ~ gender + age + fare,
                         data = titanic_imputed, family = "binomial")

explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_imputed[,-8],
                               y = titanic_imputed[,8])

selected_passangers <- select_sample(titanic_imputed, n = 100)
cp_rf <- ceteris_paribus(explain_titanic_glm, selected_passangers)
head(cp_rf)

pdp_rf <- aggregate_profiles(cp_rf, variables = "age")
head(pdp_rf)

library("ranger")

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)

explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic_imputed[,-8],
                              y = titanic_imputed[,8],
                              label = "ranger forest",
                              verbose = FALSE)

cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
cp_rf

pdp_rf <- aggregate_profiles(cp_rf, variables = "age")
head(pdp_rf)
Prints Individual Variable Explainer Summary
## S3 method for class 'ceteris_paribus_explainer'
print(x, ...)
x |
an individual variable profile explainer produced with the |
... |
other arguments that will be passed to |
library("DALEX")
library("ingredients")

titanic_small <- select_sample(titanic_imputed, n = 500, seed = 1313)

# build a model
model_titanic_glm <- glm(survived ~ gender + age + fare,
                         data = titanic_small, family = "binomial")

explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_small[,-8],
                               y = titanic_small[,8])

cp_glm <- ceteris_paribus(explain_titanic_glm, titanic_small[1,])
cp_glm

library("ranger")

apartments_rf_model <- ranger(m2.price ~., data = apartments)

explainer_rf <- explain(apartments_rf_model,
                        data = apartments_test[,-1],
                        y = apartments_test[,1],
                        label = "ranger forest",
                        verbose = FALSE)

apartments_small <- select_sample(apartments_test, 10)

cp_rf <- ceteris_paribus(explainer_rf, apartments_small)
cp_rf
Print Generic for Feature Importance Object
## S3 method for class 'feature_importance_explainer'
print(x, ...)
x |
an explanation created with |
... |
other parameters. |
a data frame.
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")

model_titanic_glm <- glm(survived ~ gender + age + fare,
                         data = titanic_imputed, family = "binomial")

explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_imputed[,-8],
                               y = titanic_imputed[,8],
                               verbose = FALSE)

fi_glm <- feature_importance(explain_titanic_glm)
fi_glm
Function select_neighbours selects a subset of rows from a data set.
This is useful if the data is large and we need just a sample to calculate profiles.
select_neighbours(
  observation,
  data,
  variables = NULL,
  distance = gower::gower_dist,
  n = 20,
  frac = NULL
)
observation |
single observation |
data |
set of observations |
variables |
names of variables that shall be used for calculation of distance.
By default these are all variables present in |
distance |
the distance function, by default the |
n |
number of neighbors to select |
frac |
if |
Note that the select_neighbours() function is S3 generic.
If you want to work on non-standard data sources (like H2O ddf, external databases),
you should overload it.
a data frame with selected rows
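The selection step can be pictured with a small base R sketch that returns the n rows closest to a given observation. Note the swapped-in technique: it uses a standardised Euclidean distance on numeric columns instead of the Gower distance that select_neighbours() uses by default, and the helper nearest_rows() is hypothetical.

```r
# Hypothetical helper (not part of 'ingredients'): pick the n rows of `data`
# closest to `observation`, measured on the chosen numeric `variables`.
nearest_rows <- function(observation, data, variables, n = 5) {
  columns <- data[, variables, drop = FALSE]
  centers <- colMeans(columns)
  scales  <- apply(columns, 2, sd)

  # Standardise the data and the observation with the same mean and sd
  scaled_data <- scale(columns, center = centers, scale = scales)
  scaled_obs  <- (unlist(observation[1, variables]) - centers) / scales

  # Euclidean distance from every row to the observation
  distances <- sqrt(rowSums(sweep(scaled_data, 2, scaled_obs)^2))
  data[order(distances)[seq_len(n)], , drop = FALSE]
}

new_car    <- mtcars[1, ]
neighbours <- nearest_rows(new_car, mtcars, variables = c("wt", "hp", "qsec"), n = 5)
neighbours
```

Because new_car is itself a row of mtcars, it comes back as the closest neighbour with distance zero.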
library("ingredients")

new_apartment <- DALEX::apartments[1,]
small_apartments <- select_neighbours(new_apartment, DALEX::apartments_test, n = 10)

new_apartment
small_apartments
Function select_sample selects a subset of rows from a data set.
This is useful if the data is large and we need just a sample to calculate profiles.
select_sample(data, n = 100, seed = 1313)
data |
set of observations. Profile will be calculated for every observation (every row) |
n |
number of observations to select. |
seed |
seed for random number generator. |
Note that the select_sample() function is S3 generic.
If you want to work on non-standard data sources (like H2O ddf, external databases),
you should overload it.
a data frame with selected rows
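The sketch below illustrates two points from the text in plain base R: a fixed seed makes the selected sample reproducible, and an S3 generic can be extended with a method for a non-standard data source. The generic pick_rows() and the class remote_table are hypothetical names introduced only for this example; they are not part of the package.

```r
# Hypothetical generic, defined here only to illustrate S3 dispatch
pick_rows <- function(data, n = 100, seed = 1313) UseMethod("pick_rows")

# Default method: reproducible random sample of rows from a data frame
pick_rows.default <- function(data, n = 100, seed = 1313) {
  set.seed(seed)
  ids <- sample.int(nrow(data), min(n, nrow(data)))
  data[ids, , drop = FALSE]
}

# Method for a hypothetical wrapper class around an external table;
# here the "remote" data is simply stored in a list element.
pick_rows.remote_table <- function(data, n = 100, seed = 1313) {
  pick_rows.default(data$local_copy, n = n, seed = seed)
}

first  <- pick_rows(mtcars, n = 5)
second <- pick_rows(mtcars, n = 5)
identical(first, second)   # same seed, same rows

wrapped <- structure(list(local_copy = mtcars), class = "remote_table")
third   <- pick_rows(wrapped, n = 5)
```

A method written this way lets the same call work on both ordinary data frames and the wrapped source, which is the kind of overloading the note above refers to.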
library("ingredients")

small_apartments <- select_sample(DALEX::apartments_test)
head(small_apartments)
Function show_aggregated_profiles adds a layer to a plot created with plot.ceteris_paribus_explainer.
show_aggregated_profiles(
  x,
  ...,
  size = 0.5,
  alpha = 1,
  color = "#371ea3",
  variables = NULL
)
x |
a ceteris paribus explainer produced with function |
... |
other explainers that shall be plotted together |
size |
a numeric. Size of lines to be plotted |
alpha |
a numeric between |
color |
a character. Either name of a color or name of a variable that should be used for coloring |
variables |
if not |
a ggplot2
layer
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")

selected_passangers <- select_sample(titanic_imputed, n = 100)

model_titanic_glm <- glm(survived ~ gender + age + fare,
                         data = titanic_imputed, family = "binomial")

explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_imputed[,-8],
                               y = titanic_imputed[,8])

cp_rf <- ceteris_paribus(explain_titanic_glm, selected_passangers)

pdp_rf <- aggregate_profiles(cp_rf, type = "partial", variables = "age")

plot(cp_rf, variables = "age") +
  show_observations(cp_rf, variables = "age") +
  show_aggregated_profiles(pdp_rf, size = 3)

library("ranger")

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)

explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic_imputed[,-8],
                              y = titanic_imputed[,8],
                              label = "ranger forest",
                              verbose = FALSE)

cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
cp_rf

pdp_rf <- aggregate_profiles(cp_rf, type = "partial", variables = "age")
head(pdp_rf)

plot(cp_rf, variables = "age") +
  show_observations(cp_rf, variables = "age") +
  show_rugs(cp_rf, variables = "age", color = "red") +
  show_aggregated_profiles(pdp_rf, size = 3)
Function show_observations adds a layer to a plot created with plot.ceteris_paribus_explainer for selected observations.
Various parameters control what is plotted: profiles, aggregated profiles, points or rugs.
show_observations(
  x,
  ...,
  size = 2,
  alpha = 1,
  color = "#371ea3",
  variable_type = "numerical",
  variables = NULL
)
x |
a ceteris paribus explainer produced with function |
... |
other explainers that shall be plotted together |
size |
a numeric. Size of lines to be plotted |
alpha |
a numeric between |
color |
a character. Either name of a color or name of a variable that should be used for coloring |
variable_type |
a character. If |
variables |
if not |
a ggplot2
layer
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")
library("ranger")

rf_model <- ranger(survived ~., data = titanic_imputed, probability = TRUE)

explainer_rf <- explain(rf_model,
                        data = titanic_imputed[,-8],
                        y = titanic_imputed[,8],
                        label = "ranger forest",
                        verbose = FALSE)

selected_passangers <- select_sample(titanic_imputed, n = 100)
cp_rf <- ceteris_paribus(explainer_rf, selected_passangers)
cp_rf

plot(cp_rf, variables = "age", color = "grey") +
  show_observations(cp_rf, variables = "age", color = "black") +
  show_rugs(cp_rf, variables = "age", color = "red")
Function show_profiles adds a layer to a plot created with plot.ceteris_paribus_explainer.
show_profiles(
  x,
  ...,
  size = 0.5,
  alpha = 1,
  color = "#371ea3",
  variables = NULL
)
x |
a ceteris paribus explainer produced with function |
... |
other explainers that shall be plotted together |
size |
a numeric. Size of lines to be plotted |
alpha |
a numeric between |
color |
a character. Either name of a color or name of a variable that should be used for coloring |
variables |
if not |
a ggplot2
layer
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")

selected_passangers <- select_sample(titanic_imputed, n = 100)
selected_john <- titanic_imputed[1,]

model_titanic_glm <- glm(survived ~ gender + age + fare,
                         data = titanic_imputed, family = "binomial")

explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_imputed[,-8],
                               y = titanic_imputed[,8],
                               label = "glm",
                               verbose = FALSE)

cp_rf <- ceteris_paribus(explain_titanic_glm, selected_passangers)
cp_rf_john <- ceteris_paribus(explain_titanic_glm, selected_john)

plot(cp_rf, variables = "age") +
  show_profiles(cp_rf_john, variables = "age", size = 2)

library("ranger")

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)

explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic_imputed[,-8],
                              y = titanic_imputed[,8],
                              label = "ranger forest",
                              verbose = FALSE)

cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
cp_rf_john <- ceteris_paribus(explain_titanic_rf, selected_john)
cp_rf

pdp_rf <- aggregate_profiles(cp_rf, variables = "age")
head(pdp_rf)

plot(cp_rf, variables = "age") +
  show_observations(cp_rf, variables = "age") +
  show_rugs(cp_rf, variables = "age", color = "red") +
  show_profiles(cp_rf_john, variables = "age", color = "red", size = 2)
Function show_residuals adds a layer to a plot created with plot.ceteris_paribus_explainer for selected observations.
Note that the y argument has to be specified in the ceteris_paribus function.
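A residual layer needs both the observed response and the prediction for each selected observation, which is why y must be supplied. The base R sketch below shows the quantities involved; the column names and the sign flag are assumptions made for illustration and do not reflect the internal structure of the package's objects.

```r
model    <- lm(mpg ~ wt + hp, data = mtcars)
selected <- mtcars[1:5, ]

# Prediction for every selected observation
predicted <- as.numeric(predict(model, newdata = selected))

residuals_df <- data.frame(
  wt       = selected$wt,                 # position on the x axis
  y        = selected$mpg,                # observed response (must be known)
  yhat     = predicted,                   # model prediction
  residual = selected$mpg - predicted,    # length of the residual segment
  positive = selected$mpg > predicted     # assumed flag: TRUE if observed > predicted
)
residuals_df
```

Each row describes one vertical segment from yhat to y at the observation's value of the plotted variable; without y there is nothing to draw the segment to.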
show_residuals(
  x,
  ...,
  size = 0.75,
  alpha = 1,
  color = c(`TRUE` = "#8bdcbe", `FALSE` = "#f05a71"),
  variables = NULL
)
x |
a ceteris paribus explainer produced with function |
... |
other explainers that shall be plotted together |
size |
a numeric. Size of lines to be plotted |
alpha |
a numeric between |
color |
a character. Either name of a color or name of a variable that should be used for coloring |
variables |
if not |
a ggplot2
layer
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")
library("ranger")

johny_d <- data.frame(
  class = factor("1st", levels = c("1st", "2nd", "3rd", "deck crew", "engineering crew",
                                   "restaurant staff", "victualling crew")),
  gender = factor("male", levels = c("female", "male")),
  age = 8,
  sibsp = 0,
  parch = 0,
  fare = 72,
  embarked = factor("Southampton", levels = c("Belfast", "Cherbourg", "Queenstown", "Southampton"))
)

model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)

explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic_imputed[,-8],
                              y = titanic_imputed[,8],
                              label = "ranger forest",
                              verbose = FALSE)

johny_neighbours <- select_neighbours(data = titanic_imputed,
                                      observation = johny_d,
                                      variables = c("age", "gender", "class", "fare", "sibsp", "parch"),
                                      n = 10)

cp_neighbours <- ceteris_paribus(explain_titanic_rf, johny_neighbours,
                                 y = johny_neighbours$survived == "yes",
                                 variable_splits = list(age = seq(0,70, length.out = 1000)))

plot(cp_neighbours, variables = "age") +
  show_observations(cp_neighbours, variables = "age")

cp_johny <- ceteris_paribus(explain_titanic_rf, johny_d,
                            variable_splits = list(age = seq(0,70, length.out = 1000)))

plot(cp_johny, variables = "age", size = 1.5, color = "#8bdcbe") +
  show_profiles(cp_neighbours, variables = "age", color = "#ceced9") +
  show_observations(cp_johny, variables = "age", size = 5, color = "#371ea3") +
  show_residuals(cp_neighbours, variables = "age")
Function show_rugs adds a layer to a plot created with plot.ceteris_paribus_explainer for selected observations.
Various parameters control what is plotted: profiles, aggregated profiles, points or rugs.
show_rugs(
  x,
  ...,
  size = 0.5,
  alpha = 1,
  color = "#371ea3",
  variable_type = "numerical",
  sides = "b",
  variables = NULL
)
x |
a ceteris paribus explainer produced with function |
... |
other explainers that shall be plotted together |
size |
a numeric. Size of lines to be plotted |
alpha |
a numeric between |
color |
a character. Either name of a color or name of a variable that should be used for coloring |
variable_type |
a character. If |
sides |
a string containing any of "trbl", for top, right, bottom, and left. Passed to geom_rug. |
variables |
if not |
a ggplot2
layer
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
library("DALEX")
library("ingredients")

titanic_small <- select_sample(titanic_imputed, n = 500, seed = 1313)

# build a model
model_titanic_glm <- glm(survived ~ gender + age + fare,
                         data = titanic_small, family = "binomial")

explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_small[,-8],
                               y = titanic_small[,8])

cp_glm <- ceteris_paribus(explain_titanic_glm, titanic_small[1,])
cp_glm

library("ranger")

rf_model <- ranger(survived ~., data = titanic_imputed, probability = TRUE)

explainer_rf <- explain(rf_model,
                        data = titanic_imputed[,-8],
                        y = titanic_imputed[,8],
                        label = "ranger forest",
                        verbose = FALSE)

selected_passangers <- select_sample(titanic_imputed, n = 100)
cp_rf <- ceteris_paribus(explainer_rf, selected_passangers)
cp_rf

plot(cp_rf, variables = "age", color = "grey") +
  show_observations(cp_rf, variables = "age", color = "black") +
  show_rugs(cp_rf, variables = "age", color = "red")