Title: | Compute SHAP Values for Your Tree-Based Models Using the 'TreeSHAP' Algorithm |
---|---|
Description: | An efficient implementation of the 'TreeSHAP' algorithm introduced by Lundberg et al. (2020) <doi:10.1038/s42256-019-0138-9>. It is capable of calculating SHAP (SHapley Additive exPlanations) values for tree-based models in polynomial time. Currently supported models include 'gbm', 'randomForest', 'ranger', 'xgboost', and 'lightgbm'. |
Authors: | Konrad Komisarczyk [aut], Pawel Kozminski [aut], Szymon Maksymiuk [aut], Lorenz A. Kapsner [ctb], Mikolaj Spytek [ctb], Mateusz Krzyzinski [ctb, cre], Przemyslaw Biecek [aut, cph] |
Maintainer: | Mateusz Krzyzinski <[email protected]> |
License: | GPL-3 |
Version: | 0.3.1.9000 |
Built: | 2024-12-27 04:58:37 UTC |
Source: | https://github.com/modeloriented/treeshap |
DrWhy color palettes for ggplot objects
colors_discrete_drwhy(n = 2)
colors_breakdown_drwhy()
n | number of colors for color palette |
color palette as vector of characters
The dataset consists of 56 columns: 55 numeric and one factor, 'work_rate'. value_eur is a potential target feature.
fifa20
A data frame with 18278 rows and 56 columns. Most variables representing skills range from 0 to 100 and are not described here. The non-obvious features are:
Overall score of player's skills
Potential of a player, younger players tend to have higher level of potential
Market value of a player (in mln EUR)
Range 1 to 5
Range 1 to 5
Range 1 to 5
Levels of willingness to work in offense and defense, respectively, separated by a slash
"Data has been scraped from the publicly available website https://sofifa.com" https://www.kaggle.com/stefanoleone992/fifa-20-complete-player-dataset
Convert your GBM model into a standardized representation.
The returned representation is easy for the user to interpret and ready to be used as an argument to the treeshap() function.
gbm.unify(gbm_model, data)
gbm_model | An object of |
data | Reference dataset. A |
A unified model representation: a model_unified.object object.
lightgbm.unify for LightGBM models
xgboost.unify for XGBoost models
ranger.unify for ranger models
randomForest.unify for randomForest models
library(gbm)
data <- fifa20$data[colnames(fifa20$data) != 'work_rate']
data['value_eur'] <- fifa20$target
gbm_model <- gbm::gbm(
  formula = value_eur ~ .,
  data = data,
  distribution = "gaussian",
  n.trees = 20,
  interaction.depth = 4,
  n.cores = 1)
unified_model <- gbm.unify(gbm_model, data)
shaps <- treeshap(unified_model, data[1:2,])
plot_contribution(shaps, obs = 1)
Performs only basic checks; it does not verify that the representation is correct.
is.model_unified(x)
x | an object to check |
boolean
Performs only basic checks; it does not verify that the result is correct.
is.treeshap(x)
x | an object to check |
boolean
Convert your LightGBM model into a standardized representation.
The returned representation is easy for the user to interpret and ready to be used as an argument to the treeshap() function.
lightgbm.unify(lgb_model, data, recalculate = FALSE)
lgb_model | A lightgbm model - object of class |
data | Reference dataset. A |
recalculate | logical indicating if covers should be recalculated according to the dataset given in data. Keep it |
A unified model representation: a model_unified.object object.
gbm.unify for GBM models
xgboost.unify for XGBoost models
ranger.unify for ranger models
randomForest.unify for randomForest models
library(lightgbm)
param_lgbm <- list(objective = "regression", max_depth = 2,
                   force_row_wise = TRUE, num_iterations = 20)
data_fifa <- fifa20$data[!colnames(fifa20$data) %in%
             c('work_rate', 'value_eur', 'gk_diving', 'gk_handling',
               'gk_kicking', 'gk_reflexes', 'gk_speed', 'gk_positioning')]
data <- na.omit(cbind(data_fifa, fifa20$target))
sparse_data <- as.matrix(data[,-ncol(data)])
x <- lightgbm::lgb.Dataset(sparse_data, label = as.matrix(data[,ncol(data)]))
lgb_data <- lightgbm::lgb.Dataset.construct(x)
lgb_model <- lightgbm::lightgbm(data = lgb_data, params = param_lgbm,
                                verbose = -1, num_threads = 0)
unified_model <- lightgbm.unify(lgb_model, sparse_data)
shaps <- treeshap(unified_model, data[1:2, ])
plot_contribution(shaps, obs = 1)
model_unified_multioutput object produced by the *.unify or unify function.
A list of model_unified objects, one for each individual output of a model. For survival models, the list is named by the time points for which predictions are calculated.
model_unified object produced by the *.unify or unify function.
A list consisting of two elements:
model - A data.frame representing the model, with the following columns:
Tree | 0-indexed ID of a tree |
Node | 0-indexed ID of a node in a tree. In a tree the root always has ID 0 |
Feature | For an internal node: name of the feature to split on. Otherwise: NA |
Decision.type | A factor with two levels: "<" and "<=". For an internal node: predicate used for splitting observations. Otherwise: NA |
Split | For internal nodes: threshold used for splitting observations. All observations that satisfy the predicate Decision.type(Split) ('< Split' / '<= Split') proceed to the node marked as 'Yes'; the others proceed to the 'No' node. For leaves: NA |
Yes | Index of a row containing a child Node. Indicating the row explicitly makes moving between nodes much faster |
No | Index of a row containing a child Node |
Missing | Index of a row containing the child Node to which all observations with no value of the splitting feature proceed |
Prediction | For leaves: value of the prediction in the leaf. For internal nodes: NA |
Cover | Number of observations seen by the internal node or collected by the leaf for the reference dataset |
data - Dataset used as a reference for calculating SHAP values. A dataset passed to the *.unify, unify or set_reference_dataset function with the data argument. A data.frame.
The object also has two attributes set:
model | A string. The package that produced the model. |
missing_support | A boolean. Whether the model allows missing values to be present in the explained dataset. |
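To make the column layout above concrete, here is a minimal base R sketch that builds a hand-written single-tree table with the same columns and walks it for one observation. The feature names, thresholds, predictions and the helper `predict_one` are all hypothetical, and the sketch assumes that Yes, No and Missing hold 1-based row indices, as is usual for R data frames. It is not the package's own prediction code.

```r
# Hypothetical one-tree model in the unified layout (all values are made up).
model <- data.frame(
  Tree = c(0, 0, 0, 0, 0),
  Node = c(0, 1, 2, 3, 4),
  Feature = c("age", "height", NA, NA, NA),
  Decision.type = factor(c("<=", "<", NA, NA, NA), levels = c("<", "<=")),
  Split = c(30, 180, NA, NA, NA),
  Yes = c(2, 4, NA, NA, NA),      # assumed 1-based row index of the 'Yes' child
  No = c(3, 5, NA, NA, NA),       # assumed 1-based row index of the 'No' child
  Missing = c(2, 4, NA, NA, NA),  # row used when the split feature is NA
  Prediction = c(NA, NA, -0.5, 1.5, 2.0),
  Cover = c(100, 60, 40, 25, 35),
  stringsAsFactors = FALSE
)

# Hypothetical helper: follow Yes/No/Missing row indices until a leaf is reached.
predict_one <- function(model, obs) {
  row <- 1  # the root of this single tree is stored in the first row
  while (!is.na(model$Feature[row])) {
    value <- obs[[model$Feature[row]]]
    if (is.na(value)) {
      row <- model$Missing[row]
    } else {
      goes_yes <- if (model$Decision.type[row] == "<=") {
        value <= model$Split[row]
      } else {
        value < model$Split[row]
      }
      row <- if (goes_yes) model$Yes[row] else model$No[row]
    }
  }
  model$Prediction[row]
}

predict_one(model, list(age = 25, height = 185))
predict_one(model, list(age = NA, height = 170))
```

Because the children are addressed by row index rather than by node ID, each step of the walk is a single lookup. A model with several trees would repeat the walk from each tree's root row and combine the leaf values.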
This function plots the contributions of features to the prediction for a single observation.
plot_contribution(
  treeshap,
  obs = 1,
  max_vars = 5,
  min_max = NA,
  digits = 3,
  explain_deviation = FALSE,
  title = "SHAP Break-Down",
  subtitle = ""
)
treeshap | A treeshap object produced with the |
obs | A numeric indicating which observation should be plotted. By default, it is the first observation. |
max_vars | maximum number of variables that shall be presented. Variables with the highest importance will be presented. Remaining variables will be summed into one additional contribution. By default |
min_max | a range of the x-axis. By default |
digits | number of decimal places ( |
explain_deviation | if |
title | the plot's title, by default |
subtitle | the plot's subtitle. By default no subtitle. |
a ggplot2 object
treeshap for calculation of SHAP values
plot_feature_importance, plot_feature_dependence, plot_interaction
library(xgboost)
data <- fifa20$data[colnames(fifa20$data) != 'work_rate']
target <- fifa20$target
param <- list(objective = "reg:squarederror", max_depth = 3)
xgb_model <- xgboost::xgboost(as.matrix(data), params = param, label = target,
                              nrounds = 20, verbose = FALSE)
unified_model <- xgboost.unify(xgb_model, as.matrix(data))
x <- head(data, 1)
shap <- treeshap(unified_model, x)
plot_contribution(shap, 1, min_max = c(0, 120000000))
Shows how a variable contributes to the prediction depending on its value.
plot_feature_dependence(
  treeshap,
  variable,
  title = "Feature Dependence",
  subtitle = NULL
)
treeshap | A treeshap object produced with the |
variable | name or index of variable for which feature dependence will be plotted. |
title | the plot's title, by default |
subtitle | the plot's subtitle. By default no subtitle. |
a ggplot2 object
treeshap for calculation of SHAP values
plot_contribution, plot_feature_importance, plot_interaction
library(xgboost)
data <- fifa20$data[colnames(fifa20$data) != 'work_rate']
target <- fifa20$target
param <- list(objective = "reg:squarederror", max_depth = 3)
xgb_model <- xgboost::xgboost(as.matrix(data), params = param, label = target,
                              nrounds = 20, verbose = FALSE)
unified_model <- xgboost.unify(xgb_model, as.matrix(data))
x <- head(data, 100)
shaps <- treeshap(unified_model, x)
plot_feature_dependence(shaps, variable = "overall")
This function plots feature importance calculated as means of absolute values of SHAP values of variables (average impact on model output magnitude).
plot_feature_importance(
  treeshap,
  desc_sorting = TRUE,
  max_vars = ncol(shaps),
  title = "Feature Importance",
  subtitle = NULL
)
treeshap | A treeshap object produced with the |
desc_sorting | logical. Should the bars be sorted descending? By default TRUE. |
max_vars | maximum number of variables that shall be presented. By default all are presented. |
title | the plot's title, by default |
subtitle | the plot's subtitle. By default no subtitle. |
a ggplot2 object
treeshap for calculation of SHAP values
plot_contribution, plot_feature_dependence, plot_interaction
library(xgboost)
data <- fifa20$data[colnames(fifa20$data) != 'work_rate']
target <- fifa20$target
param <- list(objective = "reg:squarederror", max_depth = 3)
xgb_model <- xgboost::xgboost(as.matrix(data), params = param, label = target,
                              nrounds = 20, verbose = FALSE)
unified_model <- xgboost.unify(xgb_model, as.matrix(data))
shaps <- treeshap(unified_model, as.matrix(head(data, 3)))
plot_feature_importance(shaps, max_vars = 4)
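The importance measure used by this plot, the mean of absolute SHAP values per feature, can be sketched in base R as follows. The SHAP values below are made up for illustration, and the snippet is only an assumed equivalent of the quantity being plotted, not the package's plotting code.

```r
# Hypothetical SHAP values: 3 explained observations (rows), 3 features (columns).
shaps <- data.frame(
  overall = c(2.0, -1.0, 3.0),
  age = c(-0.5, 0.5, 0.0),
  potential = c(1.0, 1.0, -1.0)
)

# Mean absolute SHAP value per feature, sorted in descending order
# (the ordering that desc_sorting = TRUE describes).
importance <- sort(colMeans(abs(shaps)), decreasing = TRUE)
importance
```

Taking absolute values first means that positive and negative contributions do not cancel out, so a feature that pushes predictions strongly in both directions still ranks high.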
This function plots the SHAP Interaction value for two variables depending on the value of the first variable. The value of the second variable is marked with color.
plot_interaction(
  treeshap,
  var1,
  var2,
  title = "SHAP Interaction Value Plot",
  subtitle = ""
)
treeshap | A treeshap object produced with |
var1 | name or index of the first variable - plotted on x axis. |
var2 | name or index of the second variable - marked with color. |
title | the plot's title, by default |
subtitle | the plot's subtitle. By default no subtitle. |
a ggplot2 object
treeshap for calculation of SHAP Interaction values
plot_contribution, plot_feature_importance, plot_feature_dependence
data <- fifa20$data[colnames(fifa20$data) != 'work_rate']
target <- fifa20$target
param2 <- list(objective = "reg:squarederror", max_depth = 5)
xgb_model2 <- xgboost::xgboost(as.matrix(data), params = param2, label = target, nrounds = 10)
unified_model2 <- xgboost.unify(xgb_model2, data)
inters <- treeshap(unified_model2, as.matrix(data[1:50, ]), interactions = TRUE)
plot_interaction(inters, "dribbling", "defending")
Predict using unified_model representation.
## S3 method for class 'model_unified'
predict(object, x, ...)
object | Unified model representation of the model created with a (model).unify function. |
x | Observations to predict. A |
... | other parameters |
a vector of predictions.
library(gbm)
data <- fifa20$data[colnames(fifa20$data) != 'work_rate']
data['value_eur'] <- fifa20$target
gbm_model <- gbm::gbm(
  formula = value_eur ~ .,
  data = data,
  distribution = "laplace",
  n.trees = 20,
  interaction.depth = 4,
  n.cores = 1)
unified <- gbm.unify(gbm_model, data)
predict(unified, data[2001:2005, ])
Prints model_unified objects
## S3 method for class 'model_unified'
print(x, ...)
x | a model_unified object |
... | other arguments |
No return value, called for printing
Prints model_unified_multioutput objects
## S3 method for class 'model_unified_multioutput'
print(x, ...)
x | a model_unified_multioutput object |
... | other arguments |
No return value, called for printing
Prints treeshap objects
## S3 method for class 'treeshap'
print(x, ...)
x | a treeshap object |
... | other arguments |
No return value, called for printing
Prints treeshap_multioutput objects
## S3 method for class 'treeshap_multioutput'
print(x, ...)
x | a treeshap_multioutput object |
... | other arguments |
No return value, called for printing
Convert your randomForest model into a standardized representation.
The returned representation is easy for the user to interpret and ready to be used as an argument to the treeshap() function.
randomForest.unify(rf_model, data)
rf_model | An object of |
data | Reference dataset. A |
Binary classification models with a target variable that is a factor with two levels, 0 and 1, are supported.
A unified model representation: a model_unified.object object.
lightgbm.unify for LightGBM models
gbm.unify for GBM models
xgboost.unify for XGBoost models
ranger.unify for ranger models
library(randomForest)
data_fifa <- fifa20$data[!colnames(fifa20$data) %in%
                           c('work_rate', 'value_eur', 'gk_diving', 'gk_handling',
                             'gk_kicking', 'gk_reflexes', 'gk_speed', 'gk_positioning')]
data <- na.omit(cbind(data_fifa, target = fifa20$target))
rf <- randomForest::randomForest(target~., data = data, maxnodes = 10, ntree = 10)
unified_model <- randomForest.unify(rf, data)
shaps <- treeshap(unified_model, data[1:2,])
# plot_contribution(shaps, obs = 1)
Convert your ranger model into a standardized representation.
The returned representation is easy for the user to interpret and ready to be used as an argument to the treeshap() function.
ranger_surv.unify(
  rf_model,
  data,
  type = c("risk", "survival", "chf"),
  times = NULL
)
rf_model | An object of |
data | Reference dataset. A |
type | A character to define the type of model prediction to use. Either |
times | A numeric vector of unique death times at which the prediction should be evaluated. By default |
The survival forest implemented in the ranger package stores cumulative hazard functions (CHFs) in the leaves of survival trees, as proposed for Random Survival Forests (Ishwaran et al. 2008). The final model prediction is made by averaging these CHFs from all the trees. To provide explanations in the form of a survival function, the CHFs from the leaves are converted into survival functions (SFs) using the formula SF(t) = exp(-CHF(t)). However, it is important to note that averaging these SFs does not yield the correct model prediction, as the model prediction is the average of CHFs transformed in the same way. Therefore, when you obtain explanations based on the survival function, they are only proxies and may not be fully consistent with the model predictions obtained using, for example, the predict function.
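The gap described above can be seen with a small numeric sketch in base R. The three CHF values are made up and stand for the leaf values of three hypothetical trees at a single time point; nothing here calls ranger or treeshap.

```r
# Hypothetical CHF values at one time point, one per tree (made up).
chf <- c(0.1, 0.7, 2.0)

# Model prediction expressed as a survival function:
# average the CHFs first, then transform with SF(t) = exp(-CHF(t)).
sf_from_mean_chf <- exp(-mean(chf))

# What averaging per-tree survival functions gives instead.
mean_of_sf <- mean(exp(-chf))

c(sf_from_mean_chf = sf_from_mean_chf, mean_of_sf = mean_of_sf)
```

Because exp(-x) is convex, the average of the per-tree SFs is never smaller than the SF of the averaged CHF, and the two coincide only when all trees agree. This is why explanations built on the survival function are proxies.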
For type = "risk", a unified model representation is returned: a model_unified.object object. For type = "survival" or type = "chf", a model_unified_multioutput.object object is returned, which is a list that contains a unified model representation (model_unified.object object) for each time point. In this case, the list names are the time points at which the survival function was evaluated.
ranger.unify for regression and classification ranger models
lightgbm.unify for LightGBM models
gbm.unify for GBM models
xgboost.unify for XGBoost models
randomForest.unify for randomForest models
library(ranger)
data_colon <- data.table::data.table(survival::colon)
data_colon <- na.omit(data_colon[get("etype") == 2, ])
surv_cols <- c("status", "time", "rx")
feature_cols <- colnames(data_colon)[3:(ncol(data_colon) - 1)]
train_x <- model.matrix(
  ~ -1 + .,
  data_colon[, .SD, .SDcols = setdiff(feature_cols, surv_cols[1:2])]
)
train_y <- survival::Surv(
  event = (data_colon[, get("status")] |>
             as.character() |>
             as.integer()),
  time = data_colon[, get("time")],
  type = "right"
)
rf <- ranger::ranger(
  x = train_x,
  y = train_y,
  data = data_colon,
  max.depth = 10,
  num.trees = 10
)
unified_model_risk <- ranger_surv.unify(rf, train_x, type = "risk")
shaps <- treeshap(unified_model_risk, train_x[1:2,])
# compute shaps for 3 selected time points
unified_model_surv <- ranger_surv.unify(rf, train_x, type = "survival", times = c(23, 50, 73))
shaps_surv <- treeshap(unified_model_surv, train_x[1:2,])
Convert your ranger model into a standardized representation.
The returned representation is easy for the user to interpret and ready to be used as an argument to the treeshap() function.
ranger.unify(rf_model, data)
rf_model | An object of |
data | Reference dataset. A |
A unified model representation: a model_unified.object object.
lightgbm.unify for LightGBM models
gbm.unify for GBM models
xgboost.unify for XGBoost models
randomForest.unify for randomForest models
library(ranger)
data_fifa <- fifa20$data[!colnames(fifa20$data) %in%
                           c('work_rate', 'value_eur', 'gk_diving', 'gk_handling',
                             'gk_kicking', 'gk_reflexes', 'gk_speed', 'gk_positioning')]
data <- na.omit(cbind(data_fifa, target = fifa20$target))
rf <- ranger::ranger(target~., data = data, max.depth = 10, num.trees = 10)
unified_model <- ranger.unify(rf, data)
shaps <- treeshap(unified_model, data[1:2,])
plot_contribution(shaps, obs = 1)
Change the dataset used as a reference for calculating SHAP values.
The reference dataset is initially set with the data argument of the unifying function.
Usually the reference dataset is the dataset used to train the model.
An important property of the reference dataset is that the SHAP values for each observation add up to the deviation of its prediction from the mean prediction for the reference dataset.
set_reference_dataset(unified_model, x)
unified_model | Unified model representation of the model created with a (model).unify function. ( |
x | Reference dataset. A |
model_unified.object. Unified representation of the model as created with a (model).unify function, but with a changed reference dataset (the Cover column contains updated values).
lightgbm.unify for LightGBM models
gbm.unify for GBM models
xgboost.unify for XGBoost models
ranger.unify for ranger models
randomForest.unify for randomForest models
library(gbm)
data <- fifa20$data[colnames(fifa20$data) != 'work_rate']
data['value_eur'] <- fifa20$target
gbm_model <- gbm::gbm(
  formula = value_eur ~ .,
  data = data,
  distribution = "laplace",
  n.trees = 20,
  interaction.depth = 4,
  n.cores = 1)
unified <- gbm.unify(gbm_model, data)
set_reference_dataset(unified, data[200:700, ])
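The additivity property of the reference dataset mentioned above can be checked on a toy case with a brute-force Shapley computation in base R. This is a sketch under stated assumptions: the model `f`, the reference matrix `ref` and the observation `x` are hypothetical, and the coalition value is a plain average over reference rows rather than the Cover-based TreeSHAP algorithm used by the package. Only the additivity itself is being illustrated.

```r
# Hypothetical toy model and reference data (not from the package).
f <- function(X) 2 * X[, "a"] + X[, "a"] * X[, "b"] - X[, "c"]
set.seed(1)
ref <- matrix(rnorm(30), ncol = 3, dimnames = list(NULL, c("a", "b", "c")))
x <- c(a = 1, b = -2, c = 0.5)  # observation to explain

# Value of a coalition S: mean prediction over the reference rows with the
# features in S fixed to the values of the explained observation.
value <- function(S) {
  Z <- ref
  for (j in S) Z[, j] <- x[[j]]
  mean(f(Z))
}

# Exact Shapley values obtained by enumerating every coalition of the other features.
features <- names(x)
M <- length(features)
phi <- setNames(numeric(M), features)
for (i in features) {
  others <- setdiff(features, i)
  for (mask in 0:(2^length(others) - 1)) {
    in_S <- (mask %/% 2^(seq_along(others) - 1)) %% 2 == 1
    S <- others[in_S]
    w <- factorial(length(S)) * factorial(M - length(S) - 1) / factorial(M)
    phi[i] <- phi[i] + w * (value(c(S, i)) - value(S))
  }
}

# Deviation of the prediction for x from the mean prediction on the reference data.
deviation <- unname(f(rbind(x)) - mean(f(ref)))
all.equal(sum(phi), deviation)
```

Changing `ref` changes both the individual values in `phi` and the baseline `mean(f(ref))`, while their sum keeps matching the deviation. This is the effect of swapping the reference dataset.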
DrWhy Theme for ggplot objects
theme_drwhy()
theme_drwhy_vertical()
theme for ggplot2 objects
Calculate SHAP values and optionally SHAP Interaction values.
treeshap(unified_model, x, interactions = FALSE, verbose = TRUE)
unified_model | Unified data.frame representation of the model created with a (model).unify function. A |
x | Observations to be explained. A |
interactions | Whether to calculate SHAP interaction values. By default is |
verbose | Whether to print progress bar to the console. Should be logical. Progress bar will not be displayed on Windows. |
A treeshap.object object (for single-output models) or treeshap_multioutput.object, which is a list of treeshap.object objects (for multi-output models). SHAP values can be accessed from treeshap.object with $shaps, and interaction values can be accessed with $interactions.
xgboost.unify for XGBoost models
lightgbm.unify for LightGBM models
gbm.unify for GBM models
randomForest.unify for randomForest models
ranger.unify for ranger models
ranger_surv.unify for ranger survival models
library(xgboost)
data <- fifa20$data[colnames(fifa20$data) != 'work_rate']
target <- fifa20$target

# calculating simple SHAP values
param <- list(objective = "reg:squarederror", max_depth = 3)
xgb_model <- xgboost::xgboost(as.matrix(data), params = param, label = target,
                              nrounds = 20, verbose = FALSE)
unified_model <- xgboost.unify(xgb_model, as.matrix(data))
treeshap1 <- treeshap(unified_model, head(data, 3))
plot_contribution(treeshap1, obs = 1)
treeshap1$shaps

# It's possible to calculate explanations over a different part of the data set
unified_model_rec <- set_reference_dataset(unified_model, data[1:1000, ])
treeshap_rec <- treeshap(unified_model_rec, head(data, 3))
plot_contribution(treeshap_rec, obs = 1)

# calculating SHAP interaction values
param2 <- list(objective = "reg:squarederror", max_depth = 7)
xgb_model2 <- xgboost::xgboost(as.matrix(data), params = param2, label = target, nrounds = 10)
unified_model2 <- xgboost.unify(xgb_model2, as.matrix(data))
treeshap2 <- treeshap(unified_model2, head(data, 3), interactions = TRUE)
treeshap2$interactions
treeshap_multioutput object produced by the treeshap function.
A list of treeshap objects, one for each individual output of a model. For survival models, the list is named by the time points for which TreeSHAP values are calculated.
treeshap object produced by the treeshap function.
A list consisting of four elements:
A data.frame with M columns and X rows (M - number of features, X - number of explained observations). Every row contains the SHAP values for one observation.
An array with dimensions (M, M, X) (M - number of features, X - number of explained observations). Every [, , i] slice is a symmetric matrix of SHAP Interaction values for one observation. The [a, b, i] element is the SHAP Interaction value of features a and b for observation i. It is NULL if interactions were not calculated (parameter interactions set to FALSE).
An object of type model_unified.object. Unified representation of the model for which SHAP values were calculated. It is used by some of the plotting functions.
Explained dataset. A data.frame or matrix. It is used by some of the plotting functions.
plot_contribution, plot_feature_importance, plot_feature_dependence, plot_interaction
Convert your tree-based model into a standardized representation.
The returned representation is easy for the user to interpret and ready to be used as an argument to the treeshap() function.
unify(model, data, ...)
model | A tree-based model object of any supported class ( |
data | Reference dataset. A |
... | Additional parameters passed to the model-specific unification functions. |
A unified model representation: a model_unified.object object (for single-output models) or model_unified_multioutput.object, which is a list of model_unified.object objects (for multi-output models).
lightgbm.unify for LightGBM models
gbm.unify for GBM models
xgboost.unify for XGBoost models
ranger.unify for ranger models
randomForest.unify for randomForest models
library(ranger)
data_fifa <- fifa20$data[!colnames(fifa20$data) %in%
                           c('work_rate', 'value_eur', 'gk_diving', 'gk_handling',
                             'gk_kicking', 'gk_reflexes', 'gk_speed', 'gk_positioning')]
data <- na.omit(cbind(data_fifa, target = fifa20$target))

rf1 <- ranger::ranger(target~., data = data, max.depth = 10, num.trees = 10)
unified_model1 <- unify(rf1, data)
shaps1 <- treeshap(unified_model1, data[1:2,])
plot_contribution(shaps1, obs = 1)

rf2 <- randomForest::randomForest(target~., data = data, maxnodes = 10, ntree = 10)
unified_model2 <- unify(rf2, data)
shaps2 <- treeshap(unified_model2, data[1:2,])
plot_contribution(shaps2, obs = 1)
Convert your XGBoost model into a standardized representation.
The returned representation is easy for the user to interpret and ready to be used as an argument to the treeshap() function.
xgboost.unify(xgb_model, data, recalculate = FALSE)
xgb_model | An XGBoost model - object of class |
data | Reference dataset. A |
recalculate | logical indicating if covers should be recalculated according to the dataset given in data. Keep it |
A unified model representation: a model_unified.object object.
lightgbm.unify for LightGBM models
gbm.unify for GBM models
ranger.unify for ranger models
randomForest.unify for randomForest models
library(xgboost)
data <- fifa20$data[colnames(fifa20$data) != 'work_rate']
target <- fifa20$target
param <- list(objective = "reg:squarederror", max_depth = 3)
xgb_model <- xgboost::xgboost(as.matrix(data), params = param, label = target,
                              nrounds = 20, verbose = 0)
unified_model <- xgboost.unify(xgb_model, as.matrix(data))
shaps <- treeshap(unified_model, data[1:2,])
plot_contribution(shaps, obs = 1)