- `sv_waterfall()` and `sv_force()`: The x label has been changed from "SHAP value" to "Prediction".
- New argument `sort_features = TRUE` in `sv_importance()` and `sv_interaction()`. Set to `FALSE` to show the features as they appear in your SHAP matrix. In that case, the plots will show the first `max_display` features, not the most important features. Implements #137.
- `shapviz.xgboost()` would fail if a single row is passed. This has been fixed in #142. Thanks @sebsilas for reporting.

If no SHAP interaction values are available, by default, the color feature `v'` is selected by the heuristic `potential_interaction()`, which works as follows:

1. If the feature `v` (the one on the x-axis) is numeric, it is binned into `nbins` bins.
2. Per bin, the SHAP values of `v` are regressed onto `v'` and the R-squared is calculated. Rows with missing `v'` are discarded.
3. The R-squared values are averaged over bins, weighted by the number of non-missing `v'` values.

This measures how much variability in the SHAP values of `v` is explained by `v'`, after accounting for `v`.
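The heuristic can be illustrated with a few lines of base R. This is only a simplified sketch under stated assumptions, not the code used inside `potential_interaction()`: the helper name `binned_r2()`, its arguments, and the toy data are made up for illustration, and both `v` and the color feature are assumed to be numeric.

```r
# Simplified sketch of the binned R-squared heuristic; not the package's code.
# s:     SHAP values of the feature v
# v:     numeric feature shown on the x-axis (assumed to have no missing values)
# color: numeric candidate color feature v'
binned_r2 <- function(s, v, color, nbins = 5) {
  # Discard rows with missing v'
  ok <- !is.na(color)
  s <- s[ok]
  v <- v[ok]
  color <- color[ok]

  # Step 1: bin v into (at most) nbins quantile bins
  breaks <- unique(quantile(v, probs = seq(0, 1, length.out = nbins + 1)))
  bin <- cut(v, breaks = breaks, include.lowest = TRUE)

  # Step 2: per bin, regress the SHAP values on v' and extract R-squared
  parts <- split(data.frame(s = s, color = color), bin, drop = TRUE)
  r2 <- vapply(parts, function(d) {
    if (nrow(d) < 3 || var(d$s) == 0 || var(d$color) == 0) {
      return(NA_real_)  # no meaningful regression possible in this bin
    }
    summary(lm(s ~ color, data = d))$r.squared
  }, numeric(1))

  # Step 3: average over bins, weighted by the number of rows per bin
  n <- vapply(parts, nrow, integer(1))
  keep <- !is.na(r2)
  weighted.mean(r2[keep], w = n[keep])
}

# Toy data: the SHAP values of v depend on an interaction with x_strong only
set.seed(1)
n <- 500
v <- runif(n)
x_strong <- rnorm(n)  # interacts with v
x_noise <- rnorm(n)   # unrelated to the SHAP values
s <- v + 0.5 * v * x_strong

binned_r2(s, v, x_strong)  # should be clearly larger than ...
binned_r2(s, v, x_noise)   # ... the value for the unrelated feature
```

In this toy setting, the feature that interacts with `v` receives the higher score and would therefore be preferred as color feature.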
We have introduced four parameters to control the heuristic. Their defaults are in line with the old behaviour.
- `nbins = NULL`: Into how many quantile bins should a numeric `v` be binned? The default `NULL` equals the smaller of $n/20$ and $\sqrt n$ (rounded up), where $n$ is the sample size.
- `color_num = TRUE`: Should color features be converted to numeric, even if they are factors/characters? The default is `TRUE`.
- `scale = FALSE`: Should R-squared be multiplied with the sample variance of within-bin SHAP values? If `TRUE`, bins with stronger vertical scatter will get higher weight. The default is `FALSE`.
- `adjusted = FALSE`: Should adjusted R-squared be calculated?

If SHAP interaction values are available, these parameters have no effect. In `sv_dependence()`, they are called `ih_nbins` etc.
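As a small worked example of the default number of bins, the rule "smaller of $n/20$ and $\sqrt n$, rounded up" can be written in one line of base R. The helper name `default_nbins()` is hypothetical and not part of {shapviz}.

```r
# Default number of bins as described above: min(n / 20, sqrt(n)), rounded up.
# `default_nbins` is a hypothetical helper name, not a shapviz function.
default_nbins <- function(n) ceiling(min(n / 20, sqrt(n)))

default_nbins(100)   # 5, because 100 / 20 = 5 is smaller than sqrt(100) = 10
default_nbins(1000)  # 32, because sqrt(1000) ~ 31.6 is smaller than 1000 / 20 = 50
```

The two expressions cross at $n = 400$: below that, $n/20$ determines the number of bins, above it $\sqrt n$ does.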
This partly implements the ideas in #119 of Roel Verbelen, thanks a lot for your patient explanations!
We will continue to experiment with the defaults, which might change in the future. A good alternative to the current (naive) defaults could be:
- `nbins = 7`: Smaller than now to not overfit too strongly with factor/character color features.
- `color_num = FALSE`: To not naively integer encode factors/characters.
- `scale = TRUE`: To account for non-equal spread in bins.
- `adjusted = TRUE`: To not put too much weight on factors with many categories.

- `sv_dependence()`: If `color_var = "auto"` (default) and no color feature seems to be relevant (SHAP interaction is `NULL`, or heuristic returns no positive value), there won't be any color scale. Furthermore, in some edge cases, a different color feature might be selected.
- `mshapviz()` objects can now be rowbinded via `rbind()` or `+`. Implemented by @jmaspons in #110.
- `mshapviz()` is more strict when combining multiple "shapviz" objects. These now need to have identical column names, see #114.
- `print.shapviz()` now shows the top two rows of the SHAP matrix.
- `nthread = 1` is now set in all calls to `xgb.DMatrix()`, as suggested by @jmaspons in #109.
- `permshap()` connector is now part of {kernelshap} #122.
- `sv_dependence2D()`: In case `add_vars` are passed, `x` and/or `y` are removed from it in order to not use any variable twice. #116.
- `split.shapviz()` now drops empty levels. They caused an error because empty "shapviz" objects are currently not supported. #117, #118
- `sv_importance()` of a "mshapviz" object now returns a dodged barplot instead of separate barplots via {patchwork}. Use the new argument `bar_type` to switch to a stacked barplot (`bar_type = "stack"`), to "facets" (via {ggplot2}), or "separate" for the old behaviour.
- `dimnames.shapviz()` has received a replacement method. You can thus change the column names of the SHAP matrix and feature data (as well as SHAP interactions) by `colnames(x) <- ...`, see https://github.com/ModelOriented/shapviz/issues/98
- (`package_version()` applied to numeric value will be deprecated in the future)
- `sv_dependence2D()`: x and y coordinates are two features, while their summed SHAP values are shown on the color scale. If `interactions = TRUE`, SHAP interaction values are shown on the color scale instead. The function is vectorized in `x` and/or `y`. This visualization is especially useful for models with geographic components.
- `split(x, f)` splits a "shapviz" object `x` into a "mshapviz" object.
- `fastshap::explain()` offers the option `shap_only`. To conveniently construct the "shapviz" object, use `shapviz(fastshap::explain(..., shap_only = FALSE))`. This not only passes the SHAP matrix but also the feature data and the baseline. Thanks, Brandon Greenwell!

Sometimes, you will find it necessary to work with several "shapviz" objects at the same time.
To simplify the workflow, {shapviz} introduces the "mshapviz" object ("m" like "multi"). You can create it in different ways:
- `shapviz()` on multiclass XGBoost or LightGBM models.
- `shapviz()` on "kernelshap" objects created from multiclass/multioutput models.
- `c(Mod_1 = s1, Mod_2 = s2, ...)` on "shapviz" objects `s1`, `s2`, ...
- `mshapviz(list(Mod_1 = s1, Mod_2 = s2, ...))`

The `sv_*()` functions use the {patchwork} package to glue the individual plots together.
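The last two ways of creating a "mshapviz" object can be sketched as follows. The SHAP matrices and feature data are made-up toy numbers, and the objects `s1` and `s2` are built with the matrix interface of `shapviz()`; treat this as an illustration under these assumptions rather than as a reference workflow.

```r
library(shapviz)

# Toy feature data and SHAP matrices (made-up numbers, for illustration only)
X <- data.frame(a = c(1, 2, 3), b = c(0, 1, 0))
S1 <- matrix(
  c(0.1, 0.2, -0.3, 0.0, 0.4, -0.1),
  ncol = 2,
  dimnames = list(NULL, c("a", "b"))
)
S2 <- -S1

# Two "shapviz" objects, e.g., standing for two different models
s1 <- shapviz(S1, X = X, baseline = 0)
s2 <- shapviz(S2, X = X, baseline = 0)

# Combine them into a "mshapviz" object in two equivalent ways
mx <- c(Mod_1 = s1, Mod_2 = s2)
mx2 <- mshapviz(list(Mod_1 = s1, Mod_2 = s2))

names(mx)  # the names given when combining the objects
```

Afterwards, `sv_*()` functions such as `sv_importance(mx)` can be applied to the combined object.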
See the new vignette for more info and specific examples.
- `sv_dependence()` now allows multiple `v` and/or `color_var` to be plotted (glued via {patchwork}).
- `row_id` of `sv_waterfall()` and `sv_force()` now also allows a vector of integers or a logical vector. If more than one row is selected, SHAP values and predictions are averaged before plotting (aggregated SHAP values in {DALEX}).
- "shapviz" objects `x1`, `x2` can now be concatenated in rowwise manner using `x1 + x2` or `rbind(x1, x2)`, again thanks to Adrian.
- `colnames()`: "shapviz" objects `x` have received a `dimnames()` function, so you can now, e.g., use `colnames(x)` to see the feature names.
- "shapviz" objects `x` can now be subsetted using `x[cond, features]`.
- `sv_dependence()`, `sv_importance(kind = "bee")`, and `sv_interaction()`.
- `sv_dependence()` has been shortened to "SHAP interaction".
- `show_other` of `sv_importance()` has been removed.
- `S_inter`.
- `print.shapviz()` is much more compact, use `summary.shapviz()` for more info.
- `sv_waterfall()`: Using `order_fun()` would not work as expected with `max_display`. This has been fixed.
- `sv_dependence()`: Passing `viridis_args = NULL` would hide the color guide title. This has been fixed. But please pass `viridis_args = list()` instead.
- `sv_dependence()` now uses `color_var = "auto"` instead of `color_var = NULL`.
- `sv_dependence()` now uses "SHAP value" as y label (instead of the more verbose "SHAP value of [feature]").

`S_inter` (3D array):
- `shapviz(object, ..., S_inter = NULL)`
- `shapviz(object, ..., interactions = TRUE)`
- `shapviz(object, ...)`

- `sv_interaction(x)` shows a matrix of beeswarm plots.
- `sv_dependence(x, v = "x1", color_var = "x2", interactions = TRUE)` plots SHAP interaction values.
- `sv_dependence(x, v = "x1", interactions = TRUE)` plots pure main effects of "x1".
- `sv_dependence(..., color_var = "auto")` uses the SHAP interaction values to determine the most interacting color variable.
- `collapse_shap()` also works for SHAP interaction arrays.
- `get_shap_interactions()`.
- `sv_importance()`: In case of too many features, `sv_importance()` used to collapse the remaining features into an additional bar/beeswarm. This logic has been removed, and the `show_other` argument has been deprecated.
- `sv_dependence()` automatically adds horizontal jitter for discrete `v`. This now also works if `v` is numeric with at most seven unique values, not only for logicals, factors, and character `v`.
- `sv_importance()` does not use a flipped coordinate system anymore.
- `sv_importance()` has received a new argument `show_other = TRUE`. Set to `FALSE` to hide the "other" bar/beeswarm.

Some dependencies have been removed.

- `bee_width`: Relative width of the beeswarms. The default is 0.4. It replaces the `width` argument passed via `...`.
- `bee_adjust`: Relative adjustment factor of the bandwidth used in estimating the density of the beeswarms. Default is 0.5.
- `...` arguments are now passed to `geom_point()`.
- `plotly::ggplotly()` now works for most functionalities of `sv_importance()`, including beeswarms.
- `X` of the constructor of `shapviz()` is now less picky. If it contains columns not present in the SHAP matrix, they are silently dropped. Furthermore, the column order of the SHAP matrix and `X` is now determined by the SHAP matrix.
- `shapviz_from_lgb_predict()` and `shapviz_from_xgb_predict()`
- `format_fun` argument in `sv_force()` and `sv_waterfall()`
- `sort_fun` argument in `sv_waterfall()`
- `collapse_shap()` is no longer an S3 method. It is just a normal function that can be applied to a matrix.
- `sv_importance()` would return an error.
- `X_pred` from `matrix` to `xgb.DMatrix` in `shapviz.xgb.Booster()`.
- `treeshap()` example to a `ranger()` model.
- `collapse` argument in `shapviz()`. This is a named list specifying which columns in the SHAP matrix are to be collapsed by rowwise summation. A typical application will be to combine the SHAP values of one-hot-encoded dummies and explain them by the corresponding factor variable.
- `sv_importance()`, see next section.

The calculations behind `sv_importance()` are unchanged, but defaults and some plot aspects have been reworked.

- `sv_importance()` now shows a bar plot by default. Use `kind = "beeswarm"` to get a beeswarm plot.
- `sv_importance()` does not show SHAP feature importances as text anymore. Use `show_numbers = TRUE` to get them back. Furthermore, the numbers are now printed on top of the bars instead of at their bottom.
- `show_numbers` can be used to add SHAP feature importance values for all plot types.
- `max_display` has been increased from 10 to 15.
- `bar_width`.
- `color_bar_title`. Set to `NULL` to remove the color bar altogether.
- `format_fun` now uses a right-aligned number formatter with aligned decimal separator by default.
- `dim()` method for "shapviz" objects, implying `nrow()` and `ncol()`.
- `format_fun` argument of `sv_waterfall()` and `sv_force()` has been replaced by `format_shap` to format SHAP values and `format_feat` to format numeric feature values. By default, they use the new global options "shapviz.format_shap" and "shapviz.format_feat", both with default `function(z) prettyNum(z, digits = 3, scientific = FALSE)`.
- `sv_waterfall()` now uses the more consistent argument `order_fun = function(s) order(abs(s))` instead of the original `sort_fun = function(shap) abs(shap)` that was then passed to `order()`.
- Added `viridis_args = getOption("shapviz.viridis_args")` to `sv_dependence()` and `sv_importance()` to control the viridis color scale options. The default global option equals `list(begin = 0.25, end = 0.85, option = "inferno")`. For example, to switch to a standard viridis scale, you can either change the default with `options(shapviz.viridis_args = NULL)` or set `viridis_args = NULL`.
- `shapviz_from_lgb_predict()` and `shapviz_from_xgb_predict()` are deprecated in favour of the collapsing logic (see above). The functions will be removed in version 0.3.0.
- `predict()` arguments of LightGBM (data -> newdata, predcontrib = TRUE -> type = "contrib").

This is the initial CRAN release.