This release is intended to be the last before stable version 1.0.0.
- Passing a background dataset `bg_X` is now optional. If the explanation data `X` is sufficiently large (>= 50 rows), `bg_X` is derived as a random sample of `bg_n = 200` rows from `X`. If `X` has fewer than `bg_n` rows, then simply `bg_X = X`. If `X` has too few rows (< 50), you will have to pass an explicit `bg_X`; a small sketch follows after this list.
- `ranger()` survival models now also work out-of-the-box without passing a tailored prediction function. Use the new argument `survival = "chf"` in `kernelshap()` and `permshap()` to distinguish cumulative hazards (default) and survival probabilities per time point.
- The output of `kernelshap()` and `permshap()` now contains the `bg_X` and `bg_w` used to calculate the SHAP values.
- New additive explainer `additive_shap()` that works for models fitted via `lm()`, `glm()`, `mgcv::gam()`, `mgcv::bam()`, `gam::gam()`, `survival::coxph()`, or `survival::survreg()`. The explainer uses `predict(..., type = "terms")`, a beautiful trick used in `fastshap::explain.lm()`. The results are identical to those returned by `kernelshap()` and `permshap()` but exponentially faster. Thanks to David Watson for the great idea discussed in #130.
- `permshap()` now returns an object of class "kernelshap" to reduce the number of redundant methods.
- `kernelshap()`, `permshap()` (and `additive_shap()`) got an element "algorithm".
- `is.permshap()` has been removed.
- `predict_type = "prob"`.
- Speed-up of `permshap()` by caching calculations for the two special permutations of all 0 and all 1. Consequently, the `m_exact` component in the output is reduced by 2.
- New function `permshap()` to calculate exact permutation SHAP values. The function currently works for up to 14 features.
- `S` and `SE` lists.
- SHAP matrices now use `feature_names` as dimnames (https://github.com/ModelOriented/kernelshap/issues/96).
- Removed the `ks_extract()` function. It was designed to extract objects like the matrix `S` of SHAP values from the resulting "kernelshap" object `x`. We feel that the standard extraction options (`x$S`, `x[["S"]]`, or `getElement(x, "S")`) are sufficient.
- `X`, and $K$ is the dimension of a single prediction (usually 1).
- `verbose = FALSE` no longer suppresses the warning on too large background data. Use `suppressWarnings()` instead.
- Bug fixed: if `bg_X` contained more columns than `X`, inflexible prediction functions could fail when being applied to `bg_X`.
- New argument `feature_names` allows specifying the features for which SHAP values are calculated. The default equals `colnames(X)`. This should be changed only in situations where `X` (the dataset to be explained) contains non-feature columns.

Thanks to David Watson, exact calculations are now also possible for $p > 5$ features. By default, the algorithm uses exact calculations for $p \le 8$ and a hybrid strategy otherwise, see the next section. At the same time, the exact algorithm became much more efficient.
A word of caution: Exact calculations mean creating $2^p-2$ on-off vectors $z$ (cheap step) and evaluating the model on a whopping $(2^p-2)N$ rows, where $N$ is the number of rows of the background data (expensive step). As this explodes with large $p$, we do not recommend the exact strategy for $p > 10$.
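To put numbers on this, a quick back-of-the-envelope calculation (the background size $N = 100$ is an arbitrary choice):

```r
# Model evaluations required by exact Kernel SHAP: (2^p - 2) * N
N <- 100
(2^8  - 2) * N  #    25,400 evaluations for p = 8
(2^10 - 2) * N  #   102,200 for p = 10
(2^12 - 2) * N  #   409,400 for p = 12
(2^16 - 2) * N  # 6,553,400 for p = 16
```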
The iterative Kernel SHAP sampling algorithm of Covert and Lee (2021) [1] works by randomly sampling $m$ on-off vectors $z$ so that their sum follows the SHAP kernel weight distribution (renormalized to the range from $1$ to $p-1$). Based on these vectors, many predictions are formed. Then, Kernel SHAP values are derived as the solution of a constrained linear regression, see [1] for details. This is done multiple times until convergence.
A drawback of this strategy is that many (at least 75%) of the $z$ vectors will have $\sum z \in \{1, p-1\}$, producing many duplicates. Similarly, at least 92% of the mass will be used for the $p(p+1)$ possible vectors with $\sum z \in \{1, 2, p-2, p-1\}$, etc. This inefficiency can be fixed by a hybrid strategy, combining exact calculations with sampling. The hybrid algorithm has two steps:

1. Exact part: the on-off vectors with the highest kernel weights are enumerated and evaluated exactly, i.e., those with $\sum z \in \{1, p-1\}$ for a degree 1 hybrid, and additionally those with $\sum z \in \{2, p-2\}$ for a degree 2 hybrid.
2. Sampling part: the remaining probability mass is covered by sampling $z$ vectors as in the iterative algorithm above, renormalized to the part of the distribution not treated exactly.
The default behaviour of `kernelshap()` is as follows: exact calculations for $p \le 8$, a degree 2 hybrid for $p \le 16$, and a degree 1 hybrid for larger $p$.
It is also possible to use a pure sampling strategy, see Section "User visible changes" below. While this is usually not advisable compared to a hybrid approach, the options of `kernelshap()` allow studying different properties of Kernel SHAP and doing empirical research on the topic.
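For illustration, the three strategies can be requested explicitly via the arguments described under "User visible changes" below (`fit`, `X`, and `bg_X` are placeholders):

```r
s_exact  <- kernelshap(fit, X, bg_X, exact = TRUE)                      # exact Kernel SHAP
s_hybrid <- kernelshap(fit, X, bg_X, exact = FALSE, hybrid_degree = 2)  # partly exact, partly sampled
s_pure   <- kernelshap(fit, X, bg_X, exact = FALSE, hybrid_degree = 0)  # pure sampling (not recommended)
```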
Kernel SHAP in the Python implementation "shap" uses a quite similar hybrid strategy, but without iterating. The new logic in the R package thus combines the efficiency of the Python implementation with the convergence monitoring of [1].
[1] Ian Covert and Su-In Lee. Improving KernelSHAP: Practical Shapley Value Estimation Using Linear Regression. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:3457-3465, 2021.
- The default value of `m` is reduced from $8p$ to $2p$, except when `hybrid_degree = 0` (pure sampling).
- `exact` is now `TRUE` for $p \le 8$ instead of $p \le 5$.
- `hybrid_degree` is introduced to control the exact part of the hybrid algorithm. The default is 2 for $4 \le p \le 16$ and degree 1 otherwise. Set it to 0 to force a pure sampling strategy (not recommended, but useful to demonstrate the superiority of hybrid approaches).
- `tol` was reduced from 0.01 to 0.005.
- `max_iter` was reduced from 250 to 100.
- `m`.
- `print()` is now slimmer.
- The `summary()` function shows more information.
- New output components: `m_exact` (the number of on-off vectors used for the exact part), `prop_exact` (the proportion of mass treated in exact fashion), the `exact` flag, and `txt` (the info message shown when starting the algorithm).
- Bug fixed: predictions of `mgcv::gam()` would cause an error in `check_pred()` (they are 1D arrays).

The interface of `kernelshap()` has been revised. Instead of specifying a prediction function, it now suffices to pass the fitted model object. The default `pred_fun` is now `stats::predict`, which works in most cases. Some other cases are caught via the model class ("ranger" and mlr3 "Learner"). The `pred_fun` can be overwritten by a function of the form `function(object, X, ...)`. Additional arguments to the prediction function are passed via `...` of `kernelshap()`.
Some examples:

- `kernelshap(fit, X, bg_X)`
- `kernelshap(fit, X, bg_X, type = "response")`
- `kernelshap(fit, X, bg_X, pred_fun = function(m, X) exp(predict(m, X)))`
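As a self-contained sketch of the same idea (the model and data are chosen only for illustration):

```r
library(kernelshap)

fit <- glm(am ~ hp + wt, data = mtcars, family = binomial())
X <- mtcars[c("hp", "wt")]

# Default prediction function is stats::predict (link scale for a glm)
s_link <- kernelshap(fit, X, bg_X = X)

# Additional arguments are forwarded to the prediction function via ...
s_prob <- kernelshap(fit, X, bg_X = X, type = "response")
```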
- `kernelshap()` has received a more intuitive interface, see breaking change above.
- Parallel computing is now supported: register a parallel backend before calling `kernelshap()`, e.g., using the "doFuture" package, and then set `parallel = TRUE`. Especially on Windows, sometimes not all global variables or packages are loaded in the parallel instances. These can be specified via `parallel_args`, a list of arguments passed to `foreach()` (see the sketch at the end of this list).
- `kernelshap()` has become much faster.
- Besides `matrix`, `data.frame`s, and `tibble`s, the package now also accepts `data.table`s (if the prediction function can deal with them).
- `kernelshap()` is less picky regarding the output structure of `pred_fun()`.
- `kernelshap()` is less picky about the column structure of the background data `bg_X`. It should simply contain the columns of `X` (but can have more, or in a different order). The old behaviour was to throw an error if `colnames(X) != colnames(bg_X)`.
- `m = "auto"` has been changed from `trunc(20 * sqrt(p))` to `max(trunc(20 * sqrt(p)), 5 * p)`. This will have an effect for cases where the number of features $p > 16$. The change implies more robust results for large $p$.
- `ks_extract(, what = "S")`.
- `MASS::ginv()`, the Moore-Penrose pseudoinverse using `svd()`.
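A possible parallel setup could look as follows (a sketch; the registered backend and the `.packages` entry are only examples of what one might need, and `fit`, `X`, `bg_X` are placeholders):

```r
library(doFuture)

# Register a parallel backend before calling kernelshap()
registerDoFuture()
plan(multisession)

s <- kernelshap(
  fit, X, bg_X,
  parallel = TRUE,
  parallel_args = list(.packages = c("mgcv"))  # arguments passed to foreach()
)
```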
This is the initial release.