| Field | Value |
|---|---|
| Title | Kernel SHAP |
| Description | Efficient implementation of Kernel SHAP, see Lundberg and Lee (2017), and Covert and Lee (2021) <http://proceedings.mlr.press/v130/covert21a>. Furthermore, for up to 14 features, exact permutation SHAP values can be calculated. The package plays well together with meta-learning packages like 'tidymodels', 'caret' or 'mlr3'. Visualizations can be done using the R package 'shapviz'. |
| Authors | Michael Mayer [aut, cre], David Watson [aut], Przemyslaw Biecek [ctb] |
| Maintainer | Michael Mayer <[email protected]> |
| License | GPL (>= 2) |
| Version | 0.7.1 |
| Built | 2024-11-07 05:42:49 UTC |
| Source | https://github.com/modeloriented/kernelshap |
Exact additive SHAP assuming feature independence. The implementation works for models fitted via `lm()`, `gam::gam()`, and `survival::coxph()`, among others.

```r
additive_shap(object, X, verbose = TRUE, ...)
```
| Argument | Description |
|---|---|
| `object` | Fitted additive model. |
| `X` | Dataframe with rows to be explained. Passed to `predict(object, newdata = X, type = "terms")`. |
| `verbose` | Set to `FALSE` to suppress messages. |
| `...` | Currently unused. |
The SHAP values are extracted via `predict(object, newdata = X, type = "terms")`, a logic adopted from `fastshap:::explain.lm(..., exact = TRUE)`. Models with interactions (specified via `:` or `*`), or with terms of multiple features like `log(x1/x2)`, are not supported.
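To see why this logic works, here is a minimal sketch using only base R (not part of the package API): for a purely additive `lm()`, the centered per-term contributions returned by `type = "terms"`, together with the stored constant, reproduce the ordinary predictions.

```r
# Sketch: per-term contributions of an additive lm() via type = "terms".
# The "constant" attribute is the average prediction (the baseline).
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)
contribs <- predict(fit, newdata = head(iris), type = "terms")
baseline <- attr(contribs, "constant")

# Per-row sums of the term contributions plus the baseline
# reproduce the usual predictions.
all.equal(rowSums(contribs) + baseline, predict(fit, newdata = head(iris)))
```

Under feature independence, these centered per-term contributions coincide with SHAP values of the additive model.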
Note that the SHAP values obtained by `additive_shap()` are expected to match those of `permshap()` and `kernelshap()` as long as their background data equals the full training data (which is typically not feasible).
An object of class "kernelshap" with the following components:

- `S`: matrix with SHAP values.
- `X`: Same as input argument `X`.
- `baseline`: The baseline.
- `exact`: `TRUE`.
- `txt`: Summary text.
- `predictions`: Vector with predictions of `X` on the scale of "terms".
- `algorithm`: "additive_shap".
```r
# MODEL ONE: Linear regression
fit <- lm(Sepal.Length ~ ., data = iris)
s <- additive_shap(fit, head(iris))
s

# MODEL TWO: More complicated (but not very clever) formula
fit <- lm(
  Sepal.Length ~ poly(Sepal.Width, 2) + log(Petal.Length) + log(Sepal.Width),
  data = iris
)
s_add <- additive_shap(fit, head(iris))
s_add

# Equals kernelshap()/permshap() when background data is full training data
s_kernel <- kernelshap(
  fit, head(iris[c("Sepal.Width", "Petal.Length")]),
  bg_X = iris
)
all.equal(s_add$S, s_kernel$S)
```
Is object of class "kernelshap"?

```r
is.kernelshap(object)
```

| Argument | Description |
|---|---|
| `object` | An R object. |

`TRUE` if `object` is of class "kernelshap", and `FALSE` otherwise.

```r
fit <- lm(Sepal.Length ~ ., data = iris)
s <- kernelshap(fit, iris[1:2, -1], bg_X = iris[, -1])
is.kernelshap(s)
is.kernelshap("a")
```
Efficient implementation of Kernel SHAP, see Lundberg and Lee (2017), and Covert and Lee (2021), abbreviated by CL21. For up to eight features, the resulting Kernel SHAP values are exact regarding the selected background data. For more features, an almost exact hybrid algorithm combining exact calculations and iterative sampling is used, see Details.

Note that (exact) Kernel SHAP is only an approximation of (exact) permutation SHAP. Thus, for up to eight features, we recommend `permshap()`. For more features, `permshap()` is slow compared to the optimized hybrid strategy of our Kernel SHAP implementation.
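As a quick illustration of this recommendation, both algorithms can be compared directly. A sketch assuming the package is attached; the exact agreement shown holds for additive models such as `lm()`:

```r
library(kernelshap)

# For few features, both algorithms are exact with respect to the
# background data; for an additive model they agree up to numerical error.
fit <- lm(Sepal.Length ~ ., data = iris)
s_kern <- kernelshap(fit, iris[1:2, -1], bg_X = iris[, -1])
s_perm <- permshap(fit, iris[1:2, -1], bg_X = iris[, -1])
max(abs(s_kern$S - s_perm$S))  # close to zero for this additive model
```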
```r
kernelshap(object, ...)

## Default S3 method:
kernelshap(
  object,
  X,
  bg_X = NULL,
  pred_fun = stats::predict,
  feature_names = colnames(X),
  bg_w = NULL,
  bg_n = 200L,
  exact = length(feature_names) <= 8L,
  hybrid_degree = 1L + length(feature_names) %in% 4:16,
  paired_sampling = TRUE,
  m = 2L * length(feature_names) * (1L + 3L * (hybrid_degree == 0L)),
  tol = 0.005,
  max_iter = 100L,
  parallel = FALSE,
  parallel_args = NULL,
  verbose = TRUE,
  ...
)

## S3 method for class 'ranger'
kernelshap(
  object,
  X,
  bg_X = NULL,
  pred_fun = NULL,
  feature_names = colnames(X),
  bg_w = NULL,
  bg_n = 200L,
  exact = length(feature_names) <= 8L,
  hybrid_degree = 1L + length(feature_names) %in% 4:16,
  paired_sampling = TRUE,
  m = 2L * length(feature_names) * (1L + 3L * (hybrid_degree == 0L)),
  tol = 0.005,
  max_iter = 100L,
  parallel = FALSE,
  parallel_args = NULL,
  verbose = TRUE,
  survival = c("chf", "prob"),
  ...
)
```
| Argument | Description |
|---|---|
| `object` | Fitted model object. |
| `...` | Additional arguments passed to `pred_fun(object, X, ...)`. |
| `X` | Matrix or data.frame with rows to be explained. Should contain only the model features, not the response. |
| `bg_X` | Background data used to integrate out "switched off" features, often a subset of the training data (typically 50 to 500 rows). In cases with a natural "off" value (like MNIST digits), this can also be a single row with all values set to the off value. If no `bg_X` is passed (the default), up to `bg_n` rows are sampled from `X` as background data. |
| `pred_fun` | Prediction function of the form `function(object, X, ...)`, providing numeric predictions for each row of `X`. |
| `feature_names` | Optional vector of column names in `X` used to calculate the SHAP values. |
| `bg_w` | Optional vector of case weights for each row of `bg_X`. |
| `bg_n` | If no `bg_X` is passed: size of the background data sampled from `X`. The default is 200. |
| `exact` | If `TRUE`, the algorithm produces exact Kernel SHAP values with respect to the background data. The default is `TRUE` for up to eight features. |
| `hybrid_degree` | Integer controlling the exactness of the hybrid strategy; see Details. A degree of 0 corresponds to pure sampling. |
| `paired_sampling` | Logical flag indicating whether to do the sampling in a paired manner: with every on-off vector `z`, also its complement `1 - z` is evaluated. |
| `m` | Even number of on-off vectors sampled during one iteration. The default is `2 * length(feature_names)` for hybrid strategies, and four times as much for pure sampling (`hybrid_degree = 0`). |
| `tol` | Tolerance determining when to stop. Following CL21, the algorithm keeps iterating until the estimated standard errors are small relative to the range of the SHAP values; see CL21 for details. |
| `max_iter` | If the stopping criterion (see `tol`) has not been reached after `max_iter` iterations, the algorithm stops. |
| `parallel` | If `TRUE`, uses `foreach::foreach()` to loop over the rows to be explained in parallel. |
| `parallel_args` | Named list of arguments passed to `foreach::foreach()`. |
| `verbose` | Set to `FALSE` to suppress messages and the progress bar. |
| `survival` | Should cumulative hazards ("chf", default) or survival probabilities ("prob") per time be predicted? Only applies to the "ranger" method for survival models. |
The pure iterative Kernel SHAP sampling as in Covert and Lee (2021) works like this:

1. A binary "on-off" vector `z` is drawn from {0, 1}^p such that its sum follows the SHAP Kernel weight distribution (normalized to the range from 1 to p - 1).
2. For each j with z_j = 1, the j-th column of the original background data is replaced by the corresponding feature value of the observation to be explained.
3. The average prediction v_z on the data of Step 2 is calculated, and the average prediction v_0 on the background data is subtracted.

Steps 1 to 3 are repeated m times. This produces a binary (m x p) matrix Z (each row equals one of the `z`) and a vector v of m shifted predictions. v is regressed onto Z under the constraint that the sum of the coefficients equals v_1 - v_0, where v_1 is the prediction of the observation to be explained. The resulting coefficients are the Kernel SHAP values. This is repeated multiple times until convergence, see CL21 for details.

A drawback of this strategy is that many (at least 75%) of the `z` vectors will have sum(z) in {1, p - 1}, producing many duplicates. Similarly, at least 92% of the mass will be used for the p(p + 1) possible vectors with sum(z) in {1, 2, p - 2, p - 1}.
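This concentration of weight mass can be checked numerically. A small base-R sketch; the weight formula is the standard SHAP Kernel weight from Lundberg and Lee (2017):

```r
# Mass of the normalized SHAP Kernel weight distribution per coalition
# size s = sum(z): a single vector of size s has weight proportional to
# (p - 1) / (choose(p, s) * s * (p - s)); multiplying by the number of
# such vectors, choose(p, s), gives the total mass of that size.
p <- 8
s <- 1:(p - 1)
mass <- (p - 1) / (s * (p - s))    # choose(p, s) cancels out
mass <- mass / sum(mass)

sum(mass[c(1, p - 1)])             # share of sizes {1, p - 1}
sum(mass[c(1, 2, p - 2, p - 1)])   # share of sizes {1, 2, p - 2, p - 1}
```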
This inefficiency can be fixed by a hybrid strategy, combining exact calculations
with sampling.
The hybrid algorithm has two steps:

Step 1 (exact part): There are 2p different on-off vectors `z` with sum(z) in {1, p - 1}, covering a large proportion of the Kernel SHAP distribution. The degree 1 hybrid will list those vectors and use them according to their weights in the upcoming calculations. Depending on p, we can also go a step further to a degree 2 hybrid by adding all p(p - 1) vectors with sum(z) in {2, p - 2} to the process etc. The necessary predictions are obtained along with other calculations similar to those described in CL21.

Step 2 (sampling part): The remaining weight is filled by sampling vectors `z` according to Kernel SHAP weights renormalized to the values not yet covered by Step 1. Together with the results from Step 1 (correctly weighted), this now forms a complete iteration as in CL21. The difference is that most mass is covered by exact calculations. Afterwards, the algorithm iterates until convergence. The output of Step 1 is reused in every iteration, leading to an extremely efficient strategy.
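The vector counts involved follow directly from combinatorics; a tiny base-R sketch:

```r
# Number of on-off vectors handled exactly by each hybrid degree,
# compared to all 2^p - 2 non-trivial vectors.
p <- 12
deg1 <- 2 * p                # p vectors with sum(z) = 1, plus p with p - 1
deg2 <- deg1 + p * (p - 1)   # plus 2 * choose(p, 2) vectors with sum(z) in {2, p - 2}
c(degree1 = deg1, degree2 = deg2, all = 2^p - 2)
```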
If p is sufficiently small, all 2^p - 2 possible on-off vectors `z` can be evaluated. In this case, no sampling is required and the algorithm returns exact Kernel SHAP values with respect to the given background data. Since `kernelshap()` calculates predictions on data with N * m rows (N is the background data size and m the number of `z` vectors), p should not be much higher than 10 for exact calculations. For similar reasons, degree 2 hybrids should not use p much larger than 40.
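To get a feeling for these limits, a short sketch (using the package's default background size of 200 rows as an assumption):

```r
# Rows passed to the prediction function per explained observation
# in exact mode: background size times number of on-off vectors.
p <- 10
bg_n <- 200
n_z <- 2^p - 2    # 1022 on-off vectors
bg_n * n_z        # 204,400 prediction rows per explained row
```

This is why exact calculations stay cheap only for roughly p <= 10.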
An object of class "kernelshap" with the following components:

- `S`: (n x p) matrix with SHAP values or, if the model output has dimension K > 1, a list of K such matrices.
- `X`: Same as input argument `X`.
- `baseline`: Vector of length K representing the average prediction on the background data.
- `bg_X`: The background data.
- `bg_w`: The background case weights.
- `SE`: Standard errors corresponding to `S` (and organized like `S`).
- `n_iter`: Integer vector of length n providing the number of iterations per row of `X`.
- `converged`: Logical vector of length n indicating convergence per row of `X`.
- `m`: Integer providing the effective number of sampled on-off vectors used per iteration.
- `m_exact`: Integer providing the effective number of exact on-off vectors used per iteration.
- `prop_exact`: Proportion of the Kernel SHAP weight distribution covered by exact calculations.
- `exact`: Logical flag indicating whether calculations are exact or not.
- `txt`: Summary text.
- `predictions`: (n x K) matrix with predictions of `X`.
- `algorithm`: "kernelshap".
Methods (by class):

- `kernelshap(default)`: Default Kernel SHAP method.
- `kernelshap(ranger)`: Kernel SHAP method for "ranger" models, see Readme for an example.
Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017.
Ian Covert and Su-In Lee. Improving KernelSHAP: Practical Shapley Value Estimation Using Linear Regression. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:3457-3465, 2021.
```r
# MODEL ONE: Linear regression
fit <- lm(Sepal.Length ~ ., data = iris)

# Select rows to explain (only feature columns)
X_explain <- iris[-1]

# Calculate SHAP values
s <- kernelshap(fit, X_explain)
s

# MODEL TWO: Multi-response linear regression
fit <- lm(as.matrix(iris[, 1:2]) ~ Petal.Length + Petal.Width + Species, data = iris)
s <- kernelshap(fit, iris[3:5])
s

# Note 1: Feature columns can also be selected via 'feature_names'
# Note 2: Especially when X is small, pass a sufficiently large background data bg_X
s <- kernelshap(
  fit,
  iris[1:4, ],
  bg_X = iris,
  feature_names = c("Petal.Length", "Petal.Width", "Species")
)
s
```
Exact permutation SHAP algorithm with respect to a background dataset, see Strumbelj and Kononenko (2014). The function works for up to 14 features. For more than eight features, we recommend `kernelshap()` due to its higher speed.
```r
permshap(object, ...)

## Default S3 method:
permshap(
  object,
  X,
  bg_X = NULL,
  pred_fun = stats::predict,
  feature_names = colnames(X),
  bg_w = NULL,
  bg_n = 200L,
  parallel = FALSE,
  parallel_args = NULL,
  verbose = TRUE,
  ...
)

## S3 method for class 'ranger'
permshap(
  object,
  X,
  bg_X = NULL,
  pred_fun = NULL,
  feature_names = colnames(X),
  bg_w = NULL,
  bg_n = 200L,
  parallel = FALSE,
  parallel_args = NULL,
  verbose = TRUE,
  survival = c("chf", "prob"),
  ...
)
```
| Argument | Description |
|---|---|
| `object` | Fitted model object. |
| `...` | Additional arguments passed to `pred_fun(object, X, ...)`. |
| `X` | Matrix or data.frame with rows to be explained. Should contain only the model features, not the response. |
| `bg_X` | Background data used to integrate out "switched off" features, often a subset of the training data (typically 50 to 500 rows). In cases with a natural "off" value (like MNIST digits), this can also be a single row with all values set to the off value. If no `bg_X` is passed (the default), up to `bg_n` rows are sampled from `X` as background data. |
| `pred_fun` | Prediction function of the form `function(object, X, ...)`, providing numeric predictions for each row of `X`. |
| `feature_names` | Optional vector of column names in `X` used to calculate the SHAP values. |
| `bg_w` | Optional vector of case weights for each row of `bg_X`. |
| `bg_n` | If no `bg_X` is passed: size of the background data sampled from `X`. The default is 200. |
| `parallel` | If `TRUE`, uses `foreach::foreach()` to loop over the rows to be explained in parallel. |
| `parallel_args` | Named list of arguments passed to `foreach::foreach()`. |
| `verbose` | Set to `FALSE` to suppress messages and the progress bar. |
| `survival` | Should cumulative hazards ("chf", default) or survival probabilities ("prob") per time be predicted? Only applies to the "ranger" method for survival models. |
An object of class "kernelshap" with the following components:

- `S`: (n x p) matrix with SHAP values or, if the model output has dimension K > 1, a list of K such matrices.
- `X`: Same as input argument `X`.
- `baseline`: Vector of length K representing the average prediction on the background data.
- `bg_X`: The background data.
- `bg_w`: The background case weights.
- `m_exact`: Integer providing the effective number of exact on-off vectors used.
- `exact`: Logical flag indicating whether calculations are exact or not (currently always `TRUE`).
- `txt`: Summary text.
- `predictions`: (n x K) matrix with predictions of `X`.
- `algorithm`: "permshap".
Methods (by class):

- `permshap(default)`: Default permutation SHAP method.
- `permshap(ranger)`: Permutation SHAP method for "ranger" models, see Readme for an example.
Erik Strumbelj and Igor Kononenko. Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems 41, 2014.
```r
# MODEL ONE: Linear regression
fit <- lm(Sepal.Length ~ ., data = iris)

# Select rows to explain (only feature columns)
X_explain <- iris[-1]

# Calculate SHAP values
s <- permshap(fit, X_explain)
s

# MODEL TWO: Multi-response linear regression
fit <- lm(as.matrix(iris[, 1:2]) ~ Petal.Length + Petal.Width + Species, data = iris)
s <- permshap(fit, iris[3:5])
s

# Note 1: Feature columns can also be selected via 'feature_names'
# Note 2: Especially when X is small, pass a sufficiently large background data bg_X
s <- permshap(
  fit,
  iris[1:4, ],
  bg_X = iris,
  feature_names = c("Petal.Length", "Petal.Width", "Species")
)
s
```
Prints "kernelshap" Object
```r
## S3 method for class 'kernelshap'
print(x, n = 2L, ...)
```
| Argument | Description |
|---|---|
| `x` | An object of class "kernelshap". |
| `n` | Maximum number of rows of SHAP values to print. |
| `...` | Further arguments passed from other methods. |
Invisibly, the input is returned.
```r
fit <- lm(Sepal.Length ~ ., data = iris)
s <- kernelshap(fit, iris[1:3, -1], bg_X = iris[, -1])
s
```
Summarizes "kernelshap" Object
```r
## S3 method for class 'kernelshap'
summary(object, compact = FALSE, n = 2L, ...)
```
| Argument | Description |
|---|---|
| `object` | An object of class "kernelshap". |
| `compact` | Set to `TRUE` for a shorter summary. |
| `n` | Maximum number of rows of SHAP values etc. to print. |
| `...` | Further arguments passed from other methods. |
Invisibly, the input is returned.
```r
fit <- lm(Sepal.Length ~ ., data = iris)
s <- kernelshap(fit, iris[1:3, -1], bg_X = iris[, -1])
summary(s)
```