The survex
package
strives to include the functionality for the automatic creation of
explainers for as many survival models as possible. However, it is
impossible to include them all because of the great variety of available
packages and models. We provide the functionality to create an explainer
for any survival model manually.
In the best case, creating an explainer for your desired model is
already implemented. This means that everything can be extracted from
the model object (sometimes, you need to set additional parameters). For
example, this is the case for the proportional hazards model from the
survival
package:
library(survex)
library(survival)
cph <- coxph(Surv(time, status) ~ ., data = veteran, model = TRUE, x = TRUE)
auto_cph_explainer <- explain(cph)
#> Preparation of a new explainer is initiated
#> -> model label : coxph ( [33m default [39m )
#> -> data : 137 rows 6 cols ( extracted from the model )
#> -> target variable : 137 values ( 128 events and 9 censored , censoring rate = 0.066 ) ( extracted from the model )
#> -> times : 50 unique time points , min = 1.5 , median survival time = 80 , max = 999
#> -> times : ( generated from y as uniformly distributed survival quantiles based on Kaplan-Meier estimator )
#> -> predict function : predict.coxph with type = 'risk' will be used ( [33m default [39m )
#> -> predict survival function : predictSurvProb.coxph will be used ( [33m default [39m )
#> -> predict cumulative hazard function : -log(predict_survival_function) will be used ( [33m default [39m )
#> -> model_info : package survival , ver. 3.7.0 , task survival ( [33m default [39m )
#> A new explainer has been created!
It can be seen that the only required parameter of the
explain()
function is the proportional hazards model
itself. However, we needed to set model = TRUE, x = TRUE
while creating it. If you forget to set these arguments, an error will
be shown.
The next base case is when all types of predictions can be made using
your desired model. It is then possible to manually create an explainer
using the explain_survival()
function. Let’s look at an
example - how to set all parameters of a coxph
explainer by
hand:
cph <- coxph(Surv(time, status) ~ ., data=veteran)
# data must not include the target columns
veteran_data <- veteran[, -c(3,4)]
veteran_y <- Surv(veteran$time, veteran$status)
# set prediction functions of the required format
risk_pred <- function(model, newdata) predict(model, newdata, type = "risk")
surv_pred <- function(model, newdata, times) pec::predictSurvProb(model, newdata, times)
chf_pred <- function(model, newdata, times) -log(surv_pred(model, newdata, times))
manual_cph_explainer <- explain_survival(model = cph,
data = veteran_data,
y = veteran_y,
predict_function = risk_pred,
predict_survival_function = surv_pred,
predict_cumulative_hazard_function = chf_pred,
label="manual coxph")
#> Preparation of a new explainer is initiated
#> -> model label : manual coxph
#> -> data : 137 rows 6 cols
#> -> target variable : 137 values ( 128 events and 9 censored )
#> -> times : 50 unique time points , min = 1.5 , median survival time = 80 , max = 999
#> -> times : ( generated from y as uniformly distributed survival quantiles based on Kaplan-Meier estimator )
#> -> predict function : risk_pred
#> -> predict survival function : surv_pred
#> -> predict cumulative hazard function : chf_pred
#> -> model_info : package survival , ver. 3.7.0 , task survival ( [33m default [39m )
#> A new explainer has been created!
Sometimes, the survival model provides only one type of prediction, for example, the survival function. This package provides helpful utilities for converting between types of prediction and standardizing them for explainer creation.
For example, the random survival forest from the
randomForestSRC
package only provides the survival function
predictions at the unique times from the training data set
(predict.rsfrc()$survival
). Let’s try to create an
explainer for it using the utility function for converting this type of
prediction to a step function that can be used for prediction at any
time points.
surv_pred_rsf <- transform_to_stepfunction(predict,
type="survival",
prediction_element = "survival",
times_element = "time.interest")
Because the predict.rsfrc()
returns a list with the
times at which the survival function was evaluated, we provide the name
of the list element which contains the prediction and the evaluation
times. If in your case, the prediction function returns only a matrix,
you can pass the times at which the predictions were evaluated in the
eval_times
argument.
We could use the same utility to get the cumulative hazard predictions (commented code below). Still, to demonstrate another function, we make use of the mathematical relationship between the survival function (Ŝ) and the cumulative hazard function (Ĥ), i.e., Ĥ = exp (−Ŝ).
# would also work
# chf_pred_rsf <- transform_to_stepfunction(predict,
# type="chf",
# prediction_element = "chf",
# times_element = "time.interest")
chf_pred_rsf <- function(model, newdata, times) {
survival_to_cumulative_hazard(surv_pred_rsf(model, newdata, times))
}
The reverse utility (cumulative_hazard_to_survival()
)
also exists.
If no risk prediction is provided for your model, you can use a utility to sum the cumulative hazard function for each observation to achieve risk scores. This approach is recommended by Ishwaran et al. (1) and Sonabend et al. (2).
Now that we have all predictions prepared, we can finally create the explainer:
library(randomForestSRC)
#>
#> randomForestSRC 3.3.1
#>
#> Type rfsrc.news() to see new features, changes, and bug fixes.
#>
rsf <- rfsrc(Surv(time, status) ~ ., data = veteran)
manual_rsf_explainer <- explain_survival(model = rsf,
data = veteran_data,
y = veteran_y,
predict_function = risk_pred_rsf,
predict_survival_function = surv_pred_rsf,
predict_cumulative_hazard_function = chf_pred_rsf,
label = "manual rsf")
#> Preparation of a new explainer is initiated
#> -> model label : manual rsf
#> -> data : 137 rows 6 cols
#> -> target variable : 137 values ( 128 events and 9 censored )
#> -> times : 50 unique time points , min = 1.5 , median survival time = 80 , max = 999
#> -> times : ( generated from y as uniformly distributed survival quantiles based on Kaplan-Meier estimator )
#> -> predict function : risk_pred_rsf
#> -> predict survival function : surv_pred_rsf
#> -> predict cumulative hazard function : chf_pred_rsf
#> -> model_info : package randomForestSRC , ver. 3.3.1 , task survival ( [33m default [39m )
#> A new explainer has been created!