Package 'modelStudio' reference manual

Title:	Interactive Studio for Explanatory Model Analysis
Description:	Automate the explanatory analysis of machine learning predictive models. Generate advanced interactive model explanations in the form of a serverless HTML site with only one line of code. This tool is model-agnostic, therefore compatible with most of the black-box predictive models and frameworks. The main function computes various (instance and model-level) explanations and produces a customisable dashboard, which consists of multiple panels for plots with their short descriptions. It is possible to easily save the dashboard and share it with others. 'modelStudio' facilitates the process of Interactive Explanatory Model Analysis introduced in Baniecki et al. (2023) <doi:10.1007/s10618-023-00924-w>.
Authors:	Hubert Baniecki [aut, cre], Przemyslaw Biecek [aut], Piotr Piatyszek [ctb]
Maintainer:	Hubert Baniecki <[email protected]>
License:	GPL-3
Version:	3.1.2.9000
Built:	2025-03-23 04:40:02 UTC
Source:	https://github.com/modeloriented/modelstudio

World Happiness Report

Description

The datasets happiness_train and happiness_test contain real data from the World Happiness Reports. Happiness is scored according to economic production, social support, and similar factors. happiness_train pools the data from the years 2015-2018, while happiness_test holds the data from 2019, which imitates out-of-time validation.

Usage

data(happiness_train); data(happiness_test)

Format

happiness_train: a data frame with 625 rows and 7 columns
happiness_test: a data frame with 156 rows and 7 columns

Details

Source: World Happiness Report at Kaggle.com

The columns GDP per Capita, Social Support, Life Expectancy, Freedom, Generosity, and Corruption describe the extent to which each factor contributes to the happiness evaluation of a country. Variables:

score - target variable, continuous value between 0 and 10 (regression)
gdp_per_capita
social_support
healthy_life_expectancy
freedom_life_choices
generosity
perceptions_of_corruption
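
The out-of-time split described above can be exercised with a minimal sketch such as the following. It assumes that modelStudio is installed and that the datasets are lazy-loaded, as the manual's own examples suggest by using them directly after library("modelStudio"). The rmse() helper and the choice of a plain linear model are illustrative and not part of the package.

```r
# Hypothetical helper (not part of modelStudio): root mean squared error.
rmse <- function(y_true, y_pred) sqrt(mean((y_true - y_pred)^2))

if (requireNamespace("modelStudio", quietly = TRUE)) {
  train <- modelStudio::happiness_train  # years 2015-2018
  test  <- modelStudio::happiness_test   # year 2019

  # fit on the earlier years, evaluate on the later year
  model <- lm(score ~ ., data = train)
  pred  <- predict(model, newdata = test)

  print(rmse(test$score, pred))
}
```

Evaluating on happiness_test rather than on a random subset of happiness_train is what makes this an out-of-time check.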

Interactive Studio for Explanatory Model Analysis

Description

This function computes various (instance and dataset level) model explanations and produces a customisable dashboard, which consists of multiple panels for plots with their short descriptions. Easily save the dashboard and share it with others. Tools for Explanatory Model Analysis unite with tools for Exploratory Data Analysis to give a broad overview of the model behavior.

The extensive documentation covers:

Function parameters description - perks and features
Framework and model compatibility - R & Python examples
Theoretical introduction to the plots - Explanatory Model Analysis: Explore, Explain, and Examine Predictive Models

The displayed variable can be changed by clicking on the bars of plots or with the first dropdown list, and the observation can be changed with the second dropdown list. The dashboard gathers useful, but not sensitive, information about how it is being used (e.g. computation length, package version, dashboard dimensions). This is for development purposes only and can be blocked by setting telemetry to FALSE.
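
As a minimal sketch of opting out of the usage statistics, the call below passes telemetry = FALSE. It assumes that DALEX and modelStudio are installed. The variable name telemetry_enabled is illustrative, and the reduced N and B values only shorten the computation.

```r
# Illustrative variable: keep the telemetry decision in one visible place.
telemetry_enabled <- FALSE

if (requireNamespace("DALEX", quietly = TRUE) &&
    requireNamespace("modelStudio", quietly = TRUE)) {
  titanic <- DALEX::titanic_imputed
  model <- glm(survived ~ ., data = titanic, family = "binomial")

  explainer <- DALEX::explain(model,
                              data = titanic,
                              y = titanic$survived,
                              verbose = FALSE)

  # build the dashboard without usage statistics being gathered
  ms <- modelStudio::modelStudio(explainer,
                                 telemetry = telemetry_enabled,
                                 N = 200, B = 5,
                                 show_info = FALSE)
}
```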

Usage

modelStudio(explainer, ...)

## S3 method for class 'explainer'
modelStudio(
  explainer,
  new_observation = NULL,
  new_observation_y = NULL,
  new_observation_n = 3,
  facet_dim = c(2, 2),
  time = 500,
  max_features = 10,
  max_features_fi = NULL,
  N = 300,
  N_fi = N * 10,
  N_sv = N * 3,
  B = 10,
  B_fi = B,
  eda = TRUE,
  open_plots = c("fi"),
  show_info = TRUE,
  parallel = FALSE,
  options = ms_options(),
  viewer = "external",
  widget_id = NULL,
  license = NULL,
  telemetry = TRUE,
  max_vars = NULL,
  verbose = NULL,
  ...
)

Arguments

`explainer`	An `explainer` created with `DALEX::explain()`.
`...`	Other parameters.
`new_observation`	New observations with columns that correspond to variables used in the model.
`new_observation_y`	True label for `new_observation` (optional).
`new_observation_n`	Number of observations to be taken from the `explainer$data` if `new_observation = NULL`. See vignette.
`facet_dim`	Dimensions of the grid. Default is `c(2,2)`.
`time`	Time in ms. Set the animation length. Default is `500`.
`max_features`	Maximum number of features to be included in BD, SV, and FI plots. Default is `10`.
`max_features_fi`	Maximum number of features to be included in FI plot. Default is `max_features`.
`N`	Number of observations used for the calculation of PD and AD. Default is `300`. See vignette.
`N_fi`	Number of observations used for the calculation of FI. Default is `10*N`.
`N_sv`	Number of observations used for the calculation of SV. Default is `3*N`.
`B`	Number of permutation rounds used for calculation of SV. Default is `10`. See vignette.
`B_fi`	Number of permutation rounds used for calculation of FI. Default is `B`.
`eda`	Compute EDA plots and Residuals vs Feature plot, which adds the data to the dashboard. Default is `TRUE`.
`open_plots`	A vector listing plots to be initially opened (and on which positions). Default is `c("fi")`.
`show_info`	Print progress to the console. Default is `TRUE`.
`parallel`	Speed up the computation using `parallelMap::parallelMap()`. See vignette. This might interfere with showing progress using `show_info`.
`options`	Customize `modelStudio`. See `ms_options` and vignette.
`viewer`	Default is `external` to display in an external RStudio window. Use `browser` to display in an external browser or `internal` to use the RStudio internal viewer pane for output.
`widget_id`	Use an explicit element ID for the widget (rather than an automatically generated one). Useful e.g. when using `modelStudio` with Shiny. See vignette.
`license`	Path to the file containing the license (`con` parameter passed to `readLines()`). It can be used e.g. to include the license for `explainer$data` as a comment in the source of `.html` output file.
`telemetry`	The dashboard gathers useful, but not sensitive, information about how it is being used (e.g. computation length, package version, dashboard dimensions). This is for development purposes only and can be blocked by setting `telemetry` to `FALSE`.
`max_vars`	An alias for `max_features`. If provided, it will override the value.
`verbose`	An alias for `show_info`. If provided, it will override the value.
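
Several defaults above are derived from other arguments (`N_fi = N * 10`, `N_sv = N * 3`, `B_fi = B`, `max_features_fi = max_features`). The sketch below mirrors that part of the signature in a stand-alone function to show how the derived values follow when only `N` and `B` are changed. The function resolve_defaults() is hypothetical and is not how the package itself is implemented.

```r
# Hypothetical stand-in that copies only the sampling-related defaults
# from the modelStudio() signature shown in Usage.
resolve_defaults <- function(N = 300,
                             N_fi = N * 10,
                             N_sv = N * 3,
                             B = 10,
                             B_fi = B,
                             max_features = 10,
                             max_features_fi = NULL) {
  # documented fallback: the FI limit defaults to max_features
  if (is.null(max_features_fi)) max_features_fi <- max_features

  list(N = N, N_fi = N_fi, N_sv = N_sv,
       B = B, B_fi = B_fi,
       max_features = max_features,
       max_features_fi = max_features_fi)
}

# lowering N and B, as in the "faster example" calls, scales the rest
str(resolve_defaults(N = 200, B = 5))
```

With `N = 200` and `B = 5` the derived values are `N_fi = 2000`, `N_sv = 600`, and `B_fi = 5`. Passing `N_fi` or `B_fi` explicitly overrides the derived value.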

Value

An object of the r2d3, htmlwidget, modelStudio class.
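
Because the returned object carries the r2d3 class, one way to save the dashboard as an HTML file is `r2d3::save_d3_html()`. The wrapper below is a minimal sketch under that assumption. The name save_dashboard() and the default file name are illustrative and not part of modelStudio.

```r
# Hypothetical convenience wrapper around r2d3::save_d3_html().
save_dashboard <- function(ms, file = "modelStudio_dashboard.html") {
  # the Value section states that the object has the modelStudio class
  stopifnot(inherits(ms, "modelStudio"))

  r2d3::save_d3_html(ms, file = file)
  invisible(file)
}

# usage, assuming `ms` was created with modelStudio():
# save_dashboard(ms, "titanic_dashboard.html")
```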

References

The input object is implemented in DALEX
Feature Importance, Ceteris Paribus, Partial Dependence and Accumulated Dependence explanations are implemented in ingredients
Break Down and Shapley Values explanations are implemented in iBreakDown

Examples

library("DALEX")
library("modelStudio")

#:# ex1 classification on 'titanic' data

# fit a model
model_titanic <- glm(survived ~., data = titanic_imputed, family = "binomial")

# create an explainer for the model
explainer_titanic <- explain(model_titanic,
                             data = titanic_imputed,
                             y = titanic_imputed$survived,
                             label = "Titanic GLM")

# pick observations
new_observations <- titanic_imputed[1:2,]
rownames(new_observations) <- c("Lucas","James")

# make a studio for the model
modelStudio(explainer_titanic,
            new_observations,
            N = 200,  B = 5) # faster example



#:# ex2 regression on 'apartments' data
if (requireNamespace("ranger", quietly=TRUE)) {
  library("ranger")
  model_apartments <- ranger(m2.price ~. ,data = apartments)

  explainer_apartments <- explain(model_apartments,
                                  data = apartments,
                                  y = apartments$m2.price)

  new_apartments <- apartments[1:2,]
  rownames(new_apartments) <- c("ap1","ap2")

  # change dashboard dimensions and animation length
  modelStudio(explainer_apartments,
              new_apartments,
              facet_dim = c(2, 3),
              time = 800)

  # add information about true labels
  modelStudio(explainer_apartments,
              new_apartments,
              new_observation_y = new_apartments$m2.price)

  # don't compute EDA plots
  modelStudio(explainer_apartments,
              eda = FALSE)
}

#:# ex3 xgboost model on 'HR' dataset
if (requireNamespace("xgboost", quietly=TRUE)) {
  library("xgboost")
  HR_matrix <- model.matrix(status == "fired" ~ . -1, HR)

  # fit a model
  xgb_matrix <- xgb.DMatrix(HR_matrix, label = HR$status == "fired")
  params <- list(max_depth = 3, objective = "binary:logistic", eval_metric = "auc")
  model_HR <- xgb.train(params, xgb_matrix, nrounds = 300)

  # create an explainer for the model
  explainer_HR <- explain(model_HR,
                          data = HR_matrix,
                          y = HR$status == "fired",
                          type = "classification",
                          label = "xgboost")

  # pick observations
  new_observation <- HR_matrix[1:2, , drop=FALSE]
  rownames(new_observation) <- c("id1", "id2")

  # make a studio for the model
  modelStudio(explainer_HR,
              new_observation)
}



Merge the observations of modelStudio objects

Description

This function merges local explanations from multiple modelStudio objects into one.

Usage

ms_merge_observations(...)

Arguments

`...`	`modelStudio` objects created with `modelStudio()`.

Value

An object of the r2d3, htmlwidget, modelStudio class.

References

The input object is implemented in DALEX
Feature Importance, Ceteris Paribus, Partial Dependence and Accumulated Dependence explanations are implemented in ingredients
Break Down and Shapley Values explanations are implemented in iBreakDown

Examples


library("DALEX")
library("modelStudio")

# fit a model
model_happiness <- glm(score ~., data = happiness_train)

# create an explainer for the model
explainer_happiness <- explain(model_happiness,
                               data = happiness_test,
                               y = happiness_test$score)

# make studios for the model
ms1 <- modelStudio(explainer_happiness,
                   N = 200,  B = 5)

ms2 <- modelStudio(explainer_happiness,
                   new_observation = head(happiness_test, 3),
                   N = 200,  B = 5)

# merge 
ms <- ms_merge_observations(ms1, ms2)
ms



Modify default options and pass them to modelStudio

Description

This function returns default options for modelStudio. It is possible to modify values of this list and pass it to the options parameter in the main function. WARNING: Editing default options may cause unintended behavior.

Usage

ms_options(...)

Arguments

`...`	Options to change in the form `option_name = value`.

Value

A list of options for modelStudio.

Options

Main options:

scale_plot: TRUE Makes every plot the same height, ignores bar_width.
show_boxplot: TRUE Display boxplots in Feature Importance and Shapley Values plots.
show_subtitle: TRUE Should the subtitle be displayed?
subtitle: label parameter from explainer.
ms_title: Title of the dashboard.
ms_subtitle: Subtitle of the dashboard (makes space between the title and line).
ms_margin_*: Dashboard margins. Change margin_top for more ms_subtitle space.
margin_*: Plot margins. Change margin_left for longer/shorter axis labels.
w: 420 in px. Inner plot width.
h: 280 in px. Inner plot height.
bar_width: 16 in px. Default width of bars for all plots, ignored when scale_plot = TRUE.
line_size: 2 in px. Default width of lines for all plots.
point_size: 3 in px. Default point radius for all plots.
[bar,line,point]_color: [#46bac2,#46bac2,#371ea3]
positive_color: #8bdcbe for Break Down and Shapley Values bars.
negative_color: #f05a71 for Break Down and Shapley Values bars.
default_color: #371ea3 for Break Down bar and highlighted line.

Plot-specific options:

** is a two-letter code unique to each plot, one of [bd,sv,cp,fi,pd,ad,rv,fd,tv,at].

**_title: Plot-specific title. Default varies.
**_subtitle: Plot-specific subtitle. Default is subtitle.
**_axis_title: Plot-specific axis title. Default varies.
**_bar_width: Plot-specific width of bars. Default is bar_width, ignored when scale_plot = TRUE.
**_line_size: Plot-specific width of lines. Default is line_size.
**_point_size: Plot-specific point radius. Default is point_size.
**_*_color: Plot-specific [bar,line,point] color. Default is [bar,line,point]_color.
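
The plot-specific names follow the pattern of a two-letter code, an underscore, and the option suffix. A minimal sketch of building several such options at once and handing them to ms_options() could look as follows. The helper plot_option() is hypothetical, and the sketch assumes that modelStudio is installed for the final call.

```r
# Hypothetical helper: name one value after several plot codes,
# e.g. list(cp_line_size = 4, pd_line_size = 4, ad_line_size = 4).
plot_option <- function(codes, suffix, value) {
  stats::setNames(rep(list(value), length(codes)),
                  paste0(codes, "_", suffix))
}

# thicker lines only for the profile plots
line_opts <- plot_option(c("cp", "pd", "ad"), "line_size", 4)
print(names(line_opts))

if (requireNamespace("modelStudio", quietly = TRUE)) {
  # the result can be passed to modelStudio() via its options argument
  new_options <- do.call(modelStudio::ms_options, line_opts)
}
```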

References

The input object is implemented in DALEX
Feature Importance, Ceteris Paribus, Partial Dependence and Accumulated Dependence explanations are implemented in ingredients
Break Down and Shapley Values explanations are implemented in iBreakDown

Examples

library("DALEX")
library("modelStudio")

# fit a model
model_apartments <- glm(m2.price ~. , data = apartments)

# create an explainer for the model
explainer_apartments <- explain(model_apartments,
                                data = apartments,
                                y = apartments$m2.price)

# pick observations
new_observation <- apartments[1:2,]
rownames(new_observation) <- c("ap1","ap2")

# modify default options
new_options <- ms_options(
  show_subtitle = TRUE,
  bd_subtitle = "Hello World",
  line_size = 5,
  point_size = 9,
  line_color = "pink",
  point_color = "purple",
  bd_positive_color = "yellow",
  bd_negative_color = "orange"
)

# make a studio for the model
modelStudio(explainer_apartments,
            new_observation,
            options = new_options,
            N = 200,  B = 5) # faster example


Update the observations of a modelStudio object

Description

This function calculates local explanations on new observations and adds them to the modelStudio object.

Usage

ms_update_observations(
  object,
  explainer,
  new_observation = NULL,
  new_observation_y = NULL,
  max_features = 10,
  B = 10,
  show_info = TRUE,
  parallel = FALSE,
  widget_id = NULL,
  overwrite = FALSE,
  ...
)

Arguments

`object`	A `modelStudio` created with `modelStudio()`.
`explainer`	An `explainer` created with `DALEX::explain()`.
`new_observation`	New observations with columns that correspond to variables used in the model.
`new_observation_y`	True label for `new_observation` (optional).
`max_features`	Maximum number of features to be included in BD and SV plots. Default is `10`.
`B`	Number of permutation rounds used for calculation of SV and FI. Default is `10`. See vignette.
`show_info`	Print progress to the console. Default is `TRUE`.
`parallel`	Speed up the computation using `parallelMap::parallelMap()`. See vignette. This might interfere with showing progress using `show_info`.
`widget_id`	Use an explicit element ID for the widget (rather than an automatically generated one). Useful e.g. when using `modelStudio` with Shiny. See vignette.
`overwrite`	Overwrite existing observations and their explanations. Default is `FALSE` which means add new observations to the existing ones.
`...`	Other parameters.

Value

An object of the r2d3, htmlwidget, modelStudio class.

References

The input object is implemented in DALEX
Feature Importance, Ceteris Paribus, Partial Dependence and Accumulated Dependence explanations are implemented in ingredients
Break Down and Shapley Values explanations are implemented in iBreakDown

Examples

library("DALEX")
library("modelStudio")

# fit a model
model_titanic <- glm(survived ~., data = titanic_imputed, family = "binomial")

# create an explainer for the model
explainer_titanic <- explain(model_titanic,
                             data = titanic_imputed,
                             y = titanic_imputed$survived)

# make a studio for the model
ms <- modelStudio(explainer_titanic,
                  N = 200,  B = 5) # faster example



# add new observations
ms <- ms_update_observations(ms,
                             explainer_titanic,
                             new_observation = titanic_imputed[100:101,],
                             new_observation_y = titanic_imputed$survived[100:101])
ms



# overwrite the observations with new ones
ms <- ms_update_observations(ms,
                             explainer_titanic,
                             new_observation = titanic_imputed[100:101,],
                             overwrite = TRUE)
ms




Update the options of a modelStudio object

Description

This function updates the options of a modelStudio object. WARNING: Editing default options may cause unintended behavior.

Usage

ms_update_options(object, ...)

Arguments

`object`	A `modelStudio` created with `modelStudio()`.
`...`	Options to change in the form `option_name = value`, e.g. `time = 0`, `facet_dim = c(1,2)`.

Value

An object of the r2d3, htmlwidget, modelStudio class.

Options

Main options:

scale_plot: TRUE Makes every plot the same height, ignores bar_width.
show_boxplot: TRUE Display boxplots in Feature Importance and Shapley Values plots.
show_subtitle: TRUE Should the subtitle be displayed?
subtitle: label parameter from explainer.
ms_title: Title of the dashboard.
ms_subtitle: Subtitle of the dashboard (makes space between the title and line).
ms_margin_*: Dashboard margins. Change margin_top for more ms_subtitle space.
margin_*: Plot margins. Change margin_left for longer/shorter axis labels.
w: 420 in px. Inner plot width.
h: 280 in px. Inner plot height.
bar_width: 16 in px. Default width of bars for all plots, ignored when scale_plot = TRUE.
line_size: 2 in px. Default width of lines for all plots.
point_size: 3 in px. Default point radius for all plots.
[bar,line,point]_color: [#46bac2,#46bac2,#371ea3]
positive_color: #8bdcbe for Break Down and Shapley Values bars.
negative_color: #f05a71 for Break Down and Shapley Values bars.
default_color: #371ea3 for Break Down bar and highlighted line.

Plot-specific options:

** is a two-letter code unique to each plot, one of [bd,sv,cp,fi,pd,ad,rv,fd,tv,at].

**_title: Plot-specific title. Default varies.
**_subtitle: Plot-specific subtitle. Default is subtitle.
**_axis_title: Plot-specific axis title. Default varies.
**_bar_width: Plot-specific width of bars. Default is bar_width, ignored when scale_plot = TRUE.
**_line_size: Plot-specific width of lines. Default is line_size.
**_point_size: Plot-specific point radius. Default is point_size.
**_*_color: Plot-specific [bar,line,point] color. Default is [bar,line,point]_color.

References

The input object is implemented in DALEX
Feature Importance, Ceteris Paribus, Partial Dependence and Accumulated Dependence explanations are implemented in ingredients
Break Down and Shapley Values explanations are implemented in iBreakDown

Examples

library("DALEX")
library("modelStudio")

# fit a model
model_titanic <- glm(survived ~., data = titanic_imputed, family = "binomial")

# create an explainer for the model
explainer_titanic <- explain(model_titanic,
                             data = titanic_imputed,
                             y = titanic_imputed$survived)

# make a studio for the model
ms <- modelStudio(explainer_titanic,
                  N = 200,  B = 5) # faster example

# update the options
new_ms <- ms_update_options(ms,
                            time = 0,
                            facet_dim = c(1,2),
                            margin_left = 150)
new_ms

