Title: | Arena for the Exploration and Comparison of any ML Models |
---|---|
Description: | Generates data for challenging machine learning models in 'Arena' <https://arena.drwhy.ai> - an interactive web application. You can start the server with XAI (Explainable Artificial Intelligence) plots to be generated on-demand or precalculate and auto-upload data file beside shareable 'Arena' URL. |
Authors: | Piotr Piątyszek [aut, cre], Przemyslaw Biecek [aut] |
Maintainer: | Piotr Piątyszek <[email protected]> |
License: | GPL-3 |
Version: | 0.2.0 |
Built: | 2024-11-11 03:29:40 UTC |
Source: | https://github.com/modeloriented/arenar |
This is a modified version of DALEXtra::funnel_measure.
calculate_subsets_performance(
  explainer,
  score_functions = list(),
  nbins = 5,
  cutoff = 0.01,
  cutoff_name = "Other",
  factor_conversion_threshold = 7
)
explainer |
Explainer created using |
score_functions |
Named list of functions named |
nbins |
Number of quantiles (partition points) for numeric columns. When more than one quantile has the same value, there will be fewer partition points. |
cutoff |
Threshold for categorical data. Entries less frequent than specified value will be merged into one category. |
cutoff_name |
Name for the new category that arises after merging entries less frequent than |
factor_conversion_threshold |
Numeric columns with fewer unique values than the value of this parameter will be treated as factors |
Data frame with columns
Variable Name of the split variable
Label Label for variable's values subset
and one column for each score function with returned score
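The partitioning rules described by nbins, cutoff, cutoff_name and factor_conversion_threshold can be illustrated with a short base R sketch. This is not the package's implementation: the helper names split_numeric, merge_rare and treat_as_factor are hypothetical, and the exact binning used by calculate_subsets_performance may differ.

```r
# Hypothetical helper: cut a numeric vector at quantile partition points.
# Duplicate quantile values are dropped, so fewer bins may result.
# (A constant column, which yields a single break, is not handled here.)
split_numeric <- function(x, nbins = 5) {
  probs <- seq(0, 1, length.out = nbins + 1)
  breaks <- unique(quantile(x, probs = probs, names = FALSE))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

# Hypothetical helper: merge categories rarer than `cutoff` into one level.
merge_rare <- function(x, cutoff = 0.01, cutoff_name = "Other") {
  x <- as.character(x)
  freq <- table(x) / length(x)
  rare <- names(freq)[freq < cutoff]
  x[x %in% rare] <- cutoff_name
  factor(x)
}

# Assumed rule, taken from the argument description: a numeric column with
# fewer unique values than the threshold is handled as a factor.
treat_as_factor <- function(x, factor_conversion_threshold = 7) {
  is.numeric(x) && length(unique(x)) < factor_conversion_threshold
}

skewed <- c(rep(0, 8), 1, 2)                  # many tied quantiles
bins <- split_numeric(skewed, nbins = 5)      # fewer than 5 bins result
merged <- merge_rare(c(rep("a", 99), "b"), cutoff = 0.05)
levels(merged)
```

With the skewed vector, most of the requested partition points coincide at zero, which is the situation the nbins description refers to.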
Creates an object with class arena_live or arena_static, depending on the first argument.
This method is always first in the arenar workflow, and you should specify all plots' parameters there.
create_arena(
  live = FALSE,
  N = 500,
  fi_N = NULL,
  fi_B = 10,
  grid_points = 101,
  shap_B = 10,
  funnel_nbins = 5,
  funnel_cutoff = 0.01,
  funnel_factor_threshold = 7,
  fairness_cutoffs = seq(0.05, 0.95, 0.05),
  max_points_number = 150,
  distribution_bins = seq(5, 40, 5),
  enable_attributes = TRUE,
  enable_custom_params = TRUE,
  cl = NULL
)
live |
Defines if arena should start live server or generate static json |
N |
number of observations used to calculate dependence profiles |
fi_N |
number of observations used in feature importance |
fi_B |
Number of permutation rounds to perform on each variable in feature importance |
grid_points |
number of points for profile |
shap_B |
Number of random paths in SHAP |
funnel_nbins |
Number of partitions for numeric columns for funnel plot |
funnel_cutoff |
Threshold for categorical data. Entries less frequent than specified value will be merged into one category in funnel plot. |
funnel_factor_threshold |
Numeric columns with lower number of unique values than value of this parameter will be treated as factors in funnel plot. |
fairness_cutoffs |
vector of available cutoff levels for fairness panel |
max_points_number |
maximum size of sample to plot scatter plots in variable against another panel |
distribution_bins |
vector of available bins count for histogram |
enable_attributes |
Switch for generating attributes of observations and variables. It is required for custom params. Attributes can increase size of static Arena. |
enable_custom_params |
Switch allowing the user to modify observations and generate plots for them. |
cl |
Cluster used to run parallel computations (does not work in live Arena) |
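The defaults for fairness_cutoffs and distribution_bins are ordinary seq() calls, so the choices they make available can be inspected directly in base R. The snippet below only evaluates those default expressions; the coarser grid at the end is an illustrative value, not a recommendation from the package.

```r
# Default cutoff levels offered in the fairness panel
fairness_cutoffs <- seq(0.05, 0.95, 0.05)
length(fairness_cutoffs)

# Default bin counts offered for histograms
distribution_bins <- seq(5, 40, 5)
distribution_bins

# A coarser, illustrative grid that could be passed as fairness_cutoffs
coarse_cutoffs <- seq(0.1, 0.9, 0.2)
coarse_cutoffs
```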
Empty arena_static or arena_live class object.
arena_static:
explainer List of used explainers
observations_batches List of data frames added as observations
params Plots' parameters
plots_data List of generated data for plots
arena_live:
explainer List of used explainers
observations_batches List of data frames added as observations
params Plots' parameters
timestamp Timestamp of last modification
library("DALEX")
library("arenar")
library("dplyr", quietly=TRUE, warn.conflicts = FALSE)
# create a model
model <- glm(m2.price ~ ., data=apartments)
# create a DALEX explainer
explainer <- DALEX::explain(model, data=apartments, y=apartments$m2.price)
# prepare observations to be explained
observations <- apartments[1:3, ]
# rownames are used as labels for each observation
rownames(observations) <- paste0(observations$construction.year, "-", observations$surface, "m2")
# generate static arena for one model and 3 observations
arena <- create_arena(live=FALSE) %>% push_model(explainer) %>% push_observations(observations)
print(arena)
if (interactive()) upload_arena(arena)
Internal function for calculating Accumulated Dependence
get_accumulated_dependence(explainer, variable, params)
explainer |
Explainer created using |
variable |
Name of variable |
params |
Params from arena object |
Plot data in Arena's format
When param_type is not NULL, the function returns a list of objects.
Each object represents one of the available attributes for the specified param type.
The field name is the attribute name, and the field values is a list mapping each available param to the value of this attribute for that param.
When param_type is NULL, the function returns a list with a key for each param type, whose values are the lists described above.
get_attributes(arena, param_type = NULL)
arena |
live or static arena object |
param_type |
Type of param. One of
|
List of attributes or named list of lists of attributes for each param type.
Internal function for calculating Break Down
get_break_down(explainer, observation, params)
explainer |
Explainer created using |
observation |
One row data frame observation |
params |
Params from arena object |
Plot data in Arena's format
Internal function for calculating Ceteris Paribus
get_ceteris_paribus(explainer, observation, variable, params)
explainer |
Explainer created using |
observation |
One row data frame observation |
variable |
Name of variable |
params |
Params from arena object |
Plot data in Arena's format
Generates list with attributes of a dataset
get_dataset_attributes(arena, dataset)
arena |
live or static arena object |
dataset |
List with following elements
|
simple list with attributes of given dataset
Function runs all plot generating methods for given dataset
get_dataset_plots(dataset, params)
dataset |
List with following elements
|
params |
Params from arena object |
list of generated plots' data
Generates list of datasets' labels
get_datasets_list(arena)
arena |
live or static arena object |
list of datasets' labels
Internal function for calculating fairness
get_fairness(explainer, variable, params)
explainer |
Explainer created using |
variable |
Name of variable |
params |
Params from arena object |
Plot data in Arena's format
Internal function for calculating feature importance
get_feature_importance(explainer, params)
explainer |
Explainer created using |
params |
Params from arena object |
Plot data in Arena's format
Internal function for calculating funnel measure
get_funnel_measure(explainer, params)
explainer |
Explainer created using |
params |
Params from arena object |
Plot data in Arena's format
Function runs all plot generating methods for given explainer
get_global_plots(explainer, params)
explainer |
Explainer created using |
params |
Params from arena object |
list of generated plots' data
Function converts an object with class arena_live or arena_static to an object with the structure accepted by Arena. See list of schemas.
get_json_structure(arena)
arena |
live or static arena object |
Object for direct conversion into json
Function runs all plot generating methods for given observations
get_local_plots(explainer, observations, params)
explainer |
Explainer created using |
observations |
Data frame of observations |
params |
Params from arena object |
list of generated plots' data
This method modifies existing plot data in Arena's format to show a message instead of a chart.
get_message_output(output, type, msg)
output |
existing plot data to be overwritten |
type |
type of message "info" or "error" |
msg |
message to be displayed |
Plot data in Arena's format
Internal function for calculating model performance metrics
get_metrics(explainer, params)
explainer |
Explainer created using |
params |
Params from arena object |
Plot data in Arena's format
Generates list with attributes of a model
get_model_attributes(arena, explainer)
arena |
live or static arena object |
explainer |
Explainer created using |
simple list with attributes of given model
Generates list with attributes of an observation
get_observation_attributes(arena, observation)
arena |
live or static arena object |
observation |
One row data frame observation |
simple list with attributes of given observation
Generates list of rownames of each observation from each batch
get_observations_list(arena)
arena |
live or static arena object |
list of observations' names
Internal function for calculating Partial Dependence
get_partial_dependence(explainer, variable, params)
explainer |
Explainer created using |
variable |
Name of variable |
params |
Params from arena object |
Plot data in Arena's format
Internal function for calculating regression error characteristic
get_rec(explainer, params)
explainer |
Explainer created using |
params |
Params from arena object |
Plot data in Arena's format
Internal function for calculating the receiver operating characteristic (ROC) curve
get_roc(explainer, params)
explainer |
Explainer created using |
params |
Params from arena object |
Plot data in Arena's format
Internal function for calculating Shapley Values
get_shap_values(explainer, observation, params)
explainer |
Explainer created using |
observation |
One row data frame observation to calculate Shapley Values |
params |
Params from arena object |
Plot data in Arena's format
Internal function for calculating subset performance
get_subsets_performance(explainer, params)
explainer |
Explainer created using |
params |
Params from arena object |
Plot data in Arena's format
Internal function for variable against another plot
get_variable_against_another(dataset, variable, params)
dataset |
List with following elements
|
variable |
Name of primary variable |
params |
Params from arena object |
Plot data in Arena's format
Generates list with attributes of a variable
get_variable_attributes(arena, variable)
arena |
live or static arena object |
variable |
Name of variable |
simple list with attributes of given variable
Internal function for variable distribution
get_variable_distribution(dataset, variable, params)
dataset |
List with following elements
|
variable |
Name of variable |
params |
Params from arena object |
Plot data in Arena's format
Generates list of unique variables (without target) from each explainer and dataset
get_variables_list(arena)
arena |
live or static arena object |
list of variables' names
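The idea of collecting unique variable names while leaving out each target can be sketched in base R. The list structure below (elements named dataset and target) is an assumption made for illustration and is not taken from the package internals.

```r
# Hypothetical stand-ins for pushed datasets: each entry holds a data frame
# and the name of its target column (these element names are assumptions).
datasets <- list(
  list(dataset = data.frame(price = 1:3, surface = c(30, 45, 60), floor = 1:3),
       target = "price"),
  list(dataset = data.frame(status = c("a", "b"), hours = c(40, 35), floor = 2:3),
       target = "status")
)

# Collect column names per dataset, drop the target, then de-duplicate.
variables <- unique(unlist(lapply(datasets, function(d) {
  setdiff(colnames(d$dataset), d$target)
})))
variables
```

The shared column floor appears only once in the result, and neither target column is included.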
Prints live arena summary
## S3 method for class 'arena_live'
print(x, ...)
x |
|
... |
other parameters |
None
library("DALEX")
library("arenar")
library("dplyr", quietly=TRUE, warn.conflicts = FALSE)
# create a model
model <- glm(m2.price ~ ., data=apartments)
# create a DALEX explainer
explainer <- DALEX::explain(model, data=apartments, y=apartments$m2.price)
# prepare observations to be explained
observations <- apartments[1:30, ]
# rownames are used as labels for each observation
rownames(observations) <- paste0(observations$construction.year, "-", observations$surface, "m2")
# generate live arena for one model and 30 observations
arena <- create_arena(live=TRUE) %>% push_model(explainer) %>% push_observations(observations)
# print summary
print(arena)
Prints static arena summary
## S3 method for class 'arena_static'
print(x, ...)
x |
|
... |
other parameters |
None
library("DALEX")
library("arenar")
library("dplyr", quietly=TRUE, warn.conflicts = FALSE)
# create a model
model <- glm(m2.price ~ ., data=apartments)
# create a DALEX explainer
explainer <- DALEX::explain(model, data=apartments, y=apartments$m2.price)
# prepare observations to be explained
observations <- apartments[1:3, ]
# rownames are used as labels for each observation
rownames(observations) <- paste0(observations$construction.year, "-", observations$surface, "m2")
# generate static arena for one model and 3 observations
arena <- create_arena(live=FALSE) %>% push_model(explainer) %>% push_observations(observations)
# print summary
print(arena)
Adds data frame to create exploratory data analysis plots
push_dataset(arena, dataset, target, label)
arena |
live or static arena object |
dataset |
data frame used for EDA plots |
target |
name of target variable |
label |
label of dataset |
Updated arena object
library("DALEX")
library("arenar")
library("dplyr", quietly=TRUE, warn.conflicts = FALSE)
# create live arena with only one dataset
apartments <- DALEX::apartments
arena <- create_arena(live=TRUE) %>% push_dataset(apartments, "m2.price", "apartment")
print(arena)
# add another dataset
HR <- DALEX::HR
arena <- arena %>% push_dataset(HR, "status", "HR")
print(arena)
If the arena is static, it will start calculations for all already pushed
observations and global plots. If the arena is live, plots will be
calculated on demand, after calling run_server.
push_model(arena, explainer)
arena |
live or static arena object |
explainer |
Explainer created using |
Updated arena object
library("DALEX")
library("arenar")
library("dplyr", quietly=TRUE, warn.conflicts = FALSE)
# create first model
model1 <- glm(m2.price ~ ., data=apartments, family=gaussian)
# create a DALEX explainer
explainer1 <- DALEX::explain(model1, data=apartments, y=apartments$m2.price, label="GLM gaussian")
# create live arena with only one model
arena <- create_arena(live=TRUE) %>% push_model(explainer1)
print(arena)
# create and add next model
model2 <- glm(m2.price ~ ., data=apartments, family=Gamma)
explainer2 <- DALEX::explain(model2, data=apartments, y=apartments$m2.price, label="GLM gamma")
arena <- arena %>% push_model(explainer2)
print(arena)
If the arena is static, it will start calculations for all already pushed
models. If the arena is live, plots will be calculated on demand,
after calling run_server.
push_observations(arena, observations)
arena |
live or static arena object |
observations |
data frame of new observations |
Updated arena object
By default, the function opens a browser with a new arena session. Appending data to
an already existing session is also possible using the argument append_data.
run_server(
  arena,
  port = 8181,
  host = "127.0.0.1",
  open_browser = TRUE,
  append_data = FALSE,
  arena_url = "https://arena.drwhy.ai/"
)
arena |
Live arena object |
port |
server port |
host |
server ip address (hostnames do not work yet) |
open_browser |
Whether to open browser with new session |
append_data |
Whether to append data to already existing session |
arena_url |
URL of Arena dashboard instance |
Unmodified arena object
library("DALEX")
library("arenar")
library("dplyr", quietly=TRUE, warn.conflicts = FALSE)
# create a model
model <- glm(m2.price ~ ., data=apartments)
# create a DALEX explainer
explainer <- DALEX::explain(model, data=apartments, y=apartments$m2.price)
# generate live arena for one model and all data as observations
arena <- create_arena(live=TRUE) %>% push_model(explainer) %>% push_observations(apartments)
# run the server
if (interactive()) run_server(arena, port=1234)
Saves the generated JSON file from a static arena
save_arena(arena, filename = "data.json", pretty = FALSE)
arena |
Static arena object |
filename |
Name of output file |
pretty |
whether to generate pretty and easier to debug JSON |
Unmodified arena object
Splits multiclass explainer into multiple classification explainers
split_multiclass_explainer(explainer)
explainer |
Multiclass explainer created using |
list of explainers
Internal function for pretty truncating of a params list
truncate_vector(vec, size = 6)
vec |
vector to be truncated |
size |
elements with index greater than size will be truncated |
string with collapsed and truncated input vector
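A rough base R sketch of collapsing and truncating a vector is shown below. The function name truncate_sketch, the comma separator and the "..." suffix are assumptions for illustration; the string produced by truncate_vector itself may be formatted differently.

```r
# Hypothetical re-implementation for illustration only; the separator and
# the "..." suffix are assumptions, not the package's actual output format.
truncate_sketch <- function(vec, size = 6) {
  shown <- paste(head(vec, size), collapse = ", ")
  if (length(vec) > size) paste0(shown, ", ...") else shown
}

truncate_sketch(seq(5, 40, 5))          # only the first 6 elements are kept
truncate_sketch(c("a", "b"), size = 6)  # short vectors are returned whole
```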
By default, the function opens a browser with a new arena session. Appending data to
an already existing session is also possible using the argument append_data.
upload_arena(
  arena,
  open_browser = TRUE,
  append_data = FALSE,
  arena_url = "https://arena.drwhy.ai/",
  pretty = FALSE
)
arena |
Static arena object |
open_browser |
Whether to open browser with new session |
append_data |
Whether to append data to already existing session |
arena_url |
URL of Arena dashboard instance |
pretty |
whether to generate pretty and easier to debug JSON |
Unmodified arena object
Checks if it is safe to add a new dataset to the arena object
validate_new_dataset(arena, dataset, target, label)
arena |
live or static arena object |
dataset |
data frame for data analysis |
target |
name of target variable |
label |
name of dataset |
None
Function checks that the explainer's label is not already used and calls stop if there is at least one conflict.
validate_new_model(arena, explainer)
arena |
live or static arena object |
explainer |
Explainer created using |
None
Function checks that rownames are not already used and calls stop if there is at least one conflict.
validate_new_observations(arena, observations)
arena |
live or static arena object |
observations |
data frame of new observations |
None
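The rowname conflict check described for validate_new_observations can be sketched in base R as follows. The helper check_rownames and the error message are hypothetical, and batches are represented here as a plain list of data frames rather than a real arena object.

```r
# Hypothetical check: stop when any new rowname is already present in a
# previously pushed batch (batches are plain data frames here).
check_rownames <- function(batches, observations) {
  used <- unlist(lapply(batches, rownames))
  conflicts <- intersect(rownames(observations), used)
  if (length(conflicts) > 0) {
    stop("Observation names already used: ", paste(conflicts, collapse = ", "))
  }
  invisible(NULL)
}

first <- data.frame(x = 1:2, row.names = c("obs-a", "obs-b"))
second <- data.frame(x = 3:4, row.names = c("obs-c", "obs-a"))

# The first batch has nothing to conflict with.
check_rownames(list(), first)

# The second batch reuses "obs-a", so the check stops with an error.
result <- tryCatch(
  check_rownames(list(first), second),
  error = function(e) conditionMessage(e)
)
result
```

Because rownames serve as observation labels, giving every batch distinct rownames before pushing it avoids this kind of conflict.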