--- title: "Example of local variable importance" author: "Anna Kozak" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Example of local variable importance} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Example of local variable importance In this vignette, we present a local variable importance measure based on Ceteris Paribus profiles for random forest regression model. ```{r, include=FALSE, warning=FALSE, error=FALSE, message=FALSE} library("ggplot2") ``` ### 1 Dataset We work on Apartments dataset from `DALEX` package. ```{r, warning = FALSE, echo = FALSE, message = FALSE, include = TRUE} library("DALEX") data(apartments) head(apartments) ``` ### 2 Random forest regression model Now, we define a random forest regression model and use explain from `DALEX`. ```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE} library("randomForest") apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor + no.rooms, data = apartments) explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:5], y = apartmentsTest$m2.price) ``` ### 3 New observation We need to specify an observation. Let consider a new apartment with the following attributes. Moreover, we calculate predict value for this new observation. ```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE} new_apartment <- data.frame(construction.year = 1998, surface = 88, floor = 2L, no.rooms = 3) predict(apartments_rf_model, new_apartment) ``` ### 4 Calculate Ceteris Paribus profiles Let see the Ceteris Paribus Plots calculated with `DALEX::predict_profile()` function. The CP also can be calculated with `DALEX::individual_profile()` or `ingredients::ceteris_paribus()`. ```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE} library("ingredients") profiles <- predict_profile(explainer_rf, new_apartment) plot(profiles) + show_observations(profiles) ``` ### 5 Calculate measure of local variable importance Now, we calculated a measure of local variable importance via oscillation based on Ceteris Paribus profiles. We use variant with all parameters equals to TRUE. ```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE} library("vivo") measure <- local_variable_importance(profiles, apartments[,2:5], absolute_deviation = TRUE, point = TRUE, density = TRUE) ``` ```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE} plot(measure) ``` For the new observation the most important variable is surface, then floor, construction.year and no.rooms. ### 6 Comparison of two or more methods of calculating the importance of variables We calculated local variable importance for different parameters and we can plot together, on bar plot or lines plot. ```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE} measure_2 <- local_variable_importance(profiles, apartments[,2:5], absolute_deviation = FALSE, point = TRUE, density = TRUE) measure_3 <- local_variable_importance(profiles, apartments[,2:5], absolute_deviation = FALSE, point = TRUE, density = FALSE) ``` ```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE} plot(measure, measure_2, measure_3, color = "_label_method_") ``` ```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE} plot(measure, measure_2, measure_3, color = "_label_method_", type = "lines") ``` ### 7 Comparison of the importance of variables for two or more models Let created a linear regression model and `explain` object. ```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE} apartments_lm_model <- lm(m2.price ~ construction.year + surface + floor + no.rooms, data = apartments) explainer_lm <- explain(apartments_lm_model, data = apartmentsTest[,2:5], y = apartmentsTest$m2.price) ``` We calculated Ceteris Paribus profiles and measure. ```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE} profiles_lm <- predict_profile(explainer_lm, new_apartment) measure_lm <- local_variable_importance(profiles_lm, apartments[,2:5], absolute_deviation = TRUE, point = TRUE, density = TRUE) ``` ```{r, warning = FALSE, error = FALSE, message = FALSE, include = TRUE} plot(measure, measure_lm, color = "_label_model_", type = "lines") ``` Now we can see the order of importance of variables by model for selected observation.