1 Conformal prediction in regression

In this tutorial, we demonstrate how to use the probably package (Kuhn, Vaughan, and Ruiz 2024) to compute uncertainty intervals via conformal prediction for a model that predicts a continuous response from a number of covariates. Specifically, we show how to use the probably package to compute intervals with the split conformal prediction algorithm proposed by Lei et al. (2018) and with the conformalized quantile regression algorithm of Romano, Patterson, and Candes (2019). In addition, we highlight some important features of the available implementations as well as some of their limitations.

1.1 Split conformal prediction for regression of a continuous response

A common setting is the regression of a continuous response on given covariates; the simple linear regression model and its variants are examples of this setting. In this case, we have an available data set consisting of pairs \((y_i,\textbf{x}_i)\), where \(y_i\) denotes the value of the response of interest for the \(i\)-th observation and \(\textbf{x}_i\) the values of the covariates associated with it. It is assumed that the response of interest, a random variable \(Y\), takes values in the real numbers or, more generally, in a set \(\textbf{Y}\), while the predictors take values in a set \(\textbf{X}\). For this tutorial, we will denote each pair of observations \((y_i,\textbf{x}_i)\) as \(z_i\), and write \(\textbf{Z} = \textbf{Y}\times\textbf{X}\) so that \(z_i\in\textbf{Z}\). The goal of conformal prediction is to construct prediction sets, \(\Gamma^{\alpha}\), for \(Y_{n+1}\in \textbf{Y}\) that satisfy coverage guarantees of the form \(P(Y_{n+1}\in \Gamma^{\alpha})\geq 1-\alpha\), for a significance level \(\alpha\in(0,1)\). According to Vovk, Gammerman, and Shafer (2022), these prediction sets are nested, in the sense that if \(\alpha_1 > \alpha_2\) then \(\Gamma^{\alpha_1}\subseteq \Gamma^{\alpha_2}\).

In general, conformal prediction methods are very flexible and can be wrapped around almost any model or algorithm; there is no restriction on the model's form, but in the classical setting it is required to be symmetric in the data, meaning that whatever permutation of the training data is passed to it, it should output the same results. In addition, conformal prediction relies on scoring potential values for \(Y_{n+1}\) by means of a nonconformity measure. Formally, a nonconformity measure is any function \(S:\textbf{Z}^{(*)}\times \textbf{Z}\to [0,\infty]\) that quantifies how different an example from the possible values in the set \(\textbf{Z}\) is from the available examples in the set \(\textbf{Z}^{(*)}\). From the nonconformity measure, nonconformity scores are obtained; the nonconformity score for each observation \(Z_i\) is defined as \(V_i := S(\textbf{Z}^{(*)}, Z_i)\), and it is implicit that there is an underlying prediction algorithm that has been trained using the available examples in \(\textbf{Z}^{(*)}\).

A critical quantity derived from the nonconformity scores is the p-value of each available example \(Z_i\). These p-values play a central role, as they determine which values for \(Z_{n+1}\) should be included in the prediction set. The particular notion of p-value used in conformal prediction depends on the nonconformity scores, and the p-value for the available example \(Z_i\) is defined as

\[\begin{equation} p_i = \dfrac{\lvert \{j = 1,\ldots, n: V_j \geq V_i\} \rvert}{n}. \label{eq:cp:generalpval} \end{equation}\]

For given new values of the covariates \(\textbf{x}_{n+1}\), the pair \(z_{y}:=(y,\textbf{x}_{n+1})\), for some \(y\in\textbf{Y}\), denotes a possible combination of the value \(y\) with the given covariates, where the word possible only means that \(y\in\textbf{Y}\). The nonconformity score for this possible combination will be denoted as \(V_{n+1}^y = S(\textbf{Z}^{(*)}, z_y)\). In this setting, the p-value for a possible combination can be defined in a similar fashion, where the candidate's own score is included in the count:

\[\begin{equation} p_{n+1}^{y} = \dfrac{\lvert \{j = 1,\ldots, n: V_j \geq V_{n+1}^y\} \rvert + 1}{n+1}. \label{eq:cp:pvalnewx} \end{equation}\]

Prediction sets are then constructed from the p-values obtained for each value \(y\in\textbf{Y}\). For split conformal prediction, the original data set must be split into two disjoint sets, a proper training set and a calibration set, which will be denoted as \(\textbf{Z}_{\text{Train}}^{(*)}\) and \(\textbf{Z}_{\text{Cal}}^{(*)}\), with respective sample sizes \(l\) and \(m\) such that \(l + m = n\).

The former is used for training the algorithm of interest, while the latter is used to compute the p-values. An explicit form for the construction of the prediction set using split conformal prediction is the following

\[\begin{equation} \Gamma^{\alpha} := \left\{ y \in \textbf{Y}: p_{m+1}^{y}>\alpha \right\}, \label{eq:fcp:predset} \end{equation}\]

where the p-values \(p_{m+1}^{y}\) are computed as

\[p_{m+1}^{y} = \dfrac{\lvert \{j = 1,\ldots, m: V_j \geq V_{m+1}^y\} \rvert + 1}{m+1},\] and \[V_{j} = S(\textbf{Z}_\text{Train}^{(*)}, z_j), \quad\text{for }z_j\in \textbf{Z}_{\text{Cal}}^{(*)}.\]

A more practical and convenient expression for the construction of the prediction interval can be obtained in terms of the empirical quantiles induced by the nonconformity scores on the calibration set; this equivalent form is the following

\[\begin{equation} \Gamma^{\alpha} = \left\{ y\in\textbf{Y}: V_{m+1}^y\leq \text{Quantile}(1-\alpha; \{V_j\}_{j=1}^{m}) \right\}, \end{equation}\]

which is particularly useful when implementing it in programming languages and studying its theoretical properties.

The default nonconformity score in Lei et al. (2018) is the absolute residual, defined as

\[ S(Y,\hat{Y}) := \lvert Y - \hat{Y} \rvert, \]

although other measures could be used.

Prediction intervals obtained from this nonconformity score have the easy-to-compute form

\[ [\hat{Y}_{n+1}-\hat{Q}_\alpha, \hat{Y}_{n+1}+\hat{Q}_\alpha], \]

where \(\hat{Q}_\alpha\) is the \(\lceil (m+1)(1-\alpha) \rceil\)-th smallest nonconformity score on the calibration set.
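To make the recipe concrete, the following self-contained base R sketch (with a synthetic data set and a plain lm() fit, both chosen purely for illustration) implements the interval above from scratch.

set.seed(1)
# Synthetic data for illustration: a noisy quadratic relationship
n <- 2000
x <- runif(n, 0, 10)
y <- 1 + 2 * x + 0.5 * x^2 + rnorm(n, sd = 3)
# Split into a proper training set and a calibration set
idx <- sample(n, size = n / 2)
train <- data.frame(x = x[idx], y = y[idx])
cal <- data.frame(x = x[-idx], y = y[-idx])
# Fit the model on the proper training set only
fit <- lm(y ~ x + I(x^2), data = train)
# Nonconformity scores on the calibration set: absolute residuals
scores <- abs(cal$y - predict(fit, cal))
# The ceiling((m + 1) * (1 - alpha))-th smallest score
alpha <- 0.05
m <- nrow(cal)
qhat <- sort(scores)[ceiling((m + 1) * (1 - alpha))]
# Split conformal interval for a new covariate value
xnew <- data.frame(x = 5)
yhat <- unname(predict(fit, xnew))
c(lower = yhat - qhat, upper = yhat + qhat)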

1.2 Conformalized Quantile Regression

Note from the form of the prediction interval above that its width is constant and does not depend on the new covariate values \(\textbf{x}_{n+1}\). Romano, Patterson, and Candes (2019) argued that this might not be a desirable feature, since the response may exhibit more variability in some regions of the covariate space than in others. To address this problem, they proposed Conformalized Quantile Regression (CQR).

The idea is to use quantile regression to directly estimate the prediction bands at the desired confidence level, and then to correct those bands using the data in a calibration set and a custom nonconformity score.

The nonconformity score for CQR is designed to account for both undercoverage and overcoverage, and it has the form

\[ S(Y_i, \hat{Q}_{\alpha_\text{lo}}, \hat{Q}_{\alpha_\text{hi}}) := \max\{\hat{Q}_{\alpha_\text{lo}}(X_i) - Y_i, Y_i - \hat{Q}_{\alpha_\text{hi}}(X_i)\}, \]

where \(\hat{Q}_{\alpha_\text{lo}}(X_i)\) and \(\hat{Q}_{\alpha_\text{hi}}(X_i)\) are the estimated lower and upper quantiles from the chosen quantile regression method. The prediction band is constructed as

\[ \left[\hat{Q}_{\alpha_\text{lo}}(X_{n+1}) - \hat{Q}_{\text{Cal}}, \hat{Q}_{\alpha_\text{hi}}(X_{n+1}) + \hat{Q}_{\text{Cal}}\right], \]

where \(\hat{Q}_{\text{Cal}}\) is the \(\lceil (m+1)(1-\alpha) \rceil\)-th smallest nonconformity score in the calibration set.
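To make the correction step concrete, here is a minimal base R sketch; the function name cqr_band and its arguments are our own, and the lower and upper quantile predictions are assumed to come from an already fitted quantile regression method.

cqr_band <- function(y_cal, lo_cal, hi_cal, lo_new, hi_new, alpha = 0.05) {
  # Nonconformity scores: positive when y_cal falls outside the quantile band
  scores <- pmax(lo_cal - y_cal, y_cal - hi_cal)
  m <- length(scores)
  # Finite-sample corrected quantile of the calibration scores
  qcal <- sort(scores)[ceiling((m + 1) * (1 - alpha))]
  # Widen the band by the calibrated amount (qcal can be negative,
  # in which case the band shrinks)
  data.frame(lower = lo_new - qcal, upper = hi_new + qcal)
}

Note that the correction \(\hat{Q}_{\text{Cal}}\) can be negative, in which case the quantile band is shrunk rather than widened.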

1.3 Coverage guarantees of split conformal prediction

One of the attractive features of CP is that the coverage guarantees hold in finite samples: achieving the desired coverage does not rely on any assumption of a growing sample size. In the context of regression, Lei et al. (2018) derived lower and upper bounds on the coverage achieved by conformal prediction methods; of particular interest is the following result.

Result. Under the exchangeability assumption, the split conformal prediction interval \(\Gamma^{\alpha}\) is a conservatively valid prediction interval for any significance level \(\alpha\), i.e., \(P(Y_{n+1}\in \Gamma^{\alpha})\geq 1-\alpha\). Furthermore, \(P(Y_{n+1}\in \Gamma^{\alpha})\leq 1-\alpha + \dfrac{1}{m+1}\), where \(m\) is the sample size of the calibration set.

From this result, we can obtain the following lower bound for the sample size of the calibration set

\[ m>\dfrac{\alpha + \nu}{\alpha + \nu -1}, \]

where \(1-\alpha<\nu\leq1\) is the desired upper bound for the probability of coverage \(P(Y_{n+1}\in \Gamma^{\alpha})\).
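For example, for 95% prediction intervals (\(\alpha = 0.05\)) and a desired upper bound of \(\nu = 0.9501\) on the coverage probability, the bound can be evaluated directly in R:

alpha <- 0.05
nu <- 0.9501
# Minimum calibration sample size implied by the bound above
(alpha + nu) / (alpha + nu - 1)
## [1] 10001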

1.4 Available implementations of split conformal prediction in the probably package

The method introduced by Lei et al. (2018) for the regression setting has been implemented in R for some types of regression models, such as multiple linear regression and regression with ridge-, lasso-, and elastic-net-type penalization. Unfortunately, these implementations are not widely available to R users via CRAN. However, the probably package (Kuhn, Vaughan, and Ruiz 2024), which implements their methods, is available on CRAN. In addition, the Conformalized Quantile Regression method of Romano, Patterson, and Candes (2019) is also available in this package.

The advantage of using this package, compared to installing the implementations by the original authors, is that the class of models to which the method can be applied is wider due to the integration with the parsnip package (Kuhn and Vaughan 2024). This package implements wrappers to fit models from a wide variety of packages using a common syntax, storing the results in objects with a common class and making results of interest, such as parameter estimates, predictions, confidence intervals, and prediction intervals (if available), accessible to the user through common functions for all models via the tidymodels ecosystem (Kuhn and Wickham 2020a), which defines classes for each of the previous tasks and quantities of interest. A potential drawback is that not every model from every package on CRAN is available via parsnip; however, this can be resolved by constructing the corresponding wrappers using the developer features of parsnip, and the class of available models is still large.

Other limitations are the restriction to particular nonconformity measures, namely the absolute residual for the general split conformal prediction algorithm, and to a fixed regression algorithm for conformalized quantile regression. In the latter case, the available quantile regression algorithm is restricted to the class of quantile regression random forest (QRRF) models, which are of the form

\[\begin{equation*} \begin{split} P(Y \leq Q_{\tau}( Y \lvert X)\lvert X) = \tau,& \\ Q_{\tau}(Y \lvert X) = \sum_{i=1}^{m}\sum_{j=1}^{k_i}b^i_j(X)\beta_j^i,& \end{split} \end{equation*}\]

where \(\tau\) is the quantile level and \(b_j^{i}(\cdot)\) is the \(j\)-th basis function of the \(i\)-th tree in the random forest model. For each quantile level of interest, say 2.5% and 97.5% for a 95% confidence prediction interval, the quantile functions are obtained by fitting the above model. As the CQR algorithm of Romano, Patterson, and Candes (2019) requires fitted quantile functions, these are used to compute the nonconformity scores on the calibration set, and the resulting bands are similar to those from the QRRF but corrected using the corresponding quantile of the nonconformity scores.
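For readers who want to see the unconformalized step in isolation, the following sketch fits a QRRF directly with the quantregForest package, whose predict(..., what = ...) interface also appears later in this tutorial; the synthetic data are our own choice for illustration.

library(quantregForest)
set.seed(1)
# Synthetic data for illustration, with noise that grows with x
x <- runif(1000, 0, 10)
y <- sin(x) + rnorm(1000, sd = 0.1 + 0.05 * x)
qrf <- quantregForest(x = data.frame(x = x), y = y)
# Estimated 2.5% and 97.5% conditional quantiles at new covariate values
predict(qrf, newdata = data.frame(x = c(2, 5, 8)), what = c(0.025, 0.975))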

1.5 Outline for the remaining sections of the tutorial

In the remainder of the tutorial, we describe the data set used in the examples, how to produce data splits using the rsample package (Frick et al. 2024a), and how to fit a model to predict the response variable with 95% confidence intervals using the parsnip package. In our examples, we use the package infrastructure provided by the tidymodels collection of packages (Kuhn and Wickham 2020b), and therefore a brief introduction to it is provided. Then, we show how to use the probably package to compute uncertainty intervals. Finally, we provide code to visualize the uncertainty intervals and to compute the empirical coverage of the intervals obtained.

2 Conformal prediction for SCADA data

In our examples, we use the SCADA data set obtained from Kaggle (“Wind Turbine Scada Dataset,” n.d.). SCADA stands for Supervisory Control and Data Acquisition; such systems are widely employed in the monitoring of wind energy production. The data consist of 39692 records of electricity production (kW), wind speed (m/s), theoretical electricity production (kWh), and wind direction, collected from a wind turbine's SCADA system in Turkey. Our interest is to predict the production of a wind turbine using wind speed as a covariate. This type of data set poses challenges for modeling, as failures may introduce anomalous records that deviate from the usual behaviour. Therefore, the models used for prediction must be robust and should be able to quantify uncertainty in such challenging scenarios, where outliers might impact the quality of the prediction bands.

2.1 Data preparation

First, we load the data and inspect its first rows. Then, we create a data frame d to use in our analysis, with variable y containing the active power (active_power) and variable x containing the covariate wind speed (wind_speed). We produce a summary of d and observe that active power (y) ranges from 0 to 3618.7 (kW), and wind speed ranges from 1.2 to 25.2 (m/s).

library(here)
ds.path <- here("data", "scada.csv")
ds <- read.csv(ds.path)
head(ds)
##   active_power wind_speed theoretical_power wind_direction
## 1     380.0478   5.311336          416.3289       259.9949
## 2     453.7692   5.672167          519.9175       268.6411
## 3     306.3766   5.216037          390.9000       272.5648
## 4     419.6459   5.659674          516.1276       271.2581
## 5     380.6507   5.577941          491.7030       265.6743
## 6     402.3920   5.604052          499.4364       264.5786
d <- data.frame(y = ds$active_power, x = ds$wind_speed)
summary(d)
##        y                x         
##  Min.   :   0.0   Min.   : 1.209  
##  1st Qu.: 481.7   1st Qu.: 5.912  
##  Median :1394.0   Median : 8.114  
##  Mean   :1664.8   Mean   : 8.770  
##  3rd Qu.:2908.2   3rd Qu.:11.100  
##  Max.   :3618.7   Max.   :25.206

2.2 Splitting data

For split conformal prediction, we need to partition the data into two sets, namely, a proper training set and a calibration set. For demonstration purposes, we also create a testing set so that we can evaluate the performance of the prediction bands obtained with each method, but this is not a mandatory step.

To choose the sample size in each split, we can use the lower bound for the sample size of the proper calibration set provided in section 1.3.

Note that there is no rule for selecting the upper bound for the coverage probability; however, it is desirable for it to be as close to the exact coverage as the available sample size permits. For example, one could allow a theoretical upper bound of only 0.9501 to avoid obtaining overly conservative intervals. There is always a trade-off between the sizes of the training and calibration sets: a bigger training set might provide more information about the relationship between the response and the predictors, while a bigger calibration set provides more information about how future observations behave. For the data at hand, we expect around 60% of the observations to provide enough information to capture well the relationship between the variables of interest, and we set the upper bound of the coverage probability to \(0.9501\).

An example of a function to compute the percentage for the calibration set is presented below. The function takes as a constraint an assignment of at least 50% (train_size = 0.5) of the data to the training set; if the suggested calibration sample size is bigger than what this allows, the calibration percentage is automatically capped. The argument conf_level must be set to the desired confidence level of the prediction bands, for example 0.95 for 95% confidence prediction bands. The upper_bound argument must equal the desired theoretical upper bound on the coverage probability, and sample_size must equal the sample size of the whole data set.

percentage_cal <- function(conf_level, upper_bound, train_size = 0.5, sample_size){
  # Significance level implied by the desired confidence level
  signif_level <- 1 - conf_level
  s <- signif_level + upper_bound
  # Lower bound on the calibration sample size (see section 1.3)
  ss <- floor(s / (s - 1))
  percent <- ss / sample_size
  # Cap the calibration fraction so at least train_size remains for training
  if(percent > 1 - train_size){
    percent <- 1 - train_size
    warning("Desired upper bound is not achievable without reducing training size. Percentage of calibration set has been set to 100*(1-train_size)%.")
  }
  return(percent)
}

For our example, we would like the theoretical coverage not to exceed 0.9501 and we want 95% confidence prediction intervals; we can then use the previous function as follows.

percentage_cal(conf_level = 0.95, upper_bound = 0.9501, sample_size = 39692)
## [1] 0.2519651

The partitions can be obtained using the rsample package (Frick et al. 2024b), which provides functions to create different types of (re)samples of the data and corresponding classes for their analysis. Specifically, we use the initial_validation_split() function of rsample to create a random three-way split of the data into a training set, a validation set, and a testing set. The arguments of this function include the data of interest and a vector of proportions for the training and validation sets, which we set to (0.6, 0.252) based on the results of the percentage_cal() function and the discussion at the beginning of this subsection.

This splits the data such that 60% of it is assigned to the training set, 25.2% to the validation set, and the remaining proportion (14.8%) to the testing set. In our example, we use the validation set as the calibration set. To retrieve the data for each split we use the functions training(), validation() and testing() for the training, calibration and testing sets, respectively.

set.seed(141019)
library(rsample)
dsplit <- initial_validation_split(d, prop = c(0.6, 0.252))
dtrain <- training(dsplit)
dcal <- validation(dsplit)
dtest <- testing(dsplit)

Next, we visualize the obtained data splits.

cols <- c(
  palette.colors(3, palette = "Paired"),
  palette.colors(2, palette = "Okabe-Ito")
)
par(mfrow = c(1, 3))
plot(dtrain$x, dtrain$y, col = cols[1], xlab = "x", ylab = 'y', main = "Training", pch = 20)
plot(dcal$x, dcal$y, col = cols[2], xlab = "x", ylab = 'y', main = "Calibration", pch = 20)
plot(dtest$x, dtest$y, col = cols[3], xlab = "x", ylab = 'y', main = "Testing", pch = 20)

Figure 2.1: Splits of the original dataset. Left: Proper training data to fit each model. Middle: Proper calibration data for computation of nonconformity scores. Right: Test split to evaluate empirical coverage of each prediction band.

2.3 Modeling

In this subsection, we introduce the models used to produce predictions for the data set. These models are implemented by means of wrapper functions from the parsnip package, and since the conformal prediction step depends on such models, we briefly describe some of the ideas behind this package and related ones.

2.3.1 A brief introduction to the tidymodeling framework

The probably package relies on objects generated by other packages within the tidymodels framework. tidymodels is a collection of packages designed for data analysis tasks, covering everything from data loading and transformation to model fitting and hyperparameter tuning. The main goal of tidymodels is to provide a unified framework for these tasks, ensuring consistency and better project organization, for example, by providing consistent definitions of outputs across the results of several packages intended for modeling tasks.

A key object of this ecosystem, and of particular interest to the probably package, is the workflow returned by functions from the workflows package (Vaughan and Couch 2024). This package is designed to gather the main components of a modeling process, such as preprocessing steps and model definitions, into a single structured object. This approach promotes better methodology and project organization (Kuhn and Silge 2023). Before fitting a model, common data wrangling tasks must be performed, such as selecting relevant variables, scaling features, or handling missing values. The workflows package facilitates these steps using a lazy evaluation approach, meaning computations are only executed when necessary.

For the modeling step, the parsnip package provides a unified interface for fitting models using various existing R packages and separates the model specification and model fitting steps. It organizes models into broad categories, such as regression and classification, and introduces the concept of engines. An engine is an abstraction that maps to a specific model-fitting function. For example, the base R function lm() fits linear regression models, while glm() fits generalized linear models for responses in the exponential family. Although these functions have different implementations, they serve a common purpose: regression modeling.

Since different modeling functions often return similar outputs (e.g., parameter estimates, fitted values, or significance tests), parsnip standardizes their usage. This allows users to specify a modeling task (e.g., regression) and select an appropriate engine while ensuring that common methods, such as predict() in our case of interest, work consistently across different models.
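As a small illustration of this separation between specification and fitting, a model specification can be inspected with parsnip's translate() function to see the call that will be constructed for the chosen engine:

library(parsnip)
# An abstract specification of a linear regression, mapped to the "lm" engine
spec <- linear_reg() %>% set_engine("lm")
# translate() shows the underlying lm() call parsnip will construct at fit time
translate(spec)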

2.3.2 Model: Linear model

For simplicity, we first model the response variable using a simple linear regression model with a quadratic term for the covariate using the training data.

\[ Y = \beta_0 + \beta_1\text{X} + \beta_2\text{X}^2 + \epsilon,\quad \epsilon\sim N(0,\sigma^2). \]

We fit the model using the parsnip package, which provides a tidy, unified interface to models available from other packages. In addition, we use the workflow() function from the workflows package, since the functions from probably for the computation of conformal prediction bands require objects returned by workflow().

We first specify a linear regression model by using the function linear_reg() and setting the engine to the lm() function using set_engine("lm"). Then, we initialize the workflow with the workflow() function from workflows, add the model to the workflow using add_model(), and add the model’s formula with add_formula(). Finally, we fit the model using the fit() function passing as arguments the workflow and the training data.

library(workflows)
library(parsnip)
library(tidymodels)
## ── Attaching packages ────────────────────────────────────── tidymodels 1.2.0 ──
## ✔ broom        1.0.5     ✔ recipes      1.1.0
## ✔ dials        1.3.0     ✔ tibble       3.2.1
## ✔ dplyr        1.1.4     ✔ tidyr        1.3.1
## ✔ ggplot2      3.5.1     ✔ tune         1.2.1
## ✔ infer        1.0.7     ✔ workflowsets 1.1.0
## ✔ modeldata    1.4.0     ✔ yardstick    1.3.1
## ✔ purrr        1.0.2
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ purrr::discard() masks scales::discard()
## ✖ dplyr::filter()  masks stats::filter()
## ✖ dplyr::lag()     masks stats::lag()
## ✖ recipes::step()  masks stats::step()
## • Use suppressPackageStartupMessages() to eliminate package startup messages
spec <- linear_reg() %>% set_engine("lm")
wf <- workflow() %>% add_model(spec) %>% add_formula(y ~ x + I(x^2))
fittedmodellm <- fit(wf, data = dtrain)

2.3.3 Model: GAM

Based on the shape of the relationship between the response and covariate of interest, we decide to fit an alternative model, namely, a GAM due to its flexibility in the shape of the regression function. Our model is of the form

\[ \text{Y} = f(\text{X}) + \epsilon, \quad \epsilon \sim N(0,\sigma^2), \]

where \(f(\text{X}) = \sum_{j=1}^{k}b_j(\text{X})\beta_j\) and \(b_j(\cdot)\) are basis functions, in this case cubic splines.

We use the mgcv package (Wood 2011) through the gen_additive_mod() function from parsnip. In the workflow, we provide the model formula, including the s() smooth term, to the add_model() function. According to the parsnip documentation, this is a requirement for the correct specification of the GAM model to mgcv. We must specify the formula for mgcv in the add_model() function; we then add a simple formula with the add_formula() function to make sure the fit() function will parse the data correctly.

set.seed(141809)
spec <- gen_additive_mod() %>% set_engine("mgcv") %>% set_mode("regression")
wf <- workflow() %>% add_model(spec, formula = y ~ s(x)) %>% add_formula(y ~ x)
fittedmodel <- fit(wf, data = dtrain)

2.3.4 Point predictions

The predictions of the response variable can be obtained with the predict() function, providing as arguments the fitted model workflow and the covariate for prediction.

pred <- data.frame("x" = seq(0, 25, length.out = 100000))
pred$ylm <- predict(fittedmodellm, pred)$.pred
pred$ygam <- predict(fittedmodel, pred)$.pred

A plot of the predictions can be produced as follows.

par(mfrow = c(1,2))
plot(dtrain$x, dtrain$y, pch = 20, col = cols[1], xlab = "x", ylab = 'y', main = "Linear, quadratic")
lines(pred$x, pred$ylm, lwd = 2, col = cols[4])
plot(dtrain$x, dtrain$y, pch = 20, col = cols[1], xlab = "x", ylab = 'y', main = "GAM")
lines(pred$x, pred$ygam, lwd = 2, col = cols[4])

Figure 2.2: Point predictors using the training data (black points). Left: Regression line with quadratic term. Right: GAM regression line.

2.4 Prediction bands

2.4.1 Prediction bands using linear regression results for the linear regression model with a quadratic term

We can obtain 95% prediction intervals for the linear regression model with the predict() function, setting type = "pred_int" to indicate that we want prediction intervals and specifying the desired confidence level (level = 0.95).

bandmodellm <- predict(fittedmodellm, pred, type = "pred_int", level = 0.95)

The mgcv package does not provide a direct method to quantify uncertainty for future predictions; one can instead simulate from the (approximate) posterior predictive distribution.
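One possible route, sketched below under the assumption that drawing regression coefficients from their approximate Gaussian posterior is acceptable, extracts the underlying mgcv fit from the workflow with extract_fit_engine() and simulates new responses; the grid xgrid and the number of draws are our own choices.

library(MASS)
# Underlying mgcv::gam object stored in the fitted workflow
gamfit <- extract_fit_engine(fittedmodel)
xgrid <- data.frame(x = seq(0, 25, length.out = 200))
# Linear predictor matrix: rows are grid points, columns are basis functions
Xp <- predict(gamfit, newdata = xgrid, type = "lpmatrix")
# Draws from the approximate posterior of the coefficients
betas <- mvrnorm(1000, coef(gamfit), vcov(gamfit))
mu <- Xp %*% t(betas)                                  # draws of the mean
ysim <- mu + rnorm(length(mu), sd = sqrt(gamfit$sig2)) # add observation noise
# Pointwise 95% posterior predictive band over the grid
bandgam <- t(apply(ysim, 1, quantile, probs = c(0.025, 0.975)))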

In the following sections, we show how to compute conformal prediction intervals for the GAM (fittedmodel) and for the linear model (fittedmodellm).

2.4.2 Prediction bands using split conformal prediction for the linear regression model with a quadratic term and the GAM

The nonconformity scores for split conformal prediction are computed using the int_conformal_split() function from the probably package. The arguments of this function include the fitted model workflow and the calibration split. Then, to retrieve the prediction bands we use the predict() function providing the nonconformity scores obtained with int_conformal_split() function, the predictor values for which we want to get prediction bands, and the confidence level.

library(probably)
## 
## Attaching package: 'probably'
## The following objects are masked from 'package:base':
## 
##     as.factor, as.ordered
splitcplm <- int_conformal_split(fittedmodellm, dcal)
bandsplitcplm <- predict(splitcplm, pred, level = 0.95)

splitcpgam <- int_conformal_split(fittedmodel, dcal)
bandsplitcpgam <- predict(splitcpgam, pred, level = 0.95)

2.4.3 Prediction bands using Conformalized Quantile Regression (CQR)

Before explaining how to use the function for CQR, int_conformal_quantile(), it is important to remark that even though we pass a fitted model via a workflow object, as shown below, this object is only used to retrieve information about which columns of the training split should be used to fit the Quantile Regression Random Forest, which is the de facto algorithm in the int_conformal_quantile() function. This means that even though we pass the object corresponding to the fitted linear model or GAM, the function will not use the predictions from those objects, only the information on which variable acts as the response of interest and which act as covariates.

Similar to int_conformal_split(), the int_conformal_quantile() function requires a fitted model workflow object and the calibration data. However, it also requires the training data and the desired confidence level, as the function internally fits a QRRF model and uses the specified confidence level to generate quantile predictions.

Running the following command sequence produces the CQR predictor. Note that, in order to avoid errors in the package called by the int_conformal_quantile() function, we select only the columns corresponding to the response and predictor (which in our example coincide with the whole data).

cqr <- int_conformal_quantile(fittedmodel, dtrain[, c(1, 2)], dcal[, c(1, 2)], level = 0.95, nodesize = 1250)
bandcqr <- predict(cqr, pred)

2.4.4 Prediction bands from QRRF

Since the int_conformal_quantile() function stores the results of the original QRRF model, we can obtain the unadjusted quantile bands from the quant object inside the cqr object as follows.

bandqrrf <- predict(cqr$quant, pred, what = c(0.025, 0.975))

2.4.5 Plot predictions and prediction bands

After the results for each method have been obtained, plots of the estimated regression lines and their corresponding prediction bands can be produced with the commands below.

par(mfrow = c(2, 2))
# Band LM Model
plot(dtest$x, dtest$y, pch = 20, col = cols[3],
     main = "Linear model, Gaussian bands", xlab = "x", ylab = 'y')
lines(pred$x, pred$ylm, lwd = 2, col = cols[4])
lines(pred$x, bandmodellm$.pred_lower, lwd = 2, col = "gray")
lines(pred$x, bandmodellm$.pred_upper, lwd = 2, col = "gray")
# Band CP Split on LM model
plot(dtest$x, dtest$y, pch = 20, col = cols[3],
     main = "Linear model, CP bands", xlab = "x", ylab = 'y')
lines(pred$x, pred$ylm, lwd = 2, col = cols[4])
lines(pred$x, bandsplitcplm$.pred_lower, lwd = 2, col = cols[5])
lines(pred$x, bandsplitcplm$.pred_upper, lwd = 2, col = cols[5])
# Band CP Split on GAM model
plot(dtest$x, dtest$y, pch = 20, col = cols[3],
     main = "GAM model, CP bands", xlab = "x", ylab = 'y')
lines(pred$x, pred$ygam, lwd = 2, col = cols[4])
lines(pred$x, bandsplitcpgam$.pred_lower, lwd = 2, col = cols[5])
lines(pred$x, bandsplitcpgam$.pred_upper, lwd = 2, col = cols[5])
# Band CQR
plot(dtest$x, dtest$y, pch = 20, col = cols[3],
     main = "RFQR and CQR bands", xlab = "x", ylab = 'y')
lines(pred$x, bandcqr$.pred_lower, lwd = 2, col = cols[5])
lines(pred$x, bandcqr$.pred_upper, lwd = 2, col = cols[5])
lines(pred$x, bandqrrf[,1], lwd = 2, col = cols[4], lty = 3)
lines(pred$x, bandqrrf[,2], lwd = 2, col = cols[4], lty = 3)

Figure 2.3: Point predictors depicted in black and their corresponding 95% prediction bands in dark orange. Top-left: Linear regression estimate with Gaussian prediction bands (gray lines). Top-right: Linear regression estimate with conformal prediction bands. Bottom-left: GAM regression estimate with conformal prediction bands. Bottom-right: Original QRRF quantile regression lines (dotted), with the corrected CQR bands.

2.5 Evaluating the coverage of the prediction bands

Finally, we evaluate the coverage probabilities of the prediction bands by computing the proportion of observations from the test split that are contained within the bands.

# Coverage model-based prediction bands
bandmodellm <- predict(fittedmodellm, dtest, level = 0.95, type = "pred_int")
covmodellm <- mean((bandmodellm$.pred_lower <= dtest$y) & (bandmodellm$.pred_upper >= dtest$y))
# Coverage split conformal prediction for LM
bandsplitcplm <- predict(splitcplm, dtest, level = 0.95)
covsplitlm <- mean((bandsplitcplm$.pred_lower <= dtest$y) & (bandsplitcplm$.pred_upper >= dtest$y))
# Coverage split conformal prediction GAM
bandsplitcpgam <- predict(splitcpgam, dtest, level = 0.95)
covsplitcp <- mean((bandsplitcpgam$.pred_lower <= dtest$y) & (bandsplitcpgam$.pred_upper >= dtest$y))
# Coverage CQR
bandcqr <- predict(cqr, dtest[,c(1, 2)])
covcqr <- mean((bandcqr$.pred_lower <= dtest$y) & (bandcqr$.pred_upper >= dtest$y))
# Coverage RFQR
bandqrrf <- predict(cqr$quant, dtest[,c(1, 2)], what = c(0.025, 0.975))
covqrrf <- mean((bandqrrf[,1] <= dtest$y) & (bandqrrf[,2] >= dtest$y))
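For convenience, the empirical coverages can be collected into a single named vector (the names are our own labels):

round(c(lm_gaussian = covmodellm, lm_scp = covsplitlm, gam_scp = covsplitcp,
        cqr = covcqr, qrrf = covqrrf), 4)
## lm_gaussian      lm_scp     gam_scp         cqr        qrrf
##      0.9733      0.9525      0.9540      0.9564      0.9605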

This indicates that the achieved empirical coverage on the test set is 0.9733 for the prediction intervals of the linear model constructed under the Gaussian assumptions, 0.9525 for the prediction intervals of the linear model using split conformal prediction, 0.954 for split conformal prediction with the GAM model, 0.9564 for CQR, and 0.9605 for QRRF. Recall that we wanted the probability of coverage not to exceed 0.9501; in our empirical evaluation the conformal prediction methods exceed this quantity, but none of them fails to provide the desired coverage, even in the presence of several outlying observations.

3 Conclusions

In this tutorial we illustrated the construction of prediction bands using conformal prediction methods by means of the existing implementations in the probably package. Our code examples show how to obtain prediction intervals for several models available via the parsnip package. This tutorial also highlights some important limitations. The first is that, even though conformal prediction is a general method that can be wrapped around almost any type of model, the implementations available in probably work only for the case in which the response variable is continuous. In cases where the predictions should come from less general sets, for example the integers for Poisson regression, there is no available method in probably to take this into account. In addition, having to rely on the models available in the parsnip package restricts the class of models that can be used, although the solution could be to define new wrappers so that the model of interest can be handled. Finally, the conformalized quantile regression in the probably package is limited to a single quantile regression algorithm, which prevents extending the functionality to other quantile regression algorithms available in R, even when it would be possible to define wrappers for them in parsnip.

On the other hand, all these limitations open the door to enhancing the functionality of the probably package to allow for the implementation of other CP algorithms and other quantile regression algorithms, or, more ambitiously, to extending the infrastructure of the tidymodels and parsnip packages to tackle the specific characteristics of conformal prediction.

Frick, Hannah, Fanny Chow, Max Kuhn, Michael Mahoney, Julia Silge, and Hadley Wickham. 2024a. “Rsample: General Resampling Infrastructure.” https://CRAN.R-project.org/package=rsample.
———. 2024b. “Rsample: General Resampling Infrastructure.” https://CRAN.R-project.org/package=rsample.
Kuhn, Max, and Julia Silge. 2023. Tidy Modeling with R. 1st ed. O’Reilly. https://www.tmwr.org.
Kuhn, Max, and Davis Vaughan. 2024. “Parsnip: A Common API to Modeling and Analysis Functions.” https://CRAN.R-project.org/package=parsnip.
Kuhn, Max, Davis Vaughan, and Edgar Ruiz. 2024. “Probably: Tools for Post-Processing Predicted Values.” https://CRAN.R-project.org/package=probably.
Kuhn, Max, and Hadley Wickham. 2020a. “Tidymodels: A Collection of Packages for Modeling and Machine Learning Using Tidyverse Principles.” https://www.tidymodels.org.
———. 2020b. “Tidymodels: A Collection of Packages for Modeling and Machine Learning Using Tidyverse Principles.” https://www.tidymodels.org.
Lei, Jing, Max G’Sell, Alessandro Rinaldo, Ryan J. Tibshirani, and Larry Wasserman. 2018. “Distribution-Free Predictive Inference for Regression.” Journal of the American Statistical Association 113 (523): 1094–1111. https://doi.org/10.1080/01621459.2017.1307116.
Romano, Yaniv, Evan Patterson, and Emmanuel Candes. 2019. “Conformalized Quantile Regression.” In Advances in Neural Information Processing Systems. Vol. 32. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2019/hash/5103c3584b063c431bd1268e9b5e76fb-Abstract.html.
Vaughan, Davis, and Simon Couch. 2024. “Workflows: Modeling Workflows.” https://CRAN.R-project.org/package=workflows.
Vovk, Vladimir, Alexander Gammerman, and Glenn Shafer. 2022. Algorithmic learning in a random world. Second edition. Cham, Switzerland: Springer.
“Wind Turbine Scada Dataset.” n.d. https://www.kaggle.com/datasets/berkerisen/wind-turbine-scada-dataset.
Wood, S. N. 2011. “Fast Stable Restricted Maximum Likelihood and Marginal Likelihood Estimation of Semiparametric Generalized Linear Models.” Journal of the Royal Statistical Society: Series B 73 (1): 3–36.
