5.4 Bayesian Regression

5.4.1 Performing Bayesian Modeling in R

To perform Bayesian modeling in R, we first need to load the rstanarm package. With the tools in rstanarm, we can then make Bayesian counterparts to the frequentist models we made in Sections 5.1 and 5.2.

library(rstanarm)

For the purposes of this chapter as well as future chapters, we will use the function stan_glm from the rstanarm package to make our Bayesian models. The direct Bayesian counterpart to the frequentist lm function (the function we have been using thus far to make our models) is stan_lm, but we introduce you to stan_glm here because it is far more versatile: you can use stan_glm not only to make Bayesian linear models, but also to make other kinds of Bayesian models, such as Bayesian logistic models, as you will see in later chapters.
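
As a preview of that versatility, the model type is controlled by stan_glm's family argument. Below is a minimal sketch; my_data, y, and x are hypothetical placeholders, and the logistic case is covered properly in a later chapter.

# Bayesian linear model: family defaults to gaussian()
linear_model <- stan_glm(y ~ x, data = my_data)

# Bayesian logistic model: same function, different family
logistic_model <- stan_glm(y ~ x, family = binomial(link = "logit"), data = my_data)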

Let’s say we have made a Bayesian model. To examine it, we call the summary function on it, just as we do when examining a frequentist model.

However, there are several key differences between the summary output for a Bayesian model and the summary output for a frequentist model. When we call summary on a Bayesian model, we get model information, parameter estimates, and model diagnostics. These may sound familiar, but the estimates no longer come with t-values and p-values, and the model no longer comes with an F-statistic and R-squared values, as they do when we call summary on a frequentist model. Instead of single point estimates and fit statistics, we get a distribution of values for each estimate: the mean, the standard deviation, and the 10th, 50th, and 90th percentile values. We also get some other, perhaps unfamiliar, values:

In Estimates and MCMC diagnostics, we get:

  • sigma, which is the standard deviation of the errors (the Bayesian counterpart of the frequentist residual standard error)

In Fit Diagnostics, we get (again, the mean, standard deviation, and 10th, 50th, and 90th percentile values of):

  • mean_PPD, which is the sample mean of “the posterior predictive distribution of the outcome variable”

In MCMC diagnostics, we get:

  • log-posterior, which is the log of the combined posterior distributions, analogous to the log-likelihood of a frequentist regression
  • mcse, which is (and stands for) the Monte Carlo standard error
  • Rhat, which indicates whether or not the model converges, and, in turn, whether or not our results are reliable. Posterior distributions are sampled in chains, and the Rhat statistic is a measure of within-chain variance compared to across-chain variance. Rhat values less than 1.1 indicate model convergence, and, according to the “How to Use the rstanarm Package” document on the CRAN website, if any Rhat values exceed 1.1, rstanarm will give you the following warning message: “Markov chains did not converge! Do not analyze results!”
  • n_eff, which is “a crude measure” of the effective sample size. (A sketch of how to extract these diagnostics programmatically follows this list.)
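
Rather than reading these diagnostics off the printed summary, you can pull them out programmatically: the object returned by summary is a matrix whose columns include mcse, Rhat, and n_eff. A minimal sketch, assuming a fitted stan_glm model called bayes_model (a hypothetical name):

# The summary object is a matrix; keep only the diagnostic columns
diagnostics <- summary(bayes_model)
diagnostics[, c("mcse", "Rhat", "n_eff")]

# Flag any Rhat values above the 1.1 convergence threshold
any(diagnostics[, "Rhat"] > 1.1)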

One numerical explanatory variable - Bayesian

Recall score_model from 5.1:

# Fit regression model

score_model <- lm(score ~ bty_avg, data = evals_ch6)

# Get important summary information

summary(score_model)
## 
## Call:
## lm(formula = score ~ bty_avg, data = evals_ch6)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9246 -0.3690  0.1420  0.3977  0.9309 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.88034    0.07614   50.96  < 2e-16 ***
## bty_avg      0.06664    0.01629    4.09 5.08e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared:  0.03502,    Adjusted R-squared:  0.03293 
## F-statistic: 16.73 on 1 and 461 DF,  p-value: 5.083e-05
# Get 90% confidence intervals (90% to compare against later 90% credible intervals)

confint(score_model, level = 0.90)
##                    5 %       95 %
## (Intercept) 3.75484173 4.00583418
## bty_avg     0.03978652 0.09348755

Translating the above into Bayesian terms to make and examine score_model_bayes, we run:

# Fit regression model

score_model_bayes <- stan_glm(score ~ bty_avg, data = evals_ch6)
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 6.6e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.66 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.03964 seconds (Warm-up)
## Chain 1:                0.074889 seconds (Sampling)
## Chain 1:                0.114529 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 1e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.03503 seconds (Warm-up)
## Chain 2:                0.074875 seconds (Sampling)
## Chain 2:                0.109905 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 1.7e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.055952 seconds (Warm-up)
## Chain 3:                0.073196 seconds (Sampling)
## Chain 3:                0.129148 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 1.1e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.037256 seconds (Warm-up)
## Chain 4:                0.068902 seconds (Sampling)
## Chain 4:                0.106158 seconds (Total)
## Chain 4:
# Get important summary information

summary(score_model_bayes)
## 
## Model Info:
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      score ~ bty_avg
##  algorithm:    sampling
##  sample:       4000 (posterior sample size)
##  priors:       see help('prior_summary')
##  observations: 463
##  predictors:   2
## 
## Estimates:
##               mean   sd   10%   50%   90%
## (Intercept) 3.9    0.1  3.8   3.9   4.0  
## bty_avg     0.1    0.0  0.0   0.1   0.1  
## sigma       0.5    0.0  0.5   0.5   0.6  
## 
## Fit Diagnostics:
##            mean   sd   10%   50%   90%
## mean_PPD 4.2    0.0  4.1   4.2   4.2  
## 
## The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
## 
## MCMC diagnostics
##               mcse Rhat n_eff
## (Intercept)   0.0  1.0  4127 
## bty_avg       0.0  1.0  4045 
## sigma         0.0  1.0  4180 
## mean_PPD      0.0  1.0  3974 
## log-posterior 0.0  1.0  2086 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
# Get credible interval

posterior_interval(score_model_bayes)
##                     5%       95%
## (Intercept) 3.75531177 4.0030256
## bty_avg     0.04062741 0.0930610
## sigma       0.50730848 0.5658406

What are we doing here?

First, we “fit” the Bayesian regression model to the data using the stan_glm function and store it as score_model_bayes. stan_glm fits a “Bayesian generalized linear model” and is used as follows: stan_glm(y ~ x, data = data_frame_name), where:

  • y is the dependent variable, followed by a tilde (~). Here, y is set to score.
  • x is the independent variable. Here, x is set to bty_avg.
  • the combination of y ~ x is the model formula. (Note the order of y and x.) Here, the model formula is again score ~ bty_avg.
  • data_frame_name is the name of the data frame that contains the variables y and x. Here, data_frame_name is evals_ch6.

Second, we take the model stored as score_model_bayes and call the summary function on it to get important information about the model. Recall that when we called summary on our frequentist model, for the parameter estimates we were given the values Estimate, Std. Error, t value, and Pr(>|t|), which is a p-value, and for the model we were provided with the values Residual standard error, Multiple R-squared, Adjusted R-squared, and F-statistic. Here, when we call summary on our Bayesian model, we are provided with a distribution of values for our parameter estimates—the mean, the standard deviation, and the 10th, 50th, and 90th percentile values—and a host of other values defined at the beginning of this chapter.

Finally, we use the function posterior_interval to retrieve our estimates’ credible intervals; rstanarm returns 90% credible intervals by default. For our frequentist model, we used the function confint to retrieve our estimates’ confidence intervals. A frequentist confidence interval is typically a 95% one, but here we set the level to 90% so that we can more easily explain the difference between a Bayesian credible interval and a frequentist confidence interval, as the distinction between the two is already subtle.
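
If you want a different interval width, posterior_interval takes a prob argument; for example:

# 95% credible intervals instead of the default 90%
posterior_interval(score_model_bayes, prob = 0.95)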

Indeed, you may have noticed that our Bayesian credible interval for our bty_avg coefficient, (0.04062741, 0.0930610), is very similar to our frequentist confidence interval for our bty_avg coefficient, (0.03978652, 0.09348755). However, there are important differences between the two.

A 90% credible interval tells us that there is a 90% probability that the true value falls within the interval. By contrast, a frequentist 90% confidence interval tells us only that, across repeated samples, 90% of the intervals constructed this way would contain the true value. Using Bayesian credible intervals, we can determine, for any given interval, the probability that the true value falls within that range, which is important because “only Bayesian methods allow us to make inferences about the actual values of a parameter”.
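
Because a credible interval is a genuine probability statement, we can also interrogate the posterior draws directly. A minimal sketch (the 0.05 threshold below is purely illustrative):

# Extract the 4000 posterior draws as a matrix, one column per parameter
draws <- as.matrix(score_model_bayes)

# Posterior probability that the bty_avg slope exceeds 0.05
mean(draws[, "bty_avg"] > 0.05)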

One categorical explanatory variable - Bayesian

Recall lifeExp_model from 5.2:

# Fit regression model

lifeExp_model <- lm(lifeExp ~ continent, data = gapminder2007)

# Get important summary information

summary(lifeExp_model)
## 
## Call:
## lm(formula = lifeExp ~ continent, data = gapminder2007)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -26.9005  -4.0399   0.2565   3.3840  21.6360 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         54.806      1.025  53.446  < 2e-16 ***
## continentAmericas   18.802      1.800  10.448  < 2e-16 ***
## continentAsia       15.922      1.646   9.675  < 2e-16 ***
## continentEurope     22.843      1.695  13.474  < 2e-16 ***
## continentOceania    25.913      5.328   4.863 3.12e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.395 on 137 degrees of freedom
## Multiple R-squared:  0.6355, Adjusted R-squared:  0.6249 
## F-statistic: 59.71 on 4 and 137 DF,  p-value: < 2.2e-16
# Get 90% confidence intervals (90% to compare against later 90% credible intervals)

confint(lifeExp_model, level = 0.90)
##                        5 %     95 %
## (Intercept)       53.10785 56.50423
## continentAmericas 15.82176 21.78240
## continentAsia     13.19699 18.64790
## continentEurope   20.03497 25.65015
## continentOceania  17.08939 34.73753

Translating the above into Bayesian terms to make and examine lifeExp_model_bayes, we run:

# Fit regression model, now setting refresh equal to 0 so as to suppress all of
# the "SAMPLING FOR MODEL 'continuous' NOW (CHAIN #)" outputs

lifeExp_model_bayes <- stan_glm(lifeExp ~ continent, data = gapminder2007, refresh = 0)

# Get important summary information

summary(lifeExp_model_bayes)
## 
## Model Info:
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      lifeExp ~ continent
##  algorithm:    sampling
##  sample:       4000 (posterior sample size)
##  priors:       see help('prior_summary')
##  observations: 142
##  predictors:   5
## 
## Estimates:
##                     mean   sd   10%   50%   90%
## (Intercept)       54.9    1.0 53.6  54.9  56.2 
## continentAmericas 18.6    1.8 16.3  18.7  20.9 
## continentAsia     15.8    1.6 13.7  15.8  17.9 
## continentEurope   22.7    1.7 20.6  22.7  24.8 
## continentOceania  25.1    5.5 17.9  25.1  32.1 
## sigma              7.4    0.5  6.9   7.4   8.1 
## 
## Fit Diagnostics:
##            mean   sd   10%   50%   90%
## mean_PPD 67.0    0.9 65.9  67.0  68.1 
## 
## The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
## 
## MCMC diagnostics
##                   mcse Rhat n_eff
## (Intercept)       0.0  1.0  2671 
## continentAmericas 0.0  1.0  2939 
## continentAsia     0.0  1.0  3101 
## continentEurope   0.0  1.0  3163 
## continentOceania  0.1  1.0  4453 
## sigma             0.0  1.0  4376 
## mean_PPD          0.0  1.0  4156 
## log-posterior     0.0  1.0  1990 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
# Get credible interval

posterior_interval(lifeExp_model_bayes)
##                          5%       95%
## (Intercept)       53.251716 56.602464
## continentAmericas 15.634195 21.620791
## continentAsia     13.066702 18.450238
## continentEurope   19.956782 25.464973
## continentOceania  16.045921 33.832984
## sigma              6.730626  8.259066

What are we doing here?

Once again, we first “fit” the Bayesian regression model to the data using the stan_glm function and store it as lifeExp_model_bayes. stan_glm fits a “Bayesian generalized linear model” and is used as follows: stan_glm(y ~ x, data = data_frame_name), where:

  • y is the dependent variable, followed by a tilde (~). Here, y is set to lifeExp.
  • x is the independent variable. Here, x is set to continent.
  • the combination of y ~ x is the model formula. (Note the order of y and x.) Here, the model formula is again lifeExp ~ continent.
  • data_frame_name is the name of the data frame that contains the variables y and x. Here, data_frame_name is gapminder2007.

Second, we take the model stored as lifeExp_model_bayes and call the summary function on it to get important information about the model. Recall that when we called summary on our frequentist model, for the parameter estimates we were provided with the values Estimate, Std. Error, t value, and Pr(>|t|), which is a p-value, and for the model we were given the values Residual standard error, Multiple R-squared, Adjusted R-squared, and F-statistic. Here, when we call summary on our Bayesian model, we are provided with a distribution of values for our parameter estimates—the mean, the standard deviation, and the 10th, 50th, and 90th percentile values—and a host of other values defined at the beginning of this chapter.

Finally, we again use the function posterior_interval to retrieve our estimates’ credible intervals (90% by default) and compare them against the 90% confidence intervals we obtained from confint. As before, we matched the levels because the distinction between a Bayesian credible interval and a frequentist confidence interval is already subtle.

Indeed, you may have noticed that our Bayesian credible intervals are very similar to our frequentist confidence intervals. For example, for continentAmericas, they are (15.634195, 21.620791) and (15.82176, 21.78240), respectively. However, there are important differences between credible and confidence intervals.

As explained before, a 90% credible interval tells us that there is a 90% probability that the true value falls within the interval, whereas a frequentist 90% confidence interval tells us only that, across repeated samples, 90% of the intervals constructed this way would contain the true value. Using Bayesian credible intervals, we can determine, for any given interval, the probability that the true value falls within that range, which is important because “only Bayesian methods allow us to make inferences about the actual values of a parameter”.
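
With the posterior draws in hand, we can also ask questions that intervals alone cannot answer, such as how two coefficients compare. A minimal sketch:

# Extract the posterior draws, one column per parameter
draws <- as.matrix(lifeExp_model_bayes)

# Posterior probability that the Oceania offset exceeds the Americas offset
mean(draws[, "continentOceania"] > draws[, "continentAmericas"])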

Modifying Bayesian regressions

We can modify our Bayesian regressions by altering the chains (their number, iterations, or warmup values) and by altering the priors (making them adjusted as opposed to unadjusted, or informative as opposed to uninformative). Within the stan_glm function, we can alter chains using the chains, iter, and warmup arguments, and we can alter priors using arguments such as prior and prior_intercept, as sketched below.
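
Here is a sketch of these arguments in use; the specific values are illustrative, not recommendations.

# Four chains of 4000 iterations each, with the first 2000 of each discarded
# as warmup, and explicit priors in place of the defaults
score_model_custom <- stan_glm(
  score ~ bty_avg, data = evals_ch6,
  chains = 4,
  iter = 4000,
  warmup = 2000,
  prior = normal(0, 2.5),           # prior on the bty_avg coefficient
  prior_intercept = normal(0, 10),  # prior on the intercept
  refresh = 0
)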

Warning messages from the sampler are common enough to warrant a couple of debugging tips; below are two simple methods to troubleshoot and solve the most frequent problems.

  • If the warning mentions “divergent transitions after warmup”, the sampler is taking steps that are too big. To solve this problem, adjust the step size by increasing adapt_delta from its default value of 0.95, like so: stan_glm(y ~ x, data = data_frame_name, control = list(adapt_delta = 0.99)). Increasing adapt_delta decreases the step size.
  • If the warning mentions that a chain has “reached the maximum tree depth”, the sampler prematurely ended its iterations and is not working efficiently. To solve this problem, increase max_treedepth from its default value of 10, like so: stan_glm(y ~ x, data = data_frame_name, control = list(max_treedepth = 15)).

These problems can pose threats to the validity of our models, so it is important to know how to remedy them if they crop up. Both remedies can also be combined in a single call, as sketched below.
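
A sketch of the combined call, reusing score_model_bayes’s formula and data:

# Refit with a smaller step size and a deeper tree, suppressing sampling output
score_model_tuned <- stan_glm(
  score ~ bty_avg, data = evals_ch6,
  control = list(adapt_delta = 0.99, max_treedepth = 15),
  refresh = 0
)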

Assessing model fit

When examining a frequentist regression, one way of assessing model fit is to look at its R-squared value, which can be found in the summary of the model. The R-squared value can still be used to assess the fit of a Bayesian regression, but it is not shown in the summary output of a stan_glm model. If you want the R-squared value of a Bayesian regression, you can calculate it manually using the following formula:

\[R^2 = 1 - \frac{\sum_i (y_i - \widehat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}\]

For score_model_bayes, the calculation would be as follows. (The code uses the variance form of the formula, var(fitted) / (var(fitted) + var(residuals)), since a Bayesian fit does not guarantee that the residuals are exactly uncorrelated with the fitted values.)

1 - var(residuals(score_model_bayes)) / (var(fitted(score_model_bayes)) + var(residuals(score_model_bayes)))
## [1] 0.03573738

For lifeExp_model_bayes, the calculation would be:

1 - var(residuals(lifeExp_model_bayes)) / (var(fitted(lifeExp_model_bayes)) + var(residuals(lifeExp_model_bayes)))
## [1] 0.6308769
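
As an aside, if your version of rstanarm provides the bayes_R2 function, you can also get a fully Bayesian R-squared: one value per posterior draw rather than a single number. A quick sketch:

# Posterior distribution of R-squared, one value per draw
r2_draws <- bayes_R2(score_model_bayes)
median(r2_draws)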

There are, however, other ways to assess a model’s fit—ones specific to Bayesian regressions. Among these, the most important fall under the class of assessments called Posterior Predictive Model Checks.

Take score_model_bayes: using the posterior_linpred function, we are able to get predicted scores for each observation using the parameter draws from each iteration. We can then compare the distribution of predicted scores against the distribution of observed scores to determine whether or not the model fits the data well.

# Get predicted scores

predictions <- posterior_linpred(score_model_bayes)

# Get predicted scores for first iteration

iteration1 <- predictions[1,]

# Get predicted scores for second iteration

iteration2 <- predictions[2,]

# Summarize to compare predicted scores and observed scores

summary(iteration1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.936   4.061   4.159   4.166   4.257   4.481
summary(iteration2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.987   4.080   4.153   4.158   4.225   4.391
summary(evals_ch6$score)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.300   3.800   4.300   4.175   4.600   5.000

We can also use the function pp_check to perform a global, visual check of our model’s distribution of predicted scores. pp_check takes the name of our Bayesian model as its first argument and the “Posterior Predictive Checking” plot function (plotfun), that is, the type of plot we want to create, as its second argument. As you will see below, “Posterior Predictive Checking” plots can take the form of density overlays, histograms, and scatterplots.

In the density overlay below, again using score_model_bayes as our model, the light blue lines illustrate the distribution of predicted scores, and the dark blue line illustrates the distribution of observed scores.

# Density overlay

pp_check(score_model_bayes, plotfun = "dens_overlay")
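
For completeness, here is a sketch of the other two plot types mentioned above; the plotfun names come from the underlying bayesplot package, and nreps controls how many replicated datasets are drawn.

# Histograms: the observed scores alongside a few replicated datasets
pp_check(score_model_bayes, plotfun = "hist", nreps = 5)

# Scatterplot: average predicted score vs. observed score for each observation
pp_check(score_model_bayes, plotfun = "scatter_avg")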