diffuseblm

Bayesian linear regression model with diffuse conjugate prior for data likelihood

Description

The Bayesian linear regression model object diffuseblm specifies that the joint prior distribution of (β,σ²) is proportional to 1/σ² (the diffuse prior model).

The data likelihood is $\prod_{t = 1}^{T} ϕ (y_{t}; x_{t} β, σ^{2}),$ where ϕ(y_t;x_tβ,σ²) is the Gaussian probability density evaluated at y_t with mean x_tβ and variance σ². The resulting marginal and conditional posterior distributions are analytically tractable. For details on the posterior distribution, see Analytically Tractable Posteriors.

In general, when you create a Bayesian linear regression model object, it specifies the joint prior distribution and characteristics of the linear regression model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis, pass the model object and data to the appropriate object function.

Creation

Syntax

PriorMdl = diffuseblm(NumPredictors)

PriorMdl = diffuseblm(NumPredictors,Name,Value)

Description

example

PriorMdl = diffuseblm(NumPredictors) creates a Bayesian linear regression model object (PriorMdl) composed of NumPredictors predictors and an intercept, and sets the NumPredictors property. The joint prior distribution of (β, σ²) is the diffuse model. PriorMdl is a template that defines the prior distributions and the dimensionality of β.

example

PriorMdl = diffuseblm(NumPredictors,Name,Value) sets properties (except NumPredictors) using name-value pair arguments. Enclose each property name in quotes. For example, diffuseblm(2,'VarNames',["UnemploymentRate"; "CPI"]) specifies the names of the two predictor variables in the model.

Properties

expand all

You can set writable property values when you create the model object by using name-value argument syntax, or after you create the model object by using dot notation. For example, to exclude an intercept from the model, enter

PriorMdl.Intercept = false;

`NumPredictors` — Number of predictor variables
nonnegative integer

Number of predictor variables in the Bayesian multiple linear regression model, specified as a nonnegative integer.

NumPredictors must be the same as the number of columns in your predictor data, which you specify during model estimation or simulation.

When specifying NumPredictors, exclude any intercept term from the value.

After creating a model, if you change the value of NumPredictors using dot notation, then VarNames reverts to its default value.

Data Types: double

`Intercept` — Flag for including regression model intercept
`true` (default) | `false`

Flag for including a regression model intercept, specified as a value in this table.

Value	Description
`false`	Exclude an intercept from the regression model. Therefore, β is a `p`-dimensional vector, where `p` is the value of `NumPredictors`.
`true`	Include an intercept in the regression model. Therefore, β is a (`p` + 1)-dimensional vector. This specification causes a T-by-1 vector of ones to be prepended to the predictor data during estimation and simulation.

If you include a column of ones in the predictor data for an intercept term, then set Intercept to false.

Example: 'Intercept',false

Data Types: logical

`VarNames` — Predictor variable names
string vector | cell vector of character vectors

Predictor variable names for displays, specified as a string vector or cell vector of character vectors. VarNames must contain NumPredictors elements. VarNames(j) is the name of the variable in column j of the predictor data set, which you specify during estimation, simulation, or forecasting.

The default is {'Beta(1)','Beta(2),...,Beta(p)}, where p is the value of NumPredictors.

Example: 'VarNames',["UnemploymentRate"; "CPI"]

Data Types: string | cell | char

Object Functions

`estimate`	Estimate posterior distribution of Bayesian linear regression model parameters
`simulate`	Simulate regression coefficients and disturbance variance of Bayesian linear regression model
`forecast`	Forecast responses of Bayesian linear regression model
`plot`	Visualize prior and posterior densities of Bayesian linear regression model parameters
`summarize`	Distribution summary statistics of standard Bayesian linear regression model

Examples

collapse all

Create Diffuse Prior Model

Open Live Script

Consider the multiple linear regression model that predicts U.S. real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR).

${GNPR}_{t} = β_{0} + β_{1} {IPI}_{t} + β_{2} E_{t} + β_{3} {WR}_{t} + ε_{t} .$

For all $t$ time points, $ε_{t}$ is a series of independent Gaussian disturbances with a mean of 0 and variance $σ^{2}$ .

Suppose that the regression coefficients $β = [β_{0}, . . ., β_{3}]^{'}$ and the disturbance variance $σ^{2}$ are random variables, and you have no prior knowledge of their values or distribution. That is, you want to use the noninformative Jeffreys prior: the joint prior distribution is proportional to $1 / σ^{2}$ .

These assumptions and the data likelihood imply an analytically tractable posterior distribution.

Create a diffuse prior model for the linear regression parameters, which is the default prior model type. Specify the number of predictors p.

p = 3;
Mdl = bayeslm(p)

Mdl = 
  diffuseblm with properties:

    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}

 
           | Mean  Std        CI95        Positive        Distribution       
-----------------------------------------------------------------------------
 Intercept |  0    Inf  [   NaN,    NaN]    0.500   Proportional to one      
 Beta(1)   |  0    Inf  [   NaN,    NaN]    0.500   Proportional to one      
 Beta(2)   |  0    Inf  [   NaN,    NaN]    0.500   Proportional to one      
 Beta(3)   |  0    Inf  [   NaN,    NaN]    0.500   Proportional to one      
 Sigma2    |  Inf  Inf  [   NaN,    NaN]    1.000   Proportional to 1/Sigma2

Mdl is a diffuseblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. At the command window, bayeslm displays a summary of the prior distributions. Because the prior is noninformative and the data have not been incorporated yet, the summary is trivial.

You can set writable property values of created models using dot notation. Set the regression coefficient names to the corresponding variable names.

Mdl.VarNames = ["IPI" "E" "WR"]

Mdl = 
  diffuseblm with properties:

    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}

 
           | Mean  Std        CI95        Positive        Distribution       
-----------------------------------------------------------------------------
 Intercept |  0    Inf  [   NaN,    NaN]    0.500   Proportional to one      
 IPI       |  0    Inf  [   NaN,    NaN]    0.500   Proportional to one      
 E         |  0    Inf  [   NaN,    NaN]    0.500   Proportional to one      
 WR        |  0    Inf  [   NaN,    NaN]    0.500   Proportional to one      
 Sigma2    |  Inf  Inf  [   NaN,    NaN]    1.000   Proportional to 1/Sigma2

Estimate Marginal Posterior Distributions

Open Live Script

Consider the linear regression model in Create Diffuse Prior Model.

Create a diffuse prior model for the linear regression parameters. Specify the number of predictors, p, and the names of the regression coefficients.

p = 3;
PriorMdl = bayeslm(p,'ModelType','diffuse','VarNames',["IPI" "E" "WR"]);

Load the Nelson-Plosser data set. Create variables for the response and predictor series.

load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,'GNPR'};

Estimate the marginal posterior distributions of $β$ and $σ^{2}$ .

PosteriorMdl = estimate(PriorMdl,X,y);

Method: Analytic posterior distributions
Number of observations: 62
Number of predictors:   4
 
           |   Mean      Std           CI95        Positive       Distribution      
------------------------------------------------------------------------------------
 Intercept | -24.2536   9.5314  [-43.001, -5.506]    0.006   t (-24.25, 9.37^2, 58) 
 IPI       |   4.3913   0.1535   [ 4.089,  4.693]    1.000   t (4.39, 0.15^2, 58)   
 E         |   0.0011   0.0004   [ 0.000,  0.002]    0.999   t (0.00, 0.00^2, 58)   
 WR        |   2.4682   0.3787   [ 1.723,  3.213]    1.000   t (2.47, 0.37^2, 58)   
 Sigma2    |  51.9790  10.0034   [35.965, 74.937]    1.000   IG(29.00, 0.00069)

PosteriorMdl is a conjugateblm model object storing the joint marginal posterior distribution of $β$ and $σ^{2}$ given the data. estimate displays a summary of the marginal posterior distributions to the command window. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns to characteristics of the posterior distribution. The characteristics include:

CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of WR is in [1.723, 3.213] is 0.95.
Positive, which contains the posterior probability that the parameter is greater than 0. For example, the probability that the intercept is greater than 0 is 0.006.
Distribution, which contains descriptions of the posterior distributions of the parameters. For example, the marginal posterior distribution of IPI is t with a mean of 4.39, a standard deviation of 0.15, and 58 degrees of freedom.

Access properties of the posterior distribution using dot notation. For example, display the marginal posterior means by accessing the Mu property.

PosteriorMdl.Mu

Estimate Conditional Posterior Distribution

Open Live Script

Consider the linear regression model in Create Diffuse Prior Model.

Create a diffuse prior model for the linear regression parameters. Specify the number of predictors p, and the names of the regression coefficients.

p = 3;
PriorMdl = bayeslm(p,'ModelType','diffuse','VarNames',["IPI" "E" "WR"])

PriorMdl = 
  diffuseblm with properties:

    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}

 
           | Mean  Std        CI95        Positive        Distribution       
-----------------------------------------------------------------------------
 Intercept |  0    Inf  [   NaN,    NaN]    0.500   Proportional to one      
 IPI       |  0    Inf  [   NaN,    NaN]    0.500   Proportional to one      
 E         |  0    Inf  [   NaN,    NaN]    0.500   Proportional to one      
 WR        |  0    Inf  [   NaN,    NaN]    0.500   Proportional to one      
 Sigma2    |  Inf  Inf  [   NaN,    NaN]    1.000   Proportional to 1/Sigma2

Load the Nelson-Plosser data set. Create variables for the response and predictor series.

load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,'GNPR'};

Estimate the conditional posterior distribution of $β$ given the data and $σ^{2} = 2$ , and return the estimation summary table to access the estimates.

[Mdl,Summary] = estimate(PriorMdl,X,y,'Sigma2',2);

Method: Analytic posterior distributions
Conditional variable: Sigma2 fixed at   2
Number of observations: 62
Number of predictors:   4
 
           |   Mean      Std          CI95         Positive     Distribution    
--------------------------------------------------------------------------------
 Intercept | -24.2536  1.8696  [-27.918, -20.589]    0.000   N (-24.25, 1.87^2) 
 IPI       |   4.3913  0.0301   [ 4.332,  4.450]     1.000   N (4.39, 0.03^2)   
 E         |   0.0011  0.0001   [ 0.001,  0.001]     1.000   N (0.00, 0.00^2)   
 WR        |   2.4682  0.0743   [ 2.323,  2.614]     1.000   N (2.47, 0.07^2)   
 Sigma2    |    2       0       [ 2.000,  2.000]     1.000   Fixed value

estimate displays a summary of the conditional posterior distribution of $β$ . Because $σ^{2}$ is fixed at 2 during estimation, inferences on it are trivial.

Extract the mean vector and covariance matrix of the conditional posterior of $β$ from the estimation summary table.

condPostMeanBeta = Summary.Mean(1:(end - 1))

condPostMeanBeta = 4×1

  -24.2536
    4.3913
    0.0011
    2.4682

CondPostCovBeta = Summary.Covariances(1:(end - 1),1:(end - 1))

CondPostCovBeta = 4×4

    3.4956    0.0350   -0.0001    0.0241
    0.0350    0.0009   -0.0000   -0.0013
   -0.0001   -0.0000    0.0000   -0.0000
    0.0241   -0.0013   -0.0000    0.0055

Display Mdl.

Mdl

Mdl = 
  diffuseblm with properties:

    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}

 
           | Mean  Std        CI95        Positive        Distribution       
-----------------------------------------------------------------------------
 Intercept |  0    Inf  [   NaN,    NaN]    0.500   Proportional to one      
 IPI       |  0    Inf  [   NaN,    NaN]    0.500   Proportional to one      
 E         |  0    Inf  [   NaN,    NaN]    0.500   Proportional to one      
 WR        |  0    Inf  [   NaN,    NaN]    0.500   Proportional to one      
 Sigma2    |  Inf  Inf  [   NaN,    NaN]    1.000   Proportional to 1/Sigma2

Because estimate computes the conditional posterior distribution, it returns the original prior model, not the posterior, in the first position of the output argument list.

Estimate Posterior Probability Using Monte Carlo Simulation

Open Live Script

Consider the linear regression model in Estimate Marginal Posterior Distributions.

Create a prior model for the regression coefficients and disturbance variance, then estimate the marginal posterior distributions.

p = 3;
PriorMdl = bayeslm(p,'ModelType','diffuse','VarNames',["IPI" "E" "WR"]);

load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,'GNPR'};

PosteriorMdl = estimate(PriorMdl,X,y);

Method: Analytic posterior distributions
Number of observations: 62
Number of predictors:   4
 
           |   Mean      Std           CI95        Positive       Distribution      
------------------------------------------------------------------------------------
 Intercept | -24.2536   9.5314  [-43.001, -5.506]    0.006   t (-24.25, 9.37^2, 58) 
 IPI       |   4.3913   0.1535   [ 4.089,  4.693]    1.000   t (4.39, 0.15^2, 58)   
 E         |   0.0011   0.0004   [ 0.000,  0.002]    0.999   t (0.00, 0.00^2, 58)   
 WR        |   2.4682   0.3787   [ 1.723,  3.213]    1.000   t (2.47, 0.37^2, 58)   
 Sigma2    |  51.9790  10.0034   [35.965, 74.937]    1.000   IG(29.00, 0.00069)

Extract the posterior mean of $β$ from the posterior model, and the posterior covariance of $β$ from the estimation summary returned by summarize.

estBeta = PosteriorMdl.Mu;
Summary = summarize(PosteriorMdl);
estBetaCov = Summary.Covariances{1:(end - 1),1:(end - 1)};

Suppose that if the coefficient of real wages is below 2.5, then a policy is enacted. Although the posterior distribution of WR is known, and you can calculate probabilities directly, you can estimate the probability using Monte Carlo simulation instead.

Draw 1e6 samples from the marginal posterior distribution of $β$ .

NumDraws = 1e6;
rng(1);
BetaSim = simulate(PosteriorMdl,'NumDraws',NumDraws);

BetaSim is a 4-by- 1e6 matrix containing the draws. Rows correspond to the regression coefficient and columns to successive draws.

Isolate the draws corresponding to the coefficient of real wages, and then identify which draws are less than 2.5.

isWR = PosteriorMdl.VarNames == "WR";
wrSim = BetaSim(isWR,:);
isWRLT2p5 = wrSim < 2.5;

Find the marginal posterior probability that the regression coefficient of WR is below 2.5 by computing the proportion of draws that are less than 2.5.

probWRLT2p5 = mean(isWRLT2p5)

probWRLT2p5 = 0.5341

The posterior probability that the coefficient of real wages is less than 2.5 is about 0.53.

The marginal posterior distribution of the coefficient of WR is a $t_{58}$ , but centered at 2.47 and scaled by 0.37. Directly compute the posterior probability that the coefficient of WR is less than 2.5.

center = estBeta(isWR);
stdBeta = sqrt(diag(estBetaCov));
scale = stdBeta(isWR);
t = (2.5 - center)/scale;
dof = 68;
directProb = tcdf(t,dof)

directProb = 0.5333

The posterior probabilities are nearly identical.

Forecast Responses Using Posterior Predictive Distribution

Open Live Script

Consider the linear regression model in Estimate Marginal Posterior Distributions.

Create a prior model for the regression coefficients and disturbance variance, then estimate the marginal posterior distributions. Hold out the last 10 periods of data from estimation so you can use them to forecast real GNP. Turn the estimation display off.

p = 3;
PriorMdl = bayeslm(p,'ModelType','diffuse','VarNames',["IPI" "E" "WR"]);

load Data_NelsonPlosser
fhs = 10; % Forecast horizon size
X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)};
y = DataTable{1:(end - fhs),'GNPR'};
XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data
yFT = DataTable{(end - fhs + 1):end,'GNPR'};                  % True future responses

PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);

Forecast responses using the posterior predictive distribution and the future predictor data XF. Plot the true values of the response and the forecasted values.

yF = forecast(PosteriorMdl,XF);

figure;
plot(dates,DataTable.GNPR);
hold on
plot(dates((end - fhs + 1):end),yF)
h = gca;
hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],...
    h.YLim([1,1,2,2]),[0.8 0.8 0.8]);
uistack(hp,'bottom');
legend('Forecast Horizon','True GNPR','Forecasted GNPR','Location','NW')
title('Real Gross National Product');
ylabel('rGNP');
xlabel('Year');
hold off

yF is a 10-by-1 vector of future values of real GNP corresponding to the future predictor data.

Estimate the forecast root mean squared error (RMSE).

frmse = sqrt(mean((yF - yFT).^2))

frmse = 25.5489

The forecast RMSE is a relative measure of forecast accuracy. Specifically, you estimate several models using different assumptions. The model with the lowest forecast RMSE is the best-performing model of the ones being compared.

More About

expand all

Bayesian Linear Regression Model

A Bayesian linear regression model treats the parameters β and σ² in the multiple linear regression (MLR) model y_t = x_tβ + ε_t as random variables.

For times t = 1,...,T:

y_t is the observed response.
x_t is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x_1t = 1 for all t.
β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of x_t.
ε_t is the random disturbance with a mean of zero and Cov(ε) = σ²I_T×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is

$ℓ (β, σ^{2} | y, x) = \prod_{t = 1}^{T} ϕ (y_{t}; x_{t} β, σ^{2}) .$
ϕ(y_t;x_tβ,σ²) is the Gaussian probability density with mean x_tβ and variance σ² evaluated at y_t;.

Before considering the data, you impose a joint prior distribution assumption on (β,σ²). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ²) or the conditional posterior distributions of the parameters.

Alternatives

The bayeslm function can create any supported prior model object for Bayesian linear regression.

Version History

Introduced in R2017a

diffuseblm

Description

Creation

Syntax

Description

Properties

`NumPredictors` — Number of predictor variables
nonnegative integer

`Intercept` — Flag for including regression model intercept
`true` (default) | `false`

`VarNames` — Predictor variable names
string vector | cell vector of character vectors

Object Functions

Examples

Create Diffuse Prior Model

Estimate Marginal Posterior Distributions

Estimate Conditional Posterior Distribution

Estimate Posterior Probability Using Monte Carlo Simulation

Forecast Responses Using Posterior Predictive Distribution

More About

Bayesian Linear Regression Model

Alternatives

Version History

See Also

Objects

Functions

Topics

diffuseblm

Description

Creation

Syntax

Description

Properties

NumPredictors — Number of predictor variables nonnegative integer

Intercept — Flag for including regression model intercept true (default) | false

VarNames — Predictor variable names string vector | cell vector of character vectors

Object Functions

Examples

Create Diffuse Prior Model

Estimate Marginal Posterior Distributions

Estimate Conditional Posterior Distribution

Estimate Posterior Probability Using Monte Carlo Simulation

Forecast Responses Using Posterior Predictive Distribution

More About

Bayesian Linear Regression Model

Alternatives

Version History

See Also

Objects

Functions

Topics

`NumPredictors` — Number of predictor variables
nonnegative integer

`Intercept` — Flag for including regression model intercept
`true` (default) | `false`

`VarNames` — Predictor variable names
string vector | cell vector of character vectors