Compute standard deviations of predictions of linear and polynomial regression models
15 views (last 30 days)
Show older comments
I'm using the fit and fitlm functions to fit various linear and polynomial regression models, and then using predict and predint to compute predictions of the response variable with lower/upper confidence intervals as in the example below. However, I also want to calculate standard deviations, y_sigma, of the predictions. Is there an easy way to do that?
% Some data
X = [239.38 254.46 266.06 269.20 277.59]';
Y = [194.72 201.03 206.94 209.58 212.32]';
% Make predictions at
x = linspace(230, 290, 13)';
Linear regression
% Fit LR model
model = fitlm(X, Y);
% Make prediction at new points
[y_mean, y_int] = predict(model, x, 'Alpha', 0.1);
Fit polynomial (e.g. cubic)
% Fit polynomial model
fit_type = "poly3";
[model, gof, output] = fit(X, Y, fit_type);
% Make prediction at new points
[y_int, y_mean] = predint(model, x, 0.9, 'Observation', 'off');
For comparison, a Gaussian process model can produce std. deviations y_sigma as follows:
% Fit GPR model
model = fitrgp(X, Y)
% Make prediction at new points
[y_mean, y_sigma, y_int] = predict(model, x, 'Alpha', 0.1);
I want to make sure that y_mean, y_sigma, y_int are comparable in all three cases above (i.e. compare "apples with apples").
2 Comments
the cyclist
on 12 Apr 2023
This is misguided. y_sigma is not a measure of uncertainty. See my comment on your answer.
In an OLS model the RMSE is a measure of average uncertainty in model prediction. (I guess you could multiply that by some arbitrary number of points to call it a "total" uncertainty, but that seems silly.)
I'd need to dig a little to see if there is an equivalent average uncertainty measure for the gaussian process model. But, I'm not sure it's ever truly an "apples to apples" comparison between a two-parameter model and a one-parameter model.
Answers (2)
the cyclist
on 11 Apr 2023
I am by no means an expert on gaussian process models, but I don't think that an ordinary least squares regression (fitlm) has the equivalent parameter to y_sigma here. I don't mean that it isn't reported -- I mean it does not exist. If one thinks about the underlying assumptions of the generative process of the data, there is no y_sigma equivalent.
I won't be surprised if someone here answers by calculating the standard deviation of something coming out of the fitlm. But I'd be wary that it is the statistical equivalent of what you want.
2 Comments
the cyclist
on 12 Apr 2023
In OLS the parameter has error, and that error has a gaussian distribution.
That is different from saying you have two parameters (mean and standard deviation of a gaussian) to fit.
Gaussians abound in parameterized statistics.
Bill Tubbs
on 12 Apr 2023
Edited: Bill Tubbs
on 12 Apr 2023
4 Comments
Jeff Miller
on 13 Apr 2023
- "I thought the lower and upper confidence bounds produced during the fitting of the linear model (y_int above) reflected the uncertainty of the model predictions at the new points (x)." The bounds that you get from predict are bounds on where the curve might be--that is, bounds on what the true mean Y might be at each X. ((These are the red dashed lines in one of the plots you posted.) If you want bounds on the actual observed individual values of Y (i.e., bounds on observed Y data points), then you have to call predict with the optional parameter pair 'prediction','observation'. The bounds will be wider in the latter case, because there is more uncertainty about the location of the observations than about the location of their mean. I'm not entirely sure which kind of uncertainty you want to measure.
- "can I simply calculate the std. deviations of the prediction errors as follows?" No. The confidence intervals are computed from t distributions rather than normals (e.g., CIs for observations). You could use the same approach using the t distribution rather than the normal (model has the required dferror).
- Whichever type of prediction error you settle on for the linear and polynomial regression models, perhaps you can get estimate the comparable type of prediction error for the gaussian process model using a bootstrapping approach. (Disclaimer: I know nothing about these models.) The basic idea would be to repeatedly (a) take a bootstrap sample of your data, (b) fit the gpm to that sample, ( c) compute a predicted Y' from that gpm (whatever that means in terms of a gpm). Across iterations of this process, you can certainly look at the bootstrap standard error of those Y' values. I have no idea whether that would provide an "apples vs apples" comparison with either of the linear model variability measures, but maybe @the cyclist can shed some light on that. In any case, from his earlier replies I gather that y_sigma does not index prediction uncertainty in either of the senses that apply with the linear/poly models.
the cyclist
on 13 Apr 2023
@Bill Tubbs, your understanding of the uncertainty coming out of fitlm is largely correct. y_int is indeed a measure of uncertainty in the prediction.
y_sigma, coming out of the fitrgp prediction, is not a measure of uncertainty. It is a fitted parameter.
Note that fitrgp also outputs a value for y_int, which is "prediction intervals of the response variable".
Here is my take, for "apples to apples". If your two models are
% LR model
model = fitlm(X, Y);
[y_mean, y_int] = predict(model, x, 'Alpha', 0.1)
% GPR model
model = fitrgp(X, Y);
[y_mean, y_sigma, y_int] = predict(model, x, 'Alpha', 0.1)
then
- In both models, y_mean is the predicted response.
- In both models, y_int is a measure of the uncertainty in that prediction.
- In the gaussian process model, y_sigma is the predicted standard deviation of the response (which is NOT an uncertainty measure). There is NO equivalent of y_sigma in the linear model.
See Also
Categories
Find more on Regression in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!