How to define a custom equation in fitlm function for linear regression?

44 views (last 30 days)
I'd like to define a custom equation for linear regression. For example y = a*log(x1) + b*x2^2 + c*x3 + k. This is a linear regression problem - but how to do this within FitLm function?
Thanks, Shriram

Answers (3)

the cyclist
the cyclist on 22 Nov 2017
Edited: the cyclist on 22 Nov 2017
% Set the random number seed for reproducibility
rng default
% Make up some pretend data
N = 100;
x1 = rand(N,1);
x2 = rand(N,1);
x3 = rand(N,1);
a = 2;
b = 3;
c = 5;
k = 7;
noise = 0.2*randn(N,1);
y = a*log(x1) + b*x2.^2 + c*x3 + k + noise;
% Put the variables into a table, naming them appropriately
tbl = table(log(x1),x2.^2,x3,y,'VariableNames',{'log_x1','x2_sqr','x3','y'});
% Specify and carry out the fit
mdl = fitlm(tbl,'y ~ 1 + log_x1 + x2_sqr + x3')
  5 Comments
Walter Roberson
Walter Roberson on 8 Jun 2023
If you have a discontinuity in the first or second derivatie of the model, then you surely do not have a linear regression situation.
You probably need to use ga(). Not fmincon() or similar -- those optimizers cannot handle discontinuities in derivatives either.
the cyclist
the cyclist on 8 Jun 2023
I suggest you search the keywords segmented regression matlab and/or piecewise regression matlab. Although I don't believe there are any built-in functions for this, you should find a few different threads that you might find useful. I also think you might want to start a brand-new question for this, after you have done that search. In that question, I would suggest posting your data, which makes it easier for people to try out code suggestions.

Sign in to comment.


laurent jalabert
laurent jalabert on 19 Dec 2021
Edited: laurent jalabert on 19 Dec 2021
To proceed with a custom function it is possible to use the non linear regression model
The example below is intended to fit a basic Resistance versus Temperature at the second order such as R=R0*(1+alpha*(T-T0)+beta*(T-T0)^2), and the fit coefficient will be b(1)=R0, b(2) = alpha, and b(3)=beta.
The advantage here, is that the SE will be computed directly for R0, alpha and beta.
beta0 is an initial range of [R0 alpha beta]
b(n) is retrieved using mdl.Coefficients.Estimate(n), for n=1,2,3
standard deviation on the coefficients are retrieved by mdl.Coefficients.SE(n)
(Curve fitting toolbox and Statistical/Machine Learning toolbox are both requiered)
clear tbl mdl
% your vector data T_T0 and R of same dimension
tbl = table(T_T0,R);
modelfun = @(b,x)b(1).*(1+b(2).*x(:,1)+b(3).*x(:,1).^2);
beta0 = [100 1e-3 1e-6];
mdl = fitnlm(tbl,modelfun,beta0,'CoefficientNames',{'R0';'alpha';'beta'})
  1 Comment
Erin Evans
Erin Evans on 7 Jun 2023
Would you be able to help me write this such that there is a conditional statement in it? I essentially need two connected trendlines such that the statistics are for both sections together.
my equation is:
log10(Qs) = log10(a) + b*log(Qr) + c*log10(Max(1, Qr / Qc)) + d*log10(Qr(i) / Qr(i - 1))
the code I have so far is:
AgaDisc = readtable("File Path");
%% Bring in data columns
AgaDischargeArray = table2array(AgaDisc(:,"Discharge"));
AgaLoadArray = table2array(AgaDisc(:,"SedimentLoad"));
%% Normalize the discharge and sediment data
meanDisc = mean(AgaDischargeArray);
medianLoad = median(AgaLoadArray);
AgaDischargeArray = AgaDischargeArray/meanDisc;
AgaLoadArray = AgaLoadArray/medianLoad;
%% log10 sediment
logQs = log10(AgaDischargeArray);
%% log10 discharge
logQt = log10(AgaLoadArray);
%% Create new table of log-normalized data
tbl = table(logQs, logQt);
%% Add weighting factor
weights = zeros(length(logQt), 1);
weightFactor = [0.5, 1, 5, 10];
Q = quantile(logQt, 3);
for i = 1:length(logQt)
if logQt(i) > Q(3)
weights(i) = weightFactor(4);
elseif logQt(i) > Q(2)
weights(i) = weightFactor(3);
elseif logQt(i) > Q(1)
weights(i) = weightFactor(2);
else
weights(i) = weightFactor(1);
end
end
m = fitlm(tbl, 'logQs ~ logQt', 'RobustOpts', 'on', 'weight', weights);

Sign in to comment.


laurent jalabert
laurent jalabert on 8 Jun 2023
please check carefully your expression, cause you use log10 and log (I guess neperian log here)
log10(Qs) in equation and logQs = log10(AgaDischargeArray) in your program; is it same ?
d*log(Qr(i)/Qr(i-1) might be similar to d* diff(log(Qr))
log10(Qs) = log10(a) + b*log(Qr) + c*log10(Max(1, Qr / Qc)) + d*log10(Qr(i) / Qr(i - 1))
log10Qs is x(:,1) as first column of tbl.
logQt is x(:,2) as the second column of tbl.
a,b,c,d are unknown
Qc is not defined
d* diff(log(Qr)) will lead to problem because its length is length(Qr) -1

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!