Nonlinear Regression
What is Nonlinear Regression?
The purpose of regression models is to describe a response variable as a function of independent variables. Multiple linear regression models describe the response as a linear combination of coefficients and functions of independent variables. Nonlinearities can be modeled using nonlinear functions of independent variables. However, the coefficients always enter the model in a linear fashion.
Nonlinear regression models are more mechanistic models of nonlinear relationships between the response and independent variables. The parameters can enter the model as exponential, trigonometric, power, or any other nonlinear function. The unknown parameters in the model are estimated by minimizing a statistical criterion such as the negative log likelihood or the sum of squared deviations between observed and predicted values.
In the case of pharmacokinetic (PK) studies, the response data usually represent some measured drug concentrations, and independent variables are often dose and time. The nonlinear function often used for such data is an exponential function since many drugs once distributed in a patient are eliminated in an exponential fashion. One PK parameter to estimate in this case is the rate at which the drug is eliminated from the body given the concentrationtime data.
For instance, consider drug plasma concentration data from a single individual after an intravenous bolus dose measured at different time points with some errors. Assume the measured drug concentration follows a monoexponential decline: $${C}_{t}={C}_{0}{e}^{{k}_{e}t}+a\epsilon $$.
This model describes the time course of drug concentration in the body (C_{t}), as a function of the drug concentration after an intravenous bolus dose at t = 0 (C_{0}), time (t), and elimination rate parameter (k_{e}). ε is the meanzero and unitvariance variable, that is, $$\epsilon \sim N(0,1)$$ representing the measurement error and a is the error model parameter (here, standard deviation).
More generically, you can write the model as
$${y}_{i}=f(t;p)+g({\epsilon}_{i})\text{\hspace{0.17em}}$$
where y_{i} is the ith response (such as drug concentration), f is a function of time t and model parameters p (such as k_{e}), and an error model $$g({\epsilon}_{i})$$.
Fitting Options in SimBiology
This table summarizes nonlinear regression options available in SimBiology^{®}.
Fitting Option  Example 
Individualspecific parameter estimation (Unpooled fitting) Fit each individual separately, resulting in one set of parameter estimates for each individual. 

Category or groupspecific parameter estimation Fit each category or group separately, resulting in one set of parameter estimates for each category. 

Populationwide parameter estimation (Pooled fitting) Fit all of the data pooled together, resulting in just one set of parameter estimates. 

In addition, SimBiology supports four kinds of error models for measured or observed responses, namely, constant (default), proportional, combined, and exponential. For details, see Error Models. Depending on the optimization method, you can specify an error model for each response or all responses. For details, see Supported Methods for Parameter Estimation in SimBiology.
Parameter Transformations
SimBiology supports three parameter transformations. These parameter transformations can be useful to improve fitting convergence or to enforce parameter bounds.
The general model explained previously is $${y}_{i}=f(t;p)+g({\epsilon}_{i})\text{\hspace{0.17em}}$$, where p is model parameters that you can transform. Consider the following two equations.
$$\beta =T\left(p\right)$$
$$p={T}^{1}\left(\beta \right)$$
Here, $$\beta $$ represents transformed model parameters,
p represents untransformed model
parameters, T
is the transformation, and
T^{1}
is the
inverse transformation.
SimBiology performs parameter estimation using the transformed parameters
β, which means that the transformed model $$F\left(t;\beta \right)=f\left(t;{T}^{1}\left(\beta \right)\right)$$ is used, where F
is the model function using
the transformed parameters. Equivalently, the model function can be rewritten as $$f\left(t;p\right)=F\left(t;T\left(p\right)\right)$$.
In other words, the SimBiology optimizer uses the transformed values during maximum likelihood estimation but the reported fit result is reverted back to the model space (untransformed values). For example, if you are estimating a clearance parameter Cl that is logtransformed, you have Cl_{β} = log(Cl), where Cl_{β} is what the optimizer uses and Cl is what the model sees.
Specifying parameter transformations imposes implicit bounds on the untransformed
parameter values. The log
transformation keeps the parameter
value to be always positive, and the logit
and
probit
transformations keep the parameter value to between
0
and 1
. Alternatively, you can specify
further constraints on the parameter values by providing explicit bounds for
untransformed parameters p or for the transformed parameters
β.
Transformation  Transformed Parameter and Range†  Untransformed Parameter and Range‡  Description 

 $$\beta =\mathrm{log}\left(p\right)\in [Inf,Inf]$$  $$p=\mathrm{exp}\left(\beta \right)\in [0,Inf]$$  In many cases, the Applying the 
logit  $$\beta =\text{logit(p)=log}\left(\frac{p}{1p}\right)\in [Inf,Inf]$$  $$p=1/\left(1+\mathrm{exp}\left(\beta \right)\right)\in [0,1]$$  The 
probit  $$\beta =\text{norminv}\left(p\right)\in [Inf,Inf]$$  $$p=\text{normcdf}\left(\beta \right)\in [0,1]$$  Similar to The 
† Use the InitialTransformedValue
and
TransformedBounds
properties of an EstimatedInfo object
to set the
initial transformed value and transformed bounds to the desired subset of the
range.
‡ Use the InitialValue
and Bounds
properties of an EstimatedInfo object
to set the
initial untransformed value and untransformed bounds to the desired subset of the
range.
Maximum Likelihood Estimation
SimBiology estimates parameters by the method of maximum likelihood. Rather than directly
maximizing the likelihood function, SimBiology constructs an equivalent minimization problem. Whenever possible, the estimation
is formulated as a weighted least squares (WLS) optimization
that minimizes the sum of the squares of weighted residuals. Otherwise, the estimation is
formulated as the minimization of the negative of the logarithm of the likelihood (NLL). The WLS formulation often
converges better than the NLL formulation, and SimBiology can
take advantage of specialized WLS algorithms, such as the
LevenbergMarquardt algorithm implemented in lsqnonlin
and
lsqcurvefit
. SimBiology uses WLS when
there is a single error model that is constant, proportional, or exponential. SimBiology uses
NLL if you have a combined error model or a multipleerror
model, that is, a model having an error model for each response.
sbiofit
supports different
optimization methods, and passes in the formulated WLS or
NLL expression to the optimization method that minimizes
it. For simplicity, each expression shown below assumes only one error model and one response.
If there are multiple responses, SimBiology takes the sum of the expressions that correspond to
error models of given responses.
Expression that is being minimized  

Weighted Least Squares (WLS)  For the constant error model, $$\sum _{i}^{N}{\left({y}_{i}{f}_{i}\right)}^{2}$$ 
For the proportional error model, $$\sum _{i}^{N}\frac{{\left({y}_{i}{f}_{i}\right)}^{2}}{{f}_{i}^{2}/{f}_{gm}^{2}}$$  
For the exponential error model, $$\sum _{i}^{N}{\left(\mathrm{ln}{y}_{i}\mathrm{ln}{f}_{i}\right)}^{2}$$  
For numeric weights, $$\sum _{i}^{N}\frac{{\left({y}_{i}{f}_{i}\right)}^{2}}{{w}_{gm}/{w}_{i}}$$  
Negative Loglikelihood (NLL)  For the combined error model and multipleerror model, $$\sum}_{i}^{N}\frac{{\left({y}_{i}{f}_{i}\right)}^{2}}{2{\sigma}_{i}^{2}}+{\displaystyle \sum}_{i}^{N}\mathrm{ln}\sqrt{2\pi {\sigma}_{i}^{2}$$ 
The variables are defined as follows.
N  Number of experimental observations 
y_{i}  The ith experimental observation 
${f}_{i}$  Predicted value of the ith observation 
${\sigma}_{i}$  Standard deviation of the ith observation.

$${f}_{gm}$$  $${f}_{gm}={\left({\displaystyle \prod _{i}^{N}\left{f}_{i}\right}\right)}^{\raisebox{1ex}{$1$}\!\left/ \!\raisebox{1ex}{$N$}\right.}$$ 
$${w}_{i}$$  The weight of the ith predicted value 
$${w}_{gm}$$  $${w}_{gm}={\left({\displaystyle \prod _{i}^{N}{w}_{i}}\right)}^{\raisebox{1ex}{$1$}\!\left/ \!\raisebox{1ex}{$N$}\right.}$$ 
When you use numeric weights or the weight function, the weights are assumed to be inversely proportional to the variance of the error, that is, $${\sigma}_{i}^{2}=\frac{{a}^{2}}{{w}_{i}}$$ where a is the constant error parameter. If you use weights, you cannot specify an error model except the constant error model.
Various optimization methods have different requirements on the function that is being minimized. For some methods, the estimation of model parameters is performed independently of the estimation of the error model parameters. The following table summarizes the error models and any separate formulas used for the estimation of error model parameters, where a and b are error model parameters and e is the standard meanzero and unitvariance (Gaussian) variable.
Error Model  Error Parameter Estimation Function 

'constant' : $${y}_{i}={f}_{i}+ae$$  $${a}^{2}=\frac{1}{N}{\displaystyle \sum _{i}^{N}{\left({y}_{i}{f}_{i}\right)}^{2}}$$ 
'exponential' : $${y}_{i}={f}_{i}\mathrm{exp}(ae)$$  $${a}^{2}=\frac{1}{N}{\displaystyle \sum _{i}^{N}{\left(\mathrm{ln}{y}_{i}\mathrm{ln}{f}_{i}\right)}^{2}}$$ 
'proportional' : $${y}_{i}={f}_{i}+b\left{f}_{i}\righte$$  ${b}^{2}=\frac{1}{N}{\displaystyle \sum _{i}^{N}{\left(\frac{{y}_{i}{f}_{i}}{{f}_{i}}\right)}^{2}}$ 
'combined' : $${y}_{i}={f}_{i}+\left(a+b\left{f}_{i}\right\right)e$$  Error parameters are included in the minimization. 
Weights  $${a}^{2}=\frac{1}{N}{\displaystyle \sum _{i}^{N}{\left({y}_{i}{f}_{i}\right)}^{2}}{w}_{i}$$ 
Note
nlinfit
only support single error models, not multipleerror models, that is, responsespecific error models. For a combined error model, it uses an iterative WLS algorithm. For other error models, it uses the WLS algorithm as described previously. For details, see nlinfit
(Statistics and Machine Learning Toolbox).
Fitting Workflow
The following steps show one of the workflows you can use at the command line to fit a PK model.
Convert the data to the
groupedData
format.Define dosing data. For details, see Doses in SimBiology Models.
Create a structural model (one, two, or a multicompartment model). For details, see Create Pharmacokinetic Models.
Map the response variable from data to the model component. For example, if you have the measured drug concentration data for the central compartment, then map it to the drug species in the central compartment (typically the
Drug_Central
species).Specify parameters to estimate using an
EstimatedInfo object
. Optionally, you can specify parameter transformations, initial values, and parameter bounds.Perform parameter estimation using
sbiofit
orfitproblem
.
For illustrated examples, see the following.
See Also
sbiofit
 groupedData
 estimatedInfo
 sbiofitmixed