Equation Solving Algorithms

Equation Solving Definition

Given a set of n nonlinear functions F_i(x), where n is the number of components in the vector x, the goal of equation solving is to find a vector x that makes all F_i(x) = 0.

fsolve attempts to solve a system of equations by minimizing the sum of squares of the components. If the sum of squares is zero, the system of equations is solved. fsolve has three algorithms:

Trust-region
Trust-region-dogleg
Levenberg-Marquardt

All algorithms are large scale; see Large-Scale vs. Medium-Scale Algorithms.

The fzero function solves a single one-dimensional equation.

The mldivide function solves a system of linear equations.

Trust-Region Algorithm

Many of the methods used in Optimization Toolbox™ solvers are based on trust regions, a simple yet powerful concept in optimization.

To understand the trust-region approach to optimization, consider the unconstrained minimization problem, minimize f(x), where the function takes vector arguments and returns scalars. Suppose that the current point is x in n-space and you want to improve by moving to a point with a lower function value. To do so, the algorithm approximates f with a simpler function q, which reasonably reflects the behavior of function f in a neighborhood N around the point x. This neighborhood is the trust region. The solver computes a trial step s by minimizing (or approximately minimizing) over N. The trust-region subproblem is

$\min_{s} {q (s), s \in N} .$

The solver updates the current point to x + s if f(x + s) < f(x); otherwise, the current point remains unchanged and the solver shrinks N (the trust region) and repeats the trial step computation.

The key questions in defining a specific trust-region approach to minimizing f(x) are how to choose and compute the approximation q (defined at the current point x), how to choose and modify the trust region N, and how accurately to solve the trust-region subproblem.

In the standard trust-region method ([48]), the quadratic approximation q is defined by the first two terms of the Taylor approximation to F at x. The neighborhood N is usually spherical or ellipsoidal in shape. Mathematically, the trust-region subproblem is typically stated

\min {\frac{1}{2} s^{T} H s + s^{T} g such that ‖ D s ‖ \leq Δ},

(1)

where g is the gradient of f at the current point x, H is the Hessian matrix (the symmetric matrix of second derivatives), D is a diagonal scaling matrix, Δ is a positive scalar, and ‖ . ‖ is the 2-norm. To solve Equation 1, an algorithm (see [48]) can compute all eigenvalues of H and then apply a Newton process to the secular equation

$\frac{1}{Δ} - \frac{1}{‖ s ‖} = 0.$

Such an algorithm provides an accurate solution to Equation 1. However, this requires time proportional to several factorizations of H. Therefore, trust-region problems require a different approach. Several approximation and heuristic strategies, based on Equation 1, have been proposed in the literature ([42] and [50]). Optimization Toolbox solvers follow an approximation approach that restricts the trust-region subproblem to a two-dimensional subspace S ([39] and [42]). After the solver computes the subspace S, the work to solve Equation 1 is trivial because, in the subspace, the problem is only two-dimensional. The dominant work now shifts to the determination of the subspace.

The solver determines the two-dimensional subspace S with the aid of a preconditioned conjugate gradient method (described in the next section). The solver defines S as the linear space spanned by s₁ and s₂, where s₁ is in the direction of the gradient g, and s₂ is either an approximate Newton direction, that is, a solution to

$H \cdot s_{2} = - g,$

or a direction of negative curvature,

$s_{2}^{T} \cdot H \cdot s_{2} < 0.$

The philosophy behind this choice of S is to force global convergence (via the steepest descent direction or negative curvature direction) and achieve fast local convergence (via the Newton step, when it exists).

The process of unconstrained minimization using the trust-region approach is now easy to specify:

Formulate the two-dimensional trust-region subproblem.
Solve Equation 1 to determine the trial step s.
If f(x + s) < f(x), then x = x + s.
Adjust Δ.

The solver repeats these four steps until convergence, adjusting he trust-region dimension Δ according to standard rules. In particular, the solver decreases the trust-region size if it does not accept the trial step, when f(x + s) ≥ f(x). See [46] and [49] for a discussion of this aspect.

Optimization Toolbox solvers treat important cases of f with specialized functions: nonlinear least-squares, quadratic functions, and linear least-squares. However, the underlying algorithmic ideas are the same as for the general case.

Preconditioned Conjugate Gradient Method

A popular way to solve large, symmetric, positive definite systems of linear equations Hp = –g is the method of Preconditioned Conjugate Gradients (PCG). This iterative approach requires the ability to calculate matrix-vector products of the form H·v where v is an arbitrary vector. The symmetric positive definite matrix M is a preconditioner for H. That is, M = C², where C^–1HC^–1 is a well-conditioned matrix or a matrix with clustered eigenvalues.

In a minimization context, you can assume that the Hessian matrix H is symmetric. However, H is guaranteed to be positive definite only in the neighborhood of a strong minimizer. Algorithm PCG exits when it encounters a direction of negative (or zero) curvature, that is, d^THd ≤ 0. The PCG output direction p is either a direction of negative curvature or an approximate solution to the Newton system Hp = –g. In either case, p helps to define the two-dimensional subspace used in the trust-region approach discussed in Trust-Region Methods for Nonlinear Minimization.

Trust-Region-Dogleg Algorithm

Another approach is to solve a linear system of equations to find the search direction. Newton's method specifies to solve for the search direction d_k such that

J(x_k)d_k = –F(x_k)
x_{k
+ 1} = x_k + d_k,

where J(x_k) is the n-by-n Jacobian

$J (x_{k}) = [\begin{matrix} \nabla F_{1} {(x_{k})}^{T} \\ \nabla F_{2} {(x_{k})}^{T} \\ ⋮ \\ \nabla F_{n} {(x_{k})}^{T} \end{matrix}] .$

Newton's method can be problematic. J(x_k) might be singular, in which case the Newton step d_k is not even defined. Also, the exact Newton step d_k can be expensive to compute. In addition, Newton's method might not converge if the starting point is far from the solution.

Using trust-region techniques (introduced in Trust-Region Methods for Nonlinear Minimization) handles the case when J(x_k) is singular and improves robustness when the starting point is far from the solution. To use a trust-region strategy, you need a merit function to decide if x_{k + 1} is better or worse than x_k. A possible choice is

$\min_{d} f (d) = \frac{1}{2} F {(x_{k} + d)}^{T} F (x_{k} + d) .$

But a minimum of f(d) is not necessarily a root of F(x).

The Newton step d_k is a root of

M(x_k + d) = F(x_k) + J(x_k)d,

so it is also a minimum of m(d), where

\begin{matrix} \min_{d} m (d) = \frac{1}{2} {‖ M (x_{k} + d) ‖}_{2}^{2} = \frac{1}{2} {‖ F (x_{k}) + J (x_{k}) d ‖}_{2}^{2} \\ = \frac{1}{2} F {(x_{k})}^{T} F (x_{k}) + d^{T} J {(x_{k})}^{T} F (x_{k}) + \frac{1}{2} d^{T} J {(x_{k})}^{T} J (x_{k}) d . \end{matrix}

(2)

m(d) is a better choice of merit function than f(d), so the trust-region subproblem is

\min_{d} [\frac{1}{2} F {(x_{k})}^{T} F (x_{k}) + d^{T} J {(x_{k})}^{T} F (x_{k}) + \frac{1}{2} d^{T} J {(x_{k})}^{T} J (x_{k}) d],

(3)

such that ‖D·d‖ ≤ Δ. You can solve this subproblem efficiently using a dogleg strategy.

For an overview of trust-region methods, see Conn [4] and Nocedal [31].

Trust-Region-Dogleg Implementation

The key feature of the trust-region-dogleg algorithm is the use of the Powell dogleg procedure for computing the step d, which minimizes Equation 3. For a detailed description, see Powell [34].

The algorithm constructs the step d from a convex combination of a Cauchy step (a step along the steepest descent direction) and a Gauss-Newton step for f(x). The Cauchy step is calculated as

d_C = –αJ(x_k)^TF(x_k),

where α minimizes Equation 2.

The Gauss-Newton step is calculated by solving

J(x_k)·d_GN = –F(x_k),

using the MATLAB^® mldivide (matrix left division) operator.

The algorithm chooses the step d so that

d = d_C + λ(d_GN – d_C),

where λ is the largest value in the interval [0,1] such that ‖d‖ ≤ Δ. If J_k is (nearly) singular, d is just the Cauchy direction.

The trust-region-dogleg algorithm is efficient because it requires only one linear solve per iteration (for the computation of the Gauss-Newton step). Additionally, the algorithm can be more robust than using the Gauss-Newton method with a line search.

Levenberg-Marquardt Method

The Levenberg-Marquardt algorithm ([25], and [27]) uses a search direction that is a solution of the linear set of equations

(J {(x_{k})}^{T} J (x_{k}) + λ_{k} I) d_{k} = - J {(x_{k})}^{T} F (x_{k}),

(4)

or, optionally, of the equations

(J {(x_{k})}^{T} J (x_{k}) + λ_{k} diag (J {(x_{k})}^{T} J (x_{k}))) d_{k} = - J {(x_{k})}^{T} F (x_{k}),

(5)

where the scalar λ_k controls both the magnitude and direction of d_k. Set the fsolve option ScaleProblem to 'none' to use Equation 4, or set this option to 'jacobian' to use Equation 5.

When λ_k is zero, the direction d_k is the Gauss-Newton method. As λ_k tends towards infinity, d_k tends towards the steepest descent direction, with magnitude tending towards zero. The implication is that, for some sufficiently large λ_k, the term F(x_k + d_k) < F(x_k) holds true. Therefore, the algorithm can control the term λ_k to ensure descent despite second-order terms, which restrict the efficiency of the Gauss-Newton method. The Levenberg-Marquardt algorithm, therefore, uses a search direction that is a cross between the Gauss-Newton direction and the steepest descent direction. For more details, see Levenberg-Marquardt Method in the least squares documentation.

fzero Algorithm

fzero attempts to find the root of a scalar function f of a scalar variable x.

fzero looks for an interval around an initial point such that f(x) changes sign. If you specify an initial interval instead of an initial point, fzero checks to make sure that f(x) has different signs at the endpoints of the interval. The initial interval must be finite; it cannot contain ±Inf.

fzero uses a combination of interval bisection, linear interpolation, and inverse quadratic interpolation in order to locate a root of f(x). See fzero for more information.

`\` Algorithm

The \ algorithm is described in the Algorithms section of the mldivide reference page.