
checkGradients

Check first derivative function against finite-difference approximation

Since R2023b

Description

valid = checkGradients(fun,x0) compares the value of the supplied first derivative function in fun at a point near x0 against a finite-difference approximation. By default, the comparison assumes that the function is an objective function. To check constraint functions, set the IsConstraint name-value argument to true.


valid = checkGradients(fun,x0,options) modifies the comparison by changing finite differencing options.


valid = checkGradients(___,Name=Value) specifies additional options using one or more name-value arguments, in addition to any of the input argument combinations in the previous syntaxes. For example, you can set the tolerance for the comparison, or specify that the comparison is for nonlinear constraint functions.


[valid,err] = checkGradients(___) also returns a structure err containing the relative differences between the supplied derivatives and finite-difference approximations.


Examples


Check Gradient of Objective Function

The rosen function at the end of this example computes the Rosenbrock objective function and its gradient for a 2-D variable x.

Check that the computed gradient in rosen matches a finite-difference approximation near the point [2,4].

x0 = [2,4];
valid = checkGradients(@rosen,x0)
valid = logical
   1

function [f,g] = rosen(x)
f = 100*(x(1) - x(2)^2)^2 + (1 - x(2))^2;
if nargout > 1
    g(1) = 200*(x(1) - x(2)^2);
    g(2) = -400*x(2)*(x(1) - x(2)^2) - 2*(1 - x(2));
end
end

Check Jacobian of Least-Squares Objective Function

The vecrosen function at the end of this example computes the Rosenbrock objective function in least-squares form and its Jacobian (gradient).

Check that the computed gradient in vecrosen matches a finite-difference approximation near the point [2,4].

x0 = [2,4];
valid = checkGradients(@vecrosen,x0)
valid = logical
   1

function [f,g] = vecrosen(x)
f = [10*(x(1) - x(2)^2),1-x(1)];
if nargout > 1
    g = zeros(2); % Allocate g
    g(1,1) = 10; % df(1)/dx(1)
    g(1,2) = -20*x(2); % df(1)/dx(2)
    g(2,1) = -1; % df(2)/dx(1)
    g(2,2) = 0; % df(2)/dx(2)
end
end

Use Central Finite Differences for a More Accurate Check

The rosen function at the end of this example computes the Rosenbrock objective function and its gradient for a 2-D variable x.

For some initial points, the default forward finite differences cause checkGradients to mistakenly indicate that the rosen function has incorrect gradients. To see result details, set the Display option to "on".

x0 = [0,0];
valid = checkGradients(@rosen,x0,Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.48826e-06.
  Supplied derivative element (1,1): -0.126021
  Finite-difference derivative element (1,1): -0.126023

checkGradients failed.

Supplied derivative and finite-difference approximation are
not within 'Tolerance' (1e-06).
____________________________________________________________
valid = logical
   0

checkGradients reports a mismatch: the maximum relative difference, about 1.5e-6, is just over the default tolerance of 1e-6. Use central finite differences and check again.

opts = optimoptions("fmincon",FiniteDifferenceType="central");
valid = checkGradients(@rosen,x0,opts,Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.29339e-11.

checkGradients successfully passed.
____________________________________________________________
valid = logical
   1

Central finite differences are generally more accurate. checkGradients reports that the gradient and central finite-difference approximation match to about 11 decimal places.

function [f,g] = rosen(x)
f = 100*(x(1) - x(2)^2)^2 + (1 - x(2))^2;
if nargout > 1
    g(1) = 200*(x(1) - x(2)^2);
    g(2) = -400*x(2)*(x(1) - x(2)^2) - 2*(1 - x(2));
end
end

Check Gradient of Nonlinear Constraint Function

The tiltellipse function at the end of this example imposes the constraint that the 2-D variable x is confined to the interior of the tilted ellipse

xy/2 + (x + 2)^2 + (y - 2)^2/2 ≤ 2.

Visualize the ellipse.

f = @(x,y) x.*y/2+(x+2).^2+(y-2).^2/2-2;
fcontour(f,LevelList=0)
axis([-6 0 -1 7])

Figure: contour plot showing the boundary of the tilted ellipse.

Check the gradient of this nonlinear inequality constraint function.

x0 = [-2,6];
valid = checkGradients(@tiltellipse,x0,IsConstraint=true)
valid = 1x2 logical array

   1   1

function [c,ceq,gc,gceq] = tiltellipse(x)
c = x(1)*x(2)/2 + (x(1) + 2)^2 + (x(2)- 2)^2/2 - 2;
ceq = [];

if nargout > 2
   gc = [x(2)/2 + 2*(x(1) + 2);
       x(1)/2 + x(2) - 2];
   gceq = [];
end
end

Locate Incorrect Gradient Components

The fungrad function at the end of this example correctly calculates the gradient of some components of the least-squares objective, and incorrectly calculates others.

Examine the second output of checkGradients to see which components do not match well at the point [2,4]. To see result details, set the Display option to "on".

x0 = [2,4];
[valid,err] = checkGradients(@fungrad,x0,Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 0.749797.
  Supplied derivative element (3,2): 19.9838
  Finite-difference derivative element (3,2): 5

checkGradients failed.

Supplied derivative and finite-difference approximation are
not within 'Tolerance' (1e-06).
____________________________________________________________
valid = logical
   0

err = struct with fields:
    Objective: [3x2 double]

The output shows that element (3,2) is incorrect. But is that the only problem? Examine err.Objective and look for entries that are far from 0.

err.Objective
ans = 3×2

    0.0000    0.0000
    0.0000         0
    0.5000    0.7498

Both the (3,1) and (3,2) elements of the derivative are incorrect. The fungrad2 function at the end of this example corrects the errors.

[valid,err] = checkGradients(@fungrad2,x0,Display="on")
____________________________________________________________

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 2.2338e-08.

checkGradients successfully passed.
____________________________________________________________
valid = logical
   1

err = struct with fields:
    Objective: [3x2 double]

err.Objective
ans = 3×2
10^(-7) ×

    0.2234    0.0509
    0.0003         0
    0.0981    0.0042

All the differences between the gradient and finite-difference approximations are less than 1e-7 in magnitude.

This code creates the fungrad helper function.

function [f,g] = fungrad(x)
f = [10*(x(1) - x(2)^2),1 - x(1),5*(x(2) - x(1)^2)];

if nargout > 1
    g = zeros(3,2);
    g(1,1) = 10;
    g(1,2) = -20*x(2);
    g(2,1) = -1;
    g(3,1) = -20*x(1);
    g(3,2) = 5*x(2);
end
end

This code creates the fungrad2 helper function.

function [f,g] = fungrad2(x)
f = [10*(x(1) - x(2)^2),1 - x(1),5*(x(2) - x(1)^2)];

if nargout > 1
    g = zeros(3,2);
    g(1,1) = 10;
    g(1,2) = -20*x(2);
    g(2,1) = -1;
    g(3,1) = -10*x(1);
    g(3,2) = 5;
end
end

Input Arguments


fun — Function to check
function handle

Function to check, specified as a function handle.

  • If fun represents an objective function, then fun must have the following signature.

    [fval,grad] = fun(x)

    checkGradients compares the value of grad(x) to a finite-difference approximation for a point x near x0. The comparison is

|grad_fd - grad| / max(1,|grad|),

    where grad represents the value of the gradient function, and grad_fd represents the value of the finite-difference approximation. checkGradients performs this division component-wise.

  • If fun represents a least-squares objective, then fun returns a vector, and grad(x) is a matrix representing the Jacobian of fun.

    If fun returns a vector F of m components and x0 has n elements, the Jacobian J is an m-by-n matrix in which J(i,j) is the partial derivative of F(i) with respect to x(j). (The Jacobian J is the transpose of the gradient of F.)

  • If fun represents a nonlinear constraint, then fun must have the following signature.

    [c,ceq,gc,gceq] = fun(x)
    • c represents the nonlinear inequality constraints. Solvers attempt to achieve c <= 0. The c output can be a vector of any length.

    • ceq represents the nonlinear equality constraints. Solvers attempt to achieve ceq = 0. The ceq output can be a vector of any length.

    • gc represents the gradients of the nonlinear inequality constraints. Each column of gc is the gradient of the corresponding element of c.

    • gceq represents the gradients of the nonlinear equality constraints. Each column of gceq is the gradient of the corresponding element of ceq.

Data Types: function_handle
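As a concrete illustration of this comparison, the following sketch computes the component-wise relative difference by hand for the vecrosen Jacobian from the Examples section. The forward-difference step h and the loop are illustrative only; checkGradients uses its own finite-differencing scheme, so treat this as a schematic of the comparison quantity, not the internal implementation.

% Sketch: compare a supplied Jacobian against a forward
% finite-difference approximation, component-wise.
x = [2,4];
h = sqrt(eps);                         % illustrative step size
[f0,J] = vecrosen(x);                  % supplied Jacobian (2-by-2)
J_fd = zeros(size(J));
for j = 1:numel(x)
    xp = x;
    xp(j) = xp(j) + h;                 % perturb one coordinate
    J_fd(:,j) = (vecrosen(xp) - f0).'/h;   % column j: dF/dx(j)
end
relDiff = abs(J_fd - J)./max(1,abs(J))     % the comparison quantity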

x0 — Location at which to check the gradient

Location at which to check the gradient, specified as a double array for all solvers except lsqcurvefit. For lsqcurvefit, x0 is a 1-by-2 cell array {x0array,xdata}.

checkGradients checks the gradient at a point near the specified x0. The function adds a small random direction to x0, no more than 1e-3 in absolute value. This perturbation attempts to protect the check against a point where an incorrect gradient function might pass because of cancellations.

Example: randn(5,1)

Data Types: double
Complex Number Support: Yes
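The exact form of the perturbation is not documented beyond the 1e-3 bound, so the following sketch only illustrates the idea; the uniform random direction is an assumption.

% Sketch of the idea: shift x0 by a small random direction with
% each element at most 1e-3 in absolute value. The uniform
% distribution is an assumption; only the 1e-3 bound is documented.
x0 = [2,4];
xCheck = x0 + 1e-3*(2*rand(size(x0)) - 1);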

options — Finite differencing options

Finite differencing options, specified as the output of optimoptions. The following options affect finite differencing.

FiniteDifferenceStepSize

Scalar or vector step size factor for finite differences. When you set FiniteDifferenceStepSize to a vector v, the forward finite differences delta are

delta = v.*sign′(x).*max(abs(x),TypicalX);

where sign′(x) = sign(x) except sign′(0) = 1. Central finite differences are

delta = v.*max(abs(x),TypicalX);

A scalar FiniteDifferenceStepSize expands to a vector. The default is sqrt(eps) for forward finite differences, and eps^(1/3) for central finite differences.

FiniteDifferenceType

Finite differences used to estimate gradients are either "forward" (default), or "central" (centered). "central" takes twice as many function evaluations but is usually more accurate.

TypicalX

Typical x values. The number of elements in TypicalX is equal to the number of elements in the starting point x0. The default value is ones(numberofvariables,1).

DiffMaxChange (discouraged)

Maximum change in variables for finite-difference gradients (a positive scalar). The default is Inf.

DiffMinChange (discouraged)

Minimum change in variables for finite-difference gradients (a nonnegative scalar). The default is 0.

Example: optimoptions("fmincon",FiniteDifferenceStepSize=1e-4)
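The step formulas in this table translate directly into code. This sketch evaluates both step types at a sample point using the default step size factors; signPrime is a local stand-in for the sign′ function described above.

% Sketch: finite-difference steps from the formulas above.
x = [2,4];
TypicalX = ones(size(x));
signPrime = @(x) sign(x) + (x == 0);  % sign'(x), with sign'(0) = 1

vFwd = sqrt(eps);                     % default forward step factor
deltaFwd = vFwd.*signPrime(x).*max(abs(x),TypicalX)

vCtr = eps^(1/3);                     % default central step factor
deltaCtr = vCtr.*max(abs(x),TypicalX)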

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: IsConstraint=true,Tolerance=5e-4

Display — Flag to display results at command line

Flag to display results at command line, specified as "off" (do not display the results) or "on" (display the results).

Example: "off"

Data Types: char | string

IsConstraint — Flag to check nonlinear constraint gradients

Flag to check nonlinear constraint gradients, specified as false (the function is an objective function) or true (the function is a nonlinear constraint function).

Example: true

Data Types: logical

Tolerance — Tolerance for the gradient approximation

Tolerance for the gradient approximation, specified as a nonnegative scalar. The returned value valid is true for each component where the absolute relative difference between the gradient of fun and its finite-difference approximation is less than or equal to Tolerance.

Example: 1e-3

Data Types: double

Output Arguments


valid — Indication that the finite-difference approximation matches the gradient

Indication that the finite-difference approximation matches the gradient, returned as a logical scalar for objective functions or a two-element logical vector for nonlinear constraint functions [c,ceq]. The returned value valid is true when the absolute relative difference between the gradient of fun and its finite-difference approximation is less than or equal to Tolerance for all components of the gradient. Otherwise, valid is false.

When a nonlinear constraint c or ceq is empty, the returned value of valid for that constraint is true.

err — Relative differences between gradients and finite-difference approximations

Relative differences between the gradients and finite-difference approximations, returned as a structure. For objective functions, the field name is Objective. For nonlinear constraint functions, the field names are Inequality (corresponding to c) and Equality (corresponding to ceq). Each component of err has the same shape as the supplied derivatives from fun.
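For example, you can inspect the constraint fields after checking the tiltellipse function from the Examples section. This is a minimal sketch; the Equality field is expected to be empty because tiltellipse returns an empty ceq.

% Sketch: inspect relative differences for a constraint check,
% using tiltellipse from the Examples section.
x0 = [-2,6];
[valid,err] = checkGradients(@tiltellipse,x0,IsConstraint=true);
err.Inequality  % relative differences for the gc gradient
err.Equality    % expected empty, because ceq is empty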

More About


CheckGradients Failed

Gradients or Jacobians estimated near the initial point did not match the supplied derivatives to within a default tolerance of 1e-6 or, for the checkGradients function, the specified Tolerance value.

  • Usually, this failure means that your objective or nonlinear constraint functions have an incorrect derivative calculation. Double-check the indicated derivative.

  • Occasionally, the finite-difference approximations to the derivatives are inaccurate enough to cause the failure. This inaccuracy can occur when the second derivative of a function (objective or nonlinear constraint) has a large magnitude. It can also occur when using the default "forward" finite differences, which are less accurate but faster than "central" finite differences. If you think that the derivative functions are correct, try one or both of the following to see if the check passes (see the sketch after this list):

    • Set the FiniteDifferenceType option to "central".

    • Set the FiniteDifferenceStepSize option to a small value such as 1e-10.

  • Derivatives are checked at a random point near the initial point. Therefore, the check can pass on one run and fail on another when different nearby points give different results.
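As a sketch, you can apply both remedies at once by passing finite differencing options, here using the rosen function from the Examples section:

% Sketch: combine both remedies through the options argument.
opts = optimoptions("fmincon", ...
    FiniteDifferenceType="central", ...
    FiniteDifferenceStepSize=1e-10);
valid = checkGradients(@rosen,[0,0],opts,Display="on");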

For details, see Checking Validity of Gradients or Jacobians.

Version History

Introduced in R2023b