Main Content

mlecov

Asymptotic covariance of maximum likelihood estimators

Description

example

acov = mlecov(params,data,'pdf',pdf) returns an approximation to the asymptotic covariance matrix of the maximum likelihood estimators of the parameters for a distribution specified by the custom probability density function pdf. The output acov is a p-by-p matrix, where p is the number of parameters in params.

mlecov computes a finite difference approximation to the Hessian of the loglikelihood at the maximum likelihood estimates params, given the observed data, and returns the negative inverse of that Hessian.

example

acov = mlecov(params,data,'logpdf',logpdf) returns acov for a distribution specified by the custom log probability density function logpdf.

example

acov = mlecov(params,data,'nloglf',nloglf) returns acov for a distribution specified by the custom negative loglikelihood function nloglf.

example

acov = mlecov(___,Name,Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. For example, you can specify the censored data and frequency of observations.

Examples

collapse all

Load the sample data.

load carbig

The vector Weight contains the weights of 406 cars.

Define a custom function that returns the pdf of a lognormal distribution. Save the file in your current folder as lognormpdf.m.

function newpdf = lognormpdf(data,mu,sigma)
newpdf = exp((-(log(data)-mu).^2)/(2*sigma^2))./(data*sigma*sqrt(2*pi));

Estimate the parameters mu and sigma of the custom distribution.

[phat,pci] = mle(Weight,'pdf',@lognormpdf,'Start',[4.5 0.3])
phat = 1×2

    7.9600    0.2804

pci = 2×2

    7.9327    0.2611
    7.9872    0.2997

Compute the approximate covariance matrix of the parameter estimates.

acov = mlecov(phat,Weight,'pdf',@lognormpdf)
acov = 2×2
10-3 ×

    0.1937   -0.0000
   -0.0000    0.0968

Estimate the standard errors of the estimates.

se = sqrt(diag(acov))'
se = 1×2

    0.0139    0.0098

The standard error of the estimates of mu and sigma are 0.0139 and 0.0098, respectively.

Recalculate the confidence intervals pci from the standard error se by using the Wald method (normal approximation).

alpha = 0.05;
probs = [alpha/2; 1-alpha/2];
pci2 = norminv(repmat(probs,1,numel(phat)),[phat; phat],[se; se])
pci2 = 2×2

    7.9327    0.2611
    7.9872    0.2997

Define a custom function that returns the log pdf of a beta distribution. Save the file in your current folder as betalogpdf.m.

function logpdf = betalogpdf(x,a,b)
logpdf = (a-1)*log(x)+(b-1)*log(1-x)-betaln(a,b);

Generate sample data from a beta distribution with parameters 1.23 and 3.45, and estimate the parameters using the simulated data.

rng('default') % For reproducibility
x = betarnd(1.23,3.45,25,1);
phat = mle(x,'Distribution','beta')
phat = 1×2

    1.1213    2.7182

Compute the approximate covariance matrix of the parameter estimates.

acov = mlecov(phat,x,'logpdf',@betalogpdf)
acov = 2×2

    0.0810    0.1646
    0.1646    0.6074

Load the sample data.

load('readmissiontimes.mat')

The data includes ReadmissionTime, which has readmission times for 100 patients. This data is simulated.

Define a custom negative loglikelihood function for a Poisson distribution with the parameter lambda, where 1/lambda is the mean of the distribution. You must define the function to accept a logical vector of censorship information and an integer vector of data frequencies, even if you do not use these values in the custom function.

custnloglf = @(lambda,data,cens,freq) ...
    - length(data)*log(lambda) + sum(lambda*data,'omitnan');

Estimate the parameter of the custom distribution and specify its initial parameter value (Start name-value argument).

phat = mle(ReadmissionTime,'nloglf',custnloglf,'Start',0.05)
phat = 0.1462

Compute the variance of the parameter estimate.

acov = mlecov(phat,ReadmissionTime,'nloglf',custnloglf)
acov = 2.1374e-04

Compute the standard error.

sqrt(acov)
ans = 0.0146

Load the sample data.

load('readmissiontimes.mat');

The data includes ReadmissionTime, which has readmission times for 100 patients. The column vector Censored contains the censorship information for each patient, where 1 indicates a right-censored observation, and 0 indicates that the exact readmission time is observed. This data is simulated.

Define a custom log probability density function (pdf) and log survival function for a Weibull distribution with the scale parameter lambda and the shape parameter k. When the data contains censored observations, you must pass both the log pdf and log survival function to mle and mlecov.

custlogpdf = @(data,lambda,k) ...
    log(k) - k*log(lambda) + (k-1)*log(data) - (data/lambda).^k;
custlogsf = @(data,lambda,k) - (data/lambda).^k;

Estimate the parameters of the custom distribution for the censored sample data. Specify the initial parameter values (Start name-value argument) for the custom distribution.

phat = mle(ReadmissionTime,'logpdf',custlogpdf,'logsf',custlogsf, ...
    'Start',[1,0.75],'Censoring',Censored)
phat = 1×2

    9.2090    1.4223

The scale and shape parameters of the custom distribution are 9.2090 and 1.4223, respectively.

Compute the approximate covariance matrix of the parameter estimates.

acov = mlecov(phat,ReadmissionTime, ...
    'logpdf',custlogpdf,'logsf',custlogsf,'Censoring',Censored)
acov = 2×2

    0.5653    0.0102
    0.0102    0.0163

Input Arguments

collapse all

Parameter estimates, specified as a vector. These parameter estimates must be maximum likelihood estimates. For example, you can specify parameter estimates returned by mle.

Data Types: single | double

Sample data and censorship information used to estimate the distribution parameters params, specified as a vector of sample data or a two-column matrix of sample data and censorship information.

You can specify the censorship information for the sample data by using either the data argument or the Censoring name-value argument. mlecov ignores the Censoring argument value if data is a two-column matrix.

Specify data as a vector or a two-column matrix depending on the censorship types of the observations in data.

  • Fully observed data — Specify data as a vector of sample data.

  • Data that contains fully observed, left-censored, or right-censored observations — Specify data as a vector of sample data, and specify the Censoring name-value argument as a vector that contains the censorship information for each observation. The Censoring vector can contain 0, –1, and 1, which refer to fully observed, left-censored, and right-censored observations, respectively.

  • Data that includes interval-censored observations — Specify data as a two-column matrix of sample data and censorship information. Each row of data specifies the range of possible survival or failure times for each observation, and can have one of these values:

    • [t,t] — Fully observed at t

    • [–Inf,t] — Left-censored at t

    • [t,Inf] — Right-censored at t

    • [t1,t2] — Interval-censored between [t1,t2], where t1 < t2

mlecov ignores NaN values in data. Additionally, any NaN values in the censoring vector (Censoring) or frequency vector (Frequency) cause mlecov to ignore the corresponding rows in data.

Data Types: single | double

Custom probability distribution function (pdf), specified as a function handle or a cell array containing a function handle and additional arguments to the function.

The custom function accepts a vector containing sample data, one or more individual distribution parameters, and any additional arguments passed by a cell array as input parameters. The function returns a vector of probability density values.

Example: @newpdf

Data Types: function_handle | cell

Custom log probability density function, specified as a function handle or a cell array containing a function handle and additional arguments to the function.

The custom function accepts a vector containing sample data, one or more individual distribution parameters, and any additional arguments passed by a cell array as input parameters. The function returns a vector of log probability values.

Example: @customlogpdf

Data Types: function_handle | cell

Custom negative loglikelihood function, specified as a function handle or a cell array containing a function handle and additional arguments to the function.

The custom function accepts the following input arguments, in the order listed in the table.

Input Argument of Custom FunctionDescription
paramsVector of distribution parameter values params.
dataSample data. The data value is a vector of sample data or a two-column matrix of sample data and censorship information.
censLogical vector of censorship information. nloglf must accept cens even if you do not use the Censoring name-value argument. In this case, you can write nloglf to ignore cens.
freqInteger vector of data frequencies. nloglf must accept freq even if you do not use the Frequency name-value argument. In this case, you can write nloglf to ignore freq.
truncTwo-element numeric vector of truncation bounds. nloglf must accept trunc if you use the TruncationBounds name-value argument.

nloglf can optionally accept the additional arguments passed by a cell array as input parameters.

nloglf returns a scalar negative loglikelihood value and, optionally, a negative loglikelihood gradient vector (see the GradObj field in the Options name-value argument).

Example: @negloglik

Data Types: function_handle | cell

Name-Value Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Censoring',cens,'Options',opt instructs mlecov to read the censored data information from the vector cens and perform according to the new options structure opt.

Custom cumulative distribution function (cdf), specified as a function handle or a cell array containing a function handle and additional arguments to the function.

The custom function accepts a vector containing sample data, one or more individual distribution parameters, and any additional arguments passed by a cell array as input parameters. The function returns a vector of cdf values.

For censored or truncated observations, you must define both cdf and pdf. For fully observed and untruncated observations, mlecov does not use cdf. You can specify the censorship information by using either data or Censoring and specify the truncation bounds by using TruncationBounds.

Example: 'cdf',@newcdf

Data Types: function_handle | cell

Custom log survival function, specified as a function handle or a cell array containing a function handle and additional arguments to the function.

The custom function accepts a vector containing sample data, one or more individual distribution parameters, and any additional arguments passed by a cell array as input parameters. The function returns a vector of log survival probability values.

For censored or truncated observations, you must define both logsf and logpdf. For fully observed and untruncated observations, mlecov does not use logsf. You can specify the censorship information by using either data or Censoring and specify the truncation bounds by using TruncationBounds.

Example: 'logsf',@logsurvival

Data Types: function_handle | cell

Indicator of censored data, specified as a vector consisting of 0, –1, and 1, which indicate fully observed, left-censored, and right-censored observations, respectively. Each element of the Censoring value indicates the censorship status of the corresponding observation in data. The Censoring value must have the same size as data. The default is a vector of 0s, indicating all observations are fully observed.

You cannot specify interval-censored observations using this argument. If the sample data includes interval-censored observations, specify data using a two-column matrix. mlecov ignores the Censoring value if data is a two-column matrix.

For censored data, you must define the custom distribution by using pdf and cdf, logpdf and logsf, or nloglf.

mlecov ignores any NaN values in the censoring vector. Additionally, any NaN values in data or the frequency vector (Frequency) cause mlecov to ignore the corresponding values in the censoring vector.

Example: 'Censoring',censored, where censored is a vector that contains censorship information.

Data Types: logical | single | double

Truncation bounds, specified as a vector of two elements.

For censored data, you must define the custom distribution by using pdf and cdf, logpdf and logsf, or nloglf.

Example: 'TruncationBounds',[0,10]

Data Types: single | double

Frequency of observations, specified as a vector of nonnegative integer counts that has the same number of rows as data. The jth element of the Frequency value gives the number of times the jth row of data was observed. The default is a vector of 1s, indicating one observation per row of data.

mlecov ignores any NaN values in this frequency vector. Additionally, any NaN values in data or the censoring vector (Censoring) cause mlecov to ignore the corresponding values in the frequency vector.

Example: 'Frequency',freq, where freq is a vector that contains the observation frequencies.

Data Types: single | double

Numerical options for the finite difference Hessian calculation, specified as a structure returned by statset.

The mlecov function interprets the following statset options.

Field NameDescription
GradObj

Flag indicating whether the function provided by the nloglf input argument can return the gradient vector of the negative loglikelihood as a second output, specified as 'on' or 'off' (default).

DerivStep

Relative step size used in the finite difference for Hessian calculations, specified as a vector of the same size as params.

The default value is eps^(1/4). A smaller value than the default might be appropriate if 'GradObj' is 'on'.

Example: 'Options',statset('GradObj','on')

Data Types: struct

More About

collapse all

Censorship Types

mlecov supports left-censored, right-censored, and interval-censored observations.

  • Left-censored observation at time t — The event occurred before time t, and the exact event time is unknown.

  • Right-censored observation at time t — The event occurred after time t, and the exact event time is unknown.

  • Interval-censored observation within the interval [t1,t2] — The event occurred after time t1 and before time t2, and the exact event time is unknown.

Double-censored data includes both left-censored and right-censored observations.

Survival Function

The survival function is the probability of survival as a function of time. It is also called the survivor function.

The survival function gives the probability that the survival time of an individual exceeds a certain value. Because the cumulative distribution function F(t) is the probability that the survival time is less than or equal to a given point t in time, the survival function for a continuous distribution S(t) is the complement of the cumulative distribution function: S(t) = 1 – F(t).

Introduced before R2006a