fitcox

Create Cox proportional hazards model

Description

The fitcox function creates a Cox proportional hazards model for lifetime data. The basic Cox model includes a baseline hazard function h0(t) and model coefficients b such that, for the predictor values Xi of observation i, the hazard rate at time t is

$$h(X_i,t) = h_0(t)\exp\left[\sum_{j=1}^{p} x_{ij} b_j\right],$$

where the b coefficients do not depend on time. fitcox infers both the model coefficients b and the hazard rate h0(t), and stores them as properties in the resulting CoxModel object.
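As a rough numeric illustration of the formula above, the following sketch evaluates the factor that multiplies the baseline hazard h0(t) for a single observation. The coefficient and predictor values are hypothetical, chosen only for illustration, and are not fitted from any data.

```matlab
% Hypothetical values for illustration only (not estimated by fitcox)
b  = [0.5; -1.2];        % model coefficients, one per predictor
Xi = [1.0 2.0];          % predictor values for one observation (row vector)

% Factor multiplying the baseline hazard h0(t) for this observation
relHazard = exp(Xi*b)
```

Because b does not depend on time, this factor is the same at every t, which is what makes the hazards of any two observations proportional.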

The full Cox model includes extensions to the basic model, such as hazards with respect to different baselines or the inclusion of stratification variables. See Extension of Cox Proportional Hazards Model.

coxMdl = fitcox(X,T) returns a Cox proportional hazards model object coxMdl using the predictor values X and event times T.

coxMdl = fitcox(X,T,Name,Value) modifies the fit using one or more Name,Value arguments. For example, when the data includes censoring (values that are not observed), the Censoring argument specifies the censored data.

Examples

Weibull random variables with the same shape parameter have proportional hazard rates; see Weibull Distribution. The hazard rate with scale parameter a and shape parameter b at time t is

$$\frac{b}{a^{b}}\,t^{\,b-1}.$$

Generate pseudorandom samples from the Weibull distribution with scale parameters 1, 5, and 1/3, and with the same shape parameter B.

rng default % For reproducibility
B = 2;
A = ones(100,1);
data1 = wblrnd(A,B);
A2 = 5*A;
data2 = wblrnd(A2,B);
A3 = A/3;
data3 = wblrnd(A3,B);

Create a table of data. The predictors are the three variable types, 1, 2, or 3.

predictors = categorical([A;2*A;3*A]);
data = table(predictors,[data1;data2;data3],'VariableNames',["Predictors" "Times"]);

Fit a Cox regression to the data.

mdl = fitcox(data,"Times")
mdl = 
Cox Proportional Hazards regression model:

                     Beta        SE        zStat       pValue  
                    _______    _______    _______    __________

    Predictors_2    -3.5834    0.33187    -10.798    3.5299e-27
    Predictors_3     2.1668    0.20802     10.416    2.0899e-25

rates = exp(mdl.Coefficients.Beta)
rates = 2×1

    0.0278
    8.7301
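For comparison, the theoretical hazard ratios can be worked out directly. For a fixed shape b, the Weibull hazard rate is proportional to a^(-b), so the hazard ratio of scale a relative to the reference scale 1 is a^(-B). The sketch below evaluates this for the two non-reference scales used in this example; the variable names scales and theoreticalRates are introduced here for illustration only.

```matlab
B = 2;                 % common shape parameter used in this example
scales = [5; 1/3];     % scale parameters of categories 2 and 3

% Hazard ratio of each category relative to category 1 (scale 1)
theoreticalRates = (1./scales).^B   % approximately 0.04 and 9
```

The fitted values 0.0278 and 8.7301 are sample estimates of these theoretical ratios.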

Perform a Cox proportional hazards regression on the lightbulb data set, which contains simulated lifetimes of light bulbs. The first column of the light bulb data contains the lifetime (in hours) of two different types of bulbs. The second column contains a binary variable indicating whether the bulb is fluorescent or incandescent; 0 indicates the bulb is fluorescent, and 1 indicates it is incandescent. The third column contains the censoring information, where 0 indicates the bulb was observed until failure, and 1 indicates the observation was censored.

Load the lightbulb data set.

load lightbulb

Fit a Cox proportional hazards model for the lifetime of the light bulbs, accounting for censoring. The predictor variable is the type of bulb.

coxMdl = fitcox(lightbulb(:,2),lightbulb(:,1), ...
    'Censoring',lightbulb(:,3))
coxMdl = 
Cox Proportional Hazards regression model:

           Beta       SE      zStat       pValue  
          ______    ______    ______    __________

    X1    4.7262    1.0372    4.5568    5.1936e-06

Find the hazard ratio of incandescent bulbs relative to fluorescent bulbs by evaluating exp(Beta).

hr = exp(coxMdl.Coefficients.Beta)
hr = 112.8646

The estimate of the hazard ratio is exp(Beta) = 112.8646, which means that the estimated hazard for the incandescent bulbs is 112.86 times the hazard for the fluorescent bulbs. The small value of coxMdl.Coefficients.pValue indicates there is a negligible chance that the two types of light bulbs have identical hazard rates, which would mean Beta = 0.
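One way to gauge the uncertainty in this hazard ratio is an approximate 95% Wald interval, exp(Beta ± 1.96*SE). The following is only a sketch: it assumes the coefficient estimate is approximately normally distributed, it is not part of the fitcox display, and the Beta and SE values are copied by hand from the model output above.

```matlab
beta = 4.7262;     % Beta for X1, copied from the model display
se   = 1.0372;     % SE for X1, copied from the model display
z    = 1.96;       % approximate 97.5th percentile of the standard normal

% Approximate 95% interval for the hazard ratio exp(Beta)
hrInterval = exp(beta + [-1 1]*z*se)
```

The interval is wide because SE is large relative to Beta on the log scale, so the point estimate 112.86 should be read with that uncertainty in mind.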

Input Arguments

Predictor values X, specified as a matrix or table.

  • A matrix contains one column for each predictor and one row for each observation.

  • A table contains one row for each observation. A table can also contain the time data as well as the predictors.

By default, if the predictor data is in a table, fitcox assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix, fitcox assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the CategoricalPredictors name-value argument.

If X, T, the value of 'Frequency', or the value of 'Stratification' contains NaN values, then fitcox removes rows with NaN values from all data when fitting a Cox model.

Data Types: double | table | categorical

Event times T, specified as one of the following:

  • Real column vector.

  • Real matrix with two columns representing the start and stop times.

  • Name of a column in the table X.

  • Formula in Wilkinson notation for the table X. For example, to specify that the table columns 'x' and 'y' are in the model, use

    'T ~ x + y'

    See Wilkinson Notation.

For vector or matrix entries, the number of rows of T must be the same as the number of rows of X.

Use the two-column form of T to fit a model with time-dependent covariates, that is, predictors whose values change over time. See Cox Proportional Hazards Model with Time-Dependent Covariates.

Data Types: single | double | char | string
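The table-based forms of T described above can be sketched as follows. The data here is simulated purely for illustration, and the variable names x, y, T, and tbl are hypothetical; the formula 'T ~ x + y' names the time variable on the left and the predictors on the right.

```matlab
rng default                      % For reproducibility
n = 50;
x = randn(n,1);
y = randn(n,1);
T = exprnd(exp(-0.5*x));         % simulated lifetimes whose hazard depends on x
tbl = table(x,y,T);

% Specify the time variable and predictors with a Wilkinson formula
mdl = fitcox(tbl,'T ~ x + y');
```

Passing the column name instead, as in fitcox(tbl,"T"), uses the remaining table variables as predictors, in the same way as the first example on this page.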

Name-Value Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: To fit data with censored values cens, specify 'Censoring',cens.

X values at which to compute the baseline hazard, specified as a real scalar or row vector. If Baseline is a row vector, its length is the number of predictors, so there is one baseline for each predictor.

The default baseline for continuous predictors is mean(X), so the default hazard rate at X for these predictors is h0(t)*exp((X - mean(X))*b). The default baseline for categorical predictors is 0. Enter 0 to compute the baseline for all predictors relative to 0, so the hazard rate at X is h0(t)*exp(X*b). Changing the baseline changes the hazard ratio, but does not affect the coefficient estimates.

For the identified categorical predictors, fitcox creates dummy variables. fitcox creates one less dummy variable than the number of categories. For details, see Automatic Creation of Dummy Variables.

Example: 'Baseline',0

Data Types: double
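A minimal sketch of the effect of 'Baseline' on simulated data follows. The data and the names mdlDefault and mdlZero are hypothetical; the sketch fits the same model with the default baseline and with a baseline of 0, then compares the coefficient estimates.

```matlab
rng default                              % For reproducibility
X = randn(100,2);                        % two continuous predictors
T = exprnd(exp(-X*[0.5; -0.3]));         % simulated lifetimes

mdlDefault = fitcox(X,T);                % baseline at mean(X)
mdlZero    = fitcox(X,T,'Baseline',0);   % baseline at X = 0

% Largest difference between the two sets of coefficient estimates
betaDiff = max(abs(mdlDefault.Coefficients.Beta - mdlZero.Coefficients.Beta))
```

The difference is expected to be negligible, since only the reference point for the baseline hazard changes.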

Coefficient initial values, specified as a numeric vector of coefficient values. These values initiate the likelihood maximization iterations performed by fitcox.

Data Types: double

Categorical predictors list, specified as one of the values in this table.

Value | Description
Vector of positive integers | Each entry in the vector is an index value corresponding to the column of the predictor data (X) that contains a categorical variable.
Logical vector | A true entry means that the corresponding column of predictor data (X) is a categorical variable.
Character matrix | Each row of the matrix is the name of a predictor variable in the table X. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length.
String array or cell array of character vectors | Each element in the array is the name of a predictor variable in the table X. The names must match the entries in PredictorNames.
'all' | All predictors are categorical.

By default, if the predictor data is in a table, fitcox assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix, fitcox assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the 'CategoricalPredictors' name-value argument.

For the identified categorical predictors, fitcox creates dummy variables. fitcox creates one less dummy variable than the number of categories. For details, see Automatic Creation of Dummy Variables.

Example: 'CategoricalPredictors','all'

Data Types: single | double | logical | char | string | cell
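The following sketch shows one way to mark a numeric matrix column as categorical by index. The data is simulated and the variable names group and age are hypothetical. With three categories in column 1, fitcox creates two dummy variables, so the model has three coefficients in total.

```matlab
rng default                       % For reproducibility
n = 90;
group = randi(3,n,1);             % categories 1, 2, 3 stored as numbers
age   = 40 + 10*randn(n,1);       % continuous predictor
T     = exprnd(1 + group);        % simulated lifetimes

% Treat column 1 of the predictor matrix as categorical
mdl = fitcox([group age],T,'CategoricalPredictors',1);
numCoefficients = height(mdl.Coefficients)
```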

Indicator for censoring, specified as a Boolean vector with the same number of rows as X or the name of a column in the table X. Use 1 for observations that are right censored and 0 for observations that are fully observed. By default, all observations are fully observed. For an example, see Cox Proportional Hazards Model for Censored Data.

Example: 'Censoring',cens

Data Types: logical
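As a sketch of how a censoring indicator might be built, the following simulates a study that ends at a fixed time, so that any lifetime longer than the cutoff is right censored. The cutoff value and all variable names are hypothetical.

```matlab
rng default                         % For reproducibility
x = randn(100,1);
lifetime = exprnd(exp(-0.7*x));     % true (partly unobserved) lifetimes
cutoff = 2;                         % hypothetical end of the study

cens = lifetime > cutoff;           % true (1) = right censored
T = min(lifetime,cutoff);           % recorded time: failure or end of study

mdl = fitcox(x,T,'Censoring',cens);
```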

Frequency or weights of observations, specified as an array the same size as T containing nonnegative scalar values. The array can contain integer values corresponding to frequencies of observations or nonnegative values corresponding to observation weights.

The default is 1 per row of X and T.

If X, T, the value of 'Frequency', or the value of 'Stratification' contains NaN values, then fitcox removes rows with NaN values from all data when fitting a Cox model.

Example: 'Frequency',w

Data Types: double
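A minimal usage sketch for 'Frequency' follows, assuming each row of simulated data stands for several identical observations. The counts in w are hypothetical and generated only for illustration.

```matlab
rng default                      % For reproducibility
x = randn(60,1);
T = exprnd(exp(-0.5*x));         % simulated lifetimes
w = randi(4,60,1);               % hypothetical count for each row, same size as T

mdl = fitcox(x,T,'Frequency',w);
```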

Algorithm control parameters for the iterative algorithm fitcox uses to estimate the solution, specified as a structure. Create this structure using statset. For parameter names and default values, see the following table or enter statset('fitcox').

In the table, "termination tolerance" means that if the internal iterations cause a change in the stated value less than the tolerance, the iterations stop.

Field in Structure | Description | Values
Display | Amount of information returned to the command line | 'off': none (default); 'final': final output; 'iter': output at each iteration
MaxFunEvals | Maximum number of function evaluations | Positive integer; default is 200
MaxIter | Maximum number of iterations | Positive integer; default is 100
TolFun | Termination tolerance on change in likelihood; see Cox Proportional Hazards Model | Positive scalar; default is 1e-8
TolX | Termination tolerance for parameter (predictor estimate) change | Positive scalar; default is 1e-8

Example: 'OptimizationOptions',statset('TolX',1e-6,'MaxIter',200)
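The sketch below starts from the default options returned by statset('fitcox') and changes two fields before fitting. The data is simulated for illustration, and the variable name opts is hypothetical.

```matlab
rng default                            % For reproducibility
X = randn(100,2);
T = exprnd(exp(-X*[0.5; -0.3]));       % simulated lifetimes

opts = statset('fitcox');              % default options for fitcox
opts.TolX = 1e-6;                      % looser tolerance on parameter change
opts.MaxIter = 200;                    % allow more iterations

mdl = fitcox(X,T,'OptimizationOptions',opts);
```

Starting from the defaults keeps every other field at its documented value, so only the settings you change differ from a plain fitcox call.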

Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of 'PredictorNames' depends on how you supply the training data.

  • If you supply X as a numeric array, then you can use 'PredictorNames' to assign names to the predictor variables in X.

    • The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.

    • By default, PredictorNames is {'X1','X2',...}.

  • If you supply X as a table, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitcox uses only the predictor variables in PredictorNames and the time variable during training.

    • PredictorNames must be a subset of X.Properties.VariableNames and cannot include the name of the time variable T.

    • By default, PredictorNames contains the names of all predictor variables.

    • Specify the predictors for training using either 'PredictorNames' or a formula in Wilkinson notation, but not both.

Example: 'PredictorNames',{'Sex','Age','Weight','Smoker'}

Data Types: string | cell

Stratification variables, specified as a matrix of real values, the name of a column in table X, or an array of categorical variables. The matrix must have the same number of rows as T, with each row corresponding to an observation.

The default [] is no stratification variable.

If X, T, the value of 'Frequency', or the value of 'Stratification' contains NaN values, then fitcox removes rows with NaN values from all data when fitting a Cox model.

Example: 'Stratification',Gender

Data Types: single | double | char | string | categorical
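A minimal usage sketch for 'Stratification' follows, using simulated data in which two hypothetical sites have different baseline lifetimes. The variable name site is an assumption made for this illustration.

```matlab
rng default                            % For reproducibility
n = 120;
x = randn(n,1);                        % predictor of interest
site = randi(2,n,1);                   % stratum label (1 or 2), one row per observation
T = exprnd(site.*exp(-0.5*x));         % simulated lifetimes; scale differs by site

mdl = fitcox(x,T,'Stratification',site);
```

Only x appears as a coefficient in the fitted model; site enters through the stratification rather than as a predictor.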

Method to handle tied failure times, specified as 'breslow' (Breslow's method) or 'efron' (Efron's method). See Partial Likelihood Function for Tied Events.

Example: 'TieBreakMethod','efron'

Data Types: char | string
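The choice of tie-handling method matters only when several failures share the same recorded time. The sketch below creates ties by rounding simulated lifetimes up to whole units, then fits the model with each method so the coefficient estimates can be compared side by side. The data and variable names are hypothetical.

```matlab
rng default                              % For reproducibility
x = randn(100,1);
T = ceil(5*exprnd(exp(-0.5*x)));         % rounding up creates tied event times

mdlBreslow = fitcox(x,T,'TieBreakMethod','breslow');
mdlEfron   = fitcox(x,T,'TieBreakMethod','efron');

% Coefficient estimate under each method
betas = [mdlBreslow.Coefficients.Beta mdlEfron.Coefficients.Beta]
```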

Introduced in R2021a