fitcox
Description
The fitcox function creates a Cox proportional hazards model for lifetime data. The basic Cox model includes a hazard function h0(t) and model coefficients b such that, for predictor X, the hazard rate at time t is

$$h(X,t) = h_0(t)\exp(Xb),$$

where the b coefficients do not depend on time. fitcox infers both the model coefficients b and the hazard rate h0(t), and stores them as properties in the resulting CoxModel object.
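The name "proportional hazards" follows from taking the ratio of hazard rates at two predictor values. A short derivation, assuming the basic form h(X,t) = h0(t)exp(Xb) and writing X_1 and X_2 for two illustrative predictor rows (notation introduced here, not part of the fitcox interface):

```latex
\frac{h(X_1,t)}{h(X_2,t)}
  = \frac{h_0(t)\exp(X_1 b)}{h_0(t)\exp(X_2 b)}
  = \exp\bigl((X_1 - X_2)\,b\bigr)
```

The hazard function h0(t) cancels, so the ratio is the same at every time t and depends only on the coefficients and the difference in predictor values.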
The full Cox model includes extensions to the basic model, such as hazards with respect to different baselines or the inclusion of stratification variables. See Extension of Cox Proportional Hazards Model.
coxMdl = fitcox(X,T,Name,Value) modifies the fit using one or more Name,Value arguments. For example, when the data includes censoring (values that are not observed), the Censoring argument specifies the censored data.
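As a minimal sketch of the two call forms, the following code fits simulated data with and without the Censoring argument. The data, the cutoff time of 2, and all variable names are assumptions made for illustration.

```matlab
rng default                      % for reproducibility
n = 200;
X = randn(n,1);                  % one continuous predictor
% Exponential lifetimes with hazard rate exp(0.5*X), so the mean is exp(-0.5*X)
T = exprnd(exp(-0.5*X));
cens = T > 2;                    % true where the event was not observed by time 2
T = min(T,2);                    % censored rows record the cutoff time

mdlBasic = fitcox(X,T);                     % treats every time as an event
mdlCens  = fitcox(X,T,'Censoring',cens);    % accounts for censoring

betaBasic = mdlBasic.Coefficients.Beta
betaCens  = mdlCens.Coefficients.Beta
```

The simulated coefficient is 0.5, so comparing betaCens and betaBasic against that value gives a rough sense of how much ignoring the censoring distorts the fit.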
Examples
Estimate Cox Proportional Hazard Regression
Weibull random variables with the same shape parameter have proportional hazard rates; see Weibull Distribution. The hazard rate with scale parameter a and shape parameter b at time t is

$$\frac{b}{a^b}\,t^{b-1}.$$

Generate pseudorandom samples from the Weibull distribution with scale parameters 1, 5, and 1/3, and with the same shape parameter B.
rng default % For reproducibility
B = 2;
A = ones(100,1);
data1 = wblrnd(A,B);
A2 = 5*A;
data2 = wblrnd(A2,B);
A3 = A/3;
data3 = wblrnd(A3,B);
Create a table of data. The predictors are the three variable types, 1, 2, or 3.
predictors = categorical([A;2*A;3*A]);
data = table(predictors,[data1;data2;data3], ...
    'VariableNames',["Predictors" "Times"]);
Fit a Cox regression to the data.
mdl = fitcox(data,"Times")
mdl =
Cox Proportional Hazards regression model

                     Beta        SE        zStat       pValue
                    _______    _______    _______    __________
    Predictors_2    -3.5834    0.33187    -10.798    3.5299e-27
    Predictors_3     2.1668    0.20802     10.416    2.0899e-25

Log-likelihood: -1197.917
rates = exp(mdl.Coefficients.Beta)
rates = 2×1
0.0278
8.7301
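The estimated rates can be compared with the values implied by the Weibull hazard formula. Because the three samples share the shape parameter B, the hazard ratio of each group relative to the first group is a ratio of scale parameters raised to the power B. A small check, where the vector a and the name theoreticalRatios are introduced here for illustration:

```matlab
B = 2;                          % shared shape parameter
a = [1 5 1/3];                  % scale parameters for groups 1, 2, and 3
% The hazard is (B/a^B)*t^(B-1), so the ratio to group 1 is (a(1)/a(k))^B
theoreticalRatios = (a(1)./a(2:3)).^B     % approximately [0.04 9]
```

The fitted values 0.0278 and 8.7301 are sample estimates of these two ratios.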
Fit Cox Proportional Hazards Model to Lifetime Data
Perform a Cox proportional hazards regression on the lightbulb data set, which contains simulated lifetimes of light bulbs. The first column of the light bulb data contains the lifetime (in hours) of two different types of bulbs. The second column contains a binary variable indicating whether the bulb is fluorescent or incandescent; 0 indicates the bulb is fluorescent, and 1 indicates it is incandescent. The third column contains the censoring information, where 0 indicates the bulb was observed until failure, and 1 indicates the observation was censored.
Load the lightbulb data set.
load lightbulb
Fit a Cox proportional hazards model for the lifetime of the light bulbs, accounting for censoring. The predictor variable is the type of bulb.
coxMdl = fitcox(lightbulb(:,2),lightbulb(:,1), ...
    'Censoring',lightbulb(:,3))
coxMdl =
Cox Proportional Hazards regression model

           Beta        SE       zStat       pValue
          ______    ______    ______    __________
    X1    4.7262    1.0372    4.5568    5.1936e-06

Log-likelihood: -212.638
Find the hazard ratio of incandescent bulbs compared to fluorescent bulbs by evaluating exp(Beta).
hr = exp(coxMdl.Coefficients.Beta)
hr = 112.8646
The estimate of the hazard ratio is exp(Beta) = 112.8646, which means that the estimated hazard for the incandescent bulbs is 112.86 times the hazard for the fluorescent bulbs. The small value of coxMdl.Coefficients.pValue indicates there is a negligible chance that the two types of light bulbs have identical hazard rates, which would mean Beta = 0.
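The standard error in the coefficient table also gives a sense of how uncertain the hazard ratio is. The following sketch computes an approximate 95% Wald interval by hand from the Beta and SE columns. The normal approximation and the names b, se, z, and hrCI are assumptions of this sketch rather than something fitcox reports.

```matlab
load lightbulb
coxMdl = fitcox(lightbulb(:,2),lightbulb(:,1), ...
    'Censoring',lightbulb(:,3));

b  = coxMdl.Coefficients.Beta;   % estimated coefficient
se = coxMdl.Coefficients.SE;     % its standard error
z  = norminv(0.975);             % two-sided 95% normal quantile

% Interval on the coefficient scale, mapped to the hazard ratio scale
hrCI = exp([b - z*se, b + z*se])
```

Because the interval is built on the log scale and then exponentiated, it is asymmetric around the point estimate exp(b).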
Input Arguments
X — Predictor values
matrix | table
Predictor values, specified as a matrix or table.
A matrix contains one column for each predictor and one row for each observation.
A table contains one row for each observation. A table can also contain the time data as well as the predictors.
By default, if the predictor data is in a table, fitcox assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix, fitcox assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the CategoricalPredictors name-value argument.
If X, T, the value of 'Frequency', or the value of 'Stratification' contains NaN values, then fitcox removes rows with NaN values from all data when fitting a Cox model.
Data Types: double | table | categorical
T — Event times
real column vector | real matrix with two columns | name of column in table X | formula in Wilkinson notation for table X
Event times, specified as one of the following:
Real column vector.
Real matrix with two columns representing the start and stop times.
Name of a column in the table X.
Formula in Wilkinson notation for the table X. For example, to specify that the table columns 'x' and 'y' are in the model, use 'T ~ x + y'. See Wilkinson Notation.
For vector or matrix entries, the number of rows of T must be the same as the number of rows of X.
Use the two-column form of T to fit a model with time-dependent covariates. See Cox Proportional Hazards Model with Time-Dependent Covariates.
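The two-column form splits each subject's follow-up into intervals over which the predictor value is constant. The following is a rough sketch with simulated data in which a binary predictor switches from 0 to 1 at a subject-specific time. The simulation design and all variable names are assumptions made for illustration, and the event times are generated independently of the predictor.

```matlab
rng default
n = 200;
eventTime  = exprnd(2,n,1);            % time of the observed event per subject
switchTime = 3*rand(n,1);              % time at which the predictor changes to 1
switched   = eventTime > switchTime;   % still at risk when the predictor changes

% Interval 1: from 0 to the switch (or to the event, if earlier), predictor 0
T1 = [zeros(n,1), min(eventTime,switchTime)];
X1 = zeros(n,1);
C1 = switched;                   % no event at the end of interval 1 if switched

% Interval 2: from the switch to the event, predictor 1
T2 = [switchTime(switched), eventTime(switched)];
X2 = ones(nnz(switched),1);
C2 = false(nnz(switched),1);     % the event is observed at the end of interval 2

X = [X1; X2];
T = [T1; T2];                    % start and stop times, one row per interval
cens = [C1; C2];
mdlTD = fitcox(X,T,'Censoring',cens)
```

Each subject whose predictor changes contributes two rows, and the first of those rows is marked as censored because no event occurs at the moment of the change.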
Data Types: single | double | char | string
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: To fit data with censored values cens, specify 'Censoring',cens.
Baseline — X values at which to compute baseline hazard
mean(X), the default for continuous predictors | 0, the default for categorical predictors | real scalar | real row vector
X values at which to compute the baseline hazard, specified as a real scalar or row vector. If Baseline is a row vector, its length is the number of predictors, so there is one baseline for each predictor.
The default baseline for continuous predictors is mean(X), so the default hazard rate at X for these predictors is h(t)*exp((X - mean(X))*b). The default baseline for categorical predictors is 0. Enter 0 to compute the baseline for all predictors relative to 0, so the hazard rate at X is h(t)*exp(X*b). Changing the baseline changes the hazard ratio, but does not affect the coefficient estimates.
For the identified categorical predictors, fitcox creates dummy variables. fitcox creates one less dummy variable than the number of categories. For details, see Automatic Creation of Dummy Variables.
Example: 'Baseline',0
Data Types: double
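To illustrate the statement that the baseline does not affect the coefficient estimates, the following sketch fits the same simulated data with the default baseline and with a baseline of 0. The data and variable names are assumptions made for illustration.

```matlab
rng default
n = 150;
X = 10 + 2*randn(n,1);           % continuous predictor with a nonzero mean
T = exprnd(exp(-0.3*(X - 10)));  % lifetimes with hazard rate exp(0.3*(X - 10))

mdlMean = fitcox(X,T);                  % default baseline, mean(X)
mdlZero = fitcox(X,T,'Baseline',0);     % baseline at X = 0

betaMean = mdlMean.Coefficients.Beta
betaZero = mdlZero.Coefficients.Beta
```

The two coefficient estimates should agree up to the optimizer tolerance; what differs between the models is the reference point against which the hazard is expressed.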
Beta — Coefficient initial values
0.01/std(X) (default) | numeric vector
Coefficient initial values, specified as a numeric vector of coefficient values. These values initiate the likelihood maximization iterations performed by fitcox.
Data Types: double
CategoricalPredictors — Categorical predictors list
vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | 'all'
Categorical predictors list, specified as one of the values in this table.
Value | Description |
---|---|
Vector of positive integers | Each entry in the vector is an index value corresponding to the column of the predictor data (X) that contains a categorical variable. |
Logical vector | A true entry means that the corresponding column of predictor data (X) is a categorical variable. |
Character matrix | Each row of the matrix is the name of a predictor variable in the table X. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length. |
String array or cell array of character vectors | Each element in the array is the name of a predictor variable in the table X. The names must match the entries in PredictorNames. |
'all' | All predictors are categorical. |
By default, if the predictor data is in a table, fitcox assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix, fitcox assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the 'CategoricalPredictors' name-value argument.
For the identified categorical predictors, fitcox creates dummy variables. fitcox creates one less dummy variable than the number of categories. For details, see Automatic Creation of Dummy Variables.
Example: 'CategoricalPredictors','all'
Data Types: single | double | logical | char | string | cell
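As a sketch of the index form, the following code marks the second column of a numeric matrix as categorical and compares the number of fitted coefficients with the all-continuous default. The data are simulated placeholders, and the names age and group are assumptions made for illustration.

```matlab
rng default
n = 300;
age   = 50 + 10*randn(n,1);      % continuous predictor
group = randi(3,n,1);            % group codes 1, 2, 3 stored as numbers
X = [age group];
T = exprnd(1,n,1);               % placeholder lifetimes, unrelated to X

mdlCont = fitcox(X,T);                              % group treated as continuous
mdlCat  = fitcox(X,T,'CategoricalPredictors',2);    % group treated as categorical

numCoefCont = height(mdlCont.Coefficients)   % one coefficient per column
numCoefCat  = height(mdlCat.Coefficients)    % age plus two dummy variables
```

With three categories, the categorical fit has one coefficient for age and two for the dummy variables, one fewer than the number of categories.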
Censoring — Indicator for censoring
array of 0s (default) | array of 0s and 1s | name of a column in table X
Indicator for censoring, specified as a Boolean vector with the same number of rows as X or the name of a column in the table X. Use 1 for observations that are right censored and 0 for observations that are fully observed. By default, all observations are fully observed. For an example, see Cox Proportional Hazards Model for Censored Data.
Example: 'Censoring',cens
Data Types: logical
Frequency — Frequency or weights of observations
array of 1s (default) | vector of nonnegative scalar values
Frequency or weights of observations, specified as an array the same size as T containing nonnegative scalar values. The array can contain integer values corresponding to frequencies of observations or nonnegative values corresponding to observation weights.
The default is 1 per row of X and T.
If X, T, the value of 'Frequency', or the value of 'Stratification' contains NaN values, then fitcox removes rows with NaN values from all data when fitting a Cox model.
Example: 'Frequency',w
Data Types: double
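One way to read integer frequencies is as a compact encoding of repeated rows. The following sketch fits a model with a frequency vector and, for comparison, a model on the data with each row physically repeated. The data and the names w, idx, mdlFreq, and mdlFull are assumptions made for illustration, and the expectation that the two fits agree under the default tie handling is an assumption of this sketch, not a documented guarantee.

```matlab
rng default
n = 60;
X = randn(n,1);
T = exprnd(exp(-0.5*X));         % lifetimes with hazard rate exp(0.5*X)
w = randi(3,n,1);                % how many times each row was observed

mdlFreq = fitcox(X,T,'Frequency',w);

% Expand the data so that row i appears w(i) times
idx = repelem((1:n)',w);
mdlFull = fitcox(X(idx),T(idx));

[mdlFreq.Coefficients.Beta, mdlFull.Coefficients.Beta]
```

Printing the two coefficients side by side makes it easy to see whether the frequency encoding and the expanded data lead to the same fit for a given data set.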
OptimizationOptions — Algorithm control parameters
structure
Algorithm control parameters for the iterative algorithm fitcox uses to estimate the solution, specified as a structure. Create this structure using statset. For parameter names and default values, see the following table or enter statset('fitcox').
In the table, "termination tolerance" means that if the internal iterations cause a change in the stated value less than the tolerance, the iterations stop.
Field in Structure | Description | Values |
---|---|---|
Display | Amount of information returned to the command line | |
MaxFunEvals | Maximum number of function evaluations | Positive integer; default is 200 |
MaxIter | Maximum number of iterations | Positive integer; default is 100 |
TolFun | Termination tolerance on change in likelihood; see Cox Proportional Hazards Model | Positive scalar; default is 1e-8 |
TolX | Termination tolerance for parameter (predictor estimate) change | Positive scalar; default is 1e-8 |
Example: 'OptimizationOptions',statset('TolX',1e-6,'MaxIter',200)
PredictorNames — Predictor variable names
string array of unique names | cell array of unique character vectors
Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of 'PredictorNames' depends on how you supply the training data.
If you supply X as a numeric array, then you can use 'PredictorNames' to assign names to the predictor variables in X.
The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.
By default, PredictorNames is {'X1','X2',...}.
If you supply X as a table, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitcox uses only the predictor variables in PredictorNames and the time variable during training.
PredictorNames must be a subset of X.Properties.VariableNames and cannot include the name of the time variable T.
By default, PredictorNames contains the names of all predictor variables.
Specify the predictors for training using either 'PredictorNames' or a formula in Wilkinson notation, but not both.
Example: 'PredictorNames',{'Sex','Age','Weight','Smoker'}
Data Types: string | cell
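A brief sketch of the numeric-array case, in which the names label the columns of X. The data are simulated placeholders, and the names 'Age' and 'Dose' are hypothetical.

```matlab
rng default
n = 120;
X = [50 + 10*randn(n,1), rand(n,1)];     % columns: age, dose (illustrative)
T = exprnd(1,n,1);                       % placeholder lifetimes

% One name per column of X, in column order
mdlNamed = fitcox(X,T,'PredictorNames',{'Age','Dose'});
mdlNamed.Coefficients
```

The number of names matches size(X,2), as the requirement above states.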
Stratification — Stratification variables
[] (default) | matrix of real values | name of column in table X | array of categorical variables
Stratification variables, specified as a matrix of real values, the name of a column in table X, or an array of categorical variables. The matrix must have the same number of rows as T, with each row corresponding to an observation.
The default [] is no stratification variable.
If X, T, the value of 'Frequency', or the value of 'Stratification' contains NaN values, then fitcox removes rows with NaN values from all data when fitting a Cox model.
Example: 'Stratification',Gender
Data Types: single | double | char | string | categorical
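As a sketch of a stratified fit, the following code simulates two strata that differ in their baseline scale but share one predictor effect. The stratum variable site and the simulation design are assumptions made for illustration.

```matlab
rng default
n = 200;
X = randn(n,1);
site = randi(2,n,1);             % hypothetical stratum label, 1 or 2
% Each site has its own baseline scale; the effect of X is shared
T = exprnd(site .* exp(-0.5*X));

mdlStrat = fitcox(X,T,'Stratification',site)
```

The stratification variable does not receive a coefficient of its own, so the coefficient table contains only the row for X.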
TieBreakMethod — Method to handle tied failure times
'breslow' (default) | 'efron'
Method to handle tied failure times, specified as 'breslow' (Breslow's method) or 'efron' (Efron's method). See Partial Likelihood Function for Tied Events.
Example: 'TieBreakMethod','efron'
Data Types: char | string
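The choice of method matters only when several events share the same time. The following sketch creates ties by rounding simulated lifetimes up to whole units and then fits the model with each method. The data and variable names are assumptions made for illustration.

```matlab
rng default
n = 200;
X = randn(n,1);
T = ceil(5*exprnd(exp(-0.5*X)));         % rounding up to whole units creates ties

mdlBreslow = fitcox(X,T);                          % default, 'breslow'
mdlEfron   = fitcox(X,T,'TieBreakMethod','efron');

% Coefficient estimates side by side: [Breslow, Efron]
[mdlBreslow.Coefficients.Beta, mdlEfron.Coefficients.Beta]
```

With heavily tied data the two estimates can differ noticeably, while with few ties they tend to be close.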
Version History
Introduced in R2021a
See Also
CoxModel | hazardratio | survival | plotSurvival | linhyptest | coefci | coxphfit