modelDiscrimination
Syntax
DiscMeasure = modelDiscrimination(eadModel,data)
[DiscMeasure,DiscData] = modelDiscrimination(___,Name=Value)
Description
DiscMeasure = modelDiscrimination(eadModel,data) computes the area under the receiver operating characteristic curve (AUROC). modelDiscrimination supports segmentation and comparison against a reference model and alternative methods to discretize the EAD response into a binary variable.
[DiscMeasure,DiscData] = modelDiscrimination(___,Name=Value) specifies options using one or more name-value arguments in addition to the input arguments in the previous syntax.
Examples
Compute AUROC and ROC Using Tobit EAD Model
This example shows how to use fitEADModel to create a Tobit model and then use modelDiscrimination to compute AUROC and ROC.
Load EAD Data
Load the EAD data.
load EADData.mat
head(EADData)
UtilizationRate    Age    Marriage       Limit         Drawn         EAD
_______________    ___    ___________    __________    __________    __________
0.24359            25     not married    44776         10907         44740
0.96946            44     not married    2.1405e+05    2.0751e+05    40678
0                  40     married        1.6581e+05    0             1.6567e+05
0.53242            38     not married    1.7375e+05    92506         1593.5
0.2583             30     not married    26258         6782.5        54.175
0.17039            54     married        1.7357e+05    29575         576.69
0.18586            27     not married    19590         3641          998.49
0.85372            42     not married    2.0712e+05    1.7682e+05    1.6454e+05
rng('default');
NumObs = height(EADData);
c = cvpartition(NumObs,'HoldOut',0.4);
TrainingInd = training(c);
TestInd = test(c);
Select Model Type
Select a model type for Tobit or Regression.
ModelType = "Tobit";
Select Conversion Measure
Select a conversion measure for the EAD response values.
ConversionMeasure = "LCF";
Create Tobit EAD Model
Use fitEADModel to create a Tobit model using the TrainingInd data.
eadModel = fitEADModel(EADData(TrainingInd,:),ModelType,PredictorVars={'UtilizationRate','Age','Marriage'}, ...
    ConversionMeasure=ConversionMeasure,DrawnVar="Drawn",LimitVar="Limit",ResponseVar="EAD");
disp(eadModel);
Tobit with properties:
    CensoringSide: "both"
    LeftLimit: 0
    RightLimit: 1
    ModelID: "Tobit"
    Description: ""
    UnderlyingModel: [1x1 risk.internal.credit.TobitModel]
    PredictorVars: ["UtilizationRate" "Age" "Marriage"]
    ResponseVar: "EAD"
    LimitVar: "Limit"
    DrawnVar: "Drawn"
    ConversionMeasure: "lcf"
Display the underlying model. The underlying model's response variable is the transformation of the EAD response data. Use the 'LimitVar' and 'DrawnVar' name-value arguments to modify the transformation.
disp(eadModel.UnderlyingModel);
Tobit regression model:
    EAD_lcf = max(0,min(Y*,1))
    Y* ~ 1 + UtilizationRate + Age + Marriage
Estimated coefficients:
                        Estimate      SE            tStat       pValue
                        __________    __________    ________    __________
(Intercept)             0.22467       0.031085      7.2276      6.4149e-13
UtilizationRate         0.4714        0.020682      22.793      0
Age                     -0.0014209    0.00075844    -1.8735     0.061111
Marriage_not married    -0.010543     0.015817      -0.66654    0.50512
(Sigma)                 0.3618        0.0049991     72.374      0
Number of observations: 2627
Number of left-censored observations: 0
Number of uncensored observations: 2626
Number of right-censored observations: 1
Log-likelihood: -1057.9
Predict EAD
EAD prediction operates on the underlying compact statistical model and then transforms the predicted values back to the EAD scale. You can call the predict function with different options for the 'ModelLevel' name-value argument.
predictedEAD = predict(eadModel,EADData(TestInd,:),ModelLevel="ead");
predictedConversion = predict(eadModel,EADData(TestInd,:),ModelLevel="ConversionMeasure");
Validate EAD Model
For model validation, use modelDiscrimination, modelDiscriminationPlot, modelCalibration, and modelCalibrationPlot.
Use modelDiscrimination and then modelDiscriminationPlot to plot the ROC curve.
ModelLevel = "ConversionMeasure";
[DiscMeasure1,DiscData1] = modelDiscrimination(eadModel,EADData(TestInd,:),ShowDetails=true,ModelLevel=ModelLevel)
DiscMeasure1=1×3 table
AUROC Segment SegmentCount
_______ __________ ____________
Tobit 0.70893 "all_data" 1751
DiscData1=1534×3 table
X Y T
__________ _________ _______
0 0 0.63602
0 0.0027778 0.63602
0 0.0041667 0.6349
0.00096993 0.0055556 0.63377
0.00096993 0.0069444 0.63265
0.0019399 0.0083333 0.63152
0.0029098 0.0097222 0.6304
0.0029098 0.015278 0.62927
0.0029098 0.016667 0.62922
0.0029098 0.018056 0.6288
0.0029098 0.019444 0.62864
0.0038797 0.022222 0.62814
0.0038797 0.025 0.62767
0.0048497 0.026389 0.62701
0.0048497 0.033333 0.62654
0.0058196 0.033333 0.62618
⋮
modelDiscriminationPlot(eadModel,EADData(TestInd,:),ModelLevel=ModelLevel,SegmentBy="Marriage");
Compute AUROC and ROC Using Beta EAD Model
This example shows how to use fitEADModel to create a Beta model and then use modelDiscrimination to compute AUROC and ROC.
Load EAD Data
Load the EAD data.
load EADData.mat
head(EADData)
UtilizationRate    Age    Marriage       Limit         Drawn         EAD
_______________    ___    ___________    __________    __________    __________
0.24359            25     not married    44776         10907         44740
0.96946            44     not married    2.1405e+05    2.0751e+05    40678
0                  40     married        1.6581e+05    0             1.6567e+05
0.53242            38     not married    1.7375e+05    92506         1593.5
0.2583             30     not married    26258         6782.5        54.175
0.17039            54     married        1.7357e+05    29575         576.69
0.18586            27     not married    19590         3641          998.49
0.85372            42     not married    2.0712e+05    1.7682e+05    1.6454e+05
rng('default');
NumObs = height(EADData);
c = cvpartition(NumObs,'HoldOut',0.4);
TrainingInd = training(c);
TestInd = test(c);
Select Model Type
Select a model type for a Beta model.
ModelType = "Beta";
Select Conversion Measure
Select a conversion measure for the EAD response values.
ConversionMeasure = "LCF";
Create Beta EAD Model
Use fitEADModel to create a Beta model using the TrainingInd data.
eadModel = fitEADModel(EADData(TrainingInd,:),ModelType,PredictorVars={'UtilizationRate','Age','Marriage'}, ...
    ConversionMeasure=ConversionMeasure,DrawnVar="Drawn",LimitVar="Limit",ResponseVar="EAD");
disp(eadModel);
Beta with properties:
    BoundaryTolerance: 1.0000e-07
    ModelID: "Beta"
    Description: ""
    UnderlyingModel: [1x1 risk.internal.credit.BetaModel]
    PredictorVars: ["UtilizationRate" "Age" "Marriage"]
    ResponseVar: "EAD"
    LimitVar: "Limit"
    DrawnVar: "Drawn"
    ConversionMeasure: "lcf"
Display the underlying model. The underlying model's response variable is the transformation of the EAD response data. Use the 'LimitVar' and 'DrawnVar' name-value arguments to modify the transformation.
disp(eadModel.UnderlyingModel);
Beta regression model:
    logit(EAD_lcf) ~ 1_mu + UtilizationRate_mu + Age_mu + Marriage_mu
    log(EAD_lcf) ~ 1_phi + UtilizationRate_phi + Age_phi + Marriage_phi
Estimated coefficients:
                            Estimate     SE           tStat       pValue
                            _________    _________    ________    __________
(Intercept)_mu              -0.65566     0.11484      -5.7093     1.2614e-08
UtilizationRate_mu          1.7014       0.078094     21.787      0
Age_mu                      -0.00559     0.0027603    -2.0252     0.042952
Marriage_not married_mu     -0.012576    0.052098     -0.2414     0.80926
(Intercept)_phi             -0.50132     0.094625     -5.2979     1.2685e-07
UtilizationRate_phi         0.39731      0.066707     5.956       2.9304e-09
Age_phi                     -0.001167    0.0023161    -0.50386    0.61441
Marriage_not married_phi    -0.013275    0.042627     -0.31143    0.7555
Number of observations: 2627
Log-likelihood: -3140.21
Predict EAD
EAD prediction operates on the underlying compact statistical model and then transforms the predicted values back to the EAD scale. You can call the predict function with different options for the 'ModelLevel' name-value argument.
predictedEAD = predict(eadModel,EADData(TestInd,:),ModelLevel="ead");
predictedConversion = predict(eadModel,EADData(TestInd,:),ModelLevel="ConversionMeasure");
Validate EAD Model
For model validation, use modelDiscrimination, modelDiscriminationPlot, modelCalibration, and modelCalibrationPlot.
Use modelDiscrimination and then modelDiscriminationPlot to plot the ROC curve.
ModelLevel = "ConversionMeasure";
[DiscMeasure1,DiscData1] = modelDiscrimination(eadModel,EADData(TestInd,:),ShowDetails=true,ModelLevel=ModelLevel)
DiscMeasure1=1×3 table
AUROC Segment SegmentCount
_______ __________ ____________
Beta 0.70895 "all_data" 1751
DiscData1=1534×3 table
X Y T
__________ _________ _______
0 0 0.71675
0 0.0027778 0.71675
0 0.0041667 0.71561
0 0.0055556 0.71533
0.00096993 0.0069444 0.71447
0.00096993 0.0097222 0.71419
0.00096993 0.011111 0.71333
0.00096993 0.018056 0.71304
0.0019399 0.018056 0.7128
0.0029098 0.019444 0.71218
0.0048497 0.019444 0.7119
0.0058196 0.020833 0.71104
0.0067895 0.020833 0.71075
0.0067895 0.022222 0.71022
0.0067895 0.027778 0.70989
0.0067895 0.029167 0.70968
⋮
modelDiscriminationPlot(eadModel,EADData(TestInd, :),ModelLevel=ModelLevel,SegmentBy="Marriage");
Input Arguments
eadModel
— Exposure at default model
Regression object | Tobit object | Beta object
Exposure at default model, specified as a previously created Regression, Tobit, or Beta object using fitEADModel.
Data Types: object
data
— Data
table
Data, specified as a NumRows-by-NumCols table with predictor and response values. The variable names and data types must be consistent with the underlying model.
Data Types: table
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: [DiscMeasure,DiscData] = modelDiscrimination(eadModel,data(TestInd,:),DataID='Testing',DiscretizeBy='median')
DataID
— Data set identifier
"" (default) | character vector | string
Data set identifier, specified as DataID and a character vector or string. The DataID is included in the output for reporting purposes.
Data Types: char | string
DiscretizeBy
— Discretization method for EAD data at defined ModelLevel
'mean' (default) | character vector with value 'mean' or 'median' | string with value "mean" or "median"
Discretization method for EAD data at the defined ModelLevel, specified as DiscretizeBy and a character vector or string.
'mean' — Discretized response is 1 if observed EAD is greater than or equal to the mean EAD, 0 otherwise.
'median' — Discretized response is 1 if observed EAD is greater than or equal to the median EAD, 0 otherwise.
Data Types: char | string
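The two discretization rules can be illustrated with base MATLAB alone. The following is a minimal sketch, not the function's internal implementation; the observedEAD values are hypothetical and chosen only so that the mean and the median give different splits.

```matlab
% Hypothetical observed EAD values (illustrative only, not taken from EADData)
observedEAD = [100; 250; 400; 1200; 3000];

% 'mean' rule: 1 if observed EAD >= mean observed EAD, 0 otherwise
binaryByMean = double(observedEAD >= mean(observedEAD));

% 'median' rule: 1 if observed EAD >= median observed EAD, 0 otherwise
binaryByMedian = double(observedEAD >= median(observedEAD));

% Show the observed values next to both discretized responses
disp(table(observedEAD,binaryByMean,binaryByMedian))
```

With a right-skewed sample like this one, the mean sits above the median, so the 'mean' rule labels fewer observations as 1 than the 'median' rule does.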
SegmentBy
— Name of column in data input used to segment data set
"" (default) | character vector | string
Name of a column in the data input, not necessarily a model variable, to be used to segment the data set, specified as SegmentBy and a character vector or string. One AUROC is reported for each segment, and the corresponding ROC data for each segment is returned in the optional output.
Data Types: char | string
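As a sketch of how segmentation might be requested, the following reuses the data, partition, and Tobit model settings from the examples on this page and passes SegmentBy="Marriage" to modelDiscrimination. It assumes Risk Management Toolbox and the EADData.mat sample file are available; no output is shown because the values depend on the fitted model.

```matlab
% Assumes Risk Management Toolbox and the EADData.mat sample data
load EADData.mat
rng('default');
c = cvpartition(height(EADData),'HoldOut',0.4);
TestInd = test(c);

% Fit a Tobit EAD model on the training rows, as in the examples
eadModel = fitEADModel(EADData(training(c),:),"Tobit", ...
    PredictorVars={'UtilizationRate','Age','Marriage'}, ...
    ConversionMeasure="LCF",DrawnVar="Drawn",LimitVar="Limit",ResponseVar="EAD");

% Request one AUROC per value of the Marriage column in the test data
[DiscMeasure,DiscData] = modelDiscrimination(eadModel,EADData(TestInd,:), ...
    SegmentBy="Marriage",ShowDetails=true);
disp(DiscMeasure)
```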
ShowDetails
— Indicates if output includes columns showing segment value and segment count
false (default) | logical
Since R2022a
Indicates if the output includes columns showing segment value and segment count, specified as ShowDetails and a scalar logical.
Data Types: logical
ModelLevel
— Model level
"ead" (default) | character vector with value 'ead', 'conversionMeasure', or 'conversionTransform' | string with value "ead", "conversionMeasure", or "conversionTransform"
Model level, specified as ModelLevel and a character vector or string.
Note
Regression models support all three model levels, but a Tobit or Beta model supports only a ModelLevel of "ead" or "conversionMeasure".
Data Types: char | string
ReferenceEAD
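— EAD values predicted for data by reference model
[] (default) | numeric vector
EAD values predicted for data by the reference model, specified as ReferenceEAD and a numeric vector. When you supply ReferenceEAD, the modelDiscrimination output reports results for both the eadModel object and the reference model.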
ReferenceID
— Identifier for the reference model
'Reference' (default) | character vector | string
Identifier for the reference model, specified as ReferenceID and a character vector or string. ReferenceID is used in the modelDiscrimination output for reporting purposes.
Data Types: char | string
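A comparison against a reference model might look like the following sketch. It assumes Risk Management Toolbox and the EADData.mat sample file, and it uses a Regression EAD model as the reference purely for illustration; the identifier "RegressionRef" and the variable names refModel and refEAD are hypothetical choices, not names the function requires.

```matlab
% Assumes Risk Management Toolbox and the EADData.mat sample data
load EADData.mat
rng('default');
c = cvpartition(height(EADData),'HoldOut',0.4);
TrainingInd = training(c);
TestInd = test(c);

% Main model: Tobit, with the same settings as the examples
eadModel = fitEADModel(EADData(TrainingInd,:),"Tobit", ...
    PredictorVars={'UtilizationRate','Age','Marriage'}, ...
    ConversionMeasure="LCF",DrawnVar="Drawn",LimitVar="Limit",ResponseVar="EAD");

% Reference model (illustrative choice): Regression on the same predictors
refModel = fitEADModel(EADData(TrainingInd,:),"Regression", ...
    PredictorVars={'UtilizationRate','Age','Marriage'}, ...
    ConversionMeasure="LCF",DrawnVar="Drawn",LimitVar="Limit",ResponseVar="EAD");

% Reference predictions on the EAD scale for the same test rows
refEAD = predict(refModel,EADData(TestInd,:),ModelLevel="ead");

% Report discrimination for the main model alongside the reference predictions
DiscMeasure = modelDiscrimination(eadModel,EADData(TestInd,:), ...
    ReferenceEAD=refEAD,ReferenceID="RegressionRef");
disp(DiscMeasure)
```

The reference predictions must correspond row by row to the data passed to modelDiscrimination, which is why both calls index EADData with the same TestInd.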
Output Arguments
DiscMeasure
— AUROC information for each model and each segment
table
AUROC information for each model and each segment, returned as a table. DiscMeasure has a single column named 'AUROC' and the number of rows depends on the number of segments and whether you use a ReferenceID for a reference model. The row names of DiscMeasure report the model IDs, segment, and data ID. If the optional ShowDetails name-value argument is true, the DiscMeasure output displays Segment and SegmentCount columns.
Note
If you do not specify SegmentBy and use ShowDetails to request the segment details, the two columns are still added: the Segment column shows "all_data" and the SegmentCount column shows the sample size (minus missing values).
DiscData
— ROC data for each model and each segment
table
ROC data for each model and each segment, returned as a table. There are three columns for the ROC data, with column names 'X', 'Y', and 'T', where the first two are the X and Y coordinates of the ROC curve, and T contains the corresponding thresholds. For more information, see Model Discrimination or perfcurve.
If you use SegmentBy, the function stacks the ROC data for all segments and DiscData has a column with the segmentation values to indicate where each segment starts and ends.
If reference model data is given, the DiscData outputs for the main and reference models are stacked, with an extra column 'ModelID' indicating where each model starts and ends.
More About
Model Discrimination
Model discrimination measures how well a model ranks risk.
The modelDiscrimination function computes the area under the receiver operating characteristic (AUROC) curve, sometimes called simply the area under the curve (AUC). This metric is between 0 and 1, and higher values indicate better discrimination.
To compute the AUROC, you need a numeric prediction and a binary response. For EAD models, the predicted EAD is used directly as the prediction. However, the observed EAD must be discretized into a binary variable. By default, observed EAD values greater than or equal to the mean observed EAD are assigned a value of 1, and values below the mean are assigned a value of 0. This discretized response is interpreted as "high EAD" vs. "low EAD." Therefore, the modelDiscrimination function measures how well the predicted EAD separates the "high EAD" vs. the "low EAD" observations. You can change the level at which the model discrimination is computed with the ModelLevel name-value argument and the discretization criterion with the DiscretizeBy name-value argument.
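One way to see what the AUROC measures is the pairwise-ranking form: the fraction of ("high EAD", "low EAD") pairs in which the high observation receives the larger prediction, with ties counted as one half. The following base MATLAB sketch applies that form to hypothetical data. It illustrates the metric only and is not a description of how modelDiscrimination computes it internally.

```matlab
% Hypothetical predicted and observed EAD values (illustrative only)
predictedEAD = [120; 300; 350; 900; 2500; 1000];
observedEAD  = [100; 250; 400; 1200; 3000; 150];

% Default discretization: "high EAD" when observed EAD >= mean observed EAD
isHigh = observedEAD >= mean(observedEAD);

% Compare every high-EAD prediction with every low-EAD prediction
highPred = predictedEAD(isHigh);
lowPred  = predictedEAD(~isHigh);
diffs = highPred - lowPred';   % implicit expansion: one entry per (high, low) pair

% Fraction of pairs ranked correctly, counting ties as one half
auroc = (sum(diffs > 0,'all') + 0.5*sum(diffs == 0,'all')) / numel(diffs);
fprintf('AUROC = %.3f\n',auroc)
```

In this sample, seven of the eight pairs are ranked correctly, so the sketch reports an AUROC of 0.875. The one misranked pair comes from the last observation, whose prediction of 1000 exceeds the prediction of 900 for a high-EAD observation.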
To plot the receiver operating characteristic (ROC) curve, use the modelDiscriminationPlot function. However, if you need the ROC curve data, use the optional DiscData output argument from the modelDiscrimination function.
The ROC curve is a parametric curve that plots the proportion of
- High EAD cases with predicted EAD greater than or equal to a parameter t, or true positive rate (TPR)
- Low EAD cases with predicted EAD greater than or equal to the same parameter t, or false positive rate (FPR)
The parameter t sweeps through all the predicted EAD values observed for the given data. The DiscData optional output contains the FPR in the 'X' column, the TPR in the 'Y' column, and the corresponding parameters t in the 'T' column. For more information about ROC curves, see ROC Curve and Performance Metrics.
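The threshold sweep can be sketched in base MATLAB as follows. The data is hypothetical, and the point layout (including the prepended origin) is an assumption of this sketch, so the rows do not necessarily match the DiscData output row for row.

```matlab
% Hypothetical predicted and observed EAD values (illustrative only)
predictedEAD = [120; 300; 350; 900; 2500; 1000];
observedEAD  = [100; 250; 400; 1200; 3000; 150];
isHigh = observedEAD >= mean(observedEAD);   % default 'mean' discretization

% Sweep t through the distinct predicted values, largest first
t = sort(unique(predictedEAD),'descend');
tpr = zeros(size(t));
fpr = zeros(size(t));
for k = 1:numel(t)
    flagged = predictedEAD >= t(k);                   % predicted EAD >= t
    tpr(k) = sum(flagged & isHigh) / sum(isHigh);     % share of high EAD cases flagged
    fpr(k) = sum(flagged & ~isHigh) / sum(~isHigh);   % share of low EAD cases flagged
end

% Prepend the origin, then integrate TPR over FPR with the trapezoidal rule
fpr = [0; fpr];
tpr = [0; tpr];
auroc = trapz(fpr,tpr);
fprintf('AUROC = %.3f\n',auroc)
```

Plotting tpr against fpr, with the false positive rate on the horizontal axis, gives the ROC curve for this sample. The trapezoidal area is 0.875.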
Version History
Introduced in R2021b
R2022b: Support for Beta model
The eadModel input supports an option for a Beta model object that you can create using fitEADModel.
R2022a: Additional option for ShowDetails
There is an additional name-value pair for ShowDetails to indicate if the DiscMeasure output includes columns for Segment value and the SegmentCount.