risk.validation.kolmogorovSmirnov

Kolmogorov-Smirnov statistic

Since R2025a

Syntax

ksValue = risk.validation.kolmogorovSmirnov(Score,BinaryResponse)

ksValue = risk.validation.kolmogorovSmirnov(Sample1,Sample2)

ksValue = risk.validation.kolmogorovSmirnov(___,SortDirection=sortdir)

[ksValue,Output] = risk.validation.kolmogorovSmirnov(___)

Description

ksValue = risk.validation.kolmogorovSmirnov(Score,BinaryResponse) returns the two-sample Kolmogorov-Smirnov (KS) statistic, where Score contains numeric values that represent rankings or predictions from a binary classification model, such as probability of default (PD) estimates. BinaryResponse specifies the target state of each value in Score. This syntax is well suited for binary classification models.

ksValue = risk.validation.kolmogorovSmirnov(Sample1,Sample2) calculates the two-sample KS statistic for the data in Sample1 and Sample2.

ksValue = risk.validation.kolmogorovSmirnov(___,SortDirection=sortdir) specifies the sorting direction of the unique values in Score or in Sample1 and Sample2.

[ksValue,Output] = risk.validation.kolmogorovSmirnov(___) also returns a structure, Output, that contains the KS statistic and additional information about the test.

Examples

Compute Kolmogorov-Smirnov Statistic for Credit Scores

Compute the Kolmogorov-Smirnov (KS) statistic for credit scores by using the kolmogorovSmirnov function. In this example, you use the credit validation data set, which includes a table, ScorecardValidationData, that contains credit scores and their corresponding default status information.

Load and display the credit validation data.

load CreditValidationData.mat
head(ScorecardValidationData)

    CreditScore      PD       Default
    ___________    _______    _______

      579.86       0.14182       0   
      563.65       0.17143       0   
      549.52       0.20106       0   
      546.25       0.20845       0   
      485.34       0.37991       0   
      482.07       0.39065       0   
      579.86       0.14182       1   
      451.73         0.494       0

Extract the variables CreditScore and Default from the table ScorecardValidationData. Use Default as the BinaryResponse input argument.

Scores = ScorecardValidationData.CreditScore;
BinaryResponse = ScorecardValidationData.Default;

Compute the KS statistic by using the kolmogorovSmirnov function with the fully qualified namespace risk.validation. For credit models, you can sort the scores from lower scores to higher scores by setting the SortDirection name-value argument to "ascending". This setting ensures that the function sorts the scores from higher risk individuals to lower risk individuals.

[ksValue,Output] = risk.validation.kolmogorovSmirnov(Scores,BinaryResponse,SortDirection="ascending")

ksValue = 
0.1770

Output = struct with fields:
    KolmogorovSmirnovStatistic: 0.1770
        KolmogorovSmirnovScore: 476.4030
                       Metrics: [107×3 table]

The output structure, Output, contains the KS statistic and the value in Score that attains this statistic. Display the metrics Threshold, TruePositiveRate, and FalsePositiveRate contained in the table Output.Metrics.

head(Output.Metrics)

    Threshold    TruePositiveRate    FalsePositiveRate
    _________    ________________    _________________

     408.99                 0                   0     
     408.99          0.071429            0.012821     
     410.12          0.079365            0.017094     
     430.66          0.087302            0.017094     
     435.52          0.087302            0.025641     
     436.65           0.10317            0.029915     
     439.33           0.11905            0.029915     
     440.45           0.13492            0.029915

Calculate Kolmogorov-Smirnov Statistic for Profit and Loss Data

Calculate the Kolmogorov-Smirnov (KS) statistic for two samples containing risk-theoretical profit and loss (RTPL) data and hypothetical profit and loss (HPL) data, respectively. The vectors RTPL and HPL contain the RTPL and HPL data for 250 trading days, or one year, of a simulated portfolio.

load("PandLValues.mat")
[ksValue,Output] = risk.validation.kolmogorovSmirnov(RTPL,HPL)

ksValue = 
0.0280

Output = struct with fields:
    KolmogorovSmirnovStatistic: 0.0280
        KolmogorovSmirnovPoint: -1.0261e+03
                 Distributions: [501×3 table]

The output indicates that the largest distance between the empirical cumulative distribution function (CDF) for RTPL and the empirical CDF for HPL is 0.028.

Display the evaluation points and values for the empirical CDFs.

Output.Distributions

ans=501×3 table
    EvaluationPoint    EmpiricalCDF1    EmpiricalCDF2
    _______________    _____________    _____________

      -3.9596e+04              0                0
      -3.9596e+04         0.0040                0
      -3.0298e+04         0.0040           0.0040
      -2.2525e+04         0.0040           0.0080
      -2.1882e+04         0.0080           0.0080
      -2.0224e+04         0.0120           0.0080
      -2.0065e+04         0.0120           0.0120
      -1.9575e+04         0.0120           0.0160
      -1.8832e+04         0.0160           0.0160
      -1.7563e+04         0.0160           0.0200
      -1.7370e+04         0.0160           0.0240
      -1.7006e+04         0.0160           0.0280
      -1.6749e+04         0.0200           0.0280
      -1.6713e+04         0.0240           0.0280
      ⋮

Input Arguments

`Score` — Score values
numeric vector

Score values, specified as a numeric vector. The values indicate quantities such as rankings, predictions, or probability of default (PD) or loss given default (LGD) estimates. For more information, see Algorithms.

Data Types: single | double

`BinaryResponse` — Binary response
numeric or logical vector, containing values of `1` (`true`) or `0` (`false`)

Binary response, specified as a numeric or logical vector that contains values of 1 (true) or 0 (false). The binary response represents the target state for each value in Score.

When you specify BinaryResponse, risk.validation.kolmogorovSmirnov creates two samples from the data in Score. The sample given by the 0 values in BinaryResponse corresponds to the FalsePositiveRate column of the Metrics table in the Output argument, and the sample given by the 1 values corresponds to the TruePositiveRate column. For more information, see Algorithms.
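The split described above can be sketched with logical indexing in base MATLAB. This is only an illustration under assumed toy data, not the function's internal code; the variable names samplePositive and sampleNegative are hypothetical.

```matlab
% Toy data (hypothetical values, not taken from CreditValidationData.mat)
Score          = [0.10 0.40 0.35 0.80 0.65 0.20];
BinaryResponse = [0    0    1    1    1    0];

% Convert the response to logical so it can be used as an index
isPositive = logical(BinaryResponse);

% Sample of scores whose response is 1 (relates to TruePositiveRate)
samplePositive = Score(isPositive);

% Sample of scores whose response is 0 (relates to FalsePositiveRate)
sampleNegative = Score(~isPositive);
```

With these toy values, samplePositive holds the three scores whose response is 1 and sampleNegative holds the other three.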

`Sample1,Sample2` — Sample data
two numeric vectors

Sample data, specified as two numeric vectors.

Example: normrnd(0,1,1,100),normrnd(5,2,1,100)

Data Types: single | double

`sortdir` — Sorting direction of the unique values in `Score`
`"descending"` | `"ascending"`

Sorting direction of the unique values in Score, or in Sample1 and Sample2, specified as one of the following:

"descending" — Default value when you specify Score and BinaryResponse. Descending sorting direction is well suited for binary classifiers. Models that use probability of default data, for example, typically use a descending sorting direction because higher values correspond to higher risk. In this case, a descending sorting direction ensures that TruePositiveRate represents the proportion of defaulters.
"ascending" — Default value when you specify Sample1,Sample2. Ascending sorting orders are well suited for comparing samples. Models that use credit scores, for example, typically use an ascending sorting direction because low values correspond to higher risk.

Example: SortDirection="descending"
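To see how the sorting direction relates to the cumulative rates, the following base MATLAB sketch builds true positive and false positive rates over the unique scores in each direction. It is an assumption-laden illustration on toy data: it treats a score as "reached" when it lies at or beyond the threshold in the chosen direction, which may differ in detail from how risk.validation.kolmogorovSmirnov builds its Metrics table. All variable names are hypothetical.

```matlab
% Toy data (hypothetical values)
Score          = [0.10 0.40 0.35 0.80 0.65 0.20 0.40];
BinaryResponse = logical([0 0 1 1 1 0 1]);

nPos = sum(BinaryResponse);      % number of 1 responses
nNeg = sum(~BinaryResponse);     % number of 0 responses

directions    = ["descending","ascending"];
ksByDirection = zeros(1,2);

for d = 1:2
    thresholds = unique(Score);              % unique scores, ascending
    if directions(d) == "descending"
        thresholds = flip(thresholds);       % walk from high to low scores
    end

    tpr = zeros(size(thresholds));
    fpr = zeros(size(thresholds));
    for k = 1:numel(thresholds)
        % Scores at or beyond the threshold in the chosen direction
        if directions(d) == "descending"
            reached = Score >= thresholds(k);
        else
            reached = Score <= thresholds(k);
        end
        tpr(k) = sum(reached & BinaryResponse)/nPos;
        fpr(k) = sum(reached & ~BinaryResponse)/nNeg;
    end

    % Largest absolute gap between the two cumulative rates
    ksByDirection(d) = max(abs(tpr - fpr));
end
```

In this sketch the direction changes which rate accumulates first and therefore how the columns read, while the largest absolute gap is the same in both directions for the toy data.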

Output Arguments

`ksValue` — KS value
numeric scalar

KS value for the values contained in Score, or in Sample1 and Sample2, returned as a numeric scalar. For credit models, you can use the KS value to quantify how well a model differentiates between lower risk and higher risk customers.

`Output` — Output metrics
structure

Output metrics, returned as a structure containing the following fields:

KolmogorovSmirnovStatistic — KS statistic, equal to ksValue.
KolmogorovSmirnovScore — Value in Score that attains the KS statistic.

When you specify Score and BinaryResponse, Output includes a Metrics field, which is a table with the following columns.

- Threshold — Unique score values sorted according to the value of sortdir.
- TruePositiveRate — True positive rate values corresponding to the unique scores in the Threshold column. For credit scoring models, this column represents the proportion of defaulters.
- FalsePositiveRate — False positive rate values corresponding to the unique scores in the Threshold column. For credit scoring models, this column represents the proportion of nondefaulters.

When you specify Sample1,Sample2, Output includes a field Distributions, which is a table with the following columns.

EvaluationPoint — Evaluation points for the CDFs
EmpiricalCDF1 — Values of the Sample1 CDF, evaluated at the points in EvaluationPoint
EmpiricalCDF2 — Values of the Sample2 CDF, evaluated at the points in EvaluationPoint

Algorithms

The risk.validation.kolmogorovSmirnov function calculates the KS statistic by taking the largest absolute difference between the empirical cumulative distribution functions (CDFs) for two samples.

When you specify Sample1,Sample2, the function calculates the empirical CDFs using the data in the samples.
When you specify Score and BinaryResponse, risk.validation.kolmogorovSmirnov uses BinaryResponse to create two samples from the data in Score and then calculates the empirical CDFs using the data in the samples. The sample given by the 0 values in BinaryResponse corresponds to the FalsePositiveRate column of the Metrics table in the Output argument, and the sample given by the 1 values corresponds to the TruePositiveRate column.
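The calculation described above, the largest absolute difference between two empirical CDFs, can be sketched in base MATLAB as follows. This is a minimal illustration on hypothetical toy samples, not the function's actual implementation; the names points, cdf1, cdf2, ksSketch, and ksPoint are assumptions made for the example.

```matlab
% Toy samples (hypothetical values)
Sample1 = [1.2 0.4 2.5 3.1 0.9];
Sample2 = [2.0 3.5 1.8 4.2];

% Evaluate both empirical CDFs at every distinct observed value
points = unique([Sample1(:); Sample2(:)]);   % sorted column vector

cdf1 = zeros(size(points));
cdf2 = zeros(size(points));
for k = 1:numel(points)
    % Empirical CDF: fraction of observations less than or equal to the point
    cdf1(k) = mean(Sample1 <= points(k));
    cdf2(k) = mean(Sample2 <= points(k));
end

% KS statistic: largest absolute gap between the two empirical CDFs
[ksSketch, idx] = max(abs(cdf1 - cdf2));

% Evaluation point at which the largest gap occurs
ksPoint = points(idx);
```

For these toy samples, three of the five Sample1 values and none of the Sample2 values are at or below 1.2, so the largest gap is 0.6 and it occurs at 1.2.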

Alternative Functionality

You can calculate and visualize the KS statistic by using the risk.validation.kolmogorovSmirnovPlot function. risk.validation.kolmogorovSmirnovPlot displays the KS statistic and the empirical cumulative distribution functions (CDFs) for the samples, and allows you to plot a difference profile for the empirical CDFs. You can also perform a two-sample KS test using kstest2.
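As a hedged illustration of the kstest2 alternative, the sketch below runs a two-sample KS test on hypothetical toy data. It assumes that Statistics and Machine Learning Toolbox is installed; the third output of kstest2 is the test statistic, and the first two are the test decision and the p-value.

```matlab
% Toy samples (hypothetical values); requires Statistics and Machine Learning Toolbox
Sample1 = [1.2 0.4 2.5 3.1 0.9];
Sample2 = [2.0 3.5 1.8 4.2];

% h  - test decision (1 rejects the null hypothesis of a common distribution)
% p  - asymptotic p-value of the test
% ks - two-sample KS test statistic
[h, p, ks] = kstest2(Sample1, Sample2);
```

Unlike risk.validation.kolmogorovSmirnov, which returns the statistic and supporting tables, kstest2 also reports a hypothesis test decision and p-value.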

Version History

Introduced in R2025a
