# shapley

## Description

The Shapley value of a feature for a query point explains the deviation of the prediction for the query point from the average prediction, due to the feature. For each query point, the sum of the Shapley values for all features corresponds to the total deviation of the prediction from the average.

You can create a `shapley` object for a machine learning model with a specified query point or query points (`queryPoints`). The software creates an object and computes the Shapley values of all features for the query points.

Use the Shapley values to explain the contribution of individual features to a prediction at each specified query point. Use the `plot` function to display a bar graph of the Shapley values for one query point or the mean absolute Shapley values averaged across multiple query points. If you have multiple query points, you can use the `boxchart`, `plotDependence`, and `swarmchart` functions to visualize Shapley values. You can compute the Shapley values for new query points by using the `fit` function.

## Creation

### Syntax

### Description

`explainer = shapley(___,QueryPoints=queryPoints)` also computes the Shapley values for the query points `queryPoints` and stores the computed Shapley values in the `Shapley` property of `explainer`. You can specify `queryPoints` in addition to any of the input argument combinations in the previous syntaxes.

`explainer = shapley(___,Name=Value)` specifies additional options using one or more name-value arguments. For example, specify `UseParallel=true` to compute Shapley values in parallel.

### Input Arguments

`blackbox`

— Machine learning model to be interpreted

regression model object | classification model object | function handle

Machine learning model to be interpreted, specified as a full or compact regression or classification model object or a function handle.

Full or compact model object: You can specify a full or compact regression or classification model object, which has a `predict` object function. The software uses the `predict` function to compute Shapley values.

If you specify a model object that does not contain predictor data (for example, a compact model), then you must provide the predictor data using `X`.

When you train a model, use a numeric matrix or table for the predictor data where rows correspond to individual observations.

`shapley` does not support a model object trained with more than one response variable.

**Regression Model Object**

Supported Model | Full or Compact Regression Model Object |
---|---|
Ensemble of regression models | `RegressionEnsemble`, `RegressionBaggedEnsemble`, `CompactRegressionEnsemble` |
Gaussian kernel regression model using random feature expansion | `RegressionKernel` |
Gaussian process regression | `RegressionGP`, `CompactRegressionGP` |
Generalized additive model | `RegressionGAM`, `CompactRegressionGAM` |
Linear regression for high-dimensional data | `RegressionLinear` |
Neural network regression model | `RegressionNeuralNetwork`, `CompactRegressionNeuralNetwork` |
Regression tree | `RegressionTree`, `CompactRegressionTree` |
Support vector machine regression | `RegressionSVM`, `CompactRegressionSVM` |

**Classification Model Object**

Supported Model | Full or Compact Classification Model Object |
---|---|
Discriminant analysis classifier | `ClassificationDiscriminant`, `CompactClassificationDiscriminant` |
Multiclass model for support vector machines or other classifiers | `ClassificationECOC`, `CompactClassificationECOC` |
Ensemble of learners for classification | `ClassificationEnsemble`, `CompactClassificationEnsemble`, `ClassificationBaggedEnsemble` |
Gaussian kernel classification model using random feature expansion | `ClassificationKernel` |
Generalized additive model | `ClassificationGAM`, `CompactClassificationGAM` |
*k*-nearest neighbor classifier | `ClassificationKNN` |
Linear classification model | `ClassificationLinear` |
Multiclass naive Bayes model | `ClassificationNaiveBayes`, `CompactClassificationNaiveBayes` |
Neural network classifier | `ClassificationNeuralNetwork`, `CompactClassificationNeuralNetwork` |
Support vector machine classifier for one-class and binary classification | `ClassificationSVM`, `CompactClassificationSVM` |
Binary decision tree for multiclass classification | `ClassificationTree`, `CompactClassificationTree` |

Function handle: You can specify a function handle that accepts predictor data and returns a column vector containing a prediction for each observation in the predictor data. The prediction is a predicted response for regression or a predicted score of a single class for classification. You must provide the predictor data using `X`.
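As a minimal sketch of the function handle form, the following wraps the `predict` function of a regression tree in an anonymous function and passes the predictor data explicitly. The `carsmall` sample data set, the choice of predictors, and the variable names `data`, `X`, `y`, `mdl`, and `f` are assumptions made for illustration.

```matlab
% Assumed setup: carsmall sample data with three numeric predictors.
load carsmall
data = rmmissing([Weight Horsepower Displacement MPG]); % drop rows with NaN
X = data(:,1:3);   % predictor data, one row per observation
y = data(:,4);     % response (MPG)

mdl = fitrtree(X,y);

% The handle accepts predictor data and returns one prediction per row.
f = @(x) predict(mdl,x);

% A function handle carries no predictor data, so X must be supplied.
explainer = shapley(f,X,QueryPoints=X(1,:));
explainer.Shapley
```

For a classifier, the handle would instead need to return the score of a single class, for example by selecting one column of the score output of `predict` inside a small helper function.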

`X`

— Predictor data

numeric matrix | table

Predictor data, specified as a numeric matrix or table. Each row of `X` corresponds to one observation, and each column corresponds to one variable.

For a numeric matrix:

The variables that make up the columns of `X` must have the same order as the predictor variables that trained `blackbox`, stored in `blackbox.X`.

If you trained `blackbox` using a table, then `X` can be a numeric matrix if the table contains all numeric predictor variables.

For a table:

If you trained `blackbox` using a table (for example, `Tbl`), then all predictor variables in `X` must have the same variable names and data types as those in `Tbl`. However, the column order of `X` does not need to correspond to the column order of `Tbl`.

If you trained `blackbox` using a numeric matrix, then the predictor names in `blackbox.PredictorNames` and the corresponding predictor variable names in `X` must be the same. To specify predictor names during training, use the `PredictorNames` name-value argument. All predictor variables in `X` must be numeric vectors.

`X` can contain additional variables (response variables, observation weights, and so on), but `shapley` ignores them.

`shapley` does not support multicolumn variables or cell arrays other than cell arrays of character vectors.

If `blackbox` is a function handle or a model object that does not contain predictor data, you must provide `X`. If `blackbox` is a full machine learning model object and you specify this argument, then `shapley` does not use the predictor data in `blackbox`; it uses the specified predictor data only.

**Data Types:** `single` | `double`
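The two cases for `X` can be sketched as follows: a compact model that needs `X`, and a full model whose stored predictor data is overridden by `X`. The `carsmall` data, the 50-row subset, and all variable names are assumptions chosen only for illustration.

```matlab
% Assumed setup: carsmall sample data with three numeric predictors.
load carsmall
data = rmmissing([Weight Horsepower Displacement MPG]);
X = data(:,1:3);
y = data(:,4);

mdl = fitrtree(X,y);   % full model, stores the training data
cmdl = compact(mdl);   % compact model, does not store the training data

% The compact model has no predictor data, so X is required.
explainerCompact = shapley(cmdl,X);

% For the full model, the specified data is used instead of the stored data.
explainerSubset = shapley(mdl,X(1:50,:));
size(explainerSubset.X)
```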

`queryPoints`

— Query points

numeric matrix | table

Query points at which `shapley` explains a prediction, specified as a numeric matrix or a table. Each row of `queryPoints` corresponds to one query point.

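For a numeric matrix:

The variables that make up the columns of `queryPoints` must have the same order as the predictor variables that trained `blackbox`.

If you trained `blackbox` using a table, then `queryPoints` can be a numeric matrix if the table contains all numeric predictor variables.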

For a table:

If you trained `blackbox` using a table (for example, `Tbl`), then all predictor variables in `queryPoints` must have the same variable names and data types as those in `Tbl`. However, the column order of `queryPoints` does not need to correspond to the column order of `Tbl`.

If you trained `blackbox` using a numeric matrix, then the predictor names in `blackbox.PredictorNames` and the corresponding predictor variable names in `queryPoints` must be the same. To specify predictor names during training, use the `PredictorNames` name-value argument. All predictor variables in `queryPoints` must be numeric vectors.

`queryPoints` can contain additional variables (response variables, observation weights, and so on), but `shapley` ignores them.

`shapley` does not support multicolumn variables or cell arrays other than cell arrays of character vectors.

If `queryPoints` contains `NaN`s for continuous predictors and `Method` is `"conditional"`, then the Shapley values (`Shapley`) in the returned object are `NaN`s. If you use a regression model that is a Gaussian process regression (GPR), kernel, linear, neural network, or support vector machine (SVM) model, then `shapley` returns `NaN` Shapley values for query points that contain missing predictor values or categories not seen during training. For all other models, `shapley` handles missing values in `queryPoints` in the same way as `blackbox` (that is, the `predict` object function of `blackbox` or the function handle specified by `blackbox`).

*Before R2024a: You can specify only one query point using `QueryPoint=queryPoint`, where `queryPoint` is a row vector of numeric values or a single-row table.*

**Example:** `blackbox.X(1,:)` specifies the query point as the first observation of the predictor data in the full machine learning model `blackbox`.

**Data Types:** `single` | `double` | `table`

**Name-Value Arguments**

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

**Example:** `shapley(blackbox,QueryPoints=q,Method="conditional")` creates a `shapley` object and computes the Shapley values for the query point `q` using the extension to the Kernel SHAP algorithm.

`CategoricalPredictors`

— Categorical predictors list

vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | `"all"`

Categorical predictors list, specified as one of the values in this table.

Value | Description |
---|---|
Vector of positive integers | Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and `p`, where `p` is the number of predictors used to train the model. |
Logical vector | A `true` entry means that the corresponding predictor is categorical. |
Character matrix | Each row of the matrix is the name of a predictor variable. The names must match the variable names of the predictor data in the form of a table. Pad the names with extra blanks so each row of the character matrix has the same length. |
String array or cell array of character vectors | Each element in the array is the name of a predictor variable. The names must match the variable names of the predictor data in the form of a table. |
`"all"` | All predictors are categorical. |

If you specify `blackbox` as a function handle, then `shapley` identifies categorical predictors from the predictor data `X`. If the predictor data is in a table, `shapley` assumes that a variable is categorical if it is a logical vector, unordered categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix, `shapley` assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the `CategoricalPredictors` name-value argument.

If you specify `blackbox` as a regression or classification model object, then `shapley` identifies categorical predictors by using the `CategoricalPredictors` property of the model object.

`shapley` supports an ordered categorical predictor when `blackbox` supports ordered categorical predictors and you specify `Method` as `"interventional"`.

**Example:** `CategoricalPredictors="all"`

**Data Types:** `single` | `double` | `logical` | `char` | `string` | `cell`
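The function handle case with matrix data is the one where this argument matters most, because `shapley` would otherwise treat every column as continuous. The sketch below assumes the `carsmall` data with `Cylinders` in the second column; the data set and the variable names are illustrative only.

```matlab
% Assumed setup: carsmall sample data, Cylinders is the second predictor.
load carsmall
data = rmmissing([Weight Cylinders Horsepower MPG]);
X = data(:,1:3);
y = data(:,4);

% Train with the second column treated as categorical.
mdl = fitrtree(X,y,CategoricalPredictors=2);
f = @(x) predict(mdl,x);

% With a function handle and a numeric matrix, name the categorical
% column explicitly so that shapley does not treat it as continuous.
explainer = shapley(f,X,CategoricalPredictors=2);
explainer.CategoricalPredictors
```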

`MaxNumSubsets`

— Maximum number of predictor subsets

`min(2^M,1024)` where `M` is the number of predictors (default) | positive integer

Maximum number of predictor subsets to use for Shapley value computations, specified as a positive integer.

For details on how `shapley` chooses the subsets to use, see Computational Cost.

This argument is valid only when `shapley` uses the Kernel SHAP algorithm or the extension to the Kernel SHAP algorithm. If you set the `MaxNumSubsets` argument when `Method` is `"interventional"`, the software uses the Kernel SHAP algorithm. For more information, see Algorithms.

**Example:** `MaxNumSubsets=100`

**Data Types:** `single` | `double`
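To make the default concrete, the following sketch evaluates `min(2^M,1024)` for a few predictor counts. The vector `M` and the variable name `defaultMaxNumSubsets` are illustrative only and are not part of the `shapley` interface.

```matlab
% Number of predictors in several hypothetical models.
M = [4 6 10 12 20];

% Default cap on the number of predictor subsets for each model.
defaultMaxNumSubsets = min(2.^M,1024)
% returns 16 64 1024 1024 1024
```

With 10 or more predictors the default is capped at 1024 subsets, so not every possible subset can be evaluated.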

`Method`

— Shapley value computation algorithm

`"interventional"` (default) | `"conditional"`

Shapley value computation algorithm, specified as `"interventional"` or `"conditional"`.

`"interventional"` (default): `shapley` computes the Shapley values with an interventional value function. `shapley` offers three interventional algorithms: Kernel SHAP [1], Linear SHAP [1], and Tree SHAP [2]. The software selects an algorithm based on the machine learning model `blackbox` and other specified options. For details, see Interventional Algorithms.

`"conditional"`: `shapley` uses the extension to the Kernel SHAP algorithm [3] with a conditional value function.

The `Method` property stores the name of the selected algorithm. For more information, see Algorithms.

*Before R2023a: You can specify this argument as `"interventional-kernel"` or `"conditional-kernel"`. `shapley` supports the Kernel SHAP algorithm and the extension of the Kernel SHAP algorithm.*

**Example:** `Method="conditional"`

**Data Types:** `char` | `string`
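A short sketch of how the argument and the `Method` property relate: the argument selects the value function, and the property reports the algorithm that was actually used. The regression tree, the `carsmall` data, and the variable names are assumptions made for illustration.

```matlab
% Assumed setup: a regression tree trained on carsmall sample data.
load carsmall
data = rmmissing([Weight Horsepower Displacement MPG]);
X = data(:,1:3);
y = data(:,4);
mdl = fitrtree(X,y);
q = X(1,:);   % one query point

% Default: interventional value function, algorithm chosen by the software.
explInterventional = shapley(mdl,QueryPoints=q);

% Conditional value function (extension to the Kernel SHAP algorithm).
explConditional = shapley(mdl,QueryPoints=q,Method="conditional");

% Inspect which algorithm each object reports.
explInterventional.Method
explConditional.Method
```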

`NumObservationsToSample`

— Number of observations to sample from predictor data

`100` (default) | `"all"` | positive integer scalar

*Since R2024b*

Number of observations to sample from the predictor data, specified as `"all"` or a positive integer scalar. A value of `"all"` indicates to use all observations in the predictor data `X` to compute Shapley values. A value of *n* indicates to use at most *n* observations randomly sampled from `X`. To see the sampled observations, use the `SampledObservationIndices` property.

**Example:** `NumObservationsToSample="all"`

**Data Types:** `single` | `double` | `char` | `string`
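The following sketch contrasts the two kinds of values and then recovers the sampled rows. It assumes a release that supports this argument (R2024b or later), and the `carsmall` data, the sample size of 25, and the variable names are illustrative choices.

```matlab
% Assumed setup: a regression tree trained on carsmall sample data.
load carsmall
data = rmmissing([Weight Horsepower Displacement MPG]);
X = data(:,1:3);
y = data(:,4);
mdl = fitrtree(X,y);

rng("default") % the sample is random, so fix the seed for repeatability

% Use every observation in X.
explAll = shapley(mdl,X,NumObservationsToSample="all");

% Use at most 25 randomly sampled observations.
explSome = shapley(mdl,X,NumObservationsToSample=25);
numSampled = numel(explSome.SampledObservationIndices)

% Recover the sampled rows of the predictor data.
sampledRows = explSome.X(explSome.SampledObservationIndices,:);
```

A smaller sample trades accuracy of the estimated Shapley values for speed, so it is worth checking how stable the values are across different seeds.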

`OutputFcn`

— Function called after each query point evaluation

`[]` (default) | function handle

*Since R2024a*

Function called after each query point evaluation, specified as a function handle. An output function can perform various tasks, such as stopping Shapley value computations, creating variables, or plotting results. For details and examples on how to write your own output functions, see Shapley Output Functions.

This argument is valid only when the `shapley` function computes Shapley values for multiple query points.

**Data Types:** `function_handle`

`UseParallel`

— Flag to run in parallel

`false` (default) | `true`

Flag to run in parallel, specified as a numeric or logical `1` (`true`) or `0` (`false`). If you specify `UseParallel=true`, the `shapley` function executes `for`-loop iterations by using `parfor`. The loop runs in parallel when you have Parallel Computing Toolbox™.

This argument is valid only when the `shapley` function computes Shapley values for multiple query points, or computes Shapley values for one query point by using the Tree SHAP algorithm for an ensemble of trees, the Kernel SHAP algorithm, or the extension to the Kernel SHAP algorithm.

**Example:** `UseParallel=true`

**Data Types:** `logical`

## Properties

`BlackboxModel`

— Machine learning model to be interpreted

regression model object | classification model object | function handle

This property is read-only.

Machine learning model to be interpreted, specified as a regression or classification model object or a function handle.

The `blackbox` argument sets this property.

`BlackboxFitted`

— Predictions for query points computed by machine learning model

vector | `[]`

This property is read-only.

Predictions for the query points computed by the machine learning model (`BlackboxModel`), specified as a vector.

If `BlackboxModel` is a model object, then `BlackboxFitted` contains predicted responses for regression or classified labels for classification.

If `BlackboxModel` is a function handle, then `BlackboxFitted` contains values returned by the function handle, either predicted responses for regression or predicted scores for classification.

The `BlackboxFitted` property is empty if you do not specify query points.

`CategoricalPredictors`

— Categorical predictor indices

vector of positive integers | `[]`

This property is read-only.

Categorical predictor indices, specified as a vector of positive integers. `CategoricalPredictors` contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and `p`, where `p` is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty (`[]`).

If you specify `blackbox` using a function handle, then `shapley` identifies categorical predictors from the predictor data `X`. If you specify the `CategoricalPredictors` name-value argument, then the argument sets this property.

If you specify `blackbox` as a regression or classification model object, then `shapley` determines this property by using the `CategoricalPredictors` property of the model object.

`shapley` supports an ordered categorical predictor when `blackbox` supports ordered categorical predictors and when you specify `Method` as `"interventional"`.

`Intercept`

— Average prediction

numeric vector | numeric scalar

This property is read-only.

Average prediction, averaged over the predictor data `X`, specified as a numeric vector or numeric scalar.

If `BlackboxModel` is a classification model object, then `Intercept` is a vector of the average classification scores for each class.

If `BlackboxModel` is a regression model object, then `Intercept` is a scalar of the average response.

If `BlackboxModel` is a function handle, then `Intercept` is a scalar of the average function evaluation.

For a query point, the sum of the Shapley values for all features corresponds to the total deviation of the prediction from the average (`Intercept`).
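This additivity relationship can be checked numerically. The sketch below does so for a regression tree and a single query point; the `carsmall` data and the variable names are assumptions, and the two quantities should agree only up to numerical or approximation error, depending on the algorithm and on how many observations are sampled.

```matlab
% Assumed setup: a regression tree trained on carsmall sample data.
% carsmall has 100 observations, so the default sample covers all of X.
load carsmall
data = rmmissing([Weight Horsepower Displacement MPG]);
X = data(:,1:3);
y = data(:,4);
mdl = fitrtree(X,y);

explainer = shapley(mdl,X,QueryPoints=X(1,:));

% Sum of the Shapley values over all predictors (second table variable).
totalContribution = sum(explainer.Shapley{:,2});

% Deviation of the prediction for the query point from the average prediction.
deviation = explainer.BlackboxFitted - explainer.Intercept;

[totalContribution deviation]
```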

`MeanAbsoluteShapley`

— Mean absolute Shapley values

table | `[]`

*Since R2024a*

This property is read-only.

Mean absolute Shapley values, specified as a table. The mean is taken over all query points (`QueryPoints`).

For regression, the table has two columns. The first column contains the predictor variable names, and the second column contains the mean absolute Shapley values of the predictors.

For classification, the table has two or more columns, depending on the number of classes in `BlackboxModel`. The first column contains the predictor variable names, and the rest of the columns contain the mean absolute Shapley values of the predictors for each class.

The `MeanAbsoluteShapley` property is empty if you do not specify query points.
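As a rough illustration of what this property holds, the sketch below recomputes the mean absolute values from the `Shapley` property for a regression model. It assumes that, for multiple query points, the numeric variable of the `Shapley` table has one row per predictor and one column per query point; the `carsmall` data and the variable names are also assumptions.

```matlab
% Assumed setup: a regression tree trained on carsmall sample data.
load carsmall
data = rmmissing([Weight Horsepower Displacement MPG]);
X = data(:,1:3);
y = data(:,4);
mdl = fitrtree(X,y);

% Ten query points.
explainer = shapley(mdl,X,QueryPoints=X(1:10,:));

% Assumed layout: rows are predictors, columns are query points.
shapleyMatrix = explainer.Shapley{:,2};

% Average the absolute values across query points for each predictor.
manualMean = mean(abs(shapleyMatrix),2);

% Compare with the stored property.
[manualMean explainer.MeanAbsoluteShapley{:,2}]
```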

`Method`

— Shapley value computation algorithm

`"interventional-linear"` | `"interventional-tree"` | `"interventional-kernel"` | `"interventional-mix"` | `"conditional-kernel"`

This property is read-only.

Shapley value computation algorithm, specified as `"interventional-linear"`, `"interventional-tree"`, `"interventional-kernel"`, `"interventional-mix"`, or `"conditional-kernel"`.

`"interventional-linear"`: `shapley` uses the Linear SHAP algorithm [1] with an interventional value function. That is, `shapley` computes interventional Shapley values using the estimated coefficients for linear models.

`"interventional-tree"`: `shapley` uses the Tree SHAP algorithm [2] with an interventional value function.

`"interventional-kernel"`: `shapley` uses the Kernel SHAP algorithm [1] with an interventional value function.

`"interventional-mix"`: `shapley` might not use the same Shapley value computation algorithm for all query points. That is, `shapley` might use the Tree SHAP algorithm with an interventional value function to compute Shapley values for some query points, and use the Kernel SHAP algorithm with an interventional value function to compute Shapley values for other query points. *(since R2024a)*

For an example that shows how to find the method information for specific query points, see Find Method Used for Individual Shapley Value Computations.

`"conditional-kernel"`: `shapley` uses the extension to the Kernel SHAP algorithm [3] with a conditional value function.

The `Method` argument of `shapley` or the `Method` argument of `fit` sets this property.

For more information, see Algorithms.

`NumSubsets`

— Number of predictor subsets

positive integer

This property is read-only.

Number of predictor subsets to use for Shapley value computations, specified as a positive integer.

The `MaxNumSubsets` argument of `shapley` or the `MaxNumSubsets` argument of `fit` sets this property.

For details on how `shapley` chooses the subsets to use, see Computational Cost.

`QueryPoints`

— Query points

numeric matrix | table | `[]`

This property is read-only.

Query points at which `shapley` explains predictions using the Shapley values (`Shapley`), specified as a numeric matrix or a table.

The `QueryPoints=queryPoints` name-value argument of `shapley` or the `queryPoints` argument of `fit` sets this property.

`SampledObservationIndices`

— Indices of observations sampled from predictor data

numeric vector

*Since R2024b*

This property is read-only.

Indices of the observations sampled from the predictor data `X`, specified as a numeric vector. To see the sampled observations, use `explainer.X(explainer.SampledObservationIndices,:)`.

The `NumObservationsToSample` name-value argument of `shapley` sets this property.

`Shapley`

— Shapley values for query points

table | `[]`

This property is read-only.

Shapley values for the query points (`QueryPoints`), specified as a table.

For regression, the table has two columns. The first column contains the predictor variable names, and the second column contains the Shapley values of the predictors.

For classification, the table has two or more columns, depending on the number of classes in `BlackboxModel`. The first column contains the predictor variable names, and the rest of the columns contain the Shapley values of the predictors for each class.

The `Shapley` property is empty if you do not specify query points.

For an example that shows how to find Shapley values for one query point after fitting multiple query points, see Investigate One Query Point After Fitting Multiple Query Points.

*Before R2024b: The `Shapley` property is named `ShapleyValues`.*

`X`

— Predictor data

numeric matrix | table

This property is read-only.

Predictor data, specified as a numeric matrix or table.

Each row of `X` corresponds to one observation, and each column corresponds to one variable.

If an observation contains `NaN`s for continuous predictors and `Method` is `"conditional-kernel"`, then `shapley` does not use the observation for the Shapley value computation. Similarly, if an observation contains missing predictor values or categories not seen during training, and `BlackboxModel` is a regression model of type GPR, kernel, linear, neural network, or SVM, then `shapley` omits the observation from the Shapley value computation. Otherwise, `shapley` handles missing values in `X` in the same way as `BlackboxModel` (that is, the `predict` object function of `BlackboxModel` or the function handle specified by `BlackboxModel`).

`shapley` stores all observations, including the rows with missing values, in this property.

## Object Functions

Function | Description |
---|---|
`fit` | Compute Shapley values for query points |
`plot` | Plot Shapley values using bar graphs |
`plotDependence` | Plot dependence of Shapley values on predictor values |
`boxchart` | Visualize Shapley values using box charts (box plots) |
`swarmchart` | Visualize Shapley values using swarm scatter charts |

## Examples

### Compute Shapley Values When Creating `shapley` Object

Train a classification model and create a `shapley` object. When you create a `shapley` object, specify a query point so that the software computes the Shapley values for the query point. Then create a bar graph of the Shapley values by using the object function `plot`.

Load the `CreditRating_Historical` data set. The data set contains customer IDs and their financial ratios, industry labels, and credit ratings.

`tbl = readtable("CreditRating_Historical.dat");`

Display the first three rows of the table.

head(tbl,3)

ID WC_TA RE_TA EBIT_TA MVE_BVTD S_TA Industry Rating
_____ _____ _____ _______ ________ _____ ________ ______
62394 0.013 0.104 0.036 0.447 0.142 3 {'BB'}
48608 0.232 0.335 0.062 1.969 0.281 8 {'A' }
42444 0.311 0.367 0.074 1.935 0.366 1 {'A' }

Train a blackbox model of credit ratings by using the `fitcecoc` function. Use the variables from the second through seventh columns in `tbl` as the predictor variables. A recommended practice is to specify the class names to set the order of the classes.

```
blackbox = fitcecoc(tbl,"Rating", ...
    PredictorNames=tbl.Properties.VariableNames(2:7), ...
    CategoricalPredictors="Industry", ...
    ClassNames={'AAA','AA','A','BBB','BB','B','CCC'});
```

Create a `shapley` object that explains the prediction for the last observation. Specify a query point so that the software computes Shapley values and stores them in the `Shapley` property.

queryPoint = tbl(end,:)

`queryPoint=`*1×8 table*
ID WC_TA RE_TA EBIT_TA MVE_BVTD S_TA Industry Rating
_____ _____ _____ _______ ________ ____ ________ ______
73104 0.239 0.463 0.065 2.924 0.34 2 {'AA'}

explainer = shapley(blackbox,QueryPoints=queryPoint)

explainer =
shapley explainer with the following local Shapley values:

Predictor AAA AA A BBB BB B CCC
__________ _________ __________ __________ __________ ___________ __________ __________
"WC_TA" 0.054853 0.022849 0.0082629 3.418e-07 -0.031172 -0.045745 -0.044031
"RE_TA" 0.17254 0.093639 0.048798 -0.015662 -0.097291 -0.22498 -0.31434
"EBIT_TA" 0.0012558 0.0005285 0.00038919 5.0004e-05 -0.00076196 -0.0014544 -0.0012907
"MVE_BVTD" 1.3942 1.3051 0.53214 -0.27713 -0.88189 -1.1197 -0.87933
"S_TA" -0.012379 -0.0080417 0.00013755 -0.0020191 -0.00019923 0.0018047 -0.0026414
"Industry" -0.1102 -0.057898 -0.0019888 0.08099 0.097352 0.11483 0.16764

Properties, Methods

By default, `shapley` subsamples 100 observations from the data in `blackbox.X` to compute the Shapley values. For faster computation, use a smaller sample of the training set or specify `UseParallel` as `true`.

For a classification model, `shapley` computes Shapley values using the predicted class score for each class. Display the values in the `Shapley` property.

explainer.Shapley

`ans=`*6×8 table*
Predictor AAA AA A BBB BB B CCC
__________ _________ __________ __________ __________ ___________ __________ __________
"WC_TA" 0.054853 0.022849 0.0082629 3.418e-07 -0.031172 -0.045745 -0.044031
"RE_TA" 0.17254 0.093639 0.048798 -0.015662 -0.097291 -0.22498 -0.31434
"EBIT_TA" 0.0012558 0.0005285 0.00038919 5.0004e-05 -0.00076196 -0.0014544 -0.0012907
"MVE_BVTD" 1.3942 1.3051 0.53214 -0.27713 -0.88189 -1.1197 -0.87933
"S_TA" -0.012379 -0.0080417 0.00013755 -0.0020191 -0.00019923 0.0018047 -0.0026414
"Industry" -0.1102 -0.057898 -0.0019888 0.08099 0.097352 0.11483 0.16764

The `Shapley` property contains the Shapley values of all features for each class.

Plot the Shapley values for the predicted class by using the `plot` function.

plot(explainer)

The horizontal bar graph shows the Shapley values for all variables, sorted by their absolute values. Each Shapley value explains the deviation of the score for the query point from the average score of the predicted class, due to the corresponding variable.

### Create `shapley` Object and Compute Shapley Values Using `fit`

Train a regression model and create a `shapley` object. When you create a `shapley` object, if you do not specify query points, then the software does not compute Shapley values. Use the object function `fit` to compute the Shapley values for a specified query point. Then create a bar graph of the Shapley values by using the object function `plot`.

Load the `carbig` data set, which contains measurements of cars made in the 1970s and early 1980s.

`load carbig`

Create a table containing the predictor variables `Acceleration`, `Cylinders`, and so on, as well as the response variable `MPG`.

```
tbl = table(Acceleration,Cylinders,Displacement, ...
Horsepower,Model_Year,Weight,MPG);
```

Removing missing values in a training set can help reduce memory consumption and speed up training for the `fitrkernel` function. Remove missing values in `tbl`.

tbl = rmmissing(tbl);

Train a blackbox model of `MPG` by using the `fitrkernel` function. Specify the `Cylinders` and `Model_Year` variables as categorical predictors. Standardize the remaining predictors.

```
rng("default") % For reproducibility
mdl = fitrkernel(tbl,"MPG",CategoricalPredictors=[2 5], ...
    Standardize=true);
```

Create a `shapley` object. Specify the data set `tbl`, because `mdl` does not contain training data.

explainer = shapley(mdl,tbl)

explainer =
BlackboxModel: [1×1 RegressionKernel]
QueryPoints: []
BlackboxFitted: []
Shapley: []
X: [392×7 table]
CategoricalPredictors: [2 5]
Method: "interventional-kernel"
Intercept: 23.2474
NumSubsets: 64

`explainer` stores the training data `tbl` in the `X` property. By default, `shapley` subsamples 100 observations from the data in `X` and stores their indices in the `SampledObservationIndices` property.

Compute the Shapley values of all predictor variables for the first observation in `tbl`. The `fit` object function uses the sampled observations rather than all of `X` to compute the Shapley values.

queryPoint = tbl(1,:)

`queryPoint=`*1×7 table*
Acceleration Cylinders Displacement Horsepower Model_Year Weight MPG
____________ _________ ____________ __________ __________ ______ ___
12 8 307 130 70 3504 18

explainer = fit(explainer,queryPoint);

For a regression model, `fit` computes Shapley values using the predicted response, and stores them in the `Shapley` property of the `shapley` object. Display the values in the `Shapley` property.

explainer.Shapley

`ans=`*6×2 table*
Predictor Value
______________ ________
"Acceleration" -0.33821
"Cylinders" -0.97631
"Displacement" -1.1425
"Horsepower" -0.62927
"Model_Year" -0.17268
"Weight" -0.87595

Plot the Shapley values for the query point by using the `plot` function.

plot(explainer)

The horizontal bar graph shows the Shapley values for all variables, sorted by their absolute values. Each Shapley value explains the deviation of the prediction for the query point from the average, due to the corresponding variable.

### Investigate One Query Point After Fitting Multiple Query Points

Train a classification model and create a `shapley` object. Visualize the Shapley values for multiple query points by using the `swarmchart` object function. Find the Shapley values for any query points of interest.

Load the `fisheriris` data set, which contains measurements for 150 irises, and create a table. `SepalLength`, `SepalWidth`, `PetalLength`, and `PetalWidth` are the predictor variables, and `Species` is the response variable.

`fisheriris = readtable("fisheriris.csv");`

Partition the data into training and test sets. Use 75% of the observations to create the training set and 25% of the observations to create the test set.

```
rng("default")
c = cvpartition(fisheriris.Species,"Holdout",0.25);
trainTbl = fisheriris(training(c),:);
testTbl = fisheriris(test(c),:);
```

Train a classification model by using the `fitcnet` function. Standardize the predictor variables, and specify the order of the classes.

```
mdl = fitcnet(trainTbl,"Species",Standardize=true, ...
    ClassNames={'setosa','versicolor','virginica'});
```

Create a `shapley` object that explains the predictions for multiple query points. Use the test set data to compute the Shapley values, and specify the observations in the test set as the query points.

explainer = shapley(mdl,testTbl,QueryPoints=testTbl)

explainer =
shapley explainer with the following mean absolute Shapley values:

Predictor setosa versicolor virginica
_____________ ________ __________ _________
"SepalLength" 0.12466 0.12539 0.066055
"SepalWidth" 0.027488 0.03004 0.016665
"PetalLength" 0.17226 0.14164 0.18777
"PetalWidth" 0.11795 0.17135 0.23687

Properties, Methods

For a classification model, `shapley` computes the Shapley values using the predicted class scores, and stores them in the `Shapley` property. Because `explainer` contains Shapley values for multiple query points, the object display shows the mean absolute Shapley values by default.

For each predictor and class, the mean absolute Shapley value is the absolute value of the Shapley values, averaged across all query points.
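As a minimal sketch of this definition, the following code averages absolute values across query points for a small, made-up matrix. The matrix `S` and its layout (one row per predictor, one column per query point, for a single class) are assumptions chosen for illustration, not output from the model above.

```matlab
% Hypothetical Shapley values for one class:
% rows are predictors, columns are query points.
S = [ 0.10 -0.30  0.20
     -0.05  0.05 -0.02];

% Take absolute values first, then average across query points (columns).
meanAbs = mean(abs(S),2);
disp(meanAbs)
```

Taking the absolute value before averaging keeps positive and negative contributions from canceling, so a predictor that pushes scores strongly in both directions still ranks as important.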

Visualize the distribution of the Shapley values for the default class (`setosa`) by using the `swarmchart` object function.

`swarmchart(explainer)`

For each predictor, the function displays the Shapley values for the query points. The corresponding swarm chart shows the distribution of the Shapley values. The function determines the order of the predictors by using the mean absolute Shapley values.

Find the observation with the lowest `SepalWidth` Shapley value for class `setosa`. Use data tips to find the index of the observation in the set of query points.

The query point is the 17th observation in the set of query points.

Find the observation's Shapley values in the `Shapley` property of `explainer`.

First, define a custom function named `localShapley` that returns a table of Shapley values for the observation with the specified query point index (`queryPointIndex`) in the specified `shapley` object (`explainer`).

```
function queryPointTbl = localShapley(explainer,queryPointIndex)
    % Select the class columns; each holds one column per query point
    tbl = explainer.Shapley(:,2:end);
    % Keep only the column for the requested query point
    queryPointTbl = varfun(@(x)x(:,queryPointIndex),tbl);
    queryPointTbl.Properties.VariableNames = tbl.Properties.VariableNames;
    % Prepend the predictor names
    queryPointTbl = [explainer.Shapley(:,1) queryPointTbl];
end
```

Return the Shapley values for the query point with index `17`.

`results = localShapley(explainer,17)`

```
results=4×4 table
      Predictor       setosa     versicolor    virginica
    _____________    ________    __________    _________

    "SepalLength"     0.06193    -0.028438     -0.033492
    "SepalWidth"      -0.1135     0.088441       0.02506
    "PetalLength"     -0.1543      0.31506      -0.16076
    "PetalWidth"     -0.11846      0.35579      -0.23734
```

Plot the query point Shapley values using the `plot` object function.

`plot(explainer,QueryPointIndices=17)`

By default, the function plots the Shapley values for the `versicolor` class because it is the predicted class for the query point.

### Specify Blackbox Model Using Function Handle

Train a regression model and create a `shapley` object using a function handle to the `predict` function of the model. Use the object function `fit` to compute the Shapley values for the specified query point. Then plot the Shapley values by using the object function `plot`.

Load the `carbig` data set, which contains measurements of cars made in the 1970s and early 1980s.

`load carbig`

Create a table containing the predictor variables `Acceleration`, `Cylinders`, and so on.

```
tbl = table(Acceleration,Cylinders,Displacement, ...
Horsepower,Model_Year,Weight);
```

Train a blackbox model of `MPG` by using the `TreeBagger` function.

```
rng("default") % For reproducibility
Mdl = TreeBagger(100,tbl,MPG,Method="regression", ...
    CategoricalPredictors=[2 5]);
```

`shapley` does not support a `TreeBagger` object directly, so you cannot specify the first input argument (blackbox model) of `shapley` as a `TreeBagger` object. Instead, you can use a function handle to the `predict` function. You can also specify options of the `predict` function using name-value arguments of the function.

Create the function handle to the `predict` function of the `TreeBagger` object `Mdl`. Specify the array of tree indices to use as `1:50`.

`f = @(tbl) predict(Mdl,tbl,Trees=1:50);`

Create a `shapley` object using the function handle `f`. When you specify a blackbox model as a function handle, you must provide the predictor data. `tbl` includes categorical predictors (`Cylinders` and `Model_Year`) with the `double` data type. By default, `shapley` does not treat variables with the `double` data type as categorical predictors. Specify the second (`Cylinders`) and fifth (`Model_Year`) variables as categorical predictors. Then compute the Shapley values for the first observation in `tbl` by using `fit`.

```
explainer = shapley(f,tbl,CategoricalPredictors=[2 5]);
explainer = fit(explainer,tbl(1,:));
```

Plot the Shapley values.

`plot(explainer)`

## More About

### Shapley Values

In game theory, the Shapley value of a player is the average marginal contribution of the player in a cooperative game. In the context of machine learning prediction, the Shapley value of a feature for a query point explains the contribution of the feature to a prediction (response for regression or score of each class for classification) at the specified query point.

The Shapley value of a feature for a query point is the contribution of the feature to the deviation from the average prediction. For a query point, the sum of the Shapley values for all features corresponds to the total deviation of the prediction from the average. That is, the sum of the average prediction and the Shapley values for all features corresponds to the prediction for the query point.
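The additivity property described above can be checked on a toy problem. The following is a minimal sketch that computes exact Shapley values by enumerating every feature subset, assuming an interventional value function (features in the subset are fixed at the query point values, and the rest keep their background values). The model `f`, the data `X`, and the query point `x` are made up for illustration. This brute-force enumeration is not necessarily how `shapley` computes its values, and it scales exponentially with the number of features.

```matlab
% Toy setup: every value and name here is made up for illustration.
f = @(Z) 2*Z(:,1) + Z(:,2).*Z(:,3);  % hypothetical blackbox model
X = [1 2 3; 0 1 4; 2 0 1; 3 3 2];    % background predictor data (4 observations)
x = [1 1 2];                         % query point
M = numel(x);                        % number of features

% Interventional value function: fix the features flagged in mask at the
% query point values, keep the background values for the rest, and
% average the predictions.
v = @(mask) mean(f(X.*(~mask) + x.*mask));

phi = zeros(1,M);
for i = 1:M
    for s = 0:2^M-1
        inS = logical(bitget(s,1:M));  % subset encoded by the bits of s
        if inS(i)
            continue                   % use only subsets that exclude feature i
        end
        k = nnz(inS);
        % Weight of a subset of size k in the Shapley average
        w = factorial(k)*factorial(M-k-1)/factorial(M);
        withI = inS;
        withI(i) = true;
        % Weighted marginal contribution of feature i to this subset
        phi(i) = phi(i) + w*(v(withI) - v(inS));
    end
end

avgPrediction = mean(f(X));
deviation = f(x) - avgPrediction;
disp(phi)
disp([sum(phi) deviation])
```

For these toy values, the Shapley values are approximately -1, -1.25, and -0.75. Their sum, -3, matches the deviation of the query point prediction (4) from the average prediction (7).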

For more details, see Shapley Values for Machine Learning Model.

## References

[1] Lundberg, Scott M., and S. Lee. "A Unified Approach to Interpreting Model Predictions." *Advances in Neural Information Processing Systems* 30 (2017): 4765–4774.

[2] Lundberg, Scott M., G. Erion, H. Chen, et al. "From Local Explanations to Global Understanding with Explainable AI for Trees." *Nature Machine Intelligence* 2 (January 2020): 56–67.

[3] Aas, Kjersti, Martin Jullum, and Anders Løland. "Explaining Individual Predictions When Features Are Dependent: More Accurate Approximations to Shapley Values." *Artificial Intelligence* 298 (September 2021).

## Extended Capabilities

### Automatic Parallel Support

Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the `UseParallel` name-value argument to `true` in the call to this function.

For more general information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).

## Version History

**Introduced in R2021a**

### R2024b: Speed up Shapley value computations

To speed up Shapley value computations, the `shapley` function now uses a default maximum of 100 observations from the predictor data set `X` to compute Shapley values. You can set the number of predictor data observations to use by specifying the `NumObservationsToSample` name-value argument. You can find the indices of the sampled observations in the `SampledObservationIndices` property of the `shapley` object.

In previous releases, `shapley` used the entire predictor data set to compute Shapley values. To replicate this behavior, set the `NumObservationsToSample` value to `"all"`.
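The following sketch contrasts the default sampling with the pre-R2024b behavior. It assumes R2024b or later with Statistics and Machine Learning Toolbox. The synthetic data and the variable names (`explainerDefault`, `explainerAll`, `idx`) are made up for illustration.

```matlab
% Synthetic data, made up for illustration.
rng("default")
X = rand(500,4);
Y = X(:,1) + 2*X(:,2) + 0.1*randn(500,1);
mdl = fitrtree(X,Y);

% Default behavior (R2024b and later): at most 100 observations are sampled.
explainerDefault = shapley(mdl,QueryPoints=X(1:5,:));

% Use the entire predictor data set, as in releases before R2024b.
explainerAll = shapley(mdl,QueryPoints=X(1:5,:), ...
    NumObservationsToSample="all");

% Indices of the observations sampled in the default case.
idx = explainerDefault.SampledObservationIndices;
```

Using all observations can give more stable Shapley values at the cost of longer computation, so comparing the two objects on a few query points is one way to judge whether the default sample is adequate for your data.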

You can also run Shapley value computations in parallel when using the `OutputFcn` name-value argument by setting the `UseParallel` value to `true`. To parallelize Shapley value computations, you must have Parallel Computing Toolbox.

### R2024b: Shapley values property is now named `Shapley`

The `Shapley` property of the `shapley` object contains the Shapley values for the query points. In previous releases, the `Shapley` property was named `ShapleyValues`.

When you use a regression model (`blackbox`) to compute Shapley values, the `Shapley` property contains a table whose second column is `Value`. In previous releases, the column name was `ShapleyValue`.

### R2024a: Compute Shapley values for multiple query points

You can now compute Shapley values for multiple query points by using the `QueryPoints=queryPoints` name-value argument. Before R2024a, you could specify only one query point using `QueryPoint=queryPoint`, where `queryPoint` is a row vector of numeric values or a single-row table.

The `shapley` object contains a new property `MeanAbsoluteShapley`, which contains the absolute Shapley values, averaged across all query points. Additionally, the `Method` property can now have the value `"interventional-mix"`. This value indicates that the software might not use the same Shapley value computation algorithm for all query points.

When computing Shapley values for multiple query points, you can use an output function to perform various tasks, such as stopping Shapley value computations, creating variables, or plotting results. To do so, use the `OutputFcn` name-value argument.

### R2023b: Interventional Tree SHAP algorithm supports data with missing predictor values

When observations in the input predictor data (`X` or `blackbox.X`) or values in the query point (`queryPoint`) contain missing values and the `Method` value is `"interventional"`, the `shapley` function can use the Tree SHAP algorithm for tree models and ensemble models of tree learners. In previous releases, under these conditions, the `shapley` function always used the Kernel SHAP algorithm for tree-based models. For more information, including cases where the software still uses Kernel SHAP instead of Tree SHAP for tree-based models, see Interventional Algorithms.

### R2023b: Improved performance of Tree SHAP algorithm for tree-based models

The `shapley` function shows improved performance when computing Shapley values for tree models and ensemble models of tree learners by using the Tree SHAP algorithm with an interventional value function (see `Method`). The performance increase is sensitive to the values of the `shapley` input arguments (such as the predictor data, machine learning model, and query point). For example, the Shapley value computation in this code is about 36x faster than in the previous release:

```
function timingTest
    % Generate data set
    rng("default")
    numObservations = 1e5;
    numPredictors = 10;
    X = rand(numObservations,numPredictors);
    Y = rand(numObservations,1);

    % Train model
    mdl = fitrensemble(X,Y,Learners="tree");

    % Compute Shapley value
    tic
    shapley(mdl,"QueryPoint",X(50,:),Method="interventional");
    toc
end
```

The approximate execution times are:

- **R2023b:** 3 s
- **R2023a:** 107 s

The code was timed on a Windows® 10, Intel® Xeon® CPU E5-1650 v4 @ 3.60GHz test system by calling the function `timingTest`.

### R2023a: `shapley` supports the Linear SHAP and Tree SHAP algorithms

`shapley` supports the Linear SHAP [1] algorithm for linear models and the Tree SHAP [2] algorithm for tree models and ensemble models of tree learners.

If you specify the `Method` name-value argument as `'interventional'` (default), `shapley` selects an algorithm based on the machine learning model type of `blackbox`. The `Method` property stores the name of the selected algorithm.

### R2023a: Values of the `Method` name-value argument have changed

The supported values of the `Method` name-value argument have changed from `'interventional-kernel'` and `'conditional-kernel'` to `'interventional'` and `'conditional'`, respectively.
