Interpretability

What Is Interpretability?

How it works, why it matters, and how to apply it

Interpretability is the degree to which machine learning algorithms can be understood by humans. More specifically, interpretability describes the ability to understand the reasoning behind predictions and decisions made by a machine learning model.

How Interpretability Works

Interpretability techniques help reveal how machine learning models make predictions. By uncovering how various features contribute (or do not contribute) to predictions, interpretability techniques can help you validate that a machine learning model is using appropriate evidence for its predictions and find biases in your model that were not apparent during training. Some machine learning models, such as linear regression, decision trees, and generalized additive models, are inherently interpretable. However, interpretability often comes at the expense of predictive power and accuracy, as illustrated in Figure 1.

Figure 1. Tradeoff between performance and interpretability for multiple popular machine learning algorithms (interpretability on the x-axis, predictive power on the y-axis).
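To make “inherently interpretable” concrete, the following minimal sketch fits an ordinary least-squares linear regression in plain Python (not MATLAB, and not any toolbox function) to a small hypothetical dataset. The feature names temperature and vibration, the data, and the helper functions solve and fit_linear are illustrative assumptions. The fitted coefficients are themselves the explanation: each one states how much the prediction changes for a one-unit change in that predictor, holding the other predictor fixed.

```python
"""Fit a linear regression by ordinary least squares and read its coefficients."""


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(rows, y):
    """Return [intercept, w1, ..., wp] minimizing squared error (normal equations)."""
    design = [[1.0] + list(r) for r in rows]  # leading column of ones for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical noise-free data generated from y = 1 + 2*temperature - 3*vibration.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (3, 2)]
y = [1 + 2 * t - 3 * v for t, v in rows]

intercept, w_temp, w_vib = fit_linear(rows, y)
print(f"intercept={intercept:.2f}, temperature={w_temp:.2f}, vibration={w_vib:.2f}")
```

Because the hypothetical data contain no noise, the fit recovers an intercept of 1 and coefficients of 2 and -3 up to floating-point error; the negative vibration coefficient says directly that higher vibration lowers the prediction.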

Interpretability vs. Explainability

Explainable AI is an emerging field in which the closely related terms interpretability and explainability are often used interchangeably. However, the two terms differ. Interpretability refers to understanding a model’s inner mechanisms, whereas explainability refers to explaining the behavior of a machine learning model in human terms, without necessarily understanding those inner mechanisms. Explainability can also be viewed as model-agnostic interpretability.

For engineers, one way to explain the behavior of a system is to use first principles. A first-principles model has a clear, explainable physical meaning, and its behavior can be parameterized. This type of model is known as “white box.” The behavior of machine learning models is more “opaque.”

Machine learning models vary in complexity and in how intuitively they represent knowledge, and therefore in how difficult it is to fully understand how they work. Machine learning models can be “gray box,” in which case you can apply interpretability techniques to understand their inner mechanisms, or “black box,” in which case you can apply explainability (or model-agnostic interpretability) techniques to understand their behavior. Deep learning models are typically black box.

Figure 2. Interpretability techniques can be applied to “gray-box” models, whereas for “black-box” models, you typically use explainability techniques. (The figure shows a spectrum from black box, for explainable machine learning models, through gray box, for interpretable machine learning models, to white box, for first-principles models.)

Global and Local Interpretability Methods

Interpretability is typically applied at two levels:

Global methods: These interpretability methods provide an overview of the most influential variables in the model based on input data and predicted output.
Local methods: These interpretability methods provide an explanation of a single prediction result.

Figure 3 illustrates the difference between the local and global scope of interpretability. You can also apply interpretability to groups within your data and draw conclusions at the group level, such as why a group of manufactured products was classified as faulty.

Figure 3. Global vs. local interpretability: The two classes are represented by purple and orange dots.
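The three scopes can be illustrated with a linear model, where each feature’s local contribution to one prediction is its weight times the feature’s deviation from the dataset mean. The plain-Python sketch below uses hypothetical weights, data, and function names. It computes a local explanation for one row, a group-level explanation (the signed average over a subset, such as products classified as faulty), and a global ranking (the average magnitude over all rows). The linear decomposition is an assumption chosen for simplicity; for nonlinear models, a method such as Shapley values would take the place of local_contributions.

```python
"""Local, group, and global explanations of one linear model (illustrative sketch)."""

# Hypothetical fitted linear model: prediction = bias + sum(weights[f] * row[f]).
weights = {"temperature": 2.0, "vibration": -3.0}
bias = 1.0

# Hypothetical dataset; imagine the last two rows were classified as faulty.
data = [
    {"temperature": 0.0, "vibration": 0.0},
    {"temperature": 1.0, "vibration": 0.0},
    {"temperature": 2.0, "vibration": 1.0},
    {"temperature": 3.0, "vibration": 5.0},
]
means = {f: sum(row[f] for row in data) / len(data) for f in weights}


def predict(row):
    return bias + sum(weights[f] * row[f] for f in weights)


def local_contributions(row):
    """Local: how far each feature pushes this one prediction from the average prediction."""
    return {f: weights[f] * (row[f] - means[f]) for f in weights}


def mean_contributions(rows):
    """Group: signed average contribution, i.e. which way each feature pushes the subset."""
    return {f: sum(local_contributions(r)[f] for r in rows) / len(rows) for f in weights}


def global_importance(rows):
    """Global: average magnitude of each feature's contribution over the whole dataset."""
    return {f: sum(abs(local_contributions(r)[f]) for r in rows) / len(rows) for f in weights}


print("local :", local_contributions(data[3]))
print("group :", mean_contributions(data[2:]))
print("global:", global_importance(data))
```

For a linear model, the local contributions of a row add up exactly to that row’s prediction minus the average prediction, which is what makes this decomposition a complete local explanation.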

Get Started with Interpretability Examples in MATLAB

Popular techniques for local interpretability include local interpretable model-agnostic explanations (LIME) and Shapley values. For global interpretability, many users start with feature ranking (or importance) and partial dependence plots. You can apply these techniques using MATLAB®, as shown in these examples:

Interpret Generalized Additive Model Using Partial Dependence

Understand Network Predictions Using LIME

Interpret Machine Learning Model Using Shapley Values

Why Interpretability Matters

Engineers and scientists seek model interpretability for three main reasons:

Debugging: Understanding where or why predictions go wrong. Running “what-if” scenarios can improve model robustness and help reduce bias.
Guidelines: Black-box or gray-box models may violate industry best practices.
Regulations: Some government regulations require interpretability for sensitive applications, such as finance, public health, and transportation.

Model interpretability addresses these concerns and increases trust in models in situations where explanations for predictions are important (for example, when comparing results between competing models) or necessary (for example, when regulations require interpretability).

Applications Where Interpretability Matters

Interpretability tools help you understand why a machine learning model makes the predictions that it does. Interpretability is likely to become increasingly relevant as regulatory and professional bodies continue to work toward a framework for certifying AI for sensitive applications, such as:

Automated Driving Systems

Medical Devices

Computational Finance

Interpretability with MATLAB

Using MATLAB with Statistics and Machine Learning Toolbox™ enables you to apply techniques to interpret machine learning models. Some popular interpretability techniques are described below.

Local Interpretable Model-Agnostic Explanations (LIME): Use LIME to approximate a complex model in the neighborhood of the prediction of interest with a simple interpretable model, such as a linear model or decision tree. You can then use the simpler model as a surrogate to explain how the original (complex) model works. Figure 4 illustrates the three main steps of applying LIME.

Figure 4. The three main steps of applying LIME: start from the complex model and the point of interest, approximate it with a simple model, and explain the drivers within the area of interest. By fitting a LIME object in MATLAB, you can obtain LIME explanations via a simple interpretable model.
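The LIME steps can be sketched in plain Python. This is a minimal, hypothetical illustration of the LIME idea, not the algorithm used by the MATLAB lime function: the black-box model, the Gaussian perturbation scale, the exponential proximity kernel, and the weighted linear surrogate are all assumptions chosen for clarity.

```python
"""Minimal LIME-style local surrogate (illustrative sketch, not MATLAB's lime)."""
import math
import random


def black_box(x1, x2):
    # Hypothetical complex model: nonlinear in x1, linear in x2.
    return x1 ** 2 + 3.0 * x2


def weighted_linear_fit(samples, targets, weights):
    """Weighted least squares for y ~ b0 + b1*d1 + b2*d2 via the 3x3 normal equations."""
    p = 3
    design = [[1.0, d1, d2] for d1, d2 in samples]
    a = [[sum(w * r[i] * r[j] for w, r in zip(weights, design)) for j in range(p)]
         for i in range(p)]
    b = [sum(w * r[i] * t for w, r, t in zip(weights, design, targets)) for i in range(p)]
    # Gaussian elimination; no pivoting needed because a is symmetric positive definite.
    for col in range(p):
        for row in range(col + 1, p):
            f = a[row][col] / a[col][col]
            for c in range(col, p):
                a[row][c] -= f * a[col][c]
            b[row] -= f * b[col]
    coef = [0.0] * p
    for r in range(p - 1, -1, -1):
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, p))) / a[r][r]
    return coef


def lime_explain(f, point, n_samples=500, scale=0.1, kernel_width=0.2, seed=0):
    rng = random.Random(seed)
    # 1. Perturb the point of interest with small Gaussian offsets.
    offsets = [(rng.gauss(0, scale), rng.gauss(0, scale)) for _ in range(n_samples)]
    # 2. Query the complex model at the perturbed points.
    targets = [f(point[0] + d1, point[1] + d2) for d1, d2 in offsets]
    # 3. Weight each sample by its proximity to the point of interest.
    weights = [math.exp(-(d1 * d1 + d2 * d2) / kernel_width ** 2) for d1, d2 in offsets]
    # 4. Fit a weighted linear surrogate; its slopes are the local explanation.
    _, slope1, slope2 = weighted_linear_fit(offsets, targets, weights)
    return slope1, slope2


print(lime_explain(black_box, (1.0, 2.0)))
```

Near the point (1, 2), the surrogate’s slopes should come out close to 2 and 3, the local partial derivatives of the hypothetical black box, even though the model is nonlinear in x1. Shrinking scale and kernel_width makes the explanation more local but noisier.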

Partial Dependence (PDP) and Individual Conditional Expectation (ICE) Plots: With these methods, you examine the effect of one or two predictors on the prediction. A partial dependence plot averages the model output over the observed values of the remaining predictors, while an ICE plot shows a separate curve for each individual observation instead of the average. Figure 5 shows a partial dependence plot that was generated with the plotPartialDependence function.
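The difference between the two plot types is easiest to see in code. The plain-Python sketch below uses a hypothetical model and dataset (it is not the plotPartialDependence function) and evaluates the model on a grid of x1 values: each ICE curve keeps one observation’s value of x2, and the partial dependence curve is the average of the ICE curves.

```python
"""Partial dependence (PD) and ICE curves computed by hand (illustrative sketch)."""


def model(x1, x2):
    # Hypothetical fitted model with an interaction between x1 and x2.
    return 2.0 * x1 + x1 * x2


# Hypothetical observed data: (x1, x2) rows.
data = [(0.5, 0.0), (1.5, 1.0), (2.5, 2.0)]
grid = [0.0, 1.0, 2.0]  # values of x1 at which to evaluate the curves


def ice_curves(model, data, grid):
    """One curve per observation: set x1 to each grid value, keep that row's x2."""
    return [[model(v, x2) for v in grid] for _, x2 in data]


def partial_dependence(model, data, grid):
    """Average of the ICE curves, i.e. x2 marginalized over its observed values."""
    curves = ice_curves(model, data, grid)
    return [sum(c[k] for c in curves) / len(curves) for k in range(len(grid))]


print("ICE:", ice_curves(model, data, grid))
print("PD: ", partial_dependence(model, data, grid))
```

The ICE curves have slopes 2, 3, and 4 because of the x1*x2 interaction, while the averaged partial dependence curve has a single slope of 3 and hides that variation. This is one reason the two plots are often viewed together.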

Strictly speaking, a partial dependence plot only shows that certain ranges of a predictor’s value are associated with particular predicted likelihoods; that is not sufficient to establish a causal relationship between predictor values and predictions. However, if a local interpretability method such as LIME indicates that the predictor significantly influenced the prediction in an area of interest, you can arrive at an explanation of why the model behaved a certain way in that local area.

Shapley Values: This technique explains how much each predictor contributes to a prediction by dividing the deviation of the prediction of interest from the average prediction among the predictors. This method is particularly popular in the finance industry because it has a theoretical underpinning in game theory and because it satisfies the regulatory requirement of providing complete explanations: the sum of the Shapley values for all features equals the total deviation of the prediction from the average. The shapley function computes Shapley values for a query point of interest.
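The completeness property can be checked directly with a brute-force implementation. The sketch below is plain Python with a hypothetical model and background dataset. It assumes one common choice of value function (features in a subset take the query point’s values, and the model is averaged over background rows for the rest) and applies the classic Shapley weighting over all subsets. It is not the MATLAB shapley function and makes no claim about that function’s algorithm.

```python
"""Exact Shapley values by enumerating every feature subset (illustrative sketch)."""
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical model with an interaction between features 0 and 1.
    return x[0] * x[1] + x[2]


def value(model, x, background, subset):
    """Average prediction when features in `subset` take the query point's values
    and the remaining features take each background row's values."""
    total = 0.0
    for z in background:
        mixed = [x[j] if j in subset else z[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    p = len(x)
    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Classic Shapley weight: |S|! * (p - |S| - 1)! / p!
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for combo in combinations(others, size):
                subset = set(combo)
                gain = value(model, x, background, subset | {j}) - value(model, x, background, subset)
                phi[j] += weight * gain
    return phi


x = (2.0, 3.0, 1.0)  # hypothetical query point
background = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]  # hypothetical reference data
phi = shapley_values(model, x, background)
average = sum(model(z) for z in background) / len(background)
print("Shapley values:", phi)
print("sum:", sum(phi), "vs. prediction - average:", model(x) - average)
```

For the query point (2, 3, 1), the values come out as 2.5, 3.0, and 0.5. They sum to 6.0, which equals the prediction 7.0 minus the average background prediction 1.0, illustrating the completeness property.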

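Computing exact Shapley values requires evaluating the model over all combinations (subsets) of features, and the number of subsets doubles with every additional feature, so exact computation generally takes a long time. In practice, Shapley values are therefore often approximated using Monte Carlo simulation.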
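One common Monte Carlo scheme samples random feature orderings instead of enumerating subsets. For each ordering, it starts from a randomly chosen background row, switches the features to the query point’s values one at a time in that order, and credits each feature with the change in prediction it causes. The plain-Python sketch below uses a hypothetical model and background dataset; it illustrates the idea and is not necessarily the approximation the MATLAB shapley function uses.

```python
"""Monte Carlo Shapley estimate by sampling feature orderings (illustrative sketch)."""
import random


def model(x):
    # Hypothetical model with an interaction between features 0 and 1.
    return x[0] * x[1] + x[2]


def shapley_monte_carlo(model, x, background, n_permutations=2000, seed=0):
    rng = random.Random(seed)
    p = len(x)
    phi = [0.0] * p
    for _ in range(n_permutations):
        order = list(range(p))
        rng.shuffle(order)                      # random feature ordering
        current = list(rng.choice(background))  # random background row as the start
        previous = model(current)
        for j in order:
            current[j] = x[j]                   # switch feature j to the query value
            new = model(current)
            phi[j] += new - previous            # credit j with the change it caused
            previous = new
    return [total / n_permutations for total in phi]


x = (2.0, 3.0, 1.0)  # hypothetical query point
background = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]  # hypothetical reference data
print(shapley_monte_carlo(model, x, background))
```

Each sampled ordering costs p model evaluations, so the total cost grows with n_permutations * p rather than with the 2^p subsets of exact enumeration, and more orderings reduce the Monte Carlo error. For this model and background, the exact values are 2.5, 3.0, and 0.5, and the estimates should land close to them.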

Figure 6 shows that, in the context of predicting heart arrhythmia near the sample of interest, MFCC4 had a strong positive impact on predicting “abnormal,” while MFCC11 and MFCC5 leaned against that prediction, that is, toward a “normal” heart.

Figure 5. Partial dependence plot showing that the probability of “standing” decreases sharply if the gyroscope indicates significant angular velocity.

Figure 6. Shapley values (x-axis, from -1 to 1.5) for each predictor (y-axis) at the point of interest. Each value indicates how much that predictor moves the prediction away from the average prediction, which is marked by the vertical line at zero.

Predictor Importance Estimations by Permutation: MATLAB also supports permuted predictor importance for random forests. This approach uses the effect that changing a predictor’s values has on the model’s prediction error as an indication of that predictor’s importance: the function shuffles the values of one predictor in the test or training data and observes the magnitude of the resulting change in error.
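The shuffling procedure can be sketched in a few lines of plain Python. The model, data, and function names below are hypothetical, and mean squared error is an assumed choice of error metric; the sketch illustrates the idea rather than reproducing the MATLAB implementation for random forests.

```python
"""Permutation predictor importance, computed by hand (illustrative sketch)."""
import random


def model(row):
    # Hypothetical fitted model that uses predictor 0 and ignores predictor 1.
    return 3.0 * row[0]


def mse(model, rows, y):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, y)) / len(rows)


def permutation_importance(model, rows, y, n_repeats=10, seed=0):
    """Mean increase in MSE when one predictor's column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, rows, y)
    importances = []
    for j in range(len(rows[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between predictor j and the response
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increase += mse(model, shuffled, y) - baseline
        importances.append(increase / n_repeats)
    return importances


data_rng = random.Random(42)
rows = [(data_rng.uniform(0, 1), data_rng.uniform(0, 1)) for _ in range(50)]
y = [3.0 * r[0] + data_rng.gauss(0, 0.1) for r in rows]  # hypothetical noisy response
print(permutation_importance(model, rows, y))
```

Because the hypothetical model ignores predictor 1, shuffling it leaves every prediction, and therefore the error, unchanged, so its importance is exactly zero; shuffling predictor 0 increases the error substantially. Averaging over several shuffles (n_repeats) reduces the randomness of the estimate.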

Choosing a Method for Interpretability

Different interpretability methods have their own strengths and limitations; take these into account when you seek interpretability in your application. Figure 7 provides an overview of interpretability methods and guidance on how to apply them. All the interpretability methods presented in Figure 7 are available in MATLAB.

Figure 7. How to select the appropriate interpretability method (decision paths for choosing among methods).

Interpretability FAQs

What is interpretability?

Interpretability is the degree to which machine learning algorithms can be understood by humans, specifically describing the ability to understand the reasoning behind predictions and decisions made by a machine learning model.

How does interpretability differ from explainability?

Interpretability refers to understanding a model's inner mechanisms, while explainability refers to explaining the behavior of a machine learning model in human terms without necessarily understanding those inner mechanisms. Explainability can also be viewed as model-agnostic interpretability.

Why does interpretability matter?

Interpretability matters for three main reasons: debugging to understand where or why predictions go wrong, following industry guidelines where black-box models may violate best practices, and meeting regulations that require interpretability for sensitive applications like finance, public health, and transportation.

What is the difference between global and local interpretability methods?

Global methods provide an overview of the most influential variables in the model based on input data and predicted output, while local methods provide an explanation of a single prediction result.

Which machine learning models are inherently interpretable?

Linear regression, decision trees, and generalized additive models are inherently interpretable machine learning models, though interpretability often comes at the expense of predictive power and accuracy.

How does LIME work?

LIME (Local Interpretable Model-Agnostic Explanations) approximates a complex model in the neighborhood of the prediction of interest with a simple interpretable model, such as a linear model or decision tree, which can then be used as a surrogate to explain how the original complex model works.

What are Shapley values?

Shapley values explain how much each predictor contributes to a prediction by calculating the deviation of a prediction of interest from the average. This method is particularly popular in finance because it provides complete explanations: the sum of the Shapley values for all features equals the total deviation of the prediction from the average.

Where is interpretability especially important?

Interpretability is particularly important in sensitive applications such as automated driving systems, medical devices, and computational finance, where regulatory and professional bodies are working toward frameworks for certifying AI.
