Decision Models for Fault Detection and Diagnosis

Condition monitoring includes discriminating between faulty and healthy states (fault detection) and, when a fault state is present, determining the source of the fault (fault diagnosis). To design an algorithm for condition monitoring, you use condition indicators extracted from system data to train a decision model. The trained model then analyzes indicators extracted from test data to determine the current system state. This step in the algorithm-design process therefore follows the step of identifying condition indicators.

(For information about using condition indicators for fault prediction, see Models for Predicting Remaining Useful Life.)

Some examples of decision models for condition monitoring include:

  • A threshold value or set of bounds on a condition-indicator value, which indicates a fault when the indicator crosses the threshold or falls outside the bounds

  • A probability distribution that describes the likelihood that any particular value of the condition indicator is indicative of any particular type of fault

  • A classifier that compares the current value of the condition indicator to values associated with fault states, and returns the likelihood that one or another fault state is present
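As a rough illustration of the first kind of decision model in the list above, the following Python sketch derives bounds from healthy data and flags values outside them. The function names, the sample values, and the choice of mean ± 3 standard deviations are assumptions made for this example, not a toolbox API.

```python
from statistics import mean, stdev

def fit_bounds(healthy_values, k=3.0):
    """Return (lower, upper) bounds as mean +/- k sample standard deviations."""
    m = mean(healthy_values)
    s = stdev(healthy_values)
    return m - k * s, m + k * s

def is_faulty(value, bounds):
    """Flag a fault when the indicator leaves the healthy band."""
    lower, upper = bounds
    return value < lower or value > upper

# Hypothetical condition-indicator values recorded during healthy operation.
healthy = [0.48, 0.52, 0.50, 0.47, 0.53, 0.49, 0.51]
bounds = fit_bounds(healthy)

# Hypothetical new measurements to check against the bounds.
flags = [is_faulty(v, bounds) for v in [0.50, 0.60, 0.40]]
print(flags)  # [False, True, True]
```

The multiplier `k` trades false alarms against missed detections, so it is typically tuned on validation data.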

In general, when you are testing different models for fault detection or diagnosis, you construct a table of values of one or more condition indicators. The condition indicators are features that you extract from data in an ensemble representing different healthy and faulty operating conditions. (See Condition Indicators for Monitoring, Fault Detection, and Prediction.) It is useful to partition your data into a subset that you use for training the decision model (the training data) and a disjoint subset that you use for validation (the validation data). Compared to training and validation with overlapping data sets, using completely separate training and validation data generally gives you a better sense of how the decision model will perform with new data.
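The partitioning described above might be sketched in Python as follows. The table layout, the 30% validation fraction, and the function name `split_ensemble` are hypothetical. The split here is a plain random one and does not balance the fault labels across the two subsets.

```python
import random

def split_ensemble(rows, validation_fraction=0.3, seed=0):
    """Shuffle rows and return disjoint (training, validation) lists."""
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = rows[:]          # copy so the caller's table is untouched
    rng.shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical table of (member id, indicator value, fault label).
table = [(i, 0.5 + 0.01 * i, "healthy" if i < 6 else "faulty") for i in range(10)]
train, validation = split_ensemble(table)
print(len(train), len(validation))  # 7 3
```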

When designing your algorithm, you might test different fault detection and diagnosis models using different condition indicators. Thus, this step in the design process is likely iterative with the step of extracting condition indicators, as you try different indicators, different combinations of indicators, and different decision models.

Statistics and Machine Learning Toolbox™ and other toolboxes include functionality that you can use to train decision models such as classifiers and regression models. Some common approaches are summarized here.

Feature Selection

Feature selection techniques help you reduce large data sets by eliminating features that are irrelevant to the analysis you are trying to perform. In the context of condition monitoring, irrelevant features are those that do not separate healthy from faulty operation or help distinguish between different fault states. In other words, feature selection means identifying those features that are suitable to serve as condition indicators because they change in a detectable, reliable way as system performance degrades. Some functions for feature selection include:

  • pca — Perform principal component analysis, which finds the linear combinations of independent data variables that account for the greatest variation in observed values. For instance, suppose that you have ten independent sensor signals for each member of your ensemble, from which you extract many features. In that case, principal component analysis can help you determine which features or combinations of features are most effective for separating the different healthy and faulty conditions represented in your ensemble. The example Wind Turbine High-Speed Bearing Prognosis uses this approach to feature selection.

  • sequentialfs — For a set of candidate features, identify the features that best distinguish between healthy and faulty conditions, by sequentially selecting features until there is no improvement in discrimination.

  • fscnca — Perform feature selection for classification using neighborhood component analysis. The example Using Simulink to Generate Fault Data uses this function to weight a list of extracted condition indicators according to their importance in distinguishing among fault conditions.

For more functions relating to feature selection, see Dimensionality Reduction and Feature Extraction.
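To make the idea of sequential selection concrete, here is a minimal Python sketch of greedy forward selection. It is not the sequentialfs implementation. The criterion (leave-one-out error of a nearest-centroid rule), the function names, and the data are all assumptions chosen to keep the example small.

```python
def loo_errors(X, y, cols):
    """Leave-one-out misclassification count of a nearest-centroid rule on columns `cols`."""
    errors = 0
    for i, row in enumerate(X):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(y)):
            members = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if not members:
                continue
            # Centroid of this class, computed without the held-out row.
            centroid = [sum(m[c] for m in members) / len(members) for c in cols]
            d = sum((row[c] - ck) ** 2 for c, ck in zip(cols, centroid))
            if d < best_dist:
                best_label, best_dist = label, d
        errors += best_label != y[i]
    return errors

def forward_select(X, y):
    """Add one feature at a time until the criterion stops improving."""
    selected, best = [], len(X) + 1
    remaining = list(range(len(X[0])))
    while remaining:
        score, col = min((loo_errors(X, y, selected + [c]), c) for c in remaining)
        if score >= best:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected

# Hypothetical features: only column 1 separates the two conditions.
X = [[0.9, 1.0, 5.1], [0.1, 1.1, 4.9], [0.5, 0.9, 5.0], [0.3, 1.2, 5.2],
     [0.8, 3.0, 5.0], [0.2, 3.1, 5.1], [0.6, 2.9, 4.8], [0.4, 3.2, 5.2]]
y = ["healthy"] * 4 + ["faulty"] * 4
print(forward_select(X, y))  # [1]
```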

Statistical Distribution Fitting

When you have a table of condition-indicator values and corresponding fault states, you can fit the values to a statistical distribution. Comparing validation or test data to the resulting distribution yields the likelihood that the validation or test data corresponds to one or another of the fault states. Some functions you can use for such fitting include:

For more information about statistical distributions, see Probability Distributions.
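The following Python sketch illustrates the distribution-fitting idea using only the standard library. It assumes that each state's indicator values are roughly normal and that all states are equally likely beforehand. The state names and sample values are hypothetical.

```python
from statistics import NormalDist

def fit_state_distributions(samples_by_state):
    """Fit one normal distribution per state from its indicator samples."""
    return {state: NormalDist.from_samples(values)
            for state, values in samples_by_state.items()}

def state_likelihoods(value, distributions):
    """Relative likelihood of each state for one value, assuming equal priors."""
    densities = {state: d.pdf(value) for state, d in distributions.items()}
    total = sum(densities.values())
    return {state: p / total for state, p in densities.items()}

# Hypothetical training values of a single condition indicator.
training = {
    "healthy": [1.0, 1.1, 0.9, 1.05, 0.95],
    "bearing_fault": [2.0, 2.2, 1.9, 2.1, 1.8],
}
distributions = fit_state_distributions(training)

result = state_likelihoods(1.9, distributions)
print(max(result, key=result.get))  # bearing_fault
```

If some fault states are known to be rarer than others, the equal-prior assumption can be replaced by weighting each density with an estimated prior probability.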

Machine Learning

There are several ways to apply machine-learning techniques to the problem of fault detection and diagnosis. Classification is a type of supervised machine learning in which an algorithm “learns” to classify new observations from examples of labeled data. In the context of fault detection and diagnosis, you can pass condition indicators derived from an ensemble and their corresponding fault labels to an algorithm-fitting function that trains the classifier.

For instance, suppose that you compute a table of condition-indicator values for each member in an ensemble of data that spans different healthy and faulty conditions. You can pass this data to a function that fits a classifier model. This training data trains the classifier model to take a set of condition-indicator values extracted from a new data set, and guess which healthy or faulty condition applies to the data. In practice, you use a portion of your ensemble for training, and reserve a disjoint portion of the ensemble for validating the trained classifier.
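The train-then-validate workflow described above can be sketched with a deliberately simple classifier. The nearest-centroid rule below stands in for the toolbox fitting functions and is not one of them. All names and data values are assumptions for illustration.

```python
from math import dist

def train_nearest_centroid(features, labels):
    """Learn one centroid per label from labeled condition-indicator vectors."""
    groups = {}
    for x, label in zip(features, labels):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: dist(x, model[label]))

def accuracy(model, features, labels):
    """Fraction of observations whose predicted label matches the true label."""
    hits = sum(predict(model, x) == y for x, y in zip(features, labels))
    return hits / len(labels)

# Hypothetical training portion of the ensemble (two indicators per member).
train_X = [[0.5, 1.0], [0.6, 1.1], [0.4, 0.9], [1.5, 2.0], [1.6, 2.2], [1.4, 1.9]]
train_y = ["healthy"] * 3 + ["faulty"] * 3
model = train_nearest_centroid(train_X, train_y)

# Disjoint validation portion, used only to score the trained model.
val_X = [[0.55, 1.05], [1.45, 2.1]]
val_y = ["healthy", "faulty"]
print(accuracy(model, val_X, val_y))  # 1.0
```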

Statistics and Machine Learning Toolbox includes many functions that you can use to train classifiers. These functions include:

  • fitcsvm — Train a binary classification model to distinguish between two states, such as the presence or absence of a fault condition. The example Using Simulink to Generate Fault Data uses this function to train a classifier with a table of feature-based condition indicators. The example Fault Diagnosis of Centrifugal Pumps Using Steady State Experiments also uses this function, with model-based condition indicators computed from statistical properties of the parameters obtained by fitting data to a static model.

  • fitcecoc — Train a classifier to distinguish among multiple states. This function reduces a multiclass classification problem to a set of binary classifiers. The example Multi-Class Fault Detection Using Simulated Data uses this function.

  • fitctree — Train a multiclass classification model by fitting a binary decision tree, in which each branch node splits the data in two.

  • fitclinear — Train a classifier using high-dimensional training data. This function can be useful when you have a large number of condition indicators that you are not able to reduce using functions such as fscnca.

Other machine-learning techniques include k-means clustering (kmeans), which partitions data into mutually exclusive clusters. In this technique, a new measurement is assigned to a cluster by minimizing the distance from the data point to the mean location of its assigned cluster. Tree bagging is another technique that aggregates an ensemble of decision trees for classification. The example Fault Diagnosis of Centrifugal Pumps Using Steady State Experiments uses a TreeBagger classifier.
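As a rough sketch of the clustering idea, the Python code below runs a basic k-means iteration (Lloyd's algorithm) and then assigns a new measurement to the nearest cluster mean. It is not the kmeans function. The starting centroids are supplied by hand, and the data are hypothetical.

```python
from math import dist

def assign(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda k: dist(point, centroids[k]))

def kmeans(points, centroids, iterations=20):
    """Alternate between assigning points and recomputing cluster means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[assign(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [[sum(col) / len(members) for col in zip(*members)] if members else old
                   for members, old in zip(clusters, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

# Hypothetical two-indicator measurements forming two groups.
points = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [5.0, 5.0], [5.2, 4.8], [4.8, 5.2]]
centroids = kmeans(points, [points[0], points[3]])

# A new measurement goes to the cluster with the nearest mean.
print(assign([4.9, 5.1], centroids))  # 1
```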

For more general information about machine-learning techniques for classification, see Classification.

Regression with Dynamic Models

Another approach to fault detection and diagnosis is to use model identification. In this approach, you estimate dynamic models of system operation in healthy and faulty states. Then, you analyze which model is more likely to explain the live measurements from the system. This approach is useful when you have some information about your system that can help you select a model type for identification. To use this approach, you:

  1. Collect or simulate data from the system operating in a healthy condition and in known faulty, degraded, or end-of-life conditions.

  2. Identify a dynamic model representing the behavior in each healthy and fault condition.

  3. Use clustering techniques to draw a clear distinction between the conditions.

  4. Collect new data from a machine in operation and identify a model of its behavior. You can then determine which of the other models, healthy or faulty, is most likely to explain the observed behavior.

The example Fault Detection Using Data Based Models uses this approach. Functions you can use for identifying dynamic models include:

You can use functions like forecast to predict the future behavior of the identified model.
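The model-identification workflow can be illustrated with a very small Python sketch. It assumes a first-order autoregressive model, x[t] = a·x[t-1] + noise, for every condition, which is a simplification chosen for brevity. The data are simulated, and the coefficients 0.9 and 0.5 are arbitrary. The sketch picks the reference model with the smallest one-step prediction error on the new data.

```python
import random

def simulate_ar1(a, n, seed):
    """Simulate x[t] = a * x[t-1] + unit-variance Gaussian noise (stand-in for measured data)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(a * x[-1] + rng.gauss(0.0, 1.0))
    return x

def fit_ar1(x):
    """Least-squares estimate of the AR(1) coefficient."""
    num = sum(cur * prev for prev, cur in zip(x, x[1:]))
    den = sum(prev * prev for prev in x[:-1])
    return num / den

def prediction_mse(a, x):
    """Mean squared one-step prediction error of coefficient a on data x."""
    errors = [(cur - a * prev) ** 2 for prev, cur in zip(x, x[1:])]
    return sum(errors) / len(errors)

# Steps 1 and 2: identify one model per known condition.
models = {
    "healthy": fit_ar1(simulate_ar1(0.9, 300, seed=1)),
    "faulty": fit_ar1(simulate_ar1(0.5, 300, seed=2)),
}

# Step 4: new data from a machine in operation (simulated with the faulty dynamics).
new_data = simulate_ar1(0.5, 300, seed=3)
diagnosis = min(models, key=lambda state: prediction_mse(models[state], new_data))
print(diagnosis)
```

Comparing prediction errors is only one way to decide which model best explains the data. Comparing the identified parameters of the new data against those of the reference models is another.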

Control Charts

Statistical process control (SPC) methods are techniques for monitoring and assessing the quality of manufactured goods. SPC is used in programs that define, measure, analyze, improve, and control development and production processes. In the context of predictive maintenance, control charts and control rules can help you determine when a condition-indicator value indicates a fault. For instance, suppose you have a condition indicator that indicates a fault if it exceeds a threshold, but also exhibits some normal variation that makes it difficult to identify when the threshold is crossed. You can use control rules to define the threshold condition as occurring when a specified number of sequential measurements exceeds the threshold, rather than just one.

  • controlchart — Visualize a control chart.

  • controlrules — Define control rules and determine whether they are violated.

  • cusum — Detect small changes in the mean value of data.

For more information about statistical process control, see Statistical Process Control.
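The rule described above, in which a fault is declared only after several consecutive measurements exceed the threshold, can be sketched in Python as follows. This is not the controlrules function. The function name, threshold, and data are hypothetical.

```python
def first_sustained_exceedance(values, threshold, run_length):
    """Return the index at which `run_length` consecutive values above
    `threshold` is first completed, or None if that never happens."""
    run = 0
    for i, v in enumerate(values):
        run = run + 1 if v > threshold else 0   # reset on any value at or below threshold
        if run >= run_length:
            return i
    return None

# Hypothetical indicator with isolated spikes before a sustained rise.
values = [0.9, 1.1, 0.8, 1.2, 0.95, 1.05, 1.1, 1.3, 1.2]
print(first_sustained_exceedance(values, threshold=1.0, run_length=3))  # 7
```

With `run_length=1` the same function reduces to a plain threshold test and reports the first spike, at index 1.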

Changepoint Detection

Another way to detect fault conditions is to track the value of a condition indicator over time and detect abrupt changes in the trend behavior. Such abrupt changes can be indicative of a fault. Some functions you can use for such changepoint detection include:
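Independently of any particular function, the simplest case of changepoint detection can be sketched in Python. The code assumes that there is a single abrupt shift in the mean of the indicator and finds the split that minimizes the total squared deviation from the two segment means. The function name and data are hypothetical.

```python
def find_mean_changepoint(values):
    """Return the index k (1 <= k < len(values)) where splitting the series
    into values[:k] and values[k:] gives the smallest total squared
    deviation from the two segment means."""
    def sse(segment):
        m = sum(segment) / len(segment)
        return sum((v - m) ** 2 for v in segment)

    best_k, best_cost = None, float("inf")
    for k in range(1, len(values)):
        cost = sse(values[:k]) + sse(values[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Hypothetical indicator history with a step change after the fifth sample.
history = [1.0, 1.1, 0.9, 1.0, 1.05, 2.0, 2.1, 1.9, 2.05, 2.0]
print(find_mean_changepoint(history))  # 5
```

This exhaustive search always returns a split, even for data with no real change, so a practical detector would also require the cost reduction to exceed some minimum before reporting a changepoint.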
