Subset of Data Approximation for GPR Models

Training a GPR model with the exact method (when FitMethod is 'Exact') requires inverting an n-by-n matrix, which costs O(n³) operations every time the objective function is evaluated. Therefore, the computational complexity is O(kn³), where k is the number of function evaluations required for estimating $β$, $θ$, and $σ^{2}$, and n is the number of observations. For large n, estimating the parameters or computing predictions can be very expensive.
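
The following minimal NumPy sketch shows where the O(n³) cost comes from. It evaluates the log marginal likelihood of a GPR model once. The constant mean `beta` and the squared-exponential kernel with parameters `length_scale` and `signal_std` (playing the role of $θ$) are illustrative assumptions, not the kernel fitrgp uses by default. Every evaluation builds the full n-by-n covariance matrix and factors it, and a parameter search repeats this k times.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale, signal_std):
    # Squared-exponential kernel: signal_std^2 * exp(-||a - b||^2 / (2 * length_scale^2))
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_std**2 * np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)

def exact_log_likelihood(X, y, beta, length_scale, signal_std, sigma2):
    """Log marginal likelihood of a GPR model with constant mean beta (illustrative)."""
    n = X.shape[0]
    # Full n-by-n covariance K(X, X | theta) + sigma^2 * I: O(n^2) storage
    K = sq_exp_kernel(X, X, length_scale, signal_std) + sigma2 * np.eye(n)
    # Cholesky factorization: the O(n^3) step paid on every evaluation
    L = np.linalg.cholesky(K)
    r = y - beta
    # alpha = K^{-1} r, computed with two triangular solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))
    # log|K| = 2 * sum(log(diag(L)))
    return -0.5 * r @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2.0 * np.pi)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
print(exact_log_likelihood(X, y, beta=0.0, length_scale=1.0, signal_std=1.0, sigma2=0.01))
```

Doubling n multiplies the cost of the Cholesky step by roughly eight.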

One simple way to reduce the computational cost for large data sets is to select m < n of the n observations and apply the exact GPR model to these m points to estimate $β$, $θ$, and $σ^{2}$, ignoring the other (n – m) points. This smaller subset is known as the active set or inducing input set, and the approximation method is called the Subset of Data (SD) method.
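
A minimal sketch of the SD idea, assuming the active set is drawn uniformly at random without replacement. This is only one possible selection rule, and the function `sd_predict` and its arguments are hypothetical names for illustration. Exact GPR is applied to the m active points only, and the remaining n – m points never enter the computation.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale, signal_std):
    # Squared-exponential kernel (illustrative choice of kernel)
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_std**2 * np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)

def sd_predict(X, y, Xnew, m, beta, length_scale, signal_std, sigma2, seed=0):
    """Hypothetical Subset of Data predictor: exact GPR on m random points."""
    rng = np.random.default_rng(seed)
    # Active set: m of the n row indices, drawn without replacement
    active = rng.choice(X.shape[0], size=m, replace=False)
    XA, yA = X[active], y[active]
    # Only the m-by-m kernel block for the active set is ever formed
    K = sq_exp_kernel(XA, XA, length_scale, signal_std) + sigma2 * np.eye(m)
    L = np.linalg.cholesky(K)  # O(m^3) instead of O(n^3)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, yA - beta))
    # Predictive mean depends only on the active set
    mean = beta + sq_exp_kernel(Xnew, XA, length_scale, signal_std) @ alpha
    return mean, active

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(5000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(5000)
mean, active = sd_predict(X, y, np.array([[0.0], [1.5]]), m=300,
                          beta=0.0, length_scale=1.0, signal_std=1.0, sigma2=0.01)
```

With m equal to n, the sketch reduces to the exact method. Smaller m trades accuracy for speed because the ignored points contribute no information.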

The computational complexity of the SD method is O(km³), where k is the number of function evaluations and m is the active set size. The storage requirement is O(m²), since only the m-by-m block of the full kernel matrix $K(X, X \mid θ)$ that corresponds to the active set needs to be stored in memory.
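
As a rough worked example, ignoring constant factors and assuming k is about the same for both methods, the savings of SD over the exact method scale as follows. The sizes n = 20000 and m = 2000 are arbitrary illustrative values.

$$
\frac{kn^{3}}{km^{3}} = \left(\frac{n}{m}\right)^{3} = 10^{3}, \qquad \frac{n^{2}}{m^{2}} = \left(\frac{n}{m}\right)^{2} = 10^{2} \qquad \text{for } n = 20000,\; m = 2000.
$$

So a tenfold smaller active set cuts the time by roughly a factor of 1000 and the kernel storage by roughly a factor of 100.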

You can specify the SD method for parameter estimation by using the 'FitMethod','sd' name-value pair argument in the call to fitrgp. To specify the SD method for prediction, use the 'PredictMethod','sd' name-value pair argument.

For estimating parameters using the exact GPR model, see parameter estimation using the exact GPR method. For making predictions using the exact GPR model, see prediction using the exact GPR method.
