Prediction Using Discriminant Analysis Models
predict uses three quantities to classify observations: posterior probability, prior probability, and cost. predict classifies so as to minimize the expected classification cost:

$$\hat{y} = \arg\min_{y=1,\dots,K} \sum_{k=1}^{K} \hat{P}(k \mid x)\, C(y \mid k),$$

where

$\hat{y}$ is the predicted classification.
$K$ is the number of classes.
$\hat{P}(k \mid x)$ is the posterior probability of class $k$ for observation $x$.
$C(y \mid k)$ is the cost of classifying an observation as $y$ when its true class is $k$.
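As a concrete illustration, here is a minimal sketch of this rule for a single observation, assuming a made-up three-class posterior vector and the default 0-1 cost matrix:

posterior = [0.2 0.5 0.3];     % hypothetical P(k|x) for K = 3 classes
C = ones(3) - eye(3);          % default cost: 1 off the diagonal, 0 on it
expectedCost = posterior * C;  % expectedCost(y) = sum over k of P(k|x)*C(y|k)
[~,yhat] = min(expectedCost)   % yhat = 2, the minimum expected cost class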
The space of X values divides into regions where a classification Y is a particular value. The regions are separated by straight lines for linear discriminant analysis, and by conic sections (ellipses, hyperbolas, or parabolas) for quadratic discriminant analysis. For a visualization of these regions, see Create and Visualize Discriminant Analysis Classifier.
Posterior Probability
The posterior probability that a point x belongs to class k is the product of the prior probability and the multivariate normal density. The density function of the multivariate normal with 1-by-d mean $\mu_k$ and d-by-d covariance $\Sigma_k$ at a 1-by-d point x is

$$P(x \mid k) = \frac{1}{\left((2\pi)^{d}\,|\Sigma_k|\right)^{1/2}} \exp\!\left(-\frac{1}{2}\,(x-\mu_k)\,\Sigma_k^{-1}\,(x-\mu_k)^{T}\right),$$

where $|\Sigma_k|$ is the determinant of $\Sigma_k$, and $\Sigma_k^{-1}$ is the inverse matrix.
Let P(k) represent the prior probability of class k. Then the posterior probability that an observation x is of class k is

$$\hat{P}(k \mid x) = \frac{P(x \mid k)\,P(k)}{P(x)},$$

where $P(x)$ is a normalization constant, namely, the sum over $k$ of $P(x \mid k)P(k)$.
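The sketch below carries out this computation directly for two classes in two dimensions, with made-up means, covariances, and priors; mvnpdf supplies the multivariate normal density P(x|k):

x = [1.0 0.5];                     % 1-by-d point to classify
mu = {[0 0], [2 1]};               % hypothetical class means
Sigma = {eye(2), [1 0.3; 0.3 1]};  % hypothetical class covariances
prior = [0.6 0.4];                 % hypothetical prior P(k)
Pxk = cellfun(@(m,S) mvnpdf(x,m,S), mu, Sigma);  % densities P(x|k)
post = (Pxk .* prior) / sum(Pxk .* prior)        % posteriors P(k|x), sum to 1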
Prior Probability
The prior probability is one of three choices, each illustrated in the sketch after this list:

'uniform' — The prior probability of class k is 1 over the total number of classes.
'empirical' — The prior probability of class k is the number of training samples of class k divided by the total number of training samples.
A numeric vector — The prior probability of class k is the kth element of the Prior vector. See fitcdiscr.
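As a sketch of the three choices, using the fisheriris sample data that ships with Statistics and Machine Learning Toolbox (the numeric prior values here are made up):

load fisheriris                                      % meas: 150-by-4, species: 3 classes
m1 = fitcdiscr(meas,species,'Prior','uniform');      % each class gets 1/3
m2 = fitcdiscr(meas,species,'Prior','empirical');    % observed class frequencies
m3 = fitcdiscr(meas,species,'Prior',[0.5 0.3 0.2]);  % kth element is the prior of class k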
After creating a classifier obj, you can set the prior using dot notation:

obj.Prior = v;

where v is a vector of positive elements representing the frequency with which each class occurs. You do not need to retrain the classifier when you set a new prior.
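For example, this reweights the three-class model trained above toward its first class (the weights are hypothetical; since the elements represent relative class frequencies, they need not sum to 1):

m3.Prior = [2 1 1];   % relative class frequencies; takes effect without retraining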
Cost
There are two costs associated with discriminant analysis classification: the true misclassification cost per class, and the expected misclassification cost per observation.
True Misclassification Cost per Class
Cost(i,j) is the cost of classifying an observation into class j if its true class is i. By default, Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification and 1 for incorrect classification.

You can set any cost matrix you like when creating a classifier. Pass the cost matrix in the Cost name-value pair in fitcdiscr.
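For instance, this sketch builds an asymmetric cost matrix (with hypothetical values) that makes misclassifying a true class 1 observation twice as expensive as any other error:

load fisheriris
C = ones(3) - eye(3);                    % start from the default 0-1 cost
C(1,:) = 2*C(1,:);                       % errors on true class 1 now cost 2
mdl = fitcdiscr(meas,species,'Cost',C);  % pass it in the Cost name-value pair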
After you create a classifier obj, you can set a custom cost using dot notation:

obj.Cost = B;

B is a square matrix of size K-by-K when there are K classes. You do not need to retrain the classifier when you set a new cost.
Expected Misclassification Cost per Observation
Suppose you have Nobs observations that you want to classify with a trained discriminant analysis classifier obj, and you have K classes. You place the observations into a matrix Xnew with one observation per row. The command

[label,score,cost] = predict(obj,Xnew)

returns, among other outputs, a cost matrix of size Nobs-by-K. Each row of the cost matrix contains the expected (average) cost of classifying the observation into each of the K classes. cost(n,k) is

$$\sum_{i=1}^{K} \hat{P}\bigl(i \mid Xnew(n)\bigr)\, C(k \mid i),$$

where

$K$ is the number of classes.
$\hat{P}(i \mid Xnew(n))$ is the posterior probability of class $i$ for observation $Xnew(n)$.
$C(k \mid i)$ is the cost of classifying an observation as $k$ when its true class is $i$.
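A short sketch can verify this relationship: for discriminant analysis, the score output of predict holds the posterior probabilities, so multiplying it by the model's cost matrix reproduces cost (the fisheriris data here is purely illustrative):

load fisheriris
mdl = fitcdiscr(meas,species);
Xnew = meas(1:5,:);                 % classify the first five observations
[label,score,cost] = predict(mdl,Xnew);
manual = score * mdl.Cost;          % cost(n,k) = sum over i of P(i|x_n)*Cost(i,k)
max(abs(cost - manual),[],'all')    % approximately 0: the two computations agree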