Label new data using semi-supervised graph-based classifier
Classify New Data Using Model Trained on Labeled and Unlabeled Data
Use both labeled and unlabeled data to train a SemiSupervisedGraphModel object. Label new data using the trained model.
Randomly generate 15 observations of labeled data, with 5 observations in each of three classes.
rng('default') % For reproducibility
labeledX = [randn(5,2)*0.25 + ones(5,2);
            randn(5,2)*0.25 - ones(5,2);
            randn(5,2)*0.5];
Y = [ones(5,1); ones(5,1)*2; ones(5,1)*3];
Randomly generate 300 additional observations of unlabeled data, with 100 observations per class.
unlabeledX = [randn(100,2)*0.25 + ones(100,2);
              randn(100,2)*0.25 - ones(100,2);
              randn(100,2)*0.5];
Fit labels to the unlabeled data by using a semi-supervised graph-based method. Specify label spreading as the labeling algorithm, and use an automatically selected kernel scale factor. The function fitsemigraph returns a SemiSupervisedGraphModel object whose FittedLabels property contains the fitted labels for the unlabeled data and whose LabelScores property contains the associated label scores.
Mdl = fitsemigraph(labeledX,Y,unlabeledX,'Method','labelspreading', ...
    'KernelScale','auto')
Mdl =
  SemiSupervisedGraphModel with properties:

             FittedLabels: [300x1 double]
              LabelScores: [300x3 double]
               ClassNames: [1 2 3]
             ResponseName: 'Y'
    CategoricalPredictors: []
                   Method: 'labelspreading'
Randomly generate 150 observations of new data, with 50 observations per class. For the purposes of validation, keep track of the true labels for the new data.
newX = [randn(50,2)*0.25 + ones(50,2);
        randn(50,2)*0.25 - ones(50,2);
        randn(50,2)*0.5];
trueLabels = [ones(50,1); ones(50,1)*2; ones(50,1)*3];
Predict the labels for the new data by using the predict function of the SemiSupervisedGraphModel object. Compare the true labels to the predicted labels by using a confusion matrix.
predictedLabels = predict(Mdl,newX);
confusionchart(trueLabels,predictedLabels)
Only 3 of the 150 observations in newX are mislabeled.
Mdl — Semi-supervised graph-based classifier
Semi-supervised graph-based classifier, specified as a SemiSupervisedGraphModel object returned by fitsemigraph.
X — Predictor data to be classified
numeric matrix | table
Predictor data to be classified, specified as a numeric matrix or table. Each row of X corresponds to one observation, and each column corresponds to one variable.
If you trained Mdl using matrix data (X and UnlabeledX in the call to fitsemigraph), then specify X as a numeric matrix.
The variables in the columns of X must have the same order as the predictor variables that trained Mdl.
The software treats the predictors in X whose indices match Mdl.CategoricalPredictors as categorical predictors.
If you trained Mdl using tabular data (Tbl and UnlabeledTbl in the call to fitsemigraph), then specify X as a table.
All predictor variables in X must have the same variable names and data types as those that trained Mdl (stored in Mdl.PredictorNames). However, the column order of X does not need to correspond to the column order of Tbl. Also, Tbl and X can contain additional variables (for example, response variables), but predict ignores them.
predict does not support multicolumn variables, cell arrays other than cell arrays of character vectors, or ordinal categorical variables.
If you set 'Standardize',true in fitsemigraph to train Mdl, then the software standardizes the columns of X using the corresponding means and standard deviations computed on the training data.
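The standardization step described above can be sketched in base MATLAB. This is only an illustration under stated assumptions: the data values and the variable names trainX, mu, sd, and newXStd are made up, and the code does not call or reproduce the internals of fitsemigraph or predict.

```matlab
% Hypothetical training data: 3 observations, 2 predictors.
trainX = [1 10; 2 20; 3 30];

% Column-wise statistics computed on the training data only.
mu = mean(trainX,1);    % 1-by-2 vector of column means
sd = std(trainX,0,1);   % 1-by-2 vector of column standard deviations

% Hypothetical new observations to classify.
newX = [2 20; 4 40];

% Standardize the new data with the TRAINING means and standard
% deviations, not with statistics recomputed from newX.
newXStd = (newX - mu)./sd;
disp(newXStd)
```

Reusing the training statistics keeps the new observations on the same scale as the data that defined the similarity graph.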
label — Predicted class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors
Predicted class labels, returned as a categorical or character array, logical or numeric vector, or cell array of character vectors. label has the same data type as the fitted class labels Mdl.FittedLabels, and its length is equal to the number of rows in X.
For more information on how predict predicts class labels, see the description of the prediction algorithm later on this page.
score — Predicted class scores
Predicted class scores, returned as a numeric matrix. score has size m-by-K, where m is the number of observations (or rows) in X and K is the number of classes in Mdl.ClassNames.
score(m,k) is the likelihood that observation m in X belongs to class k, where a higher score value indicates a higher likelihood.
For more information on how predict predicts class scores, see the description of the prediction algorithm later on this page.
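As a rough illustration of how the two outputs relate, the following sketch requests both label and score and checks that each predicted label matches the column with the highest score. It assumes Statistics and Machine Learning Toolbox is installed; the helper makeData and the variables idx, labelFromScore, agree, and numMislabeled are introduced here for illustration only.

```matlab
rng('default') % For reproducibility

% Hypothetical helper: n observations for each of three classes.
makeData = @(n) [randn(n,2)*0.25 + 1; randn(n,2)*0.25 - 1; randn(n,2)*0.5];

labeledX = makeData(5);
Y = repelem((1:3)',5);          % labels for the 15 labeled observations
unlabeledX = makeData(100);

Mdl = fitsemigraph(labeledX,Y,unlabeledX,'Method','labelspreading', ...
    'KernelScale','auto');

newX = makeData(50);
trueLabels = repelem((1:3)',50);

% Request both outputs: labels and the m-by-K score matrix.
[label,score] = predict(Mdl,newX);

% Map the highest-scoring column of each row back to a class name.
[~,idx] = max(score,[],2);
labelFromScore = reshape(Mdl.ClassNames(idx),[],1);
agree = isequal(label,labelFromScore);

% Count the disagreements with the known labels.
numMislabeled = sum(label ~= trueLabels);
```

The columns of score follow the class order in Mdl.ClassNames, which is why the column index can be used to look up the class name.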
A similarity graph models the local neighborhood relationships between observations in the predictor data, both labeled and unlabeled, as an undirected graph. The nodes in the graph represent observations, and the edges, which are directionless, represent the connections between the observations.
If the pairwise distance $Dist_{i,j}$ between any two nodes $i$ and $j$ is positive (or larger than a certain threshold), then the similarity graph connects the two nodes using an edge. The edge between the two nodes is weighted by the pairwise similarity $S_{i,j}$, where $S_{i,j} = \exp\left(-\left(\frac{Dist_{i,j}}{\sigma}\right)^{2}\right)$, for a specified kernel scale $\sigma$ value.
A similarity matrix is a matrix representation of a similarity graph. The n-by-n matrix contains pairwise similarity values between connected nodes in the similarity graph. The similarity matrix of a graph is also called an adjacency matrix.
The similarity matrix is symmetric because the edges of the similarity graph are directionless. A value of $S_{i,j} = 0$ means that nodes $i$ and $j$ of the similarity graph are not connected.
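A minimal base MATLAB sketch of a similarity matrix follows. It assumes Euclidean distance, a Gaussian kernel of the form exp(-(Dist/sigma)^2), and a made-up data set and kernel scale; the graph that fitsemigraph actually builds may use a different distance metric or keep fewer edges.

```matlab
% Three made-up observations with two predictors each.
X = [0 0; 0 1; 3 3];
sigma = 1;                      % kernel scale (assumed value)

% Pairwise Euclidean distances through implicit expansion:
% diffs(i,j,:) holds X(i,:) - X(j,:).
diffs = permute(X,[1 3 2]) - permute(X,[3 1 2]);
Dist = sqrt(sum(diffs.^2,3));   % n-by-n distance matrix

% Gaussian-kernel similarity for every pair of nodes.
S = exp(-(Dist./sigma).^2);

% Connect two nodes only when their distance is positive, so a node
% has no edge to itself and the diagonal of S is 0.
S(Dist == 0) = 0;

disp(S)
disp(issymmetric(S))
```

Nearby observations (rows 1 and 2) receive a weight close to 1, while the distant third observation receives a weight close to 0.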
To fit labels to unlabeled training data, fitsemigraph constructs a similarity graph with both labeled and unlabeled observations as nodes, and distributes the label information from labeled observations to unlabeled observations by using either label propagation or label spreading. The resulting SemiSupervisedGraphModel object stores the fitted labels and label scores for the unlabeled data in its FittedLabels and LabelScores properties.
To predict the label of a new observation $x$, the predict function uses a weighted average of neighboring observation scores to compute the label scores for $x$, namely

$$F(x) = \frac{\sum_{j=1}^{n} S(x,x_j)\,F(x_j)}{\sum_{j=1}^{n} S(x,x_j)}$$

where:
$n$ is the number of observations in the training data.
$F(x_j)$ is the row vector of label scores for the training observation $x_j$ (or node $j$). For more information on the computation of label scores for training observations, see Algorithms.
$S(x,x_j)$ is the pairwise similarity between the new observation $x$ and the training observation $x_j$, where $S(x_i,x_j) = S_{i,j}$ is as defined in Similarity Graph.
The column with the maximum score in $F(x)$ corresponds to the predicted class label for $x$. For more information, see [1].
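The weighted-average rule above can be sketched in base MATLAB as follows. Everything in this block is hypothetical: the training points, the score matrix F, the kernel scale, and the variable names are made up for illustration, Euclidean distance with a Gaussian kernel is assumed, and the code is not the implementation that predict uses.

```matlab
% Hypothetical training observations and their label scores.
trainX = [1 1; 1.2 0.9; -1 -1; -0.8 -1.1];   % n-by-2 predictors
F = [0.9 0.1; 0.8 0.2; 0.1 0.9; 0.2 0.8];     % n-by-K label scores
classNames = [1 2];                           % one entry per column of F
sigma = 0.5;                                  % kernel scale (assumed)

% New observation to label.
x = [0.9 1.1];

% Similarity between x and every training observation.
dist = sqrt(sum((trainX - x).^2,2));          % n-by-1 distances
s = exp(-(dist./sigma).^2);                   % n-by-1 similarities

% Similarity-weighted average of the training score rows.
Fx = (s' * F) / sum(s);                       % 1-by-K scores for x

% The highest-scoring column gives the predicted class.
[~,k] = max(Fx);
predictedLabel = classNames(k);
disp(predictedLabel)
```

Because x lies next to the first two training points, their score rows dominate the average and the far-away points contribute almost nothing.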
[1] Delalleau, Olivier, Yoshua Bengio, and Nicolas Le Roux. “Efficient Non-Parametric Function Induction in Semi-Supervised Learning.” Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics. 2005.