How to define custom classification loss function

I am currently trying to run k-fold cross-validation on a decision tree with a custom classification loss function, as described here.
However, I don't understand how the C and S matrices, which are passed to the loss function, are helpful.
1. The linked page says "C is an n-by-K logical matrix with rows indicating which class the corresponding observation belongs." So this is not predicted, but simply a repetition of the input labels?
2. The S matrix: "S is an n-by-K numeric matrix of classification scores". Why can I not simply use the predicted classes instead of the scores?
To be more specific: I create a classification decision tree. Next, I use crossval to get a partitioned model. Then I calculate the validation accuracy using kfoldLoss. Now, instead of using the built-in 'classiferror' loss, I would like to use my own classification loss function, e.g. the Matthews correlation coefficient.
% create set of cross-validated classification model(s) from a classification model
partitionedModel = crossval(trainedClassifier.ClassificationTree, 'KFold', 10);
% Loss, by default the fraction of misclassified data, is a scalar and averaged over all folds
validationAccuracy = 1 - kfoldLoss(partitionedModel, 'LossFun', 'classiferror');
Any help is greatly appreciated.
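[EDIT] For reference, kfoldLoss also accepts a function handle for 'LossFun'. Below is a minimal sketch of that form, assuming a binary problem; the name mccLoss is made up. The point is that C marks the true class of each row, and the per-row argmax of S recovers the predicted class, so together they are enough to build a confusion matrix:

```matlab
% Sketch of a custom loss for kfoldLoss (binary classification assumed).
% kfoldLoss calls it as lossvalue = lossfun(C, S, W, Cost), where
%   C    n-by-K logical, C(i,j) is true if observation i belongs to class j
%   S    n-by-K matrix of classification scores
%   W    n-by-1 vector of observation weights
%   Cost K-by-K misclassification cost matrix
function lossvalue = mccLoss(C, S, W, Cost)
[~, trueClass] = max(C, [], 2);   % column position of the "true" flag
[~, predClass] = max(S, [], 2);   % predicted class = column with highest score
cm = confusionmat(trueClass, predClass);  % rows = true class, columns = predicted
TP = cm(1,1); FN = cm(1,2); FP = cm(2,1); TN = cm(2,2);
denom = sqrt( (TP+FP)*(TP+FN)*(TN+FP)*(TN+FN) );
if denom == 0
    MCC = 0;  % convention: MCC = 0 when the denominator vanishes
else
    MCC = (TP*TN - FP*FN) / denom;
end
lossvalue = 1 - MCC;  % return a loss: 0 is best, 2 is worst
end
```

With that, something like mcc = 1 - kfoldLoss(partitionedModel, 'LossFun', @mccLoss); should recover the Matthews correlation coefficient averaged over folds.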

Accepted Answer

Leon Kellner
Leon Kellner on 3 Jul 2018
In case anybody else is looking for a solution: I used the crossval function to wrap the training of the decision tree. This way, implementing other loss functions is straightforward.
function [trainedClassifier, qualityMeasures] = trainDTwCrossVal(data, predictorNames, MaxNumSplits)
% cross-validation
numberOfFolds = 5;
% creates a random partition for a stratified k-fold cross-validation
cp = cvpartition(data.typeBehavior, 'k', numberOfFolds);
% loss estimate using cross-validation; crossval calls trainDT2 once per fold
vals = crossval(@trainDT2, data, 'partition', cp);
% nested function to train one DT with trainingData and test with testingData
function testval = trainDT2(trainingData, testingData)
testval holds quality measures of the prediction, derived from the confusion matrix C, which is computed inside the nested function from the true and predicted labels of the held-out fold.
% C is the 2-by-2 confusion matrix of the fold's test predictions:
% C = [TP FP
%      FN TN]
TP = C(1,1); FP = C(1,2); FN = C(2,1); TN = C(2,2);
% Matthews correlation coefficient, worst value = -1, best value = 1
if ( (TP+FP)*(TP+FN)*(TN+FP)*(TN+FN) ) == 0
    MCC = 0; % set MCC to zero if the denominator is zero
else
    MCC = (TP*TN - FP*FN) / ...
        sqrt( (TP+FP)*(TP+FN)*(TN+FP)*(TN+FN) );
end
accuracy = (TP+TN)/(TP+TN+FP+FN); % accuracy, worst value = 0, best value = 1
F1score = 2*TP/(2*TP+FP+FN);      % F1 score, worst value = 0, best value = 1
testval = [accuracy F1score MCC];
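For context, the body of the nested function can be sketched as follows. This is a sketch under assumptions carried over from above (typeBehavior as the response column, a binary problem, MaxNumSplits visible to the nested function); fitctree, predict and confusionmat are the standard calls:

```matlab
% Sketch of the fold function: train one tree, test on the held-out fold.
function testval = trainDT2(trainingData, testingData)
tree = fitctree(trainingData, 'typeBehavior', ...   % response column assumed
    'MaxNumSplits', MaxNumSplits);                  % shared with the enclosing function
predicted = predict(tree, testingData);             % labels for the held-out fold
C = confusionmat(testingData.typeBehavior, predicted);
% note: confusionmat puts true classes in rows, predicted in columns,
% so C(1,2) counts FN and C(2,1) counts FP for a binary problem
TP = C(1,1); FN = C(1,2); FP = C(2,1); TN = C(2,2);
accuracy = (TP+TN)/sum(C(:));
F1score = 2*TP/(2*TP+FP+FN);
denom = sqrt((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN));
if denom == 0, MCC = 0; else, MCC = (TP*TN - FP*FN)/denom; end
testval = [accuracy F1score MCC];
end
```

Swapping FP and FN does not change accuracy, F1 or MCC here, since all three formulas are symmetric in those two counts.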
  1 Comment
Elena Casiraghi
Elena Casiraghi on 21 Aug 2019
Dear all, I had the same problem; however, it seems I found the solution.
I have a classification problem with labels 1, ..., 5.
Since the label is a score related to a grade, I would like to compute the loss as the distance between the true label and the predicted label.
So, if:
x(1), ..., x(N) are the N points in my dataset,
y(i) is the TRUE label of x(i), and the predicted label is yhat(i),
w(i) is the weight for point x(i),
Cost(j, k) is the cost of assigning a point in class j to class k,
then I would like to measure the loss as:
loss = sum over i = 1..N of w(i) * Cost(y(i), yhat(i))
The score matrix S to be used when computing the loss contains negative values. What is the meaning of that score?
From the explanation in the MATLAB help it seems that the lower (more negative) the S value, the more "distant" the point is from that class.
If I have 5 labels and, for x(i), the third entry of row i of S is the largest, then x(i) has predicted label = 3. You could somehow normalize the scores to transform them into a sort of probability of the point x belonging to each class.
I used kfoldPredict to understand what's happening and it should be right.
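To illustrate that last point with made-up numbers (the scores below are hypothetical, not from an actual model; softmax is just one common choice of normalization):

```matlab
% Hypothetical scores for one observation with K = 5 classes
s = [-1.2 -0.4 0.9 -2.1 -0.7];
[~, predictedLabel] = max(s);  % column 3 has the highest score -> label 3
p = exp(s) / sum(exp(s));      % softmax: probability-like weights summing to 1
```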


