How to use the ReliefF algorithm for feature selection?

I want to use the ReliefF algorithm for a feature selection problem. I have a dataset (CNS.mat), and I want to apply ReliefF to it, obtain the top 30 features, and then apply a classifier to the selected features. I read how the function is called in the MATLAB Help:
[RANKED,WEIGHT] = relieff(X,Y,K)
[RANKED,WEIGHT] = relieff(X,Y,K,'PARAM1',val1,'PARAM2',val2,...)
I also studied this ReliefF example from the MATLAB Help:
load fisheriris
[ranked,weight] = relieff(meas,species,10)
ranked =
4 3 1 2
weight =
0.1399 0.1226 0.3590 0.3754
But I don't know whether this code does what I described (select the top features and save them as the input for classification). My aim is to apply the ReliefF algorithm as a feature selection step on the CNS data and compare its results with other algorithms such as SVM-RFE and InfoGain.
I would be very grateful for your opinions on how to use ReliefF for feature selection.

Answers (2)

MeLearningProgramming on 23 Jul 2020
Edited: MeLearningProgramming on 23 Jul 2020
Hey guy,
I am using relieff as well. You have to watch out for how the outputs are organized.
weight = 0.1399 0.1226 0.3590 0.3754
means that the first parameter (column) in meas got the weight 0.1399: the weights are in the same order as the columns of meas.
ranked = 4 3 1 2 doesn't mean that the first parameter of meas has ranking number 4.
It lists the column indices from most to least important, so the first parameter in meas has ranking position 3 (the position of the number 1 in ranked).
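The relation between the two outputs can be checked directly. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is installed and using the fisheriris example from the question; the names sortedWeights and rankPosition are made up for illustration.

```matlab
% Sketch: relate the two outputs of relieff on the fisheriris example.
load fisheriris                          % meas (150x4), species (150x1 cell)
[ranked, weight] = relieff(meas, species, 10);

% weight(j) belongs to column j of meas. ranked lists column indices,
% most important first, so weight(ranked) is sorted in descending order.
sortedWeights = weight(ranked);

% rankPosition(j) is the ranking position of column j (1 = most important).
rankPosition = zeros(1, numel(ranked));
rankPosition(ranked) = 1:numel(ranked);

% Show one row per column of meas.
disp(table((1:numel(weight))', weight', rankPosition', ...
    'VariableNames', {'Column', 'Weight', 'RankPosition'}));
```

For the output quoted in the question (ranked = 4 3 1 2), rankPosition works out to 3 4 2 1, which matches the statement that the first parameter sits at ranking position 3.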
How to use relieff?
X should be a matrix of size data points x parameters (in my case, for example, 147510x10) and y should be a vector of size data points x 1 (147510x1).
First you should estimate the best k value, like this:
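ParamLabels = {'P1','P2','P3','P4','P5','P6','P7','P8','P9','P10'};
kMax = 200;
RankImportanceIdx = zeros(size(X,2),kMax); % one column of ranked indices per k
RankImportanceWeight = zeros(size(X,2),kMax); % one column of weights per k
for k=1:kMax % or parfor
[idx,weights] = relieff(X,y,k);
RankImportanceIdx(:,k) = idx';
RankImportanceWeight(:,k) = weights';
end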
A simple plot of RankImportanceWeight against k shows from which k value on the results stop changing; that is the best k value.
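The same "where do the results stop changing" check can also be done numerically instead of by eye. This is only a sketch under assumptions: it uses fisheriris with k = 1..30 so that it runs on its own, and the tolerance tol and the variable names are hypothetical choices, not part of relieff.

```matlab
% Sketch: find the first k after which the ReliefF weights barely change.
load fisheriris
kValues = 1:30;
nFeatures = size(meas, 2);
weightsByK = zeros(nFeatures, numel(kValues));
for i = 1:numel(kValues)
    [~, w] = relieff(meas, species, kValues(i));
    weightsByK(:, i) = w';
end

% maxChange(i) = largest absolute weight change between kValues(i) and kValues(i+1).
maxChange = max(abs(diff(weightsByK, 1, 2)), [], 1);

tol = 0.01;                                  % hypothetical threshold, tune for your data
lastLarge = find(maxChange >= tol, 1, 'last');
if isempty(lastLarge)
    stableK = kValues(1);                    % weights never move by more than tol
elseif lastLarge == numel(maxChange)
    stableK = NaN;                           % weights have not settled within kValues
else
    stableK = kValues(lastLarge + 1);        % all later changes stay below tol
end
fprintf('Weights look stable from k = %g\n', stableK);
```

A NaN result means the range of k values was too small for the weights to settle at this tolerance, so the plot is still worth looking at.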
In my case, for example, the best k value is 75. Afterwards you could plot the results like this:
% one line per parameter, ordered by its rank at k = 75
plot(RankImportanceWeight(RankImportanceIdx(1:end,75),1:end)','LineWidth',2);
title('ReliefF algorithm weights vs. k-values','FontWeight','normal')
xlabel('size of k-nearest neighbor'); ylabel('weights');
legend(ParamLabels(RankImportanceIdx(1:end,75)),'Box','off');
set(gca,'FontName','Arial','FontSize',16);
and/or you could create a table, like this:
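for pidx=1:size(ParamLabels,2)
% ranking position of parameter pidx at k = 75 (1 = most important);
% single-output find is used because the labels form a row vector
a = find(strcmp(ParamLabels(RankImportanceIdx(1:end,75)),ParamLabels{pidx}));
RankImportanceTbl{pidx,1} = a;
end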
With this you can choose the 30 best parameters for your y.
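To connect this with the original question (keep the top 30 features, then classify), here is a minimal sketch. It assumes the Statistics and Machine Learning Toolbox and uses synthetic data, because the variable names inside CNS.mat are not known; the kNN classifier, the 30% hold-out split, and all variable names are illustrative choices only.

```matlab
% Sketch: rank features with ReliefF, keep the top nTop, then classify.
rng(1);                                      % reproducible synthetic data
nObs = 200; nFeat = 60; nTop = 30; k = 10;
X = randn(nObs, nFeat);                      % stand-in for the real feature matrix
y = double(X(:,1) + X(:,2) - X(:,3) > 0);    % labels driven by columns 1 to 3

% Hold out 30% for testing and rank features on the training part only,
% so the test data does not influence the selection.
cvp = cvpartition(y, 'HoldOut', 0.3);
Xtrain = X(training(cvp), :);  ytrain = y(training(cvp));
Xtest  = X(test(cvp), :);      ytest  = y(test(cvp));

% y is numeric here, so classification has to be requested explicitly.
[ranked, ~] = relieff(Xtrain, ytrain, k, 'method', 'classification');
topIdx = ranked(1:nTop);                     % column indices of the top features

% Train and evaluate a classifier on the selected columns only.
mdl = fitcknn(Xtrain(:, topIdx), ytrain, 'NumNeighbors', 5);
predicted = predict(mdl, Xtest(:, topIdx));
accuracy = mean(predicted == ytest);
fprintf('Hold-out accuracy with top %d features: %.3f\n', nTop, accuracy);
```

For real data, X and y would be replaced by the matrix and label vector loaded from the .mat file, and topIdx from another selection method (for example SVM-RFE) could be fed through the same classifier for a like-for-like comparison.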
Hope this helps you adapt it to your problem,
regards,
MLP

Jingwei Too on 23 Jul 2020
