Selecting important features from a very large pool
Hello everyone.
I've been struggling with a pedestrian detection algorithm, and I was wondering whether MATLAB has a toolbox that could help me.
I extract a very large number of features from images and then I want to classify them as pedestrian or non-pedestrian. What I need is a way to determine which of the features are actually worth extracting for the sake of speed.
Imagine I have a text file with all the data. The first column holds the class (1 or 2), and the remaining columns hold the features.
So the file could look like
1,1000,2000,1500,3000 ...
1,1200,1000,1600,3000 ...
2,3000,4000,120,10000 ...
2,4900,7300,3000,100 ...
and so on.
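For reference, the layout described above can be read into a label list and a feature matrix roughly like this. It is only a minimal sketch in plain Python: the sample text repeats the four rows shown above with the trailing "..." dropped, and `load_labeled_rows` is a hypothetical helper name, not a function from any toolbox.

```python
import csv
import io

# Sample rows copied from the post; the elided trailing columns are left out.
SAMPLE = """1,1000,2000,1500,3000
1,1200,1000,1600,3000
2,3000,4000,120,10000
2,4900,7300,3000,100
"""

def load_labeled_rows(handle):
    """Split comma-separated rows into class labels (column 1) and feature vectors."""
    labels, features = [], []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        labels.append(int(row[0]))
        features.append([float(v) for v in row[1:]])
    return labels, features

labels, features = load_labeled_rows(io.StringIO(SAMPLE))
print(labels)            # class of each sample
print(len(features[0]))  # number of feature columns per sample
```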
I have 30.000 features (columns) and around 10.000 samples (rows), and I would like to know whether MATLAB has any toolbox or function that could return the indices of the features that are actually relevant for the classification.
Training the classifier isn't a problem. The main issue is determining which of these features are worth extracting.
Thanks in advance.
3 Comments
Cedric
on 3 May 2013
I have no experience with this, but if your features are numbers below 65535 you can store the whole array as uint16 in 600 MB of RAM (1.2 GB as uint32, 2.4 GB as uint64 or double). That is not an unreasonable size for using relational operators and logical indexing for data selection and extraction. I could even imagine some pre-processing that would make further data manipulation easier if the selection criteria are complex. Sadly, not many tools, statistical ones in particular, work with uint16, and 2.4 GB worth of doubles is starting to get tricky to process.
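The sizes quoted above follow from rows × columns × bytes per element. A quick sketch of that arithmetic, assuming a dense 10,000-by-30,000 array and decimal gigabytes (1 GB = 1e9 bytes); the function name is made up for illustration:

```python
# Bytes per element for the storage types mentioned in the comment.
BYTES_PER_ELEMENT = {"uint16": 2, "uint32": 4, "uint64": 8, "double": 8}

def array_size_gb(rows, cols, dtype):
    """Size of a dense rows-by-cols array in decimal gigabytes (1 GB = 1e9 bytes)."""
    return rows * cols * BYTES_PER_ELEMENT[dtype] / 1e9

for dtype in BYTES_PER_ELEMENT:
    print(dtype, array_size_gb(10_000, 30_000, dtype), "GB")
```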
Pedro Silva
on 3 May 2013
Greg Heath
on 6 May 2013
Divide and conquer. You are never going to find the best combination anyway. So just find one that works well.
Answers (4)
Image Analyst
on 3 May 2013
0 votes
Well, principal components analysis is the first thing that comes to my mind, right after wondering how you can have 30,000 features. Does 30.000 mean thirty thousand, or thirty? I can't see how you could possibly have 30,000 features unless you're treating whole chunks of the image (groups of pixels) as features. But that doesn't make sense. Maybe for neural networks, but not for feature analysis. I'm sure you can boil it down to a few dozen at most. Surely you must have some idea which of the 30 thousand features are most meaningful and which are useless.
3 Comments
Pedro Silva
on 3 May 2013
Edited: Pedro Silva
on 3 May 2013
Image Analyst
on 3 May 2013
I've never used that many. How did Viola and Jones do it? The paper you cited says they used AdaBoost. Can't you do it the same way? I haven't used AdaBoost, so I can't help you any more.
Pedro Silva
on 3 May 2013
Edited: Pedro Silva
on 3 May 2013
Ilya
on 6 May 2013
0 votes
You may find these two posts useful:
I don't really know what Viola and Jones did, but you could simply train AdaBoost (by calling fitensemble from Statistics Toolbox) and then execute the predictorImportance method on the trained ensemble.
8 Comments
Pedro Silva
on 6 May 2013
Ilya
on 6 May 2013
I am not sure what you mean when you say "if each feature is a predictor itself". If you mean "each tree in the ensemble is grown using one feature only", then - no, predictorImportance is informative not only in this situation. The predictorImportance method returns changes in the Gini index (or other criterion) due to splits on a specific feature summed over all splits and over all trees in this ensemble. You get an informative number if you use several features for one tree. But, as it happens, by default fitensemble grows decision stumps for AdaBoost, that is, decision trees with one split only. And so every tree is grown using one best feature only - the feature selected out of all features by optimizing the Gini index (or other criterion).
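To make the "change in the Gini index due to a split" concrete, here is a small sketch in plain Python. It only illustrates the quantity that gets summed over splits; it is not MATLAB's implementation, and predictorImportance may weight or normalize differently. The function names are hypothetical, and the data are the first feature column and the class column from the sample rows in the question.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(values, labels, threshold):
    """Impurity drop when one feature is split at `threshold` (values below it go left)."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(labels) - weighted

# First feature and class from the four sample rows in the question.
values = [1000, 1200, 3000, 4900]
labels = [1, 1, 2, 2]
print(gini_decrease(values, labels, 2100))  # split that separates the two classes
print(gini_decrease(values, labels, 1100))  # split that isolates one sample only
```

A decision stump keeps the single feature and threshold with the largest such decrease, which is why each stump contributes importance to exactly one feature.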
Pedro Silva
on 6 May 2013
Ilya
on 7 May 2013
I am not an expert on image processing. What you say sounds reasonable enough to give it a try. When it comes to feature selection, questions like "is it possible to select features using this method" do not sound quite right. Sure, it is possible. But is it optimal? It's hard for me to say because I don't know what your constraints are (for example, who says you must use boosted trees?).
If you grow 2000 decision stumps, you will have non-zero importance for at most 2000 features out of your 15k features. If just a few features have very large importance values and the rest have small importance values, you can perhaps comfortably conclude that only these few features are needed. If you see a smooth distribution, how do you know it's OK to cut at 2000 (or any other number)? If you need to convince a statistician that you do not lose accuracy by reducing the feature pool, you need to compare the accuracy of your model for the best N selected features with the accuracy of your model for all features (by performing a formal test, for instance). But perhaps people in your field are not concerned with such issues at all.
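One rough way to look at how the importance is distributed is to rank the features and see how few of them account for most of the total. The sketch below assumes you already have an importance vector; the numbers are invented for illustration, and the 90% share is an arbitrary cut-off, not a recommendation. It does not replace the accuracy comparison described above.

```python
def features_for_share(importance, share):
    """Indices of the fewest top-ranked features whose importance reaches `share` of the total."""
    total = sum(importance)
    if total <= 0:
        return []
    order = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    picked, running = [], 0.0
    for i in order:
        if importance[i] <= 0:
            break  # remaining features were never used in a split
        picked.append(i)
        running += importance[i]
        if running >= share * total:
            break
    return picked

# Hypothetical importance values: two dominant features, a few weak ones, the rest zero.
importance = [0.0, 5.0, 0.1, 0.0, 4.0, 0.2, 0.0, 0.1]
nonzero = sum(1 for v in importance if v > 0)
print(nonzero)                               # features with non-zero importance
print(features_for_share(importance, 0.9))   # smallest set covering 90% of the total
```

If the returned set is tiny compared with the number of non-zero entries, the importance is concentrated in a few features. If it is nearly as large, the distribution is smooth and the cut-off question above still stands.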
Pedro Silva
on 7 May 2013
Pedro Silva
on 7 May 2013
Edited: Pedro Silva
on 7 May 2013
Ilya
on 7 May 2013
Each binary stump selects one best feature, based on a split criterion such as the Gini index. Some features can be selected many times and some may never be selected. If you have only one informative feature and the remaining 14,999 features are noise, that one feature may be selected for all 2000 stumps and the rest may not be selected for any stump.
As I wrote earlier, if you grow 2000 trees with fitensemble at the default settings, you will get at most 2000 features with non-zero importance. If you expect at least 2000 useful features, this means you expect each stump to select a feature different from all other stumps. What might this expectation be based on?
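The point about stumps can be tried on a toy problem. What follows is a from-scratch sketch of discrete AdaBoost with one-split stumps in plain Python. It is not fitensemble, it picks stumps by weighted error rather than by the Gini index, and it codes the classes as +1/-1 instead of 1/2. The data are made up: feature 0 equals the label and the other five features carry no class information.

```python
import math

def best_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) for the best one-split tree."""
    best = (float("inf"), None, None, None)
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        for threshold in sorted(set(column)):
            # Polarity +1 predicts +1 when value >= threshold and -1 otherwise.
            err = sum(wi for xi, yi, wi in zip(column, y, w)
                      if (1 if xi >= threshold else -1) != yi)
            # Polarity -1 flips every prediction, so its error is 1 - err.
            for polarity, e in ((1, err), (-1, max(0.0, 1.0 - err))):
                if e < best[0]:
                    best = (e, j, threshold, polarity)
    return best

def adaboost_selection_counts(X, y, n_rounds):
    """Run discrete AdaBoost with stumps and count how often each feature is picked."""
    n = len(X)
    w = [1.0 / n] * n
    counts = [0] * len(X[0])
    for _ in range(n_rounds):
        err, j, threshold, polarity = best_stump(X, y, w)
        if err >= 0.5:
            break  # no stump beats chance on the current weights
        counts[j] += 1
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        preds = [polarity * (1 if row[j] >= threshold else -1) for row in X]
        # Raise the weight of misclassified samples, lower it for correct ones.
        w = [wi * math.exp(-alpha * yi * pi) for wi, yi, pi in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return counts

# Toy data (hypothetical): feature 0 equals the label, features 1..5 are uninformative.
n = 20
y = [1 if i < n // 2 else -1 for i in range(n)]
X = [[y[i]] + [(i * (k + 2)) % 5 for k in range(5)] for i in range(n)]

counts = adaboost_selection_counts(X, y, n_rounds=8)
print(counts)                            # how many stumps used each feature
print(sum(1 for c in counts if c > 0))   # features with a non-zero count
```

In this deliberately extreme case the same feature wins all eight rounds, so only one feature ends up with a non-zero count. In general the number of features used can never exceed the number of stumps grown.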
I agree - you should invest some time into understanding what AdaBoost and fitensemble do.
Pedro Silva
on 8 May 2013
Anand
on 8 May 2013
0 votes
If you have the latest release of the Computer Vision System Toolbox, there's a way to train a classifier using the Viola-Jones approach, i.e. Haar-like features with an AdaBoost learning framework. You might want to look at this:
fatima qureshi
on 14 Jan 2016
0 votes
How does AdaBoost decide which features are relevant and which are irrelevant?