Image Classification: Color Histogram & KNN Classifier

Dennis Tran on 3 May 2015
Commented: Nalini Vishnoi on 5 May 2015
Hello,
I want to classify images using color histograms and a KNN classifier. I have a dataset of 100 images for each class (butterfly, dog, cat) in a folder. My understanding of the problem is as follows:
1) Read in the images and create a color histogram (RGB) for each.
2) Find the k-means for R, G, and B for each image.
3) Cluster the k-means points separately for each class and find the centroid. (For the butterfly class, each image gives me a k-means value for R, G, and B. I plot all the R k-means values and find the centroid, then do the same for G and B.)
4) Read in the test image, create a color histogram, find the k-means values for R, G, and B, then use the Euclidean distance for each to find the nearest cluster for R, G, and B.
Is this how it is supposed to be done or am I not understanding this correctly?

Nalini Vishnoi on 4 May 2015
Hi Dennis,
The steps in your algorithm seem correct. However, k-means clustering discards a lot of information, and in practice color histograms may not be strong enough to discriminate between classes (this depends heavily on your data set). It might be useful to add further features, for example texture or shape. Combined, these features would capture the information that distinguishes the classes from each other.
Dennis Tran on 4 May 2015
Thank you for the reply Nalini,
To start, I would just like to use the color histogram and KNN, then add the other features later.
I skipped the k-means part and just found the mean value for each color channel of each image, since I didn't want to cluster them (no point).
I am currently stuck on the "centroid" part of each color channel. Is that the same as mean?
Nalini Vishnoi on 5 May 2015
If you are just finding the mean value, the algorithm is NN/1-NN (nearest neighbor, here against a single mean per class) rather than K-NN. The mean and the centroid should be the same. You may find this link useful.
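To make the "mean equals centroid" point concrete, here is a minimal sketch in Python (standard library only, used purely for illustration rather than MATLAB). The per-image mean (R, G, B) values are made up, and the helper names `centroid` and `nearest_centroid` are hypothetical, not functions from any toolbox.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def nearest_centroid(query, centroids):
    """Return the label whose centroid has the smallest Euclidean distance to query."""
    return min(centroids, key=lambda label: math.dist(query, centroids[label]))

# Made-up mean (R, G, B) values, one tuple per training image (illustration only).
train = {
    "butterfly": [(200, 120, 40), (220, 140, 60)],
    "cat": [(60, 50, 40), (80, 70, 60)],
}

# One centroid per class: simply the mean of that class's feature vectors.
centroids = {label: centroid(vectors) for label, vectors in train.items()}

# Classify a query image by its own mean (R, G, B) values.
predicted = nearest_centroid((210, 125, 55), centroids)
print(centroids["butterfly"], predicted)
```

With only one reference point per class, there is nothing for a K greater than 1 to vote over, which is why this reduces to a nearest-neighbor lookup against the class means.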

Image Analyst on 5 May 2015
There is no way that will correctly classify the animals UNLESS all your cats are the "same" color, all the dogs are the "same" color, and all the butterflies are the same color, and there is little other clutter in the background. If you assumed all your cats were black, and all your butterflies were orange and black monarchs, and you presented an orange/ginger tabby cat, your algorithm might say the cat was a butterfly.
Dennis Tran on 5 May 2015
Edited: Dennis Tran on 5 May 2015
Yes, I understand that the color histogram isn't the only feature I should have. I understand that, say, a checkerboard will match a board that is half black and half red exactly, but I just want to be able to use this feature with KNN as a start. I will be using a training set of 80 images and a test set of 20 images for each category. I will try to choose the training set so that it covers the boundary cases.
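The checkerboard case can be checked directly. The following is a small Python sketch (standard library only, not MATLAB) that builds two toy 8x8 "images" as nested lists of RGB tuples; the board size and the helper name `color_counts` are assumptions chosen for illustration.

```python
from collections import Counter

BLACK, RED = (0, 0, 0), (255, 0, 0)

def color_counts(image):
    """Count how often each color occurs; pixel positions are discarded."""
    return Counter(pixel for row in image for pixel in row)

size = 8

# Alternating black and red squares.
checkerboard = [
    [BLACK if (r + c) % 2 == 0 else RED for c in range(size)]
    for r in range(size)
]

# Left half black, right half red.
half_and_half = [
    [BLACK if c < size // 2 else RED for c in range(size)]
    for r in range(size)
]

# The two layouts differ, yet their color counts are identical.
same_histogram = color_counts(checkerboard) == color_counts(half_and_half)
print(same_histogram)
```

Any feature built only from color counts inherits this blind spot, because spatial arrangement never enters the histogram.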
I have now read in my images and computed an RGB histogram with 8 bins per channel, giving 8 x 8 x 8 = 512 bins. This gives me an N x 512 matrix, where N is the number of images read.
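For reference, one way to arrive at 512 bins from 8 bins per channel is a joint histogram, where each pixel's (R, G, B) bin triple maps to a single index. The sketch below is Python (standard library only, not MATLAB) and rests on some assumptions: 8-bit channel values in 0..255, a bin count that divides 256, and normalization by pixel count so images of different sizes stay comparable. The function name `rgb_histogram` is hypothetical.

```python
def rgb_histogram(pixels, bins_per_channel=8):
    """Joint RGB histogram as a flat list of bins_per_channel**3 values.

    pixels: iterable of (r, g, b) tuples with 8-bit values (0..255).
    The result is normalized so that its entries sum to 1.
    """
    pixels = list(pixels)
    width = 256 // bins_per_channel           # 32 intensity levels per bin for 8 bins
    hist = [0.0] * bins_per_channel ** 3      # 8 * 8 * 8 = 512 bins
    for r, g, b in pixels:
        index = ((r // width) * bins_per_channel ** 2
                 + (g // width) * bins_per_channel
                 + (b // width))
        hist[index] += 1
    total = len(pixels)
    return [count / total for count in hist] if total else hist

# A toy 4-pixel "image": two red pixels, one blue, one black.
toy_pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 0)]
toy_hist = rgb_histogram(toy_pixels)

# Stacking one histogram per image gives the N x 512 feature matrix.
features = [rgb_histogram(image) for image in [toy_pixels, toy_pixels]]
print(len(features), len(features[0]))
```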
Now I am stuck on how to use the KNN classification function in MATLAB.
I know I need to compare the query image to my dataset using the Euclidean distance, sort the distances, and use the smallest one as my answer, but how do I retrieve the category name that belongs to the data point with the smallest distance? I haven't stored the class name with the histogram data.
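A common way to handle the missing class names is to keep a second list of labels in the same order as the rows of the feature matrix, so that row i of the histograms and entry i of the labels describe the same image. The sketch below shows the idea in Python (standard library only, not MATLAB). The function name `knn_classify`, the tiny 2-D feature vectors standing in for the 512-bin histograms, and the choice of k are all assumptions for illustration.

```python
import math
from collections import Counter

def knn_classify(query, features, labels, k=3):
    """Majority vote over the k training rows closest to query (Euclidean).

    features[i] and labels[i] must describe the same training image.
    Ties in the vote are not treated specially in this sketch.
    """
    # Sort row indices by distance so each distance stays tied to its label.
    order = sorted(range(len(features)),
                   key=lambda i: math.dist(query, features[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D feature vectors in place of the real 512-bin histograms.
features = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8), (0.5, 0.5)]
# Labels stored in the same row order as the features.
labels = ["butterfly", "butterfly", "cat", "cat", "dog"]

query = (0.85, 0.15)
print(knn_classify(query, features, labels, k=1))  # nearest single neighbor
print(knn_classify(query, features, labels, k=3))  # vote over three neighbors
```

The same bookkeeping carries over to MATLAB: build the label array while reading the images from each class folder, then use the index of the smallest distance (or of the k smallest) to look up the class names.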