My goal is to find an insect in a fruit image using k-means segmentation. When I run it with 4 clusters, the cluster index values change on every run: sometimes the insect ends up in the 1st cluster, sometimes in the 2nd, and so on. How can I fix this? My code is:

close all;
he = imread('deseaseleaf(3).jpg');
figure;
imshow(he), title('Original Leaf Image');
text(size(he,2),size(he,1)+15,...
'Image courtesy ____________ , Analysed by Debasmita Bhoumik, University of Calcutta', ...
'FontSize',7,'HorizontalAlignment','right');
cform = makecform('srgb2lab');
lab_he = applycform(he,cform);
ab = double(lab_he(:,:,2:3));
nrows = size(ab,1);
ncols = size(ab,2);
ab = reshape(ab,nrows*ncols,2);
nColors = 4;
% repeat the clustering 4 times to avoid local minima
[cluster_idx, cluster_center] = kmeans(ab,nColors,'distance','sqEuclidean','Replicates',4);
pixel_labels = reshape(cluster_idx,nrows,ncols);
figure;
imshow(pixel_labels,[]), title('image labeled by cluster index');
segmented_images = cell(1,nColors);
rgb_label = repmat(pixel_labels,[1 1 3]);
for k = 1:nColors
    color = he;
    color(rgb_label ~= k) = 0;
    segmented_images{k} = color;
end
figure;
imshow(segmented_images{1}), title('objects in cluster 1');
figure;
imshow(segmented_images{2}), title('objects in cluster 2');
figure;
imshow(segmented_images{3}), title('objects in cluster 3');
figure;
imshow(segmented_images{4}), title('objects in cluster 4');
mean_cluster_value = mean(cluster_center,2);
[tmp, idx] = sort(mean_cluster_value);
blue_cluster_num = idx(2);
L = lab_he(:,:,1);
blue_idx = find(pixel_labels == blue_cluster_num);
L_blue = L(blue_idx);
is_light_blue = im2bw(L_blue,graythresh(L_blue));
nuclei_labels = repmat(uint8(0),[nrows ncols]);
nuclei_labels(blue_idx(is_light_blue==false)) = 1;
nuclei_labels = repmat(nuclei_labels,[1 1 3]);
blue_nuclei = he;
blue_nuclei(nuclei_labels ~= 1) = 0;
figure
imshow(blue_nuclei), title('1st nuclei');

Answers (2)

Walter Roberson on 11 Mar 2016
kmeans has no idea what the data means, so there is no way it can know that one variety of data should go into the first cluster specifically.
kmeans() also uses random initialization, and with 'Replicates',4 you have specifically asked it to try 4 different random starting points. Because of that, even if all 4 attempts broke the data up into exactly the same subsets, the "first" cluster could have been seeded around any of the locations, so you cannot assume that the insect will be in the first cluster, even on two different runs on the same image.
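To illustrate the point about arbitrary numbering, here is a minimal sketch of one way to make the labels repeatable: relabel the clusters after kmeans returns, ordered by some property of their centroids. The synthetic `ab` data and the choice of the a* coordinate as the sort key are assumptions made for this example and are not part of the original code. This only makes the numbering consistent; it does not tell you which cluster is the insect.

```matlab
% Synthetic stand-in for the reshaped a*b* data (assumption: 4 well-separated colors).
rng(0);                                  % seed the generator so repeated runs give the same result
centers_true = [110 120; 130 160; 150 125; 170 150];
ab = repelem(centers_true, 50, 1) + randn(200, 2);

nColors = 4;
[cluster_idx, cluster_center] = kmeans(ab, nColors, ...
    'Distance', 'sqeuclidean', 'Replicates', 4);

% Relabel so that cluster 1 always has the smallest a* centroid, cluster 2 the next, etc.
[~, order] = sort(cluster_center(:, 1)); % order(k) = old label that becomes new label k
new_label = zeros(nColors, 1);
new_label(order) = 1:nColors;            % new_label(old) = new
cluster_idx = new_label(cluster_idx);
cluster_center = cluster_center(order, :);
```

Seeding with `rng` gives identical results only for the same image and settings, while sorting by centroid keeps the numbering tied to color across images, as far as the chosen sort key is meaningful for them.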
You will need to use some other information to decide what the data in the cluster means. You might find regionprops() useful for examining features of each of the areas. regionprops can accept labeled images directly, by the way.
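As a rough sketch of the regionprops() suggestion, the example below measures each cluster of a label image and picks one by a feature instead of by its index. The synthetic `pixel_labels` and `gray` arrays, and the rule "the insect is the cluster with the smallest area", are hypothetical stand-ins for this example; a real image would need a rule chosen from what actually distinguishes the insect.

```matlab
% Synthetic 4-label image standing in for pixel_labels (assumption for illustration).
pixel_labels = ones(60, 80);
pixel_labels(:, 41:80) = 2;
pixel_labels(41:60, :) = 3;
pixel_labels(10:14, 10:15) = 4;          % small dark blob playing the role of the insect
gray = 200 * ones(60, 80);
gray(pixel_labels == 4) = 30;

% With a label matrix, each label is treated as one region,
% even if its pixels are not connected.
stats = regionprops(pixel_labels, gray, 'Area', 'MeanIntensity');

% Hypothetical decision rule: the insect is the cluster with the smallest area.
[~, insect_cluster] = min([stats.Area]);
insect_mask = (pixel_labels == insect_cluster);
```

Because the choice depends on the measured features, it gives the same answer no matter which index kmeans happened to assign to that cluster.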

Image Analyst on 11 Mar 2016
kmeans will be no good for determining if a picture has fruit flies in it or not. Do you know why? Think about it. Think about what kmeans does and how it operates and then you should realize.
  4 Comments
Image Analyst on 11 Mar 2016
All you did was segment the color image into 4 color classes, and I presume you're then going to assume one of those classes is fruit flies. Don't you see the flaw in that? First of all, your segmentation using kmeans is not what the authors did when they did steps 3 and 4. Their segmentation must have been different from yours. Let's say that you were able to find fruit flies from their color, like they glowed blue under fluorescent light or something. OK, what if there are no flies there at all? You're telling your algorithm to find 4 color classes there, but the blue class is missing. kmeans will still find something, because you told it that it must find 4 classes, so it will find some bogus color. Maybe the 4th class it finds is brown spots or something. And since you assume, say, that class 4 is blue fruit flies, when none were there, you're now going to assume that the brown spots are flies, when they're NOT. That's what I was hoping you'd think through and realize.
What the authors must have done was to find fruit fly blobs. They may have done this by color, size, shape, or whatever. But whatever they did, they have a feature vector for each blob listing area, color, roundness, or whatever. And it is that feature vector that they pass into kmeans, not the color image itself!
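A minimal sketch of that idea, clustering per-blob feature vectors rather than pixels, could look like the following. The binary mask `bw`, the two features chosen (log of area and eccentricity), and k = 2 are all assumptions for illustration; the authors' actual features and steps are not shown in this thread.

```matlab
rng(1);                                  % seed so the kmeans result is repeatable

% Synthetic binary mask of already-segmented blobs (assumption for illustration).
bw = false(100, 100);
bw(5:8, 5:8) = true;                     % three small blobs, area 16 each
bw(5:8, 20:23) = true;
bw(20:23, 5:8) = true;
bw(40:69, 40:69) = true;                 % two large blobs, areas 900 and 750
bw(72:96, 5:34) = true;

% One feature vector per blob: [log(area), eccentricity].
stats = regionprops(bw, 'Area', 'Eccentricity');
features = [log([stats.Area]'), [stats.Eccentricity]'];

% Cluster the blobs, not the pixels.
blob_group = kmeans(features, 2, 'Replicates', 5);
```

The earlier caveat still applies: kmeans will return 2 groups even if every blob is the same kind of object, so the groups need to be checked against their feature values before one of them is called "flies".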
One problem with automatic methods like kmeans, Otsu thresholding (graythresh()), etc. is that they must find something. But you need your algorithm to work whatever amount of the material is present, from 0% to 100% coverage. So things like kmeans and graythresh are often forced to pick some value(s), and those values will be bogus. So it's often smarter to use a fixed threshold or some kind of more intelligent segmentation routine than these simple-minded techniques. They could work for a narrow range, like when you know that 20-30% of your image will always be covered in flies. But they will not work in general for any range whatsoever, which is what I think you'd want for maximum robustness.
I hope that better explains why I said that kmeans, the way you used it, will be no good for finding flies in your image.
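The "must find something" problem can be seen in a small sketch that compares graythresh() with a fixed threshold on an image containing no flies at all. The synthetic image and the value 0.3 are assumptions for this example; a usable fixed threshold would have to be tuned on real images.

```matlab
rng(2);                                  % seed so the synthetic noise is repeatable

% Synthetic grayscale "fruit" with mild noise and no dark insect pixels at all.
clean = 0.7 + 0.02 * randn(64, 64);
clean = min(max(clean, 0), 1);           % keep values in [0, 1]

otsu_level = graythresh(clean);          % Otsu always returns some level
otsu_mask = clean < otsu_level;          % so part of the noise gets flagged as "dark"

fixed_level = 0.3;                       % hypothetical fixed threshold
fixed_mask = clean < fixed_level;        % empty when nothing is dark enough

fprintf('Otsu flags %d pixels, fixed threshold flags %d pixels\n', ...
    nnz(otsu_mask), nnz(fixed_mask));
```

The fixed threshold reports nothing when nothing is there, which is the behavior you need at 0% coverage.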
