How to calculate the within-group sum of squares for kmeans?

I have a data set with 318 data points and 11 attributes, so my matrix is 318-by-11. I am trying to find the best number of clusters for this data set. I started from the rule of thumb k ≈ sqrt(n/2), which gives sqrt(318/2) ≈ 12.6, so I started from 12. I am using MATLAB's default kmeans function. For example, with my data stored in X (a 318-by-11 matrix), I run kmeans like this:
kmeans(X,12)
How do I calculate the sum of squares so that I can find the optimum number of clusters for my data set?

 Accepted Answer

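You have to run kmeans over the range of cluster counts, saving the optional third output, sumd, for each case. sumd holds the within-cluster sums of point-to-centroid distances; with the default 'sqeuclidean' distance these are sums of squared distances, so the total within-cluster sum of squares is
sum(sumd)
for each run. Probably simplest is to just use a loop...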
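nClusters=15; % largest number of clusters you're going to try
totSum=zeros(nClusters,1); % preallocate the result as a vector (zeros(nClusters) would be an n-by-n matrix)
for i=1:nClusters
[~,~,sumd]=kmeans(X,i); % sumd: within-cluster sums of point-to-centroid distances
totSum(i)=sum(sumd); % total over all clusters for this i
end
plot(totSum) % plot of totals versus number of clusters (same as index)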
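Because kmeans starts from randomly chosen initial centroids, repeated runs with the same number of clusters can settle on different local minima and return different totals. A minimal sketch of one way to steady the curve, assuming a fixed random seed and the 'Replicates' option are acceptable; the synthetic X below is only a stand-in for the real 318-by-11 data and maxK is an illustrative name:

```matlab
% Stand-in data (an assumption, not the asker's data): three synthetic
% blobs stacked into a 318-by-11 matrix.
rng(1);                                   % fix the seed so runs are repeatable
X = [randn(106,11); randn(106,11)+4; randn(106,11)-4];

maxK   = 15;                              % largest number of clusters to try
totSum = zeros(maxK,1);                   % one total per cluster count
for k = 1:maxK
    % 'Replicates' reruns kmeans from new starting centroids and keeps
    % the solution with the lowest total sum of distances.
    [~,~,sumd] = kmeans(X,k,'Replicates',20);
    totSum(k)  = sum(sumd);
end
plot(1:maxK,totSum,'-o')
xlabel('Number of clusters')
ylabel('Total within-cluster sum of squares')
```

Keeping the best of several replicates, rather than averaging separate runs, compares each cluster count at its lowest total found, which tends to make any elbow easier to see.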

3 Comments

Thanks, I fixed this.
But I have another question. For the same number of clusters, why does the sum change on every run? That makes it difficult for me to identify the best number of clusters. The plot I obtained looks like this: (image not shown)
I continued up to 23 clusters, but there is still no gradual decrease. For this plot I ran kmeans 20 times for each number of clusters and took the average. Why are the values always changing? Please help me choose the correct number of clusters.
That would, I believe, depend entirely on the characteristics of the data set. If there are no real groupings, then it is simply measuring (roughly) the variance within bins about their means, which clearly will continue to decrease as the bins get smaller. What if you reduce the number of clusters instead of continuing to increase it? Have you looked at the plot of the results and the silhouette plot to get a visual "feel" for the data?
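A silhouette plot for a single choice of k can be sketched as follows. The synthetic X is a stand-in for the real data, and k = 3 is chosen only because the stand-in has three blobs; both are assumptions for illustration:

```matlab
% Stand-in data (an assumption): three synthetic blobs, 318-by-11.
rng(1);
X = [randn(106,11); randn(106,11)+4; randn(106,11)-4];

k   = 3;                                  % illustrative cluster count
idx = kmeans(X,k,'Replicates',5);         % cluster index for every point

figure
[s,h] = silhouette(X,idx);                % plots and returns silhouette values
meanS = mean(s);                          % average silhouette width
title(sprintf('k = %d, mean silhouette = %.2f',k,meanS))
```

Silhouette values near 1 indicate points that sit well inside their cluster; many values near 0 or below suggest the chosen k does not match any real grouping.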
Bikram Kawan, your post is a bit old. However, the elbow method is a bit ambiguous. You can try the average silhouette method to find the optimal number of clusters; see the link below. https://www.mathworks.com/help/stats/clustering.evaluation.silhouetteevaluation-class.html
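A minimal sketch of the average silhouette approach using evalclusters from the Statistics and Machine Learning Toolbox. The synthetic X is again a stand-in for the real data, and the candidate range 2:15 is an assumption (the silhouette is not meaningful for a single cluster):

```matlab
% Stand-in data (an assumption): three synthetic blobs, 318-by-11.
rng(1);
X = [randn(106,11); randn(106,11)+4; randn(106,11)-4];

% Evaluate kmeans solutions for each candidate cluster count using the
% silhouette criterion.
eva   = evalclusters(X,'kmeans','silhouette','KList',2:15);
bestK = eva.OptimalK;                     % count with the highest criterion value

plot(eva)                                 % criterion value versus number of clusters
fprintf('Suggested number of clusters: %d\n',bestK);
```

The returned object also exposes the per-count scores in eva.CriterionValues, which is useful when several counts score almost equally.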


Asked: 21 Jun 2015
Edited: 1 Jul 2018
