How do I find the optimal number of clusters in a data set using fuzzy c-means? Please answer; this is very important for my study.


Answers (2)

Sai Teja G on 21 Jan 2024
Hi Mamta,
Based on what I have gathered, you are seeking to determine the optimal number of clusters in your dataset by employing the Fuzzy C-Means algorithm. I recommend going through the documentation provided at the link below to gain a comprehensive understanding of the algorithm:
To ascertain the optimal number of clusters, you may need to adjust the settings in "fcmOptions". For detailed information on which specific option to modify to achieve the optimal cluster count, please refer to the documentation:
Hope it helps!

Walter Roberson on 21 Jan 2024
The optimal number of clusters is equal to the number of unique points. When every unique point is the center of its own cluster, the cluster distance is zero.
  3 Comments
Walter Roberson on 5 Jul 2024
% Suppose your data is in a matrix YourData, one observation per row
[Centroids, ~, Cluster_Index] = unique(YourData, 'rows');
Then the optimal number of clusters is
size(Centroids,1)
and the centroids of the clusters are
Centroids
and the cluster numbers are
Cluster_Index
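As a quick check of the snippet above, here is a minimal sketch on a small made-up matrix. The sample values and the names numClusters, offsets, and totalDistance are assumptions for illustration only; the unique call is the one from the comment.

```matlab
% Hypothetical sample data: five 2-D points, two of which are repeated.
YourData = [1 2; 3 4; 1 2; 5 6; 3 4];

% Each distinct row becomes a centroid; Cluster_Index maps rows to centroids.
[Centroids, ~, Cluster_Index] = unique(YourData, 'rows');

% Number of clusters equals the number of distinct rows.
numClusters = size(Centroids, 1);

% Every point coincides with its assigned centroid, so all offsets are zero.
offsets = YourData - Centroids(Cluster_Index, :);
totalDistance = sum(sqrt(sum(offsets.^2, 2)));
```

With this data, numClusters is 3 and totalDistance is 0, which is the situation the answer describes.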
Walter Roberson on 5 Jul 2024
If every unique point is assigned to be its own centroid, then the distance from every point to the nearest centroid is exactly 0, and so the total distance over all of the points will be 0 * number_of_unique_points, which is 0. Therefore each unique point being its own centroid is the optimal clustering.
Suppose you have a different "optimal" clustering, one in which not all points are the centers of their own clusters. Then for at least some points, the distance from the point to its centroid will be non-zero. Distances are never negative, so the total distance will be strictly positive. That would be worse than the exact zero distance achieved above, so by contradiction there cannot be any other "optimal" clustering.
The exception comes if you have some penalty for increasing the number of clusters. For example if you were creating an electrical distribution system in which each centroid corresponded to a distribution station with cost greater than the cost to be a node feeding into the station, then such a system would have a balance that involved having fewer clusters. But the Fuzzy C-means clustering algorithm does not have any way to account for per-cluster penalties, so the Fuzzy C-means clustering algorithm itself is not suitable for finding optimum clusters in the case of per-cluster penalties.
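To make the penalty idea concrete, here is a rough sketch that compares two candidate clusterings under an assumed cost of "total distance to the nearest centroid plus a fixed penalty per cluster". The data, the penalty value, the candidate centroids, and the helper costFcn are all hypothetical and are not part of fcm; this only illustrates how a per-cluster penalty changes which clustering is cheaper.

```matlab
% Hypothetical 1-D data: two tight pairs of points.
x = [0; 0.1; 10; 10.1];
penaltyPerCluster = 1;   % assumed cost of each cluster (illustrative value)

% Candidate A: every unique point is its own centroid.
centersA = unique(x);
% Candidate B: one centroid in the middle of each pair.
centersB = [0.05; 10.05];

% Penalized cost = sum of distances to the nearest centroid
%                  + penalty * number of clusters.
costFcn = @(c) sum(min(abs(x - c.'), [], 2)) + penaltyPerCluster * numel(c);

costA = costFcn(centersA);   % distance term is 0, penalty term is 4
costB = costFcn(centersB);   % distance term is about 0.2, penalty term is 2
```

Under these assumptions costB is lower than costA, so the two-cluster solution wins once clusters carry a cost, even though its distance term is no longer zero.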
