Initial centroids selection - Kmeans

Salad Box on 26 Sep 2019
Edited: Adam on 26 Sep 2019
Hi,
Am I allowed to choose k initial centroids that are not contained in the original data set, in other words, without using random sampling?
For instance, in the two graphs below, the coloured points in the middle are my original data set.
  • In the left graph, the 5 red points are the initial centroids I selected using my own method.
  • In the right graph, the initial centroids are evenly distributed on the magenta circle. Note that although my original data are all positive numbers, some initial centroids will have negative values in this case, depending on where they sit on the circle.
I wonder whether I have made any fundamental mistakes that I am not yet aware of in selecting initial centroids with the two methods proposed above.
Even if there are no fundamental mistakes, are there any disadvantages to selecting initial centroids in these two ways?

Accepted Answer

Adam on 26 Sep 2019
Edited: Adam on 26 Sep 2019
doc kmeans
shows the
idx = kmeans(X,k,Name,Value)
function signature. If you look at the options for the Name,Value pairs, you will see that 'Start' lets you pass in your own starting positions as a k-by-p numeric matrix, one row per centroid.
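As a minimal sketch of what that looks like, assuming made-up two-dimensional data: X and C0 below are illustrative values only, not the data from the question. Each row of C0 is one starting centroid, and none of the rows is a point taken from X.

```matlab
% Hypothetical data: 200 points with strictly positive coordinates.
rng(0);                  % fix the seed so the example data are reproducible
X = rand(200, 2) + 1;    % every coordinate lies between 1 and 2

% Hand-picked starting centroids, one per row (k-by-p, here 3-by-2).
% These are assumed values chosen for illustration; none is a row of X,
% and some coordinates lie outside the range of the data.
C0 = [1.0 1.0
      2.2 1.0
      1.5 2.3];
k = size(C0, 1);

% 'Start' accepts the numeric matrix directly, so no random sampling
% of the data is used to initialise the centroids.
[idx, C] = kmeans(X, k, 'Start', C0);
```

Here idx holds the cluster index of each row of X and C holds the final centroid locations, which will generally differ from C0.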
As for what counts as a valid choice, the simplest way is to try your starting points and find out. In some cases they may not converge to where you want; in others they may. Without random initialisation the algorithm is fully deterministic, though, so a single run gives you the one answer for each set of starting points (although there are, of course, infinitely many ways to place evenly distributed points around that circle).
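One way to try this out is sketched below, again on made-up data. The circle centre (the data mean), the radius factor of 3, and the helper name startOnCircle are all assumptions for illustration, not the construction used in the question. The sketch builds two of the many possible evenly spaced placements by rotating the starting angles, and runs the first placement twice so the results can be compared.

```matlab
rng(0);
X = rand(200, 2) + 1;                     % hypothetical all-positive data
k = 5;

ctr   = mean(X, 1);                       % assumed circle centre: the data mean
r     = 3 * max(vecnorm(X - ctr, 2, 2));  % assumed radius: 3x the farthest point
theta = 2*pi*(0:k-1)'/k;                  % k evenly spaced angles

% Hypothetical helper: k evenly spaced points on the circle, rotated by offset.
startOnCircle = @(offset) ctr + r*[cos(theta + offset), sin(theta + offset)];
C0a = startOnCircle(0);
C0b = startOnCircle(pi/k);                % same circle, different placement

[idxA1, CA1] = kmeans(X, k, 'Start', C0a);
[idxA2, CA2] = kmeans(X, k, 'Start', C0a);   % identical start, second run
[idxB,  CB ] = kmeans(X, k, 'Start', C0b);   % rotated start

% Compare the two runs from the same start, and check whether any
% starting coordinate is negative even though all data are positive.
sameStartSameResult = isequal(idxA1, idxA2);
hasNegativeStart    = any(C0a(:) < 0);
disp(sameStartSameResult);
disp(hasNegativeStart);
```

Comparing CA1 with CB shows whether the rotation changes the final clustering for a given data set. If a starting centroid ends up with no points assigned to it, kmeans handles the empty cluster according to its 'EmptyAction' option, so that setting is worth checking when starting points lie far from the data.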
