K-mode clustering algorithm to cluster categorical data?
Has anyone come across a k-modes script for MATLAB? I've seen people respond with links to supervised learning algorithms, but I need an unsupervised one. Even pseudocode would be okay, so I can build it myself.
I'm using R2017b.
I'm really trying to avoid using R.
Answers (1)
Image Analyst
on 11 Aug 2018
0 votes
I can't imagine why you'd use kmeans with categorical data. If the data is categorical, you can simply use the category itself to classify each data point, right?
4 Comments
Dankur Mcgoo
on 12 Aug 2018
Image Analyst
on 12 Aug 2018
Now, since you're not clarifying anything, I'll pose an example. Let's say that you have car makes (manufacturers) and your data is categorical, like Ford, GM, VW, BMW, Toyota, Nissan, Kia, and Jaguar (8 makes). Now, how would you cluster that? Let's say you had anywhere from 1,000 to 10,000 counts for each car make and you wanted to find the "clusters". Well, how about 2 clusters? OK, then which makes would you group into each cluster? With no other information, there is not enough to decide what makes a cluster. You could cluster by make, in which case it would make sense to have 8 clusters, one for each unique make. Or, if you want, you could use categorical() and cluster based on some other factor, like the count in each category, so that you could have classes of "sold many" or "sold few".
I attach an example where I use kmeans to cluster an image. You could, if you want, consider that the gray levels are like a set of 256 categories and the clusters/classes are 2 (or however many you specify) gray level ranges.
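To make the count-based grouping above concrete, here is a minimal sketch of one-dimensional k-means applied to per-make counts. It is written in Python as runnable pseudocode rather than MATLAB, the counts are made-up numbers, and the function name kmeans_1d and its evenly spaced starting centers are assumptions chosen for illustration, not anything from the attached example.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on scalar values; returns (labels, centers)."""
    lo, hi = min(values), max(values)
    # Assumed deterministic start: centers evenly spaced from min to max,
    # so cluster 0 always begins as the lowest-valued cluster.
    if k > 1:
        centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    else:
        centers = [sum(values) / len(values)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: abs(v - centers[j]))
                      for v in values]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers


# Hypothetical counts per make (invented for this sketch).
counts = {"Ford": 9500, "GM": 8800, "VW": 7200, "BMW": 2100,
          "Toyota": 9900, "Nissan": 6400, "Kia": 1800, "Jaguar": 1000}

labels, centers = kmeans_1d(list(counts.values()), k=2)
groups = {make: ("sold few" if lab == 0 else "sold many")
          for make, lab in zip(counts, labels)}
print(groups)
```

The same routine would apply to the gray-level idea: treat each pixel's gray level as the scalar value and let k be the number of gray-level ranges you want.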
Dankur Mcgoo
on 12 Aug 2018
Edited: Image Analyst
on 12 Aug 2018
Image Analyst
on 12 Aug 2018
I'm not an expert on questionnaires, though we have many statisticians in our company who spend their whole lives doing that. I'd suggest you try the Classification Learner app and pick the best one. Also check out this page: https://www.mathworks.com/help/stats/machine-learning-in-matlab.html. Your problem is unsupervised learning because you have data but no ground truth: you don't know the classes/groupings of any of the data points in advance.
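Since the original question asked for k-modes pseudocode, here is a minimal sketch of the usual formulation: k-means with the Euclidean distance replaced by a count of mismatched attributes, and the mean replaced by the per-attribute mode. It is written in Python as runnable pseudocode rather than MATLAB. The function names, the choice of the first k distinct records as starting modes, and the questionnaire-style records are all assumptions made for illustration.

```python
from collections import Counter


def mismatches(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def kmodes(records, k, max_iter=100):
    """Cluster tuples of categorical values; returns (labels, modes)."""
    # Assumed deterministic start: the first k distinct records.
    modes = []
    for r in records:
        if r not in modes:
            modes.append(r)
        if len(modes) == k:
            break
    if len(modes) < k:
        raise ValueError("fewer than k distinct records")
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to the mode it disagrees
        # with on the fewest attributes (ties go to the lowest index).
        new_labels = [min(range(k), key=lambda j: mismatches(r, modes[j]))
                      for r in records]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each mode takes the most frequent value of every
        # attribute among the records assigned to that cluster.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    return labels, modes


# Hypothetical questionnaire answers (invented for this sketch).
records = [
    ("yes", "low", "urban"),
    ("no", "high", "rural"),
    ("yes", "low", "rural"),
    ("no", "high", "urban"),
    ("yes", "medium", "urban"),
    ("no", "high", "rural"),
]

labels, modes = kmodes(records, k=2)
print(labels)
print(modes)
```

Like k-means, the result depends on the starting modes, so in practice you would likely run it from several different starts and keep the run with the lowest total mismatch count.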