how to reverse normalized data after kmeans clustering in matlab


I built a k-means clustering model on data that I normalized first. The model gives me the cluster centers, but they are in normalized units (for example, the center for income is -0.5).
I want to convert that -0.5 back into a non-normalized value so that I can give it a practical meaning.
original_data = [...
22704 94
63575 81
25026 72
31510 88
21864 90
32162 95
31585 95
20126 92
39525 97
58691 87
34870 91
28052 89
15122 94
10185 80
30220 95
9066 69
36450 93
8704 67
15140 78
38380 87
15470 85
27553 90
13349 92
11857 71
43514 96];
normalized data
-0.324716668157071 0.803729170410423
2.50865116793338 -0.631501491036761
-0.163744827110803 -1.62512271819250
0.285756213950352 0.141315018973261
-0.382949375512310 0.362119736118982
0.330955886802275 0.914131528983283
0.290955562821355 0.914131528983283
-0.503435620016365 0.582924453264702
0.841393296631591 1.13493624612900
2.17006956945363 0.0309126604004010
0.518687043371308 0.472522094691842
0.0460315686712842 0.251717377546122
-0.850336176689717 0.803729170410423
-1.19259198170497 -0.741903849609621
0.196327413369092 0.914131528983283
-1.27016626686035 -1.95632979391108
0.628219992920448 0.693326811837563
-1.29526179074439 -2.17713451105680
-0.849088332960676 -0.962708566755341
0.762016570534271 0.0309126604004010
-0.826211197928261 -0.189892056745320
0.0114385675162077 0.362119736118982
-0.973248784000240 0.582924453264702
-1.07668116420740 -1.73552507676536
1.11792933191736 1.02453388755614
c =
-0.9534 -1.3412
0.3708 0.5216


Answers (2)

I don't believe that's true in general. kmeans returns centroids in the same units as the data you pass in, so they are only in normalized units if you normalized your data beforehand. Here's proof:
x = [60*rand(100, 1), 90 + 100*rand(100, 1)];
y = [30*rand(100, 1), 60+50*rand(100, 1)];
plot(x, y, '.', 'MarkerSize', 15);
grid on;
[classIndexes, classCentroids] = kmeans([x(:), y(:)], 2)
classIndexes = 200×1
1 1 1 1 1 1 1 1 1 1
classCentroids = 2×2
   30.6948   14.8815
  139.6902   84.3660
% Plot centroids as magenta crosshairs
hold on
plot(classCentroids(1,1), classCentroids(1,2), 'm+', 'MarkerSize', 150, 'LineWidth', 3)
plot(classCentroids(2,1), classCentroids(2,2), 'm+', 'MarkerSize', 100, 'LineWidth', 3)
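For the situation in the question, where the data was normalized before calling kmeans, here is a minimal sketch of mapping the posted centers back to original units. It assumes the normalization was a column-wise z-score (subtract the mean, divide by the sample standard deviation), which is consistent with the normalized values posted in the question, and that the rows of c are clusters with columns in the same order as the data. The names mu, sigma, and cOriginal are my own, not from the question. Since z = (x - mu) / sigma, the inverse is x = z * sigma + mu.

```matlab
% Data from the question: column 1 is income, column 2 is the second variable.
data = [22704 94; 63575 81; 25026 72; 31510 88; 21864 90; ...
        32162 95; 31585 95; 20126 92; 39525 97; 58691 87; ...
        34870 91; 28052 89; 15122 94; 10185 80; 30220 95; ...
         9066 69; 36450 93;  8704 67; 15140 78; 38380 87; ...
        15470 85; 27553 90; 13349 92; 11857 71; 43514 96];

% Normalized cluster centers as posted in the question (rounded to 4 decimals).
c = [-0.9534 -1.3412;
      0.3708  0.5216];

% Assumption: the data was z-scored column by column.
mu    = mean(data, 1);      % 1-by-2 column means
sigma = std(data, 0, 1);    % 1-by-2 sample standard deviations (N-1)

% Invert z = (x - mu) ./ sigma for every center at once.
% Implicit expansion needs R2016b or later.
cOriginal = c .* sigma + mu
```

Because c is rounded, cOriginal is only approximate. If a different normalization was used (for example rescaling to [0, 1]), the inverse formula changes accordingly.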
Try this:
% Initialization steps.
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 20;
markerSize = 30;
data =[...
22704 94
63575 81
25026 72
31510 88
21864 90
32162 95
31585 95
20126 92
39525 97
58691 87
34870 91
28052 89
15122 94
10185 80
30220 95
9066 69
36450 93
8704 67
15140 78
38380 87
15470 85
27553 90
13349 92
11857 71
43514 96];
% Plot original data.
x = data(:, 1);
y = data(:, 2);
subplot(2, 2, 1);
plot(x, y, '.', 'MarkerSize', 15);
grid on;
title('Original, unscaled and unclassified data', 'FontSize',fontSize)
% First do kmeans with original data.
[classIndexes, classCentroids] = kmeans([x(:), y(:)], 2)
subplot(2, 2, 2);
plot(x(classIndexes == 1), y(classIndexes == 1), 'r.', 'MarkerSize', markerSize);
hold on;
plot(x(classIndexes == 2), y(classIndexes == 2), 'b.', 'MarkerSize', markerSize);
grid on;
% Plot centroids as magenta crosshairs
hold on
plot(classCentroids(1,1), classCentroids(1,2), 'm+', 'MarkerSize', 150, 'LineWidth', 3)
plot(classCentroids(2,1), classCentroids(2,2), 'm+', 'MarkerSize', 100, 'LineWidth', 3)
caption = sprintf('2 clusters and their centroids\nas determined from original (NOT normalized) data')
title(caption, 'FontSize',fontSize)
% Now kmeans with normalized data
[Nx, Cx, Sx] = normalize(x)
[Ny, Cy, Sy] = normalize(y)
[classIndexesN, classCentroidsN] = kmeans([Nx(:), Ny(:)], 2);
% Plot normalized data in lower left plot.
subplot(2, 2, 3);
plot(Nx(classIndexesN == 1), Ny(classIndexesN == 1), 'r.', 'MarkerSize', markerSize);
hold on;
plot(Nx(classIndexesN == 2), Ny(classIndexesN == 2), 'b.', 'MarkerSize', markerSize);
grid on;
% Plot centroids as magenta crosshairs over normalized data.
hold on
x1N = classCentroidsN(1,1);
y1N = classCentroidsN(1,2);
x2N = classCentroidsN(2,1);
y2N = classCentroidsN(2,2);
plot(x1N, y1N, 'm+', 'MarkerSize', 150, 'LineWidth', 3)
plot(x2N, y2N, 'm+', 'MarkerSize', 100, 'LineWidth', 3)
caption = sprintf('2 clusters and their centroids\nas determined from normalized data')
title(caption, 'FontSize',fontSize)
% Now unnormalize the location of the classCentroids
x1 = x1N * Sx + Cx
y1 = y1N * Sy + Cy
x2 = x2N * Sx + Cx
y2 = y2N * Sy + Cy
% Plot original data with class colors.
subplot(2, 2, 4);
plot(x(classIndexesN == 1), y(classIndexesN == 1), 'r.', 'MarkerSize', markerSize); % Plot class 1 in red
grid on;
hold on;
plot(x(classIndexesN == 2), y(classIndexesN == 2), 'b.', 'MarkerSize', markerSize); % Plot class 2 in blue
plot(x1, y1, 'm+', 'MarkerSize', 100, 'LineWidth', 3)
plot(x2, y2, 'm+', 'MarkerSize', 100, 'LineWidth', 3)
caption = sprintf('2 clusters and their centroids from normalized data,\nplotted in original (unnormalized) units')
title(caption, 'FontSize',fontSize)
Here I show you how to normalize the data, run kmeans, and then unnormalize the centroids back to their original, unnormalized locations. But you need to realize that the class kmeans assigns to a point depends on whether the data was normalized or not. Compare the upper right and lower left plots: which dots are in each class changes. The classes assigned are different in the two cases, and thus the centroids are in different locations. In a case like this, where the x values are so huge compared to the y values, it's probably best to normalize the data first, and then unnormalize if you want the centroids back in the original scale space.
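The per-column steps above can also be done on the whole matrix at once. This is a rough sketch, not the only way to do it. It assumes R2021a or later (for the centering and scaling outputs of normalize) and the Statistics and Machine Learning Toolbox (for kmeans). The variable names, the seed, and the 'Replicates' setting are my own choices for illustration. As a sanity check, it also compares the unnormalized centroids with the plain per-cluster means of the original data. The two should agree, because a z-score is just a shift and a scale of each column.

```matlab
% Data from the question: column 1 is income, column 2 is the second variable.
data = [22704 94; 63575 81; 25026 72; 31510 88; 21864 90; ...
        32162 95; 31585 95; 20126 92; 39525 97; 58691 87; ...
        34870 91; 28052 89; 15122 94; 10185 80; 30220 95; ...
         9066 69; 36450 93;  8704 67; 15140 78; 38380 87; ...
        15470 85; 27553 90; 13349 92; 11857 71; 43514 96];

% Normalize both columns in one call (z-score is the default method).
% centers and scales are 1-by-2 row vectors, one entry per column.
[dataN, centers, scales] = normalize(data);

rng(0);  % hypothetical seed, only so the run is repeatable
[idx, centroidsN] = kmeans(dataN, 2, 'Replicates', 5);

% Undo the normalization for all centroids at once.
centroidsOrig = centroidsN .* scales + centers

% Cross-check against the means of the original points in each cluster.
clusterMeans = [mean(data(idx == 1, :), 1);
                mean(data(idx == 2, :), 1)];
maxDifference = max(abs(centroidsOrig - clusterMeans), [], 'all')
```

If maxDifference is not close to zero, the centers and scales being applied probably do not match the normalization that was actually used before clustering.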

Asked: 21 Oct 2022

Answered: 28 Oct 2022
