## Clustering via Jenks Natural Breaks

Version 1.0.1 (3.51 KB)
Example on using the Jenks Natural Breaks method to cluster a one-dimensional data array into two classes.

Updated 10 May 2020

Jenks Natural Breaks is a clustering method for one-dimensional data. It is an optimization process that arranges values into classes so that the variance within each class is minimized and the variance between classes is maximized. This makes it useful for step-change detection in noisy data. This example applies the method to a one-dimensional array of noisy values to find the index of the interface that separates the low values from the high values.
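As a rough illustration of the two-class case, the sketch below scans every split index of a sorted array and keeps the one with the highest Goodness of Variance Fit (GVF). It is a minimal sketch, not the contents of get_jenks_interface.m: the array values and the names values, sdam, sdcm, gvf and split_index are assumptions chosen for illustration, and the GVF is computed with the usual definition (SDAM - SDCM) / SDAM.

```matlab
% Sorted one-dimensional data with one obvious step (illustrative values)
values = [1, 2, 2, 3, 20, 21, 22, 23];
n = numel(values);

% SDAM: sum of squared deviations from the mean of the whole array
sdam = sum((values - mean(values)).^2);

% Try every split index k: class 1 = values(1:k), class 2 = values(k+1:n)
gvf = zeros(1, n - 1);
for k = 1 : n - 1
    class_1 = values(1:k);
    class_2 = values(k+1:n);
    % SDCM: sum of squared deviations of each class from its own mean
    sdcm = sum((class_1 - mean(class_1)).^2) + sum((class_2 - mean(class_2)).^2);
    gvf(k) = (sdam - sdcm) / sdam;
end

% The interface is the split with the highest GVF
[best_gvf, split_index] = max(gvf);
% split_index is 4 here: the step lies between values(4) and values(5)
```

A GVF close to 1 means the two classes explain almost all of the variance in the array; a value near 0 means the split is no better than treating the data as a single class.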

### Cite As

MS (2021). Clustering via Jenks Natural Breaks (https://github.com/MSH19/Clustering-via-Jenks-Natural-Breaks-Matlab), GitHub. Retrieved .

Sim

Oops, I forgot one line of code. It is now fixed:

    clc; clear output sub_array;
    input = [1,1,2,3,10,11,13,67,71];
    classes = 4;
    for i = 1 : classes-1
        % First pass splits the full input; later passes split what is left
        if i == 1
            data = input;
        elseif i > 1
            data = remaining_elements;
        end
        total = length(data);
        [SDCM_All, GF] = get_jenks_interface(data);
        [M, I1] = max(GF);                  % index of the best split
        sub_array{i} = data(I1+1:total);    % upper class of this split
        remaining_elements = data(1:I1);    % lower part, split again next pass
    end
    % Lowest class first, then the extracted classes in ascending order
    output = vertcat({data(1:I1)}, flipud(sub_array'));
    output{:}

The result with classes = 4 is the following:

    ans =
    1 1
    ans =
    2 3
    ans =
    10 11 13
    ans =
    67 71

MS

Hello, Thanks for your comments. You can easily make this work for several classes by updating the input array after each iteration. For example, for four classes:

    data = [1,1,2,3,10,11,13,67,71];
    total = length(data);

    %% Extract elements of class 4
    % 1- Split the input array into two classes based on Jenks Natural Breaks
    [SDCM_All, GF] = get_jenks_interface(data);
    % 2- Get the interface: index of the maximum Goodness of Variance Fit
    [M, I1] = max(GF);
    % 3- Extract sub_array_4 (class 4)
    sub_array_4 = data(I1+1:total);
    % 4- Get the remaining elements
    remaining_elements = data(1:I1);
    total = length(remaining_elements);

    %% Extract elements of class 3
    % 1- Split the remaining elements into two classes based on Jenks Natural Breaks
    [SDCM_All, GF] = get_jenks_interface(remaining_elements);
    % 2- Get the interface: index of the maximum Goodness of Variance Fit
    [M, I2] = max(GF);
    % 3- Extract sub_array_3 (class 3)
    % (remaining_elements is always a leading slice of data, so indexing
    % data directly gives the same values)
    sub_array_3 = data(I2+1:total);
    % 4- Get the remaining elements
    remaining_elements = data(1:I2);
    total = length(remaining_elements);

    %% Extract elements of class 2
    % 1- Split the remaining elements into two classes based on Jenks Natural Breaks
    [SDCM_All, GF] = get_jenks_interface(remaining_elements);
    % 2- Get the interface: index of the maximum Goodness of Variance Fit
    [M, I1] = max(GF);
    % 3- Extract sub_array_2 (class 2)
    sub_array_2 = data(I1+1:total);

    %% Extract elements of class 1
    sub_array_1 = data(1:I1);

    %% Display the result of classes 4 to 1
    disp(sub_array_4);
    disp(sub_array_3);
    disp(sub_array_2);
    disp(sub_array_1);

Output:

    67 71
    10 11 13
    2 3
    1 1

Sim

Hi, your code is working for 2 or 3 classes, but for 4 classes, I am not sure about the result:

COMPACT CODE:

    clc; clear output sub_array;
    input = [1,1,2,3,10,11,13,67,71];
    classes = 4;
    for i = 1 : classes-1
        if i == 1
            data = input;
        elseif i > 1
            data = remaining_elements;
        end
        total = length(data);
        [SDCM_All, GF] = get_jenks_interface(data);
        [M, I1] = max(GF);
        sub_array{i} = data(I1+1:total);
    end
    output = vertcat({data(1:I1)}, sub_array');
    output{:}

RESULT (with "classes = 4"):

    ans =
    1 1 2 3
    ans =
    67 71
    ans =
    10 11 13
    ans =
    10 11 13

Is there a way to make this work for three classes instead of only 2?

MS

Thanks for your comment, Roberto. Yes, you are right, it should be: class_2 = Array(i+1:total);

Roberto

Thank you very much for this code, it is great. Just one note: I would expect line 14 in get_jenks_interface.m to read like this:

    class_2 = Array(i+1:total);

rather than

    class_2 = Array(i:total);

Am I right?
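
The difference between the two indexing choices can be checked on a small array. This is a sketch only: the array values are made up, and it assumes the first class is taken as Array(1:i), which this thread does not show.

```matlab
Array = [1, 2, 3, 10, 11];   % illustrative values
total = numel(Array);
i = 3;                       % candidate split index

class_1 = Array(1:i);                 % [1 2 3] (assumed form of the first class)
class_2_overlap = Array(i:total);     % [3 10 11], repeats Array(i)
class_2_disjoint = Array(i+1:total);  % [10 11]

% With i:total the element at the split is counted in both classes;
% with i+1:total the two classes partition the array exactly
counted_overlap = numel(class_1) + numel(class_2_overlap);
counted_disjoint = numel(class_1) + numel(class_2_disjoint);
```

Here counted_overlap is total + 1 and counted_disjoint equals total, so only the i+1 form keeps the boundary element out of the second class when the class variances are computed.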

##### MATLAB Release Compatibility
Created with R2018a
Compatible with any release
##### Platform Compatibility
Windows macOS Linux