Shorten the values from a column in another column.

1 view (last 30 days)
Hello,
I have a column with 4 million values. These are categorized in 800 unique values, for example 01.010, 01.020, 05.010, 05.020 etc. I want to make a new column that shortens the unique values. So instead of it containing 01.010 and 01.020 values, I want the new column to say 01 when the values are between 01.000 and 01.999.
And the same for 02.000 .... to 02.999. So this should only assign it the value 02 instead.
I can´t figure out the way to do this and was hoping someone could help me with the coding.
Thank you!
  2 Comments
the cyclist
the cyclist on 25 May 2021
Are your inputs numerical values, or text? (I'm asking because MATLAB would not ordinarily display a number as 01.010.)
Can you give a small example of an input and expected output? And perhaps you could upload your full dataset in a MAT file, using the paperclip symbol in from the INSERT section of the toolbar.
Amina Ag
Amina Ag on 25 May 2021
The datafile is text but i have exported a short version of it into excel (also added here). Some of the variables are cell and have been transformed to categorical variables. So the specific one I am talking about here is item_50108 which is industry code. And I would like to shorten them from being very specific industries to only containing the main industry code, which would be the 2 first values of the 5.

Sign in to comment.

Accepted Answer

the cyclist
the cyclist on 25 May 2021
Edited: the cyclist on 26 May 2021
The specific solution is potentially very dependent on ...
  • the input format -- there are several way to read in Excel, leading to different variables types in the workspace
  • what exactly you want as an output format -- numeric, character array, string?
It might be better to upload a MAT file with what you already have in your MATLAB workspace, instead of an export of something you have already imported.
However, here is a solution that reads in the Excel, and convert the longer character array to a character with only the numeric characters prior to the decimal point.
% Read the Excel file [This points to the file you uploaded here.]
B = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/629905/B.xlsx');
% Extract the regular expresion
out = regexp(B.item_50108,'\d*(?=\.)','match');
% Display the first value, as a check
disp(out{1})
{'63'}
  3 Comments
the cyclist
the cyclist on 26 May 2021
So, the problem is that the output of my command is a cell array of 1x1 cell arrays. (I'm not clear why that is, to be frank.)
I am certain there is a better solution, but here is a quick fix:
% data = readtable('B.txt'); % You should use this instead of the next line, since you have the file locally
data = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/630650/B.txt');
cellOfCells = regexp(data.item_50108,'\d*(?=\.)','match');
cellOfChar = cellfun(@(x)x{:},cellOfCells,'UniformOutput',false);
out = categorical(cellOfChar)
out = 30×1 categorical array
63 51 00 45 51 22 70 22 60 22 70 50 52 51 50 15 63 55 28 52 52 45 51 45 45 51 52 52 70 67

Sign in to comment.

More Answers (1)

the cyclist
the cyclist on 25 May 2021
If these are numeric values, I think maybe you can do
output_values = unique(floor(input_values));
But see my comment above, for clarifications.

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!