Create index vector from grouping variable

I am using the grp2idx command to convert categorical data into numbers so that I can find the correlation between the variables. When MATLAB indexes them automatically, it numbers the categories in the order they appear in the table. In other words, {'No Damage'} {'Destroyed'} {'1-9%'} {'10-25%'} {'51-75%'} {'26-50%'} are replaced with 1 2 3 4 5 6, respectively. I want to replace them meaningfully: 1 instead of {'No Damage'}, 2 instead of {'1-9%'}, ... and 6 instead of {'Destroyed'}. How can I do that? Here is what I use, but it doesn't work:
gL1={'No Damage','1-9%','10-25%','26-50%','21-75%','Destroyed'};
[g1,gN1,gL1] = grp2idx(Table.Damage_rate);
Thank you

 Accepted Answer

dpb on 11 Oct 2018
Edited: dpb on 11 Oct 2018
grp2idx is used with the Statistics Toolbox implementation of a categorical variable type. Unfortunately, while that implementation was useful and ahead of the native categorical data type introduced later, it is now deprecated and its use is discouraged.
Use the base categorical data type and findgroups combined with splitapply to process table data by groups:
gL1={'No Damage','1-9%','10-25%','26-50%','21-75%','Destroyed'};
gC1=categorical(gL1,gL1,'ordinal',1); % create ordinal categorical variable with names given
>> [g,id]=findgroups(gC1) % find the groups and the group id associated with...
g =
1 2 3 4 5 6
id =
1×6 categorical array
No Damage 1-9% 10-25% 26-50% 21-75% Destroyed
>>
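As a rough illustration of the findgroups/splitapply pairing mentioned above, here is a minimal sketch. The data are made up: `damage`, `dist`, and `meanDist` are hypothetical names, and the category names follow the list given in the question.

```matlab
% Hypothetical data: a damage category and a made-up numeric value per row
levels = {'No Damage','1-9%','10-25%','26-50%','51-75%','Destroyed'};
damage = categorical({'Destroyed';'No Damage';'1-9%';'Destroyed';'No Damage'}, ...
                     levels, 'Ordinal', true);
dist   = [0.2; 5.1; 3.0; 0.4; 4.9];    % assumed values, for illustration only

[g, id]  = findgroups(damage);          % g: group number per row, id: group labels
meanDist = splitapply(@mean, dist, g);  % one mean per group, in the order of id
```

Each row of `meanDist` pairs with the same row of `id`, so the two can be displayed side by side, for example with `table(id, meanDist)`.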
ADDENDUM
Hmmm...I thought the computational routines ought to have been amended to handle ordinal categorical variables, but apparently that was an unwarranted assumption.
To get the numeric values, just use
Table.Damage=double(Table.Damage_rate);
once you've created the categorical variable--make a new variable for it; no sense in destroying the other.
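Putting the two steps together, a minimal sketch on a small made-up table might look like the following. The table `T`, its contents, and the new variable names `Damage_cat` and `Damage_num` are assumptions for illustration; the category names follow the list given in the question.

```matlab
% Hypothetical sample data standing in for Table.Damage_rate (cell array of char)
T = table({'No Damage';'Destroyed';'1-9%';'10-25%';'51-75%';'26-50%'}, ...
          'VariableNames', {'Damage_rate'});

% Desired order, least to most severe
levels = {'No Damage','1-9%','10-25%','26-50%','51-75%','Destroyed'};

% An ordinal categorical with an explicit value set fixes the category order
T.Damage_cat = categorical(T.Damage_rate, levels, 'Ordinal', true);

% double() returns each element's position in the category list
T.Damage_num = double(T.Damage_cat);

disp(T.Damage_num.')   % 1 6 2 3 5 4
```

Any entry that does not match one of the listed names exactly (a trailing space, for instance) becomes `<undefined>` in the categorical and `NaN` after `double()`, so it is worth checking for NaN values afterwards.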

5 Comments

Thank you for your response. I see that we now have the order right. How does this replace the categories with numbers in my table, though?
Thank you
dpb on 11 Oct 2018
Edited: dpb on 11 Oct 2018
I'm sorry, I guess I didn't follow what you were driving at--why should you want to replace the categories with numbers? I would think (without knowing precisely what it is your end goal is) that any analysis you would wish could be accomplished more simply with the categorical datatype.
I just happened to notice that the fifth category overlaps 3 and 4. Is that correct, or is there a typo here? That kinda' makes ordinal mean "not so much" if it were really so, it would seem.
Thank you for catching that! It's a typo. It's 51-75%.
Regarding the analysis I am trying to do, I have several predictors and my response variable is the damage rate. All are categorical data but can be meaningfully replaced with numbers (similar to the damage rate: the higher the number, the more severe the damage). I want to calculate the correlation between the predictors and the response by replacing them with numbers.
Thank you for your time
Here is what I think should work but doesn't! No error or anything, simply no change! I replace each string with another string and then convert the new strings to numbers:
strrep(Tubbs.Damage_rate(:), 'No Damage','1');
strrep(Tubbs.Damage_rate(:), '1-9%','2');
strrep(Tubbs.Damage_rate(:), '10-25%','3');
strrep(Tubbs.Damage_rate(:), '26-50%','4');
strrep(Tubbs.Damage_rate(:), '51-75%','5');
strrep(Tubbs.Damage_rate(:), 'Destroyed ','6');
str2num(Tubbs.Damage_rate(:));
See the addendum/update to the Answer I posted last night...convert the string variables to ordinal categorical variables as shown initially then double().
There's "no change" in the above because you didn't assign the results to anything, but that's the harder and less pleasing way; the character strings represent categorical variables, so use the facility MATLAB provides for the purpose.
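For the correlation goal described earlier in the thread, a minimal sketch under stated assumptions follows. The table contents, the `Exposure` predictor and its levels, and the `_num` variable names are all hypothetical; only `Damage_rate` and its category names come from the question.

```matlab
% Hypothetical table; the Exposure predictor and all values are made up
Tubbs = table( ...
    {'No Damage';'1-9%';'Destroyed';'26-50%';'10-25%';'51-75%'}, ...
    {'Low';'Low';'High';'Medium';'Medium';'High'}, ...
    'VariableNames', {'Damage_rate','Exposure'});

dmgLevels = {'No Damage','1-9%','10-25%','26-50%','51-75%','Destroyed'};
expLevels = {'Low','Medium','High'};   % assumed ordering for the made-up predictor

% Ordinal categoricals, then numeric codes stored in new variables
Tubbs.Damage_num   = double(categorical(Tubbs.Damage_rate, dmgLevels, 'Ordinal', true));
Tubbs.Exposure_num = double(categorical(Tubbs.Exposure,   expLevels, 'Ordinal', true));

% Pearson correlation of the codes using base MATLAB
R = corrcoef(Tubbs.Damage_num, Tubbs.Exposure_num);
r = R(1,2);   % off-diagonal entry is the correlation between the two variables
```

Since the codes carry order but not necessarily equal spacing, a rank-based measure may be a better fit; with the Statistics Toolbox, `corr(x, y, 'Type', 'Spearman')` is one option.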

Release

R2018a

Asked:

on 11 Oct 2018

Commented:

dpb
on 12 Oct 2018
