Hello everyone,

I want to create new variables in order to perform a t-test based on the group membership of my subjects. I have this code here:

clearvars
close all
filepath = ['filepath'];
T = readtable('filename');
G = findgroups(T(:,1))
if G == 1
    X = T(:,:)
else G == 2
    Y = T(:,:)
end

I am encountering the following problem: it does not work. I only get table T again for Y, and not what I want, which is two entirely separate tables based on whether a subject is in group 1 or 2. Any help or tips would be appreciated. Thank you

How can I create new variables based on groups?

Rik on 27 Apr 2020

If you set a breakpoint you will see what is happening: only one of the branches will be executed.

It is a common mistake that people make: if you use an array as the conditional in an if-statement, it may not do what you expect. Either use a loop or an array operation.
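To illustrate the point about arrays as conditions, here is a minimal sketch with a made-up group vector (G below is hypothetical, not the poster's data). An if-statement with a non-scalar condition only takes the branch when every element is true, so a mixed group vector always ends up in the else branch:

```matlab
% Made-up group vector, for illustration only
G = [1; 1; 2; 2];

if G == 1
    disp('if branch')      % not reached: G == 1 is [1;1;0;0], not all true
else
    disp('else branch')    % this is the branch that runs
end

% An array operation keeps one logical value per row instead
isGroup1 = (G == 1);
```

The logical vector isGroup1 can then be used to select rows, which is what an if-statement cannot do.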

If you want specific help: share your data or write code that will generate plausible data.

Hannah_Mad on 27 Apr 2020

Thank you Rik,

Please see below an excerpt from my data.

'0,1188'	'0,1103'	'1,4'	'1,3'	'-13,00950292'	'-1,000894239'	'3,728322672'	'12,81289888'	'0,468820547'	'1,169608552'
'0,1103'	'0,2376'	'1,3'	'2,8'	'-11,8'	'-2'	'3,6'	'13,4'	'-0,9'	'2,9'
'0,1313'	'0,1717'	'1,3'	'1,7'	'-13,28540783'	'-3,043789654'	'1,401630356'	'13,32603837'	'-2,987182197'	'0,545827005'
'0,0971'	'0,0883'	'1,1'	'1'	'-15,71450602'	'-3,962745391'	'3,050642807'	'13,45261762'	'-1,497263892'	'3,083489585'
'0,295'	'0,295'	'2,8'	'2,8'	'-14,5881751'	'-2,603528618'	'3,518819139'	'14,33740562'	'-1,870682366'	'3,525744346'
'0,0883'	'0,0883'	'1'	'1'	'-12,86394769'	'-5,766465114'	'3,120227299'	'13,97601291'	'-4,209455419'	'3,276772679'
'0,2191'	'0,402'	'2'	'3,3'	''	''	''	''	''	''
'0,1424'	'0,1442'	'1,6'	'1,5'	'-17,17220026'	'2,691067249'	'6,865599728'	'14,59057189'	'4,206039042'	'5,34181054'
''	''	''	''	'-13,1'	'-4,9'	'1,5'	'12,7'	'-2,7'	'3,1'

If I try and use a loop:

clearvars
close all
filepath = ['C:\Hannah\ET\Statistik'];
T =readtable('ET2mat.csv');
G = findgroups(T(:,1))
for k = 1:44
    if T(:,1) == 1 
        x = T(:,:)
    else T(:,1) == 2
        y = T(:,:)
    end
end

I will get the following error message: Undefined operator '==' for input arguments of type 'table'.

So what do I need to do? Make it an array? I understand that maybe this will not work because the grouping variable is not a vector but part of the table.

Thank you
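The error quoted above is about the kind of indexing, as far as I can tell: parentheses on a table return another table, and == is not defined for tables, while curly braces or dot notation return the underlying array. A small sketch, assuming a made-up table with the hypothetical variable names Gruppe and CV_left:

```matlab
% Made-up table, for illustration only
T = table([1; 1; 2; 2], [0.12; 0.11; 0.30; 0.09], ...
    'VariableNames', {'Gruppe', 'CV_left'});

a = T(:, 1);     % parentheses: still a table (4x1), so a == 1 would error
b = T{:, 1};     % curly braces: the contents, here a 4x1 double
c = T.Gruppe;    % dot notation: the same 4x1 double

% The numeric vector can be compared and used to pick rows
X = T(b == 1, :);
Y = T(b == 2, :);
```

No loop is needed here: the comparison b == 1 is evaluated for all rows at once.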

Stephen23 on 27 Apr 2020

Using if is a red herring and rather unsuitable. The MATLAB way would be to use logical indexing, e.g.:

G = findgroups(T(:,1))
X = T(G==1,:);
Y = T(G==2,:);

But note that splitting up your table into separate variables is unlikely to be required, nor a good approach. The recommended approach is to use the Split-Apply-Combine Workflow on one table:

https://www.mathworks.com/help/matlab/matlab_prog/grouping-variables-for-splitting-data.html
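As a rough sketch of what the split-apply-combine workflow looks like on one table, assuming made-up data and the hypothetical variable names Gruppe and CV_left:

```matlab
% Made-up table, for illustration only
T = table([1; 1; 2; 2], [2; 4; 10; 20], ...
    'VariableNames', {'Gruppe', 'CV_left'});

% Split: one group number per row, plus the value each number stands for
[G, groupID] = findgroups(T.Gruppe);

% Apply: compute the mean within each group
groupMean = splitapply(@mean, T.CV_left, G);

% Combine: collect the per-group results into one small table
result = table(groupID, groupMean);
disp(result)
```

The table T stays in one piece throughout; only the per-group results are new variables.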

Hannah_Mad on 27 Apr 2020

Thank you very much!

Hannah_Mad on 27 Apr 2020

I use splitapply for most things, such as mean, standard deviation etc., however, it does not work for the t-test - do you have another suggestion for this perhaps? Thank you.

Walter Roberson on 27 Apr 2020

What happens when you try to use ttest ?

Hannah_Mad on 27 Apr 2020

So this is my code then:

G = findgroups(T(:,1))
splitapply(ttest,(T(:,2)), G)

Which will result in this error message:

Not enough input arguments.

Error in ttest (line 124)

dim = find(size(x) ~= 1, 1);

Error in test (line 7)

splitapply(ttest,(T(:,2)), G)

>>

Stephen23 on 27 Apr 2020

You called ttest with no input arguments, thus the error. You forgot to use @ to create a function handle:

splitapply(@ttest,...)
%          ^ you forgot this
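A short sketch of the difference, using mean instead of ttest so that no toolbox is needed (the data are made up): writing a function name without @ calls the function immediately, whereas @ creates a handle that splitapply can call later, once per group.

```matlab
% Made-up data, for illustration only
x = [2; 4; 10; 20];
G = [1; 1; 2; 2];

fh = @mean;                  % a handle: refers to mean without calling it
m = splitapply(fh, x, G);    % splitapply calls fh on each group's values

% splitapply(mean, x, G) would try to evaluate mean with no inputs
% before splitapply even starts, and error with "Not enough input arguments."
```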

Hannah_Mad on 27 Apr 2020

Thank you very much!

However, I still get the following error:

Error using splitapply (line 132)

Applying the function 'ttest' to the 1st group of data generated the following error:

Undefined function 'isnan' for input arguments of type 'cell'.

Error in test (line 7)

splitapply(@ttest,(T(:,11)), G)

Stephen23 on 27 Apr 2020

Hannah_Mad's "Answer" moved here:

Well. I keep getting error messages, different ones though.

clearvars
close all
filepath = ['C:\Hannah\ET\Statistik'];
T = readtable('ET2mat.csv');
F = rmmissing(T)
[row col] = size(F)
G = findgroups(F(:,1))
for n = 2:col
    fprintf('This is column %d. \n' , n)
    splitapply(@ttest,F,G)
end 

Will result in:

Error using splitapply (line 132)

Applying the function 'ttest' to the 1st group of data generated the following error:

Undefined function 'minus' for input arguments of type 'cell'.

Error in test (line 12)

splitapply(@ttest,F,G)

So what can I do from here - I do in fact have negative values in my table. Is that the reason?

Stephen23 on 27 Apr 2020

test.txt

"I do in fact have negative values in my table. Is that the reason?"

The actual reason is your data file, which is imported as character, not as numeric. The reasons are:

The file is typical of regions which use a decimal comma, namely tab-separated values (and a misleading .CSV file extension). Whilst readtable can cope with the tab delimiter, it cannot parse decimal commas.
There are single quotes around all "numeric" values. I cannot imagine what badly written application does that.

Because of these, readtable imports that data (which you think is numeric) as character vectors in cell vectors, complete with single quotes. You can check this quite easily (because you did not upload a sample file I had to create it myself based on your earlier comment, attached, including column headers):

>> T = readtable('test.txt','delimiter','\t')
T = 
    AA        BB            CC          DD         EE              FF                  GG                 HH                 II                  JJ                 KK       
    __    __________    __________    _______    _______    ________________    ________________    _______________    _______________    ________________    _______________
    1     ''0,1188''    ''0,1103''    ''1,4''    ''1,3''    ''-13,00950292''    ''-1,000894239''    ''3,728322672''    ''12,81289888''    ''0,468820547''     ''1,169608552''
    1     ''0,1103''    ''0,2376''    ''1,3''    ''2,8''    ''-11,8''           ''-2''              ''3,6''            ''13,4''           ''-0,9''            ''2,9''        
    1     ''0,1313''    ''0,1717''    ''1,3''    ''1,7''    ''-13,28540783''    ''-3,043789654''    ''1,401630356''    ''13,32603837''    ''-2,987182197''    ''0,545827005''
    1     ''0,0971''    ''0,0883''    ''1,1''    ''1''      ''-15,71450602''    ''-3,962745391''    ''3,050642807''    ''13,45261762''    ''-1,497263892''    ''3,083489585''
    2     ''0,295''     ''0,295''     ''2,8''    ''2,8''    ''-14,5881751''     ''-2,603528618''    ''3,518819139''    ''14,33740562''    ''-1,870682366''    ''3,525744346''
    2     ''0,0883''    ''0,0883''    ''1''      ''1''      ''-12,86394769''    ''-5,766465114''    ''3,120227299''    ''13,97601291''    ''-4,209455419''    ''3,276772679''
    2     ''0,2191''    ''0,402''     ''2''      ''3,3''    ''''                ''''                ''''               ''''               ''''                ''''           
    2     ''0,1424''    ''0,1442''    ''1,6''    ''1,5''    ''-17,17220026''    ''2,691067249''     ''6,865599728''    ''14,59057189''    ''4,206039042''     ''5,34181054'' 
    2     ''''          ''''          ''''       ''''       ''-13,1''           ''-4,9''            ''1,5''            ''12,7''           ''-2,7''            ''3,1''        
>> cellfun(@class,T.BB,'uni',0)
ans = 
    'char'
    'char'
    'char'
    'char'
    'char'
    'char'
    'char'
    'char'
    'char'
>> +T.BB{1} % first and last characters are single-quotes.
ans =
    39    48    44    49    49    56    56    39

Essentially you have two choices:

write or edit the file so that all numeric data are written without single quotes and using decimal points, then efficiently import the whole file in one step using readtable, or
parse those character vectors inside of MATLAB, replacing the decimal commas with decimal points and then converting to numeric. Not particularly efficient, but it can work with your existing data files, e.g.:

T.KKnum = str2double(strrep(strrep(T.KK,'''',''),',','.'));

You can then apply numeric functions to that numeric data. I recommend that you use the variable names to refer to the data columns, rather than indexing.
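If several columns need the same conversion, one possible approach is a loop over the variable names. strrep takes exactly three inputs (the text, the old substring, the new substring), so the columns have to be converted one at a time. This is only a sketch: the tiny table below is made up to mimic the imported cell columns, and Gruppe and CV_left are assumed variable names.

```matlab
% Made-up sample mimicking the import: the second column holds char
% vectors with literal single quotes and decimal commas
T = table([1; 2], {'''0,1188'''; ''''''}, ...
    'VariableNames', {'Gruppe', 'CV_left'});

names = T.Properties.VariableNames;
for k = 1:numel(names)
    col = T.(names{k});
    if iscellstr(col)    % only touch the columns imported as text
        % drop the quotes, then turn decimal commas into decimal points
        cleaned = strrep(strrep(col, '''', ''), ',', '.');
        T.(names{k}) = str2double(cleaned);    % empty text becomes NaN
    end
end
```

Columns that were already numeric (such as the group column here) are left untouched, and missing entries end up as NaN, which rmmissing and most statistics functions can deal with.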

Hannah_Mad on 28 Apr 2020

So, unfortunately it is still not working. Dataset will be provided.

This is my code:

clearvars
close all
filepath = ['C:\Hannah\ET\Statistik'];
N = readtable('ET2mat.txt','delimiter','\t')
NNUM = str2double(strrep(strrep(N.Gruppe, N.CV_left, N.CV_right, N.Amplitude_left, N.Amplitude_right, N.x_left, N.y_left, N.z_left, N.x_right, N.y_right, N.z_right),',','.'));
F = rmmissing(NNUM)
[row col] = size(F)
N = F(:,1:col)
G = findgroups(N(:,1))
splitapply(@ttest,N,G)
for n = 2:col
    fprintf('This is column %d. \n' , n)
    splitapply(@ttest,F,G)
end 

The error will always be

Error using strrep

Too many input arguments.

Any ideas on that?

Also: how can I choose any of your answers and rate them? I heard I am supposed to do that but it won't work here.

Thank you for your help. I know I am a beginner to MATLAB but it is quite tedious.

Hannah_Mad on 28 Apr 2020

I have now changed the commas in Excel to dots. So far everything seems fine, but I am getting a different error message.

This is my code now:

clearvars
close all
filepath = ['C:\Hannah\ET\Statistik'];
N = readtable('ET2mat.txt','delimiter','\t')
F = rmmissing(N)
[row col] = size(F)
G = findgroups(F{:,1})
for n = 2:col
    fprintf('This is column %d. \n' , n)
    [h, p ] = splitapply(@ttest,F,G,'Alpha',0.05 )
end 

The error I get is:

Error using splitapply (line 87)

Group numbers must be a vector of positive integers, and cannot be a sparse vector.

Error in test (line 14)

[h, p ] = splitapply(@ttest,F,G,'Alpha',0.05 )

Tommy on 28 Apr 2020

Edited: Tommy on 28 Apr 2020

Have you also gotten rid of the quotes in your text file?

This line:

[h, p ] = splitapply(@ttest,F,G)

would pass every column within F to ttest at once, as separate arguments. If you want to consider each column individually, you could use

for n = 2:col
    fprintf('This is column %d. \n' , n)
    [h, p ] = splitapply(@ttest,F{:,n},G)
end 

(although this does use indexing rather than variable names.)

Then, the last argument to splitapply must be G, so you cannot have

[h, p ] = splitapply(@ttest,F{:,n},G,'Alpha',0.05 )

because of the 'Alpha' and 0.05. splitapply thinks the 0.05 specifies the group numbers, which is not allowed because the group numbers need to be positive integers. If you want, you could use this syntax:

[h, p ] = splitapply(@(x,y) ttest(x,y,'Alpha',0.05),F{:,n},?,G)

or this syntax:

[h, p ] = splitapply(@(x,m) ttest(x,m,'Alpha',0.05),F{:,n},?,G)

both of which are explained in the documentation for ttest, but this would require you to pass a y or m to ttest, perhaps in place of the ?s above. However, the default alpha value is 0.05, so you shouldn't need to provide it anyway.
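For completeness, a sketch of the anonymous-function pattern with the placeholder filled in by a constant. It assumes the Statistics and Machine Learning Toolbox (for ttest), made-up data, and a hypothetical comparison value of 0 with a non-default alpha of 0.01. The extra arguments are fixed inside the anonymous function, so G stays the last input to splitapply:

```matlab
% Made-up data, for illustration only
x = [0.5; 1.2; 0.9; 1.1; 2.0; 2.4; 1.9; 2.2];
G = [1; 1; 1; 1; 2; 2; 2; 2];

% v receives one group's values at a time; 0 and 'Alpha' are fixed here
[h, p] = splitapply(@(v) ttest(v, 0, 'Alpha', 0.01), x, G);
```

Note that this returns one h and one p per group, because ttest is run separately on each group.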

(edit) You can only choose and vote for answers, but so far everything here is a comment.

Walter Roberson on 28 Apr 2020

Group numbers must be a vector of positive integers, and cannot be a sparse vector.

You could get that if your G is empty. Check whether F is empty.

Hannah_Mad on 28 Apr 2020

Thank you very much for your kind explanations and detailed information. However I am not entirely sure that this script does what I believe it does: compare the means of two groups (1 and 2, hence the splitapply approach) - as I get two h-values for each column. Shouldn't it be only one value? As there are two groups being compared per column. Do you have any idea about that?

Again, I can only apologize for my basic questions.

Thank you!
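One reading of the two h-values described above: splitapply runs ttest once per group, and ttest with a single input is a one-sample test of that group's mean against zero, so two groups give two results. If the goal is to compare group 1 against group 2, a two-sample test such as ttest2 may be closer to what is wanted. A minimal sketch, assuming the Statistics and Machine Learning Toolbox, made-up data and hypothetical variable names:

```matlab
% Made-up table, for illustration only
F = table([1; 1; 1; 1; 2; 2; 2; 2], ...
    [0.12; 0.11; 0.13; 0.10; 0.30; 0.09; 0.22; 0.14], ...
    [1.4; 1.3; 1.3; 1.1; 2.8; 1.0; 2.0; 1.6], ...
    'VariableNames', {'Gruppe', 'CV_left', 'Amplitude_left'});

G = findgroups(F.Gruppe);
names = F.Properties.VariableNames;

for n = 2:width(F)
    col = F{:, n};
    % one two-sample test per column: group 1 values versus group 2 values
    [h, p] = ttest2(col(G == 1), col(G == 2));
    fprintf('%s: h = %d, p = %.4f\n', names{n}, h, p);
end
```

This gives a single h and p for each column. Whether an unpaired two-sample t-test is the right test for the data is a separate statistical question.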

Walter Roberson on 28 Apr 2020

What is class(F{:,1}) ? What is size(F{:,1}) ? What is size(G) ?

Hannah_Mad on 29 Apr 2020

Hello Walter,

I got the following:

class(F{:,1}) : double

size(F{:,1}) 38 1

size(G) 38 1

I think that is alright, isn't it?

Thank you,

Hannah
