How can I create new variables based on groups?
Hello everyone,
I want to create new variables in order to perform a t-test based on the group membership of my subjects. I have this code here:
clearvars
close all
filepath = ['filepath'];
T =readtable('filename');
G = findgroups(T(:,1))
if G == 1
X = T(:,:)
else G == 2
Y = T(:,:)
end
I am encountering the following problem: it does not work. Y ends up as the whole table T again, rather than what I want, which is two entirely separate tables based on whether a subject is in group 1 or 2. Any help or tips would be appreciated.
Thank you
18 Comments
Rik
on 27 Apr 2020
If you set a breakpoint you will see what is happening: only one of the branches will be executed.
It is a common mistake: if you use an array as the conditional in an if-statement, it may not do what you expect, because the condition counts as true only when all of its elements are nonzero. Either use a loop or an array operation.
If you want specific help: share your data or write code that will generate plausible data.
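To illustrate the point about array conditions, here is a minimal sketch with made-up group numbers (the vector G below is hypothetical, not the asker's data). It shows that only one branch runs, and that `else` does not take a condition, so a second test would need `elseif`.

```matlab
G = [1; 1; 2; 2];          % hypothetical group numbers for four subjects

if G == 1
    branch = 'if';         % would run only if ALL elements of G equal 1
else
    branch = 'else';       % runs here, because G == 1 is [1;1;0;0]
end
disp(branch)

% An array operation tests every row individually instead
rowsInGroup1 = (G == 1);   % logical column vector, one entry per row
```

The logical vector rowsInGroup1 can then be used to index rows, which avoids the if-statement altogether.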
Hannah_Mad
on 27 Apr 2020
Stephen23
on 27 Apr 2020
Using if is a red herring and rather unsuitable. The MATLAB way would be to use logical indexing, e.g.:
G = findgroups(T(:,1))
X = T(G==1,:);
Y = T(G==2,:);
But note that splitting up your table into separate variables is unlikely to be required, nor is it a good approach. The recommended approach is to use the Split-Apply-Combine Workflow on one table, using findgroups and splitapply.
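As a rough sketch of the split-apply-combine idea on a single table: the table and its variable names (Group, Value) are invented for illustration, and mean stands in for whatever function is actually needed.

```matlab
% Hypothetical data: one group column and one measurement column
T = table([1; 1; 2; 2], [0.12; 0.11; 0.30; 0.09], ...
    'VariableNames', {'Group', 'Value'});

G = findgroups(T.Group);                      % group number for each row
groupMeans = splitapply(@mean, T.Value, G);   % one result per group
```

The table stays in one piece; only the function is applied group by group.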
Hannah_Mad
on 27 Apr 2020
Hannah_Mad
on 27 Apr 2020
Walter Roberson
on 27 Apr 2020
What happens when you try to use ttest ?
Hannah_Mad
on 27 Apr 2020
Stephen23
on 27 Apr 2020
You called ttest with no input arguments, thus the error. You forgot to use @ to create a function handle:
splitapply(@ttest,...)
%          ^ you forgot this
Hannah_Mad
on 27 Apr 2020
Stephen23
on 27 Apr 2020
Hannah_Mad's "Answer" moved here:
Well. I keep getting error messages, different ones though.
clearvars
close all
filepath = ['C:\Hannah\ET\Statistik'];
T = readtable('ET2mat.csv');
F = rmmissing(T)
[row col] = size(F)
G = findgroups(F(:,1))
for n = 2:col
fprintf('This is column %d. \n' , n)
splitapply(@ttest,F,G)
end
Will result in:
Error using splitapply (line 132)
Applying the function 'ttest' to the 1st group of data generated the following error:
Undefined function 'minus' for input arguments of type 'cell'.
Error in test (line 12)
splitapply(@ttest,F,G)
So what can I do from here - I do in fact have negative values in my table. Is that the reason?
Stephen23
on 27 Apr 2020
"I do in fact have negative values in my table. Is that the reason?"
The actual reason is your data file, which is imported as character, not as numeric. The reasons are:
- The file is typical of regions which use a decimal comma, namely tab-separated values (and a misleading .CSV file extension). Whilst readtable can cope with the tab delimiter, it cannot parse decimal commas.
- Single quotes around all "numeric" values. I cannot imagine what badly written application does that.
Because of these, readtable imports that data (which you think is numeric) as character vectors in cell vectors, complete with single quotes. You can check this quite easily (because you did not upload a sample file I had to create it myself based on your earlier comment, attached, including column headers):
>> T = readtable('test.txt','delimiter','\t')
T =
    AA        BB            CC          DD         EE              FF                  GG                 HH                 II                  JJ                 KK
    __    __________    __________    _______    _______    ________________    ________________    _______________    _______________    ________________    _______________
    1     ''0,1188''    ''0,1103''    ''1,4''    ''1,3''    ''-13,00950292''    ''-1,000894239''    ''3,728322672''    ''12,81289888''    ''0,468820547''     ''1,169608552''
    1     ''0,1103''    ''0,2376''    ''1,3''    ''2,8''    ''-11,8''           ''-2''              ''3,6''            ''13,4''           ''-0,9''            ''2,9''
    1     ''0,1313''    ''0,1717''    ''1,3''    ''1,7''    ''-13,28540783''    ''-3,043789654''    ''1,401630356''    ''13,32603837''    ''-2,987182197''    ''0,545827005''
    1     ''0,0971''    ''0,0883''    ''1,1''    ''1''      ''-15,71450602''    ''-3,962745391''    ''3,050642807''    ''13,45261762''    ''-1,497263892''    ''3,083489585''
    2     ''0,295''     ''0,295''     ''2,8''    ''2,8''    ''-14,5881751''     ''-2,603528618''    ''3,518819139''    ''14,33740562''    ''-1,870682366''    ''3,525744346''
    2     ''0,0883''    ''0,0883''    ''1''      ''1''      ''-12,86394769''    ''-5,766465114''    ''3,120227299''    ''13,97601291''    ''-4,209455419''    ''3,276772679''
    2     ''0,2191''    ''0,402''     ''2''      ''3,3''    ''''                ''''                ''''               ''''               ''''                ''''
    2     ''0,1424''    ''0,1442''    ''1,6''    ''1,5''    ''-17,17220026''    ''2,691067249''     ''6,865599728''    ''14,59057189''    ''4,206039042''     ''5,34181054''
    2     ''''          ''''          ''''       ''''       ''-13,1''           ''-4,9''            ''1,5''            ''12,7''           ''-2,7''            ''3,1''
>> cellfun(@class,T.BB,'uni',0)
ans =
'char'
'char'
'char'
'char'
'char'
'char'
'char'
'char'
'char'
>> +T.BB{1} % first and last characters are single-quotes.
ans =
39 48 44 49 49 56 56 39
Essentially you have two choices:
- write or edit the file so that all numeric data are written without single quotes and using decimal points, then efficiently import the whole file in one step using readtable, or
- parse those character vectors inside of MATLAB, replacing the decimal commas with decimal points and then converting to numeric. Not particularly efficient, but it can work with your existing data files, e.g.:
T.KKnum = str2double(strrep(strrep(T.KK,'''',''),',','.'));
You can then apply numeric functions to that numeric data. I recommend that you use the variable names to refer to the data columns, rather than indexing.
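The second choice can be extended to every text column at once. The following is only a sketch under assumptions: the small table is a hypothetical stand-in for the imported data, and the loop overwrites each text column rather than adding a new one such as KKnum.

```matlab
% Hypothetical stand-in for the imported table: text values carry
% embedded single quotes and decimal commas, one entry is empty
T = table([1; 1; 2; 2], ...
    {'''0,1188'''; '''0,1103'''; '''0,295'''; ''''''}, ...
    'VariableNames', {'AA', 'BB'});

vars = T.Properties.VariableNames;
for k = 1:numel(vars)
    col = T.(vars{k});
    if iscellstr(col)
        col = strrep(col, '''', '');     % drop the embedded single quotes
        col = strrep(col, ',', '.');     % decimal comma -> decimal point
        T.(vars{k}) = str2double(col);   % empty text becomes NaN
    end
end
```

Columns that are already numeric (AA here) are left untouched, and the resulting NaN values can be dropped afterwards with rmmissing if required.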
Hannah_Mad
on 28 Apr 2020
Hannah_Mad
on 28 Apr 2020
Have you also gotten rid of the quotes in your text file?
This line:
[h, p ] = splitapply(@ttest,F,G)
would pass every column within F to ttest at once, as separate arguments. If you want to consider each column individually, you could use
for n = 2:col
fprintf('This is column %d. \n' , n)
[h, p ] = splitapply(@ttest,F{:,n},G)
end
(although this does use indexing rather than variable names.)
Then, the last argument to splitapply must be G, so you cannot have
[h, p ] = splitapply(@ttest,F{:,n},G,'Alpha',0.05 )
because of the 'Alpha' and 0.05. splitapply thinks the 0.05 specifies the group numbers, which is not allowed because the group numbers need to be positive integers. If you want, you could use this syntax:
[h, p ] = splitapply(@(x,y) ttest(x,y,'Alpha',0.05),F{:,n},?,G)
or this syntax:
[h, p ] = splitapply(@(x,m) ttest(x,m,'Alpha',0.05),F{:,n},?,G)
both of which are explained in the documentation for ttest, but this would require you to pass a y or m to ttest, perhaps in place of the ?s above. However, the default alpha value is 0.05, so you shouldn't need to provide it anyway.
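One way to combine these points is sketched below. It makes several assumptions: the table F and its variable names are invented, the columns are already numeric, the value 0 is chosen here as the hypothesized mean m, and ttest requires the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical numeric table: a group column plus two measurement columns
F = table([1; 1; 1; 2; 2; 2], ...
    [0.5; 0.7; 0.6; 1.2; 1.1; 1.4], ...
    [-1.0; 0.2; 0.9; -0.3; 0.4; -0.2], ...
    'VariableNames', {'Group', 'BB', 'CC'});

G = findgroups(F.Group);
alpha = 0.05;                                 % the default, shown for illustration
testFcn = @(x) ttest(x, 0, 'Alpha', alpha);   % options are fixed inside the handle

% Loop over variable names rather than column indices
vars = setdiff(F.Properties.VariableNames, {'Group'}, 'stable');
for k = 1:numel(vars)
    [h, p] = splitapply(testFcn, F.(vars{k}), G);   % one h and one p per group
    fprintf('%s: h = %s, p = %s\n', vars{k}, mat2str(h'), mat2str(p', 3));
end
```

Wrapping the name-value pair in an anonymous function keeps G as the last argument to splitapply. Note that this runs a one-sample test within each group; if the aim is to compare group 1 against group 2, a two-sample test such as ttest2 on the two subsets may be closer to what is wanted.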
(edit) You can only choose and vote for answers, but so far everything here is a comment.
Walter Roberson
on 28 Apr 2020
Group numbers must be a vector of positive integers, and cannot be a sparse vector.
You could get that if your G is empty. Check whether F is empty.
Hannah_Mad
on 28 Apr 2020
Walter Roberson
on 28 Apr 2020
What is class(F{:,1}) ? What is size(F{:,1}) ? What is size(G) ?
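These checks can be collected into a few lines. The table below is hypothetical and deliberately contains a missing group entry, to show one way in which G can stop being a vector of positive integers.

```matlab
% Hypothetical table in which one group entry is missing
F = table([1; 2; NaN; 2], [0.1; 0.2; 0.3; 0.4], ...
    'VariableNames', {'Group', 'Value'});
G = findgroups(F.Group);      % findgroups returns NaN for missing group values

isEmptyTable = isempty(F);    % true if no rows survived earlier filtering
groupClass   = class(F{:,1});
groupSize    = size(F{:,1});
groupVecSize = size(G);

% splitapply needs G to be a non-empty vector of positive integers
hasBadGroups = isempty(G) || any(isnan(G)) || any(G < 1) || any(G ~= fix(G));
```

If hasBadGroups is true, the offending rows need to be removed or the grouping column repaired before calling splitapply.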
Hannah_Mad
on 29 Apr 2020
Answers (0)