I need to optimize this slow code. I am working with three tables that have the same columns but different numbers of rows. Missing data is already accounted for, but the code runs slowly.

Hi,
I am importing and comparing large chunks of data stored as tables. I have 3 .txt files that I want to import as tables. The problem is that, to concatenate them, I need to add NaNs so the variables have the same size before performing operations in MATLAB (using the rs variable). The three tables are table C with a size of 4873x15, table D with 5117x15, and table E with 5216x15. Because the files contain a lot of missing data and mixed variable types, I used the import wizard to generate a function (importpsm) specifically for this sort of table. Lastly, I want to save these tables so I don't have to rerun this code every time I need to analyze other variables. I have all the toolboxes available right now.
Here is the code. It is part of a for loop that accesses different txt files belonging to three output files (B). It would be nice to build up the tables as B increases, but I have not found a way to do this yet.
...for loop starts after I define k as 1, A as 2, and B as 3...
elseif A == 2
    filePattern = fullfile(myFolder, '*PSMs*.txt');
    txtFiles = dir(filePattern);
    for k = 1
        while B == 3
            % Import the three PSM tables (4873x15, 5117x15, 5216x15)
            C = importpsm(txtFiles(k).name);
            D = importpsm(txtFiles(k+1).name);
            E = importpsm(txtFiles(k+2).name);
            % Preallocate a NaN table as tall as the largest input
            rs = array2table(nan(max([size(C) size(D) size(E)]), width(C)), ...
                'VariableNames', C.Properties.VariableNames);
            % Copy C into the top of the padded table
            rs(1:height(C), :) = C;
        end
    end
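For reference, a minimal sketch of the padding-and-saving idea described above, assuming importpsm returns tables whose 15 variables are all numeric and share the same names; the padTo helper and the psm_tables.mat file name are placeholders for illustration, not anything from the original code:
% Hypothetical helper: append NaN rows to a table until it reaches height h.
% Assumes every variable in t is numeric, so NaN is a valid fill value.
padTo = @(t,h) [t; array2table(nan(h-height(t), width(t)), ...
                'VariableNames', t.Properties.VariableNames)];

C = importpsm(txtFiles(k).name);     % 4873x15
D = importpsm(txtFiles(k+1).name);   % 5117x15
E = importpsm(txtFiles(k+2).name);   % 5216x15

h  = max([height(C) height(D) height(E)]);  % tallest table sets the target height
Cp = padTo(C, h);
Dp = padTo(D, h);
Ep = padTo(E, h);

% Save once so later analyses can load() the tables instead of re-importing.
save('psm_tables.mat', 'Cp', 'Dp', 'Ep');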
  13 Comments
Amanda Figueroa on 26 Nov 2018
To address your second comment:
I want a p-value for each individual row. I tried ttest2 successfully, but it gives me one overall p-value for all rows.
I chose the mattest function because I want the individual p-values. However, it doesn't work when my row counts differ between trials, and when I add the NaN rows, I get a p-value of zero for every row.
With p-values of zero in every row (using mattest), my volcano plot throws an error because it ignores zeros and nothing can be plotted.
So I am stuck at this point and am waiting for authorization to send the files so we can figure this out :(
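For what it's worth, here is a hedged sketch of getting one p-value per row directly with ttest2 instead of mattest; groupA and groupB are placeholder numeric matrices (one row per feature, padded with NaN), not variables from the original code:
% One ttest2 per row, skipping the NaN padding in each row.
nRows = size(groupA, 1);
p = nan(nRows, 1);
for r = 1:nRows
    a = groupA(r, ~isnan(groupA(r,:)));   % observed values for this row, group A
    b = groupB(r, ~isnan(groupB(r,:)));   % observed values for this row, group B
    if numel(a) > 1 && numel(b) > 1       % ttest2 needs at least two points per group
        [~, p(r)] = ttest2(a, b);
    end
end
Rows that are all padding stay NaN in p instead of zero, which should be easier to filter out before plotting.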
dpb on 26 Nov 2018
Edited: dpb on 26 Nov 2018
OK. If you don't want to post files/data in an open forum, you can contact me directly through my link -- I'll be glad to look at them offline and keep any confidentiality needs -- I spent 30+ years in consulting gigs, so I know that routine well... :)
I need to know more about what we're trying to test as the null hypothesis to understand what p-value is needed... and thus how to generate it.


Answers (1)

Amanda Figueroa on 23 Nov 2018
Hmm... It seems that my txtFiles variable is a struct, so I guess that is why length is working right now. The experiments are numbered, so it seems like these are placed in order right now. (Screenshot attached: screenshot.png)
  2 Comments
dpb on 24 Nov 2018
Yes, dir returns a 1D struct array so length does work there--I was just making a side note that the particular function can surprise on occasion.
The order ML returns (on Windows anyway) is alphabetical, yes. Again, I was just noting that while it's that way now and this particular set works, neither is really guaranteed.
As an example of what can go wrong, after running
% create 20 small test files T1.txt ... T20.txt
for i=1:20
    fid = fopen(num2str(i,'T%d.txt'),'w');
    fprintf(fid,'%s\n','Record');
    fid = fclose(fid);
end
% list them back in the order dir returns
d = dir('T??.txt');
for i=1:length(d), disp(d(i).name), end
T1.txt
T10.txt
T11.txt
T12.txt
...
T18.txt
T19.txt
T2.txt
T20.txt
T3.txt
T4.txt
...
which is alphabetic but not natural order... the Current Folder window in ML would show those in natural order, however.
Again, "just sayin'!" :) It's possible your code will work just fine for the limited universe in which it needs to operate; just making you aware that "stuff happens!".
Amanda Figueroa on 26 Nov 2018
Thanks!
Sorry I haven't written back sooner. I appreciate your helpful comments and know I will use them in the future. Wrangling this data is actually my test to see whether I am the best candidate to analyze data for my lab (I am not part of it yet). I have no problem seeking your assistance (for which I am truly grateful! :) ), but I need to verify that I can share complete files before actually doing so.
I will get back to you by the end of the day.

