Fast subsetting or indexing of data

I am working with large datasets, which I am subsetting into various categories and saving as smaller files. What I am doing right now works, but it is time-consuming and error-prone, as it involves a lot of copy and paste.
For example, I have split many files into those with boats and those without boats, and I then split each of those by season. Is there a faster way to do this, where I apply the same command to a prescribed set of variables?
%% Comparisons... Season using water temp
boatsAbsent_t=boatsAbsent.Var1; %time variables
[BA_spring, BA_summer, BA_autumn, BA_winter]=indexSeasons(boatsAbsent_t); %index times into seasons
boatsPresent_t=boatsPresent.Var1;
[BP_spring, BP_summer, BP_autumn, BP_winter]=indexSeasons(boatsPresent_t);
%Subset PSD outputs and write to file
S=withtol(BA_spring,seconds(1));
BA_spring=boatsAbsent(S,:);
writetable(timetable2table(BA_spring),...
fullfile(folder,strcat(site,'_PSD_boatsAbsent_Spring.csv')));
S=withtol(BA_summer,seconds(1));
BA_summer=boatsAbsent(S,:);
writetable(timetable2table(BA_summer),...
fullfile(folder,strcat(site,'_PSD_boatsAbsent_Summer.csv')));
S=withtol(BA_autumn,seconds(1));
BA_autumn=boatsAbsent(S,:);
writetable(timetable2table(BA_autumn),...
fullfile(folder,strcat(site,'_PSD_boatsAbsent_Autumn.csv')));
S=withtol(BA_winter,seconds(1));
BA_winter=boatsAbsent(S,:);
writetable(timetable2table(BA_winter),...
fullfile(folder,strcat(site,'_PSD_boatsAbsent_Winter.csv')));
S=withtol(BP_spring,seconds(1));
BP_spring=boatsPresent(S,:);
writetable(timetable2table(BP_spring),...
fullfile(folder,strcat(site,'_PSD_boatsPresent_Spring.csv')));
S=withtol(BP_summer,seconds(1));
BP_summer=boatsPresent(S,:);
writetable(timetable2table(BP_summer),...
fullfile(folder,strcat(site,'_PSD_boatsPresent_Summer.csv')));
S=withtol(BP_autumn,seconds(1));
BP_autumn=boatsPresent(S,:);
writetable(timetable2table(BP_autumn),...
fullfile(folder,strcat(site,'_PSD_boatsPresent_Autumn.csv')));
S=withtol(BP_winter,seconds(1));
BP_winter=boatsPresent(S,:);
writetable(timetable2table(BP_winter),...
fullfile(folder,strcat(site,'_PSD_boatsPresent_Winter.csv')));

Meta-data is data, and data does not belong in variable names! Sticking meta-data into variable names, e.g. the season names:
BA_spring, BA_summer, BA_autumn, BA_winter
means that you force yourself into writing slow, inefficient code or doing lots of copy-and-paste. Rik correctly recommends that you should put all of your data in arrays, rather than splitting it into separate variables.
Awesome, this helps a lot. Thank you, Stephen! I am glad I asked.


 Accepted Answer

Whenever you find yourself copy-pasting code in MATLAB, you should consider using an array.
seasons={'Spring','Summer','Autumn','Winter'};
boatsPresent_t=boatsPresent.Var1; %time variables
boatsAbsent_t=boatsAbsent.Var1; %time variables
BP=cell(1,4);BA=cell(1,4); %one cell per season
[BP{:}]=indexSeasons(boatsPresent_t); %index times into seasons
[BA{:}]=indexSeasons(boatsAbsent_t); %index times into seasons
for n=1:numel(seasons)
    %boats present: keep rows within 1 s of this season's times, then write
    S=withtol(BP{n},seconds(1));
    BP_part=boatsPresent(S,:);
    writetable(timetable2table(BP_part),...
        fullfile(folder,strcat(site,'_PSD_boatsPresent_',seasons{n},'.csv')));
    %boats absent: same steps
    S=withtol(BA{n},seconds(1));
    BA_part=boatsAbsent(S,:);
    writetable(timetable2table(BA_part),...
        fullfile(folder,strcat(site,'_PSD_boatsAbsent_',seasons{n},'.csv')));
end
If you have more states than just present and absent, you should consider putting those states in an array so you can use that array to generate logical indices.
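A minimal sketch of that idea follows. It assumes, hypothetically, that both states live in one combined timetable with a logical variable named `boatPresent`, and it replaces the asker's `indexSeasons` (which classifies by water temperature and is not shown) with a placeholder month-to-season lookup using southern-hemisphere seasons. The data are synthetic and only there to make the sketch runnable.

```matlab
% Synthetic combined timetable (hypothetical data, for illustration only)
t = datetime(2020,1,1) + days(0:364)';
psd = (1:numel(t))';
boatPresent = mod((1:numel(t))',3) == 0;   % hypothetical presence flag
data = timetable(t, psd, boatPresent);

% Placeholder season assignment by month (southern hemisphere, assumption);
% in real use, replace this with the output of indexSeasons
seasons = {'Spring','Summer','Autumn','Winter'};
seasonOfMonth = [2 2 3 3 3 4 4 4 1 1 1 2];   % Jan..Dec -> index into seasons
seasonIdx = seasonOfMonth(month(data.Properties.RowTimes));
seasonIdx = seasonIdx(:);                    % force a column vector

% States stored in an array, with a label for each one (used in file names)
states = [true false];
stateNames = {'boatsPresent','boatsAbsent'};

% One cell per state/season combination
parts = cell(numel(states), numel(seasons));
for s = 1:numel(states)
    for n = 1:numel(seasons)
        % Logical index built from the state array and the season index
        mask = data.boatPresent == states(s) & seasonIdx == n;
        parts{s,n} = data(mask,:);
        % In real use, write each part to its own file, e.g.:
        % writetable(timetable2table(parts{s,n}), fullfile(folder, ...
        %     strcat(site,'_PSD_',stateNames{s},'_',seasons{n},'.csv')));
    end
end
```

Adding another state then only means extending `states` and `stateNames`; the loop body stays the same. For states that are not simply true/false, a categorical variable compared with `==` would produce the logical index in the same way.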


Brilliant!! Thank you. I had no idea of this but this is exactly the kind of solution I was looking for. Thank you!
Could you please give me a clue about how to apply this when reading in tables? At the moment, I do it very laboriously...
folder=('Y:\SoundTrap\Boats\Manual Vessel Detections\Tiri PSD subset');
BP_Spring=readtable(fullfile(folder,strcat('Tiritiri_PSD_boatsPresent_',...
'Spring.csv')));
BP_Summer=readtable(fullfile(folder,strcat('Tiritiri_PSD_boatsPresent_',...
'Summer.csv')));
BP_Autumn=readtable(fullfile(folder,strcat('Tiritiri_PSD_boatsPresent_',...
'Autumn.csv')));
BP_Winter=readtable(fullfile(folder,strcat('Tiritiri_PSD_boatsPresent_',...
'Winter.csv')));
BA_Spring=readtable(fullfile(folder,strcat('Tiritiri_PSD_boatsAbsent_',...
'Spring.csv')));
BA_Summer=readtable(fullfile(folder,strcat('Tiritiri_PSD_boatsAbsent_',...
'Summer.csv')));
BA_Autumn=readtable(fullfile(folder,strcat('Tiritiri_PSD_boatsAbsent_',...
'Autumn.csv')));
BA_Winter=readtable(fullfile(folder,strcat('Tiritiri_PSD_boatsAbsent_',...
'Winter.csv')));
Something like...
folder=('Y:\SoundTrap\Boats\Manual Vessel Detections\Tiri PSD subset');
seasons={'Spring','Summer','Autumn','Winter'};
for n=1:numel(seasons)
BP.seasons{n}=readtable(fullfile(folder,strcat('Tiritiri_PSD_boatsPresent_',...
seasons{n},'.csv')));
BA.seasons{n}=readtable(fullfile(folder,strcat('Tiritiri_PSD_boatsAbsent_',...
seasons{n},'.csv')));
end
??
I think my issue is just how to name the variable that the table is stored in. BP.seasons{n} doesn't work.
If you want a dynamic field name, you need to use this syntax:
name='foo';
S.(name)='bar';
But what is wrong with the code you posted? You shouldn't be storing data (i.e. the season) in a variable name. If you do, that will cause the same issue every time you want to use the variables.
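Following that advice, here is a sketch of the reading step that keeps the state and the season as labels in arrays rather than in variable names. The file-name pattern is taken from the comment above; the demo setup that writes dummy files to a `tempname` folder is an assumption added only so the sketch runs on its own.

```matlab
% Demo setup (assumption): write small dummy CSV files to a temporary folder
% so the sketch runs on its own. In real use, set folder to your data folder
% and delete this block.
folder  = tempname;
mkdir(folder);
states  = {'boatsPresent','boatsAbsent'};
seasons = {'Spring','Summer','Autumn','Winter'};
for s = 1:numel(states)
    for n = 1:numel(seasons)
        demo = table((1:3)', repmat(n,3,1), 'VariableNames', {'Row','Season'});
        writetable(demo, fullfile(folder, ...
            strcat('Tiritiri_PSD_', states{s}, '_', seasons{n}, '.csv')));
    end
end

% Read every file into one cell array: rows are states, columns are seasons
PSD = cell(numel(states), numel(seasons));
for s = 1:numel(states)
    for n = 1:numel(seasons)
        PSD{s,n} = readtable(fullfile(folder, ...
            strcat('Tiritiri_PSD_', states{s}, '_', seasons{n}, '.csv')));
    end
end

% Look a table up by its labels rather than by a variable name
summerAbsent = PSD{strcmp(states,'boatsAbsent'), strcmp(seasons,'Summer')};
```

If named access is preferred, the dynamic field syntax also works inside the loop, e.g. `BP.(seasons{n}) = readtable(fname)` with a hypothetical `fname`, but that again puts the season into a name, which is the pattern advised against above.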



Categories: Data Import and Analysis

Release: R2020a

Asked: 29 Sep 2020

Commented: Rik, 30 Sep 2020
