Filter data by flag columns
Below is some code that uses a for loop to read in each day of radiometry data and process it sequentially. I now need to add some lines of code to apply the flags that are in the tables. I have attached an example day of data (labeled "Lw.mat"; hopefully you're able to open it). At each time point there are two rows of data because two instruments were measuring, and at each time point I need to pick one of these rows to use and omit the other based on the "flag" columns. First I want to apply "flag light availability" and "flag light quality" so that only rows which pass both of these (have a 1 in both columns) are included. Then I want to use the "flag sunglint (Lw)" and "flag sunglint (Rrs)" columns: if one row shows a 0 in either flag column it should be excluded, and the row with a 1 in at least one flag column should be used. If both rows have 1s in both flag columns, the row with the lowest value in column 13 should be used. If both rows show all 0s, both rows should be excluded.
Basically, I'm very lost on how to do this, so any help or guidance would be appreciated. I've looked at how to write conditional statements, and with just one condition I would probably be all right, but applying so many loses me, especially how to relate them to the pair of rows at each time point.
% Specify the folder where the files live.
myFolder = 'C:\Users\tbrob\MATLAB Drive\SO298-MATLAB\MATLAB\Lw';
% Check to make sure that folder actually exists. Warn user if it doesn't.
if ~isfolder(myFolder)
    errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
    uiwait(warndlg(errorMessage));
    return;
end
% Get a list of all files in the folder with the desired file name pattern.
filePattern = fullfile(myFolder, '*Lw.dat'); % The folder has 3 different file types; we only want the Lw data from each day.
theFiles = dir(filePattern);
% Plotting setup
figure; % Create a new figure
subplot(2,1,1);
hold on; % Enable hold to overlay plots
subplot(2,1,2);
hold on; % Enable hold to overlay plots
% Process each file and plot the data
for k = 1 : length(theFiles)
    baseFileName = theFiles(k).name;
    fullFileName = fullfile(myFolder, baseFileName);
    fprintf(1, 'Now reading %s\n', fullFileName);
    % Read the data file
    dataArray = readtable(fullFileName);
    dataArray = table2array(dataArray);
    % add code which, at each time point, picks the first or second row based
    % on whether it passes first the sunlight flags and then the sunglint flags
    % Extract relevant data for plotting
    x = 320:2:950;
    y = dataArray(2:end, 13:328);
    L = y(1:2:end, :);
    R = y(2:2:end, :);
    % Plot the data
    subplot(2, 1, 1);
    plot(x, L);
    subplot(2, 1, 2);
    plot(x, R);
end
Accepted Answer
Mathieu NOE
on 17 Jul 2023
Hello,
This is my first attempt; I hope I didn't misinterpret your requirements.
There is still one case which may not be treated exactly as you wish: what are we supposed to do if, for example, one flag ("flag sunglint (Lw)") has one 1 and the other flag ("flag sunglint (Rrs)") has two 1s? For the time being my code keeps both rows as valid. Only if both rows have 1s in both flags do we search for the minimum value in the 13th column.
This may need further improvement of the code.
dataArray = Lw;
dataArray = table2array(dataArray);
% add code which, at each time point, picks the first or second row based
% on whether it passes first the sunlight flags and then the sunglint flags
[m,n] = size(dataArray);
out = []; % init
for k = 1:floor(m/2) % loop over pairs of rows
    r = (1:2)+(k-1)*2; % indices of the current 2 rows
    array = dataArray(r,:);
    % I need to at each time point pick one of these rows to use and omit the other based on the "flag" columns.
    % I want to first apply the "flag light availability" (C7) and "flag
    % light quality" (C8) so that only rows which pass both of these (have a 1 in both of these columns)
    % are included.
    flag_light_availability = array(:,7); % 7th column
    flag_light_quality = array(:,8); % 8th column
    ind_rows1 = (flag_light_availability>0) & (flag_light_quality>0); % NB I prefer >0 instead of == 1, it's more robust to round-off / precision error
    % keep the valid rows
    array = array(ind_rows1,:);
    % Then I want to use the "flag sunglint (Lw)" (C10) and "flag sunglint (Rrs)" (C11) columns:
    % the row with a 1 in at least one flag column should be used; if both rows have 1's in both flag columns
    % then the row with the lowest value in column 13 should be used.
    % If both rows show all 0s then both rows should be excluded.
    flag_sunglint_Lw = array(:,10); % 10th column
    flag_sunglint_Rrs = array(:,11); % 11th column
    ind_rows2 = (flag_sunglint_Lw>0);
    ind_rows3 = (flag_sunglint_Rrs>0);
    % find if both rows have 1's in both flag columns
    ind_rows_common = ind_rows2 & ind_rows3;
    if all(ind_rows_common) % case 1 : both rows have 1's in both flag columns
        % get corresponding values of column 13
        C13_data = array(ind_rows_common,13); % 13th column
        % find which row has the lowest value in 13th col
        [~,ind] = min(C13_data);
        array = array(ind,:);
    else % case 2 : not every row has both flags == 1 => simply keep the rows where at least one flag is == 1 (logical OR)
        array = array(ind_rows2 | ind_rows3,:);
    end
    l(k) = size(array,1); % for debug only (shows how many of the 2 initial rows have been finally stored in the output)
    out = [out; array]; % concat all valid data
end
% % Extract relevant data for plotting
% x = 320:2:950;
% y = dataArray(2:end, 13:328);
% L = y(1:2:end, :);
% R = y(2:2:end, :);
%
% % Plot the data
% subplot(2, 1, 1);
% plot(x, L);
% subplot(2, 1, 2);
% plot(x, R);
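The mixed case mentioned above can be checked on a small synthetic pair of rows. This is only a sketch: the values are made up, the name `pair` is hypothetical, and the column numbers (7, 8, 10, 11, 13) are the ones assumed in the answer's code.

```matlab
% Synthetic pair of rows: only columns 7, 8, 10, 11 and 13 are filled in.
% All values are made up for illustration.
pair = zeros(2, 13);
pair(:, [7 8]) = 1;          % both rows pass the two light flags
pair(:, 10)    = [1; 1];     % "flag sunglint (Lw)"  for row 1 and row 2
pair(:, 11)    = [1; 0];     % "flag sunglint (Rrs)" for row 1 and row 2
pair(:, 13)    = [0.8; 0.5]; % value used for the tie-break

lwOK   = pair(:, 10) > 0;
rrsOK  = pair(:, 11) > 0;
common = lwOK & rrsOK;       % rows that pass both sunglint flags

if all(common)
    % both rows pass both flags: keep the one with the lowest column 13
    [~, idx] = min(pair(:, 13));
    kept = pair(idx, :);
else
    % otherwise keep every row with at least one sunglint flag set
    kept = pair(lwOK | rrsOK, :);
end

fprintf('%d row(s) kept\n', size(kept, 1));
```

With these flags the second branch runs and both rows are kept, which matches the behaviour described for the case where one flag has one 1 and the other has two.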
2 Comments
Mathieu NOE
on 19 Jul 2023
Hello again,
1) Looking at this case again, ind_rows_common marks the row where both sunglint flags are 1,
so for example (1 0) & (1 0) = (1 0) and we pick the first row,
same for (1 1) & (1 0) = (1 0)
or (1 0) & (1 1) = (1 0). Note, however, that the else branch (case 2) indexes with the logical OR of the two flags, so for the last two patterns both rows are kept.
2) For the case "either flag = 0, get rid of the row" there is nothing to do.
My logic is to extract the valid data from the original array (dataArray) and store (concatenate) it in the array out.
I am not deleting or removing rows in the original array (it is left untouched, as you can check after running the code).
Below I simply added a few more comments for case 2 so we keep track of that info; the code itself is the same as before:
dataArray = Lw;
dataArray = table2array(dataArray);
% add code which, at each time point, picks the first or second row based
% on whether it passes first the sunlight flags and then the sunglint flags
[m,n] = size(dataArray);
out = []; % init
for k = 1:floor(m/2) % loop over pairs of rows
    r = (1:2)+(k-1)*2; % indices of the current 2 rows
    array = dataArray(r,:);
    % I need to at each time point pick one of these rows to use and omit the other based on the "flag" columns.
    % I want to first apply the "flag light availability" (C7) and "flag
    % light quality" (C8) so that only rows which pass both of these (have a 1 in both of these columns)
    % are included.
    flag_light_availability = array(:,7); % 7th column
    flag_light_quality = array(:,8); % 8th column
    ind_rows1 = (flag_light_availability>0) & (flag_light_quality>0); % NB I prefer >0 instead of == 1, it's more robust to round-off / precision error
    % keep the valid rows
    array = array(ind_rows1,:);
    % Then I want to use the "flag sunglint (Lw)" (C10) and "flag sunglint (Rrs)" (C11) columns:
    % the row with a 1 in at least one flag column should be used; if both rows have 1's in both flag columns
    % then the row with the lowest value in column 13 should be used.
    % If both rows show all 0s then both rows should be excluded.
    flag_sunglint_Lw = array(:,10); % 10th column
    flag_sunglint_Rrs = array(:,11); % 11th column
    ind_rows2 = (flag_sunglint_Lw>0);
    ind_rows3 = (flag_sunglint_Rrs>0);
    % find if both rows have 1's in both flag columns
    ind_rows_common = ind_rows2 & ind_rows3;
    if all(ind_rows_common) % case 1 : both rows have 1's in both flag columns
        % get corresponding values of column 13
        C13_data = array(ind_rows_common,13); % 13th column
        % find which row has the lowest value in 13th col
        [~,ind] = min(C13_data);
        array = array(ind,:);
    else % case 2 : not every row has both flags == 1 => simply keep the rows where at least one flag is == 1 (logical OR)
        % note: because of the logical OR, if one row has both Lw and Rrs flags = 1 and the other row has flags = 0 and 1, both rows are kept.
        array = array(ind_rows2 | ind_rows3,:);
    end
    % l(k) = size(array,1); % for debug only (shows how many of the 2 initial rows have been finally stored in the output)
    out = [out; array]; % concat all valid data
end
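If exactly one row per time point is wanted, one possible variant is to rank the two rows by how many sunglint flags they pass and break ties on column 13. This is a hedged sketch of one reading of the requirement, not the code from the answer above: the names `pair`, `score`, `cand` and `chosen` are hypothetical, the values are made up, and using column 13 to break ties between rows that each pass only one flag is an assumption.

```matlab
% Hypothetical variant: choose at most one row per pair.
% Synthetic data; only columns 7, 8, 10, 11 and 13 are filled in.
pair = zeros(2, 13);
pair(:, [7 8]) = 1;          % both rows pass the light flags
pair(:, 10)    = [1; 1];     % "flag sunglint (Lw)"
pair(:, 11)    = [1; 0];     % "flag sunglint (Rrs)"
pair(:, 13)    = [0.8; 0.5]; % tie-break column

lightOK = pair(:, 7) > 0 & pair(:, 8) > 0;
score   = (pair(:, 10) > 0) + (pair(:, 11) > 0); % 0, 1 or 2 flags passed per row
score(~lightOK) = 0;         % rows failing a light flag cannot be chosen

best = max(score);
if best == 0
    chosen = zeros(0, size(pair, 2));  % nothing usable at this time point
else
    cand = find(score == best);        % rows tied for the best score
    [~, j] = min(pair(cand, 13));      % assumption: ties go to the lowest column 13
    chosen = pair(cand(j), :);
end
```

Here row 1 passes both sunglint flags and row 2 only one, so row 1 is chosen even though its column 13 value is higher. The column 13 comparison only applies when the rows tie on the number of flags passed.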
More Answers (1)
dpb
on 17 Jul 2023
load Lw
%whos Lw
Lw=addvars(Lw,datetime(Lw{:,1:6}),'NewVariableNames',{'Time'},'After',{'Second'});
Lw=removevars(Lw,[1:6]);
% shorten variable names to not be so much typing, stuff taking up space...
Lw.Properties.VariableNames=strrep(Lw.Properties.VariableNames,'Flag','');
Lw.Properties.VariableNames=strrep(Lw.Properties.VariableNames,'Sung','G');
Lw.Properties.VariableNames([2:3 5:6])=extractBefore(Lw.Properties.VariableNames([2:3 5:6]),7);
Lw=addvars(Lw,Lw.LightA & Lw.LightQ,'NewVariableNames',{'LightOK'},'After','Time'); % both available and quality
Lw=addvars(Lw,Lw.GlintL | Lw.GlintR,'NewVariableNames',{'GlintOK'},'After','LightOK'); % at least one or other OK
isMin=cell2mat(rowfun(@(v)v==min(v),Lw,'groupingvariables','Time','InputVariables','VarName13', ...
'OutputVariableName',{'isMin'},'OutputFormat','cell'));
Lw=addvars(Lw,isMin,'NewVariableNames',{'isMin'},'After','LightOK');
head(Lw,20)
Now, the combination of the three flags,
isBest=Lw.LightOK & Lw.GlintOK & Lw.isMin;
picks out the specific rows that match the conditions: both light flags are OK, at least one glint flag is OK, and, if both rows are flagged OK on glint, the row that does not have the minimum V13 value is deselected.
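The same idea can be tried on a toy table to see which rows end up selected. This is a sketch with made-up values. The table `T`, the numeric `Time` stand-in and the column name `V13` are assumptions for illustration, and `findgroups`/`splitapply` are used here in place of `rowfun` for the per-group minimum.

```matlab
% Toy table with made-up values; two rows per time point.
Time    = [1; 1; 2; 2];            % stand-in for the datetime column
LightOK = logical([1; 1; 1; 0]);   % both light flags passed
GlintOK = logical([1; 1; 0; 1]);   % at least one sunglint flag passed
V13     = [0.8; 0.5; 0.3; 0.9];    % stand-in for column 13
T = table(Time, LightOK, GlintOK, V13);

% Per-group minimum of V13, expanded back to one value per row
g = findgroups(T.Time);
minPerGroup = splitapply(@min, T.V13, g);
T.isMin = T.V13 == minPerGroup(g);

% Rows that pass the light flags, a glint flag, and hold the group minimum
isBest   = T.LightOK & T.GlintOK & T.isMin;
selected = T(isBest, :);
disp(selected)
```

In this toy data only the second row is selected. At the second time point nothing is selected, because the row with the minimum V13 fails the glint flag and the other row fails the light flag. The minimum is taken over every row in a group regardless of the flags, so whether to restrict it to rows that pass the flags first is a design choice to settle against the real data.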
2 Comments
dpb
on 19 Jul 2023
"I had to adjust the code so that it would work in my original for loop..."
Part of the point was to illustrate vectorized use of MATLAB.
It wasn't/isn't clear to me precisely what the plots should show. The magic numbers you use as subscripts can't be coded logically; what are they about, and why those particular numbers? There must be some logic there, but it wasn't explained well enough to be able to do anything with it...
"... I get the error "vectors must be the same length"."
There's an indexing issue in the selection of the arrays: one starts from the first row while another starts at the second, which causes a mismatch in lengths. What was up with that? I did notice there was a negative value for one of the variables in that first row that maybe disqualified it? If that's the deal, that should also be handled logically.
You had a very good table to start with; use it or, if you don't want the table, use readmatrix instead of readtable. Switching back and forth just wastes code and perhaps memory; there's no real reason to use both storage formats for the same data/code.
"...and plot them on top of eachother so "hold on","
You can put hold first if you need to plot into the same axes sequentially, but plot is vectorized: it will plot multiple lines at one time. Each column passed is treated as a separate variable to be plotted against the corresponding x; x can be a vector while y is an array as long as the lengths match. That's why you don't want to select different array sizes but instead use logical operations on the data. If some values aren't to be shown, set them to NaN with the logical mask; they will be silently ignored by the plot routines.
Then, the whole thing is one call to plot for each file.
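A minimal sketch of that NaN-masking approach, assuming the spectra are stored one per row and that a logical vector `keep` (hypothetical, standing in for the result of the flag logic) marks the rows to show. The data is random and only for illustration.

```matlab
x = 320:2:950;                  % wavelength axis from the question (316 points)
Y = rand(4, numel(x));          % four made-up spectra, one per row
keep = logical([1; 0; 1; 1]);   % hypothetical per-row result of the flag logic

Y(~keep, :) = NaN;              % rejected spectra become NaN and are not drawn

figure('Visible', 'off');       % hidden figure so the sketch can run unattended
h = plot(x, Y.');               % one call: each column of Y.' becomes one line
```

The array keeps its size, so x and Y always match in length. The rejected row still produces a line object, but nothing is drawn for it.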