How to remove table data based on comparison between different columns in another table in MATLAB?

I have 2 tables in MATLAB- table A and table B, each having different dimensions (different no. of rows and columns). The first column of table A has the Date and time in format like 2018-11-01 12:00:00 (DateTime data format).
Now, in Table B, the Third and Fourth Column also consist of Date and Time in format like 2018-11-01 01:11:12:173000. What I would like to achieve is to remove all the rows (which are the data instances) from Table A, in case of which the Datetime for Table A falls in the range between the Date and Time in Table B. (To be more precise, suppose Table B has an entry of DateTime in third column for the first row/first data instance as 2018-11-10 12:30:00:173 and in fourth column as 2018-11-10 12:40:00:145, I would like to remove all data entries/rows from Table A, in case of which the DateTime Column value for Table A falls in the range of 2018-11-10 12:30:00:173 to 2018-11-10 12:40:00:145, as an example). This means that basically I would be removing the data in the aforesaid range from the Table A.
To approach this, the first thing which comes to my mind is to use inner join(), but, it is evident from Mathworks Community guidance that innerjoin() only matches the exact column value which I specify the Key as, but in this case, I would be looking at a range of DateTime values in 2 columns of table B, so perhaps this might not be the best approach. Using a for loop for this purpose might as well work, but would be quite complex and redundant with huge computational time on the large data in the tables. Any help in this regard would be highly appreciated.

 Accepted Answer

It's fairly simple to do. Annoyingly implicit expansion and bsxfun don't work with datetime so you have to use repmat to do the comparison.
%inputs: TableA with a column named date, TableB with a column named datestart and dateend
%replace by actual table and variable names.
datetocheck = repmat(TableA.date, 1, height(TableB)); %replicate in as many columns as there are rows in B
datestart = repmat(TableB.datestart', height(TableA), 1); %tranpose and replicate in as many rows as in A
dateend = repmat(TableB.dateend', height(TableA), 1);
toremove = any(datetocheck >= datestart & datetocheck <= dateend, 2);
TableA(toremove, :) = [];
If implicit expansion was implemented for datetime, you could have replaced the first 4 lines by:
toremove = any(TableA.date >= TableB.datestart' & TableA.date <= TableB.dateend', 2);

11 Comments

Thanks for the answer. But, just to confirm, does this also take into account the date and month and year while removing the data from table A? I mean, I would only like to remove data in case of which the TimeStamp( including the day, hour, minute, year etc. all fall within the interval specified by Table B's Datestart and Dateend.I don't want to remove data only based on the Time, but the entire range of timestamp from Table B's datestart to dateend. Could you please confirm if this accomplishes the same?
Yes, it uses datetime comparisons which takes everything into account, from year to fraction of a second. In fact, it even takes into account timezones correctly, if the dates are in different time zones.
However, note that if the intervals in B are disjoint, then Peter Perkins' answer would be more efficient than mine.
I appreciate your answer and it works well for me. Just to add one more query, Table B also consists of a column named X suppose with string type values. When I am removing the data from Table A (in which I have created an empty column X as well), if I would like to also copy the corresponding value in Table B's X Column into Table A's X Column (according to the timestamps, please note table A only has one timsstamp, while Table B has start and end timestamp as you have rightly infered in your answer above), then could you suggest if I can proceed with this code and then add a if condition for this or would the any() function used above can make it simpler. I would really appreciate it if you can provide a sample piece of code like you did above as it would be helpful to understand.
I'm a bit confused. Since the rows of tableA that match the condition are deleted, where would you copy that X column?
In addition, even if we had somewhere to copy it, is it guaranteed that a row of tableA would match at most one row of tableB. If not, we'd have multiple X to copy into that somewhere.
Sorry for the confusion in my previous query. I would try to be more precise:-
So, I have the 2 tables A and B. The entries from Table A corresponding to start and end times in Table B which match the Timestamp in Table A have already been deleted at this stage. Kindly note that Table A now contains all values except deleted values. I have now created a Table C which consists of only the deleted values. Now, what I would like to do is that there is a column in Table B with some labels, suppose let that column be column X. I have created a new column in Table C in which I would like to copy the corresponding label in Table B's column X matching the timestamp. As the table C now only contains the deleted values with the matching condition, I believe that it should be possible to copy the labels into Table C's X Column. Please note that a row in table A is guaranteed to match at most one row in table B.
This query is a bit complex and I myself was confused for days pondering over it. I hope it makes sense and would appreciate if you could help me with the right direction.
You could use more or less the same method:
datetocheck = repmat(TableA.date, 1, height(TableB)); %replicate in as many columns as there are rows in B
datestart = repmat(TableB.datestart', height(TableA), 1); %tranpose and replicate in as many rows as in A
dateend = repmat(TableB.dateend', height(TableA), 1);
[matchinA, whereinB] = find(TableA.date >= TableB.datestart' & TableA.date <= TableB.dateend');
tableC = [tableA(matchinA, :), tableB.columntocopy(whereinB)];
tableA(matchinA, :) = [];
Thanks a lot for the answer. Just to clarify, here, columntocopy in the syntax is Table B's X column right? And matchinA and whereinB are the indices of the table found using find().
So, columntocopy is to be replaced by the name of Table B's X Column and I don't need to make any further changes (as we are already appending the value of Table A and the labels column value (Table B's X Column values) into Table C's X Column. So, we are assuming column X in table C is the last column of table C, right? Sorry for asking but wanted to clarify my confusion.
Thanks. But, the code isn't working for the last comment and I get the error "All input arguments must be tables", whereas definitely Table A and B are of course tables and show as tables only in the workspace. I have replaced "columntocopy" with the name of the Column with Labels in Table B i.e Column X of Table B and kept rest of the syntax unchanged. Your previous code for removing the data was working perfectly, but this isn't working for me. Can you kindly tell where the issue might be?
datetocheck = repmat(TableA.date, 1, height(TableB)); %replicate in as many columns as there are rows in B
datestart = repmat(TableB.datestart', height(TableA), 1); %tranpose and replicate in as many rows as in A
dateend = repmat(TableB.dateend', height(TableA), 1);
[matchinA, whereinB] = find(TableA.date >= TableB.datestart' & TableA.date <= TableB.dateend');
tableC = [tableA(matchinA, :), tableB.columntocopy(whereinB)];
tableA(matchinA, :) = [];
In the code, till dateend, everything executes fine. but issue occurs when the matchinA line in code is run, showing error "All input arguments must be tables", but wierdly, table A and table B are tables (that is why it worked perfectly when I removed the rows bsed on your first time answer). Could you please guide me as to why this is happening, has it got anything to do with find() function or is it some other problem? Thanks again!
Yes, I made a silly mistake
tableC = [tableA(matchinA, :), tableB(whereinB, 'columntocopy')];
or
tableC = tableA(matchinA, :);
tableC.columntocopy = tableB.columntocopy(whereinB);

Sign in to comment.

More Answers (1)

From your description it sounds like you want to remove rows from A whose timestamp falls in any of the time intervals defined by any of the timestamp pairs in B. As opposed to removing rows from A whose timestamp falls in the time interval defined by the timestamp pair in the corresponding row of B. Right?
Is there any structure to the intervals defined in B? Are they disjoint? Are they in some order?
Assuming they are disjoint and in increasing order, interleave the two vectors of interval endpoints in B. This defines edges for time bins, and anything in A that falls into an even bin you throw out, in an odd bin you keep. I think that's consistent with what you described.
Use discretize on A's timestamps to determine which bins each row of A falls in. Then use logical subscripting to remove the rows that fall in even bins.
If the intervals defined by B are not disjoint, you'll have to do something to merge overlapping intervals. Perhaps sort by the left endpoints, then look to see whose right endpoint is larger than the succeeding left endpoint.

Products

Release

R2018b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!