VertCat unique rows of multiple tables.

19 views (last 30 days)
Hi All,
I've got myself in a bit of a muddle and am looking for some help.
I have a number of tables containing the same variables but different rows of data. Some of the data is duplicated between tables.
The first column contains filenames stored in a categorical array.
I would like to vertically concatenate the tables after filtering out duplicate rows of data defined by duplicate filenames.
Is there a way to check for duplicates within two separate category arrays, then use that as a logical mask to select the rows to append.
%% Example Table structure.
tAll = table( 'Size' ,[0,5],...
'VariableNames' ,{'FileName' ,'SampleNumber' ,'DataType' ,'Value' ,'cellData'},...
'VariableTypes' ,{'categorical' ,'uint8' ,'categorical' ,'double' ,'cell'});
So what I want to do is something like...
tAll = [ t1 ; t2(t2.FileName ~= categories(t1.FileName))] % this doesn't work because t1.FileName and t2.FileName are arrays
I understand Joining tables wouldn't be appropriate, as there is no relationship as such, they are all the same variables.
I have also tried...
tAll = union( t1 ; t2 ); % this give an error because one variable contains random numbers of cells.
% Error using tabular/union (line 42)
% Unable to group rows using unique values of the table variable 'cellData'. UNIQUE returned an error.
% Caused by:
% Error using matlab.internal.math.uniqueCellstrHelper
% Cell array input must be a cell array of character vectors.
TIA, MC.

Accepted Answer

Bruno Luong
Bruno Luong on 25 Apr 2022
% dummy test data
Filename1=["a"; "b"; "c"];
Data1=["a1"; "b1"; "c1"];
T1=table(Filename1,Data1,'VariableNames',{'Filename','Data'})
T1 = 3×2 table
Filename Data ________ ____ "a" "a1" "b" "b1" "c" "c1"
Filename2=["a"; "d"];
Data2=["a2"; "d2"];
T2=table(Filename2,Data2,'VariableNames',{'Filename','Data'})
T2 = 2×2 table
Filename Data ________ ____ "a" "a2" "d" "d2"
[~,i]=setdiff(T2.Filename, T1.Filename);
[T1; T2(i,:)]
ans = 4×2 table
Filename Data ________ ____ "a" "a1" "b" "b1" "c" "c1" "d" "d2"
  2 Comments
MC
MC on 25 Apr 2022
Edited: MC on 25 Apr 2022
Thanks, this put me on the right track!
I had to add some steps because setdiff() only returns the first unique value and my data contains many rows with the same filename.
% find new unique filenames to copy.
t2_uniqueFileNames = setdiff( categories(t2.Filename)), categories(t1.Filename));
% copy all rows with each unique filename.
tAll = t1;
for i = 1:length(t2_uniqueFileNames)
tAll = [tAll; t2(t2.Filename == t2_uniqueFileNames(i),:)]
end
I imagine there is a non-loop version but this is good for now.
Thanks again, MC.
Bruno Luong
Bruno Luong on 25 Apr 2022
You migh try
tAll = [T1; T2(~ismember(T2.Filename, T1.Filename),:)];

Sign in to comment.

More Answers (1)

Matt J
Matt J on 25 Apr 2022
Why not as follows?
tAll = unique( [t1 ; t2] )
  1 Comment
MC
MC on 25 Apr 2022
Thanks.
I tried this on my actual data but it gives the same error as the union() example.
Error using tabular/unique (line 39)
Unable to group rows using unique values of the table variable 'cellData'. UNIQUE returned an error.
Caused by:
Error using matlab.internal.math.uniqueCellstrHelper
Cell array input must be a cell array of character vectors.

Sign in to comment.

Products


Release

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!