How can I find matching rows from three out of four colomns

I have a matrix with 4 columns, x y z and D, and a lot of rows. I want the program to find the rows where x, y and z are the same (so the first three colomns match) and add the corresponding D value of these matching rows. When the matching rows are found only one stays in the matrix with the new D value. To make it a little more complicated I like to attach a tolerance for what the matching values can diverge.
For example:
x y z D
1 2 3 4
2 4 6 8
1 2 3 5
New one:
x y z D
1 2 3 9
2 4 6 8

 Accepted Answer

This solution might not be as simple and elegant as the one Andrei Bobrov provided, but it allows for a tolerance level.
% Example data
A=[ 1 2 3 ; ...
2 4 6 ; ...
1.05 2 3 ; ...
1 52 3 ; ...
2 4 6 ; ...
1.05 2 3 ; ...
5 2 3 ; ...
1 52 3 ; ...
5 2 3 ; ...
];
D = [4; 8; 5; 1; 2; 3; 5; 7; 8];
% Vector to track which have been processed and how they are grouped
processed = zeros(size(A,1),1);
% Tolerance level
tol = .1;
row = 1;
while ~all(processed)
% Only check rows that havent been matched yet
if ~processed(row)
% Take out the row you want to compare to the others
curRow = A(row,:);
% Subtract that row from all rows in A
subtractedA = A - repmat(curRow,size(A-1,1), 1);
% Check to see if they are within the tolerance level
matchingRows = all((subtractedA <= tol & subtractedA >= -tol),2);
% Mark the rows as processed
processed(matchingRows) = row;
% Update A to the mean of all matches
A(matchingRows,:) = repmat(mean(A(matchingRows,:),1),sum(matchingRows),1);
% Sum the D values that matches
D(matchingRows) = sum(D(matchingRows));
end
% Check the next row
row = row + 1;
end
% Results
[A(unique(processed),:) D(unique(processed))]

4 Comments

Thanks! I have a little problem with the code and that is when I change all values last column of A to 3 so:
A=[ 1 2 3 ; ... 2 4 3; ... 1.05 2 3 ; ... 1 52 3 ; ... 2 4 3 ; ... 1.05 2 3 ; ... 5 2 3 ; ... 1 52 3 ; ... 5 2 3 ; ... ];
and I change the tolerance for example to 0.01 there is a change in the values in the last column altough they don't change so the average should be the same, 3.
Ah! That's because the mean was taken along columns instead of rows when there was no match. This is fixed by explicitly stating that the mean should be along rows. I've updated the code

Sign in to comment.

More Answers (1)

d = [1 2 3 4
2 4 6 8
1 2 3 5];
[a,b,c] = unique(d(:,1:end-1),'rows');
out = [a, accumarray(c,d(:,end))];
add
d0 =[[ 1.1 2 3.2
2 4 6
1.2 2 3.3
2.09 3.96 6.05
1.05 1.99 3.25],randi(25,5,1)];
eps1 = .1;
d = d0(:,1:end-1);
l = bsxfun(@(x,y)abs(x-y)<=eps1,permute(d,[1 3 2]),permute(d,[3 1 2]));
[ii,jj] = find(triu(all(l,3),1));
a = num2cell(unique([[ii;ii],[ii;jj]],'rows'),1);
out = [cell2mat(accumarray(a{:},[],@(x){mean(d0(x,1:end-1))})),...
accumarray(a{1},d0(a{2},end))];

2 Comments

This works.
But what if the values diverge a little so I can add a tolerance and take the average of the row:
d = [1.1 2 3.2 4; 2 4 6 8; 1.2 2 3.3 5]
and the tolerance is 0.1 so the outcome will be:
dout = [1.15 2 3.25 9; 2 4 6 8]

Sign in to comment.

Categories

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!