Find matching rows of cell arrays containing strings

Hi,
I am working with a pair of cell arrays, containing strings. The cell arrays are of the same size, and they contain the same rows, but in different order. E.g.:
A = {'ABC' '123'; 'A' '100'; 'C' '0'};
B = {'A' '100'; 'C' '0'; 'ABC' '123'};
It will always be the case that the rows are simply ordered differently. What I am trying to accomplish is to find this ordering. That is, row 1 in A becomes row 3 in B, row 2 in A becomes row 1 in B, and row 3 in A becomes row 2 in B. So I need some sort of output like:
output = [3 1 2]
or something containing the same kind of information. I just need to know how the rows are permuted. I've tried using cellfun with ismember and so on, but I can't seem to make that work on whole rows rather than cell by cell. I've also tried converting A and B into arrays of numbers (because my impression is that numbers are easier to deal with than strings), but then I run into the problem that not all the cells have the same size, so there's a concatenation error.
In short, I've tried a bunch of different cellfun calls with strfind, ismember, etc., but I haven't found a method that works for rows specifically.
Any help is greatly appreciated!

 Accepted Answer

B=circshift(A,[-1 0])
Edit
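A = {'ABC' '123'; 'A' '100'; 'C' '0'};
B = {'A' '100'; 'C' '0'; 'ABC' '123'};
[n,m]=size(A);
[~,idx]=ismember(A(:),B(:));   % linear index into B(:) for every cell of A
[ii,jj]=ind2sub([n m],idx);    % convert to row (ii) and column (jj) of B
output=ii(1:n)';               % rows of B matched by column 1 of A, here [3 1 2]
% Note: only the first column decides the match, so its entries must be unique.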
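A quick way to sanity-check the result, as a sketch: if output(i) is the row of B that matches row i of A, then reordering B by output should reproduce A. The variable name ok is only for illustration.

```matlab
A = {'ABC' '123'; 'A' '100'; 'C' '0'};
B = {'A' '100'; 'C' '0'; 'ABC' '123'};
[n,m] = size(A);
[~,idx] = ismember(A(:),B(:));
[ii,~] = ind2sub([n m],idx);
output = ii(1:n)';

% Reordering B by output should give back A exactly.
ok = isequal(A, B(output,:));
```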

9 Comments

Hi Azzi,
The particular example was pretty simple. In general A and B won't be cyclic permutations of each other.
Yeah, but the question isn't about finding B from A; it's about finding that vector, [3 1 2] (this can generally be more complicated), using only A and B.
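% Join each row into a single key string, then match the keys.
n=size(A,1);
A1=arrayfun(@(x) strjoin(A(x,:),'_'),(1:n)','UniformOutput',false);
m=size(B,1);
B1=arrayfun(@(x) strjoin(B(x,:),'_'),(1:m)','UniformOutput',false);
[~,idx]=ismember(A1,B1)   % idx(i) is the row of B that matches row i of A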
Hi Azzi,
That's interesting. strjoin was the method I used earlier (I guess I should have mentioned that), but the problem is that it is quite slow. That is, it spends a lot of time (at least on my computer) just joining strings together. Although it works, I was hoping there was a faster solution.
Many thanks for the help with this!
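If the per-row strjoin calls are the bottleneck, one possible alternative (a sketch, not timed here) is to build the same joined keys with strcat, which is called once per column instead of once per row. The names keyA and keyB are made up for illustration, and like any joined key this assumes the '_' delimiter cannot make two different rows look the same.

```matlab
A = {'ABC' '123'; 'A' '100'; 'C' '0'};
B = {'A' '100'; 'C' '0'; 'ABC' '123'};

% Start from the first column and append the remaining columns one at a time.
keyA = A(:,1);
keyB = B(:,1);
for k = 2:size(A,2)
    keyA = strcat(keyA, '_', A(:,k));
    keyB = strcat(keyB, '_', B(:,k));
end

% idx(i) is the row of B whose key equals the key of row i of A.
[~,idx] = ismember(keyA, keyB);
```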
Alternatively, if I took a single row from A (looping through the rows one at a time) and compared it with B, would that work? That is, say size(A)=[1 4] and size(B)=[100 4]: could I somehow get a 100-by-4 array of logicals indicating whether each of the four elements of A matches the corresponding element in each row of B?
Maybe that was unclear, but I would ideally like to avoid strjoin, as it's been slowing down my code so far.
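The row-by-row comparison described here can be sketched with strcmp, which compares a whole column of B against one string at a time. This uses the 3-by-2 example from the question rather than a 100-by-4 array, and the names a, match and rowInB are only illustrative.

```matlab
A = {'ABC' '123'; 'A' '100'; 'C' '0'};
B = {'A' '100'; 'C' '0'; 'ABC' '123'};

a = A(1,:);                  % the single row to look up
match = false(size(B));      % same size as B, one logical per cell
for k = 1:numel(a)
    % Compare column k of B against element k of the row.
    match(:,k) = strcmp(B(:,k), a{k});
end

% A row of B matches when every column matches.
rowInB = find(all(match,2));
```

Repeating this for each row of A gives the full permutation, at the cost of one pass over B per row.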
Try this, it's much faster
A = {'ABC' '123'; 'A' '100'; 'C' '0'};
B = {'A' '100'; 'C' '0'; 'ABC' '123'}
[n,m]=size(A);
[~,idx]=ismember(A(:),B(:));
[ii,jj]=ind2sub([n m],idx);
output=ii(1:n)';
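The snippet above reads the row index from the first column only, so it relies on the first-column strings being unique. If they can repeat, one possible variant (a sketch under that assumption, not part of the original answer) is to replace every string with an integer code via unique and then match whole rows with ismember(...,'rows'). The data below is hypothetical and chosen so that the first column repeats.

```matlab
A = {'X' '1'; 'X' '2'; 'Y' '1'};   % hypothetical data, first column repeats
B = A([3 1 2],:);                  % same rows in a different order
n = size(A,1);

% Give every distinct string in A and B the same integer code.
[~,~,codes] = unique([A; B]);
codes = reshape(codes, 2*n, []);   % back to the shape of [A; B]

% Match complete rows of codes: output(i) is the row of B equal to row i of A.
[found,output] = ismember(codes(1:n,:), codes(n+1:end,:), 'rows');
```

Here isequal(A, B(output,:)) should hold whenever all(found) is true.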
Test the speed
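A = {'ABC' '123'; 'A' '100'; 'C' '0'};
B=A([3 1 2],:);
nn=5000;
A=repmat(A,nn,1);   % 15000 rows, each original row repeated nn times
B=repmat(B,nn,1);
% Method 1: join each row with strjoin, then match the joined strings
tic
n=size(A,1);
A1=arrayfun(@(x) strjoin(A(x,:),'_'),(1:n)','UniformOutput',false);
m=size(B,1);
B1=arrayfun(@(x) strjoin(B(x,:),'_'),(1:m)','UniformOutput',false);
[~,output1]=ismember(A1,B1);
t1=toc
% Method 2: ismember on the individual cells
tic
[n,m]=size(A);
[~,idx]=ismember(A(:),B(:));
[ii,jj]=ind2sub([n m],idx);
output=ii(1:n);
t2=toc
isequal(output1,output)   % both methods should agree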
Result
Elapsed time is 9.558223 seconds.
Elapsed time is 0.034396 seconds.
Hi Azzi,
That's really impressive! I'll try that out in my code and see how it goes! Based on your tests, it looks extremely promising! Many thanks for the help! :)

Asked: 12 Jul 2015

Commented: 12 Jul 2015
