counting occurrences of a specific character in a cell array

Assuming that these are amino acids/codons (3 uppercase letters), here are three "not-very-orthodox" solutions, just for fun. Keep in mind that, with bioinformatics being a hot topic, there are quite a few very specialized libraries out there (e.g. http://www.mathworks.com/help/bioinfo/functionlist.html) that would do the job much better. You might also get a more orthodox version from someone else once you answer Walter's comment.

For the example, assume the following (the solutions work for any cell array of 3-letter uppercase codes):

 C = {'AAA','AAT','AAG','AAT','AGC','ACG'} ;
 n = numel(C) ;

1. Probably the most efficient of these non-orthodox solutions (~0.58s for processing 1 million codons on my poor laptop):

 D = accumarray([[C{:}]-64; reshape([1;1;1]*(1:n), 1, [])].', 1, [26 n]) ;

2. Closely followed by a "sparse" version:

 D = sparse([C{:}]-64, reshape([1;1;1]*(1:n), 1, []), ones(1,3*n), 26, n) ;

3. And finally a much less efficient cell2mat/cellfun version:

 D = cell2mat(cellfun(@(code)accumarray(code.'-64, 1, [26,1]), C, ...  
                      'UniformOutput', false)) ;
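Versions 1 and 2 pack the index construction into a single expression, which can be hard to read. As a rough illustration only, here is version 1 unpacked into intermediate steps; the variable names letterId, codeId and subs are introduced here for readability and are not part of the solutions above.

```matlab
C = {'AAA','AAT','AAG','AAT','AGC','ACG'} ;
n = numel(C) ;

% Concatenate all codes into one char row and map 'A'..'Z' to 1..26
% (the character code of 'A' is 65).
letterId = [C{:}] - 64 ;

% Index of the code each letter belongs to: 1 1 1 2 2 2 ... n n n.
codeId = reshape([1;1;1]*(1:n), 1, []) ;

% One [row, column] subscript pair per letter.
subs = [letterId; codeId].' ;

% Add 1 at each (letter, code) position of a 26 x n matrix.
D = accumarray(subs, 1, [26 n]) ;
```

The sparse version uses the same letterId and codeId vectors, passed as separate row and column index arguments instead of being stacked into one subscript matrix.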

All three produce a 26 x #codes matrix in which each column holds the counts of the 26 letters of the alphabet for one code, with row index = letter ID (A=1, ..., Z=26). The sparse version returns the same values as a sparse matrix:

 >> D
 D =
   3     2     2     2     1     1
   0     0     0     0     0     0
   0     0     0     0     1     1
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     0     1     0     1     1
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     1     0     1     0     0
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     0     0     0     0     0
   0     0     0     0     0     0
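To get back to the original question (occurrences of one specific character), the relevant row of D can simply be read off. A minimal sketch, assuming the character of interest is 'A'; the names letter, perCode and total are just illustrative:

```matlab
C = {'AAA','AAT','AAG','AAT','AGC','ACG'} ;
n = numel(C) ;
D = accumarray([[C{:}]-64; reshape([1;1;1]*(1:n), 1, [])].', 1, [26 n]) ;

letter  = 'A' ;                  % character to count (assumed uppercase A..Z)
perCode = D(letter-64, :) ;      % occurrences in each code: 3 2 2 2 1 1
total   = sum(perCode) ;         % occurrences over the whole cell array: 11
```

With the sparse version, wrapping the row in full() should give an ordinary dense row vector if one is needed.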

Note that the 3rd version doesn't assume 3-letter codes and would work with codes of arbitrary length. The first 2 versions could be adapted to have this flexibility.
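One possible way to adapt version 1 to codes of unequal length is sketched below. It is only a sketch, and it assumes that repelem is available (it was introduced in R2015a, so it will not run on older releases); the example codes and the names len, letterId and codeId are made up for illustration.

```matlab
C = {'AAA','AT','AAGC','A'} ;        % hypothetical codes of unequal length
n = numel(C) ;

len      = cellfun(@numel, C) ;      % number of letters in each code
letterId = [C{:}] - 64 ;             % letter IDs, A=1, ..., Z=26
codeId   = repelem(1:n, len) ;       % code index repeated once per letter

D = accumarray([letterId; codeId].', 1, [26 n]) ;
```

The only change from the fixed-length version is that the code index is repeated len(k) times for code k instead of always 3 times.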

Cheers,

Cedric
