extracting subsequences of binary string

as would be the code for the following string have the next subsequences ?
STRING
1(1), 0(2), 1(3), 1(4), 0(5), 0(6), 1(7), 0(8), 0(9), 1(10), 1(11), 1(12), 1(13), 0(14), 0(15), 0(16), 1(17), 1(18), 1(19), 0(20)
SUBSEQUENCES
01: 1(01), 0(02), 1(03), 1(04) -> [1,0,1,1],
02: 1(01), 1(03), 0(05), 1(07) -> [1,1,0,1],
03: 1(01), 1(04), 1(07), 1(10) -> [1,1,1,1],
04: 1(01), 0(05), 0(09), 1(13) -> [1,0,0,1],
05: 1(01), 0(06), 1(11), 0(16) -> [1,0,1,0],
06: 1(01), 1(07), 1(13), 1(19) -> [1,1,1,1],
07: 0(02), 1(03), 1(04), 0(05) -> [0,1,1,0],
08: 0(02), 1(04), 0(06), 0(08) -> [0,1,0,0],
09: 0(02), 0(05), 0(08), 1(11) -> [0,0,0,1],
10: 0(02), 0(06), 1(10), 0(14) -> [0,0,1,0],
11: 0(02), 1(07), 1(12), 1(17) -> [0,1,1,1],
12: 0(02), 0(08), 0(14), 0(20) -> [0,0,0,0],
13: 1(03), 1(04), 0(05), 0(06) -> [1,1,0,0],
14: 1(03), 0(05), 1(07), 0(09) -> [1,0,1,0],
15: 1(03), 0(06), 0(09), 1(12) -> [1,0,0,1],
16: 1(03), 1(07), 1(11), 0(15) -> [1,1,1,0],
17: 1(03), 0(08), 1(13), 1(18) -> [1,0,1,1],
18: 1(04), 0(05), 0(06), 1(07) -> [1,0,0,1],
19: 1(04), 0(06), 0(08), 1(10) -> [1,0,0,1],
20: 1(04), 1(07), 1(10), 1(13) -> [1,1,1,1],
21: 1(04), 0(08), 1(12), 0(16) -> [1,0,1,0],
22: 1(04), 0(09), 0(14), 1(19) -> [1,0,0,1],
23: 0(05), 0(06), 1(07), 0(08) -> [0,0,1,0],
24: 0(05), 1(07), 0(09), 1(11) -> [0,1,0,1],
25: 0(05), 0(08), 1(11), 0(14) -> [0,0,1,0],
26: 0(05), 0(09), 1(13), 1(17) -> [0,0,1,1],
27: 0(05), 1(10), 0(15), 0(20) -> [0,1,0,0],
28: 0(06), 1(07), 0(08), 0(09) -> [0,1,0,0],
29: 0(06), 0(08), 1(10), 1(12) -> [0,0,1,1],
30: 0(06), 0(09), 1(12), 0(15) -> [0,0,1,0],
31: 0(06), 1(10), 0(14), 1(18) -> [0,1,0,1],
32: 1(07), 0(08), 0(09), 1(10) -> [1,0,0,1],
33: 1(07), 0(09), 1(11), 1(13) -> [1,0,1,1],
34: 1(07), 1(10), 1(13), 0(16) -> [1,1,1,0],
35: 1(07), 1(11), 0(15), 1(19) -> [1,1,0,1],
36: 0(08), 0(09), 1(10), 1(11) -> [0,0,1,1],
37: 0(08), 1(10), 1(12), 0(14) -> [0,1,1,0],
38: 0(08), 1(11), 0(14), 1(17) -> [0,1,0,1],
39: 0(08), 1(12), 0(16), 0(20) -> [0,1,0,0],
40: 0(09), 1(10), 1(11), 1(12) -> [0,1,1,1],
41: 0(09), 1(11), 1(13), 0(15) -> [0,1,1,0],
42: 0(09), 1(12), 0(15), 1(18) -> [0,1,0,1],
43: 1(10), 1(11), 1(12), 1(13) -> [1,1,1,1],
44: 1(10), 1(12), 0(14), 0(16) -> [1,1,0,0],
45: 1(10), 1(13), 0(16), 1(19) -> [1,1,0,1],
46: 1(11), 1(12), 1(13), 0(14) -> [1,1,1,0],
47: 1(11), 1(13), 0(15), 1(17) -> [1,1,0,1],
48: 1(11), 0(14), 1(17), 0(20) -> [1,0,1,0],
49: 1(12), 1(13), 0(14), 0(15) -> [1,1,0,0],
50: 1(12), 0(14), 0(16), 1(18) -> [1,0,0,1],
51: 1(13), 0(14), 0(15), 0(16) -> [1,0,0,0],
52: 1(13), 0(15), 1(17), 1(19) -> [1,0,1,1],
53: 0(14), 0(15), 0(16), 1(17) -> [0,0,0,1],
54: 0(14), 0(16), 1(18), 0(20) -> [0,0,1,0],
55: 0(15), 0(16), 1(17), 1(18) -> [0,0,1,1],
56: 0(16), 1(17), 1(18), 1(19) -> [0,1,1,1],
57: 1(17), 1(18), 1(19), 0(20) -> [1,1,1,0],

 Accepted Answer

Andrei Bobrov
Andrei Bobrov on 21 Aug 2013
Edited: Andrei Bobrov on 21 Aug 2013
N = 20;
n = 4;
A = hankel(1:N-n+1,N-n+1:N);
k = 0:n-1;
idx = [];
for ii = 1:size(A,1)
p = A(ii,:);
while p(end,end) + k(end) <= N
p = [p;p(end,:)+k];
end
idx=[idx;p];
end
or
N = 20;
n = 4;
A = hankel(1:N-n+1,N-n+1:N);
k = 0:n-1;
c = ceil((N - A(:,end) + 1)/k(end));
i2 = cumsum(c);
i1 = i2 - c + 1;
idx = zeros(i2(end),n);
for jj = 1:N-n+1
idx(i1(jj):i2(jj),:) = bsxfun(@plus,A(jj,:),(0:c(jj)-1)'*k);
end
ADD
s = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0];
[j1,j2,j2] = unique(s(idx),'rows')
out = [j1, histc(j2,1:max(j2))/i2(end)]; % This row corrected

8 Comments

thank you very much, that command should now be used to calculate the number of times to repeat each subsequence? is to calculate the probability by dividing the number of occurrences of that subsequence by the total number of subsequences. But I'm not sure which command used to count the number of occurrences of each subsequence
see ADD part in my answer
I get the following error:
Error using horzcat CAT arguments dimensions are not consistent.
Do not find me subsequences. j2 be the number of occurrences of each subsequence?
Op! My typo. Corrected.
N = 20;
n = 4;
A = hankel(1:N-n+1,N-n+1:N);
k = 0:n-1;
idx = [];
for ii = 1:size(A,1) p = A(ii,:); while p(end,end) + k(end) <= N
p = [p;p(end,:)+k];
end
idx=[idx;p];
end
s = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0];
[j1,j2,j2] = unique(s(idx),'rows')
out = [j1, histc(j2,1:max(i2))];
This makes me:
Undefined function or variable 'i2'.
and interpret the outputs? are the number of occurrences of each subsequence?
many thanks
sorry, I have not understood the code. This it does is calculate the number of times to repeat each subsequence?. It calculates the sub but if calculated occurrences each subsequence?. it?
Again correct last row in my code.

Sign in to comment.

More Answers (2)

n = 20;
d = 4;
c = zeros(sum([1,floor((d:n-1)/(d-1))]),d); % Allocate space for c
j = 0;
for k = 1:n-d+1
r = 1;
while k+r*(d-1) <= n
j = j+1;
c(j,:) = k:r:k+r*(d-1);
r = r+1;
end
end
The c array will be a 57 x 4 matrix of subsequence indices taken from 1:20.
c =
1 2 3 4
1 3 5 7
1 4 7 10
.....
17 18 19 20
If you replace the line "c(j,:) = k:r:k+r*(d-1);" by
c(j,:) = s(k:r:k+r*(d-1));
where s is your string, this will generate the subsequence of binary strings you are (apparently) asking for.

3 Comments

I have modified the above code so as to allocate the proper size for the c array.
thank you very much, that command should now be used to calculate the number of times to repeat each subsequence? is to calculate the probability by dividing the number of occurrences of that subsequence by the total number of subsequences. But I'm not sure which command used to count the number of occurrences of each subsequence
One question, as I can do with structure for you automatically calculate subsequences of length 4-20? ie, d = 4:20 but applying for so I said why not have the same dimension:
if true
% code
for d=4:20
c(d)=zeros(sum([1,floor((d:n-1)/(d-1))]),d);
j=0;
for k=1:n-d+1
r=1;
while k+r*(d-1)<=n
j=j+1;
c(j,:)=s(k:r:k+r*(d-1));% s es la cadena binaria / me da las subsecuencias
r=r+1;
end
end
end
end

Sign in to comment.

Here is a slightly shorter version:
n = 20;
d = 4;
f2 = cumsum([0,floor((n-1:-1:d-1)/(d-1))]);
f1 = f2(1:end-1)+1;
f2 = f2(2:end);
c = repmat(0:d-1,f2(end),1);
for k = 1:length(f1)
c(f1(k),:) = c(f1(k),:) + k;
c(f1(k):f2(k),:) = cumsum(c(f1(k):f2(k),:),1);
end

Categories

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!