How can I complement a DNA matrix using a binary vector?
I have a DNA matrix of size m-by-4n, for example:
B = 'GATT' 'AACT' 'ACAC' 'TTGA' 'GGCT'
'GCAC' 'TCAT' 'GTTC' 'GCCT' 'TTTA'
'AACG' 'GTTA' 'ACGT' 'CGTC' 'TGGA'
'CTAC' 'AAAA' 'GGGC' 'CCCT' 'TCGT'
'GTGT' 'GCGG' 'GTTT' 'TTGC' 'ATTA'
I also have a vector of real numbers X = {xi, i = 1..m*4n}.
Taking mod(X,1) keeps the real numbers in the range [0,1].
The output will look like X = [0.223 0.33 0.71 0.44 0.91 0.32 0.11 ...], with m*4n entries in total.
I then need to transform the result into a binary vector by applying the threshold function
f(x) = 0 if 0 < X(i,j) ≤ 0.5; 1 if 0.5 < X(i,j) ≤ 1
For the values above, the output will look like X = [0 0 1 0 1 0 0 ...].
If X(i,j) = 1, then A(i,j) is complemented; otherwise it is unchanged.
I tried to code this part as follows, but it didn't work:
%% Mapping the chaotic sequence X from real numbers to a binary sequence using a threshold function
X = v(:,3);
X(257)=[];
disp (X);
mode (X,1);
for i=1
for j=1:256
if ((X(i,j)> 0) && (X(i,j)<= .5))
X(i,j) = 0;
elseif ((X(i,j)> .5) && (X(i,j)<= 1))
X(i,j) = 1;
end
end
end
disp(X);
Supposing I can get the binary vector, how do I complement the DNA matrix A using the sequence X?
P.S. The complements are A - T, T - A, C - G, G - C.
To be more specific, I need the following:
1- Apply mod(X,1) to the vector to bring the values into the range [0,1].
2- Map the real-number vector into a binary vector by applying the function f(x) = 0 if 0 < X(i,j) ≤ 0.5; 1 if 0.5 < X(i,j) ≤ 1.
3- Use this binary vector X(i,j) to complement the DNA matrix A(i,j): if X(i,j) = 1 then A(i,j) is complemented; otherwise it is unchanged.
Accepted Answer
John BG
on 22 Dec 2016
Does the following help?
In this example m=5 and each row of B is one 4-letter sequence:
A0='ACGT'
B='0000'
m=5
for k=1:1:m
B=[B;A0(randi(4,1,4))]
end
B(1,:)=[]
X=rand(1,5)
fX=round(X)
for k=1:1:m
if fX(k)==1
L=B(k,:);
L=strrep(L,'A','0'); L=strrep(L,'T','A'); L=strrep(L,'0','T'); % A - T swap
L=strrep(L,'C','0'); L=strrep(L,'G','C'); L=strrep(L,'0','G'); % C - G swap
B(k,:)=L; % write back inside the if, so rows with fX(k)==0 stay untouched
end
end
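The same row-wise complement can be written without strrep or a loop. This is only a minimal sketch, assuming B is an m-by-4 char matrix containing nothing but A, C, G, T and fX is a 0/1 vector with one entry per row. The lookup-table name comp and the sample data are made up for illustration.

```matlab
% Hypothetical sample data (not taken from the question)
B  = ['GATT'; 'AACT'; 'ACAC'; 'TTGA'; 'GGCT'];   % 5-by-4 char matrix
fX = [0 1 0 0 1];                                % 1 = complement this row

% Lookup table indexed by character code: comp(double('A')) is 'T', etc.
comp = char(zeros(1, 256));
comp(double('ACGT')) = 'TGCA';

rows = logical(fX);                              % rows to complement
B(rows, :) = comp(double(B(rows, :)));           % translate only those rows
disp(B)
```

Rows 2 and 5 are complemented; the other rows are left as they were.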
2 Comments
John BG
on 22 Dec 2016
Sure.
Please comment and correct as needed, because I don't have a clue about genetics:
1. The key string contains one of each genetic symbol. It's not a gene coding, just a way to put together the alphabet of 4 characters:
A0='ACGT'
2. initialise result variable
B='0000'
3. m is the number of sequences (rows) to process. Your example shows 25 four-letter strings; correct me if I'm wrong.
m=5
4. Generate 5 random sequences. This is only to test the next step and to show that the answer to your question works.
Replace this random generation with whatever sequence you want to process.
For instance, you can keep the input sequence in a text file. Do you know how to use the command textscan to load text files into MATLAB variables? I can show you how if you don't.
for k=1:1:m
B=[B;A0(randi(4,1,4))]
end
B(1,:)=[]
5. Generate another random sequence X, for test purposes only. Replace X with your own X sequence.
X=rand(1,5)
6. The MATLAB command round() does precisely the 'polarising' you requested:
if X(i)<0.5 then X(i)=0, else if X(i)>=0.5 then X(i)=1
This can be modified if you want, for instance, the boundary value 0.5 to map to 0:
if X(i)<=0.5 then X(i)=0, else if X(i)>0.5 then X(i)=1
fX=round(X)
7. Complement the rows of B according to fX
for k=1:1:m
if fX(k)==1
L=B(k,:);
L=strrep(L,'A','0'); L=strrep(L,'T','A'); L=strrep(L,'0','T'); % A - T swap
L=strrep(L,'C','0'); L=strrep(L,'G','C'); L=strrep(L,'0','G'); % C - G swap
B(k,:)=L; % write back inside the if, so rows with fX(k)==0 stay untouched
end
end
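The difference between round() and the threshold in the question only shows up at exactly 0.5. A small sketch with made-up values (the variable names fX_round and fX_thresh are illustrative, not from the thread):

```matlab
X = [0.223 0.5 0.71 0.44 0.91];   % made-up values, including the boundary 0.5

fX_round  = round(X);             % round() sends 0.5 away from zero, so it becomes 1
fX_thresh = double(X > 0.5);      % strict comparison sends 0.5 to 0, as f(x) in the question does

disp([fX_round; fX_thresh])
```

If the input never hits 0.5 exactly, the two give the same binary vector.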
Would it be possible for you to click on ACCEPT ANSWER so I can get the points?
If there are any further steps you would like to develop before accepting my answer, please ask and I will do my best.
Appreciating your time and attention, awaiting your answer,
John BG
More Answers (2)
James Tursa
on 21 Dec 2016
Edited: James Tursa
on 21 Dec 2016
Not quite sure I fully understand, but maybe something like this?
mask = mod(X,1) > 0.5; % logical indexes of the characters to flip
Bmask = B(mask); % get the characters to flip
Bmask(Bmask=='T') = 'a'; % flip T to a
Bmask(Bmask=='A') = 't'; % flip A to t
Bmask(Bmask=='C') = 'g'; % flip C to g
Bmask(Bmask=='G') = 'c'; % flip G to c
B(mask) = upper(Bmask); % replace the original characters with their flipped versions
or using ismember:
mask = mod(X,1) > 0.5; % logical indexes of the characters to flip
Bmask = B(mask); % get the characters to flip
[~,loc] = ismember(Bmask,'ATCG'); % identify the characters to flip
S = 'TAGC'; % the flipped reference string
B(mask) = S(loc); % replace the masked characters with their flipped versions
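Both variants rely on logical indexing, which walks B in column-major order (down each column first). Below is a worked sketch on a small made-up matrix, assuming X holds one value per character. Whether X is meant to follow B column by column or row by row is not stated in the question, so the row-major variant at the end is an assumption, and the names Bcol, Brow and maskRow are illustrative.

```matlab
B = ['GATT'; 'AACT'];                       % made-up 2-by-4 char matrix
X = [0.9 0.1 0.2 0.8 0.3 0.4 0.6 0.7];      % made-up, one value per character

mask = mod(X, 1) > 0.5;                     % logical vector: 1 0 0 1 0 0 1 1
S = 'TAGC';                                 % complements of 'ATCG', position by position

% Column-major pairing: X(1) with B(1,1), X(2) with B(2,1), X(3) with B(1,2), ...
Bcol = B;
[~, loc] = ismember(Bcol(mask), 'ATCG');
Bcol(mask) = S(loc);

% Row-major pairing (assumption): X(1..4) with row 1, X(5..8) with row 2
maskRow = reshape(mask, size(B, 2), size(B, 1)).';
Brow = B;
[~, loc] = ismember(Brow(maskRow), 'ATCG');
Brow(maskRow) = S(loc);

disp(Bcol)
disp(Brow)
```

The two results differ, so it is worth checking which pairing the rest of the pipeline expects before applying the mask.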
David Barry
on 21 Dec 2016
X = [0.223 0.33 0.71 0.44 0.91 0.32 0.11];
X(X<= 0.5 & X >0) = 0;
X(X>0.5 & X<=1) = 1;
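Steps 1 and 2 from the question can also be combined. A minimal sketch with made-up input values, assuming the result of mod is assigned back (the code in the question calls mode and discards the output); the name bits is illustrative:

```matlab
X = [3.223 1.33 0.71 7.44 2.91 0.32 5.11];  % made-up real-valued sequence

X    = mod(X, 1);          % keep the fractional part; result lies in [0,1)
bits = double(X > 0.5);    % 1 where the fractional part exceeds 0.5, else 0

disp(bits)
```

Because mod(X,1) returns values in [0,1), an exact 0 maps to 0 here, a case that f(x) in the question leaves undefined.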