MATLAB Answers

How can I save the beginning and end positions of each sequence in a cell array?

2 views (last 30 days)
Austin Shipley
Austin Shipley on 26 Nov 2020
Answered: Rik on 29 Nov 2020
So I am looping through codons and recording them on a .txt file. The script works, but I need the sequence to begin at the starting codon position, stop at the end codon then continue through the cell array while recording all of the following start and end codon sequences. I would just like to know the best option I can use to tweak my code here. Thanks in advance!
fid = fopen("sequence_long2.txt",'r');
C = textscan(fid,'%3s');
x = C{1}
fclose(fid);
%Start sequence
ss = 1;
% end sequence
es = 183479;
seq_id = long_codon(x(ss:es));
function seq = long_codon(v)
seq = (v);
for pos = 1:length(seq)
if strcmp(seq{pos},'TAC')
index = find(strcmp(v,seq{pos}));
StartPos = index;
elseif (strcmp(seq{pos},'ACT') || strcmp(seq{pos},'ATT') || strcmp(seq{pos},'ATC'))
index = find(strcmp(v,seq{pos}));
EndPos = index;
end
end
fid2 = fopen('report_long.txt','w+');
fprintf(fid2,'Name: OP \n');
fprintf(fid2,'Lab 13: DNA Pattern Matching\n \n');
fprintf(fid2,'Start Position of Gene is: %d \n',StartPos);
fprintf(fid2, 'End Position of Gene is: %d \n',EndPos);
fclose(fid2);
end

  14 Comments

Austin Shipley
Austin Shipley on 28 Nov 2020
My primary issue right now is that the end codon it records is the beginning of where a start codon should begin 'TAC'. For example:
x_conv(74:279)
ans = 'TACTCTGTTGCAAC...............AAAGACTATTCTCTGCT' %the end element here is 'T'
If I index up to 281, you would see that the codon it is indexing the position of is 'TAC', which is obviously wrong.
It doesn't seem to be notice the elseif statement at all. I would need a way of stopping the loop after it locates 'TAC' then pass control to the second condition. I thought that maybe a break or continue in the loop would give me what I need, but that doesn't seem to be working. Presumably, I might need to nest another while loop with a different condition to be fulfilled, but I'm not completely sure how to implement that just yet.
Advice is greatly appreciated.
Rik
Rik on 28 Nov 2020
I would urge you to change to strfind first. Then you can loop through all start codons, removing later start codons if they are inside the gene being read.
Austin Shipley
Austin Shipley on 28 Nov 2020
So I have been trying to use strfind, but I am still having this issue where my end codon positions are not being recorded correctly. Do I need to nest another while loop or am I just not using strfind properly?
fid = fopen("sequence_long2.txt",'r');
C = textscan(fid,'%s');
x = C{1};
fclose(fid);
x_conv = char(x);
Start_loc = [];
End_loc = [];
flag = 0;
i = 1;
while i<(numel(x_conv)-2)
if (strcmp(x_conv(i+[0 1 2]),'TAC')) && flag == 0
Start_loc = strfind(x_conv,'TAC');
i = i + 3;
flag = flag + 1;
elseif ismember(x_conv(i+[0 1 2]),{'ATC','ACT','ATT'}) && flag == 1
End_loc = [End_loc i];
i = i + 3;
flag = flag - 1;
else
i = i+1;
end
end
fid2 = fopen('report_long.txt','w+');
fprintf(fid2,'Name: Austin \n');
fprintf(fid2,'Lab 13: DNA Pattern Matching\n \n');
fprintf(fid2,'Start Position of Gene is: %d End Position of Gene is: %d\n ',Start_loc,End_loc);
fclose(fid2);

Sign in to comment.

Accepted Answer

Rik
Rik on 29 Nov 2020
%Since your code is working fine you can keep it as is.
%I just used my own function to use your data.
x_conv=readfile('https://www.mathworks.com/matlabcentral/answers/uploaded_files/430218/sequence_long2.txt');
x_conv=x_conv{1};
%find all possible start codons and stop codons
Start_loc = strfind(x_conv,'TAC');
End_loc = cellfun(@(stopcodon)strfind(x_conv,stopcodon),{'ATC','ACT','ATT'},'UniformOutput',false);
End_loc = horzcat(End_loc{:});
n=0;
while n<numel(Start_loc)
n=n+1;
this_start=Start_loc(n);
%select all possible end codons
this_end=End_loc(End_loc>this_start);
%figure out which is the first end codon with an offset of 3
this_end=this_end(mod(this_end-this_start,3)==0);
this_end=this_end(1);
%now we need to remove elements in Start_loc that in the current gene
Start_loc(Start_loc>this_start & Start_loc<this_end)=[];
%store the end as well
End_loc(n)=this_end;
end
%remove extra values in End_loc
End_loc((n+1):end)=[];
genes=cell(size(End_loc));
for n=1:numel(End_loc)
genes{n}=x_conv(Start_loc(n):End_loc(n));
end

  0 Comments

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!