How can I save the beginning and end positions of each sequence in a cell array?

I am looping through codons and recording them in a .txt file. The script runs, but I need each sequence to begin at a start codon position and stop at the next end codon, then continue through the cell array, recording every following start and end codon pair. What is the best way to tweak my code to do this? Thanks in advance!
fid = fopen("sequence_long2.txt",'r');
C = textscan(fid,'%3s');
x = C{1}
fclose(fid);
%Start sequence
ss = 1;
% end sequence
es = 183479;
seq_id = long_codon(x(ss:es));
function seq = long_codon(v)
    seq = (v);
    for pos = 1:length(seq)
        if strcmp(seq{pos},'TAC')
            index = find(strcmp(v,seq{pos}));
            StartPos = index;
        elseif (strcmp(seq{pos},'ACT') || strcmp(seq{pos},'ATT') || strcmp(seq{pos},'ATC'))
            index = find(strcmp(v,seq{pos}));
            EndPos = index;
        end
    end
    fid2 = fopen('report_long.txt','w+');
    fprintf(fid2,'Name: OP \n');
    fprintf(fid2,'Lab 13: DNA Pattern Matching\n \n');
    fprintf(fid2,'Start Position of Gene is: %d \n',StartPos);
    fprintf(fid2, 'End Position of Gene is: %d \n',EndPos);
    fclose(fid2);
end
Comments
Rik on 28 Nov 2020
I would urge you to change to strfind first. Then you can loop through all start codons, removing later start codons if they are inside the gene being read.
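A minimal sketch of the strfind suggestion, using a made-up toy sequence (the data and the names `codons`, `seq`, `start_idx`, and `stop_idx` are assumptions for illustration, not taken from the thread's file). strfind searches a character vector, so a cell array of codons like the one textscan returns with '%3s' is joined into one string first:

```matlab
% Toy data, invented for illustration (not from sequence_long2.txt)
codons = {'TAC';'GGG';'ATT';'CCT';'ACA';'AAA';'CTG'};

% Join the codons into a single character vector
seq = [codons{:}];                     % 'TACGGGATTCCTACAAAACTG'

% strfind returns the index of the first base of every match
start_idx = strfind(seq,'TAC');        % [1 12]

% One call per stop codon, then merge and sort the candidate positions
stop_idx = sort([strfind(seq,'ATC'), strfind(seq,'ACT'), strfind(seq,'ATT')]);  % [7 18]
```

Note that the second 'TAC' (index 12) straddles two entries of the cell array, so searching the joined string can report matches that a codon-by-codon strcmp would never see. Whether such matches count as start codons depends on the assignment.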
Austin Shipley on 28 Nov 2020
Edited: Austin Shipley on 28 Nov 2020
So I have been trying to use strfind, but I am still having this issue where my end codon positions are not being recorded correctly. Do I need to nest another while loop or am I just not using strfind properly?
fid = fopen("sequence_long2.txt",'r');
C = textscan(fid,'%s');
x = C{1};
fclose(fid);
x_conv = char(x);
Start_loc = [];
End_loc = [];
flag = 0;
i = 1;
while i<(numel(x_conv)-2)
    if (strcmp(x_conv(i+[0 1 2]),'TAC')) && flag == 0
        Start_loc = strfind(x_conv,'TAC');
        i = i + 3;
        flag = flag + 1;
    elseif ismember(x_conv(i+[0 1 2]),{'ATC','ACT','ATT'}) && flag == 1
        End_loc = [End_loc i];
        i = i + 3;
        flag = flag - 1;
    else
        i = i+1;
    end
end
fid2 = fopen('report_long.txt','w+');
fprintf(fid2,'Name: Austin \n');
fprintf(fid2,'Lab 13: DNA Pattern Matching\n \n');
fprintf(fid2,'Start Position of Gene is: %d End Position of Gene is: %d\n ',Start_loc,End_loc);
fclose(fid2);
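A side note on the report line, as a hedged sketch: when fprintf (or sprintf) receives more values than the format has conversions, it reuses the format and consumes the arguments in order, so passing two separate vectors prints all start positions before any end position. Stacking the vectors as the rows of a 2-by-N matrix interleaves them, because arrays are read column by column. The values below are toy numbers, `pairs` and `report` are illustrative names, and the approach assumes both vectors have the same length:

```matlab
% Toy positions, invented for illustration
Start_loc = [1 12];
End_loc   = [7 18];

% Each column of pairs is one (start, end) pair
pairs = [Start_loc; End_loc];

% sprintf is used here so the result can be inspected;
% fprintf(fid2, fmt, pairs) would write the same text to the file
fmt = 'Start Position of Gene is: %d End Position of Gene is: %d\n';
report = sprintf(fmt, pairs);
```

With these inputs `report` contains two lines, one per gene, pairing 1 with 7 and 12 with 18.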

Accepted Answer

Rik on 29 Nov 2020
%Since your code is working fine you can keep it as is.
%I just used my own function to use your data.
x_conv=readfile('https://www.mathworks.com/matlabcentral/answers/uploaded_files/430218/sequence_long2.txt');
x_conv=x_conv{1};
%find all possible start codons and stop codons
Start_loc = strfind(x_conv,'TAC');
End_loc = cellfun(@(stopcodon)strfind(x_conv,stopcodon),{'ATC','ACT','ATT'},'UniformOutput',false);
End_loc = horzcat(End_loc{:});
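%End_loc was built with one strfind call per stop codon, so it is not in order yet.
%Keep the sorted candidates separate from the output so storing results does not overwrite them.
Stop_loc = sort(End_loc);
End_loc = zeros(size(Start_loc));
n=0;
while n<numel(Start_loc)
    n=n+1;
    this_start=Start_loc(n);
    %select all possible end codons
    this_end=Stop_loc(Stop_loc>this_start);
    %figure out which is the first end codon in the same reading frame (offset is a multiple of 3)
    this_end=this_end(mod(this_end-this_start,3)==0);
    if isempty(this_end)
        %no in-frame end codon after this start: drop it and try the next one
        Start_loc(n)=[];
        n=n-1;
        continue
    end
    this_end=this_end(1);
    %now we need to remove elements in Start_loc that are inside the current gene
    Start_loc(Start_loc>this_start & Start_loc<this_end)=[];
    %store the end as well (this is the index of the first base of the end codon)
    End_loc(n)=this_end;
end
%remove extra values in End_loc
End_loc((n+1):end)=[];
genes=cell(size(End_loc));
for n=1:numel(End_loc)
    genes{n}=x_conv(Start_loc(n):End_loc(n));
end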
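To illustrate the reading-frame filter used above, here is a small worked example on a made-up sequence (the sequence and the names `seq`, `candidates`, `in_frame`, and `gene_full` are assumptions for illustration). The first stop codon after the start is out of frame and is skipped; the second one is in frame and ends the gene:

```matlab
% Toy sequence, invented for illustration: TAC GAT TGG ACT CC
seq = 'TACGATTGGACTCC';
this_start = 1;                        % position of 'TAC'

% All stop codon positions, merged and sorted
candidates = sort([strfind(seq,'ATC'), strfind(seq,'ACT'), strfind(seq,'ATT')]);  % [5 10]

% Keep only stops after the start whose offset is a multiple of 3
after_start = candidates(candidates > this_start);
in_frame = after_start(mod(after_start - this_start, 3) == 0);   % [10]
this_end = in_frame(1);

% this_end points at the first base of the stop codon;
% add 2 if the whole stop codon should be part of the extracted gene
gene_full = seq(this_start:this_end+2);   % 'TACGATTGGACT'
```

The 'ATT' at position 5 has an offset of 4 from the start, so it lies across two codons of this frame and is ignored. Indexing with `this_start:this_end` instead, as in the answer's final loop, would stop at the first base of the stop codon.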