Fastest way to extract amino acid composition features using MATLAB? My code works fine, but I need to simplify it further

% TRAIN TG dataset feature extraction
%% Import the data
[~, ~, raw0_0] = xlsread('C:\Users\Amindra\Desktop\EE361\TG dataset-20170922\Train1,taguchi.xlsx','Sheet1','A1:A978');
[~, ~, raw0_1] = xlsread('C:\Users\Amindra\Desktop\EE361\TG dataset-20170922\Train1,taguchi.xlsx','Sheet1','D1:D978');
raw = [raw0_0,raw0_1];
raw(cellfun(@(x) ~isempty(x) && isnumeric(x) && isnan(x),raw)) = {''};
cellVectors = raw(:,[1,2]);
%% Create table
Train1taguchi = table;
%% Allocate imported array to column variable names
FOLDS = cellVectors(:,1);
sequence = cellVectors(:,2);
fprintf('@RELATION TESTtg\n');
fprintf('@ATTRIBUTE one NUMERIC\n');
fprintf('@ATTRIBUTE two NUMERIC\n');
fprintf('@ATTRIBUTE three NUMERIC\n');
fprintf('@ATTRIBUTE four NUMERIC\n');
fprintf('@ATTRIBUTE five NUMERIC\n');
fprintf('@ATTRIBUTE six NUMERIC\n');
fprintf('@ATTRIBUTE seven NUMERIC\n');
fprintf('@ATTRIBUTE eight NUMERIC\n');
fprintf('@ATTRIBUTE nine NUMERIC\n');
fprintf('@ATTRIBUTE ten NUMERIC\n');
fprintf('@ATTRIBUTE eleven NUMERIC\n');
fprintf('@ATTRIBUTE twelve NUMERIC\n');
fprintf('@ATTRIBUTE thirteen NUMERIC\n');
fprintf('@ATTRIBUTE fourteen NUMERIC\n');
fprintf('@ATTRIBUTE fifteen NUMERIC\n');
fprintf('@ATTRIBUTE sixteen NUMERIC\n');
fprintf('@ATTRIBUTE seventeen NUMERIC\n');
fprintf('@ATTRIBUTE eighteen NUMERIC\n');
fprintf('@ATTRIBUTE nineteen NUMERIC\n');
fprintf('@ATTRIBUTE twenty NUMERIC\n');
fprintf('@ATTRIBUTE class {fold1,fold2,fold3,fold4,fold5,fold6,fold7,fold8,fold9,fold10,fold11,fold12,fold13,fold14,fold15,fold16,fold17,fold18,fold19,fold20,fold21,fold22,fold23,fold24,fold25,fold26,fold27,fold28,fold29,fold30}\n');
fprintf('@DATA\n');
for i=1:978 % in this case there are 978 protein sequences
    FOLDS = cellVectors(i,1);
    fold=char(FOLDS);
    sequence = cellVectors(i,2); % take one row of the table
    seq=char(sequence); % NOTE: convert the row to a char array
    %AA=aa2int(seq)
    AA = aacount(seq); % count all the amino acids (AA) in the protein sequence
    A=AA.A; % number of A's in the protein sequence
    R=AA.R; % number of R's in the protein sequence
    N=AA.N; % number of N's in the protein sequence
    D=AA.D; % number of D's in the protein sequence
    C=AA.C; % number of C's in the protein sequence
    Q=AA.Q; % number of Q's in the protein sequence
    E=AA.E; % number of E's in the protein sequence
    G=AA.G; % number of G's in the protein sequence
    H=AA.H; % number of H's in the protein sequence
    I=AA.I; % number of I's in the protein sequence
    L=AA.L; % number of L's in the protein sequence
    K=AA.K; % number of K's in the protein sequence
    M=AA.M; % number of M's in the protein sequence
    F=AA.F; % number of F's in the protein sequence
    P=AA.P; % number of P's in the protein sequence
    S=AA.S; % number of S's in the protein sequence
    T=AA.T; % number of T's in the protein sequence
    W=AA.W; % number of W's in the protein sequence
    Y=AA.Y; % number of Y's in the protein sequence
    V=AA.V; % number of V's in the protein sequence
    lenght = (A+R+N+D+C+Q+E+G+H+I+L+K+M+F+P+S+T+W+Y+V); % length of the protein sequence
    %fprintf('\nlenght of PROTEIN SEQUENCE = %d\n',lenght) % display the length of the protein sequence to the user
    %% FEATURE EXTRACTION
    % Each feature is the fraction of one amino acid in the sequence
    f1=(A/lenght);  % feature for amino acid A
    f2=(R/lenght);  % feature for amino acid R
    f3=(N/lenght);  % feature for amino acid N
    f4=(D/lenght);  % feature for amino acid D
    f5=(C/lenght);  % feature for amino acid C
    f6=(Q/lenght);  % feature for amino acid Q
    f7=(E/lenght);  % feature for amino acid E
    f8=(G/lenght);  % feature for amino acid G
    f9=(H/lenght);  % feature for amino acid H
    f10=(I/lenght); % feature for amino acid I
    f11=(L/lenght); % feature for amino acid L
    f12=(K/lenght); % feature for amino acid K
    f13=(M/lenght); % feature for amino acid M
    f14=(F/lenght); % feature for amino acid F
    f15=(P/lenght); % feature for amino acid P
    f16=(S/lenght); % feature for amino acid S
    f17=(T/lenght); % feature for amino acid T
    f18=(W/lenght); % feature for amino acid W
    f19=(Y/lenght); % feature for amino acid Y
    f20=(V/lenght); % feature for amino acid V
    fprintf('%f,',f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,f17,f18,f19,f20)
    fprintf('%s',fold)
    fprintf('\n')
end
  3 Comments
ASMBHAYA NAND on 12 Aug 2018
NOTE: If the directory for the datasets is not right, the code will not run (read my user manual for further clarification).


Answers (1)

Luuk van Oosten on 20 Dec 2018
Your question is poorly formulated, and I agree with Image Analyst about the formatting of your code.
Still, here is my answer to "fastest way of amino acid composition", as it might help someone else as well:
The fastest way to get the amino acid composition is the MATLAB function aacount.
For example, let's assume your protein sequence is the following:
yoursequence = 'YURPRTEINSEQENCEYUCANPUTHERE'
You can then call
compositionstruct = aacount(yoursequence)
which returns the amino acid composition of your protein sequence in the struct compositionstruct, with one field per amino acid holding its count.
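To go from counts to the 20 composition features without one variable per residue, something along these lines might work. This is only a sketch under assumptions: the variables sequences and folds are illustrative stand-ins for the two columns read from the spreadsheet, the residue order in aa is copied from the order used in the question, and the counting is done with base MATLAB (bsxfun) so it runs without the Bioinformatics Toolbox. If aacount is available, cell2mat(struct2cell(aacount(seq))) should give the same counts as a column vector.

```matlab
% Illustrative input (assumed names): one sequence and one fold label per protein
sequences = {'ARNDARND', 'GGGHHK'};
folds     = {'fold1', 'fold2'};

% Residue order taken from the question's code (A, R, N, D, ..., V)
aa = 'ARNDCQEGHILKMFPSTWYV';

% ARFF header: fprintf reuses the format for every name in the list
names = {'one','two','three','four','five','six','seven','eight','nine','ten', ...
         'eleven','twelve','thirteen','fourteen','fifteen','sixteen', ...
         'seventeen','eighteen','nineteen','twenty'};
classes = sprintf('fold%d,', 1:30);   % 'fold1,fold2,...,fold30,'
classes(end) = [];                    % drop the trailing comma
fprintf('@RELATION TESTtg\n');
fprintf('@ATTRIBUTE %s NUMERIC\n', names{:});
fprintf('@ATTRIBUTE class {%s}\n', classes);
fprintf('@DATA\n');

% One row of 20 fractions per sequence
features = zeros(numel(sequences), numel(aa));
for i = 1:numel(sequences)
    seq = upper(sequences{i});
    % Compare every residue with every letter in aa, then sum each column
    counts = sum(bsxfun(@eq, seq(:), aa), 1);
    % Divide by the number of standard residues, as the question does
    % (an empty sequence would give NaN here)
    features(i, :) = counts / sum(counts);
end

% Data rows in the same comma-separated layout as the question
for i = 1:numel(sequences)
    fprintf('%f,', features(i, :));
    fprintf('%s\n', folds{i});
end
```

Keeping the fractions in one matrix means the same code works for any number of sequences; numel(sequences) replaces the hard-coded 978.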
