Synovec Peak Matching algorithm

Question

0 votes

NormInjB.xlsx

I'm a new user of Matlab and I've been trying to use a previously published and implemented code, but for some reason I cannot get the code to work for me. The code was developed by Synovec et al at the University of Washington, and is called "peakmatch". The code is supposed to perform retention time alignment of GC data.

I've included the entire algorithm below - this is the only information I have regarding the entire program. I've also attached a spreadsheet titled "NormInjB" which is an example of the data input I'm using.

"NormInjB" would be my input matrix. This is oriented so that each column corresponds to a different sample, so I transpose it when i enter it into matlab. The target command is defining that my target chromatogram is composed of the second row, all columns (after transposing my data, each row is a separate sample). Row 1 is the retention times and thus row 2 is the first sample. I've also tried removing the retention times, but still get similar errors.

>data=NormInjB';

>target=data(2,:);

My understanding is that I should then type [aldat,corrs,numtar,numpks]=peakmatch(data,window,target) and badabing, badaboom - magic happens. Unfortunately, I get a series of errors.

Some of the errors I am obtaining include:

Undefined function or variable 'XF'.
Error in peakmatch>basec2 (line 278)
    rawdat(i,:)=rawdat(i,:)-(p(2)+XF*p(1));
Error in peakmatch (line 21)
data=basec2(data,400);

I've also gotten errors that say:

Index exceeds the number of array elements (90)

I have 90 samples, but I don't know why I would be getting that, because I can't figure out where the index is defined?

Another:

Index in position 2 is invalid. Array indices must be positive integers or logical values

Everytime I think I've solved one problem, another pops up and then a few tries later, I get the same error as previously.

I've tried breaking the code down to identify different segments that don't work but I got so lost that I decided to pause and ask for help.

Any advice at all would be appreciated because I've been struggling for weeks over this and I'm very frustrated! Again, I'm a super newbie so I apologize if any of this doesn't make sense or seems to have an obvious solution. I have scoured the matlab answers and just confused myself more.

Complete peakmatch.m algorithm:

function [aldat,corrs,numtar,numpks]=peakmatch(data,window,target)
% Chromatographic alignment through nearest neighbor peak matching
%
% Input are the chromatograms to be aligned, "data", the maximum shift to consider, "window",
% and the alignment target chromatogram, "target"
%
% If a target is not specified, then first row of the data matrix will be used
%
% Window should be set to a value that's less than typical peak to peak distances in the chromatogram
%
% Output is the aligned data as well as the correlation coefficients of each chromatogram
% to the target both before and after alignment.
%
% IO: [aldat,corrs]=altest(data,window,target);
%Kevin Johnson 5/23/02
%baseline correction as best fit line through first and last 400 data points.
%WILL NOT BE APPROPRIATE FOR ALL DATA!! may wish to comment out or alter number of points used
data=basec2(data,400);
% Alignment target is the first chromatogram in the dataset, if none is specified
if nargin==2
	target=data(1,:);
end
% Alignment quality measure is the correlation coefficient between chromatograms
% following the application of a Wallis filter to mimimize the efect of varying
% peak heights.  The filter algorithm also normalizes by the max peak height, prior
% to filtering.  Filtered chromatograms are temporary, and are not output by the program.
% the filter assumes peaks are ~20 data points wide, other widths can be specified.
wftarget=wfilt(target,20);
[m,n]=size(data);
% peak finding algorithm uses 5x std dev of first 400 points as measure of baseline noise
% WILL NOT BE APPROPRIATE FOR ALL DATA!! May wish to specify noise level explicitly.
%"stds" is a vector containing the noise threshold for each chromatogram
%"tpks" is the noise threshold for the target chroamtogram
stds=std(data(:,1:200)')*5;
tpks=pfind(target,std(target(1:200))*5);
%number of peaks located in target (for debugging purposes)
numtar=length(tpks);
%plot of detected target peaks (for debugging purposes)
%plot(target);set(gca,'NextPlot','add');plot(tpks,target(tpks),'r*');set(gca,'NextPlot','replace');
%initialize matrix to hold aligned data
aldat=data;
numpks=zeros(1,m);
% step through each chromatogram and do the peak matching alignment
for i=1:m
    [aldat(i,:),shifts,shifttimes,numpks(i)]=align(target,aldat(i,:),tpks,stds(i),window);
    %calculate "before" and 
    temp=corrcoef(wftarget',wfilt(data(i,:))');bcors(i)=temp(1,2);
    %"after" correlations to check alignment
    temp=corrcoef(wftarget',wfilt(aldat(i,:))');cors(i)=temp(1,2);
end
corrs=[bcors',cors'];
%%%%%%%%%%%%%%%
%sub-functions%
%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [newvec2,templist,temptimes,numpks]=align(vec1,vec2,peaks1,nthresh,width)
%aligns two chromatograms through peak matching and interpolation
%find peaks in sample chromatogram
peaks2=pfind(vec2,nthresh);
numpks=length(peaks2);
%plot of detected peaks in sample chromatogram (for debugging purposes)
%plot(vec2);set(gca,'NextPlot','add');plot(peaks2,vec2(peaks2),'r*');set(gca,'NextPlot','replace');
if isempty(peaks2);
	newvec2=vec2;   
   return
end
[peaks1,peaks2]=cmpks2(peaks1,peaks2,width);
if isempty(peaks2);
	newvec2=vec2;   
   return
end
peaks1=fix(peaks1);
peaks2=fix(peaks2);
peaks1=[1 peaks1 length(vec1)];
peaks2=[1 peaks2 length(vec2)];
templist=peaks2-peaks1;
temptimes=peaks2;
for i=1:length(peaks1)-1
   seg1=vec1(peaks1(i):peaks1(i+1));
   seg2=vec2(peaks2(i):peaks2(i+1));
   if length(seg2)>1
      newvec2(peaks1(i):peaks1(i+1))=intwin2(seg2,seg1);
   end
   clear seg1 seg2
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [newpks1,newpks2]=cmpks2(peaks1,peaks2,thresh)
%"fuzzy" intersection of two sets of peak retention times
j=1;
k=1;
flag2=0;
flag3=1;
newpks1=peaks1;
q=[1:length(peaks2)];
for i=1:length(peaks1)
   pdif=abs(peaks2-peaks1(i));
   mindif=min(pdif(find(q)));
   if mindif<thresh
      newpks2(j)=peaks2(find(pdif==mindif));
      q(find(pdif==mindif))=0;
      j=j+1;
      flag3=0;
   else
      trim(k)=i;
      k=k+1;
      flag2=1;
   end
end
if flag3
   newpks2=[];
end
if flag2
   newpks1(trim)=[];
end
while ~isempty(find(diff(newpks2)<=0))
   bads=find(diff(newpks2)<=0)
   newpks1(bads+1)=[];
   newpks2(bads+1)=[];
end
return
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [peaks]=pfind(chrom,nthresh)
%peak finding algorithm, based on estimated first derivative zero
dif=diff(chrom);
dif=[0,dif];
lng=length(dif);
flag=0;
flag2=0;
j=1;
k=1;
for i=1:lng-2
   flag3=or(dif(i)>nthresh,flag);
   if flag3
      
   	if dif(i)>dif(i+1) & dif(i)>dif(i+2)
      	flag=1;
      	y(j)=i;
      	x(j)=dif(i);
      	j=j+1;
         flag2=1;
         
   	else
         if flag2
            if length(x)>=2 & x(end)<0
		      	y(j)=i;
      			x(j)=dif(i);   
               yi=interp1(x',y',0);
         		peaks(k)=yi;
         		k=k+1;
         		flag=0;
            	flag2=0;
            	j=1;
               clear x y yi
            else
         		flag=0;
            	flag2=0;
            	j=1;
               clear x y yi               
            end
         end   
      end
   end
end
if k==1
   peaks=[];
end
return
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [out]=wfilt(in,loc)
%Wallis filter, sets local means and variances to arbitrary values of
%0 and 1, respectively.  Assumes features are ~20 points wide
%unless otherwise specified
%noramlize to maximum intensity - prevents scaling problems
in=in./max(in);
[m,n]=size(in);
if nargin==1
loc=20;
end
out=in;
wid=loc*2+1;
mat=zeros(wid,n-(2*loc));
num=(n-(wid-1)-1);
for i=1:wid
    mat(i,:)=in(i:i+num);
end
sig=std(mat);
mn=mean(mat);
out(loc+1:n-loc)=(100./(100*sig+1)).*(in(loc+1:n-loc)-mn);
return
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [newvec]=intwin2(pvec,tvec)
%intwin - interpolates warped window to correct size
%!!new version!! (vectorized commented-out portion)
%Kevin Johnson 10/21/99
tv=length(tvec);
pv=length(pvec);
%for i=0:tv-1,
%   x(i+1)=((pv-1)/(tv-1))*i+1;
%end
test=[0:tv-1];
test2=repmat((pv-1)/(tv-1),[1,tv]);
x=(test.*test2)+1;
x(end)=fix(x(end));
newvec=interp1([1:pv],pvec,x,'*linear');
return
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function rawdat=basec2(rawdat,numpts)
%baseline corrects a set of raw chromatograms, subtracts best fit line
%through first and last "numpts" data points
[mr,nr]=size(rawdat);
X=[1:numpts, (nr-(numpts-1)):nr];
for i=1:mr,
 	p=polyfit(X,rawdat(i,X),1);
    rawdat(i,:)=rawdat(i,:)-(p(2)+XF*p(1));
end
return

1 Comment
Show -1 older comments Hide -1 older comments

Guillaume Vannier on 11 Nov 2019

Hello,

I use exactly the same function and have the same problem as well. Did you manage to solve your problem? I would be interested to know if you succeed to use this function or another on your sample set.

Sign in to comment.

Sign in to answer this question.

Follow Question

Synovec Peak Matching algorithm

1 Comment
Show -1 older comments Hide -1 older comments

Answers (0)

Categories

Tags

Community Treasure Hunt

Synovec Peak Matching algorithm

1 Comment Show -1 older comments Hide -1 older comments

Answers (0)

Categories

Tags

See Also

Community Treasure Hunt

1 Comment
Show -1 older comments Hide -1 older comments