Synovec Peak Matching algorithm

I'm a new user of Matlab and I've been trying to use a previously published and implemented code, but for some reason I cannot get the code to work for me. The code was developed by Synovec et al at the University of Washington, and is called "peakmatch". The code is supposed to perform retention time alignment of GC data.
I've included the entire algorithm below - this is the only information I have regarding the entire program. I've also attached a spreadsheet titled "NormInjB" which is an example of the data input I'm using.
"NormInjB" would be my input matrix. This is oriented so that each column corresponds to a different sample, so I transpose it when i enter it into matlab. The target command is defining that my target chromatogram is composed of the second row, all columns (after transposing my data, each row is a separate sample). Row 1 is the retention times and thus row 2 is the first sample. I've also tried removing the retention times, but still get similar errors.
>data=NormInjB';
>target=data(2,:);
My understanding is that I should then type [aldat,corrs,numtar,numpks]=peakmatch(data,window,target) and badabing, badaboom - magic happens. Unfortunately, I get a series of errors.
Some of the errors I am obtaining include:
Undefined function or variable 'XF'.
Error in peakmatch>basec2 (line 278)
rawdat(i,:)=rawdat(i,:)-(p(2)+XF*p(1));
Error in peakmatch (line 21)
data=basec2(data,400);
I've also gotten errors that say:
Index exceeds the number of array elements (90)
I have 90 samples, but I don't know why I would be getting that, because I can't figure out where the index is defined?
Another:
Index in position 2 is invalid. Array indices must be positive integers or logical values
Everytime I think I've solved one problem, another pops up and then a few tries later, I get the same error as previously.
I've tried breaking the code down to identify different segments that don't work but I got so lost that I decided to pause and ask for help.
Any advice at all would be appreciated because I've been struggling for weeks over this and I'm very frustrated! Again, I'm a super newbie so I apologize if any of this doesn't make sense or seems to have an obvious solution. I have scoured the matlab answers and just confused myself more.
Complete peakmatch.m algorithm:
function [aldat,corrs,numtar,numpks]=peakmatch(data,window,target)
% Chromatographic alignment through nearest neighbor peak matching
%
% Input are the chromatograms to be aligned, "data", the maximum shift to consider, "window",
% and the alignment target chromatogram, "target"
%
% If a target is not specified, then first row of the data matrix will be used
%
% Window should be set to a value that's less than typical peak to peak distances in the chromatogram
%
% Output is the aligned data as well as the correlation coefficients of each chromatogram
% to the target both before and after alignment.
%
% IO: [aldat,corrs]=altest(data,window,target);
%Kevin Johnson 5/23/02
%baseline correction as best fit line through first and last 400 data points.
%WILL NOT BE APPROPRIATE FOR ALL DATA!! may wish to comment out or alter number of points used
data=basec2(data,400);
% Alignment target is the first chromatogram in the dataset, if none is specified
if nargin==2
target=data(1,:);
end
% Alignment quality measure is the correlation coefficient between chromatograms
% following the application of a Wallis filter to mimimize the efect of varying
% peak heights. The filter algorithm also normalizes by the max peak height, prior
% to filtering. Filtered chromatograms are temporary, and are not output by the program.
% the filter assumes peaks are ~20 data points wide, other widths can be specified.
wftarget=wfilt(target,20);
[m,n]=size(data);
% peak finding algorithm uses 5x std dev of first 400 points as measure of baseline noise
% WILL NOT BE APPROPRIATE FOR ALL DATA!! May wish to specify noise level explicitly.
%"stds" is a vector containing the noise threshold for each chromatogram
%"tpks" is the noise threshold for the target chroamtogram
stds=std(data(:,1:200)')*5;
tpks=pfind(target,std(target(1:200))*5);
%number of peaks located in target (for debugging purposes)
numtar=length(tpks);
%plot of detected target peaks (for debugging purposes)
%plot(target);set(gca,'NextPlot','add');plot(tpks,target(tpks),'r*');set(gca,'NextPlot','replace');
%initialize matrix to hold aligned data
aldat=data;
numpks=zeros(1,m);
% step through each chromatogram and do the peak matching alignment
for i=1:m
[aldat(i,:),shifts,shifttimes,numpks(i)]=align(target,aldat(i,:),tpks,stds(i),window);
%calculate "before" and
temp=corrcoef(wftarget',wfilt(data(i,:))');bcors(i)=temp(1,2);
%"after" correlations to check alignment
temp=corrcoef(wftarget',wfilt(aldat(i,:))');cors(i)=temp(1,2);
end
corrs=[bcors',cors'];
%%%%%%%%%%%%%%%
%sub-functions%
%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [newvec2,templist,temptimes,numpks]=align(vec1,vec2,peaks1,nthresh,width)
%aligns two chromatograms through peak matching and interpolation
%find peaks in sample chromatogram
peaks2=pfind(vec2,nthresh);
numpks=length(peaks2);
%plot of detected peaks in sample chromatogram (for debugging purposes)
%plot(vec2);set(gca,'NextPlot','add');plot(peaks2,vec2(peaks2),'r*');set(gca,'NextPlot','replace');
if isempty(peaks2);
newvec2=vec2;
return
end
[peaks1,peaks2]=cmpks2(peaks1,peaks2,width);
if isempty(peaks2);
newvec2=vec2;
return
end
peaks1=fix(peaks1);
peaks2=fix(peaks2);
peaks1=[1 peaks1 length(vec1)];
peaks2=[1 peaks2 length(vec2)];
templist=peaks2-peaks1;
temptimes=peaks2;
for i=1:length(peaks1)-1
seg1=vec1(peaks1(i):peaks1(i+1));
seg2=vec2(peaks2(i):peaks2(i+1));
if length(seg2)>1
newvec2(peaks1(i):peaks1(i+1))=intwin2(seg2,seg1);
end
clear seg1 seg2
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [newpks1,newpks2]=cmpks2(peaks1,peaks2,thresh)
%"fuzzy" intersection of two sets of peak retention times
j=1;
k=1;
flag2=0;
flag3=1;
newpks1=peaks1;
q=[1:length(peaks2)];
for i=1:length(peaks1)
pdif=abs(peaks2-peaks1(i));
mindif=min(pdif(find(q)));
if mindif<thresh
newpks2(j)=peaks2(find(pdif==mindif));
q(find(pdif==mindif))=0;
j=j+1;
flag3=0;
else
trim(k)=i;
k=k+1;
flag2=1;
end
end
if flag3
newpks2=[];
end
if flag2
newpks1(trim)=[];
end
while ~isempty(find(diff(newpks2)<=0))
bads=find(diff(newpks2)<=0)
newpks1(bads+1)=[];
newpks2(bads+1)=[];
end
return
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [peaks]=pfind(chrom,nthresh)
%peak finding algorithm, based on estimated first derivative zero
dif=diff(chrom);
dif=[0,dif];
lng=length(dif);
flag=0;
flag2=0;
j=1;
k=1;
for i=1:lng-2
flag3=or(dif(i)>nthresh,flag);
if flag3
if dif(i)>dif(i+1) & dif(i)>dif(i+2)
flag=1;
y(j)=i;
x(j)=dif(i);
j=j+1;
flag2=1;
else
if flag2
if length(x)>=2 & x(end)<0
y(j)=i;
x(j)=dif(i);
yi=interp1(x',y',0);
peaks(k)=yi;
k=k+1;
flag=0;
flag2=0;
j=1;
clear x y yi
else
flag=0;
flag2=0;
j=1;
clear x y yi
end
end
end
end
end
if k==1
peaks=[];
end
return
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [out]=wfilt(in,loc)
%Wallis filter, sets local means and variances to arbitrary values of
%0 and 1, respectively. Assumes features are ~20 points wide
%unless otherwise specified
%noramlize to maximum intensity - prevents scaling problems
in=in./max(in);
[m,n]=size(in);
if nargin==1
loc=20;
end
out=in;
wid=loc*2+1;
mat=zeros(wid,n-(2*loc));
num=(n-(wid-1)-1);
for i=1:wid
mat(i,:)=in(i:i+num);
end
sig=std(mat);
mn=mean(mat);
out(loc+1:n-loc)=(100./(100*sig+1)).*(in(loc+1:n-loc)-mn);
return
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [newvec]=intwin2(pvec,tvec)
%intwin - interpolates warped window to correct size
%!!new version!! (vectorized commented-out portion)
%Kevin Johnson 10/21/99
tv=length(tvec);
pv=length(pvec);
%for i=0:tv-1,
% x(i+1)=((pv-1)/(tv-1))*i+1;
%end
test=[0:tv-1];
test2=repmat((pv-1)/(tv-1),[1,tv]);
x=(test.*test2)+1;
x(end)=fix(x(end));
newvec=interp1([1:pv],pvec,x,'*linear');
return
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function rawdat=basec2(rawdat,numpts)
%baseline corrects a set of raw chromatograms, subtracts best fit line
%through first and last "numpts" data points
[mr,nr]=size(rawdat);
X=[1:numpts, (nr-(numpts-1)):nr];
for i=1:mr,
p=polyfit(X,rawdat(i,X),1);
rawdat(i,:)=rawdat(i,:)-(p(2)+XF*p(1));
end
return

1 Comment

Hello,
I use exactly the same function and have the same problem as well. Did you manage to solve your problem? I would be interested to know if you succeed to use this function or another on your sample set.

Sign in to comment.

Answers (0)

Asked:

on 1 Apr 2019

Commented:

on 11 Nov 2019

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!