I finally got around to readdressing this challenge. The following is a function that will evaluate a power spectral density using a thomson multitaper method on a 2D input matrix with signal in each column. It uses the czt method to compute the DFT, and the eigen method for estimating the average of dpss windows. It takes about 1/15 of the time as using pmtm() in a parfor loop (on my machine). This is a rather specific implementation, but hopefully it will be useful to someone.
function psd = mypmtm(xin,fs,bins_per_hz)
    [m, n] = size(xin);%m=length of signal, n=# of signals
    k=fs*bins_per_hz;
    nfft = 2^nextpow2(m+k-1);%- Length for power-of-two fft.
    [E,V] = dpss(m,4);
    g=gpuDevice;
    s=(m+nfft)*32*length(V);%how many bytes will be needed for ea. signal
    ne=floor(g.AvailableMemory/s);%number of signals that can be processed at once with available memory
    indx=[0:ne:n,n];%number of iterations that will be necessary
      psd=zeros(k/2,n);%initialize output
      for i=1:length(indx)-1
      x=gpuArray(xin(:,1+indx(i):indx(i+1)));
      w = exp(-1i .* 2 .* pi ./ k); 
      x=x.*permute(E,[1 3 2]); %apply dpss windows
      %------- Premultiply data.
      kk = ( (-m+1):max(k-1,m-1) )';
      kk = (kk .^ 2) ./ 2;
      ww = w .^ (kk);   % <----- Chirp filter is 1./ww
      nn = (0:(m-1))';
      x = x .* ww(m+nn);
      %------- Fast convolution via FFT.
      x = fft(  x, nfft );
      fv = fft( 1 ./ ww(1:(k-1+m)), nfft );   % <----- Chirp filter.
      x = x .* fv;
      x  = ifft( x );
      %------- Final multiply.
      x = x( m:(m+k-1), : , :) .* ww( m:(m+k-1) );
      x = abs(x).^2;
      x=x.*permute(V,[2 3 1])/length(V);%'eigen' method of estimating average
      x=sum(x,3);
      x=x(2:end/2+1,:)./fs;
      psd(:,1+indx(i):indx(i+1))=gather(x);
      end
  end

