
Randomization testing with fitcsvm on GPU is excruciatingly slow

I am new to GPU computing. We wanted to use it for randomization statistics of SVM classification.
The idea is that we have real data, on which classification gives us a classification accuracy. Then we randomize the class labels (which removes all class information from the data) hundreds of times to create the null distribution of classification accuracies.
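To make the procedure concrete, here is a minimal sketch of such a randomization test on small synthetic CPU data. The sizes, the 5-fold cross-validated accuracy, and the names `cvacc`, `accreal`, `accnull`, and `nperm` are assumptions for illustration only; how accuracy is actually estimated in our analysis is not shown here.

```matlab
% Sketch of a label-randomization test (illustrative sizes and names).
rng(0);                                   % reproducible example
ntrials = 50; nfeatures = 20; nperm = 50;
Xa = randn(ntrials,nfeatures) + 0.5;      % class -1, shifted mean
Xb = randn(ntrials,nfeatures) - 0.5;      % class +1, shifted the other way
X = [Xa; Xb];
y = [-ones(ntrials,1); ones(ntrials,1)];  % true labels

% Assumed accuracy estimate: 1 minus the 5-fold cross-validated error
cvacc = @(labels) 1 - kfoldLoss(fitcsvm(X, labels, 'KFold', 5));

accreal = cvacc(y);                       % accuracy with true labels
accnull = zeros(nperm,1);                 % null distribution of accuracies
for k = 1:nperm
    accnull(k) = cvacc(y(randperm(numel(y))));  % scrambled labels
end

% Permutation p-value with the usual +1 correction
p = (sum(accnull >= accreal) + 1) / (nperm + 1);
fprintf('accuracy: %.3f  p = %.4f\n', accreal, p);
```

Every permutation refits the classifier from scratch, which is why the training time on scrambled labels dominates the total cost.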
We now notice that, for real data, the execution times of fitcsvm on GPU and CPU are similar (larger training sets favor the GPU). For the randomized data, however, the GPU is completely unusable. For example, the call to fitcsvm with the real labels takes 0.15 s on GPU and 0.18 s on CPU, whereas the same call with scrambled labels takes 252 s on GPU and a still considerable 40 s on CPU.
fitcsvm seems to have difficulty converging when there is no difference between classes (which is often the case in our real data and always the case in randomized data), and on GPU this makes it unusable. fitclinear is much faster and takes about the same time for real and randomized labels, but it is not available on GPU.
Is there any way that we can calculate SVMs on GPU for data with small or non-existent class differences?
% prepare
ntrials = 1000; nfeatures = 1000; vshared = 0.4; verror = 3; vclass = 1.5; % vclass is the difference between classes
% generate data
dshared = gpuArray.randn(1,nfeatures)*vshared; % variance shared between classes
x1 = dshared + gpuArray.randn(1,nfeatures)*vclass; % class 1 specific variance
x2 = dshared + gpuArray.randn(1,nfeatures)*vclass; % class 2 specific variance
x1 = repmat(x1,ntrials,1) + gpuArray.randn(ntrials,nfeatures)*verror; % add error variance for each trial
x2 = repmat(x2,ntrials,1) + gpuArray.randn(ntrials,nfeatures)*verror; % add error variance for each trial
% classify
labelstrain = [-ones(ntrials,1); ones(ntrials,1)]; % true labels
rlabelstrain = labelstrain(randperm(2*ntrials)); % randomized labels
datatrain = [x1;x2];
tic; fitcsvm(datatrain, labelstrain); % train true labels on GPU
gputime = toc
tic; fitcsvm(datatrain, rlabelstrain); % train randomized labels on GPU
rgputime = toc
datatrain = gather(datatrain); % get data from GPU to CPU memory
tic; fitcsvm(datatrain, labelstrain); % train true labels on CPU
cputimesec = toc % named cputimesec to avoid shadowing the built-in cputime function
tic; fitcsvm(datatrain, rlabelstrain); % train randomized labels on CPU
rcputime = toc
fprintf('GPU: %4.1f R-GPU: %4.1f CPU: %4.1f R-CPU: %4.1f\n',gputime,rgputime,cputimesec,rcputime);
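
To check whether the slowdown really comes from the solver struggling to converge, one option is to cap the number of iterations and inspect the convergence information stored in the trained model. The sketch below does this on small synthetic CPU data with scrambled labels. The sizes and the cap of 1e4 are arbitrary assumptions, and whether these options behave the same way for gpuArray inputs has not been verified here.

```matlab
% Sketch: cap solver iterations and inspect convergence for scrambled labels.
% Small CPU data; the sizes and the iteration cap are illustrative assumptions.
rng(0);                                   % reproducible example
n = 200; d = 50;
X = randn(2*n, d);                        % no class difference at all
y = [-ones(n,1); ones(n,1)];
yrand = y(randperm(2*n));                 % labels carry no information

mdl = fitcsvm(X, yrand, ...
    'Standardize', true, ...              % put features on a common scale
    'IterationLimit', 1e4);               % fitcsvm default is 1e6

% ConvergenceInfo and NumIterations are properties of the trained model
fprintf('converged: %d  iterations: %d\n', ...
    mdl.ConvergenceInfo.Converged, mdl.NumIterations);
```

If the model reports that it did not converge within the cap, the long run times with the default limit are consistent with the solver iterating without reaching its tolerances. A capped model is only an approximate solution, so it should be used with that in mind.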

Answers (0)