Testing SVD performance on M1

Marko on 29 Jun 2021
Commented: Benny Hartwig on 2 Aug 2021
Hello Community,
here is a script for testing M1 performance on an SVD problem. (The Parallel Computing Toolbox is required.)
% This script computes the singular value decomposition (SVD) of random
% matrices of size 1 to N. It also detects the maximum number of threads
% and determines the parallel efficiency.
%% clear and define environment
clear all;delete(gcp('nocreate'));clc;
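%% detect max threads
% note: index 53 relies on the fixed layout of the text printed by
% feature('numcores') and only reads a single-digit count
core_info = evalc('feature(''numcores'')');
maxThreads = str2num(core_info(53));
disp(['maximum number of simultaneous threads: ',num2str(maxThreads)])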
%% define the smallest problem size N that divides without remainder among any number of threads
N = smallest_multiple(max(maxThreads,8));
% increase N to around 1000 for differen
%if N < 1000
% N = ceil(1000/N)*N;
%end
disp(['problem size N = ',num2str(N)])
%% benchmark
result = kron(1:maxThreads,[1 0 0]')';
for k = 1:maxThreads
    y = zeros(N,1);
    myCluster = parcluster('local');
    myCluster.NumWorkers = k; % 'Modified' property now TRUE
    saveProfile(myCluster);
    % evalc('parpool(''local'',k)');
    tic
    parfor n = 1:N
        y(n) = max(svd(randn(n)));
    end
    result(k,2)=toc;
    disp([' Problem solved with ',num2str(k),' of '...
        ,num2str(maxThreads),' threads has finished in ',num2str(result(k,2)),'s'])
    evalc('delete(gcp(''nocreate''))');
end
%% present the results
clc;
result(:,3) = result(1,2)./(result(:,2).*result(:,1));
disp(' #threads |time in s|Efficiency')
disp(result)
%% alternative solution to smallest multiple
% https://de.mathworks.com/matlabcentral/answers/386271-write-a-function-called-smallest_multiple#answer_319994
function r = smallest_multiple(k)
    r = 1;
    for n = 1:k
        r = r * (n / gcd(r,n));
    end
end
Could anybody run the script and post the output?
I can't understand the bad performance:
#threads | time in s | Efficiency
1.0000   | 93.1989   | 1.0000
2.0000   | 80.0887   | 0.5818
3.0000   | 76.6610   | 0.4052
4.0000   | 87.3170   | 0.2668
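For reference, the efficiency column is computed in the script as result(1,2)./(result(:,2).*result(:,1)), that is, E(k) = T(1)/(k*T(k)). The following is only a minimal sketch that recomputes it from the timings posted above; the variable names (threads, t, speedup, efficiency) are illustrative and not part of the original script.

```matlab
% Timings in seconds, copied from the table above (illustrative names).
threads = (1:4)';
t = [93.1989; 80.0887; 76.6610; 87.3170];

% Speedup relative to the single-thread run.
speedup = t(1) ./ t;

% Strong-scaling efficiency: with ideal scaling, k threads would need t(1)/k.
efficiency = t(1) ./ (threads .* t);

disp([threads, t, speedup, efficiency])
```

Read this way, the wall time barely changes with more workers, so the efficiency falls roughly like 1/k.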
  5 Comments
Benny Hartwig on 2 Aug 2021
Hi Marko,
thank you very much for the hint. I managed to get the M1 Mini (8 GB) to use all eight cores and ran the benchmark again. However, I needed to make some changes to the script to get a better understanding of the performance:
N = smallest_multiple(max(maxThreads,8))*1000*5 % increase the size of the loop
evalc('parpool(''local'',k)'); % start the pool before tic/toc (otherwise pool startup dilutes the timing)
parfor n = 1:N*k % scale the loop by the number of threads so that every worker has to finish N jobs
    y(n) = max(svd(randn(10))); % reduce the size of the random matrix to contain memory pressure
end
result(k,2)=toc/k; % divide total time by the number of workers
#threads | time in s | Efficiency
1.0000   | 33.1795   | 1.0000
2.0000   | 17.1001   | 0.9702
3.0000   | 11.8776   | 0.9312
4.0000   | 9.2032    | 0.9013
5.0000   | 8.4971    | 0.7810
6.0000   | 7.7836    | 0.7105
7.0000   | 7.2541    | 0.6534
8.0000   | 7.1321    | 0.5815
So it seems that the efficiency cores also improve the performance but are a bit slower than the performance cores. Moreover, the memory pressure chart indicates that these efficiency numbers might be biased downward, because the memory usage turned yellow during the runs with 5 to 8 threads. With 1 to 4 threads, the memory usage was always green.
So it probably pays off to get the model with 16 GB of RAM, as the RAM consumption of the parfor loop increases quite strongly when you add more workers. Maybe this problem will be solved once MATLAB runs natively on the M1.
Best,
Benny
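A note on reading the table above: because the loop runs N*k iterations and result(k,2) stores toc/k, the unchanged efficiency formula result(1,2)./(result(:,2).*result(:,1)) reduces to T(1)/toc(k), i.e. a weak-scaling efficiency (the work grows with the number of workers). The sketch below only reworks the posted numbers under that reading; the names perN, wallTime and efficiency are illustrative and do not appear in the script.

```matlab
% Values from the "time in s" column above, i.e. toc/k (illustrative names).
k = (1:8)';
perN = [33.1795; 17.1001; 11.8776; 9.2032; 8.4971; 7.7836; 7.2541; 7.1321];

% Recover the measured wall time of each run (N*k jobs on k workers).
wallTime = perN .* k;

% Weak-scaling efficiency: ideally the wall time would stay at wallTime(1).
efficiency = wallTime(1) ./ wallTime;

disp([k, wallTime, efficiency])
```

Under this reading, the wall time with eight workers is about 57 s for eight times the work of the single-worker run.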


Answers (0)



Release

R2021a
