Accuracy problems using mldivide on GPU

6 views (last 30 days)
Hello all,
I want to solve a system of complex linear equations A*x=B on the GPU. However, when I compare the solution with the results of the same computation on the CPU, there seems to be a relatively large error.
Here is a minimal working example:
rng('default');
%% CPU
A = randn(60,2,60001) + 1i*randn(60,2,60001);
B = randn(60,2,60001) + 1i*randn(60,2,60001);
for k = length(A):-1:1
x_CPU(:,:,k) = A(:,:,k) \ B(:,:,k);
end
%% GPU
g_A = gpuArray(A);
g_B = gpuArray(B);
g_x = pagefun(@mldivide, g_A, g_B);
x_GPU = gather(g_x);
%% comparison
max_error = max(abs((x_GPU - x_CPU) ./ x_CPU), [], 'all'); % relative error
disp(max_error);
On my machine, I get
max_error = 1.0197e+03
which means that the results are unusable.
In a last-ditch attempt I tried using complex() both in gpuArray() and in pagefun(), as described here, but it changes nothing.
I am running Matlab R2021a with the latest updates. GPU driver is also the latest version (527.27).
ans =
CUDADevice with properties:
Name: 'Quadro P2200'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 11.6000
ToolkitVersion: 11
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 5.3685e+09
AvailableMemory: 4.3222e+09
MultiprocessorCount: 10
ClockRateKHz: 1493000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
I would be very grateful for ideas and suggestions on how to solve the problem. Thanks in advance!
  1 Comment
Bruno Luong
Bruno Luong on 9 Jan 2023
Again the issue only occurs with A complex and not square.

Sign in to comment.

Accepted Answer

Matt J
Matt J on 9 Jan 2023
Confirmed. But it doesn't seem to be mldivide that's the problem per se, but rather pagefun:
load data
for k = length(A):-1:1
x_CPU(:,:,k) = A(:,:,k) \ B(:,:,k);
end
%% GPU
g_A = gpuArray(A);
g_B = gpuArray(B);
%g_x = pagefun(@mldivide, g_A, g_B);
for k = length(A):-1:1
g_x(:,:,k) = g_A(:,:,k) \ g_B(:,:,k);
end
x_GPU = gather(g_x);
nrm=@(z) vecnorm(z(:,:),inf,1);
%% comparison
max_error = max(nrm(x_GPU - x_CPU) ./ nrm(x_CPU)) % relative error
max_error =
7.4269e-20
  8 Comments
Matt J
Matt J on 9 Jan 2023
Edited: Matt J on 9 Jan 2023
In pageloop in this example, unlike in your first one, you pull A and B from the GPU, use mldivide pagewise (on the CPU) and write it back to the GPU. Is it possible that you mixed up gather() and gpuArray()?
No, that was deliberate. That is what you would have to do to interrupt and then resume your GPU workflow.

Sign in to comment.

More Answers (2)

Joss Knight
Joss Knight on 7 Jan 2023
Edited: Joss Knight on 7 Jan 2023
The answer is not less accurate, it is just as inaccurate. What you have are 60001 massively overdetermined systems generated by completely random data. The least squares solution to this is going to be all over the place, in other words, there will be no good answer. If you look at the relative residuals you'll find there's little difference in the (lack of) success of either CPU or GPU.
%%
rng('default');
%% CPU
A = randn(60,2,60001) + 1i*randn(60,2,60001);
B = randn(60,2,60001) + 1i*randn(60,2,60001);
x_CPU = pagemldivide(A,B);
%% GPU
g_A = gpuArray(A);
g_B = gpuArray(B);
g_x = pagemldivide(g_A, g_B);
x_GPU = gather(g_x);
%% comparison
max_error = max(abs((x_GPU - x_CPU) ./ x_CPU), [], 'all'); % relative error
disp(max_error);
res = abs(sqrt(sum((pagemtimes(A, x_CPU)-B).^2,1)));
g_res = abs(sqrt(sum((pagemtimes(g_A, g_x)-g_B).^2,1)));
rhsNorm = abs(sqrt(sum(B.^2,1)));
relres = reshape(res./rhsNorm, 2, []);
g_relres = reshape(g_res./rhsNorm, 2, []);
average_cpu_relative_residual = mean(relres, 'all')
average_gpu_relative_residual = mean(g_relres, 'all')
std_cpu = std(relres(:))
std_gpu = std(g_relres(:))
To which I got the answers
1.0197e+03
average_cpu_relative_residual =
0.9979
average_gpu_relative_residual =
1.0543
std_cpu =
0.1935
std_gpu =
0.2664
It seems the GPU solution is marginally worse, but given that the input was nonsense anyway I wouldn't read too much into it.
It's possible that you intended for the system matrix A to be square and only B to be 60x2 (i.e. 2 right-hand-sides), in which case with random data you'd get something much better behaved. I got:
1.3056e-11
average_cpu_relative_residual =
1.8911e-14
average_gpu_relative_residual =
1.6029e-14
std_cpu =
3.6059e-14
std_gpu =
3.1088e-14
  3 Comments
Joss Knight
Joss Knight on 9 Jan 2023
Well, I'll admit that numerical analysis is not my speciality (at doctorate level, anyway) but I'm fairly sure that 17% residual indicates that the data is still very poorly conditioned. Since it's 2-D data why not plot some of the data along with the best fit lines to get an idea of how poorly the GPU is doing. It may be that you can fix the problem with some basic outlier rejection, which would be sensible anyway if you have some bad outliers.
cublas's batch QR routines may have some issues, of course, with correct handling of poorly conditioned complex systems, so we'll take a look at that. In the meantime you could try using a pseudo-inverse as Matt J shows.
Damian
Damian on 9 Jan 2023
You are right about the quality of the data. Since I know that, I did not expect a perfect fit, so in my case 7.5% residual over the entire dataset is good enough.
Thank you for your suggestions and your help!

Sign in to comment.


Walter Roberson
Walter Roberson on 9 Jan 2023
Edited: Matt J on 9 Jan 2023
It is a known bug that should be fixed soon that affects page left and right divisions for complex numbers
  2 Comments
Damian
Damian on 9 Jan 2023
Edited: Damian on 9 Jan 2023
I am not sure if this applies here since I am still on 2021a, so pagemldivide does not exist for me (yet). But this is something I will keep in mind when upgrading.
Thank you for the info!
Bruno Luong
Bruno Luong on 9 Jan 2023
I agree with Walter it looks pretty much similar cause (wrong detection of matrix type and decomposition) for GPU and CPU.

Sign in to comment.

Products


Release

R2021a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!