This question is closed.
GPU memory issue / overall usage question
I have the following function fnDist
function [ mat_out ] = fnDist( img_in, varargin )
%fnDist Distance between every pair of pixels in img_in.
%   mat_out = fnDist(img_in, stride, dist_type) returns an N-by-N matrix,
%   where N = size(img_in,1)*size(img_in,2).
%   stride:    'column' (default) or 'row' pixel ordering.
%   dist_type: 'travel', 'square', or 'euclidean' (default).
    g=gpuDevice(1);
    v2_size = [size(img_in,1), size(img_in,2)];
    v1_size = size(img_in,1)*size(img_in,2);
    % Get row number of each pixel
    mat_row = repmat((1:size(img_in,1))', [1,size(img_in,2)]);
    if(nargin==1)
        % Column-major default, i.e. col-1, col-2, col-3
        mat_row = reshape(mat_row, [v1_size,1]);
    elseif(any(strcmpi(varargin{1},{'column','col','cols'})))
        % Column-major, i.e. col-1, col-2, col-3
        mat_row = reshape(mat_row, [v1_size,1]);
    elseif(strcmpi(varargin{1},'row'))
        % Row-major, i.e. row-1, row-2, row-3
        mat_row = reshape(mat_row', [v1_size,1]);
    else
        error('Invalid type of stride: Must be ''row'' or ''column''. ');
    end
    gmat_row = mat_row;
    % gmat_row = gpuArray(mat_row);
    gmat_row = repmat(gmat_row, [1,v1_size]);
    gmat_row_diff = abs(gmat_row - gmat_row');
    % Get column number of each pixel
    mat_col = repmat(1:size(img_in,2), [size(img_in,1),1]);
    if(nargin==1)
        % Column-major default, i.e. col-1, col-2, col-3
        mat_col = reshape(mat_col, [v1_size,1]);
    elseif(any(strcmpi(varargin{1},{'column','col','cols'})))
        % Column-major, i.e. col-1, col-2, col-3
        mat_col = reshape(mat_col, [v1_size,1]);
    elseif(strcmpi(varargin{1},'row'))
        % Row-major, i.e. row-1, row-2, row-3
        mat_col = reshape(mat_col', [v1_size,1]);
    else
        error('Invalid type of stride: Must be ''row'' or ''column''. ');
    end
    gmat_col = mat_col;
    % gmat_col = gpuArray(mat_col);
    gmat_col = repmat(gmat_col, [1,v1_size]);
    gmat_col_diff = abs(gmat_col - gmat_col');
    if(nargin==1 || length(varargin)==1)
        % Default distance: Euclidean
        gmat_dist = gmat_row_diff.^2 + gmat_col_diff.^2;
        gmat_dist = gmat_dist.^0.5;
    elseif(strcmpi(varargin{2},'travel'))
        gmat_dist = gmat_row_diff + gmat_col_diff;
    elseif(strcmpi(varargin{2},'square'))
        gmat_dist = gmat_row_diff.^2 + gmat_col_diff.^2;
    elseif(strcmpi(varargin{2},'euclidean'))
        gmat_dist = gmat_row_diff.^2 + gmat_col_diff.^2;
        gmat_dist = gmat_dist.^0.5;
    else
        error('Invalid type of distance: Must be ''travel'' (dX+dY), ''square'' (dX^2+dY^2), ''euclidean'' ([dX^2+dY^2]^0.5), or omitted (default of ''euclidean''). ');
    end
    mat_out = gmat_dist;
    % mat_out = gather(gmat_dist);
    % reset(g);
end
With it I'm attempting to calculate the distance between every pixel and every other pixel in the img_in matrix. So if I call
mat_image = zeros(100,100);
sqr = fnDist(mat_image,'column','travel');
sqr should be a symmetric 10,000 by 10,000 matrix (10,000 = 100x100), where each row or column represents a single pixel and its distance to every other pixel.
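As a quick sanity check of that layout, here is a minimal sketch for a 2-by-2 image with the 'travel' distance and column-major ordering. It does not call fnDist; it assumes an equivalent formulation with ndgrid and bsxfun on the CPU, so treat it as an illustration rather than the function above.

```matlab
% Pixel coordinates of a 2-by-2 image in column-major order:
% (1,1), (2,1), (1,2), (2,2)
[r, c] = ndgrid(1:2, 1:2);
r = r(:);
c = c(:);

% 'travel' distance |dRow| + |dCol| between every pair of pixels
d = abs(bsxfun(@minus, r, r')) + abs(bsxfun(@minus, c, c'));
disp(d)
%      0     1     1     2
%      1     0     2     1
%      1     2     0     1
%      2     1     1     0
```

Row k (or column k) of d holds the distances from pixel k to all four pixels, and the matrix is symmetric with a zero diagonal.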
My problem occurs when I uncomment the GPU code. The function runs fine with the GPU code commented out, but with it enabled it fails while performing operations on the GPU, with the following error:
Error using gpuArray/repmat
Out of memory on device. To view more detail about available memory on the
GPU, use 'gpuDevice()'. If the problem persists, reset the GPU by calling
'gpuDevice(1)'.
Error in fnDist (line 30)
gmat_row = repmat(gmat_row, [1,v1_size]);
Error in test_spectralclustering (line 8)
sqr = fnDist(mat_image,'column','travel');
I'll admit I'm a novice when it comes to GPU programming. I would have expected there to be more memory available on the GPU.
Am I doing something wrong with regards to layout necessary to perform GPU calculations?
Memory errors aside, is there anything else that should be done to optimize for the GPU that would not be done when performing the same calculations on the CPU?
The following are my GPU properties.
CUDADevice with properties:
Name: 'GeForce GT 650M'
Index: 1
ComputeCapability: '3.0'
SupportsDouble: 1
DriverVersion: 7.5000
ToolkitVersion: 7.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 1.0734e+09
AvailableMemory: 112365568
MultiprocessorCount: 2
ClockRateKHz: 405000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
Any help would be greatly appreciated.
Answers (1)
Walter Roberson
on 17 Jun 2016
A 10,000 by 10,000 array at 8 bytes per entry (double precision) would be 800 megabytes. Your device reports
AvailableMemory: 112365568
which is about 112 megabytes.
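The arithmetic can be spelled out as a short sketch. The two memory figures are copied from the gpuDevice output in the question; the variable names are only illustrative.

```matlab
n_pixels        = 100 * 100;   % pixels in the 100-by-100 test image
bytes_per_value = 8;           % double precision
bytes_needed    = n_pixels^2 * bytes_per_value;   % one N-by-N matrix
bytes_available = 112365568;   % AvailableMemory from the question
bytes_total     = 1.0734e9;    % TotalMemory from the question

fprintf('One N-by-N matrix: %.0f MB\n', bytes_needed / 1e6);
fprintf('Available on GPU:  %.0f MB\n', bytes_available / 1e6);
fprintf('Fits in available memory: %d\n', bytes_needed <= bytes_available);
```

Note that fnDist keeps several N-by-N arrays alive at once (gmat_row, gmat_row_diff, gmat_col, gmat_col_diff, gmat_dist), so the full working set is a multiple of 800 MB and would exceed even the TotalMemory figure.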
Also, for some operations, the GPU needs an extra copy of the data, re-arranged into the way that is the most efficient for the computations, so it is not uncommon for array sizes to be effectively limited to half of the available GPU memory.
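One way to stay inside such a limit is to compute the distance matrix a block of columns at a time, in single precision, and keep the full result in host memory. The following is a rough sketch under several assumptions: it covers only the 'travel' distance with column-major ordering, the names bytes_budget, chunk and use_gpu are hypothetical, and the factor of 3 temporaries per block is a guess rather than a measured figure.

```matlab
n_rows = 100;  n_cols = 100;          % image size from the question
[r, c] = ndgrid(single(1:n_rows), single(1:n_cols));
r = r(:);  c = c(:);                  % column-major pixel coordinates
n = numel(r);

% Assumed budget: half of the AvailableMemory figure from the question,
% shared by roughly 3 temporaries of n-by-chunk single values (4 bytes each).
bytes_budget = 112365568 / 2;
chunk = max(1, floor(bytes_budget / (3 * 4 * n)));

use_gpu = false;   % set to true to try the GPU (needs Parallel Computing Toolbox)
if use_gpu
    r = gpuArray(r);
    c = gpuArray(c);
end

mat_out = zeros(n, n, 'single');      % full result stays in host memory
for first = 1:chunk:n
    idx = first:min(first + chunk - 1, n);
    % n-by-numel(idx) block of |dRow| + |dCol|; bsxfun avoids repmat copies
    d = abs(bsxfun(@minus, r, r(idx)')) + abs(bsxfun(@minus, c, c(idx)'));
    if use_gpu
        d = gather(d);                % copy the block back to the host
    end
    mat_out(:, idx) = d;
end
```

With these numbers each block is a few hundred columns wide, so only a small slice of the 10,000 by 10,000 result ever lives on the device. Whether this is faster than the CPU on a given card would have to be measured.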
1 Comment
Joss Knight
on 22 Jun 2016
Yes, those mobile GPU chips are really for graphics. It's convenient that you can test your GPU code on them, but you're not going to be able to do much and I'd be surprised if you get much performance benefit, especially in double precision.