Answered
How do I use arrayfun on GPU when the size of the output array doesn't equal the size (of some) input arrays (Error using gpuArray/arrayfun
|gpuArray| |arrayfun| is different from the CPU version, which is effectively just a syntactic convenience to avoid writing |for...

10 years ago | 0

Answered
Manipulating arrays within GPU arrayfun
The general rule is that you cannot manipulate matrices or vectors inside a |gpuArray| |arrayfun| function, only scalar operatio...

10 years ago | 1 | accepted

Answered
Matrix algebra very slow on GPU
It might be worth answering this question for posterity. The questioner, it seems, was testing at least the linear solves with ...

10 years ago | 0

Answered
Solve large linear equation on GPU
If you look online, or perhaps read a book like <https://www.amazon.co.uk/dp/1421407949/ref=pd_lpo_sbs_dp_ss_1/277-0329359-50611...

10 years ago | 1 | accepted

Answered
Parallel Computing - use 8 cores and 2 video cards
A couple of questions answered: * Your graphics card is optimised for single precision compute. Make sure your data is in sin...

10 years ago | 0 | accepted

Answered
GPU performance with short vectors
Computation in a GPU core is significantly slower than in a modern CPU core. It makes up for that by having a lot of them - thou...

10 years ago | 0 | accepted

Answered
How to increase the speed of submatrix data transfer using gpuArray rather than CPU?
I think the problem is intrinsic. Even on my Kepler K20c the GPU is slower for this case. This is a memory intensive non-coheren...

10 years ago | 0

Answered
Efficient training of LSTM network with GPU
To get good performance out of the GPU, you need to give it a lot of data to process. Your best bet is to vectorize your code to...

10 years ago | 0 | accepted

Answered
How do I use @mldivide together with pagefun on the GPU?
Your systems are rectangular. You can only solve square systems with |pagefun|, sorry.

10 years ago | 1 | accepted

Answered
How to use multiple GPUs in distributed computing server?
Try |gpuDeviceCount| to determine how many GPUs a worker has access to, and then some manipulation of |labindex| to assign devic...

10 years ago | 1

Answered
How does one apply max(0,z) element-wise in a vectorized way (apply ReLu linear rectifier element-wise)
Not quite sure whether your point is clear because this is exactly how the two-argument form of |max| works - element-wise. ...

10 years ago | 0

Answered
Serious Bug in GPU Accelerated Interp1? -> EDIT: Present in R2015a, Fixed in R2016a
I can reproduce your slowdown, but I can't explain it since there have been no changes to the |interp| functions in recent relea...

10 years ago | 0

Answered
circshift slower on GPU
Yes, |circshift| is a bit slow on the GPU particularly when it has to shift both rowwise and columnwise, because that means it h...

10 years ago | 3 | accepted

Answered
Parallel looped interp1 on GPU
The best way to parallelize multiple 1D interpolations is to use 2D interpolation, and just set the Y interp point to |(1:M)'|, ...

10 years ago | 1 | accepted

Answered
Sparse for gpuArray problem
Yes, in R2015b the five-argument form of |sparse| is not supported for |gpuArray|. See |help gpuArray/sparse|. Now that it's ...

10 years ago | 0 | accepted

Answered
Mexcuda Error on Windows 10 (Matlab 2015 b)
As you can see from the first line of your output, your chosen host compiler is not supported by the CUDA compiler (nvcc) and yo...

10 years ago | 0

Answered
My code in MATLAB takes longer on the GPU compared with the CPU. My GPU device is a GeForce 980 Ti; could someone run it on their GPU to see whether the problem is related to my code or my GPU hardware? (here is my code)
The problem is with your code. You need to vectorize the |mean| operation. At the moment you are calling thousands of kernels, s...

10 years ago | 0

Answered
How can I set shared memory configuration for NVIDIA graphics card to increase L1 cache?
You can't, but you can write a MEX function to do it, which will affect your next CUDAKernel call. MEX functions do not initia...

10 years ago | 0

Answered
Is it normal for the first run on the GPU to take longer than later runs? (The first run, and the first run after resetting the device, takes more time, but after 2 or 3 runs the time is short.)
Firstly, yes, it's perfectly normal. The first time you call a GPU function, the GPU libraries are loaded into MATLAB, which tak...

10 years ago | 2 | accepted

Answered
"switch" like functionality on GPUarray
Any |switch| statement can be reformulated as a sequence of |if|, |elseif| statements, which is supported by GPU |arrayfun|, so ...

10 years ago | 0 | accepted

Answered
How to rewrite parfor loop using arrayfun to compute on GPU?
Seems like you're just trying to add together all the elements of A in every permutation, with the proviso that k > j > i. So re...

10 years ago | 0

Answered
In-place operation at gpuarray
Do it inside a function. On the command line MATLAB cannot operate on the arguments to functions in-place, because (to cite one ...

10 years ago | 2

Answered
Why for GPU is slower than CPU for this code? Is it because of sparsity or because of "for" loop
What you have here is a highly serial algorithm, accumulating small amounts of data inside a large array. To vectorize, perhaps ...

10 years ago | 0

Answered
Having problem In MatConvNet to Compiling the cuDNN support.
This appears to be a bug in MatConvNet's |vl_compilenn| function preventing it from working when your cudnn include path contain...

10 years ago | 1 | accepted

Answered
CUDA Quadro series comparison
For MATLAB, unless you are doing strongly divisible operations such as Monte Carlo sampling, you are better off with one really ...

10 years ago | 0

Answered
Parallelize a code for Schnakenberg model PDE reaction-diffusion
Try reading up on vectorization: * <http://www.mathworks.com/help/matlab/matlab_prog/vectorization.html> * <http://devblogs....

10 years ago | 0

Answered
reading saved gpuArray data with a non-gpu computer
Unfortunately, this is not possible. To make data visible to a MATLAB that does not have Parallel Computing Toolbox, you must fi...

10 years ago | 1 | accepted

Answered
How to gpuArraying the imported data?
It looks like you are assuming that by putting your mesh data on the GPU it will be available directly to the GPU for graphics d...

10 years ago | 0

Answered
Have I reached the limitations of GPU processing?
What you've done is unlikely to work well on the GPU because you are not doing enough work inside the loop. I will assume for the pu...

10 years ago | 1 | accepted
