Okay, I was able to fix this problem by getting GPU Coder to produce the code that the app used to compile my function.
- Install the GPU Coder Interface for Deep Learning Libraries support package.
- Use the GPU Coder app to compile your code.
- Export the project back to a script (replace the `.m` and `.prj` names with those of your own files):

```matlab
gpucoder -script yourscript.m -tocode yourgpucoderproject.prj
```
This will output a script called `yourscript.m` containing the code the app used. You can then turn this script into a function that takes the sizes of the arrays it should expect as inputs.
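For reference, the exported script should look roughly like the sketch below; the entry-point name `myfunction`, the size `1024`, and the exact layout are assumptions and will differ for your project.

```matlab
% Hypothetical shape of the exported yourscript.m (details will vary):
cfg = coder.gpuConfig('mex');                              % CUDA MEX build target
ARGS = cell(1,1);
ARGS{1} = coder.typeof(single(0), [1024 1], 'Gpu', true); % 1024-by-1 single gpuArray
codegen -config cfg myfunction -args ARGS
```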
The critical difference between what I was doing and what the GPU Coder app does is that it uses

```matlab
coder.typeof(single(0), [N 1], 'Gpu', true)
```

to signal a gpuArray input. NOWHERE in the documentation is this syntax shown or explained.
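Put together, a parameterized rebuild might look like the following sketch (under my assumptions; `buildMyFunction` and `myfunction` are hypothetical names):

```matlab
% Sketch: regenerate the CUDA MEX for a given input length N.
function buildMyFunction(N)
    cfg = coder.gpuConfig('mex');                          % CUDA MEX build target
    inType = coder.typeof(single(0), [N 1], 'Gpu', true);  % N-by-1 single gpuArray input
    % Command-form codegen resolves cfg and inType from this workspace;
    % 'myfunction' is the entry-point function to compile.
    codegen -config cfg myfunction -args {inType}
end
```

Calling e.g. `buildMyFunction(4096)` then regenerates the MEX for 4096-element inputs.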