Efficient training of LSTM network with GPU
Hi all,
I recently got a computer with a GPU and am trying to refactor my LSTM code to take advantage of it. However, my implementation shows no speedup; in fact, the CPU version runs faster than the GPU version. The test code below contains only the basic LSTM forward pass, for comparison. Could anyone give some advice on how to exploit the potential of the GPU for LSTM? I tried pagefun, arrayfun and bsxfun, but none of them improved the speed.
This is the GPU version.
function LSTM_gpu2()
    % Sizes: input (visible) units, hidden units, sequence length, epochs
    vis = 700; hid = 500;
    T = 80; epochs = 10;
    sigmoid = @(x) 1./(1+exp(-x));
    x = rand(vis,1,T); h = zeros(hid,1,T+1); c = h;
    % Input weights (W), recurrent weights (R), peephole weights (P), biases (b)
    W_z = rand(hid,vis,'gpuArray'); W_i = rand(hid,vis,'gpuArray');
    W_f = rand(hid,vis,'gpuArray'); W_o = rand(hid,vis,'gpuArray');
    R_z = rand(hid,hid,'gpuArray'); R_i = rand(hid,hid,'gpuArray');
    R_f = rand(hid,hid,'gpuArray'); R_o = rand(hid,hid,'gpuArray');
    P_i = diag(rand(hid,1,'gpuArray')); P_f = diag(rand(hid,1,'gpuArray'));
    P_o = diag(rand(hid,1,'gpuArray'));
    b_z = rand(hid,1,'gpuArray'); b_i = rand(hid,1,'gpuArray');
    b_f = rand(hid,1,'gpuArray'); b_o = rand(hid,1,'gpuArray');
    % Per-step gate activations
    I = zeros(hid,T,'gpuArray'); F = zeros(hid,T,'gpuArray');
    O = zeros(hid,T,'gpuArray'); G = zeros(hid,T,'gpuArray');
    % Move inputs and states to the GPU
    x = gpuArray(x); h = gpuArray(h); c = gpuArray(c);
    tic;
    for i=1:epochs
        for t=1:T
            G(:,t) = tanh(W_z*x(:,:,t) + R_z*h(:,:,t) + b_z);
            I(:,t) = sigmoid(W_i*x(:,:,t) + R_i*h(:,:,t) + P_i*c(:,:,t) + b_i);
            F(:,t) = sigmoid(W_f*x(:,:,t) + R_f*h(:,:,t) + P_f*c(:,:,t) + b_f);
            c(:,:,t+1) = G(:,t).*I(:,t) + c(:,:,t).*F(:,t);
            O(:,t) = sigmoid(W_o*x(:,:,t) + R_o*h(:,:,t) + P_o*c(:,:,t+1) + b_o);
            h(:,:,t+1) = tanh(c(:,:,t+1)).*O(:,t);
        end
        %%backprop
        %%update
    end
    wait(gpuDevice); % GPU calls can return before the work finishes; wait so toc covers it
    toc;
    return;
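As a point of comparison, here is a minimal sketch of one way the same forward pass could be restructured so that each time step launches fewer, larger GPU operations. It is a sketch under stated assumptions, not a verified fix: the function name and the stacked variables W, R, b, p_i, p_f, p_o and WX are hypothetical and do not appear in the code above. The four gates are stacked row-wise, the input projection (which does not depend on h) is computed for all T steps in a single matrix product before the loop, and the diagonal peephole matrices are replaced by elementwise products with their diagonals. Whether this beats the CPU at these sizes would still need to be measured.

```matlab
function h = LSTM_gpu_fused_sketch()
% Illustrative sketch only; names below are assumptions, not the original code.
vis = 700; hid = 500; T = 80;
sigmoid = @(x) 1./(1+exp(-x));

x = rand(vis,T,'gpuArray');              % inputs stored as vis-by-T, one column per step
h = zeros(hid,T+1,'gpuArray'); c = h;    % states stored as hid-by-(T+1)

% Stack the four gates row-wise in the order z, i, f, o.
W = rand(4*hid,vis,'gpuArray');          % plays the role of [W_z; W_i; W_f; W_o]
R = rand(4*hid,hid,'gpuArray');          % plays the role of [R_z; R_i; R_f; R_o]
b = rand(4*hid,1,'gpuArray');            % plays the role of [b_z; b_i; b_f; b_o]
p_i = rand(hid,1,'gpuArray');            % peephole weights kept as vectors,
p_f = rand(hid,1,'gpuArray');            % i.e. the diagonals of P_i, P_f, P_o
p_o = rand(hid,1,'gpuArray');

% Row ranges of each gate inside the stacked result.
idxZ = 1:hid;           idxI = hid+1:2*hid;
idxF = 2*hid+1:3*hid;   idxO = 3*hid+1:4*hid;

% Input projection plus bias for all T steps in one matrix product.
WX = bsxfun(@plus, W*x, b);              % 4*hid-by-T

for t = 1:T
    a  = WX(:,t) + R*h(:,t);             % one recurrent matrix-vector product per step
    g  = tanh(a(idxZ));
    ig = sigmoid(a(idxI) + p_i.*c(:,t)); % elementwise peephole instead of diag matrix
    fg = sigmoid(a(idxF) + p_f.*c(:,t));
    c(:,t+1) = g.*ig + c(:,t).*fg;
    og = sigmoid(a(idxO) + p_o.*c(:,t+1));
    h(:,t+1) = tanh(c(:,t+1)).*og;
end
wait(gpuDevice);                          % let queued GPU work finish before returning
end
```

The recurrence over t stays sequential because h(:,t+1) depends on h(:,t); only the work that is independent of the hidden state is moved out of the loop.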
And this is the CPU version.
function LSTM_cpu()
    vis = 700; hid = 500;
    T = 80; epochs = 10;
    sigmoid = @(x) 1./(1+exp(-x));
    x = rand(vis,1,T); h = zeros(hid,1,T+1); c = h;
    W_z = rand(hid,vis); W_i = rand(hid,vis);
    W_f = rand(hid,vis); W_o = rand(hid,vis);
    R_z = rand(hid,hid); R_i = rand(hid,hid);
    R_f = rand(hid,hid); R_o = rand(hid,hid);
    P_i = diag(rand(hid,1)); P_f = diag(rand(hid,1));
    P_o = diag(rand(hid,1));
    b_z = rand(hid,1); b_i = rand(hid,1);
    b_f = rand(hid,1); b_o = rand(hid,1);
    I = zeros(hid,T); F = zeros(hid,T);
    O = zeros(hid,T); G = zeros(hid,T);
    tic;
    for i=1:epochs
        for t=1:T
            G(:,t) = tanh(W_z*x(:,:,t) + R_z*h(:,:,t) + b_z);
            I(:,t) = sigmoid(W_i*x(:,:,t) + R_i*h(:,:,t) + P_i*c(:,:,t) + b_i);
            F(:,t) = sigmoid(W_f*x(:,:,t) + R_f*h(:,:,t) + P_f*c(:,:,t) + b_f);
            c(:,:,t+1) = G(:,t).*I(:,t) + c(:,:,t).*F(:,t);
            O(:,t) = sigmoid(W_o*x(:,:,t) + R_o*h(:,:,t) + P_o*c(:,:,t+1) + b_o);
            h(:,:,t+1) = tanh(c(:,:,t+1)).*O(:,t);
        end
        %%backprop
        %%update
    end
    toc;
    return;
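Since the two functions are compared by wall-clock time, it may help to time a single step in isolation. The following is a minimal sketch that times one recurrent matrix-vector product of the size used above with timeit on the CPU and gputimeit on the GPU. The variable names R_cpu, h_cpu, R_gpu, h_gpu, t_cpu and t_gpu are made up for this example, and the result depends entirely on the hardware, so no expected numbers are given.

```matlab
% Sketch: time one hid-by-hid matrix-vector product on the CPU and on the GPU.
hid = 500;
R_cpu = rand(hid,hid);    h_cpu = rand(hid,1);
R_gpu = gpuArray(R_cpu);  h_gpu = gpuArray(h_cpu);   % same data, copied to the GPU

t_cpu = timeit(@() R_cpu*h_cpu);       % runs the call several times, returns a typical time
t_gpu = gputimeit(@() R_gpu*h_gpu);    % same idea, but waits for the GPU to finish each run

fprintf('CPU: %.3g s, GPU: %.3g s per product\n', t_cpu, t_gpu);
```

If the per-step GPU time is not clearly smaller than the CPU time for operands this small, a loop made of many such steps is unlikely to be faster on the GPU either.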
OS: Windows 10,
GPU: NVIDIA Quadro M5000,
CPU: Intel i7-5820K,
MATLAB: R2016a
Thank you,
Yuto Ozaki
1 Comment: Yuto Ozaki on 10 Apr 2016 (edited 10 Apr 2016)