How do I substitute all the activation functions of a neural network?

Hi everyone!
I have an out-of-memory issue when substituting the activation functions in a neural network with other activation functions. This is the code that I use:
while index < totLayers - removedLayers
    if contains(lower(lgraph.Layers(index).Name),'relu')
        name = lgraph.Layers(index).Name;
        conn = lgraph.Connections;
        % find the layers connected immediately before and after the activation
        for i = 1:size(conn,1)
            if strcmp(conn.Source{i},name)
                out = conn.Destination{i};
            elseif strcmp(conn.Destination{i},name)
                in = conn.Source{i};
            end
        end
        channels = findChannels(lgraph,in);
        % create new activation layers
        newActivationLayers = createActivationLayers(newActivations,channels,index+removedLayers,relativeLearnRate,maxInput);
        % splice the new layers in where the old activation was
        lgraph = removeLayers(lgraph,name);
        lgraph = addLayers(lgraph,newActivationLayers);
        lgraph = connectLayers(lgraph,in,newActivationLayers(1).Name);
        lgraph = connectLayers(lgraph,newActivationLayers(end).Name,out);
        removedLayers = removedLayers + length(newActivationLayers);
    else
        index = index + 1;
    end
end
plot(lgraph)
findChannels and createActivationLayers only build the new layers to be inserted at that specific point in the network.
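For a single layer, the substitution I am doing is equivalent to something like this sketch using replaceLayer (the layer name 'relu_1' and the leaky ReLU are only placeholders, not my actual helpers):

% Minimal sketch: swap one named ReLU for a replacement layer array.
% replaceLayer connects the new layers in sequence and rewires the graph.
newActivationLayers = leakyReluLayer(0.1,'Name','leaky_1');   % example replacement
lgraph = replaceLayer(lgraph,'relu_1',newActivationLayers);
plot(lgraph)                                                  % check the connections visually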
The code seems to work because, when I plot lgraph, the output is correct. However, the GPU goes out of memory at training time. I tried to debug my code by substituting every activation in the network with itself (i.e. leaving lgraph effectively unchanged), and a network that I was able to train on my GPU now gives an out-of-memory error when I train the version returned by my code.
The only difference that I could see is that the order of the layers in lgraph.Layers is different from the original one, with all the activation layers at the end. However, the graph is correct and I would be surprised if this was the problem.
Does anyone know why I have this issue?

Answers (2)

Srivardhan Gadila on 18 Jul 2020
I would suggest you try training the network on the CPU. It may be that the GPU memory is not sufficient for training the new network.
Refer to the 'ExecutionEnvironment' name-value pair argument in the Hardware Options of trainingOptions and set it to 'cpu'.
If you are able to train the network on the CPU successfully, then try reducing the batch size while training on the GPU.
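For example, a minimal sketch of those options (the solver and the values here are only placeholders):

opts = trainingOptions('sgdm', ...
    'ExecutionEnvironment','cpu', ...   % train on the CPU to rule out GPU memory limits
    'MiniBatchSize',32);                % if CPU training succeeds, lower this for GPU runs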
  2 Comments
Gianluca Maguolo on 21 Jul 2020
Hello, thanks for the answer!
However, that does not solve the problem. How is it possible that a network with the same graph does not give me any memory issue using the same training options? Is it possible that the order of the layers in lgraph.Layers affects the memory requirements? I guess it shouldn't, but it seems to happen.



Joss Knight on 4 Aug 2020
It looks as though you've replaced every ReLU layer with multiple other layers. This will make your network deeper, and the deeper the network, the more memory you need for training: this is the way backpropagation works; it needs to hold onto the activations from every layer. In addition, we can only guess at the memory requirements of your extra layers, since you don't say what they are.
I wonder what your extra layers are and why you need more than one new layer to replace something as simple as a ReLU activation.
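As a quick check, a sketch along these lines (assuming originalLgraph and lgraph hold the graphs before and after your substitution) will show how much deeper the modified network is:

numel(originalLgraph.Layers)   % layer count before the substitution
numel(lgraph.Layers)           % layer count after: deeper means more activation memory
analyzeNetwork(lgraph)         % inspect per-layer activation sizes and learnables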
  2 Comments
Gianluca Maguolo on 6 Aug 2020
Thank you very much for your answer. Those layers could be anything in the future; at the moment I am just trying to make this work. I had already implemented a naive algorithm to substitute the activation functions in one specific network, and this code is meant to be a generalization of it. I already thought about what you said, but I tried to figure out whether there were any bugs and I made some simple tests.
I applied the algorithm above to a network that contained a custom activation that I created. That original network trained well. However, after I apply this new algorithm to substitute my custom function with itself (i.e. no changes expected), the training goes out of memory even with smaller batch sizes than before. When I plot the layerGraphs of the two networks, they are exactly the same. The only change is in the order of lgraph.Layers, but that should not affect training. I wonder if I am missing something at a lower level. Maybe trainNetwork uses memory in a way that is not clear to me and it depends on the order of the layers in lgraph.Layers.
Joss Knight on 9 Aug 2020
Did you delete the first network before training the second network? Try calling reset(gpuDevice) before training the modified network.
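Something like this sketch, where net1 stands for the previously trained network (the variable name is only a placeholder):

clear net1            % drop references to the first trained network
g = gpuDevice;        % handle to the current GPU
reset(g)              % clears all memory held on the device (invalidates any gpuArrays)
g.AvailableMemory     % confirm how much memory is free before training again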




Release

R2020a
