Steps for one Nesterov accelerated gradient (NAG) update (a code sketch follows this list):
- Initialize velocities with zeros if it is empty.
- Perform the lookahead step by computing lookaheadParams using the previous velocity.
- Evaluate the gradients at the lookahead point using dlfeval and your modelLoss function.
- Update the velocity using the lookahead gradients.
- Update the parameters using the updated velocity.
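A minimal sketch of one such iteration, assuming a dlnetwork net, a mini-batch X with targets T, hyperparameters learnRate and momentum, and the modelLoss function defined later in this thread; velocities starts out as []:

% Sketch of one NAG iteration (illustration only; assumes net, X, T,
% learnRate, momentum, and modelLoss as in the code below).
if isempty(velocities)
    % Initialize velocities with zeros matching the learnables table.
    velocities = dlupdate(@(p) zeros(size(p),"like",p),net.Learnables);
end
% Lookahead step: shift the parameters by the previous velocity.
lookaheadParams = dlupdate(@(p,v) p + momentum.*v,net.Learnables,velocities);
lookaheadNet = net;
lookaheadNet.Learnables = lookaheadParams;
% Evaluate the gradients at the lookahead point.
[loss,gradients] = dlfeval(@modelLoss,lookaheadNet,X,T);
% Update the velocity using the lookahead gradients.
velocities = dlupdate(@(v,g) momentum.*v - learnRate.*g,velocities,gradients);
% Update the parameters using the updated velocity.
net = dlupdate(@(p,v) p + v,net,velocities);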
Function dlupdate to train network with Nesterov accelerated gradient
Dear all,
I wanted to use an example from the MATLAB website to train a network with Nesterov accelerated gradient. I found functions to train a network with sgd and sgdm, but I couldn't find one for Nesterov accelerated gradient. According to the MathWorks website, to create my own training function I have to use the dlupdate function. I started with the example from the MathWorks website (Update parameters using custom function - MATLAB dlupdate (mathworks.com)) and it works, but I don't know how to do it with Nesterov accelerated gradient. Here is my code with SGD:
[XTrain,TTrain] = digitTrain4DArrayData;
classes = categories(TTrain);
numClasses = numel(classes);
layers = [
imageInputLayer([28 28 1],'Mean',mean(XTrain,4))
convolution2dLayer(5,20)
reluLayer
convolution2dLayer(3,20,'Padding',1)
reluLayer
convolution2dLayer(3,20,'Padding',1)
reluLayer
fullyConnectedLayer(numClasses)
softmaxLayer];
net = dlnetwork(layers);
miniBatchSize = 128;
numEpochs = 30;
numObservations = numel(TTrain);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
learnRate = 0.01;
numIterations = numEpochs * numIterationsPerEpoch;
monitor = trainingProgressMonitor(Metrics="Loss",Info="Epoch",XLabel="Iteration");
iteration = 0;
epoch = 0;
while epoch < numEpochs && ~monitor.Stop
    epoch = epoch + 1;
    % Shuffle data.
    idx = randperm(numel(TTrain));
    XTrain = XTrain(:,:,:,idx);
    TTrain = TTrain(idx);
    i = 0;
    while i < numIterationsPerEpoch && ~monitor.Stop
        i = i + 1;
        iteration = iteration + 1;
        % Read mini-batch of data and convert the labels to dummy
        % variables.
        idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
        X = XTrain(:,:,:,idx);
        T = zeros(numClasses, miniBatchSize,"single");
        for c = 1:numClasses
            T(c,TTrain(idx)==classes(c)) = 1;
        end
        % Convert mini-batch of data to dlarray.
        X = dlarray(single(X),"SSCB");
        % If training on a GPU, then convert data to a gpuArray.
        if canUseGPU
            X = gpuArray(X);
        end
        % Evaluate the model loss and gradients using dlfeval and the
        % modelLoss function.
        [loss,gradients] = dlfeval(@modelLoss,net,X,T);
        % Update the network parameters using the SGD algorithm defined in
        % the sgdFunction helper function.
        updateFcn = @(net,gradients) sgdFunction(net,gradients,learnRate);
        net = dlupdate(updateFcn,net,gradients);
        % Update the training progress monitor.
        recordMetrics(monitor,iteration,Loss=loss);
        updateInfo(monitor,Epoch=epoch + " of " + numEpochs);
        monitor.Progress = 100 * iteration/numIterations;
    end
end
[XTest,TTest] = digitTest4DArrayData;
XTest = dlarray(XTest,"SSCB");
if canUseGPU
    XTest = gpuArray(XTest);
end
YTest = predict(net,XTest);
[~,idx] = max(extractdata(YTest),[],1);
YTest = classes(idx);
accuracy = mean(YTest==TTest)
function [loss,gradients] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,net.Learnables);
end

function parameters = sgdFunction(parameters,gradients,learnRate)
    parameters = parameters - learnRate .* gradients;
end
And it gives a nice result, with an accuracy of 0.8192.
But when I try Nesterov accelerated gradient:
[XTrain,TTrain] = digitTrain4DArrayData;
classes = categories(TTrain);
numClasses = numel(classes);
layers = [
imageInputLayer([28 28 1],'Mean',mean(XTrain,4))
convolution2dLayer(5,20)
reluLayer
convolution2dLayer(3,20,'Padding',1)
reluLayer
convolution2dLayer(3,20,'Padding',1)
reluLayer
fullyConnectedLayer(numClasses)
softmaxLayer];
net = dlnetwork(layers);
miniBatchSize = 128;
numEpochs = 30;
numObservations = numel(TTrain);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
learnRate = 0.001;
momentum = 0.9; % Momentum parameter for Nesterov algorithm
numIterations = numEpochs * numIterationsPerEpoch;
monitor = trainingProgressMonitor(Metrics="Loss",Info="Epoch",XLabel="Iteration");
iteration = 0;
epoch = 0;
velocities = []; % Initialize velocities for Nesterov algorithm
while epoch < numEpochs && ~monitor.Stop
    epoch = epoch + 1;
    % Shuffle data.
    idx = randperm(numel(TTrain));
    XTrain = XTrain(:,:,:,idx);
    TTrain = TTrain(idx);
    i = 0;
    while i < numIterationsPerEpoch && ~monitor.Stop
        i = i + 1;
        iteration = iteration + 1;
        % Read mini-batch of data and convert the labels to dummy
        % variables.
        idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
        X = XTrain(:,:,:,idx);
        T = zeros(numClasses, miniBatchSize,"single");
        for c = 1:numClasses
            T(c,TTrain(idx)==classes(c)) = 1;
        end
        % Convert mini-batch of data to dlarray.
        X = dlarray(single(X),"SSCB");
        % If training on a GPU, then convert data to a gpuArray.
        if canUseGPU
            X = gpuArray(X);
        end
        % Evaluate the model loss and gradients using dlfeval and the
        % modelLoss function.
        [loss,gradients] = dlfeval(@modelLoss,net,X,T);
        % Update the network parameters using the Nesterov momentum
        % algorithm defined in the nesterovFunction helper function.
        updateFcn = @(net,gradients) nesterovFunction(net,gradients,learnRate,momentum,velocities);
        net = dlupdate(updateFcn,net,gradients);
        % Update the training progress monitor.
        recordMetrics(monitor,iteration,Loss=loss);
        updateInfo(monitor,Epoch=epoch + " of " + numEpochs);
        monitor.Progress = 100 * iteration/numIterations;
    end
end
[XTest,TTest] = digitTest4DArrayData;
XTest = dlarray(XTest,"SSCB");
if canUseGPU
    XTest = gpuArray(XTest);
end
YTest = predict(net,XTest);
[~,idx] = max(extractdata(YTest),[],1);
YTest = classes(idx);
accuracy = mean(YTest==TTest)
function [loss,gradients] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,net.Learnables);
end

function parameters = nesterovFunction(parameters,gradients,learnRate,momentum,velocities)
    % Perform Nesterov Accelerated Gradient (NAG) update.
    if isempty(velocities)
        velocities = gradients;
    else
        % Update velocity
        velocities = momentum * velocities + learnRate * gradients;
    end
    % Update parameters
    parameters = parameters - velocities;
end
I got an accuracy of only 0.1, and the loss curve looks wrong.
I'm not sure whether this is Nesterov accelerated gradient or just SGD with momentum (sgdm). What's more, I don't know why the loss does not converge towards zero and why it stays constant.
Best regards,
Daniel
Answers (1)
Gayathri
on 20 Sep 2024
I understand that you want to adapt the “dlupdate” example based on “SGD” to “Nesterov accelerated gradient”. I can also see that a test accuracy of only 0.1 is obtained with the code given in the question.
The “NAG” function is not implemented correctly: because “velocities” is never returned from “dlupdate”, it stays empty, so the “isempty” branch runs on every iteration and “velocities” is always re-initialized to “gradients”. Each update is then a full gradient step with an effective learning rate of 1. The function can be modified as shown below.
function [parameters,velocities] = nesterovFunction(parameters,gradients,velocities,learnRate,momentum)
    % Accumulate the velocity, then step the parameters along it.
    velocities = momentum * velocities - learnRate * gradients;
    parameters = parameters + velocities;
end
To call this “nesterovFunction”, we first need to initialize the “velocities” variable in a format suitable for the “dlupdate” function: a table with the values given under the “Value” header, like “net.Learnables”. Hence the following code can be used to initialize the variable; it runs one extra forward and backward pass on a mini-batch and uses the resulting gradients as the initial velocity.
i = 2;
idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
X = XTrain(:,:,:,idx);
T = zeros(numClasses, miniBatchSize,"single");
for c = 1:numClasses
    T(c,TTrain(idx)==classes(c)) = 1;
end
% Convert mini-batch of data to dlarray.
X = dlarray(single(X),"SSCB");
% If training on a GPU, then convert data to a gpuArray.
if canUseGPU
    X = gpuArray(X);
end
[loss,gradients] = dlfeval(@modelLoss,net,X,T);
velocities = gradients;
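As a side note (my assumption, not part of the original answer), the “velocities” table could instead be zero-initialized directly from “net.Learnables”, which avoids the extra forward and backward pass and makes the first update a plain gradient step:

% Hypothetical alternative: zero-initialize the velocities table,
% reusing the Value-column format of net.Learnables.
velocities = dlupdate(@(p) zeros(size(p),"like",p),net.Learnables);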
After this initialization, the function can be called inside the training loop to update the network parameters and the “velocities” variable.
updateFcn = @(net,gradients,velocities) nesterovFunction(net,gradients,velocities,learnRate,momentum);
[net,velocities] = dlupdate(updateFcn,net,gradients,velocities);
With these changes, I am able to obtain a test accuracy of 0.9882. I have kept the learning rate at 0.01. [Figure: training loss curve]
For more information on “NAG” and “dlupdate”, please refer to the MathWorks documentation.
Hope you find this information helpful.