Using dlupdate to train a network with Nesterov accelerated gradient

Dear all,
I wanted to use an example from the MATLAB website to train a network with Nesterov accelerated gradient. I found functions to train a network with SGD and SGDM, but I couldn't find one for Nesterov accelerated gradient. According to the MathWorks website, to implement my own update rule I have to use the dlupdate function. I started from the example on the MathWorks website (Update parameters using custom function - MATLAB dlupdate (mathworks.com)) and it works, but I don't know how to do it with Nesterov accelerated gradient. Here is my code with SGD:
[XTrain,TTrain] = digitTrain4DArrayData;
classes = categories(TTrain);
numClasses = numel(classes);
layers = [
    imageInputLayer([28 28 1],'Mean',mean(XTrain,4))
    convolution2dLayer(5,20)
    reluLayer
    convolution2dLayer(3,20,'Padding',1)
    reluLayer
    convolution2dLayer(3,20,'Padding',1)
    reluLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer];
net = dlnetwork(layers);
miniBatchSize = 128;
numEpochs = 30;
numObservations = numel(TTrain);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
learnRate = 0.01;
numIterations = numEpochs * numIterationsPerEpoch;
monitor = trainingProgressMonitor(Metrics="Loss",Info="Epoch",XLabel="Iteration");
iteration = 0;
epoch = 0;
while epoch < numEpochs && ~monitor.Stop
    epoch = epoch + 1;

    % Shuffle data.
    idx = randperm(numel(TTrain));
    XTrain = XTrain(:,:,:,idx);
    TTrain = TTrain(idx);

    i = 0;
    while i < numIterationsPerEpoch && ~monitor.Stop
        i = i + 1;
        iteration = iteration + 1;

        % Read mini-batch of data and convert the labels to dummy
        % variables.
        idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
        X = XTrain(:,:,:,idx);
        T = zeros(numClasses, miniBatchSize,"single");
        for c = 1:numClasses
            T(c,TTrain(idx)==classes(c)) = 1;
        end

        % Convert mini-batch of data to dlarray.
        X = dlarray(single(X),"SSCB");

        % If training on a GPU, then convert data to a gpuArray.
        if canUseGPU
            X = gpuArray(X);
        end

        % Evaluate the model loss and gradients using dlfeval and the
        % modelLoss function.
        [loss,gradients] = dlfeval(@modelLoss,net,X,T);

        % Update the network parameters using the SGD algorithm defined in
        % the sgdFunction helper function.
        updateFcn = @(net,gradients) sgdFunction(net,gradients,learnRate);
        net = dlupdate(updateFcn,net,gradients);

        % Update the training progress monitor.
        recordMetrics(monitor,iteration,Loss=loss);
        updateInfo(monitor,Epoch=epoch + " of " + numEpochs);
        monitor.Progress = 100 * iteration/numIterations;
    end
end
[XTest,TTest] = digitTest4DArrayData;
XTest = dlarray(XTest,"SSCB");
if canUseGPU
    XTest = gpuArray(XTest);
end
YTest = predict(net,XTest);
[~,idx] = max(extractdata(YTest),[],1);
YTest = classes(idx);
accuracy = mean(YTest==TTest)
function [loss,gradients] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,net.Learnables);
end

function parameters = sgdFunction(parameters,gradients,learnRate)
    parameters = parameters - learnRate .* gradients;
end
This gives a nice result, with an accuracy score of 0.8192.
But when I try Nesterov accelerated gradient:
[XTrain,TTrain] = digitTrain4DArrayData;
classes = categories(TTrain);
numClasses = numel(classes);
layers = [
    imageInputLayer([28 28 1],'Mean',mean(XTrain,4))
    convolution2dLayer(5,20)
    reluLayer
    convolution2dLayer(3,20,'Padding',1)
    reluLayer
    convolution2dLayer(3,20,'Padding',1)
    reluLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer];
net = dlnetwork(layers);
miniBatchSize = 128;
numEpochs = 30;
numObservations = numel(TTrain);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
learnRate = 0.001;
momentum = 0.9; % Momentum parameter for Nesterov algorithm
numIterations = numEpochs * numIterationsPerEpoch;
monitor = trainingProgressMonitor(Metrics="Loss",Info="Epoch",XLabel="Iteration");
iteration = 0;
epoch = 0;
velocities = []; % Initialize velocities for Nesterov algorithm
while epoch < numEpochs && ~monitor.Stop
    epoch = epoch + 1;

    % Shuffle data.
    idx = randperm(numel(TTrain));
    XTrain = XTrain(:,:,:,idx);
    TTrain = TTrain(idx);

    i = 0;
    while i < numIterationsPerEpoch && ~monitor.Stop
        i = i + 1;
        iteration = iteration + 1;

        % Read mini-batch of data and convert the labels to dummy
        % variables.
        idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
        X = XTrain(:,:,:,idx);
        T = zeros(numClasses, miniBatchSize,"single");
        for c = 1:numClasses
            T(c,TTrain(idx)==classes(c)) = 1;
        end

        % Convert mini-batch of data to dlarray.
        X = dlarray(single(X),"SSCB");

        % If training on a GPU, then convert data to a gpuArray.
        if canUseGPU
            X = gpuArray(X);
        end

        % Evaluate the model loss and gradients using dlfeval and the
        % modelLoss function.
        [loss,gradients] = dlfeval(@modelLoss,net,X,T);

        % Update the network parameters using the Nesterov momentum
        % algorithm defined in the nesterovFunction helper function.
        updateFcn = @(net,gradients) nesterovFunction(net, gradients, learnRate, momentum, velocities);
        net = dlupdate(updateFcn, net, gradients);

        % Update the training progress monitor.
        recordMetrics(monitor,iteration,Loss=loss);
        updateInfo(monitor,Epoch=epoch + " of " + numEpochs);
        monitor.Progress = 100 * iteration/numIterations;
    end
end
[XTest,TTest] = digitTest4DArrayData;
XTest = dlarray(XTest,"SSCB");
if canUseGPU
    XTest = gpuArray(XTest);
end
YTest = predict(net,XTest);
[~,idx] = max(extractdata(YTest),[],1);
YTest = classes(idx);
accuracy = mean(YTest==TTest)
function [loss,gradients] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,net.Learnables);
end

function parameters = nesterovFunction(parameters, gradients, learnRate, momentum, velocities)
    % Perform Nesterov Accelerated Gradient (NAG) update.
    if isempty(velocities)
        velocities = gradients;
    else
        % Update velocity
        velocities = momentum * velocities + learnRate * gradients;
    end
    % Update parameters
    parameters = parameters - velocities;
end
I got only a 0.1 accuracy score, and the loss looks wrong.
I'm not sure whether this is really Nesterov accelerated gradient or only SGD with momentum. What is more, I don't know why the loss does not converge towards zero and instead stays roughly constant.
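For reference, my understanding of the two update rules, written as pseudocode, is that they differ only in where the gradient is evaluated:
% Classical momentum (sgdm):
%   v     = momentum*v - learnRate*grad(theta)
%   theta = theta + v
%
% Nesterov accelerated gradient:
%   v     = momentum*v - learnRate*grad(theta + momentum*v)   % gradient at the lookahead point
%   theta = theta + v
(Here grad(.) stands for the mini-batch gradient of the loss.)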
Best regards,
Daniel
  3 Comments
Daniel Krystian on 13 May 2024
Edited: Daniel Krystian on 13 May 2024
Hi,
I'm not sure where I should place this line of code: "lookaheadParams = parameters - momentum * velocities;" (the line before nesterovFunction). I tried different places, but I couldn't run the code without errors.
[XTrain,TTrain] = digitTrain4DArrayData;
classes = categories(TTrain);
numClasses = numel(classes);
layers = [
    imageInputLayer([28 28 1],'Mean',mean(XTrain,4))
    convolution2dLayer(5,20)
    reluLayer
    convolution2dLayer(3,20,'Padding',1)
    reluLayer
    convolution2dLayer(3,20,'Padding',1)
    reluLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer];
net = dlnetwork(layers);
miniBatchSize = 128;
numEpochs = 30;
numObservations = numel(TTrain);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
learnRate = 0.01;
momentum = 0.9; % Momentum parameter for Nesterov algorithm
numIterations = numEpochs * numIterationsPerEpoch;
monitor = trainingProgressMonitor(Metrics="Loss",Info="Epoch",XLabel="Iteration");
iteration = 0;
epoch = 0;
velocities = []; % Initialize velocities for Nesterov algorithm
while epoch < numEpochs && ~monitor.Stop
    epoch = epoch + 1;

    % Shuffle data.
    idx = randperm(numel(TTrain));
    XTrain = XTrain(:,:,:,idx);
    TTrain = TTrain(idx);

    i = 0;
    while i < numIterationsPerEpoch && ~monitor.Stop
        i = i + 1;
        iteration = iteration + 1;

        % Read mini-batch of data and convert the labels to dummy
        % variables.
        idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
        X = XTrain(:,:,:,idx);
        T = zeros(numClasses, miniBatchSize,"single");
        for c = 1:numClasses
            T(c,TTrain(idx)==classes(c)) = 1;
        end

        % Convert mini-batch of data to dlarray.
        X = dlarray(single(X),"SSCB");

        % If training on a GPU, then convert data to a gpuArray.
        if canUseGPU
            X = gpuArray(X);
        end

        % Evaluate the model loss and gradients using dlfeval and the
        % modelLoss function.
        [loss,gradients] = dlfeval(@modelLoss,net,X,T);

        % Update the network parameters using the Nesterov momentum
        % algorithm defined in the nesterovFunction helper function.
        updateFcn = @(net,gradients,velocities) nesterovFunction(net, gradients, learnRate, momentum, velocities, X, T);
        [net, velocities] = dlupdate(updateFcn, net, gradients, velocities);

        % Update the training progress monitor.
        recordMetrics(monitor,iteration,Loss=loss);
        updateInfo(monitor,Epoch=epoch + " of " + numEpochs);
        monitor.Progress = 100 * iteration/numIterations;
    end
end
[XTest,TTest] = digitTest4DArrayData;
XTest = dlarray(XTest,"SSCB");
if canUseGPU
    XTest = gpuArray(XTest);
end
YTest = predict(net,XTest);
[~,idx] = max(extractdata(YTest),[],1);
YTest = classes(idx);
accuracy = mean(YTest==TTest)
function [loss,gradients] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,net.Learnables);
end

lookaheadParams = parameters - momentum * velocities;

function [parameters, velocities] = nesterovFunction(parameters, gradients, learnRate, momentum, velocities, X, T)
    % Perform Nesterov Accelerated Gradient (NAG) update.
    if isempty(velocities)
        velocities = zeros(size(gradients));
    end
    % Lookahead step
    lookaheadParams = parameters - momentum * velocities;
    % Compute gradients at the lookahead point
    [~, lookaheadGradients] = dlfeval(@modelLoss, lookaheadParams, X, T);
    % Update velocity
    velocities = momentum * velocities + learnRate * lookaheadGradients;
    % Update parameters
    parameters = parameters - velocities;
end
Daniel Krystian on 13 May 2024
Error: File: Nesterov.m Line: 80 Column: 2
Function definitions in a script must appear at the end of the file.
Move all statements after the "modelLoss" function definition to before the first local function definition.
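(This error occurs because the stray line "lookaheadParams = parameters - momentum * velocities;" sits at the top level of the script, after the modelLoss function definition; in a script, all statements must come before the first local function. Moreover, parameters and velocities exist only inside nesterovFunction, where the lookahead step already appears, so the top-level copy can simply be deleted.)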


Answers (1)

Gayathri on 20 Sep 2024
I understand that you want to adapt the “dlupdate” example based on “SGD” to “Nesterov accelerated gradient”. I can also see that a test accuracy of only 0.1 is obtained with the code given in the question.
The “NAG” function is not implemented correctly. Since “velocities” is set to [] once and is never returned by “nesterovFunction”, the isempty branch is taken on every iteration, so “velocities” is always reinitialized to “gradients” and the update degenerates to “parameters - gradients”, i.e. plain SGD with a learning rate of 1. The function can be modified as shown below.
function [parameters, velocities] = nesterovFunction(parameters, gradients, velocities, learnRate, momentum)
    velocities = momentum * velocities - learnRate * gradients;
    parameters = parameters + velocities;
end
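Note that, as written, this is the classical (heavy-ball) momentum step, because the gradient is evaluated at the current parameters. A common way to obtain Nesterov-style behaviour without a second forward pass is to apply the momentum correction to the parameter step as well. The following is only a sketch of that reformulation (the name nesterovUpdate is illustrative, and this follows the variant used by several deep learning frameworks):
function [parameters, velocities] = nesterovUpdate(parameters, gradients, velocities, learnRate, momentum)
    % Update the velocity using the gradient at the current parameters ...
    velocities = momentum * velocities - learnRate * gradients;
    % ... but step along the momentum-corrected ("lookahead") direction.
    parameters = parameters + momentum * velocities - learnRate * gradients;
end
It plugs into the same “dlupdate” call shown below.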
To call this “nesterovFunction”, we first need to initialize the “velocities” variable in a format suitable for the “dlupdate” function: like the gradients, it must be a table with the values stored under the “Value” variable. Hence the following code can be used to initialize the variable.
% Run a single mini-batch through the loss once, just to obtain a
% gradients table of the required format; any mini-batch will do.
i = 2;
idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
X = XTrain(:,:,:,idx);
T = zeros(numClasses, miniBatchSize,"single");
for c = 1:numClasses
    T(c,TTrain(idx)==classes(c)) = 1;
end

% Convert mini-batch of data to dlarray.
X = dlarray(single(X),"SSCB");

% If training on a GPU, then convert data to a gpuArray.
if canUseGPU
    X = gpuArray(X);
end

[loss,gradients] = dlfeval(@modelLoss,net,X,T);
velocities = gradients;
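As an aside, if you prefer to start from zero velocities rather than from the gradients of one mini-batch, you can reuse the table structure of the gradients. This is a sketch, assuming “gradients” has already been computed once as above:
% Zero-initialize the velocities, keeping the same table structure and
% the same array sizes and types as the gradients.
velocities = dlupdate(@(g) zeros(size(g),'like',g), gradients);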
After this one-time initialization (run it once, before the epoch loop), the function can be called inside the inner training loop to update the network and the “velocities” variable, replacing the SGD update:
updateFcn = @(net,gradients,velocities) nesterovFunction(net, gradients,velocities, learnRate, momentum);
[net, velocities] = dlupdate(updateFcn, net, gradients, velocities);
With these changes, I am able to obtain a test accuracy of 0.9882, keeping the learning rate at 0.01. The training loss curve is shown below.
[Training loss plot]
For more information on “NAG” and on the “dlupdate” function, please refer to the MATLAB documentation.
Hope you find this information helpful.

Release: R2023a
