# Improve convergence of GD with momentum

6 views (last 30 days)
Askic V on 20 May 2024
Answered: recent works on 20 May 2024
Hello,
I have made a simple implementation of the GD algorithm with momentum and it seems to me that convergence is very slow, it takes about 15k iterations to reach predefined tolerance.
I feel like there should be an improvement to this code, but I don't see where.
Please, suggest me a better implementation:
clear all
clc
% Initial guess
x0 = [1.3, 0.7, 0.8, 1.9, 1.2]';
% Parameters
alpha = 0.001;
beta = 0.4;
max_iter = 10000;
tol = 1e-8;
x = x0; % find the values of x that minimize␣
i = 0;
step = zeros(size(x));
mse = rosenbrock(x);
fprintf('Initial MSE = %14.10f x = %s\n', mse, mat2str(x')); % print initial values
while mse > tol
x = x + step;
mse = rosenbrock(x); % update the mean squared error
i = i + 1;
end
fprintf('iterations = %6d\n', i);
fprintf('Final MSE = %14.10f x = %s\n', mse, mat2str(x'));
% Define the Rosenbrock function
function [mse] = rosenbrock(x)
mse = sum(100.0 * (x(2:end) - x(1:end-1).^2.0).^2.0 + (1 - x(1:end-1)).^2.0);
end
% Define the gradient of the Rosenbrock function
n = length(x);
grad(1) = -400 * x(1) * (x(2) - x(1)^2) - 2 * (1 - x(1));
grad(2:n-1) = 200 * (x(2:n-1) - x(1:n-2).^2) - 400 * x(2:n-1) .* (x(3:n) - x(2:n-1).^2) - 2 * (1 - x(2:n-1));
grad(n) = 200 * (x(n) - x(n-1)^2);
end

recent works on 20 May 2024
To improve the convergence speed of your Gradient Descent (GD) algorithm with momentum, there are several strategies you can consider. Here are some suggestions and modifications to your code:
2. Learning Rate Tuning:Sometimes, adjusting the learning rate (𝛼α) and momentum (𝛽β) parameters can lead to faster convergence. You might need to experiment with different values.
3. Early Stopping:Implementing early stopping criteria can prevent unnecessary iterations once the algorithm has converged sufficiently.
4. Function Tolerance Check:Check the function value tolerance as well as the gradient norm to ensure you have a robust stopping criterion.
clear all
clc
% Initial guess
x0 = [1.3, 0.7, 0.8, 1.9, 1.2]';
alpha = 0.001; % initial learning rate
beta1 = 0.9; % decay rate for the first moment estimate
beta2 = 0.999; % decay rate for the second moment estimate
epsilon = 1e-8; % small number to prevent division by zero
max_iter = 10000;
tol = 1e-8;
m = zeros(size(x0)); % first moment vector
v = zeros(size(x0)); % second moment vector
t = 0; % timestep
x = x0; % find the values of x that minimize
i = 0;
mse = rosenbrock(x);
fprintf('Initial MSE = %14.10f x = %s\n', mse, mat2str(x')); % print initial values
while mse > tol && i < max_iter
t = t + 1;
m = beta1 * m + (1 - beta1) * grad; % update biased first moment estimate
v = beta2 * v + (1 - beta2) * (grad .^ 2); % update biased second moment estimate
m_hat = m / (1 - beta1^t); % compute bias-corrected first moment estimate
v_hat = v / (1 - beta2^t); % compute bias-corrected second moment estimate
step = -alpha * m_hat ./ (sqrt(v_hat) + epsilon);
x = x + step;
mse = rosenbrock(x); % update the mean squared error
i = i + 1;
end
fprintf('iterations = %6d\n', i);
fprintf('Final MSE = %14.10f x = %s\n', mse, mat2str(x'));
% Define the Rosenbrock function
function [mse] = rosenbrock(x)
mse = sum(100.0 * (x(2:end) - x(1:end-1).^2.0).^2.0 + (1 - x(1:end-1)).^2.0);
end
% Define the gradient of the Rosenbrock function
n = length(x);
grad(1) = -400 * x(1) * (x(2) - x(1)^2) - 2 * (1 - x(1));
grad(2:n-1) = 200 * (x(2:n-1) - x(1:n-2).^2) - 400 * x(2:n-1) .* (x(3:n) - x(2:n-1).^2) - 2 * (1 - x(2:n-1));
grad(n) = 200 * (x(n) - x(n-1)^2);
end
Learning Rates and Moments: The parameters 𝛽1 and 𝛽2 are set to common defaults, but they can be adjusted based on your specific problem.
Stopping Criteria: The loop stops either when the tolerance level is reached or after the maximum number of iterations.By using Adam, you can achieve faster convergence due to its adaptive nature, which helps navigate the optimization landscape more effectively.