How can we use vectors in a Deep Learning custom training loop?

Hi everyone.
I am trying to train a CNN with my own optimizer through a custom training loop:
[loss,gradient] = dlfeval(@modelgradient,dlnet,XTrain,YTrain);
myFun = @(dlnet,gradient,loss) myOptimizer(dlnet,gradient,loss,...);
dlnet = dlupdate(myFun,dlnet,gradient,loss);
My optimizer needs w (the current parameter vector), g (its corresponding gradient vector), f (the corresponding loss value), and so on as inputs. It performs many computations with w, g, and f internally to produce w = w + p, where p is an optimal step vector that my optimizer has to compute and by which I update w.
I need a way to convert the parameters and gradients from their dl format into plain vectors for those computations inside my optimizer, and then to convert the vectors back into the dl formats that the loop syntax above and my optimizer require. This back-and-forth conversion is necessary for my training loop. Can you help me find functions in the toolbox for these jobs (vector to table, since the gradients and dlnet's parameters are tables with dlarray cells, and vice versa), or suggest another solution?
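Schematically, what I mean by "back and forth" is the following (the helper names my_table2vec and my_vec2net, and the vector signature of myOptimizer, are hypothetical placeholders; they are exactly the pieces I am asking about):

```matlab
% Hypothetical round trip (helper names are placeholders, not toolbox functions)
w = my_table2vec(dlnet.Learnables);   % dl format -> plain vector
g = my_table2vec(gradient);           % dl format -> plain vector
w_new = myOptimizer(w, g, loss);      % pure vector computations: w_new = w + p
dlnet = my_vec2net(w_new, dlnet);     % plain vector -> dl format (missing piece)
```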

Answers (1)

The weight parameters of a dlnetwork object can be accessed through the "Learnables" property of the object. Both the "gradient" output and this "Learnables" property are tables with the variables "Layer", "Parameter", and "Value". You can access each weight parameter and its corresponding gradient by indexing into the tables in a loop as follows.
for i = 1:size(dlnet.Learnables,1)
    w = dlnet.Learnables{i,"Value"}{1,1};        % parameter (dlarray)
    layerName = dlnet.Learnables{i,"Layer"};
    paramName = dlnet.Learnables{i,"Parameter"};
    % matching gradient entry from the gradient table
    g = gradient{gradient.Layer==layerName & gradient.Parameter==paramName,"Value"}{1,1};
end
For more information on indexing into tables, refer to the documentation.
"w" and "g" will be dlarray objects in the above code snippet. Converting them to double arrays might be unnecessary, as many operations and functions that support double arrays also support dlarray objects. Refer to the documentation for the list of functions that support dlarray.

3 Comments

Thanks for your response. Your code shows the recorded values (of course, only the last layer's, since w and g are overwritten on each iteration). I also did not get why the layer names are needed here. Anyway, let me explain my meaning better...
My optimization algorithm accepts a VECTOR of parameters (w) and a VECTOR of gradients (g). It takes w and g to compute a vector p and then updates the parameters as w = w + p. Now, to code this algorithm in a custom training loop: I know the values of the vectors w and g are recorded in dlnet.Learnables.Value and gradients.Value, but in tables. I wrote the following function to extract the values from the tables:
Val = dlnet.Learnables.Value;
% or
Val = gradients.Value;

function vec = set2vec(Val)
% Unroll the cell array of dlarray parameters into one column vector.
% Assumes the parameters come in (weights, bias) pairs per layer.
numWb = length(Val);
vec = [];
for i = 1:2:numWb
    weights = double(gather(extractdata(Val{i,1})));
    bias    = double(gather(extractdata(Val{i+1,1})));
    vec = [vec; weights(:); bias(:)];
end
end
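For example, I call it like this to get the two vectors my optimizer needs (a sketch):

```matlab
w = set2vec(dlnet.Learnables.Value);   % all parameters as one vector
g = set2vec(gradients.Value);          % all gradients as one vector
```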
To learn how to work with a "training loop" and a "custom optimizer", I followed this example:
https://nl.mathworks.com/help/deeplearning/ref/dlupdate.htm
As the example shows, sgdFunction computes
param = param - lr.*gradient
and is called through dlupdate. Here, gradient is a table. My optimizer algorithm, as I mentioned, computes w = w + p. So, to code it, I would have to write
dlnet = dlnet + table_p
as the last line of my own custom function, say MyFunction, and then call dlupdate. My question is: how can I convert the vector p to a table format? Or, as a more general question:
How can I convert the (unrolled) vector g back to the gradients table?
How can I convert the (unrolled) vector w back to a dlnet object? (which is the more general case)
The nature of my optimizer is such that I need these conversions from vectors to tables or dl objects in order to compute the new loss, new gradients, and new dlnet. I would appreciate your kind help.
"dlupdate" will call "myOptimizer" with weights and gradient of each layer individually. So, the inputs to "myOptimizer" (and even "sgdFunction" in the example) are of type dlarray and "myOptimizer" will be called several times in one iteration with each layer's parameters. Inside "myOptimizer", you won't need table indexing if it is called using "dlupdate". Refer to the following documentation for more information on "dlupdate".
The recommended approach for working with dlarray, as mentioned in the answer, is to operate on it directly rather than converting it to a double array. However, if you do convert a dlarray to a double array using "extractdata", you can convert the results of the computation back to dlarray by passing them to the "dlarray" function.
For example, to convert a double array X back to a dlarray:
dlX = dlarray(X);
Refer to the following documentation to know more about "dlarray".
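If you do need the reverse direction, one possible approach (a sketch, not a dedicated toolbox function; "gvec" stands for your unrolled gradient vector) is to reshape consecutive slices of the vector back to each parameter's size and write them into the table's "Value" cells:

```matlab
% Sketch: rebuild the gradients table from an unrolled vector gvec.
% The table layout (Layer, Parameter, Value) is unchanged; only the Value
% cells are overwritten, each reshaped to its original parameter size.
newGradients = gradients;
offset = 0;
for k = 1:height(newGradients)
    sz = size(newGradients.Value{k});
    n  = prod(sz);
    newGradients.Value{k} = dlarray(reshape(single(gvec(offset+1:offset+n)), sz));
    offset = offset + n;
end
```

The same loop applied to dlnet.Learnables covers the vector-to-dlnet direction, since the modified "Learnables" table can be assigned back to the network.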
Having only dlarray is not all I require. I need the (gradient and parameter) matrices for each layer: I have a vector such as g, w, or p that I want to convert back into per-layer matrices.
For instance, in conv1 I have 20 filters of size 5*5, and so on up to the FC layers. I have an unrolled vector of all these matrices across the layers, and now I want to go back from this vector to the per-layer matrices. To use dlnet, I certainly have to have these matrices for each layer. Why I convert the gradient (table) to a vector and then want to come back again is explained below.
This code is similar to what appears on the "dlupdate" page.
for epoch = 1:numEpochs
    % Shuffle data...
    for i = 1:maxit
        iteration = iteration + 1;
        % Read mini-batch
        idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
        X = XTrain(:,:,:,idx);
        Y = zeros(numClasses, miniBatchSize, 'single');
        for c = 1:numClasses
            Y(c,YTrain(idx)==classes(c)) = 1;
        end
        dlX = dlarray(single(X),'SSCB');
        if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"
            dlX = gpuArray(dlX);
        end
        [gradients,loss] = dlfeval(@modelGradients,dlnet,dlX,Y);
        updateFcn = @(dlnet,gradients) MyFunction(dlnet,gradients,learnRate);
        dlnet = dlupdate(updateFcn,dlnet,gradients);
        ...
    end
end
%***********************************************************
function [gradients,loss] = modelGradients(dlnet,dlX,Y)
    dlYPred = forward(dlnet,dlX);
    loss = crossentropy(dlYPred,Y);
    gradients = dlgradient(loss,dlnet.Learnables);
end
function dlnet = MyFunction(???????)
    ?????????
end
I do not have any idea how to write MyFunction because, as you can see in my optimizer, in step (1) I need the gradient and its norm. To verify the condition in step (3), I need the loss and the gradient vector at the "candidate" parameters w_cand from step (2) (it is NOT the "updated" parameter). This is the framework of my optimizer in the vector setting.
function w_update = Optimizer(w, f, g, alpha, c)
% w: vector of parameters
% g: vector of gradients at w
% f: loss at w
% w_cand: vector of candidate parameters
% g_cand: vector of gradients at w_cand
% f_cand: loss at w_cand
% tol and Func_to_compute_p are assumed defined elsewhere
%(1)
norm_g = norm(g);
p = Func_to_compute_p(g,norm_g);
w_cand = w + alpha*p;
%(2)
%(((((((((((((((((((((((((((((((((((((((((((((((((((
% compute loss f_cand and gradient g_cand evaluated at w_cand
%)))))))))))))))))))))))))))))))))))))))))))))))))))
%(3)
while alpha > tol
    if f_cand > f + c*alpha*(g_cand'*p)
        p = alpha*p;
        w_update = w + p;
        break
    else
        alpha = alpha/2;
        % w_cand = w + alpha*p; then repeat step (2) to refresh f_cand, g_cand
    end
end
w_update = w + p;
end
I know "dlnet" initialized with Xavier method. Lets say "w" is an unrolled vector of parameter matrices at each layer. loss and gradients in
[gradients,loss] = dlfeval(@modelGradients,dlnet,dlX,Y);
are corresponding with "dlnet". So, to work with my optimizer I can convert loss and gradients to have f and g corresponding with w through function "set2vector". In this way I cannot take warning about operation support. But for step(2), I need "dlnet_cand" and thus "gradients_cand" and "loss_cand". I think I have to write this code at step(2):
[gradients_cand,loss_cand] = dlfeval(@modelGradients,dlnet_cand,dlX,Y);
By this thought, now, I have a vector p by which I can not update dlnet_cand = dlnet + p because p is a vector. To follow as sgdFunction(( where dlnet = dlnet - gradients.*lr)) I need to convert vector p to matrices per each layer (having a table).
For step 3 one more time I have to convert loss_cand and gradients_cand to f_cand and vector g_cand.
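Putting the pieces together, the shape I have in mind for MyFunction is something like the following (only a sketch: the write-back loop is my guess at the inverse of set2vec, and inside Optimizer the candidate evaluation of step (2) would repeat the same dlfeval call on a dlnet_cand rebuilt the same way each time alpha changes):

```matlab
function dlnet = MyFunction(dlnet, dlX, Y, alpha, c)
    % One whole-network optimizer step (sketch, not toolbox API).
    % (1) loss and gradients at the current parameters, in dl format
    [gradients, loss] = dlfeval(@modelGradients, dlnet, dlX, Y);
    % (2) dl format -> plain vectors for the vector optimizer
    w = set2vec(dlnet.Learnables.Value);
    g = set2vec(gradients.Value);
    f = double(gather(extractdata(loss)));
    % (3) pure vector computation: w_update = w + p
    w_update = Optimizer(w, f, g, alpha, c);
    % (4) plain vector -> dl format: reshape each slice back to its size
    Val = dlnet.Learnables.Value;
    offset = 0;
    for k = 1:numel(Val)
        sz = size(Val{k,1});
        n  = prod(sz);
        Val{k,1} = dlarray(reshape(single(w_update(offset+1:offset+n)), sz));
        offset = offset + n;
    end
    dlnet.Learnables.Value = Val;
end
```

For step (3), f_cand = double(gather(extractdata(loss_cand))) and g_cand = set2vec(gradients_cand.Value) would then give the scalar and vector the line-search condition compares.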


Asked: on 7 Nov 2020
Commented: on 12 Nov 2020
