How does Q-learning update the qTable in the Reinforcement Learning Toolbox?
The 'MaxEpisodes' and 'MaxStepsPerEpisode' options are set to 1.
I ran the following code. After the first episode, Q(4,1) is set to -1.
However, when I ran the "train section" again, both Q(4,1) and Q(4,2) were updated, as shown in the following figure.
In the second episode, action 2 is executed in state 4. Therefore, in my opinion, only Q(4,2) should be updated, to -1.
Why is Q(4,2) set to 0.7441?
Why is Q(4,1) also updated, to -1.67?
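For reference, this is the tabular Q-learning update I am expecting (my own sketch of the textbook rule, not the Toolbox internals): Q(s,a) = Q(s,a) + lr*(r + gamma*max_a' Q(s',a') - Q(s,a)). With LearnRate = 1, DiscountFactor = 1, a per-step reward of -1, and an all-zero initial table, only the visited entry should change, to -1. The next state and indices below are illustrative:

```matlab
% Plain tabular Q-learning update (my own sketch, not the Toolbox code)
Q = zeros(16,4);          % 16 grid states, 4 actions, initialized to zero
lr = 1; gamma = 1;        % same LearnRate and DiscountFactor as in my script
s = 4; a = 1;             % state-action pair taken in the first step
sNext = 3; r = -1;        % illustrative next state; reward is -1 per step
td = r + gamma*max(Q(sNext,:)) - Q(s,a);  % temporal-difference error = -1
Q(s,a) = Q(s,a) + lr*td;  % only Q(4,1) changes: 0 + 1*(-1 + 0 - 0) = -1
```

Under this plain rule, no entry other than the visited (s,a) pair should ever move, which is why the updates above surprise me.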
clear
GW = createGridWorld(4,4);                 % 4-by-4 grid world
GW.CurrentState = '[2,1]';
GW.TerminalStates = '[4,4]';
nS = numel(GW.States);
nA = numel(GW.Actions);
GW.R = -1*ones(nS,nS,nA);                  % reward of -1 per step
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;  % +10 for reaching the terminal state
env = rlMDPEnv(GW);
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
critic = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
critic.Options.LearnRate = 1;
agentOpt = rlQAgentOptions;
agentOpt.EpsilonGreedyExploration.Epsilon = 0.05;
agentOpt.DiscountFactor = 1;
agent = rlQAgent(critic,agentOpt);
plot(env)
env.Model.Viewer.ShowTrace = true;
env.Model.Viewer.clearTrace;
%% train section
rng(0)
opt = rlTrainingOptions(...
    'MaxEpisodes',1,...
    'MaxStepsPerEpisode',1,...
    'StopTrainingCriteria',"AverageReward",...
    'Plots',"none",...
    'StopTrainingValue',480);
trainStats = train(agent,env,opt);
%% inspect the learned Q-table
aa = getLearnableParameters(getCritic(agent));

Answers (1)
Emmanouil Tzorakoleftherakis on 3 May 2021
Can you try
critic.Options.L2RegularizationFactor = 0;
This parameter is nonzero by default and is likely the reason for the discrepancy you are observing.
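For completeness, here is how that suggestion would slot into the original script (a sketch; the property name is from rlRepresentationOptions). My understanding of the mechanism, as an assumption consistent with the answer above: with a nonzero L2RegularizationFactor, the critic's gradient step also decays every table entry toward zero, so entries for state-action pairs that were never visited can still change:

```matlab
% Rebuild the critic with weight decay disabled before creating the agent
critic = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
critic.Options.LearnRate = 1;
critic.Options.L2RegularizationFactor = 0;  % default is nonzero; disable L2 weight decay
agent = rlQAgent(critic,agentOpt);
```

With the factor set to 0, only the visited Q(s,a) entry should change on each step, matching the plain tabular update.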
2 Comments
Tracy Shang on 4 May 2021 (edited 4 May 2021)
Adi Firdaus on 10 Dec 2021
need answer too