I'm trying to train a reinforcement learning agent in a grid world where, at every step, the robot checks its surroundings to detect obstacles. In the first episode the grid world contains only the start state and the goal state with no obstacles, and the state transition matrix should then be updated in each episode as obstacles are discovered.

To do this I created a class that constructs a GridWorld environment; in its step function I set the ObstacleStates property and call updateStateTranstionForObstacles, both of which belong to the GridWorld class. The state transition matrix does seem to get updated, but the agent does not appear to take it into account when selecting actions, so I don't know whether I am only updating a "copy" of the GridWorld and my idea is not possible.
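For reference, the per-episode update I have in mind follows the documented GridWorld obstacle workflow; this is only a minimal sketch, and the 5x5 size, start/goal states, and obstacle coordinates are placeholders:

GW = createGridWorld(5,5);               % empty grid world, no obstacles yet
GW.CurrentState = "[1,1]";               % start state (placeholder)
GW.TerminalStates = "[5,5]";             % goal state (placeholder)
GW.ObstacleStates = ["[3,3]";"[3,4]"];   % obstacles the robot has sensed so far (placeholders)
updateStateTranstionForObstacles(GW);    % rebuild GW.T so these states block movement

The class I am using is below.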
classdef Copy_of_rlMDPEnv_CoppeliaSim < rl.env.MATLABEnvironment
    % rlMDPEnv: Create a MATLAB based reinforcement learning environment for
    % an MDP (Markov Decision Process) by supplying the MDP model.
    %
    % ENV = rlMDPEnv(MDP) creates a reinforcement learning environment with
    % the specified MDP model. See createGridWorld and createMDP on how to
    % create MDP models.

    properties
        % inherited from rlMDPEnv
        Model rl.env.GridWorld
        ResetFcn
        % needed for CoppeliaSim
        ...
    end

    %% Public Methods
    methods
        function obj = Copy_of_rlMDPEnv_CoppeliaSim(MDP)
            % Copy_of_rlMDPEnv_CoppeliaSim(MDP) Construct a GridWorld environment for reinforcement learning
            % MDP should be a rl.env.GridWorld
            narginchk(1,1)
            if ~(isa(MDP, 'rl.env.GridWorld') && isscalar(MDP))
                error('MDP must be a scalar rl.env.GridWorld object.')
            end
            % get Observation and Action information from MDP
            ActionInfo = rlFiniteSetSpec(1:numel(MDP.Actions));
            ActionInfo.Name = 'MDP Actions';
            ObservationInfo = rlFiniteSetSpec(1:numel(MDP.States));
            ObservationInfo.Name = 'MDP Observations';
            obj = obj@rl.env.MATLABEnvironment(ObservationInfo,ActionInfo);
            obj.Model = MDP;
        end
    end

    %% Implement Abstract Methods
    methods
        % define step function on gridworld
        function [Observation,Reward,isTerminal,Info] = step(this,Action)
            % inherited from rlMDPEnv
            Info = [];
            Action = idx2action(this.Model,Action);
            [Observation,Reward,isTerminal] = move(this.Model,Action);
            Observation = state2idx(this.Model,Observation);
            % new: sense obstacles around the robot and rebuild the transition matrix
            obstacles = updateObstacles(this);             % obstacle states sensed in this step
            this.Model.ObstacleStates = obstacles;         % store them on the GridWorld model
            this.Model.updateStateTranstionForObstacles(); % rebuild the state transition matrix
        end
        ...
    end
end
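To check whether the agent's model really sees the change (or whether I am only updating a copy), I was planning a quick test like the one below, run outside training. It assumes the parts of the class I left out above (updateObstacles, reset, and so on) are in place; the grid size and action index are placeholders, and T is the state transition matrix stored in the GridWorld object.

GW = createGridWorld(5,5);
GW.TerminalStates = "[5,5]";
env = Copy_of_rlMDPEnv_CoppeliaSim(GW);
Tbefore = env.Model.T;          % snapshot of the transition matrix before stepping
[obs,rwd,done] = step(env,1);   % one step; action 1 is just the first action index
Tafter = env.Model.T;           % transition matrix after the obstacle update inside step
isequal(Tbefore,Tafter)         % 0 (false) here means env.Model was updated in place

If this prints 0, the transition matrix held by env.Model really did change after the step.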