How to update the episode number?

ryunosuke tazawa on 10 Aug 2021
I am writing code for a reinforcement learning task. The goal is for a simple pendulum to throw a ball at a target point.
However, the figure below shows the learning progress, and I feel there is a problem with the episode reward.
Is this because the episodes are not being updated, that is, the observations are not being updated?
Or is there some other cause?
Below is the code that updates the observations.
function [Observation,Reward,IsDone,LoggedSignals] = step(this,Action)
    LoggedSignals = [];
    Force = getForce(this,Action);   % torque applied by the agent
    theta = this.State(1);           % pendulum angle
    w     = this.State(2);           % pendulum angular velocity
    IsDone = false;
    R = 0;

    % Pendulum dynamics, forward Euler step
    q2 = w - (this.g/this.L)*theta*this.Ts - this.b*this.Ts - Force*this.Ts;  % next angular velocity
    q1 = theta + w*this.Ts;                                                   % next angle

    % Ball dynamics (ball released from the pendulum tip)
    ball_x     = this.L*sin(q1);              % x position of the ball at release
    ball_y     = -this.L*cos(q1);             % y position of the ball at release
    ball_time  = sqrt(2*abs(ball_y)/9.8);     % fall time of the ball
    ball_reach = ball_x + abs(q2)*ball_time;  % horizontal flight distance
    ball_gosa  = ball_reach - this.Target;    % error between flight distance and target point
    q3 = ball_gosa;

    % Reward condition: a reward is given when the error is between 0 and 1,
    % i.e. the ball overshoots the target by less than 1.
    if 0 < q3 && q3 < 1
        IsDone = true;
        R = this.RewardForStrike;
    else
        R = this.RewardForNotFalling;
    end

    Observation = [q1 q2 q3 Force]';  % observation vector
    this.State = Observation;
    this.IsDone = IsDone;
    Reward = getReward(this,R);
    notifyEnvUpdated(this);
end
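
For context on where the episode is actually restarted: in a custom rl.env.MATLABEnvironment subclass, step only ends an episode by returning IsDone = true; the state for the next episode comes from the environment's reset method, which the training loop calls automatically. Below is a minimal sketch of such a reset method, assuming the same this.State layout ([q1 q2 q3 Force]) as in the step function above; the specific initial values are an illustrative assumption, not the author's code.

% Minimal reset sketch, assuming the observation layout [q1 q2 q3 Force] used in step.
% Initial values are assumptions for illustration only.
function InitialObservation = reset(this)
    theta0 = 0.05*(rand - 0.5);            % small random initial angle (assumed)
    w0     = 0;                            % pendulum starts at rest (assumed)
    InitialObservation = [theta0; w0; 0; 0];
    this.State = InitialObservation;       % without this line, each episode keeps the old state
    this.IsDone = false;
    notifyEnvUpdated(this);
end

If this.State is not reinitialized here, every episode continues from the previous terminal state, which would produce the kind of flat or non-improving episode reward described in the question.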

Answers (0)
