# rlPPOAgentOptions

Create options for PPO agent

## Syntax

``opt = rlPPOAgentOptions``
``opt = rlPPOAgentOptions(Name,Value)``

## Description

`opt = rlPPOAgentOptions` creates an `rlPPOAgentOptions` object with all default settings, for use as an argument when creating a PPO agent. You can modify the object properties using dot notation.

`opt = rlPPOAgentOptions(Name,Value)` creates a PPO agent options object using the specified name-value pairs to override default property values.

## Examples

Create a PPO agent options object, specifying the experience horizon.

`opt = rlPPOAgentOptions('ExperienceHorizon',256)`
```
opt = 
  rlPPOAgentOptions with properties:

          ExperienceHorizon: 256
              MiniBatchSize: 128
                 ClipFactor: 0.2000
          EntropyLossWeight: 0.0100
                   NumEpoch: 3
    AdvantageEstimateMethod: "gae"
                  GAEFactor: 0.9500
                 SampleTime: 1
             DiscountFactor: 0.9900
```

You can modify options using dot notation. For example, set the agent sample time to `0.5`.

`opt.SampleTime = 0.5;`

## Input Arguments

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'ExperienceHorizon',256`

Number of steps the agent interacts with the environment before learning from its experience, specified as the comma-separated pair consisting of `'ExperienceHorizon'` and a positive integer.

The `ExperienceHorizon` value must be greater than or equal to the `MiniBatchSize` value.

Clip factor for limiting the change in each policy update step, specified as the comma-separated pair consisting of `'ClipFactor'` and a positive scalar less than `1`.
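To see what the clip factor limits, the following plain-MATLAB sketch evaluates the clipped surrogate objective for a few probability ratios. It assumes the usual formulation from the PPO literature, min(r·A, clip(r, 1-ε, 1+ε)·A), where ε is the clip factor. The variable names `ratio`, `advantage`, and `surrogate` are illustrative and are not toolbox properties, and the toolbox's internal implementation may differ.

```matlab
% Illustrative values only; none of these variables are toolbox properties.
clipFactor = 0.2;                 % same value as the default ClipFactor
ratio      = [0.5 1.0 1.5];       % hypothetical new/old policy probability ratios
advantage  = [1 1 1];             % hypothetical positive advantages

% Restrict each ratio to the interval [1 - clipFactor, 1 + clipFactor].
clippedRatio = min(max(ratio, 1 - clipFactor), 1 + clipFactor);

% Clipped surrogate objective: elementwise minimum of the two terms.
surrogate = min(ratio .* advantage, clippedRatio .* advantage);
disp(surrogate)
```

With a positive advantage, a ratio of 1.5 contributes as if it were 1.2, so moving the policy further than the clip range yields no additional objective.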

Entropy loss weight, specified as the comma-separated pair consisting of `'EntropyLossWeight'` and a scalar value between `0` and `1`. A higher loss weight value promotes agent exploration by applying a penalty for being too certain about which action to take. Doing so can help the agent move out of local optima.

For episode step t, the entropy loss function, which is added to the loss function for actor updates, is:

`$H_t = E \sum_{k=1}^{M} \mu_k(S_t|\theta_\mu) \ln \mu_k(S_t|\theta_\mu)$`

Here:

• E is the entropy loss weight.

• M is the number of possible actions.

• `$\mu_k(S_t|\theta_\mu)$` is the probability of taking action `$A_k$` when in state `$S_t$` following the current policy.
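As a worked example of the formula above, the following plain-MATLAB sketch evaluates the entropy loss for two hypothetical action distributions over M = 4 actions. The probability vectors are made-up values for illustration, and the sketch assumes only the formula as written, not the toolbox's internal code.

```matlab
E = 0.01;                               % same value as the default EntropyLossWeight
muUniform = [0.25 0.25 0.25 0.25];      % hypothetical: maximally uncertain policy
muPeaked  = [0.97 0.01 0.01 0.01];      % hypothetical: nearly deterministic policy

% H_t = E * sum_k mu_k * ln(mu_k); log is the natural logarithm in MATLAB.
entropyLoss = @(mu) E * sum(mu .* log(mu));

hUniform = entropyLoss(muUniform);
hPeaked  = entropyLoss(muPeaked);
fprintf('uniform: %.4f, peaked: %.4f\n', hUniform, hPeaked);
```

The uniform distribution gives the lower (more negative) value, so adding this term to a loss that is minimized favors less certain policies, and a larger `E` strengthens that effect.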

Mini-batch size used for each learning epoch, specified as the comma-separated pair consisting of `'MiniBatchSize'` and a positive integer.

The `MiniBatchSize` value must be less than or equal to the `ExperienceHorizon` value.
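Because `MiniBatchSize` and `ExperienceHorizon` constrain each other, it can help to set them together. A minimal sketch using only the name-value pairs documented on this page (the values are arbitrary):

```matlab
% Set both options in one call so that MiniBatchSize <= ExperienceHorizon.
opt = rlPPOAgentOptions('ExperienceHorizon',512,'MiniBatchSize',64);

% Check the documented constraint explicitly before using the options.
assert(opt.MiniBatchSize <= opt.ExperienceHorizon)
```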

Number of epochs for which the actor and critic networks learn from the current experience set, specified as the comma-separated pair consisting of `'NumEpoch'` and a positive integer.

Method for estimating advantage values, specified as the comma-separated pair consisting of `'AdvantageEstimateMethod'` and one of the following:

• `"gae"` — Generalized advantage estimator

• `"finite-horizon"` — Finite horizon estimation

For more information on these methods, see the training algorithm information in Proximal Policy Optimization Agents.

Smoothing factor for generalized advantage estimator, specified as the comma-separated pair consisting of `'GAEFactor'` and a scalar value between `0` and `1`, inclusive. This option applies only when the `AdvantageEstimateMethod` option is `"gae"`.
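To illustrate the role of the smoothing factor, the sketch below computes generalized advantage estimates in plain MATLAB using the commonly cited recursion A_t = δ_t + γλA_{t+1}, with δ_t = r_t + γV(s_{t+1}) - V(s_t), where γ is the discount factor and λ is the GAE factor. The rewards and value estimates are hypothetical, and this is an assumption about the general technique rather than a description of the toolbox's implementation.

```matlab
discountFactor = 0.99;            % gamma, same value as the default DiscountFactor
gaeFactor      = 0.95;            % lambda, same value as the default GAEFactor
rewards = [1 1 1];                % hypothetical rewards r_1..r_N
values  = [0.5 0.4 0.3 0];        % hypothetical V(s_1)..V(s_{N+1}); last entry bootstraps

N = numel(rewards);

% Temporal-difference residuals: delta_t = r_t + gamma*V(s_{t+1}) - V(s_t).
delta = rewards + discountFactor * values(2:end) - values(1:end-1);

% Accumulate backward in time: A_t = delta_t + gamma*lambda*A_{t+1}.
advantage = zeros(1, N);
running = 0;
for t = N:-1:1
    running = delta(t) + discountFactor * gaeFactor * running;
    advantage(t) = running;
end
disp(advantage)
```

Setting `gaeFactor` to `0` reduces each estimate to the one-step residual `delta(t)`, while values closer to `1` blend in more of the later residuals.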

Sample time of agent, specified as the comma-separated pair consisting of `'SampleTime'` and a positive scalar.

Discount factor applied to future rewards during training, specified as the comma-separated pair consisting of `'DiscountFactor'` and a positive scalar less than or equal to `1`.
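As a small worked example of discounting, this plain-MATLAB sketch weights a reward received k steps in the future by the discount factor raised to the power k. The reward sequence is hypothetical and the variable names are illustrative only.

```matlab
discountFactor = 0.99;        % same value as the default DiscountFactor
rewards = [1 1 1];            % hypothetical rewards at steps t, t+1, t+2

% Weight the reward k steps ahead by discountFactor^k, then sum.
weights = discountFactor .^ (0:numel(rewards)-1);
discountedReturn = sum(weights .* rewards);
fprintf('%.4f\n', discountedReturn);
```

Here the three rewards contribute 1, 0.99, and 0.9801, for a discounted return of 2.9701; a discount factor of `1` would weight all three equally.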

## Output Arguments

PPO agent options, returned as an `rlPPOAgentOptions` object. The object properties are described in Name-Value Pair Arguments.