Beta Policy #307

Open: wants to merge 1 commit into main

Conversation

AntoineRichard

Hi there,

With this PR I propose to add a "Beta Policy". This policy is naturally bounded, which provides nice guarantees when it comes to learning on constrained action spaces.
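
As a quick illustration of the "naturally bounded" point, here is a minimal PyTorch sketch (illustrative values only, not the code in this PR): samples from a Beta distribution live in (0, 1) and only need an affine rescaling to the action bounds, so no clipping is ever required.

import torch
from torch.distributions import Beta

alpha = torch.tensor([2.0, 5.0])  # hypothetical concentration parameters; in practice
beta = torch.tensor([3.0, 1.5])   # they come from the policy network's two heads
low = torch.tensor([-1.0, -1.0])  # illustrative action space bounds
high = torch.tensor([1.0, 1.0])

dist = Beta(alpha, beta)
u = dist.sample()                # always in (0, 1), no clipping needed
action = low + (high - low) * u  # rescaled action, always within [low, high]
mean_action = low + (high - low) * alpha / (alpha + beta)  # deterministic evaluation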

I had some issues with the automatic model instantiator. It works right now, but it expects that the user does not set the output: ACTIONS flag in the model definition. That's because the model needs to have two heads, one to output alpha and the other to output beta, whereas with the GaussianMixin we only need the mean (as the std is a single parameter).
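
For reference, a rough sketch of the two-head layout that note refers to; the class and attribute names (BetaHeads, alpha_head, beta_head) are illustrative and not this PR's API:

import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaHeads(nn.Module):
    """Illustrative two-head network producing the Beta concentrations."""
    def __init__(self, num_observations, num_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_observations, 64), nn.ELU(),
                                 nn.Linear(64, 64), nn.ELU())
        self.alpha_head = nn.Linear(64, num_actions)
        self.beta_head = nn.Linear(64, num_actions)

    def forward(self, states):
        features = self.net(states)
        # softplus(x) + 1 keeps both concentrations above 1, so the Beta stays unimodal
        alpha = F.softplus(self.alpha_head(features)) + 1.0
        beta = F.softplus(self.beta_head(features)) + 1.0
        return alpha, beta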

In any case, I'd be more than happy to make any modifications you suggest. For now I only support PyTorch, since I don't have a JAX workflow to test things. On a side note, I'm also looking into adding a squashed Gaussian (SAC style) to the GaussianMixin to take bounded action spaces into account.
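
For completeness, the squashed Gaussian mentioned above would look roughly like the following (the standard SAC-style tanh transform with its log-probability correction; a sketch with illustrative values, not part of this PR):

import torch
from torch.distributions import Normal

mean = torch.zeros(2)              # illustrative; normally the network output
log_std = torch.full((2,), -0.5)
dist = Normal(mean, log_std.exp())

x = dist.rsample()                 # unbounded Gaussian sample
action = torch.tanh(x)             # squashed into (-1, 1)
# change-of-variables correction: log pi(a) = log N(x) - sum log(1 - tanh(x)^2)
log_prob = dist.log_prob(x).sum(-1) - torch.log(1.0 - action.pow(2) + 1e-6).sum(-1)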

Let me know!

Cheers,

Antoine

Below is an example configuration for it, taken from IsaacLab:

seed: 42


# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
models:
  separate: True
  policy:  # see gaussian_model parameters
    class: BetaMixin
    network:
      - name: net
        input: STATES
        layers: [64, 64]
        activations: elu
  value:  # see deterministic_model parameters
    class: DeterministicMixin
    clip_actions: False
    network:
      - name: net
        input: STATES
        layers: [64, 64]
        activations: elu
    output: ONE


# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
  class: RandomMemory
  memory_size: -1  # automatically determined (same as agent:rollouts)


# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
  class: PPO
  rollouts: 32
  learning_epochs: 8
  mini_batches: 8
  discount_factor: 0.99
  lambda: 0.95
  learning_rate: 5.0e-4
  learning_rate_scheduler: KLAdaptiveLR
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.008
  state_preprocessor: RunningStandardScaler
  state_preprocessor_kwargs: null
  value_preprocessor: RunningStandardScaler
  value_preprocessor_kwargs: null
  random_timesteps: 0
  learning_starts: 0
  grad_norm_clip: 1.0
  ratio_clip: 0.2
  value_clip: 0.2
  clip_predicted_values: True
  entropy_loss_scale: 0.0
  value_loss_scale: 2.0
  kl_threshold: 0.0
  rewards_shaper_scale: 0.1
  time_limit_bootstrap: False
  # logging and checkpoint
  experiment:
    directory: "jetbot_direct"
    experiment_name: ""
    write_interval: auto
    checkpoint_interval: auto
    wandb: True             # whether to use Weights & Biases
    wandb_kwargs:          # wandb kwargs (see https://docs.wandb.ai/ref/python/init)
      project: jetbot_direct
      entity: spacer-rl
      group: 'zeroG'
      notes: ''


# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
  class: SequentialTrainer
  timesteps: 16000
  environment_info: log

AntoineRichard changed the title from "initial commit beta policy" to "Beta Policy" on Apr 3, 2025.