# Proximal Policy Optimization RL

This repo contains my implementation of PPO (Proximal Policy Optimization), based mainly on the HuggingFace RL Course.

The main script uses PyTorch and will automatically take advantage of multiple GPUs when they are available.
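For reference, a minimal sketch of how that kind of device selection usually looks in PyTorch (the actual logic lives in `main.py` and may differ):

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder network; the actual agent is defined in main.py.
model = nn.Linear(17, 6)

# With more than one visible GPU, DataParallel splits each forward
# pass across all of them automatically.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)
```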

## Environments and Metrics

The environments come from PyBullet-Gym. Unfortunately, I could not get the MuJoCo envs to work, no matter which Gym and PyBullet/MuJoCo combinations I tried. This is a known issue, so PyBullet will have to suffice in the meantime, at least until I can test Genesis World Model, which can create massively parallel physics sims capable of setting up vectorized gym-like environments (see example).
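Bullet environments register themselves with Gym when the corresponding package is imported. A minimal sketch, assuming the `pybullet_envs` module and the `HalfCheetahBulletEnv-v0` ID (both assumptions; this repo's env IDs are shown in the Running section below):

```python
import gym
import pybullet_envs  # noqa: F401  (importing registers the *BulletEnv-v0 IDs with Gym)

# Assumed ID for the Cheetah default used below.
env = gym.make("HalfCheetahBulletEnv-v0")

obs = env.reset()
action = env.action_space.sample()          # random action from the continuous space
obs, reward, done, info = env.step(action)  # classic (pre-Gymnasium) step signature
```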

- HuggingFace metrics (showing the results and a video of the trained agent's performance): huggingface.co
- WandB (Weights & Biases) metrics (showing training info such as loss convergence, GPU use, etc.): WandB.ai

## Installation (UV)

```sh
uv venv
source .venv/bin/activate
uv sync
```

## Login to Online Services

### HuggingFace

After creating an access token at huggingface.co, log in with:

```sh
huggingface-cli login
```

### WandB

Once you have a WandB account (the API key is under Settings), log in with:

```sh
wandb login
```

## Running (Training and Evaluating)

### PyBullet Cheetah Environment (default)

```sh
uv run main.py
```

### Bullet Humanoid Env with WandB Tracking

```sh
uv run main.py --track --env-id "HumanoidBulletEnv-v0"
```
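Under the hood, the `--track` flag presumably initializes a WandB run along these lines. This is a sketch with hypothetical project and config values, not the repo's exact code:

```python
import wandb

# Hypothetical project name and config; the real script would build
# these from its command-line arguments.
wandb.init(
    project="ppo-pybullet",
    config={"env_id": "HumanoidBulletEnv-v0", "total_timesteps": 1_000_000},
    sync_tensorboard=True,  # mirror TensorBoard scalars into the WandB dashboard
)

# Scalar metrics can then be logged once per update (illustrative values):
wandb.log({"charts/episodic_return": 123.4, "losses/policy_loss": 0.01})
```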

## DEPRECATED (See Discrete Branch)

Discrete action spaces are only supported on the main branch prior to the PR (see the discrete branch). Run the following to specify the CartPole-v1 environment instead of the default:

```sh
uv run main.py --env-id "CartPole-v1"
```
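The reason for the branch split is the action distribution: a discrete env like CartPole-v1 samples an action index from a Categorical distribution, while the continuous Bullet envs sample a real-valued vector from a Normal. A minimal sketch of that distinction in standard PyTorch (not necessarily this repo's exact code):

```python
import torch
from torch.distributions import Categorical, Normal

# Discrete case (CartPole-v1 has 2 actions): sample an integer index.
logits = torch.randn(1, 2)          # would come from the policy head
dist = Categorical(logits=logits)
discrete_action = dist.sample()     # shape (1,), values in {0, 1}

# Continuous case (e.g. a 6-dim Bullet action space): sample a real vector.
mean = torch.zeros(1, 6)            # policy head output
log_std = torch.zeros(1, 6)         # often a learned, state-independent parameter
dist = Normal(mean, log_std.exp())
continuous_action = dist.sample()   # shape (1, 6)
```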

## Doom (todo)

### Doom Environment

```sh
uv run main.py --env-id doom
```