
Commit

update readme
liuzuxin committed Jun 6, 2023
1 parent 8d7118b commit 1840b85
Showing 1 changed file with 13 additions and 39 deletions.
52 changes: 13 additions & 39 deletions README.md
@@ -22,7 +22,11 @@

---

OSRL (Offline Safe Reinforcement Learning) is an open-source implementation of offline safe reinforcement learning algorithms.
**OSRL (Offline Safe Reinforcement Learning)** offers a collection of elegant and extensible implementations of state-of-the-art offline safe reinforcement learning (RL) algorithms. Aimed at propelling research in offline safe RL, OSRL serves as a solid foundation to implement, benchmark, and iterate on safe RL solutions.

The OSRL package is a crucial component of our larger benchmarking suite for offline safe learning, which also includes [FSRL](https://github.com/liuzuxin/fsrl) and [DSRL](https://github.com/liuzuxin/dsrl), and is built to facilitate the development of robust and reliable offline safe RL solutions.

To learn more, please visit our [project website](http://www.offline-saferl.org).

## Structure
The structure of this repo is as follows:
@@ -38,48 +42,18 @@ The structure of this repo is as follows:
## Installation
Pull the repo and install:
```
git clone https://github.com/liuzuxin/offline-safe-rl-baselines.git
cd offline-safe-rl-baselines
git clone https://github.com/liuzuxin/osrl.git
cd osrl
pip install -e .
```
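
After installation, a quick way to verify the editable install is to import the package. This is a minimal sketch and assumes the package is importable as `osrl`, matching the repository name:

```python
# Smoke test for the editable install; assumes the import name matches the repo name.
import osrl

print(osrl.__file__)  # for `pip install -e .`, this should point into the cloned repo
```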

## How to use DSRL
DSRL uses the [OpenAI Gym](https://github.com/openai/gym) API. Tasks are created via the `gym.make` function. Each task is associated with a fixed offline dataset, which can be obtained with the `env.get_dataset()` method. This method returns a dictionary with:
- `observations`: An N × obs_dim array of observations.
- `next_observations`: An N × obs_dim array of next observations.
- `actions`: An N × act_dim array of actions.
- `rewards`: An N dimensional array of rewards.
- `costs`: An N dimensional array of costs.
- `terminals`: An N dimensional array of episode termination flags. This is true when episodes end due to termination conditions such as falling over.
- `timeouts`: An N dimensional array of termination flags. This is true when episodes end due to reaching the maximum episode length.

```python
import gym
import dsrl
## How to use OSRL

# set seed
seed = 0
Example usage is in the `examples` folder, where you can find the training and evaluation scripts for all the algorithms.

# Create the environment
env = gym.make('OfflineCarCircle-v0')
For example, to train the `bcql` method, simply run the training script and override the default parameters as needed:

# dsrl abides by the OpenAI gym interface
obs, info = env.reset(seed=seed)
obs, reward, terminal, timeout, info = env.step(env.action_space.sample())
cost = info["cost"]

# Each task is associated with a dataset
# dataset contains observations, next_observations, actions, rewards, costs, terminals, timeouts
dataset = env.get_dataset()
print(dataset['observations']) # An N x obs_dim Numpy array of observations
```
```shell
python examples/train/train_bcql.py --param1 args1
```

Datasets are automatically downloaded to the `~/.dsrl/datasets` directory when `get_dataset()` is called. If you would like to change the location of this directory, you can set the `$DSRL_DATASET_DIR` environment variable to the directory of your choosing, or pass in the dataset filepath directly into the `get_dataset` method.
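
As a minimal sketch of the two options above, the snippet below redirects the download directory via the environment variable; the target path is only an example, and it assumes the variable is read when `get_dataset()` first fetches a dataset:

```python
import os

# Assumption: DSRL checks this variable when a dataset is first requested,
# so set it before get_dataset() is called; the default is ~/.dsrl/datasets.
os.environ["DSRL_DATASET_DIR"] = "/data/dsrl_datasets"

import gym
import dsrl

env = gym.make("OfflineCarCircle-v0")
dataset = env.get_dataset()  # downloaded into /data/dsrl_datasets on first use
```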

### Normalizing Scores
- Set the target cost with the `env.set_target_cost(target_cost)` function, where `target_cost` is the undiscounted sum of costs of an episode.
- Use the `env.get_normalized_score(return, cost_return)` function to compute the normalized reward and cost for an episode, where `return` and `cost_return` are the undiscounted sums of rewards and costs of that episode, respectively (see the sketch below).
- The min and max reference returns for each task are stored in `dsrl/infos.py`.
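
Below is a minimal sketch of this scoring workflow, assuming `get_normalized_score` returns the normalized return and cost as a pair; the target cost of 10 is a hypothetical value chosen only for illustration:

```python
import gym
import dsrl

env = gym.make("OfflineCarCircle-v0")
env.set_target_cost(10.0)  # hypothetical target cost, for illustration only

# Roll out one episode with random actions, accumulating undiscounted sums.
obs, info = env.reset(seed=0)
episode_return, episode_cost, done = 0.0, 0.0, False
while not done:
    obs, reward, terminal, timeout, info = env.step(env.action_space.sample())
    episode_return += reward
    episode_cost += info["cost"]
    done = terminal or timeout

norm_return, norm_cost = env.get_normalized_score(episode_return, episode_cost)
print(norm_return, norm_cost)
```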



All the parameters and their default configs for each algorithm are available in the `examples/configs` folder.
