Hindsight policy gradients

This software supplements the paper "Hindsight policy gradients".

The implementation focuses on clarity and flexibility rather than computational efficiency.

Examples

Training an agent in a bit flipping environment (k = 8) using a weighted per-decision hindsight policy gradient estimator (HPG):

python3 hpg/scripts/run.py hpg/examples/flipbit8/flipbit8_bs2_hpg

Training an agent in a bit flipping environment (k = 8) using a goal-conditional policy gradient estimator (GCPG):

python3 hpg/scripts/run.py hpg/examples/flipbit8/flipbit8_bs2_gcpg
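
For readers unfamiliar with the task, the following is a minimal sketch of a bit flipping environment with k bits. It is not the environment implemented in this repository: the class name `BitFlipEnv`, its interface, and the specific rules (all-zero start state, uniformly random goal, one action per bit, reward 1 only when the state matches the goal) are assumptions made for illustration. The standalone `reward` function shows why this kind of task suits hindsight methods: a state reached while pursuing one goal can be rescored under any other goal.

```python
import random


class BitFlipEnv:
    """Hypothetical k-bit flipping environment (illustration only)."""

    def __init__(self, k=8, seed=None):
        self.k = k
        self.rng = random.Random(seed)
        self.state = [0] * k
        self.goal = [0] * k

    def reset(self):
        # Assumption: each episode starts from all zeros with a random goal.
        self.state = [0] * self.k
        self.goal = [self.rng.randint(0, 1) for _ in range(self.k)]
        return tuple(self.state), tuple(self.goal)

    @staticmethod
    def reward(state, goal):
        # Assumption: sparse reward, 1 only when every bit matches the goal.
        return 1.0 if tuple(state) == tuple(goal) else 0.0

    def step(self, action):
        # Action i toggles bit i; there is one action per bit.
        if not 0 <= action < self.k:
            raise ValueError("action must be in [0, k)")
        self.state[action] ^= 1
        next_state = tuple(self.state)
        return next_state, self.reward(next_state, self.goal)


if __name__ == "__main__":
    env = BitFlipEnv(k=8, seed=0)
    state, goal = env.reset()
    state, r = env.step(3)
    # The same state can be rescored under a different (hindsight) goal.
    print(state, goal, r, BitFlipEnv.reward(state, state))
```

Because `reward` depends only on a state and a goal, the return of a recorded trajectory can be recomputed for goals other than the one the agent was originally given.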

Combining the corresponding results into a single plot (see folder "results/flipbit8_bs2"):

mkdir -p results/flipbit8_bs2
cp -r hpg/examples/flipbit8/flipbit8_bs2_hpg hpg/examples/flipbit8/flipbit8_bs2_gcpg results/flipbit8_bs2
python3 hpg/scripts/analysis.py results/flipbit8_bs2
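
When many experiment directories need to be gathered, the `mkdir` and `cp` steps above can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `collect_results` is hypothetical and not part of this repository, and `dirs_exist_ok` assumes Python 3.8 or later.

```python
import shutil
from pathlib import Path


def collect_results(experiment_dirs, target_dir):
    """Copy each experiment directory into target_dir (like `cp -r a b target`)."""
    target = Path(target_dir)
    # Equivalent of `mkdir -p`: create missing parents, ignore if it exists.
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for src in map(Path, experiment_dirs):
        dest = target / src.name
        # Recursively copy, merging into dest if it already exists.
        shutil.copytree(src, dest, dirs_exist_ok=True)
        copied.append(dest)
    return copied


if __name__ == "__main__":
    collect_results(
        [
            "hpg/examples/flipbit8/flipbit8_bs2_hpg",
            "hpg/examples/flipbit8/flipbit8_bs2_gcpg",
        ],
        "results/flipbit8_bs2",
    )
```

After the directories are collected, the analysis script is run on the target folder exactly as in the command above.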

Dependencies