Welcome to the world of Deep Reinforcement Learning!
This is an explainable, modified version of the Udacity DRL homework.
- DQN: modified from Udacity repo, tested on Breakout-v0 env.
- PPO: written by myself, tested on Pendulum-v0 and BipedalWalker-v2 envs.
- Policy Gradient: REINFORCE with baseline and entropy loss, tested on CartPole-v0.
- Monte Carlo: modified version, tested on BlackJack env.
- Temporal Difference: modified version, tested on CliffWalking-v0.
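
To give a feel for the "REINFORCE with baseline and entropy loss" item, here is a minimal sketch of how that loss could be assembled for one episode. It is plain Python and is not the code from this repo; the names `discounted_returns`, `reinforce_loss`, `gamma` and `entropy_coef` are assumptions made for illustration, and a real implementation would use tensors and autograd.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, entropies, rewards, baselines,
                   gamma=0.99, entropy_coef=0.01):
    """Scalar REINFORCE loss with a baseline and an entropy bonus.

    log_probs: log pi(a_t | s_t) of the actions actually taken
    entropies: entropy of the policy distribution at each step
    baselines: per-step baseline values (hypothetical, e.g. a value estimate)
    """
    returns = discounted_returns(rewards, gamma)
    # Subtracting the baseline reduces variance without biasing the gradient.
    advantages = [g - b for g, b in zip(returns, baselines)]
    policy_loss = -sum(lp * a for lp, a in zip(log_probs, advantages))
    # Higher entropy lowers the loss, which encourages exploration.
    return policy_loss - entropy_coef * sum(entropies)


if __name__ == "__main__":
    loss = reinforce_loss(
        log_probs=[math.log(0.5)] * 3,
        entropies=[math.log(2)] * 3,
        rewards=[1.0, 1.0, 1.0],
        baselines=[1.0, 1.0, 1.0],
        gamma=0.5,
    )
    print(loss)
```

The baseline here is just a list of numbers passed in by the caller; whether it comes from a learned value network or a running mean of returns is a design choice this sketch leaves open.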