In this project, I used deep reinforcement learning, which combines artificial neural networks with reinforcement learning.
My DQN has 2 hidden (Dense) layers with 128 neurons each. The input layer is fed directly with the states of the environment. My agent can take 9 possible actions, so the output layer consists of 9 neurons, one per action.
The same architecture applies to my target network.
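The architecture described above can be sketched as follows. This is only a minimal NumPy illustration, not my actual training code: the state size (STATE_DIM), the ReLU activation, the weight initialization, and the helper names (init_network, q_values, sync_target) are assumptions made for the example. Only the two hidden layers of 128 neurons and the 9 outputs come from the description.

```python
import numpy as np

STATE_DIM = 8        # assumption: the real state size is not stated in the text
HIDDEN_UNITS = 128   # from the text: two hidden layers with 128 neurons each
N_ACTIONS = 9        # from the text: nine possible actions

def init_network(rng, state_dim=STATE_DIM):
    """Create weights for a state -> 128 -> 128 -> 9 fully connected network."""
    sizes = [state_dim, HIDDEN_UNITS, HIDDEN_UNITS, N_ACTIONS]
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        # He-style initialization (an assumption, chosen to pair with ReLU).
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((w, b))
    return params

def q_values(params, states):
    """Forward pass: ReLU on the hidden layers, linear output (one Q-value per action)."""
    x = np.asarray(states, dtype=float)
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b

def sync_target(online_params):
    """Copy the online weights into an independent target network."""
    return [(w.copy(), b.copy()) for w, b in online_params]

rng = np.random.default_rng(0)
online = init_network(rng)
target = sync_target(online)          # same architecture, separate weights
batch = rng.normal(size=(4, STATE_DIM))
print(q_values(online, batch).shape)  # (4, 9)
```

Keeping the target network as a separate copy means later updates to the online weights do not change the targets until the next sync.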
My agent selects its actions with an epsilon-greedy strategy: I initialized epsilon at 1.0 and decayed it gradually, so the agent explores randomly at first and relies more on its learned Q-values over time.
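A minimal sketch of epsilon-greedy action selection with decay is shown here. Only the starting value of 1.0 comes from the description. The lower bound (EPSILON_MIN), the multiplicative decay factor (EPSILON_DECAY), and the function names are hypothetical values chosen for illustration.

```python
import random

EPSILON_START = 1.0    # from the text
EPSILON_MIN = 0.05     # assumption: floor so some exploration always remains
EPSILON_DECAY = 0.995  # assumption: multiplicative decay applied once per episode

def select_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decay_epsilon(epsilon):
    """Shrink epsilon gradually without going below the floor."""
    return max(EPSILON_MIN, epsilon * EPSILON_DECAY)

epsilon = EPSILON_START
q = [0.1, 0.4, 0.2, 0.9, 0.0, 0.3, 0.5, 0.7, 0.6]  # dummy Q-values for 9 actions
for episode in range(3):
    action = select_action(q, epsilon)
    epsilon = decay_epsilon(epsilon)
```

With epsilon at 0.0 the function always returns the index of the largest Q-value (3 for the dummy list above), and with epsilon at 1.0 every choice is uniformly random.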
Even though the initial results are promising, the model has been trained for only about 70 hours so far, whereas I need weeks to complete full training.