Describe the bug
In ppo_continous_tensorflow.py, the entropy is calculated with:
dist_entropy = tf.math.reduce_mean(self.distributions.entropy(action_mean, self.std))
Since the entropy depends only on std, and std is a static parameter, dist_entropy always has the same value.
Thus, the entropy loss has no effect on learning.
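To illustrate why the value cannot change, here is a minimal sketch using only the Python standard library. It assumes the policy is a diagonal Gaussian and uses the closed-form entropy for that case; `diag_gaussian_entropy` is a hypothetical helper written for this illustration, not a function from the repository.

```python
import math

def diag_gaussian_entropy(mean, std):
    """Entropy of a diagonal Gaussian: sum_i 0.5 * log(2*pi*e*std_i^2).

    `mean` is accepted to mirror the call in the issue, but the
    closed-form expression never uses it.
    """
    return sum(0.5 * math.log(2.0 * math.pi * math.e * s * s) for s in std)

std = [0.5, 0.5]  # assumed fixed value, standing in for the static self.std

# Two very different action means, as the network would output for different states.
h_a = diag_gaussian_entropy([0.0, 0.0], std)
h_b = diag_gaussian_entropy([3.0, -7.0], std)

print(h_a == h_b)  # True: the mean does not enter the entropy
```

Because the mean drops out and std is constant, the entropy term contributes a constant to the loss, so its gradient with respect to the network weights is zero.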
To Reproduce
Run any environment and set a debugger breakpoint on dist_entropy. It has the same value for every batch, at every point during training.
Expected behavior
The std should not be static; it should reflect the network's actual prediction confidence.
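One possible direction, sketched here under assumptions rather than as the repository's intended fix, is to make the log of the std a trainable parameter. For a diagonal Gaussian the entropy is then `sum_i (log_std_i + 0.5 * log(2*pi*e))`, so its gradient with respect to each `log_std_i` is 1 and the entropy bonus can actually move the parameter. The names `entropy_from_log_std` and `entropy_bonus_step`, and the values of `entropy_coef` and `lr`, are hypothetical and chosen only for illustration. The sketch isolates the entropy term; in real PPO training the policy loss would also contribute gradients to the std.

```python
import math

HALF_LOG_2PI_E = 0.5 * math.log(2.0 * math.pi * math.e)

def entropy_from_log_std(log_std):
    """Entropy of a diagonal Gaussian parameterised by log standard deviations."""
    return sum(ls + HALF_LOG_2PI_E for ls in log_std)

def entropy_bonus_step(log_std, entropy_coef=0.01, lr=0.1):
    """One gradient-ascent step on entropy_coef * entropy.

    d(entropy)/d(log_std_i) = 1, so each component moves by lr * entropy_coef.
    """
    return [ls + lr * entropy_coef * 1.0 for ls in log_std]

log_std = [math.log(0.5), math.log(0.5)]  # same starting std as a fixed 0.5
before = entropy_from_log_std(log_std)
log_std = entropy_bonus_step(log_std)
after = entropy_from_log_std(log_std)

print(after > before)  # True: the entropy term now changes the distribution
```

A state-dependent std, where the network outputs `log_std` alongside `action_mean`, would follow the same entropy formula with the gradient flowing back into the network weights instead of a standalone parameter.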