[BC] ExplorationWithPolicy #378
facilitated (spelling)
I'm missing something - how is policy_deca changed from line 79?
It does not change. get_advice is supposed to return the original advice from the policy and the step we are supposed to explore on. That is, where to explore is handled here; how to explore will be handled in a different class.
OK, but why would the user, who passed the policy in, care to pick it up again from the return value here?
ExplorationWithPolicy behaves as follows: it plays actions from replay_prefix until the prefix is exhausted, then plays according to policy while checking the exploration conditions, which use explore_policy. If a condition is satisfied at a given step of the trajectory compilation, explore_step is set to that step so that the exploration logic can take a new decision there.

The whole exploration process works as follows:
1. Compile the module with policy and save the trajectory.
2. Step 1 tells us which step to explore on, so construct a new replay prefix that contains all actions played by policy before the exploration step, plus the action selected by the exploration procedure at that step.
3. Play the replay_prefix from step 2 until after the exploration step, then follow policy until the end.

We can make this clear in the current docstring, in the docstring of the class that contains the exploration logic, or, perhaps more appropriately, in the docstring at the top of generate_bc_trajectories.py.
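To make the three steps concrete, here is a rough toy sketch. It is not the code from this PR. The class name ExplorationWithPolicySketch, the helpers compile_with and explore_once, the should_explore callable (a stand-in for the condition that uses explore_policy), and pick_alternative (a stand-in for the separate "how to explore" logic) are all hypothetical. The sketch also assumes that explore_step is exposed as an attribute and that only the first step satisfying the condition is recorded.

```python
from typing import Callable, List, Optional, Tuple

Policy = Callable[[int], int]  # toy policy: state (an int) -> action (an int)


class ExplorationWithPolicySketch:
    """Hypothetical stand-in: decides WHERE to explore, not HOW."""

    def __init__(self, replay_prefix: List[int], policy: Policy,
                 should_explore: Callable[[int], bool]):
        self._replay_prefix = list(replay_prefix)
        self._policy = policy
        # Assumption: stands in for the condition that uses explore_policy.
        self._should_explore = should_explore
        self._step = 0
        self.explore_step: Optional[int] = None

    def get_advice(self, state: int) -> int:
        step = self._step
        self._step += 1
        # Replay the prefix until it is exhausted.
        if step < len(self._replay_prefix):
            return self._replay_prefix[step]
        # Afterwards return the policy's own advice unchanged, and record
        # the first step at which the exploration condition holds.
        action = self._policy(state)
        if self.explore_step is None and self._should_explore(state):
            self.explore_step = step
        return action


def compile_with(advisor: ExplorationWithPolicySketch, num_decisions: int) -> List[int]:
    """Toy 'compilation': ask for advice at each step, keep the trajectory."""
    return [advisor.get_advice(state) for state in range(num_decisions)]


def explore_once(policy: Policy, should_explore: Callable[[int], bool],
                 pick_alternative: Callable[[int, int], int],
                 num_decisions: int) -> Tuple[List[int], Optional[List[int]]]:
    # Step 1: compile with the policy and save the trajectory.
    base = ExplorationWithPolicySketch([], policy, should_explore)
    base_trajectory = compile_with(base, num_decisions)
    if base.explore_step is None:
        return base_trajectory, None
    # Step 2: new prefix = policy actions before the exploration step,
    # plus the action chosen by the exploration procedure at that step.
    k = base.explore_step
    new_prefix = base_trajectory[:k] + [pick_alternative(k, base_trajectory[k])]
    # Step 3: replay the new prefix, then follow the policy until the end.
    replay = ExplorationWithPolicySketch(new_prefix, policy, should_explore)
    return base_trajectory, compile_with(replay, num_decisions)


# Toy usage: the policy always picks 0, exploration triggers at state 2,
# and the "how to explore" stand-in flips the action there.
base, explored = explore_once(
    policy=lambda state: 0,
    should_explore=lambda state: state == 2,
    pick_alternative=lambda step, action: 1 - action,
    num_decisions=5,
)
```

In this toy run the two trajectories differ only at the explored step, which is the property the replay prefix is meant to guarantee.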