Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug Report] VectorEnv action space seed problem #2680

Closed
1 task done
vwxyzjn opened this issue Mar 9, 2022 · 8 comments
Closed
1 task done

[Bug Report] VectorEnv action space seed problem #2680

vwxyzjn opened this issue Mar 9, 2022 · 8 comments

Comments

@vwxyzjn
Copy link
Contributor

vwxyzjn commented Mar 9, 2022

Describe the bug
envs.action_space.sample() is not reproducible after seeds.

Code example

import gym
import numpy as np
def make_env(env_id, seed):
    def thunk():
        env = gym.make(env_id)
        env.seed(seed)
        env.action_space.seed(seed)
        env.observation_space.seed(seed)
        return env

    return thunk
seed = 1
envs = gym.vector.SyncVectorEnv([make_env("CartPole-v1", seed)])
sampled_actions = []
for _ in range(20):
    sampled_actions += [envs.action_space.sample()]
print(np.array(sampled_actions).flatten())
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[1 0 0 1 0 1 1 0 1 1 1 1 1 0 0 1 0 1 0 0]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[0 1 1 1 0 0 1 0 1 0 0 0 1 1 0 0 1 1 1 1]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[0 1 1 0 1 1 1 1 1 0 1 0 0 1 1 1 1 0 1 1]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[0 0 0 1 1 1 1 1 0 1 1 0 1 1 0 1 0 0 0 0]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[0 1 0 1 0 0 0 0 1 0 1 1 0 1 0 1 1 1 1 0]

I think the problem is #2280 batches a new unseeded action space... So if I seed this action space again it will work:

import gym
import numpy as np
def make_env(env_id, seed):
    def thunk():
        env = gym.make(env_id)
        env.seed(seed)
        env.action_space.seed(seed)
        env.observation_space.seed(seed)
        return env

    return thunk
seed = 1
envs = gym.vector.SyncVectorEnv([make_env("CartPole-v1", seed)])
envs.action_space.seed(seed)
envs.observation_space.seed(seed)
sampled_actions = []
for _ in range(20):
    sampled_actions += [envs.action_space.sample()]
print(np.array(sampled_actions).flatten())
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[1 1 0 1 0 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[1 1 0 1 0 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[1 1 0 1 0 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[1 1 0 1 0 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[1 1 0 1 0 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ 

But this may seem unintuitive to the users. I am not sure how to resolve this, @tristandeleu What do you think?

Checklist

  • I have checked that there is no similar issue in the repo (required)
@jkterry1
Copy link
Collaborator

An additional concern is that this managed to get through CI tests

@RedTachyon
Copy link
Contributor

Off the top of my head I don't really know how to fix this, but I think it's pretty low severity, right? Since it's a relatively niche feature, and there's a pretty simple way of doing this "correctly". It might be enough to just mention it in the documentation that batched spaces don't preserve their RNG.

Also, what exactly is the intended behavior here? After seeding a space and using it in a vectorized env, do we expect all environments's spaces to have the same RNG?

@vwxyzjn
Copy link
Contributor Author

vwxyzjn commented Mar 13, 2022

Yeah, this is a weird one. Something I can think of is the seed the batched space with the seed from the sub environment space if the seed is set. If there are multiple different seeds from the sub environment spaces probably just take the first one’s seed…?

@tristandeleu
Copy link
Contributor

The second option (seeding the observation_space and action_space of VectorEnv, instead of the individual environments) should be the preferred one, since a VectorEnv really is just like any environment. This could be made clearer in the documentation.

As @RedTachyon said, there is no easy way to fix this unfortunately. The only way I could see seeding being enforced is if we had a way to know the seed of an instantiated space, which is impossible with the current API; the only way to have access to seeding information is when we call space.seed().

As a side note (this was extensively discussed in #2422), if reproducibility is critical, I would suggest defining (and seeding) a random policy instead, using PyTorch/Jax/Numpy/TF. You'd be losing the ease to sample from any kind of space (barely, it should be easy to write code that works for any space), but you would gain a lot of control over what you want to be fully reproducible.

@RedTachyon
Copy link
Contributor

Should we include a disclaimer like "This method is meant to serve as a quick and dirty way of getting a valid action, and should not be used in situations where reproducibility is desirable" around space.sample()?

@jkterry1
Copy link
Collaborator

I like that disclaimer

@pseudo-rnd-thoughts
Copy link
Contributor

pseudo-rnd-thoughts commented Apr 2, 2022

Looking at creating a solution for this, the problem is in gym.vector.utils.spaces as when a batch space is created then the new space, a seed is not provided based on the original space.
However, the problem is this seed value needs to be an integer value that can't be reversed

Therefore, I have modified the space.init seed parameter to allow a raw seeding.RandomNumberGenerator object
Using this modification, the first env provided to a sync vector env is used for the action space seeding

I will create a PR with this solution to allow review and comments

Edit: PR is #2727

An additional concern is that this managed to get through CI tests

There doesn't appear to be any tests for the batch space seeds, Im adding that as part of the PR

pseudo-rnd-thoughts added a commit to pseudo-rnd-thoughts/gym that referenced this issue Apr 19, 2022
jkterry1 pushed a commit that referenced this issue Apr 24, 2022
… 2680 (#2727)

* Add a case for the Box shape where the low and high values are both scalars

* Add seeding.RandomNumberGenerator parameter to Dict seed. Modify __repr__ for the dictionary space string looks similar to an actual dictionary

* Add seeding.RandomNumberGenerator parameter to Multi Binary seed

* Add seeding.RandomNumberGenerator parameter to Multi Binary seed. Modify nvec typing to include np.ndarray

* Space seed typing can be a seeding.RandomNumberGenerator. If a seeding.RNG is provided then it is assigned to _np_random and .seed is not run

* Fixed the tuple seeding type as List[int] is not a valid Space seed type

* Added typing to batch_space. The batch_space seed is equal to the space's seeding

* Fixed the seeding type

* Add test for batch space seeds are identical to the original space's seeding

* Add equivalence function for RandomNumberGenerator comparing the bit_generator.state

* The batch_space functions uses a copy of the seed for the original space

* Set the action space seed for sync_vector_env seed testing

* Add test for the seeding of the sync vector environment

* Update the test_batch_space_seed to check the resulting sampling are equivalent for testing

* Revert representation back to the original version

* Remove additional Box shape initialisation

* Remove additional typing of MultiDiscrete

* Fixed bug of Space batch space where the original space's np_random is not a complete copy of the original space

* Add CustomSpace to the batched space seed test

* Modify the CustomSpace sample to produce a random number not a static value

* Fix CustomSpace to reflect the sample function

* Copy the space.np_random for the batched_space seed to ensure that the original space doesn't sampling doesn't effect the batched_space

* Parameterized the batch_space_seed, added testing for rng_different_at_each_index and test_deterministic

* Black and isort pre-commit changes

* Pre-commit fix

* MacOS, test_read_from_shared_memory throws an error that the inner _process_write function was unpicklable. Making the function a top-level function solves this error

* Fixed typing of seed where a space's seed function differs from Space.seed's typing

* Added check that the sample lengths are equal and explicitly provided the number of batched spaces n=1

* Removed relative imports for absolute imports

* Use deepcopy instead of copy

* Replaces `from numpy.testing._private.utils import assert_array_equal` with `from numpy.testing import assert_array_equal`

* Using the seeding `__eq__` function, replace `np_random.bit_generator.state` with `np_random`

* Added docstrings and comments to the tests to explain their purpose

* Remove __eq__ from RandomNumberGenerator and add to tests/vector/utils

* Add sync vector determinism test for issue #2680

* Fixed bug for https://github.com/openai/gym/pull/2727/files/462101d3846bc35ff3fad9f65979c693472a93a8#r850740825

* Made the new seeds a list of integers
@pseudo-rnd-thoughts
Copy link
Contributor

@vwxyzjn Thanks has been solved #2727, could you close please

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants