Goal-conditioned reinforcement learning (RL) is a powerful approach for learning general-purpose skills by reaching diverse goals. However, it falls short for task-conditioned policies, where goals are specified by temporally extended instructions written in the Linear Temporal Logic (LTL) formal language. Existing approaches for finding LTL-satisfying policies rely on sampling a large set of LTL instructions during training in order to adapt to unseen tasks at inference time. However, these approaches do not guarantee generalization to out-of-distribution LTL objectives, which may be more complex than those seen during training. In this work, we develop a novel neurosymbolic approach to address this challenge. We show that simple goal-conditioned RL agents can be instructed to follow arbitrary LTL specifications without additional training over the LTL task space.
We use the Point robot from Safety Gym, which has one actuator for turning and another for moving forward or backward. The agent observes LiDAR information about its surrounding zones. Given this indirect positional information, it has to visit and/or avoid certain zones to satisfy sampled LTL task specifications. The initial positions of the zones and the robot are randomized in every episode.
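To make the pipeline concrete, below is a minimal, illustrative sketch of the core idea: a goal-conditioned policy is steered through a sequence of zone subgoals derived from an LTL formula. The environment interface, the `reached_zone` info field, and the `plan_subgoals` helper are hypothetical stand-ins, not the repository's actual API (the real logic lives in the training and experiment scripts described below).

```python
def plan_subgoals(ltl_formula):
    """Hypothetical planner: map an LTL formula to an ordered list of zone
    subgoals (e.g. via an automaton built from the formula)."""
    # For F(j & F(w & F(r & F y))) the plan is simply: jet-black, white, red, yellow.
    return ["J", "W", "R", "Y"]

def run_episode(env, policy, ltl_formula, max_steps=1000):
    """Follow the planned subgoals with a goal-conditioned policy."""
    subgoals = plan_subgoals(ltl_formula)
    obs = env.reset()
    for _ in range(max_steps):
        if not subgoals:
            return True                              # every subgoal reached
        action = policy(obs, subgoals[0])            # policy conditioned on the current subgoal
        obs, reward, done, info = env.step(action)
        if info.get("reached_zone") == subgoals[0]:  # hypothetical info field
            subgoals.pop(0)                          # advance to the next subgoal
        if done:
            break
    return not subgoals
```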
- Install `pygraphviz` using conda: `conda install -c conda-forge pygraphviz`
- Install MuJoCo and `mujoco-py`.
- Install `safety-gym`: `pip install -e zones/envs/safety/safety-gym/`
- Install the required pip packages: `numpy torch stable-baselines3 graphviz gym mujoco-py`
- Train primitive action policies for `ZoneEnv`, including `UP`, `DOWN`, `LEFT`, and `RIGHT`: `python train_primitives.py`
- Train the goal-conditioned policy for `ZoneEnv` and collect a trajectory dataset: `python train_agent.py`
- Train the goal-value function for `ZoneEnv`: `python train_gcvf.py`
- Optionally, train a goal-value function without training a new policy: `python collect_traj.py` followed by `python train_gcvf.py`
- Primitive action policies for navigating the `Point` robot are saved in `[project_base]/zones/models/primitives/*.zip`.
- Trained goal-conditioned policies are saved in `[project_base]/zones/models/goal-conditioned/best_model_ppo_[N].zip`, where `N` denotes the number of zones present in the environment (8 by default).
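A minimal sketch of reloading the saved goal-conditioned checkpoint with stable-baselines3 (the path follows the layout above with the default `N = 8`; how the observation and goal are encoded is environment-specific and not shown here):

```python
from stable_baselines3 import PPO

# Load the goal-conditioned policy trained with the default 8 zones.
model = PPO.load("zones/models/goal-conditioned/best_model_ppo_8.zip")

# `obs` must match the training observation space (LiDAR readings plus the
# goal encoding); obtain it from the environment before predicting.
# action, _states = model.predict(obs, deterministic=True)
```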
- Avoidance experiments, e.g. $\neg y U (j \wedge (\neg wUr))$, where $y$ stands for yellow, $j$ for jet-black, $w$ for white, and $r$ for red (see the monitor sketch after this list): `python exp.py --task='avoid'`
- Loop experiments, e.g. $GF(r \wedge XF y) \wedge G(\neg w)$: `python exp.py --task='traverse'`
- Goal-chaining experiments, e.g. $F(j \wedge F(w \wedge F(r \wedge Fy)))$: `python exp.py --task='chain'`
- Stability experiments, e.g. $FGy$: `python exp.py --task='stable'`
- See the script `[project_base]/zones/exp.py` for more details, including how to specify `eval_repeats`, `device`, etc.
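For intuition about what the avoidance formula demands, here is a small self-contained monitor that checks a finite sequence of visited zone colors against $\neg y U (j \wedge (\neg wUr))$ using finite-trace until semantics. It is purely illustrative; the experiments themselves build automata from the formulas (e.g. via gltl2ba) rather than using hand-written checks like this.

```python
def until(trace, phi, psi):
    """Finite-trace semantics of (phi U psi), evaluated at the first position."""
    for k in range(len(trace)):
        if psi(trace, k):
            return True
        if not phi(trace, k):
            return False
    return False

def satisfies_avoid(trace):
    """Check a trace of zone labels against !y U (j & (!w U r))."""
    not_y = lambda t, i: t[i] != "y"
    not_w = lambda t, i: t[i] != "w"
    is_r = lambda t, i: t[i] == "r"
    # j & (!w U r), evaluated at position i of the trace
    inner = lambda t, i: t[i] == "j" and until(t[i:], not_w, is_r)
    return until(trace, not_y, inner)

assert satisfies_avoid(["j", "r"])           # reach jet-black, then red
assert not satisfies_avoid(["y", "j", "r"])  # yellow visited too early
assert not satisfies_avoid(["j", "w", "r"])  # white visited before red
```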
- The left and right figures show the trajectory for the task $\neg y U (j \wedge (\neg wUr))$.
- The left and right figures show the trajectory for the task $F(j \wedge X(\neg y U r)) \wedge G(\neg w)$.
- The left and right figures show the trajectories for the task $F(j \wedge F(w \wedge F(r \wedge Fy)))$.
- The left and right figures show the trajectories for the task $GF(r \wedge XF y) \wedge G(\neg w)$.
- The left and right figures show the trajectories for the task $FGy$.
- The left figure shows the trajectory for the task $F(j \wedge r)$.
- The right figure shows the trajectory for the task $F(j \wedge \neg r)$.
- The left figure shows the trajectory for the task $GFw \wedge GFy$.
- The right figure shows the trajectory for the task $GFw \wedge GFy \wedge G(\neg j)$.
- The figure shows the trajectory for the task $Fj \wedge (\neg r \wedge \neg y \wedge \neg w)Uj$.
- pygraphviz, https://pygraphviz.github.io
- LTL2Action, https://github.com/LTL2Action/LTL2Action
- gltl2ba, https://github.com/PatrickTrentin88/gltl2ba
- safety-gym, https://github.com/openai/safety-gym
Ant-16rooms is an environment with continuous observation and action spaces. In this walled environment, each of the 16 rooms has the same size of 8 × 8 and is separated from its neighbors by walls and corridors of thickness 1. There are two obstacles, denoted by black squares, in the environment. We place a MuJoCo Ant robot in this environment for navigation.
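As a rough sketch of the geometry (coordinates are illustrative and may not match the repository's exact wall layout; only the 8 × 8 room size and thickness-1 walls/corridors from the description above are assumed), the 16 room centers can be thought of as lying on a 4 × 4 grid:

```python
ROOM = 8                 # each room is 8 x 8
WALL = 1                 # walls and corridors have thickness 1
PITCH = ROOM + WALL      # spacing between adjacent room origins

def room_center(row, col):
    """Approximate center of room (row, col) in the 4 x 4 grid of rooms."""
    x = WALL + col * PITCH + ROOM / 2
    y = WALL + row * PITCH + ROOM / 2
    return (x, y)

# e.g. a navigation task from the bottom-left room to the top-right room
start, goal = room_center(0, 0), room_center(3, 3)
```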
The environment for the Ant-16rooms experiment is based on the following package versions:
- numpy=1.18.5
- torch=1.5.1
- gym=0.13.1
- mujoco_py=2.0.2.5

along with MuJoCo simulator version `mujoco200` from the MuJoCo release website.
- Docker: The environment is based on the one used in GCSL. To download the docker image: `docker pull dibyaghosh/gcsl:0.1`
- Conda: For setting up the conda environment, refer to `conda_environment.yml` for the specific versions of all packages.
- Python (pip): For the Python pip packages, refer to `python_requirement.txt` for the specific versions of all packages. Also, install the packages in the `dependencies` folder with `pip install -e .`
Add the workspace directory to `PYTHONPATH`: `export PYTHONPATH="${PYTHONPATH}:{path_of_GCRL-LTL_ant_folder}"`
Run the LTL specification experiments: `python experiments/TestLTLspecs_Buchi.py ant16rooms {#ofspecification}`
specifications