scan range: 10m maps: ss train 1,2 wps: 101 points ~ 3m apart cp radius: 2m cp reward: 0.1 max_v: 12 noise on obs: N(0, 0.03) obs dim: 110
alg: ppo num_workers: 16 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 16 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
slow:
- solves train1: 126.95s
- solves train2: 138.2s
- fails on obs
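The cp radius / cp reward parameters suggest a checkpoint scheme in which the car earns cp reward each time it comes within cp radius of the next waypoint. A minimal sketch, assuming waypoints are (x, y) tuples in metres visited in order on a closed track; the name checkpoint_reward and its signature are hypothetical, not the env's actual code:

```python
import math


def checkpoint_reward(pos, waypoints, next_idx, cp_radius=2.0, cp_reward=0.1):
    """Hypothetical helper. Returns (reward, new_next_idx).

    Gives cp_reward when the car is within cp_radius of the next waypoint,
    then advances to the following waypoint (wrapping around the lap).
    """
    wx, wy = waypoints[next_idx]
    dist = math.hypot(pos[0] - wx, pos[1] - wy)
    if dist <= cp_radius:
        return cp_reward, (next_idx + 1) % len(waypoints)
    return 0.0, next_idx


# 101 waypoints ~3 m apart on a straight line, for illustration only
wps = [(3.0 * i, 0.0) for i in range(101)]
print(checkpoint_reward((0.5, 0.0), wps, 0))
```

Only the next waypoint is checked, so a checkpoint missed by more than the radius (e.g. when cutting a corner) stalls all further cp rewards until the car returns to it.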
increased cp reward (0.1 -> 0.5)
scan range: 10m maps: ss train 1,2 wps: 101 points ~ 3m apart cp radius: 2m cp reward: 0.5 max_v: 12 noise on obs: N(0, 0.03) obs dim: 110
alg: ppo num_workers: 16 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 16 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
- solves train1: 82s
- fails train2
- fails on obs
only trained on train2
scan range: 10m maps: ss train 1,2 wps: 101 points ~ 3m apart cp radius: 2m cp reward: 0.5 max_v: 12 noise on obs: N(0, 0.03) obs dim: 110
alg: ppo num_workers: 16 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 16 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
- solves train2: 96s
- solves obs: 93s
- fails on train1
increased scan range to 15m
scan range: 15m maps: ss train 1,2 wps: 101 points ~ 3m apart cp radius: 2m cp reward: 0.5 max_v: 12 noise on obs: N(0, 0.03) obs dim: 110
alg: ppo num_workers: 16 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 16 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
awful!
reduced speed to 10, removed obs noise, increased gamma (also lowered lr to .0001)
scan range: 10m maps: ss train 1,2 wps: 101 points ~ 3m apart cp radius: 2m cp reward: 0.5 max_v: 10 obs dim: 110
gamma: 0.995 alg: ppo num_workers: 16 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 16 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0001
works on train1,2; fails on obs
reduced speed to 10, added noise to obs, increased gamma, padding 15cm
scan range: 10m maps: ss train 1,2 wps: 101 points ~ 3m apart cp radius: 3m cp reward: 0.1 max_v: 10 obs dim: 110
gamma: 0.995 alg: ppo num_workers: 16 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 16 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
Doesn't solve any!
padding 5cm, added a third training track, added a reward for fast completion exp(4 - t/20): 60s gets e, 80s gets 1, 100s gets 1/e
scan range: 10m maps: ss train 1,2,3 wps: 101 points ~ 3m apart cp radius: 2m cp reward: 0.1 max_v: 10 obs dim: 110
gamma: 0.99 alg: ppo num_workers: 16 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 16 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003 padding: 5cm
gets stuck in train3; solves all others
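Evaluating the fast-completion bonus exp(4 - t/20) exactly as written shows which lap times map to e, 1 and 1/e. A minimal sketch; t is assumed to be the lap time in seconds and fast_completion_reward is a hypothetical name:

```python
import math


def fast_completion_reward(t):
    """Bonus exp(4 - t/20) for finishing a lap in t seconds."""
    return math.exp(4 - t / 20)


for t in (60, 80, 100):
    print(t, round(fast_completion_reward(t), 3))
# 60 2.718
# 80 1.0
# 100 0.368
```

Each extra 20s divides the bonus by e, so the reward is smooth but fairly flat for slow laps.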
added punishments for stopping and for returning
scan range: 10m maps: ss train 1,2,3 wps: 101 points ~ 3m apart cp radius: 2m cp reward: 0.1 max_v: 10 obs dim: 110
gamma: 0.99 alg: ppo num_workers: 16 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 16 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003 padding: 5cm
works on train1,2, fails on train3, sometimes solves test!
fewer workers, trained only on train1,2, trimmed obs to look forward only (70-290 deg)
scan range: 10m maps: ss train 1,2 wps: 200 points cp radius: 3m cp reward: 0.1 max_v: 10 obs dim: 220
gamma: 0.99 alg: ppo num_workers: 16 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 16 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003 padding: 5cm
solves all, beginning from the start point
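One way to implement the forward-only observation, assuming a 360-beam scan at 1 beam per degree where beam 180 points straight ahead (the sectors used in this log, 70-290, 80-280, 90-270 and 110-250, are all symmetric about 180). trim_scan is a hypothetical helper, and clipping to the scan range is an assumption:

```python
import numpy as np


def trim_scan(scan, start_deg=70, end_deg=290, beams_per_deg=1, max_range=10.0):
    """Keep only the forward sector [start_deg, end_deg] of a lidar scan.

    Assumes beam i points at i / beams_per_deg degrees and 180 deg is straight ahead.
    """
    start = int(start_deg * beams_per_deg)
    end = int(end_deg * beams_per_deg)
    sector = np.asarray(scan, dtype=np.float32)[start:end + 1]  # inclusive end
    # readings beyond the scan range are saturated at max_range
    return np.clip(sector, 0.0, max_range)


scan = np.full(360, 30.0)  # fake scan: nothing within 30 m
print(trim_scan(scan).shape)  # (221,)
```

With an inclusive sector the beam count is (end - start) * beams_per_deg + 1, so the final obs dim also depends on the sensor resolution and on any extra state appended to the scan, neither of which is recorded here.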
increased speed to 12
scan range: 10m maps: ss train 1,2 wps: 200 points cp radius: 2m cp reward: 0.1 max_v: 12 obs dim: 220
gamma: 0.99 alg: ppo num_workers: 16 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 16 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003 padding: 5cm
solves train1,2,3; fails on obs
padding of 30 cm
scan range: 10m maps: ss train 1,2 wps: 200 points cp radius: 2m cp reward: 0.1 max_v: 12 obs dim: 220 padding: 30cm neg reward finish time reward: max(0.2*(50 - t), 0.1)
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 16 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
checkpoint 36 solves train1,2,3 and obs
increased speed to 15
scan range: 10m maps: ss train 1,2 wps: 200 points cp radius: 2m cp reward: 0.1 max_v: 15 obs dim: 220 padding: 30cm neg reward finish time reward: max(0.2*(50 - t), 0.1)
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 16 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
can't solve all maps
more complex network (300, 300, 300), speed back to 12
scan range: 10m maps: ss train 1,2,3 wps: 200 points cp radius: 3m cp reward: 0.1 max_v: 12 obs dim: 220 padding: 30cm neg reward finish time reward: max(0.2*(50 - t), 0.1)
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 network: (300, 300, 300) kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 16 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
solves all, but a bit slower
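The PPO settings of the run above, written as an RLlib 1.x-style config dict. This is a sketch, mainly to show the correctly spelled key sgd_minibatch_size and where the (300, 300, 300) network goes (model.fcnet_hiddens); the "env" entry is left out because the log does not name the registered environment:

```python
ppo_config = {
    "num_workers": 15,
    "num_gpus": 1.0,
    "num_envs_per_worker": 16,
    "gamma": 0.99,
    "lr": 3e-4,
    "kl_coeff": 1.0,
    "clip_param": 0.2,
    "train_batch_size": 100_000,
    "sgd_minibatch_size": 4096,
    "batch_mode": "truncate_episodes",
    # fully connected policy/value network with three hidden layers of 300 units
    "model": {"fcnet_hiddens": [300, 300, 300]},
}
```

Such a dict would usually be passed as ray.tune.run("PPO", config=ppo_config) after adding an "env" key; newer Ray releases rename several of these keys, so the installed version's PPO docs are the reference.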
padding of 30 cm
scan range: 10m maps: ss train 1,2,3 wps: 200 points cp radius: 2m cp reward: 0.1 max_v: 12 obs dim: 110 padding: 30cm neg reward finish time reward: 0.1
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 1 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
scan range: 10m
maps: train 1
wps: 100 points
cp radius: 3m
cp reward: 0.1
max_v: 12
obs dim: 222
obs range: 70 to 290
padding: 30cm neg reward 0.05
finish time reward:
max(0.2*(50 - t), self.cp_reward)
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 1 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
solved both train and test in ~46s; still room for improvement (only 50 iters)
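"padding: 30cm neg reward 0.05" reads as a per-step penalty whenever an obstacle is closer than the padding distance. A minimal sketch under that reading, assuming distances are measured from the lidar and that scan is the (trimmed) range array; padding_penalty is a hypothetical name:

```python
import numpy as np


def padding_penalty(scan, padding=0.30, penalty=0.05):
    """Return -penalty if any beam sees an obstacle closer than `padding` metres."""
    return -penalty if float(np.min(scan)) < padding else 0.0


print(padding_penalty(np.array([1.0, 0.2, 5.0])))  # -0.05
```

Measuring from the lidar rather than from the car's outline means the effective clearance at the corners of the chassis is smaller than 30cm.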
fine-tuning v1.0: changed the final reward a bit (horizon 50 -> 45)
scan range: 10m
maps: train 1
wps: 100 points
cp radius: 3m
cp reward: 0.1
max_v: 12
obs dim: 222
obs range: 70 to 290
padding: 30cm neg reward 0.05
finish time reward:
max(0.2*(45 - t), self.cp_reward)
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 1 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
last checkpoint is super fast but fails at a checkpoint on the test map -> need a separate evaluation env and test env
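A sketch of the evaluation/test split: pick the checkpoint on evaluation maps only (disqualifying any checkpoint that crashes) and touch the test maps once, for the selected checkpoint. The evaluate callback, returning (finished, lap_time_s), and all names here are hypothetical stand-ins for rolling out a checkpoint on a map:

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical signature: evaluate(checkpoint, map_name) -> (finished, lap_time_s)
Evaluate = Callable[[str, str], Tuple[bool, float]]


def select_checkpoint(
    checkpoints: List[str],
    evaluate: Evaluate,
    eval_maps: List[str],
    test_maps: List[str],
) -> Tuple[str, Dict[str, Tuple[bool, float]]]:
    best: Optional[str] = None
    best_time = float("inf")
    for ckpt in checkpoints:
        results = [evaluate(ckpt, m) for m in eval_maps]
        # a crash on any evaluation map disqualifies the checkpoint
        if not all(finished for finished, _ in results):
            continue
        total = sum(lap for _, lap in results)
        if total < best_time:
            best, best_time = ckpt, total
    if best is None:
        raise RuntimeError("no checkpoint finished every evaluation map")
    # test maps are used only once, for the selected checkpoint
    return best, {m: evaluate(best, m) for m in test_maps}
```

Selecting on finish-all-then-fastest rather than on mean training reward keeps a fast but crash-prone checkpoint, like the one above, from being chosen.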
scan range: 10m
maps: train 1
wps: 100 points
cp radius: 3m
cp reward: 0.1
max_v: 10
obs dim: 182
obs range: 90 to 180
padding: 30cm neg reward 0.05
finish time reward:
max(0.2*(45 - t), self.cp_reward)
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 1 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
successful on the first try; at some points it crashes into obstacles in turns because it drives too close to the wall (low steering angle on the car)
steering angle reduced to 0.34, noise variance set to 0.1
scan range: 10m
maps: train 1
wps: 100 points
cp radius: 3m
cp reward: 0.1
max_v: 10
obs dim: 182
obs range: 90 to 180
padding: 30cm neg reward 0.05
finish time reward:
max(0.2*(45 - t), self.cp_reward)
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 1 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
cool!
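A sketch of the two changes in this run: Gaussian observation noise with variance 0.1 and a steering limit of 0.34. Assumptions: 0.34 is a symmetric limit in radians and the noise is added independently per observation entry; noisy_obs and clip_steering are hypothetical names. Note that numpy's normal takes the standard deviation, so a variance of 0.1 means a scale of sqrt(0.1), about 0.316:

```python
import numpy as np


def noisy_obs(obs, var=0.1, rng=None):
    """Add zero-mean Gaussian noise with the given *variance* to each entry."""
    rng = np.random.default_rng() if rng is None else rng
    obs = np.asarray(obs, dtype=float)
    # numpy's `scale` argument is the standard deviation, not the variance
    return obs + rng.normal(0.0, np.sqrt(var), size=obs.shape)


def clip_steering(angle, max_angle=0.34):
    """Limit the steering command to [-max_angle, max_angle]."""
    return float(np.clip(angle, -max_angle, max_angle))


print(clip_steering(0.5))  # 0.34
```

The same ambiguity applies to the earlier "N(0, 0.03)" entries: it is worth recording whether the second parameter is a variance or a standard deviation.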
removed padding, speed to 20, cp reward halved
scan range: 10m
maps: train 1, train 2
wps: 110 points
cp radius: 3m
cp reward: 0.05
max_v: 20
obs dim: 202
obs range: 80 to 280
finish time reward:
max(0.2*(30 - t), self.cp_reward)
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 1 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
3 maps in 2 directions, trained with speed-based checkpoint rewards
scan range: 10m maps: train 1, train 2,3 wps: 110 points cp radius: 3m cp reward: velocity based max_v: 20 obs dim: 202 obs range: 80 to 280
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 1 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
solved tests 1 and 2 in 25s and 27s; empty map in 25s
fine-tuned three times: first, the reward was based on checkpoint time differences; second, more punishment for padding with a wider range; third, less punishment for padding
scan range: 10m maps: train 1, train 2,3 wps: 110 points cp radius: 3m cp reward: velocity based max_v: 15 obs dim: 180 obs range: 90 to 270
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 1 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
fails some tests but solves the training maps and is fast
An initial safe model, to fine-tune its speed later
scan range: 10m maps: train 3 to 9 wps: 110 points cp radius: 3m cp reward: cp based max_v: 15 obs dim: 180 obs range: 90 to 270
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 1 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
pretty safe but slow
changed lidar view to 110 to 250
scan range: 10m maps: train 3 to 9 wps: 110 points cp radius: 3m cp reward: cp based max_v: 15 obs dim: 142 obs range: 110 to 250
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 1 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0003
fast and steady training; lowered lr to 0.0001
speed to 20
scan range: 10m maps: train 3 to 9 wps: 110 points cp radius: 3m cp reward: cp based max_v: 20 obs dim: 142 obs range: 110 to 250
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 1 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0001
faster and better reward
net to (300, 300, 100)
scan range: 10m maps: train 3 to 9 wps: 110 points cp radius: 3m cp reward: cp based max_v: 20 obs dim: 142 obs range: 110 to 250
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 1 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0001
faster, down to 23.8s, but fails a lot (maybe try a lower learning rate later)
increased scan range to 20m
scan range: 20m maps: train 3 to 9 wps: 110 points cp radius: 3m cp reward: cp based max_v: 20 obs dim: 142 obs range: 110 to 250
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 1 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0001
same as 2.0.5, but trained for 500 epochs with net (500, 500)
scan range: 10m maps: train 3 to 9 wps: 110 points cp radius: 3m cp reward: cp based max_v: 20 obs dim: 142 obs range: 110 to 250
gamma: 0.99 alg: ppo num_workers: 15 num_gpus: 1.0 kl_coeff: 1.0 clip_param: 0.2 num_envs_per_worker: 1 train_batch_size: 100000 sgd_minibatch_size: 4096 batch_mode: 'truncate_episodes' lr: .0001
candidates: v3.1.0 v3.0.3 v3.0.4