Video Explanations and Tutorials

This page contains videos explaining the concepts in the training process and tutorials on how to run portions of the code.

We recommend playing these videos at 1.5x speed.

Video: Conducting an episode

Topics:

  • Input/Output of experiments
  • Episode Loop Structure (Grasping + Lifting Stages)
  • Variable Speed Controller example run

This video walks through how we conduct our episode loop (grasping + lifting stages) using an example run of the Variable-Speed controller. It also takes a look at how we get our action from our controllers through the get_action() function located in expert_data.py. Through this tutorial, we can also see how the code starts from the command-line input, sets up the directory structure for each run, and finally generates the saved output (coordinates and plots). Note: at the end of the video, the recommended Python editor is PyCharm (not PyTorch lol!)
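As a companion to the video, here is a minimal sketch of the two-stage episode loop, assuming a Gym-style environment and a controller that exposes get_action() as in expert_data.py; the lift helper and the step limits shown here are hypothetical placeholders rather than the repo's actual values.

```python
# Minimal sketch of the grasping + lifting episode loop described above.
# Assumes a Gym-style env and a controller exposing get_action(obs) as in
# expert_data.py; get_lift_action() and the max step counts are
# hypothetical placeholders, not the repository's actual names/values.

def run_episode(env, controller, max_grasp_steps=100, max_lift_steps=50):
    """Run one episode: grasp stage driven by the controller, then lift."""
    trajectory = []
    obs = env.reset()

    # --- Grasping stage: controller closes the fingers around the object ---
    for _ in range(max_grasp_steps):
        action = controller.get_action(obs)      # e.g. variable-speed finger velocities
        next_obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            return trajectory

    # --- Lifting stage: move the hand upward to test whether the grasp holds ---
    for _ in range(max_lift_steps):
        action = controller.get_lift_action(obs)  # hypothetical lift helper
        next_obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break

    return trajectory
```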

Video: DDPGfD Overview

Topics:

  • DDPGfD algorithm
  • Actor/Critic network setup
  • Behavior Cloning loss

This video compares the DDPGfD algorithm to our current implementation. It also discusses some of the additions we have made (Behavior Cloning loss).
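For reference, below is a minimal sketch of how a behavior-cloning term can be folded into the DDPG actor loss, which is the kind of addition the video discusses. The network interfaces, the expert-batch variables, and the lambda_bc weight are assumptions for illustration, not the exact names used in our implementation.

```python
# Sketch of adding a behavior-cloning (BC) term to the DDPG actor loss.
# actor(s) returns an action; critic(s, a) returns Q(s, a). Variable names
# and the lambda_bc weight are illustrative assumptions.
import torch.nn.functional as F

def actor_loss_with_bc(actor, critic, states, expert_states, expert_actions,
                       lambda_bc=1.0):
    # Standard DDPG actor objective: maximize Q(s, pi(s)), i.e. minimize -Q.
    ddpg_loss = -critic(states, actor(states)).mean()

    # Behavior-cloning loss: push the policy toward the expert's actions
    # on states drawn from the expert (demonstration) replay buffer.
    bc_loss = F.mse_loss(actor(expert_states), expert_actions)

    return ddpg_loss + lambda_bc * bc_loss
```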

Video: Training the Policy (Sample and Update -- Conceptual Diagram)

Topics:

  • Sampling from the replay buffer (agent and expert)
  • Sampling N-steps
  • Updating the policy

This video walks through the diagram showing our current process for training the policy. Training the policy includes sampling experience (trajectories) from the replay buffer and updating the network weights by minimizing the loss between the target and current actor-critic networks. Miro Board (with original diagram):
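As a rough illustration of the sampling step in that diagram, the sketch below mixes agent and expert transitions from their replay buffers and computes an N-step return. The buffer interface, the expert/agent mixing ratio, and the discount value are assumptions, not the project's actual settings.

```python
# Rough sketch of the sampling step: draw a mix of agent and expert
# transitions, then form N-step returns before the network update.
# The list-based buffers, expert_frac, gamma, and n are assumptions.
import random

def sample_mixed_batch(agent_buffer, expert_buffer, batch_size, expert_frac=0.25):
    """Sample a batch that mixes agent experience with expert demonstrations."""
    n_expert = int(batch_size * expert_frac)
    batch = random.sample(expert_buffer, n_expert)
    batch += random.sample(agent_buffer, batch_size - n_expert)
    return batch

def n_step_return(rewards, gamma=0.995, n=5):
    """Discounted sum of the next n rewards (the N-step target component)."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards[:n]))
```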

Video: Training Pipeline + How to run each stage

Topics:

  • RL Training Pipeline (Controller-->Pre-training-->Training)
  • Variation input (Baseline, Baseline + HOV, Shapes + HOV, Sizes + HOV, Orientations + HOV)
  • Changing the command-line input based on the experiment type (Controller/Policy, Variation input)

This video steps through the training pipeline starting with the controllers, followed by pre-training, and finally training the policy with varied inputs. For each stage, we take a look at example commands that can be used to generate experiments within the training pipeline.
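As an illustration of how the command-line input could switch between stages and variation inputs, the sketch below uses argparse with placeholder flag names (--mode, --variation); consult the repo's actual entry point for the real flags demonstrated in the video.

```python
# Illustrative sketch of stage/variation selection from the command line.
# The flag names and choices below are placeholders for this example, not
# the repository's actual argparse setup.
import argparse

parser = argparse.ArgumentParser(description="Run one stage of the training pipeline")
parser.add_argument("--mode", choices=["controller", "pretrain", "train"],
                    required=True, help="Pipeline stage to run")
parser.add_argument("--variation",
                    choices=["baseline", "baseline_hov", "shapes_hov",
                             "sizes_hov", "orientations_hov"],
                    default="baseline", help="Object/hand variation input")
args = parser.parse_args()

print(f"Running stage '{args.mode}' with variation '{args.variation}'")
```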
