Teaching the Donkey Car to drive a track in the simulator using State Representation Learning and several Reinforcement Learning algorithms, including Deep Q-Network (DQN), Soft Actor-Critic (SAC), and Proximal Policy Optimization (PPO).
Building an LLM with RLHF by fine-tuning on human-labeled preferences. Based on "Learning to Summarize from Human Feedback", it combines supervised fine-tuning, reward modeling, and PPO to improve response quality and alignment.
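As a rough illustration of the reward modeling step named in this description, the sketch below computes the pairwise preference loss commonly used for that stage: the negative log-sigmoid of the difference between the reward given to the human-preferred response and the reward given to the rejected one. The function names and the plain-float inputs are assumptions made for this example; they are not taken from the repository, which would typically compute the same quantity on batched tensors.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    response above the rejected one, and large when it ranks them the
    wrong way round.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m
    # so that exp() is only ever called on a non-positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(pairs: list[tuple[float, float]]) -> float:
    """Average the pairwise loss over (chosen, rejected) reward pairs."""
    return sum(pairwise_preference_loss(c, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical reward scores for three labeled comparisons.
    example_pairs = [(1.2, 0.3), (0.1, 0.4), (2.0, -1.0)]
    print(mean_preference_loss(example_pairs))
```

Minimizing this loss pushes the reward model to score preferred responses higher; the trained reward model then supplies the scalar reward that PPO optimizes in the final stage.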