Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023)
Updated Jul 20, 2024 - Python
Code for the paper "Reward Design for Justifiable Sequential Decision-Making" (ICLR 2024)
Share RL study material