- [2025.03.20]: 🔥 The pre-trained models are released!
- [2025.03.20]: 🔥 The source code is publicly available in this repository!
This repository contains the official PyTorch implementation of the paper “Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization”.
In this work, we analyze the challenges that arise when pixel-level reward models are used for step-level preference optimization of diffusion models. Based on the insight that diffusion models possess inherent text-image alignment abilities and can perceive noisy latent images across different timesteps, we propose the Latent Reward Model (LRM), which repurposes a diffusion model for step-level reward modeling. We further introduce Latent Preference Optimization (LPO), a method that employs the LRM for step-level preference optimization and operates entirely within the latent space.
Extensive experiments demonstrate that LPO significantly improves the image quality of various diffusion models and consistently outperforms existing DPO and SPO methods on general, aesthetic, and alignment preferences. Moreover, LPO exhibits remarkable training efficiency, achieving a speedup of 10-28× over Diffusion-DPO and 2.5-3.5× over SPO.
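For intuition, here is a minimal sketch of the step-level preference idea described above. It is not the training code in this repository: the `policy.sample_step`, `policy.log_prob`, and `latent_reward_model` interfaces, as well as the simplified DPO-style loss, are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def lpo_step_loss(policy, reference, latent_reward_model,
                  x_t, t, prompt_emb, beta=0.1, num_candidates=4):
    """Illustrative step-level preference loss computed in latent space.

    All module interfaces here are hypothetical placeholders, not the
    actual APIs of this codebase.
    """
    # At denoising step t, sample several candidate next latents from the policy.
    candidates = [policy.sample_step(x_t, t, prompt_emb) for _ in range(num_candidates)]

    # Score each noisy candidate directly in latent space with the noise-aware
    # latent reward model (no VAE decoding to pixel space).
    rewards = torch.stack([latent_reward_model(c, t, prompt_emb) for c in candidates])
    win = candidates[rewards.argmax()]
    lose = candidates[rewards.argmin()]

    # DPO-style objective on step-level log-probabilities, regularized by a
    # frozen reference model.
    logr_w = policy.log_prob(win, x_t, t, prompt_emb) - reference.log_prob(win, x_t, t, prompt_emb)
    logr_l = policy.log_prob(lose, x_t, t, prompt_emb) - reference.log_prob(lose, x_t, t, prompt_emb)
    return -F.logsigmoid(beta * (logr_w - logr_l))
```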
Please refer to the paper for more details.
If you find this repository helpful, please consider giving it a star ⭐ and citing:
@article{zhang2025diffusion,
  title={Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization},
  author={Zhang, Tao and Da, Cheng and Ding, Kun and Jin, Kun and Li, Yan and Gao, Tingting and Zhang, Di and Xiang, Shiming and Pan, Chunhong},
  journal={arXiv preprint arXiv:2502.01051},
  year={2025}
}
This codebase is built upon the PickScore repository and the SPO repository. Thanks for their great work!