PromptRL is a framework that jointly trains language models (LMs) and flow-matching models (FMs) within a unified reinforcement learning loop for text-to-image generation. By incorporating LMs as adaptive prompt refiners, PromptRL addresses two critical limitations in current flow-based RL pipelines: exploration collapse due to insufficient generation diversity, and prompt overfitting where models memorize specific training formulations.
PromptRL is 2× as sample-efficient as flow-only RL, and it also yields an adaptive prompt refinement agent that improves test-time performance.
- Unified RL loop for both language and flow models
- Significantly improved sample efficiency
- Dynamic prompt refinement for better generation
PromptRL matches or outperforms the best baseline on most benchmarks: it leads on GenEval, PickScore, HPS, and OCR-1k, and trails ReasonEdit-Think by 0.01 on image editing.
| Benchmark | Metric | PromptRL w/ PE | Best Baseline |
|---|---|---|---|
| GenEval | Avg. Score ↑ | 0.97 | 0.92 (FlowGRPO) |
| Aesthetic | PickScore ↑ | 24.05 | 23.63 (DiffusionNFT) |
| Aesthetic | HPS ↑ | 32.03 | 31.79 (DiffusionNFT) |
| OCR | OCR-1k ↑ | 0.98 | 0.89 (FlowGRPO) |
| Image Editing | EditReward Avg. ↑ | 1.43 | 1.44 (ReasonEdit-Think) |
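
The margins over the best baseline can be read off the summary table directly. The short Python sketch below only restates that arithmetic; the dictionary keys and the `margins` helper are illustrative names, not part of the PromptRL codebase.

```python
# Summary scores copied from the table above: (PromptRL w/ PE, best baseline).
summary = {
    "GenEval Avg.": (0.97, 0.92),
    "PickScore": (24.05, 23.63),
    "HPS": (32.03, 31.79),
    "OCR-1k": (0.98, 0.89),
    "EditReward Avg.": (1.43, 1.44),
}


def margins(scores):
    """Return PromptRL-minus-baseline margins, rounded to two decimals."""
    return {name: round(ours - base, 2) for name, (ours, base) in scores.items()}


if __name__ == "__main__":
    for name, delta in margins(summary).items():
        print(f"{name}: {delta:+.2f}")
```

A positive margin means PromptRL w/ PE scores higher; only the EditReward average comes out negative.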
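
GenEval results by category (single object, two objects, counting, colors, position, attribute binding):

| Model | 1 Obj. | 2 Obj. | Cnt. | Clr. | Pos. | Attr. | Avg. ↑ |
|---|---|---|---|---|---|---|---|
| Foundation Models | | | | | | | |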
| Show-o | 0.95 | 0.52 | 0.49 | 0.82 | 0.11 | 0.28 | 0.53 |
| Emu3-Gen | 0.98 | 0.71 | 0.34 | 0.81 | 0.17 | 0.21 | 0.54 |
| SD3 Medium | 0.98 | 0.74 | 0.63 | 0.67 | 0.34 | 0.36 | 0.62 |
| FLUX.1-dev | 0.98 | 0.81 | 0.74 | 0.79 | 0.22 | 0.45 | 0.66 |
| Qwen-Image | 0.99 | 0.92 | 0.89 | 0.88 | 0.76 | 0.77 | 0.87 |
| RL-based Methods | | | | | | | |
| RePrompt | 0.98 | 0.87 | 0.77 | 0.85 | 0.62 | 0.49 | 0.76 |
| FlowGRPO | 1.00 | 0.99 | 0.91 | 0.89 | 0.95 | 0.80 | 0.92 |
| DiffusionNFT | 1.00 | 0.98 | 0.74 | 0.92 | 0.85 | 0.80 | 0.88 |
| PromptRL w/o PE | 1.00 | 0.96 | 0.95 | 0.95 | 0.93 | 0.85 | 0.94 |
| PromptRL w/ PE | 1.00 | 0.99 | 0.99 | 0.96 | 0.99 | 0.90 | 0.97 |
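
As a quick consistency check, the Avg. column in the GenEval table appears to be the unweighted mean of the six category scores. The Python sketch below recomputes it for three rows copied from the table; the `mean_score` helper and the data layout are illustrative, and the unweighted-mean reading is an assumption that these rows happen to satisfy.

```python
# Per-category GenEval scores copied from the table above, in column order:
# 1 Obj., 2 Obj., Cnt., Clr., Pos., Attr. The last value is the reported Avg.
geneval_rows = {
    "FlowGRPO": ([1.00, 0.99, 0.91, 0.89, 0.95, 0.80], 0.92),
    "PromptRL w/o PE": ([1.00, 0.96, 0.95, 0.95, 0.93, 0.85], 0.94),
    "PromptRL w/ PE": ([1.00, 0.99, 0.99, 0.96, 0.99, 0.90], 0.97),
}


def mean_score(scores):
    """Unweighted mean of the per-category scores (assumed averaging rule)."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for model, (scores, reported) in geneval_rows.items():
        print(f"{model}: computed {mean_score(scores):.2f}, reported {reported:.2f}")
```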

Aesthetic and OCR results:

| Model | PickScore | HPS | U.R. | OCR-1k | TMDB | OpenLib |
|---|---|---|---|---|---|---|
| Foundation Models | | | | | | |
| SD1.5 | 20.92 | 23.71 | 2.00 | 0.05 | 0.13 | 0.08 |
| SDXL | 22.14 | 26.67 | 2.78 | 0.13 | 0.20 | 0.09 |
| FLUX.1-schnell | 22.64 | 29.39 | 3.25 | 0.54 | 0.66 | 0.50 |
| Qwen-Image | 23.05 | 30.40 | 3.53 | 0.65 | 0.79 | 0.94 |
| RL-based Methods | | | | | | |
| FlowGRPO | 23.33 | 29.80 | 3.33 | 0.89 | 0.83 | 0.73 |
| DiffusionNFT | 23.63 | 31.79 | 3.39 | 0.89 | 0.91 | 0.86 |
| PromptRL w/o PE | 24.01 | 31.79 | 3.38 | 0.97 | 0.92 | 0.95 |
| PromptRL w/ PE | 24.05 | 32.03 | 3.44 | 0.98 | 0.91 | 0.95 |

Image editing results (EditReward):

| Model | Swap | Style | Add. | Attr. | Env. | Removal | Avg. ↑ |
|---|---|---|---|---|---|---|---|
| InstructPix2Pix | -0.24 | 0.91 | -0.45 | 0.45 | 0.48 | -0.80 | 0.02 |
| Qwen-Image-Edit | 1.11 | 1.14 | 0.95 | 0.90 | 1.39 | 0.61 | 1.03 |
| FLUX.2-klein | 1.42 | 1.73 | 1.29 | 1.42 | 1.80 | 0.32 | 1.34 |
| ReasonEdit-Think | 1.52 | 1.47 | 1.19 | 1.44 | 1.69 | 1.27 | 1.44 |
| PromptRL w/o PE | 1.45 | 1.46 | 1.28 | 1.35 | 1.56 | 0.98 | 1.36 |
| PromptRL w/ PE | 1.47 | 1.43 | 1.29 | 1.39 | 1.72 | 1.24 | 1.43 |
Follow these steps to set up PromptRL on your system:

```bash
# Create and activate the conda environment
conda env create -f environment.yml
conda activate unirl

# Install additional dependencies
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/huggingface/diffusers.git
pip install flash-attn==2.7.4.post1 --no-build-isolation

# Run evaluation
bash gen.sh
```
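
Before running the evaluation, it can help to confirm that the pip-installed packages are importable inside the `unirl` environment. The Python sketch below is not part of the repository. It assumes the usual import names for these packages (`clip`, `diffusers`, `flash_attn`), and `check_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the packages installed in the setup steps.
REQUIRED = ("clip", "diffusers", "flash_attn")


def check_modules(names=REQUIRED):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

The check only locates the modules without importing them, so it will not catch a `flash-attn` build that is installed but incompatible with the local CUDA or PyTorch version.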
If you find our work useful, please consider citing our papers:

```bibtex
@article{wang2025promptrl,
  title={PromptRL: Prompt Matters in RL for Flow-Based Image Generation},
  author={Wang, Fu-Yun and Zhang, Han and Gharbi, Michael and Li, Hongsheng and Park, Taesung},
  journal={arXiv preprint arXiv:2602.01382},
  year={2026}
}

@article{wang2025unirl,
  title={UniRL-Zero: Reinforcement Learning on Unified Models with Joint Language Model and Diffusion Model Experts},
  author={Wang, Fu-Yun and Zhang, Han and Gharbi, Michael and Li, Hongsheng and Park, Taesung},
  journal={arXiv preprint arXiv:2510.17937},
  year={2025}
}
```

This codebase builds upon UniRL-Zero.