PromptRL

Prompt Matters in RL for Flow-Based Image Generation

Overview

PromptRL is a framework that jointly trains language models (LMs) and flow-matching models (FMs) within a unified reinforcement learning loop for text-to-image generation. By incorporating LMs as adaptive prompt refiners, PromptRL addresses two critical limitations in current flow-based RL pipelines: exploration collapse due to insufficient generation diversity, and prompt overfitting where models memorize specific training formulations.

PromptRL achieves 2× the sample efficiency of flow-only RL while also yielding an adaptive prompt-refinement agent that improves test-time performance.
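To make the joint training idea concrete, below is a minimal, hypothetical sketch of one rollout step: the language model rewrites the user prompt, the flow model generates from the rewrite, and a single reward scores the pair so that both policies can be updated together. All names (`lm_refine`, `fm_generate`, `reward`, `joint_rl_step`) and the toy reward are illustrative stand-ins, not the actual PromptRL API; the policy-gradient update itself is omitted.

```python
# Hypothetical sketch of PromptRL's joint RL loop; names and the toy
# reward are illustrative, not the project's real API.
import random

def lm_refine(prompt):
    # Stand-in for the LM prompt refiner: appends a sampled style hint,
    # so different rollouts explore different prompt formulations.
    hints = ["photorealistic", "high detail", "soft lighting"]
    return f"{prompt}, {random.choice(hints)}"

def fm_generate(refined_prompt):
    # Stand-in for the flow-matching sampler; returns a dummy "image".
    return {"prompt": refined_prompt}

def reward(image, prompt):
    # Stand-in scorer (the paper uses rewards such as GenEval scores
    # or PickScore); here: 1.0 if the refiner added anything.
    return float(len(image["prompt"]) > len(prompt))

def joint_rl_step(prompt, num_rollouts=4):
    # Sample several (refined prompt, image) rollouts per user prompt.
    # In the real method, one reward signal would drive policy-gradient
    # updates of both the LM and the FM; here we just return the best
    # rollout as (refined_prompt, image, reward).
    rollouts = []
    for _ in range(num_rollouts):
        refined = lm_refine(prompt)
        image = fm_generate(refined)
        rollouts.append((refined, image, reward(image, prompt)))
    return max(rollouts, key=lambda r: r[2])

best = joint_rl_step("a red cube on a blue table")
print(best[0], best[2])
```

Because the LM and FM share one reward, refined prompts that help the flow model score higher are reinforced directly, which is what counters exploration collapse and prompt overfitting in the flow-only setting.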

- 🎯 Joint Training: a unified RL loop for both language and flow models
- 2× Efficiency: significantly improved sample efficiency
- 🔄 Adaptive Prompts: dynamic prompt refinement for better generation

Qualitative Results

Text-to-Image Generation

[Figure] Comparison of text-to-image generation quality across different methods

Instructional Image Editing

[Figure] Comparison of instructional image editing results

Quantitative Results

Performance Summary

PromptRL outperforms baseline methods on most benchmarks and is competitive with the best baseline on image editing.

| Benchmark | Metric | PromptRL w/ PE | Best Baseline |
|---|---|---|---|
| GenEval | Avg. Score ↑ | 0.97 | 0.92 (FlowGRPO) |
| Aesthetic | PickScore ↑ | 24.05 | 23.63 (DiffusionNFT) |
| Aesthetic | HPS ↑ | 32.03 | 31.79 (DiffusionNFT) |
| OCR | OCR-1k ↑ | 0.98 | 0.89 (FlowGRPO) |
| Image Editing | EditReward Avg. ↑ | 1.43 | 1.44 (ReasonEdit-Think) |

GenEval Benchmark (Full Results)

| Model | 1 Obj. | 2 Obj. | Cnt. | Clr. | Pos. | Attr. | Avg. ↑ |
|---|---|---|---|---|---|---|---|
| *Foundation Models* | | | | | | | |
| Show-o | 0.95 | 0.52 | 0.49 | 0.82 | 0.11 | 0.28 | 0.53 |
| Emu3-Gen | 0.98 | 0.71 | 0.34 | 0.81 | 0.17 | 0.21 | 0.54 |
| SD3 Medium | 0.98 | 0.74 | 0.63 | 0.67 | 0.34 | 0.36 | 0.62 |
| FLUX.1-dev | 0.98 | 0.81 | 0.74 | 0.79 | 0.22 | 0.45 | 0.66 |
| Qwen-Image | 0.99 | 0.92 | 0.89 | 0.88 | 0.76 | 0.77 | 0.87 |
| *RL-based Methods* | | | | | | | |
| RePrompt | 0.98 | 0.87 | 0.77 | 0.85 | 0.62 | 0.49 | 0.76 |
| FlowGRPO | 1.00 | 0.99 | 0.91 | 0.89 | 0.95 | 0.80 | 0.92 |
| DiffusionNFT | 1.00 | 0.98 | 0.74 | 0.92 | 0.85 | 0.80 | 0.88 |
| PromptRL w/o PE | 1.00 | 0.96 | 0.95 | 0.95 | 0.93 | 0.85 | 0.94 |
| PromptRL w/ PE | 1.00 | 0.99 | 0.99 | 0.96 | 0.99 | 0.90 | 0.97 |

Aesthetic & OCR Metrics

| Model | P.S. | HPS | U.R. | OCR-1k | TMDB | OpenLib |
|---|---|---|---|---|---|---|
| *Foundation Models* | | | | | | |
| SD1.5 | 20.92 | 23.71 | 2.00 | 0.05 | 0.13 | 0.08 |
| SDXL | 22.14 | 26.67 | 2.78 | 0.13 | 0.20 | 0.09 |
| FLUX.1-schnell | 22.64 | 29.39 | 3.25 | 0.54 | 0.66 | 0.50 |
| Qwen-Image | 23.05 | 30.40 | 3.53 | 0.65 | 0.79 | 0.94 |
| *RL-based Methods* | | | | | | |
| FlowGRPO | 23.33 | 29.80 | 3.33 | 0.89 | 0.83 | 0.73 |
| DiffusionNFT | 23.63 | 31.79 | 3.39 | 0.89 | 0.91 | 0.86 |
| PromptRL w/o PE | 24.01 | 31.79 | 3.38 | 0.97 | 0.92 | 0.95 |
| PromptRL w/ PE | 24.05 | 32.03 | 3.44 | 0.98 | 0.91 | 0.95 |

Image Editing - EditReward

| Model | Swap | Style | Add. | Attr. | Env. | Removal | Avg. ↑ |
|---|---|---|---|---|---|---|---|
| InstructPix2Pix | -0.24 | 0.91 | -0.45 | 0.45 | 0.48 | -0.80 | 0.02 |
| Qwen-Image-Edit | 1.11 | 1.14 | 0.95 | 0.90 | 1.39 | 0.61 | 1.03 |
| FLUX.2-klein | 1.42 | 1.73 | 1.29 | 1.42 | 1.80 | 0.32 | 1.34 |
| ReasonEdit-Think | 1.52 | 1.47 | 1.19 | 1.44 | 1.69 | 1.27 | 1.44 |
| PromptRL w/o PE | 1.45 | 1.46 | 1.28 | 1.35 | 1.56 | 0.98 | 1.36 |
| PromptRL w/ PE | 1.47 | 1.43 | 1.29 | 1.39 | 1.72 | 1.24 | 1.43 |

Installation

Follow these steps to set up PromptRL on your system:

# Create and activate the conda environment
conda env create -f environment.yml
conda activate unirl

# Install dependencies from source
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/huggingface/diffusers.git
pip install flash-attn==2.7.4.post1 --no-build-isolation

# Run evaluation
bash gen.sh

Citation

If you find our work useful, please consider citing our papers:

@article{wang2025promptrl,
  title={PromptRL: Prompt Matters in RL for Flow-Based Image Generation},
  author={Wang, Fu-Yun and Zhang, Han and Gharbi, Michael and Li, Hongsheng and Park, Taesung},
  journal={arXiv preprint arXiv:2602.01382},
  year={2026}
}
@article{wang2025unirl,
  title={UniRL-Zero: Reinforcement Learning on Unified Models with Joint Language Model and Diffusion Model Experts},
  author={Wang, Fu-Yun and Zhang, Han and Gharbi, Michael and Li, Hongsheng and Park, Taesung},
  journal={arXiv preprint arXiv:2510.17937},
  year={2025}
}

Acknowledgments

This codebase builds upon UniRL-Zero.