ConsistencySolver

Image Diffusion Preview with Consistency Solver

CVPR 2026

Fu-Yun Wang1,2, Hao Zhou1, Liangzhe Yuan1, Sanghyun Woo1, Boqing Gong1, Bohyung Han1,3, Ming-Hsuan Yang1, Han Zhang1, Yukun Zhu1, Ting Liu1, Long Zhao1
1Google DeepMind   2The Chinese University of Hong Kong   3Seoul National University

Overview

The slow inference process of image diffusion models significantly degrades the interactive user experience. ConsistencySolver introduces a preview-and-refine paradigm: users first get fast low-step previews, then run expensive full-step sampling only on satisfactory ones.

ConsistencySolver is a lightweight, learnable high-order ODE solver derived from general linear multistep methods and optimized via reinforcement learning (PPO). It achieves FID scores on par with Multistep DPM-Solver using 47% fewer steps, while reducing overall user interaction time by nearly 50%.

Fidelity

Previews closely resemble final outputs in visual and structural quality, enabling informed user decisions.

Efficiency

Minimizes computational overhead for rapid iteration, letting users explore multiple variations quickly.

Consistency

Preserves deterministic PF-ODE mapping so refining a satisfactory preview always produces an aligned final result.
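The consistency property above hinges on the fact that a PF-ODE sampler is deterministic: preview and refinement start from the same initial noise and integrate the same ODE, only with different step counts. The toy NumPy sketch below illustrates this with a stand-in identity "denoiser" (the sampler and `denoise` function are illustrative, not the actual model):

```python
import numpy as np

def sample(denoise_fn, noise, num_steps):
    """Toy deterministic ODE sampler: same noise always yields the same trajectory."""
    x = noise
    for t in np.linspace(1.0, 1.0 / num_steps, num_steps):
        x = x - (1.0 / num_steps) * denoise_fn(x, t)  # simple Euler-style update
    return x

rng = np.random.default_rng(0)
noise = rng.standard_normal(8)      # fixed seed, shared by preview and refinement
denoise = lambda x, t: x            # stand-in for the diffusion model's prediction

preview = sample(denoise, noise, num_steps=5)    # fast low-step preview
final   = sample(denoise, noise, num_steps=50)   # full-step refinement, same noise
```

Because both runs integrate the same deterministic ODE from the same starting noise, the low-step preview and the full-step result land near the same endpoint, which is what makes a preview informative about the refined output.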

Method

ConsistencySolver is derived from classical Linear Multistep Methods (LMMs), adapted to the simplified PF-ODE. The update rule at each step is:

\[ \mathbf{y}_{t_{i+1}} = \mathbf{y}_{t_i} + (n_{t_{i+1}} - n_{t_i}) \cdot \biggl[ \sum_{j=1}^{m} w_j(t_i, t_{i+1}) \cdot \boldsymbol{\epsilon}_{i+1-j} \biggr] \]

where \(\mathbf{y}_t = \mathbf{x}_t / \alpha_t\) is the signal-normalized state, \(n_t = \sigma_t / \alpha_t\) is the noise-to-signal ratio, and \(\boldsymbol{\epsilon}_{i+1-j}\) are cached noise predictions. The adaptive coefficients \(w_j(t_i, t_{i+1})\) are generated by a lightweight MLP policy, trained via PPO to maximize preview-target similarity.

This framework unifies existing solvers as special cases: DDIM (\(m{=}1, w_1{=}1\)), PNDM (4-step Adams-Bashforth with fixed coefficients), and DPM-Solver-2 (alternating 1- and 2-step updates). ConsistencySolver replaces these fixed coefficients with learned ones, optimized end-to-end.
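The update rule is straightforward to implement. Below is a minimal NumPy sketch of one solver step, with the learned MLP policy replaced by externally supplied weights; the function name and signature are illustrative, not the authors' implementation:

```python
import numpy as np

def lmm_step(y, eps_cache, n_cur, n_next, weights):
    """One linear multistep update on the signal-normalized state.

    y         : current state y_t = x_t / alpha_t
    eps_cache : cached noise predictions, most recent first
    n_cur     : noise-to-signal ratio sigma_t / alpha_t at t_i
    n_next    : noise-to-signal ratio at t_{i+1}
    weights   : coefficients w_j; in ConsistencySolver these come from the
                learned MLP policy, here they are simply passed in
    """
    m = len(weights)
    # Weighted combination of the m most recent cached noise predictions
    direction = sum(w * e for w, e in zip(weights, eps_cache[:m]))
    return y + (n_next - n_cur) * direction

# DDIM is recovered as the special case m = 1, w_1 = 1:
y = np.zeros(4)
eps = [np.ones(4)]
y_next = lmm_step(y, eps, n_cur=1.0, n_next=0.5, weights=[1.0])
```

With two cached predictions and weights `[1.5, -0.5]`, the same function reproduces a second-order Adams-Bashforth step, illustrating how fixed-coefficient solvers fall out of this template.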

For a detailed mathematical derivation, see our companion theory blog.

ConsistencySolver Method Overview
Overview of ConsistencySolver. A lightweight MLP policy generates adaptive coefficients for the linear multistep update, trained via PPO to maximize preview–target similarity.

Qualitative Results

Text-to-Image Generation

Text-to-Image Comparison
Visual comparison on Stable Diffusion. ConsistencySolver produces previews with sharper details and superior alignment to refined outputs.

Instructional Image Editing

Image Editing Comparison
Visual comparison on FLUX.1-Kontext for instructional image editing. Previews generated with 5 inference steps.

Quantitative Results

Stable Diffusion (Text-to-Image)

| Method | Steps | FID ↓ | CLIP ↑ | Seg. ↑ | Dep. ↑ | Inc. ↑ | Img. ↑ | DINO ↑ |
|---|---|---|---|---|---|---|---|---|
| **Training-Free ODE Solvers** | | | | | | | | |
| DDIM | 5 | 52.59 | 87.8 | 41.9 | 14.2 | 74.1 | 16.4 | 73.2 |
| Multistep DPM | 5 | 25.87 | 93.1 | 66.6 | 19.1 | 85.6 | 20.6 | 85.5 |
| Multistep DPM | 8 | 19.53 | 95.9 | 76.3 | 21.8 | 90.8 | 23.2 | 90.6 |
| Multistep DPM | 10 | 19.29 | 97.0 | 80.5 | 24.1 | 93.1 | 25.1 | 93.0 |
| **Distillation-Based Methods** | | | | | | | | |
| DMD2 | 1 | 19.88 | 89.3 | 42.1 | 12.6 | 70.5 | 12.1 | 73.8 |
| Rectified Diff. | 4 | 20.64 | 94.4 | 67.6 | 18.5 | 87.0 | 19.7 | 85.6 |
| **Distillation-Based Solvers** | | | | | | | | |
| AMED | 8 | 19.22 | 94.9 | 72.4 | 20.0 | 88.3 | 20.5 | 88.8 |
| Ours-Distill | 8 | 19.65 | 95.1 | 74.0 | 20.8 | 89.3 | 21.1 | 89.5 |
| **Proposed Method** | | | | | | | | |
| ConsistencySolver | 5 | 20.39 | 94.2 | 69.4 | 19.3 | 87.1 | 20.8 | 86.5 |
| ConsistencySolver | 8 | 18.82 | 96.4 | 78.5 | 22.2 | 91.6 | 23.4 | 91.2 |
| ConsistencySolver | 10 | 18.66 | 97.2 | 83.2 | 24.9 | 93.9 | 25.3 | 93.5 |
| ConsistencySolver | 12 | 18.53 | 97.9 | 85.6 | 26.7 | 95.1 | 26.7 | 95.0 |

FLUX.1-Kontext (Image Editing)

| Method | Steps | E. R. ↑ | E. S. ↑ | DINO ↑ | Inc. ↑ | CLIP ↑ | Dep. ↑ |
|---|---|---|---|---|---|---|---|
| Euler | 4 | 0.61 | 5.45 | 91.31 | 86.75 | 93.95 | 23.99 |
| Euler | 5 | 0.79 | 5.80 | 93.09 | 89.16 | 95.25 | 24.76 |
| Multistep DPM | 4 | 0.72 | 5.57 | 91.83 | 88.12 | 94.49 | 23.70 |
| Multistep DPM | 5 | 0.83 | 5.92 | 93.44 | 90.17 | 95.53 | 24.59 |
| ConsistencySolver | 4 | 0.73 | 5.67 | 92.39 | 88.71 | 94.86 | 24.27 |
| ConsistencySolver | 5 | 0.86 | 6.02 | 93.90 | 90.76 | 95.87 | 25.18 |

Cross-Model Generalization

ConsistencySolver trained on SD1.5 transfers zero-shot to unseen models, with no retraining required.

| Model | Steps | Method | FID ↓ | CLIP ↑ |
|---|---|---|---|---|
| SDXL | 10 | Multistep DPM | 26.32 | 32.52 |
| SDXL | 10 | ConsistencySolver | 23.32 | 33.45 |
| SD1.4 | 5 | Multistep DPM | 25.22 | 29.94 |
| SD1.4 | 5 | ConsistencySolver | 20.22 | 30.16 |

User Study: Preview Efficiency

The preview-and-refine workflow reduces end-to-end inference time by up to 55% while maintaining generation quality.

| Evaluator | GenEval Full | GenEval Preview | COCO 2017 Full | COCO 2017 Preview | LAION Full | LAION Preview | Speedup |
|---|---|---|---|---|---|---|---|
| Claude Sonnet 4 | 2.88s | 1.74s | 3.64s | 1.85s | 6.35s | 2.87s | 1.88x |
| Human | 3.82s | 2.16s | 3.52s | 2.03s | 5.18s | 2.58s | 1.85x |

Citation

If you find our work useful, please consider citing:

@inproceedings{wang2025consolver,
  title={Image Diffusion Preview with Consistency Solver},
  author={Wang, Fu-Yun and Zhou, Hao and Yuan, Liangzhe and Woo, Sanghyun and Gong, Boqing and Han, Bohyung and Yang, Ming-Hsuan and Zhang, Han and Zhu, Yukun and Liu, Ting and Zhao, Long},
  booktitle={CVPR},
  year={2026}
}

Acknowledgments

We thank Hartwig Adam and Florian Schroff from Google DeepMind for their support of this project, and Prof. Hongsheng Li from CUHK for his valuable advice.