We show that the design space of LCM is limited in three respects. We propose PCM, which generalizes the design space and effectively tackles these limitations.
The consistency model (CM) is a promising new family of generative models known for high-quality yet fast generation. The latent consistency model (LCM) extends CM to the latent space for text-conditioned high-resolution generation, but its results are unsatisfactory. In this work, we show that the current design of LCM is flawed in three respects. We propose the phased consistency model (PCM), which generalizes the design space of LCM and effectively tackles those limitations. Innovative strategies are proposed for both training and inference to improve generation quality. Extensive experimental results covering 1-, 2-, 4-, 8-, and 16-step settings with the widely used Stable Diffusion and Stable Diffusion XL foundation models validate the advancements of PCM.
The latent consistency model has three main limitations. (1) LCM only accepts CFG scales below 2; larger values cause over-exposure, and LCM is insensitive to negative prompts (see the sketch below). (2) LCM fails to produce consistent results across different numbers of inference steps; its outputs are blurry when the step count is too small or too large. (3) The loss term of LCM fails to enforce distribution consistency, producing poor results in the low-step regime. In this work, we investigate the reasons behind these limitations and propose PCM, which tackles all of them.
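As a concrete illustration of limitation (1), below is a minimal sketch of LCM-accelerated inference using the Hugging Face diffusers library; the model IDs, prompt, and parameter values are our assumptions for illustration, not prescribed by the paper.

import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Load an SDXL pipeline, then swap in the LCM scheduler and LCM-LoRA weights.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# Limitation (1): LCM is only stable with guidance_scale roughly in [1, 2];
# larger values over-expose the image, and negative prompts have little effect.
image = pipe(
    prompt="a photo of an astronaut riding a horse",
    negative_prompt="blurry, low quality",  # largely ignored by LCM
    num_inference_steps=4,
    guidance_scale=1.5,
).images[0]
image.save("lcm_sample.png")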
@article{wang2024phased,
  author = {Fu-Yun Wang and Zhaoyang Huang and Alexander William Bergman and Dazhong Shen and Peng Gao and Michael Lingelbach and Keqiang Sun and Weikang Bian and Guanglu Song and Yu Liu and Hongsheng Li and Xiaogang Wang},
  title  = {Phased Consistency Model},
  year   = {2024},
}