We show that the design space of LCM is limited in three respects. We propose PCM, which generalizes the design space and effectively tackles these limitations.
The consistency model (CM) is a promising new family of generative models known for high-quality yet fast generation. The latent consistency model (LCM) extends CM to the latent space for text-conditioned high-resolution generation, but its results are unsatisfactory. In this work, we show that the current design of LCM is flawed in three respects. We propose the phased consistency model (PCM), which generalizes the design space of LCM and effectively tackles those limitations. Innovative strategies are proposed for both training and inference to improve generation quality. Extensive experimental results covering 1-, 2-, 4-, 8-, and 16-step settings with the widely used Stable Diffusion and Stable Diffusion XL foundation models validate the advancements of PCM.
The latent consistency model has three main limitations. (1) LCM only accepts CFG scales below 2; larger values cause over-exposure, and LCM is insensitive to negative prompts (see the sketch below). (2) LCM fails to produce consistent results across different numbers of inference steps; its outputs are blurry when the step count is too small or too large. (3) The loss term of LCM fails to enforce distribution consistency, producing poor results in the low-step regime. In this work, we investigate the reasons behind these limitations and propose PCM, which tackles all of them.
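As a concrete illustration of limitation (1), below is a minimal sketch of LCM-accelerated inference using the Hugging Face diffusers library; the model IDs, prompt, and parameter values are our assumptions for illustration, not prescribed by the paper.

import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Load an SDXL pipeline, then swap in the LCM scheduler and LCM-LoRA weights.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# Limitation (1): LCM is only stable with guidance_scale roughly in [1, 2];
# larger values over-expose the image, and negative prompts have little effect.
image = pipe(
    prompt="a photo of an astronaut riding a horse",
    negative_prompt="blurry, low quality",  # largely ignored by LCM
    num_inference_steps=4,
    guidance_scale=1.5,
).images[0]
image.save("lcm_sample.png")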
@article{wang2024phased,
  author = {Fu-Yun Wang and Zhaoyang Huang and Alexander William Bergman and Dazhong Shen and Peng Gao and Michael Lingelbach and Keqiang Sun and Weikang Bian and Guanglu Song and Yu Liu and Hongsheng Li and Xiaogang Wang},
  title  = {Phased Consistency Model},
  year   = {2024},
}