2026-05-25 | vision | multimodal | code

Channel-wise Vector Quantization

Wei Song, Tianhang Wang, Yitong Chen, Tong Zhang, Zuxuan Wu, Ming Li, Jiaqi Wang, Kaicheng Yu

Key claim

CVQ, which quantizes feature-map channels instead of spatial patches, significantly improves image reconstruction and generation quality.

The paper presents Channel-wise Vector Quantization (CVQ), an image tokenizer that assigns one codebook index to each channel of the feature map rather than to each spatial patch. On top of this tokenizer, the authors build a new visual autoregressive model for image generation. The key result is better reconstruction than conventional vector quantization: the reported ImageNet-1K scores are an rFID of 2.6 and a PSNR of 20.94, both ahead of a vanilla VQ baseline.

In plain English

Novelty

8.0/10

The introduction of channel-wise tokenization and a new autoregressive framework represents a significant advancement in image generation methods.

Reliability

7.5/10

The empirical results are strong and demonstrate improvements over conventional methods, though more extensive baselines could enhance reliability.

Deep reliability assessment

The methodology supports improved codebook utilization and reconstruction quality through channel-wise vector quantization (CVQ) compared to traditional patch-wise methods, but claims of achieving 100% codebook utilization without additional complexity may be overstated. The empirical results demonstrate significant improvements, yet the generalizability of these findings across all image types and tasks remains to be fully validated.
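For readers unfamiliar with the metric, codebook utilization is commonly measured as the fraction of codebook entries that are selected at least once over an evaluation set. The sketch below illustrates that common definition; the function name `codebook_utilization` and its arguments are hypothetical, and the paper's exact measurement protocol is not given in this summary.

```python
import numpy as np

def codebook_utilization(indices, codebook_size):
    """Fraction of codebook entries selected at least once.

    indices: integer array of token indices produced by a quantizer
             (any shape; it is flattened here).
    codebook_size: total number of entries K in the codebook.
    NOTE: a common definition, assumed here; not taken from the paper.
    """
    indices = np.asarray(indices).ravel()
    used = np.unique(indices).size  # number of distinct entries hit
    return used / codebook_size

# Entries 0, 1, and 3 are used out of 4, so utilization is 3/4.
example = codebook_utilization([0, 1, 1, 3], codebook_size=4)
```

A value of 1.0 corresponds to the 100% utilization claim discussed above: every codebook entry is chosen by at least one token.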

Reproducibility

Yes. The paper states that the code will be made available, but it does not specify the dataset used for training.

Discussion questions

1. What assumptions underlie the effectiveness of channel-wise quantization over patch-wise methods in diverse image contexts?
2. How can builders leverage CVQ in practical applications, and what are the potential trade-offs in terms of computational efficiency?
3. What specific conditions or datasets would lead to a failure of the claimed improvements in reconstruction quality and codebook utilization?

Key figure

Figure 1 illustrates the difference between conventional vector quantization (VQ) and channel-wise vector quantization (CVQ), highlighting how CVQ assigns indices to each channel of the feature map rather than to spatial patches.
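The contrast described for Figure 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a feature map of shape (C, H, W), nearest-neighbor lookup under squared Euclidean distance, a patch-wise codebook of shape (K, C), and a channel-wise codebook of shape (K, H*W). The function names are hypothetical.

```python
import numpy as np

def nearest_code(vectors, codebook):
    """Index of the nearest codebook row (squared L2) for each input row."""
    # (N, 1, D) - (1, K, D) -> (N, K) pairwise squared distances
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def patchwise_quantize(feat, codebook):
    """Conventional VQ: one index per spatial position.

    feat: (C, H, W); codebook: (K, C). Returns (H, W) indices and the
    quantized feature map of shape (C, H, W).
    """
    C, H, W = feat.shape
    tokens = feat.reshape(C, H * W).T          # (H*W, C): one vector per position
    idx = nearest_code(tokens, codebook)
    quant = codebook[idx].T.reshape(C, H, W)
    return idx.reshape(H, W), quant

def channelwise_quantize(feat, codebook):
    """Channel-wise VQ (assumed form): one index per channel.

    feat: (C, H, W); codebook: (K, H*W). Returns (C,) indices and the
    quantized feature map of shape (C, H, W).
    """
    C, H, W = feat.shape
    tokens = feat.reshape(C, H * W)            # (C, H*W): one vector per channel
    idx = nearest_code(tokens, codebook)
    quant = codebook[idx].reshape(C, H, W)
    return idx, quant

# Toy comparison: the same feature map yields H*W tokens or C tokens.
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))
patch_idx, _ = patchwise_quantize(feat, rng.normal(size=(16, 8)))
chan_idx, _ = channelwise_quantize(feat, rng.normal(size=(16, 16)))
```

In this toy setup the patch-wise tokenizer emits a 4x4 grid of indices, while the channel-wise one emits a sequence of 8 indices, one per channel. Each channel-wise code spans the whole spatial extent of its channel.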

Benchmark results

ImageNet-1K | rFID: 2.6 | vs Vanilla VQ: -2.24 | SOTA

ImageNet-1K | PSNR: 20.94 | vs Vanilla VQ: +1.01 | SOTA

MJHQ-30K | FID: 6.42 | vs LlamaGen: -19.17 | SOTA
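As context for the PSNR row, the standard definition is 10 * log10(MAX^2 / MSE), measured in dB, where MAX is the largest possible pixel value and MSE is the mean squared error between the original and reconstructed images. The sketch below implements that textbook formula. The function name and the default 8-bit range are assumptions, and this summary does not state the data range or color space the paper uses.

```python
import numpy as np

def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB (standard definition).

    max_value=255.0 assumes 8-bit images; this is an illustrative default,
    not a setting confirmed by the paper.
    """
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no noise at all
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a gain of about 1 dB corresponds to roughly a 20% reduction in mean squared error.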

Code link

The code will be available here. (Official)