LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws
Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu, Yuan Yang, Chen Zheng, Thomas Hartvigsen, Yiyuan Ma
Key claim
An insufficient signal-to-noise ratio (SNR) leads to U-shaped performance degradation.
The paper presents the Shannon Scaling Law, which models LLM training as information transmission over a noisy channel and so captures the effect of noise on performance. Its key result is that performance degrades when the signal-to-noise ratio is not kept sufficiently high, and the framework predicts this degradation.
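For reference, the channel analogy presumably draws on the classical Shannon-Hartley theorem, C = B log2(1 + SNR), where B is the bandwidth and SNR is the linear signal-to-noise ratio. The sketch below implements only that textbook formula, not the paper's fitted scaling law; the function name `shannon_capacity` and the example values are illustrative assumptions.

```python
import math


def shannon_capacity(bandwidth: float, snr_linear: float) -> float:
    """Textbook Shannon-Hartley capacity: C = B * log2(1 + SNR).

    `bandwidth` is in Hz and `snr_linear` is a power ratio (not dB),
    so the result is in bits per second. This is NOT the paper's
    Shannon Scaling Law, only the classical formula behind the analogy.
    """
    if bandwidth < 0 or snr_linear < 0:
        raise ValueError("bandwidth and snr_linear must be non-negative")
    return bandwidth * math.log2(1.0 + snr_linear)


# At fixed bandwidth, capacity shrinks as the SNR falls (illustrative values).
high_snr = shannon_capacity(1.0, 3.0)  # log2(4) = 2.0
low_snr = shannon_capacity(1.0, 1.0)   # log2(2) = 1.0
no_signal = shannon_capacity(1.0, 0.0)  # log2(1) = 0.0
```

In this classical form, adding bandwidth helps only as long as the SNR is preserved, which is the intuition the key claim appeals to.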
The Shannon Scaling Law introduces a new theoretical framework for understanding LLM training.
The experiments validate the theory against multiple perturbations with strong performance metrics.
Deep reliability assessment
The methodology supports modeling scaling laws with a noisy-channel framework that captures both monotonic improvement and U-shaped degradation. However, the claim of universal applicability across all perturbation scenarios may be overstated without further empirical validation on more diverse datasets and models.
Reproducibility
The paper does not explicitly mention open-source code or a released dataset.
Discussion questions
- How well does the assumption that model parameters map to channel bandwidth hold across different architectures and tasks?
- What are the practical implications of the Shannon Scaling Law for optimizing model training in resource-constrained environments?
- What specific experimental results or scenarios would falsify the claims made by the Shannon Scaling Law?
Key figure
Figure 1 contrasts the loss landscapes of pretraining and downstream supervised fine-tuning: pretraining improves monotonically, whereas fine-tuning shows a loss basin, indicating that performance degrades beyond a critical threshold.
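One simple way to distinguish a basin from a monotonic curve in measured losses is to check whether the minimum falls strictly inside the sweep. The sketch below is a hypothetical helper, not a procedure from the paper; the name `find_loss_basin` and the sample numbers are made up for illustration, and real curves would need smoothing or repeated runs before applying such a test.

```python
from typing import Optional, Sequence, Tuple


def find_loss_basin(
    xs: Sequence[float], losses: Sequence[float]
) -> Optional[Tuple[float, float]]:
    """Return (x, loss) at the minimum if it is strictly interior, else None.

    An interior minimum means loss first falls and then rises along the
    sweep (a U-shape); a minimum at either end means the curve is
    monotonic over the sampled range, at least at this resolution.
    """
    if len(xs) != len(losses) or len(xs) < 3:
        raise ValueError("need at least three matching (x, loss) points")
    i_min = min(range(len(losses)), key=lambda i: losses[i])
    if i_min == 0 or i_min == len(losses) - 1:
        return None
    return xs[i_min], losses[i_min]


# Synthetic, made-up values purely for illustration (not results from the paper).
sweep = [1.0, 2.0, 3.0, 4.0, 5.0]
monotonic_losses = [3.0, 2.5, 2.1, 1.9, 1.8]
u_shaped_losses = [3.0, 2.4, 2.0, 2.3, 2.9]

monotonic_result = find_loss_basin(sweep, monotonic_losses)  # None
u_shaped_result = find_loss_basin(sweep, u_shaped_losses)    # (3.0, 2.0)
```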