2026-05-22 · scaling, infra

Training-Free Looped Transformers

Lizhang Chen, Jonathan Li, Chen Liang, Ni Lao, Qiang Liu

Key claim

Looping pretrained transformers improves performance without fine-tuning.

This paper presents a training-free method that improves pretrained transformer models by looping part of the network at inference time. The headline result is accuracy gains on several benchmarks, including +2.64 percentage points on MMLU-Pro for Qwen3-4B-Instruct.
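To illustrate what "looping at inference time" means, here is a minimal sketch. It assumes the model is a plain stack of layers and that a contiguous mid-block is re-applied a fixed number of times. The function `looped_forward`, its parameters, and the toy layers are hypothetical: simple numeric functions stand in for frozen transformer blocks. This is not the authors' implementation.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for a frozen transformer block: maps a hidden state to a new one.
Layer = Callable[[float], float]


def looped_forward(layers: Sequence[Layer], x: float, start: int, end: int, loops: int) -> float:
    """Run layers[:start], then layers[start:end] `loops` times, then layers[end:].

    No weights are modified; the same frozen layers are simply called again.
    """
    if not 0 <= start < end <= len(layers):
        raise ValueError("mid-block must be a non-empty slice inside the stack")
    if loops < 1:
        raise ValueError("loops must be >= 1 (loops=1 reproduces the original model)")
    for layer in layers[:start]:          # prefix: runs once
        x = layer(x)
    for _ in range(loops):                # mid-block: re-applied `loops` times
        for layer in layers[start:end]:
            x = layer(x)
    for layer in layers[end:]:            # suffix: runs once
        x = layer(x)
    return x


def make_layer(i: int) -> Layer:
    # Toy residual-style layer that adds a fixed offset (binds i per layer).
    return lambda x: x + i


layers = [make_layer(i) for i in range(4)]
print(looped_forward(layers, 0.0, start=1, end=3, loops=1))  # 6.0, same as the unlooped stack
print(looped_forward(layers, 0.0, start=1, end=3, loops=2))  # 9.0, layers 1 and 2 run twice
```

With `loops=1` the wrapper reduces to the original forward pass. This is why such a scheme needs no fine-tuning: it changes only how often existing layers are called, not their parameters.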

Novelty

8.0/10

The method introduces a novel approach to applying recurrence to pretrained models without fine-tuning.

Reliability

7.0/10

The evaluation across multiple model families and tasks supports the claims, though more extensive baselines could strengthen reliability.

Deep reliability assessment

The evidence supports the claim that a training-free looped transformer wrapper improves performance on several benchmarks without additional training. However, the paper may overstate how well these improvements generalize across model architectures and tasks.

Reproducibility

Yes. The paper reports using lm-eval-harness v0.4.11, an open-source evaluation framework.

Discussion questions

What assumptions underlie the effectiveness of applying a looped structure to frozen models without retraining?
How can builders leverage this methodology in real-world applications, especially in resource-constrained environments?
What specific conditions or datasets would lead to a failure of the proposed looping strategy?

Key figure

Figure 1 illustrates the training-free looped transformer wrapper. During inference, the wrapper re-applies a contiguous mid-block of layers in one of two iteration modes: block-mode and layer-mode.
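The two iteration modes can be read as different execution orders over the same mid-block. The sketch below assumes one plausible interpretation, which the summary does not confirm. In block-mode, the whole mid-block is repeated as a unit. In layer-mode, each layer of the mid-block is repeated consecutively before moving on. The function `layer_schedule` and its arguments are hypothetical.

```python
from typing import List


def layer_schedule(num_layers: int, start: int, end: int, loops: int, mode: str) -> List[int]:
    """Return the order in which layer indices execute in one forward pass.

    Assumed (hypothetical) semantics for the mid-block layers start..end-1:
      "block": the whole mid-block is repeated as a unit `loops` times
      "layer": each mid-block layer is applied `loops` times in a row
    """
    if not 0 <= start < end <= num_layers or loops < 1:
        raise ValueError("invalid mid-block or loop count")
    mid = range(start, end)
    if mode == "block":
        looped = [i for _ in range(loops) for i in mid]   # 2,3,2,3,...
    elif mode == "layer":
        looped = [i for i in mid for _ in range(loops)]   # 2,2,3,3,...
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return list(range(start)) + looped + list(range(end, num_layers))


print(layer_schedule(6, 2, 4, 2, "block"))  # [0, 1, 2, 3, 2, 3, 4, 5]
print(layer_schedule(6, 2, 4, 2, "layer"))  # [0, 1, 2, 2, 3, 3, 4, 5]
```

Under this reading, both modes execute the same number of layer calls, `num_layers + (loops - 1) * (end - start)`, and differ only in order. Inference compute therefore grows linearly with the loop count in either mode.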

Benchmark results

MMLU-Pro | accuracy: 0.5979 | vs. Qwen3-4B-Instruct: +2.64 pp | SOTA

GPQA-Main | accuracy: 0.3571 | vs. Qwen3-4B-Instruct: +2.01 pp | SOTA

OpenBookQA | accuracy: 0.328 | vs. Moonlight-16B-A3B-Instruct: +1.20 pp | SOTA
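The gains above are in absolute percentage points (pp), not relative percent. The sketch below converts them, assuming each "+pp" figure is measured against the named base model's unlooped accuracy under the same evaluation setup. Under that assumption, it recovers the implied baseline accuracy and the relative improvement. The helper names are hypothetical, and the baselines are derived by arithmetic, not reported values.

```python
def implied_baseline(looped_acc: float, gain_pp: float) -> float:
    # Unlooped accuracy implied by the looped accuracy and the absolute gain in pp.
    return looped_acc - gain_pp / 100


def relative_gain(looped_acc: float, gain_pp: float) -> float:
    # Absolute gain divided by the implied baseline (e.g. 0.05 means 5% relative).
    return (gain_pp / 100) / implied_baseline(looped_acc, gain_pp)


# (benchmark, looped accuracy, gain in pp) as listed in the results above
results = [
    ("MMLU-Pro", 0.5979, 2.64),
    ("GPQA-Main", 0.3571, 2.01),
    ("OpenBookQA", 0.328, 1.20),
]

for name, acc, gain in results:
    print(f"{name}: implied baseline {implied_baseline(acc, gain):.4f}, "
          f"relative gain {relative_gain(acc, gain):.2%}")
```

For example, the +2.64 pp MMLU-Pro gain implies a baseline of about 0.5715, which is roughly a 4.6% relative improvement.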
