2026-05-25 | agents, scaling, infra, code

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

Shangding Gu

Key claim

Future progress in agentic AI depends on the design of the system around the model, not on model scaling alone.

The paper argues for modular, verifiable architectures around foundation models in agentic AI. It identifies key bottlenecks in context governance, trustworthy memory, and skill routing, and proposes an evaluation framework that assesses the quality of agent behavior rather than task success alone. Its main artifact is CheetahClaws, a reference harness for evaluating these architectures.
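To make the contrast between "task success" and "quality of agent behavior" concrete, here is a minimal Python sketch. It is not the paper's evaluation framework or the CheetahClaws API: the `Step` and `Trajectory` records, the `verified` and `policy_violation` fields, and the `behavior_report` function are all hypothetical names chosen for illustration, and the two behavior measures are assumptions rather than metrics taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Step:
    """One agent action in a run (hypothetical record, not from the paper)."""
    action: str
    verified: bool          # assumed: the harness checked this step's result
    policy_violation: bool  # assumed: the step broke a governance rule


@dataclass
class Trajectory:
    steps: List[Step]
    task_success: bool


def behavior_report(traj: Trajectory) -> Dict[str, Union[bool, float, int]]:
    """Report task success alongside simple behavior-quality counts."""
    n = len(traj.steps)
    verified = sum(s.verified for s in traj.steps)
    violations = sum(s.policy_violation for s in traj.steps)
    return {
        "task_success": traj.task_success,
        "verified_fraction": verified / n if n else 0.0,
        "violations": violations,
    }


# Two runs that both "succeed" but behave differently along the way.
careful = behavior_report(Trajectory(
    steps=[Step("read_file", True, False), Step("write_patch", True, False)],
    task_success=True,
))
reckless = behavior_report(Trajectory(
    steps=[Step("read_file", True, False), Step("force_push", False, True)],
    task_success=True,
))
```

Under a success-only metric the two runs are indistinguishable; the extra fields are what separate them. Which behavior measures actually matter is a question for the paper itself, not for this sketch.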

Novelty

8.0/10

The paper introduces a new framework for understanding agentic AI that emphasizes system design over just model scaling.

Reliability

7.0/10

The claims are supported by a proposed research agenda and a reference harness, though empirical validation is limited.

Deep reliability assessment

The methodology supports the claim that system design is crucial for agentic AI, but it may overclaim the extent to which system scaling alone can address all performance bottlenecks without further model improvements.

Reproducibility

Yes, the paper mentions the release of CheetahClaws, a Python-native reference harness, which is available as open-source code.

Discussion questions

How critical is the role of system scaling compared to model scaling in achieving long-term agentic AI performance?
What are the practical challenges builders might face when implementing the proposed system-scaling framework in real-world applications?
What evidence or results would challenge the claim that system scaling is as important as model scaling for future progress in agentic AI?

Key figure

Figure 1 illustrates a six-component view of an agentic system, highlighting the interaction between reasoning, memory, context construction, skill routing, orchestration, and governance.
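The six components named in the figure can be sketched as a toy Python harness. This is an illustration under stated assumptions, not the CheetahClaws design: the `AgentSystem` class, its field names, the `"skill:argument"` decision format, and the `fake_model` stand-in are all hypothetical, and a real system would put a foundation model behind `reason`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class AgentSystem:
    """Toy wiring of six components around a model call (all names hypothetical)."""
    reason: Callable[[str], str]              # reasoning: stand-in for a model call
    skills: Dict[str, Callable[[str], str]]   # skill routing table: name -> callable
    allowed_skills: Set[str]                  # governance: allow-list of skill names
    memory: List[str] = field(default_factory=list)  # memory: log of past steps
    max_context_items: int = 3                # context construction: memory budget

    def build_context(self, task: str) -> str:
        """Context construction: the most recent memory items plus the task."""
        recent = self.memory[-self.max_context_items:]
        return "\n".join(recent + [task])

    def route(self, decision: str) -> str:
        """Skill routing, gated by governance before any skill runs."""
        name, _, arg = decision.partition(":")
        if name not in self.allowed_skills:
            return f"blocked:{name}"
        if name not in self.skills:
            return f"unknown:{name}"
        return self.skills[name](arg)

    def step(self, task: str) -> str:
        """Orchestration: context -> reasoning -> routing -> memory write."""
        context = self.build_context(task)
        decision = self.reason(context)
        result = self.route(decision)
        self.memory.append(f"{task} -> {result}")
        return result


def fake_model(context: str) -> str:
    """Hypothetical stand-in for a model: picks a skill from the last context line."""
    last = context.splitlines()[-1]
    return f"upper:{last}" if last.startswith("shout") else f"shell:{last}"


agent = AgentSystem(
    reason=fake_model,
    skills={"upper": str.upper, "shell": lambda cmd: cmd},
    allowed_skills={"upper"},  # "shell" is registered but not permitted
)
first = agent.step("shout hello")   # routed to the "upper" skill
second = agent.step("rm -rf tmp")   # model asks for "shell"; governance blocks it
```

Keeping each component behind its own small method is what makes the pieces separately testable: the governance check in `route` can be exercised without any model, and `build_context` without any skills.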

GitHub (1 repo)

SafeRL-Lab/cheetahclaws (Official)
