2026-05-27 | reasoning | data | code

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Linas Nasvytis, Simon Jerome Han, Ben Prystawski, Satchel Grant, Noah D. Goodman, Judith E. Fan

Key claim

CORE enables efficient reasoning improvements with fewer resources.

The CORE algorithm lets language models improve on reasoning tasks more efficiently by distilling insights from past attempts. It matches or exceeds existing methods while using fewer training samples and rollouts, which suggests a promising direction for more interpretable and efficient model self-improvement.

In plain English

The authors developed CORE, an algorithm that helps language models improve their reasoning by learning from past attempts. Where previous methods often require large amounts of training data and compute, CORE contrasts successful and unsuccessful reasoning attempts to generate reusable insights. Builders can therefore reach better performance with fewer examples and less processing power. Because the learned insights are compact and interpretable, CORE offers a way to support model self-improvement without the heavy resource demands of traditional methods. For builders, this could mean faster development of AI systems at lower data and compute cost.

Novelty

8.0/10

CORE introduces a new non-parametric learning algorithm that enhances reasoning efficiency.

Reliability

7.5/10

The claims are supported by comparisons to multiple baselines across various tasks.

Deep reliability assessment

The methodology supports the claim that CORE improves performance on reasoning tasks with fewer samples and rollouts than the baselines, but generalizability to other types of tasks or models is not fully explored.

Reproducibility

Yes, the paper provides open source code at https://github.com/LinasNas/core-reasoning.

Discussion questions

1. How does CORE handle tasks where verifiable rewards are not easily defined?
2. What are the practical implications of CORE's context efficiency for deploying large language models in resource-constrained environments?
3. What specific scenarios or tasks would demonstrate the limitations or failure of CORE's approach?

Key figure

Figure 1 illustrates the CORE algorithm, showing how a model retrieves insights, generates a reasoning trace, and updates insights based on success or failure.
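
The loop described for Figure 1 (retrieve insights, generate a reasoning trace, update insights from success or failure) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `InsightStore`, `core_style_step`, `toy_generate`, `toy_reflect`, the `max_size` cap, and the rollout count are all hypothetical names and choices, and the toy functions stand in for real language model calls.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Trace = Tuple[str, str]  # (reasoning text, final answer)


@dataclass
class InsightStore:
    """Plain-text insight memory (hypothetical); no model weights change."""
    insights: List[str] = field(default_factory=list)
    max_size: int = 8  # assumed cap that keeps the prompt context compact

    def retrieve(self) -> List[str]:
        return list(self.insights)

    def add(self, insight: str) -> None:
        if insight and insight not in self.insights:
            self.insights.append(insight)
            self.insights = self.insights[-self.max_size:]


def core_style_step(
    problem: str,
    answer: str,
    store: InsightStore,
    generate: Callable[[str, List[str]], Trace],
    reflect: Callable[[str, Trace, Trace], str],
    n_rollouts: int = 4,
) -> float:
    """Run one retrieve / generate / contrast / update step; return accuracy."""
    hints = store.retrieve()
    traces = [generate(problem, hints) for _ in range(n_rollouts)]
    wins = [t for t in traces if t[1] == answer]    # verifiable reward: exact match
    losses = [t for t in traces if t[1] != answer]
    if wins and losses:
        # Contrast one successful and one failed trace to distill an insight.
        store.add(reflect(problem, wins[0], losses[0]))
    return len(wins) / n_rollouts


# Toy stand-ins for model calls (assumptions, for illustration only).
_calls = itertools.count()


def toy_generate(problem: str, hints: List[str]) -> Trace:
    n = next(_calls)
    if hints or n % 2 == 0:
        return ("7 + 5 = 12, carry the 1, so 17 + 5 = 22", "22")
    return ("7 + 5 = 12, write 12 after the 1", "112")


def toy_reflect(problem: str, win: Trace, loss: Trace) -> str:
    return "Carry the tens digit when a column sums past 9."


store = InsightStore()
first = core_style_step("17 + 5", "22", store, toy_generate, toy_reflect)
second = core_style_step("17 + 5", "22", store, toy_generate, toy_reflect)
print(first, second, store.insights)
```

In this sketch, an insight is written only when a batch contains both a success and a failure, since a contrast needs one of each. How CORE itself selects, scores, or prunes insights is not stated in this summary.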

Benchmark results

Custom: accuracy 0.907 (vs GEPA: +0.214, SOTA)

MathGAP: accuracy 0.873 (vs MemRL: +0.126, SOTA)

Custom: accuracy 0.423 (vs MemRL: -0.304)

ZebraLogic: accuracy 0.717 (vs GEPA: +0.010, SOTA)
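
The baseline scores implied by the results above can be recovered with simple arithmetic. The sketch below assumes that each listed accuracy is CORE's and that each signed delta is CORE minus the named baseline; the summary does not state either explicitly, so the derived values are illustrative rather than reported figures.

```python
# (benchmark, listed accuracy, baseline name, listed delta)
rows = [
    ("Custom", 0.907, "GEPA", +0.214),
    ("MathGAP", 0.873, "MemRL", +0.126),
    ("Custom", 0.423, "MemRL", -0.304),
    ("ZebraLogic", 0.717, "GEPA", +0.010),
]

# Assumption: delta = CORE accuracy - baseline accuracy,
# so baseline = accuracy - delta.
implied = [
    (bench, baseline, round(acc - delta, 3))
    for bench, acc, baseline, delta in rows
]

for bench, baseline, score in implied:
    print(f"{bench}: implied {baseline} accuracy = {score:.3f}")
```

Under that assumption, the negative delta in the third row means the baseline scored higher than the listed accuracy on that task.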

GitHub (1 repo)

LinasNas/core-reasoning (Official)