2026-05-25 · agents, reasoning, code

CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

Junlin Yang, Dylan Zhang, Xiangchen Song, Qirun Dai, Xiao Liu, Yuen Chen, Aniket Vashishtha, Jing Shi, Chenhao Tan, Hao Peng


Key claim

CausaLab reveals LLMs' limits in causal reasoning.

CausaLab is a new environment for testing how well LLMs can understand and predict causal relationships. A key finding is that although GPT-5.2-high achieves high task accuracy, it still struggles with causal understanding, which points to a need for better intervention strategies.

In plain English

Novelty

8.0/10

CausaLab introduces a novel framework for evaluating causal reasoning in LLMs, extending the capabilities of existing models.

Reliability

7.5/10

The experiments are well-structured, showing clear results and addressing limitations, though more diverse baselines could strengthen claims.

Deep reliability assessment

The methodology supports evaluating LLMs' ability to discover causal mechanisms through interactive environments, but it may overclaim generalizability to real-world systems due to its synthetic nature and limited scope.

Reproducibility

Yes. Open-source code is available at https://github.com/DylanZSZ/CausaLab

Discussion questions

1. How well do synthetic benchmarks like CausaLab translate to real-world causal discovery tasks?
2. What are the practical implications of using LLMs for causal discovery in scientific research?
3. What evidence would be needed to falsify the claim that LLMs can effectively discover causal mechanisms in complex environments?

Key figure

Figure 1 provides an overview of a CausaLab episode, illustrating the process of causal discovery through observation, intervention, and prediction.
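
To make the observe, intervene, predict cycle concrete, here is a minimal Python sketch of one such episode on a toy system. Everything in it is an assumption made for illustration: the `ToyEnvironment` class, the three variables, the linear mechanisms, and the threshold are hypothetical and are not taken from the CausaLab code or paper.

```python
import random

VARIABLES = ("A", "B", "C")


class ToyEnvironment:
    """Hypothetical system whose hidden structure is A -> B -> C."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def sample(self, do=None):
        """Draw one sample; variables named in `do` are clamped (an intervention)."""
        do = do or {}
        a = do["A"] if "A" in do else self._rng.gauss(0.0, 1.0)
        b = do["B"] if "B" in do else 2.0 * a + self._rng.gauss(0.0, 0.1)
        c = do["C"] if "C" in do else -1.0 * b + self._rng.gauss(0.0, 0.1)
        return {"A": a, "B": b, "C": c}


def mean_of(env, target, do, n=500):
    """Average value of `target` over n samples drawn under intervention `do`."""
    return sum(env.sample(do)[target] for _ in range(n)) / n


def estimate_direct_effects(env, threshold=0.5):
    """Intervene on each variable and record which others respond directly."""
    effects = {}
    for cause in VARIABLES:
        for effect in VARIABLES:
            if cause == effect:
                continue
            # Clamp every variable other than the effect, so that only a
            # direct cause -> effect link can carry the change.
            others = {v: 0.0 for v in VARIABLES if v not in (cause, effect)}
            low = mean_of(env, effect, {**others, cause: 0.0})
            high = mean_of(env, effect, {**others, cause: 1.0})
            if abs(high - low) > threshold:
                effects[(cause, effect)] = high - low
    return effects


env = ToyEnvironment(seed=0)

# 1. Observation: passive samples, with no intervention applied.
observations = [env.sample() for _ in range(5)]

# 2. Intervention: probe each variable to recover the direct links.
effects = estimate_direct_effects(env)

# 3. Prediction: expected C under do(A = 1), chaining the estimated slopes.
#    This shortcut is only valid because the toy mechanisms are linear
#    with zero intercept.
predicted = effects[("A", "B")] * effects[("B", "C")]

print(sorted(effects), round(predicted, 2))
```

In this toy setting the passive samples alone would show A, B, and C as correlated without revealing direction; it is the clamping step that separates direct links from indirect ones, which is the kind of distinction an interactive environment is meant to probe.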

GitHub (1 repo)

DylanZSZ/CausaLab (Official)