2026-05-25 | agents | reinforcement learning

Global Convergence of Wasserstein Policy Gradient for Entropy-Regularized Reinforcement Learning

Zhaoyu Zhu, Rui Gao, Shuang Li


Key claim

Wasserstein policy gradient (WPG) achieves global convergence for entropy-regularized reinforcement learning, established through Bellman-based arguments.

This paper develops a global convergence theory for Wasserstein policy gradient in reinforcement learning by exploiting the Bellman structure of the problem. The key result is that, although the entropy-regularized RL objective is non-convex, the Bellman recursion induces a favorable geometry that supports global convergence.
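The soft Bellman recursion mentioned above can be shown in its simplest setting: a tabular MDP with finitely many actions. There, entropy regularization with temperature τ turns the hard max of ordinary value iteration into a log-sum-exp. This is only an illustration under those assumptions and is not the paper's method. The paper works with Wasserstein geometry over policies, while this code only iterates the standard soft Bellman operator. The function names, the toy MDP, and the parameters `gamma` and `tau` are all hypothetical.

```python
import math


def log_sum_exp(values, tau):
    """Numerically stable tau * log(sum(exp(v / tau)))."""
    m = max(values)
    return m + tau * math.log(sum(math.exp((v - m) / tau) for v in values))


def soft_q(P, R, V, gamma):
    """Q(s, a) = R(s, a) + gamma * E_{s' ~ P(.|s, a)}[V(s')]."""
    return [
        [R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))
         for a in range(len(R[s]))]
        for s in range(len(R))
    ]


def soft_value_iteration(P, R, gamma, tau, tol=1e-10, max_iter=10_000):
    """Iterate the soft Bellman operator V <- tau * logsumexp(Q / tau).

    P[s][a] is a list of next-state probabilities and R[s][a] a scalar reward.
    The operator is a gamma-contraction in the sup norm, so V converges.
    """
    V = [0.0] * len(R)
    for _ in range(max_iter):
        Q = soft_q(P, R, V, gamma)
        V_new = [log_sum_exp(q_s, tau) for q_s in Q]
        done = max(abs(x - y) for x, y in zip(V, V_new)) < tol
        V = V_new
        if done:
            break
    # One final application so that V, Q and pi are mutually consistent.
    Q = soft_q(P, R, V, gamma)
    V = [log_sum_exp(q_s, tau) for q_s in Q]
    # Entropy-regularized optimal policy: pi(a|s) = exp((Q(s, a) - V(s)) / tau).
    pi = [[math.exp((q - V[s]) / tau) for q in Q[s]] for s in range(len(Q))]
    return V, Q, pi


# Hypothetical 2-state, 2-action MDP used only for illustration.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
V, Q, pi = soft_value_iteration(P, R, gamma=0.9, tau=0.5)
print("V =", V)
print("pi =", pi)
```

The soft Bellman operator is a γ-contraction, so the loop converges from any starting V. The returned policy is the softmax of Q at temperature τ, which is the entropy-regularized optimum in this tabular setting.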


Novelty

8.0/10

The paper introduces a global convergence theory for Wasserstein policy gradient that uses the Bellman structure in a new way.

Reliability

7.5/10

The claims rest on theoretical analysis alone; no empirical validation is provided.

Deep reliability assessment

Under its stated assumptions (for example, a uniform log-Sobolev inequality), the methodology supports global convergence of Wasserstein policy gradient for entropy-regularized reinforcement learning. It is less clear how well those assumptions hold in real-world settings, and the paper may overstate their practical applicability.
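A log-Sobolev inequality (LSI) is the kind of assumption that yields global convergence. The single-state case below shows why, using a standard calculation. The following are assumptions for illustration, not the paper's setup: one state, a continuous action a, reward r(a), and temperature τ > 0. The LSI constant α is also hypothetical. The paper's multi-state argument and its precise notion of "uniform" are not reproduced here.

```latex
\begin{align*}
F(\pi) &= \mathbb{E}_{a\sim\pi}[r(a)] + \tau\,\mathcal{H}(\pi),
  \qquad \pi^*(a) = e^{r(a)/\tau}/Z, \\
F(\pi^*) - F(\pi) &= \tau\,\mathrm{KL}(\pi\,\|\,\pi^*), \\
\text{LSI:}\quad \mathrm{KL}(\pi\,\|\,\pi^*) &\le \tfrac{1}{2\alpha}\,
  \mathbb{E}_{\pi}\big\|\nabla_a \log(\pi/\pi^*)\big\|^2 \quad \text{for all } \pi, \\
\partial_t \pi_t &= \tau\,\nabla_a\cdot\big(\pi_t\,\nabla_a \log(\pi_t/\pi^*)\big)
  \quad \text{(Wasserstein gradient ascent on } F\text{)}, \\
\tfrac{d}{dt}\,\mathrm{KL}(\pi_t\,\|\,\pi^*) &= -\tau\,\mathbb{E}_{\pi_t}\big\|\nabla_a \log(\pi_t/\pi^*)\big\|^2
  \le -2\alpha\tau\,\mathrm{KL}(\pi_t\,\|\,\pi^*), \\
F(\pi^*) - F(\pi_t) &\le e^{-2\alpha\tau t}\,\big(F(\pi^*) - F(\pi_0)\big).
\end{align*}
```

With a single state, the suboptimality gap is exactly τ times a KL divergence, so an LSI gives exponential convergence directly. In the full MDP the objective is non-convex in π, and this one-line argument does not carry over. According to the summary above, the paper's contribution is to recover a favorable geometry from the Bellman recursion. A "uniform" LSI presumably requires the constant α to hold across states; the paper should be consulted for the exact condition.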

Reproducibility

The paper does not mention open-source code or a dataset.

Discussion questions

1. How does the assumption of a uniform log-Sobolev inequality affect the generalizability of the results?
2. What are the practical implications of the discretization bias in real-world reinforcement learning applications?
3. What experimental results or scenarios would falsify the claimed global convergence of Wasserstein policy gradient?
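Discussion question 2 concerns discretization bias. This sketch shows where such bias can come from under these assumptions: a single state, a one-dimensional continuous action, and a hypothetical quadratic reward r(a) = -(a - μ)²/2. The Wasserstein gradient flow of E_π[r] + τH(π) can then be simulated with particles that follow Langevin dynamics. The exact optimum is π* = N(μ, τ), but with a finite step size η the Euler–Maruyama chain settles at variance τ / (1 - η/2). This particle discretization is a standard choice made for illustration and is not claimed to be the algorithm analyzed in the paper. Every name below is hypothetical.

```python
import math
import random


def grad_reward(a, mu):
    """Gradient of the hypothetical reward r(a) = -(a - mu)^2 / 2."""
    return -(a - mu)


def langevin_policy_particles(mu, tau, eta, n_particles=10_000, n_steps=100, seed=0):
    """Particle (Euler-Maruyama) discretization of the Wasserstein gradient flow
    of E_pi[r(a)] + tau * H(pi) for a single state.

    Each step: a <- a + eta * grad r(a) + sqrt(2 * eta * tau) * xi, xi ~ N(0, 1).
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * eta * tau)
    particles = [0.0] * n_particles
    for _ in range(n_steps):
        particles = [a + eta * grad_reward(a, mu) + noise_scale * rng.gauss(0.0, 1.0)
                     for a in particles]
    return particles


mu, tau, eta = 1.0, 0.5, 0.2
particles = langevin_policy_particles(mu, tau, eta)
mean = sum(particles) / len(particles)
var = sum((a - mean) ** 2 for a in particles) / len(particles)
# Exact optimum: pi* = N(mu, tau). For this quadratic reward the discrete chain's
# stationary variance is tau / (1 - eta / 2): the discretization bias.
biased_var = tau / (1.0 - eta / 2.0)
print(f"mean={mean:.3f} var={var:.3f} target var={tau} predicted biased var={biased_var:.3f}")
```

Shrinking `eta` moves the stationary variance toward τ but needs more steps to converge. That is the basic trade-off behind discretization bias in this toy setting.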

Key figure

The paper does not describe a key figure or architecture diagram.