Global Convergence of Wasserstein Policy Gradient for Entropy-Regularized Reinforcement Learning
Zhaoyu Zhu, Rui Gao, Shuang Li
Key claim
WPG achieves global convergence through Bellman-based arguments.
This paper develops a global convergence theory for Wasserstein policy gradient in reinforcement learning by utilizing the Bellman structure. The key result is that the Bellman recursion induces a favorable geometry that supports global convergence, despite the non-convex nature of the entropy-regularized RL objective.
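As a point of reference for the "Bellman recursion" in the entropy-regularized setting, here is a minimal sketch of the standard soft Bellman backup on a small tabular MDP. It is not the paper's Wasserstein policy gradient method. The finite state and action spaces, the function name `soft_value_iteration`, and the parameters `gamma` and `tau` are assumptions made for illustration only.

```python
import numpy as np

def soft_value_iteration(P, r, gamma=0.9, tau=0.5, iters=500):
    """Entropy-regularized (soft) value iteration on a tabular MDP.

    P: (S, A, S) transition probabilities, r: (S, A) rewards.
    tau is the entropy-regularization temperature (illustrative choice).
    """
    S, A = r.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = r + gamma * P @ V                      # (S, A) soft Q-values
        m = Q.max(axis=1)                          # shift for numerical stability
        # Soft Bellman backup: V(s) = tau * log sum_a exp(Q(s, a) / tau)
        V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
    Q = r + gamma * P @ V
    # The regularized optimal policy is a softmax of Q at temperature tau.
    pi = np.exp((Q - V[:, None]) / tau)
    pi /= pi.sum(axis=1, keepdims=True)
    return V, pi

# Usage on a random 3-state, 2-action MDP (hypothetical data).
rng = np.random.default_rng(0)
P = rng.random((3, 2, 3))
P /= P.sum(axis=2, keepdims=True)
r = rng.random((3, 2))
V, pi = soft_value_iteration(P, r)
```

In this tabular case the soft backup is a contraction, so the iteration converges to a unique fixed point. The summary's claim concerns a different and harder question: convergence of a gradient method on the non-convex policy objective.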
In plain English
The paper introduces a new global convergence theory for Wasserstein policy gradient, leveraging the Bellman structure in a novel way.
The claims are supported by a theoretical framework, though empirical validation is not provided.
Deep reliability assessment
The theoretical argument supports global convergence of Wasserstein Policy Gradient for entropy-regularized reinforcement learning under specific assumptions. Whether those assumptions hold in practical settings is not established, so claims of real-world applicability may be overstated.
Reproducibility
No open source code or dataset is mentioned in the paper.
Discussion questions
- 1. How does the assumption of a uniform log-Sobolev inequality affect the generalizability of the results?
- 2. What are the practical implications of the discretization bias in real-world reinforcement learning applications?
- 3. What experimental results or scenarios would falsify the claimed global convergence of the Wasserstein Policy Gradient?
Key figure
The paper does not provide a specific figure or architectural diagram description.