2026-05-25 · agents · vision · code

AdvantageFlow: Advantage-Weighted Least Squares for RL in Flow Models

Branislav Kveton, Anup Rao, Subhojyoti Mukherjee, Krishna Kumar Singh, Viet Dac Lai

Key claim

AdvantageFlow outperforms Flow-GRPO and a state-of-the-art baseline on image generation tasks.

AdvantageFlow is a new reinforcement learning algorithm that optimizes a forward-process prediction loss for flow models. It stabilizes the optimization problem through rollout policy regularization, leading to improved performance in image generation tasks. The key result shows that AdvantageFlow outperforms both Flow-GRPO and a state-of-the-art baseline.
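
To make the idea of an advantage-weighted forward-process loss concrete, here is a minimal NumPy sketch. It is not the paper's objective: the straight-line interpolation path, the group-normalized advantages, the squared-distance penalty toward the rollout policy, and the names `group_advantages`, `advantage_weighted_ls_loss`, and `beta` are all assumptions made for illustration.

```python
import numpy as np


def group_advantages(rewards, eps=1e-8):
    """Center and scale rewards within one group of samples (assumed scheme)."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


def advantage_weighted_ls_loss(v_pred, v_rollout, x0, x1, advantages, beta=1.0):
    """Advantage-weighted least squares on forward-process velocity targets.

    v_pred:     (n, d) velocities from the policy being trained, evaluated at x_t
    v_rollout:  (n, d) velocities from the frozen rollout policy, evaluated at x_t
    x0, x1:     (n, d) noise samples and samples generated by the rollout policy
    advantages: (n,) per-sample weights, e.g. from group_advantages
    beta:       hypothetical strength of the pull toward the rollout policy
    """
    # Velocity of the assumed straight path x_t = (1 - t) * x0 + t * x1.
    target = x1 - x0
    # Per-sample squared prediction error against the forward-process target.
    fit = np.sum((v_pred - target) ** 2, axis=1)
    # Per-sample squared distance to the rollout policy's prediction.
    reg = np.sum((v_pred - v_rollout) ** 2, axis=1)
    return float(np.mean(advantages * fit + beta * reg))


# Toy usage with random arrays standing in for model outputs.
rng = np.random.default_rng(0)
n, d = 4, 3
x0 = rng.normal(size=(n, d))
x1 = rng.normal(size=(n, d))
v_rollout = x1 - x0 + 0.1 * rng.normal(size=(n, d))
v_pred = v_rollout.copy()  # training starts from the rollout policy
adv = group_advantages([0.2, 0.9, 0.4, 0.7])
loss = advantage_weighted_ls_loss(v_pred, v_rollout, x0, x1, adv)
```

In this sketch, samples with negative advantage push the prediction away from their targets, which would leave a purely advantage-weighted least-squares problem unbounded. The penalty toward the rollout policy keeps the per-sample objective bounded whenever `beta` exceeds the magnitude of the most negative advantage, which is one plausible reading of why regularization stabilizes the optimization.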

Novelty
7.5/10

The introduction of a forward-process RL algorithm for flow models represents a meaningful extension of existing methods.

Reliability
8.0/10

The evaluation against strong baselines and the use of policy regularization support the claims made.

Deep reliability assessment

The methodology supports the effectiveness of AdvantageFlow in improving image generation tasks through a forward-process RL approach, but claims of outperforming all baselines may overstate its generalizability across different models and tasks.

Reproducibility

Yes, the paper mentions that all experiments are implemented in the DiffusionNFT code base, which is available on GitHub.

Discussion questions

  1. What assumptions about the stability of advantage-weighted loss functions under different conditions could be challenged?
  2. How can builders leverage the findings of AdvantageFlow in practical applications beyond image generation?
  3. What specific conditions or datasets would lead to a failure of the AdvantageFlow approach in outperforming existing methods?

Key figure

Figure 1 shows images generated by AdvantageFlow compared to DiffusionNFT and the base model, highlighting improvements in object generation and understanding.

Benchmark results

Stable Diffusion 3.5 Medium, PickScore: 0.975 (+0.004 vs DiffusionNFT, SOTA)
Stable Diffusion 3.5 Medium, HPSv2.1: 0.835 (+0.040 vs DiffusionNFT, SOTA)
Stable Diffusion 3.5 Medium, OCR: 0.737 (+0.128 vs DiffusionNFT, SOTA)
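
The deltas against DiffusionNFT can be turned back into implied baseline scores. The short sketch below does that arithmetic, assuming each delta is an absolute difference on the same scale as the reported score; the page does not state this, and the name `implied_baseline` is illustrative.

```python
# (AdvantageFlow score, reported delta vs DiffusionNFT) per metric, as listed above.
results = {
    "PickScore": (0.975, 0.004),
    "HPSv2.1": (0.835, 0.040),
    "OCR": (0.737, 0.128),
}


def implied_baseline(score, delta):
    """Baseline score implied by an absolute delta (assumption: same scale)."""
    return round(score - delta, 3)


baselines = {name: implied_baseline(s, d) for name, (s, d) in results.items()}

for name, (score, delta) in results.items():
    print(f"{name}: AdvantageFlow {score:.3f}, implied DiffusionNFT {baselines[name]:.3f}")
```

Under that assumption the gap is small on PickScore and largest on OCR, so the headline improvement is driven mostly by the text-rendering metric.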
GitHub: 1 repo
NVlabs/DiffusionNFT (Official)