arxiv:2507.15758

LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization

Published on Jul 21, 2025
Submitted by Yongliang Shen on Jul 25, 2025

Abstract

Length-Adaptive Policy Optimization (LAPO) reduces token usage and improves accuracy by enabling models to internally manage reasoning depth through reinforcement learning.

Large reasoning models have achieved remarkable performance through extended chain-of-thought sequences, yet this computational freedom leads to excessive token generation even for simple problems. We present Length-Adaptive Policy Optimization (LAPO), a novel framework that transforms reasoning length control from an external constraint into an intrinsic model capability. Unlike existing approaches that impose rigid limits or rely on post-hoc interventions, LAPO enables models to internalize an understanding of appropriate reasoning depth through a two-stage reinforcement learning process. In the first stage, models learn natural reasoning patterns by discovering the statistical distribution of successful solution lengths. The second stage leverages these patterns as meta-cognitive guidance, embedding them directly within the model's reasoning context to ensure inference-time flexibility. Experiments on mathematical reasoning benchmarks demonstrate that LAPO reduces token usage by up to 40.9% while improving accuracy by 2.3%. Our analysis reveals that models trained with LAPO develop emergent abilities to allocate computational resources based on problem complexity, achieving efficient reasoning without sacrificing quality.
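The two-stage process described in the abstract can be illustrated with a minimal sketch. The abstract does not give the reward formulas, so everything below is an assumption made for illustration: stage 1 summarizes the token lengths of *correct* rollouts for a problem as a 30th-70th percentile band and rewards correct answers that land inside it, and stage 2 writes the median successful length into the reasoning context as a self-declared budget and rewards matching it. The names `length_stats`, `stage1_reward`, `budget_prefix`, and `stage2_reward`, the percentile bounds, the bonus shapes, and the prefix wording are all hypothetical; this is not the authors' implementation.

```python
import statistics
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LengthStats:
    low: float     # lower edge of the "typical successful length" band (assumed 30th pct)
    high: float    # upper edge of the band (assumed 70th pct)
    target: float  # single token budget reused as guidance in stage 2 (assumed median)


def percentile(sorted_vals: List[float], q: float) -> float:
    """Linearly interpolated percentile of an already-sorted list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def length_stats(rollouts: List[Tuple[int, bool]],
                 low_q: float = 30, high_q: float = 70) -> Optional[LengthStats]:
    """Stage 1 (hypothetical form): summarize token lengths of correct rollouts only."""
    lengths = sorted(n for n, correct in rollouts if correct)
    if not lengths:
        return None  # no successful solution sampled yet, so no length signal
    return LengthStats(
        low=percentile(lengths, low_q),
        high=percentile(lengths, high_q),
        target=statistics.median(lengths),
    )


def stage1_reward(n_tokens: int, correct: bool, stats: Optional[LengthStats]) -> float:
    """Hypothetical reward: 1 for a correct answer plus up to 1 for landing in the band."""
    if not correct:
        return 0.0
    if stats is None or stats.low <= n_tokens <= stats.high:
        return 2.0
    width = max(stats.high - stats.low, 1.0)
    dist = stats.low - n_tokens if n_tokens < stats.low else n_tokens - stats.high
    return 1.0 + max(0.0, 1.0 - dist / width)  # bonus decays linearly outside the band


def budget_prefix(stats: LengthStats) -> str:
    """Stage 2 (hypothetical form): state the learned budget inside the reasoning context."""
    return f"<think>\nI will answer this question in about {round(stats.target)} tokens.\n"


def stage2_reward(n_tokens: int, correct: bool, budget: float) -> float:
    """Hypothetical reward: correctness plus a bonus for matching the self-declared budget."""
    if not correct:
        return 0.0
    return 1.0 + max(0.0, 1.0 - abs(n_tokens - budget) / budget)


if __name__ == "__main__":
    # Toy rollouts for one problem: (token count, whether the final answer was correct)
    rollouts = [(120, True), (300, False), (150, True), (180, True), (90, False), (210, True)]
    stats = length_stats(rollouts)
    print(stats)                            # band and target computed from correct rollouts only
    print(stage1_reward(160, True, stats))  # 2.0: correct and inside the band
    print(budget_prefix(stats))
    print(stage2_reward(150, True, stats.target))
```

In this sketch a band of successful lengths, rather than the single shortest correct answer, defines "appropriate" length, so one lucky terse solution does not become the target; the stage-2 prefix is what would let one model vary its budget per problem at inference time. The actual reward definitions should be taken from the authors' GitHub repository.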

Community

Comment from the paper author and submitter:

A two-stage RL framework that teaches models to internalize reasoning efficiency.

GitHub: https://github.com/ZJU-REAL/LAPO


