arxiv:2606.01838

LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

Published on Jun 1

· Submitted by

Prateek Sikdar on Jun 8

Accenture

Authors:

Prateek Kumar Sikdar

Abstract

LayerRoute is a lightweight adapter that selectively skips transformer blocks during inference based on input type, achieving compute savings while maintaining or improving model quality through gated routing and LoRA adaptation.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Agentic language model systems alternate between two structurally distinct step types: structured tool calls (short, deterministic, low perplexity) and open-ended planning/reasoning steps (long, complex, high perplexity). Despite this heterogeneity, current inference systems apply identical compute to every step. We introduce LayerRoute, a lightweight adapter that learns to selectively skip transformer blocks on a per-input basis. LayerRoute augments each of the 24 transformer blocks in Qwen2.5-0.5B-Instruct with: (1) a per-layer router (897 parameters, Linear(896,1)) that outputs a hard binary gate via the straight-through estimator, and (2) LoRA adapters (rank 8, ~1.08M parameters in total) on the Q/K/V/O attention projections. The backbone weights remain frozen. A single end-to-end training pass on agentic data (Hermes, Glaive, GSM8K, Turing) with a gate regularisation term forces the system to discover which blocks are skippable per input type. After 3,000 steps (6.4 minutes on an A100 40GB), LayerRoute achieves a 12.91% skip differential: tool calls skip 15.25% of FLOPs while planning steps skip only 2.34%, using only 1.10M trainable parameters (0.22% of the 494M backbone). Quality improves over the base model due to LoRA adaptation, with a perplexity delta of -1.29 on tool calls and -1.30 on planning.
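The parameter counts quoted in the abstract can be roughly reproduced with a few lines of arithmetic. The sketch below is not taken from the paper: it assumes the K and V projections map 896 to 128 features (2 key/value heads of width 64, as in grouped-query attention), that each LoRA adapter is a bias-free pair of rank-8 matrices, and that the routers and LoRA adapters are the only trainable parameters. The helper names are hypothetical.

```python
# Assumed dimensions for Qwen2.5-0.5B-Instruct. Only HIDDEN (via Linear(896,1)),
# N_LAYERS, RANK and BACKBONE come from the abstract; KV_DIM is an assumption.
HIDDEN = 896            # model width
KV_DIM = 128            # assumed: 2 KV heads x 64 head dim
N_LAYERS = 24
RANK = 8
BACKBONE = 494_000_000  # frozen backbone size quoted in the abstract


def lora_params(d_in, d_out, rank):
    """One LoRA pair: A is (rank x d_in), B is (d_out x rank), no bias."""
    return rank * (d_in + d_out)


def router_params(d_in):
    """Linear(d_in, 1): one weight per input feature plus one bias."""
    return d_in + 1


lora_per_layer = (
    lora_params(HIDDEN, HIDDEN, RANK)    # Q projection
    + lora_params(HIDDEN, KV_DIM, RANK)  # K projection
    + lora_params(HIDDEN, KV_DIM, RANK)  # V projection
    + lora_params(HIDDEN, HIDDEN, RANK)  # O projection
)
lora_total = N_LAYERS * lora_per_layer
router_total = N_LAYERS * router_params(HIDDEN)
trainable = lora_total + router_total
percent_of_backbone = 100.0 * trainable / BACKBONE

print(f"LoRA parameters:   {lora_total:,}")
print(f"Router parameters: {router_total:,}")
print(f"Trainable total:   {trainable:,} ({percent_of_backbone:.2f}% of backbone)")
```

Under these assumptions the totals land on roughly 1.08M LoRA parameters and 1.10M trainable parameters, in line with the figures in the abstract.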

Community

prateeksikdar

Paper author Paper submitter about 10 hours ago

I'm exploring a simple idea for making agentic LLMs more compute-efficient: not every step in an agent workflow needs the full depth of the model.

In this paper, LayerRoute, I add lightweight per-layer routers and LoRA adapters to a frozen Qwen2.5-0.5B model. The routers learn to selectively skip transformer layers based on the input, using hard binary gates trained with a straight-through estimator (STE). The interesting observation is that structured tool-calling steps naturally learn to skip more layers, while open-ended planning and reasoning steps retain most of the model depth.
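To make the gating idea concrete, here is a minimal dependency-free sketch of a hard binary gate with a straight-through backward pass. It is an illustration under stated assumptions, not the paper's implementation: mean pooling over tokens, a threshold at logit 0, the sigmoid derivative as the surrogate gradient, and the residual form `hidden + block(hidden)` are all assumed here, and every function name is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean_pool(token_states):
    """Average token vectors into one sequence-level vector (assumed pooling)."""
    n = len(token_states)
    return [sum(column) / n for column in zip(*token_states)]


def router_logit(pooled, weight, bias):
    """Linear(d, 1) applied to the pooled vector."""
    return sum(w * x for w, x in zip(weight, pooled)) + bias


def hard_gate(logit):
    """Forward pass: 1.0 means run the block, 0.0 means skip it."""
    return 1.0 if logit > 0.0 else 0.0


def ste_grad(logit, upstream):
    """Backward pass of the straight-through estimator (assumed surrogate):
    the step function is treated as a sigmoid, so the gradient with respect
    to the logit is upstream * sigmoid'(logit)."""
    p = sigmoid(logit)
    return upstream * p * (1.0 - p)


def gated_block(hidden, block_fn, gate):
    """Apply a residual block only when the gate is open."""
    if gate == 0.0:
        return list(hidden)  # skipped: block_fn is never evaluated
    return [h + d for h, d in zip(hidden, block_fn(hidden))]


# Toy usage with made-up numbers.
tokens = [[1.0, -1.0], [3.0, 1.0]]
pooled = mean_pool(tokens)                      # [2.0, 0.0]
logit = router_logit(pooled, [0.5, 0.5], -0.5)  # 0.5
gate = hard_gate(logit)                         # 1.0, so the block runs
output = gated_block(pooled, lambda h: [0.1 * v for v in h], gate)
```

The compute saving comes from the early return in `gated_block`: when the gate is closed the block is never evaluated, while the surrogate gradient still lets the router learn from the task loss during training.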

On a mixed agentic dataset, the approach achieved a 12.9% skip differential between tool calls and planning steps, while maintaining or improving perplexity through LoRA adaptation. Training required only ~1.1M trainable parameters (0.22% of the backbone) and completed in a few minutes on a single A100.
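The skip differential is the gap between the skip rates of the two step types. The sketch below shows one way such a number could be computed from logged gate decisions; it assumes the skip rate is the share of block executions that were skipped (which only approximates FLOPs saved if all blocks cost the same), and the gate logs are invented toy data, not results from the paper.

```python
def skip_rate(gates_per_input):
    """Percentage of block executions skipped, where each inner list holds
    one gate per layer (1 = block ran, 0 = block skipped)."""
    total = sum(len(gates) for gates in gates_per_input)
    skipped = sum(gates.count(0) for gates in gates_per_input)
    return 100.0 * skipped / total


def skip_differential(tool_rate, planning_rate):
    """Tool-call skip rate minus planning skip rate, in percentage points."""
    return tool_rate - planning_rate


# Hypothetical gate logs for a 4-layer toy model (not real measurements).
tool_gates = [[1, 0, 0, 1], [1, 1, 0, 1]]
planning_gates = [[1, 1, 1, 1], [1, 1, 0, 1]]

toy_tool = skip_rate(tool_gates)          # 3 of 8 executions skipped
toy_planning = skip_rate(planning_gates)  # 1 of 8 executions skipped
toy_diff = skip_differential(toy_tool, toy_planning)

# The figures quoted in the abstract give the reported differential.
reported_diff = round(skip_differential(15.25, 2.34), 2)
```

With the abstract's figures, 15.25 minus 2.34 gives the reported 12.91 percentage points.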

I'm curious about the community's thoughts on a few questions:

Would sequence-level routing scale to larger models (7B–70B)?
Is the learned middle-layer skip pattern a genuine architectural property or an artifact of the training setup?
Could this be combined with speculative decoding, MoD-style routing, or KV-cache optimizations for larger gains?
What benchmarks would best evaluate adaptive depth methods in real-world agent systems?

Feedback on the paper is welcome!
