Researchers at UT Austin, UC San Diego, TTIC, and Georgia Tech introduced ∇-Reasoner, a framework that applies gradient descent to a language model’s token outputs during inference to improve reasoning. Instead of generating many candidate answers and picking the best one (“Best-of-N”), ∇-Reasoner uses Differentiable Textual Optimization (DTO): it takes the model’s initial output logits (the raw scores before token selection), then runs gradient descent using signals from both the base model and a reward model to push the output toward better answers. It refines one token at a time, accepting a change only if it leads to a higher-scoring response. Evaluated on the MATH-500, AIME, and AMC benchmarks, ∇-Reasoner scored 80.4% on MATH-500 with Qwen-2.5-7B-Instruct, up from 71.2% with greedy decoding. It matched training-based methods like GRPO while using up to 40% fewer model calls than Best-of-N.
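To make the DTO idea concrete, here is a toy, single-token-position sketch (my illustration, not the paper’s implementation): treat the logit vector as the optimization variable and do gradient ascent on a toy objective, expected reward minus a KL penalty toward the base model’s distribution. The linear reward, the vocabulary of four tokens, and all function names are invented for this example.

```python
# Toy sketch of DTO-style logit optimization for ONE token position.
# Hypothetical setup: p = softmax(z); maximize  E_p[reward] - beta * KL(p || p_base).
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def objective(z, reward, p_base, beta):
    p = softmax(z)
    # Toy linear reward: expected per-token reward under p.
    return sum(ri * pi for ri, pi in zip(reward, p)) - beta * kl(p, p_base)

def dto_step(z, reward, p_base, beta, lr=0.5):
    p = softmax(z)
    # Gradient wrt logits via the softmax Jacobian:
    # dJ/dz_j = p_j * (g_j - sum_i p_i * g_i), where g is the gradient wrt p.
    g = [reward[i] - beta * (math.log(p[i] / p_base[i]) + 1.0)
         for i in range(len(z))]
    mean_g = sum(pi * gi for pi, gi in zip(p, g))
    return [z[j] + lr * p[j] * (g[j] - mean_g) for j in range(len(z))]

# Base model prefers token 0; the reward signal prefers token 2.
p_base = [0.7, 0.2, 0.05, 0.05]
reward = [0.0, 0.1, 1.0, 0.2]
beta = 0.1
z = [math.log(q) for q in p_base]   # start from the base distribution

before = objective(z, reward, p_base, beta)
for _ in range(200):
    z = dto_step(z, reward, p_base, beta)
after = objective(z, reward, p_base, beta)

# Crude stand-in for the paper's accept-if-better rule: keep the
# optimized logits only if the score actually improved.
assert after > before
```

The KL term is what keeps the optimized distribution anchored to the base model, so the reward model can shift probability toward a better token without drifting into ungrammatical outputs; in the real method this happens over all positions of a response in one forward-backward pass.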
Current reasoning methods scale by generating more tokens, which means more serial model calls. ∇-Reasoner instead scales with more parallel computation per call, since gradient updates across all tokens happen in a single forward-backward pass: better answers at lower serving cost. The paper also proves that optimizing outputs via gradient descent at inference time is mathematically equivalent to KL-regularized reinforcement learning, just applied per sample instead of across the dataset.
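The equivalence claim can be sketched with a standard result (my notation, not necessarily the paper’s exact statement). Maximizing expected reward $r$ under a KL penalty of strength $\beta$ toward the base policy $\pi_{\mathrm{base}}$ has a closed-form optimum:

```latex
\max_{\pi}\; \mathbb{E}_{y \sim \pi}\left[ r(y) \right]
  - \beta\, \mathrm{KL}\!\left( \pi \,\|\, \pi_{\mathrm{base}} \right)
\quad\Longrightarrow\quad
\pi^{*}(y) \;\propto\; \pi_{\mathrm{base}}(y)\, \exp\!\left( r(y)/\beta \right)
```

KL-regularized RL ascends this objective averaged over a training dataset; gradient descent on a single response’s logits ascends the same objective for one prompt, which is the per-sample reading of the equivalence.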
This connects to how the field thinks about making LLMs reason better. Training-time and inference-time improvements may not be separate strategies but two views of the same optimization. If gradient-based inference scaling keeps closing the gap with RL-trained models, the question becomes less “how do we train better reasoners?” and more “how much compute do we spend at inference?”
Sources:
- ∇-Reasoner Paper (arXiv)
- GitHub Repository (VITA-Group)
- Hugging Face Paper Page
- MATH Benchmark (Hendrycks et al.)
Citation
@misc{kabui2026,
  author = {{Kabui, Charles}},
  title = {Nabla-Reasoner: {Gradient} {Descent} at {Inference} {Time} {Makes} {LLMs} {Think} {Harder}},
  date = {2026-03-14},
  url = {https://toknow.ai/posts/nabla-reasoner-gradient-descent-latent-space-llm-reasoning/},
  langid = {en-GB}
}
