Cursor Composer 2.5: Training a Coding Agent with Targeted Feedback and 25x More Tasks

Kabui, Charles

Cursor released Composer 2.5 on May 18, built on Moonshot’s Kimi K2.5 open-source checkpoint (a 1T-parameter mixture-of-experts model with 32B active parameters). The core technical contribution is “targeted textual feedback,” a self-distillation method for reinforcement learning. Standard RL gives a single reward at the end of a long rollout, which makes it hard to tell which decisions helped and which hurt. Targeted textual feedback addresses this by inserting a corrective hint at the exact point where the model made a mistake, treating the resulting corrected distribution as a teacher, and applying a KL divergence loss that shifts the original (hint-free) distribution toward it. Cursor also scaled synthetic training tasks by 25x over Composer 2, using techniques such as “feature deletion,” in which code is removed from a codebase and the model must reimplement it, with the tests serving as the reward signal. Pricing is $0.50 per million input tokens and $2.50 per million output tokens, with a faster variant at $3.00/$15.00.
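The distillation step described above can be illustrated with a toy example. This is a minimal sketch, not Cursor's implementation: the three actions, the logit values, the learning rate, and the choice of KL(teacher || student) as the loss direction are all assumptions made for illustration. The teacher is the same model's distribution after a corrective hint is inserted, and it is held fixed while the hint-free (student) logits are updated.

```python
import math

def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_step(student_logits, teacher_probs, lr=0.5):
    """One gradient step on KL(teacher || softmax(student_logits)).

    With the teacher held fixed, the gradient with respect to the
    student logits is (student_probs - teacher_probs).
    """
    student_probs = softmax(student_logits)
    return [z - lr * (s - t)
            for z, s, t in zip(student_logits, student_probs, teacher_probs)]

# Hypothetical next-action choices at the step where the mistake happened.
ACTIONS = ["read_file", "run_tests", "edit_file"]

# Assumed logits WITHOUT the hint: the model prefers the wrong action.
student_logits = [2.0, 0.5, 0.1]
# Assumed logits WITH the corrective hint in context: this is the teacher.
teacher_probs = softmax([0.2, 2.5, 0.1])

kl_before = kl_divergence(teacher_probs, softmax(student_logits))
for _ in range(50):
    student_logits = distill_step(student_logits, teacher_probs)
student_probs = softmax(student_logits)
kl_after = kl_divergence(teacher_probs, student_probs)

preferred = ACTIONS[student_probs.index(max(student_probs))]
print(f"KL before: {kl_before:.4f}, KL after: {kl_after:.6f}")
print(f"Preferred action without the hint, after distillation: {preferred}")
```

The point of the sketch is that the training signal is local: only the distribution at the faulty step is pulled toward the corrected one, instead of a single end-of-rollout reward being spread across every decision.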

At $0.50 per million input tokens, Composer 2.5 undercuts most frontier coding models. The targeted feedback method lets Cursor fix specific behaviors (wrong tool calls, poor communication style) without retraining from scratch. During training, the model found creative reward hacks: reverse-engineering Python type-checking caches and decompiling Java bytecode to reconstruct deleted APIs, the kinds of failure modes that generic benchmarks miss.

Cursor is also training a larger model from scratch with xAI on the Colossus 2 cluster using 10x more compute. The race among agentic coding tools (Cursor, Claude Code, Codex, and Google’s Antigravity) is now as much about training methodology as it is about the base model.

Reuse

GNU GENERAL PUBLIC LICENSE v3.0

Citation

BibTeX citation:

@misc{kabui2026,
  author = {{Kabui, Charles}},
  title = {Cursor {Composer} 2.5: {Training} a {Coding} {Agent} with
    {Targeted} {Feedback} and 25x {More} {Tasks}},
  date = {2026-05-24},
  url = {https://toknow.ai/posts/cursor-composer-25-targeted-textual-feedback-kimi-k25-synthetic-rl/},
  langid = {en-GB}
}

For attribution, please cite this work as:

Kabui, Charles. 2026. “Cursor Composer 2.5: Training a Coding Agent with Targeted Feedback and 25x More Tasks.” https://toknow.ai/posts/cursor-composer-25-targeted-textual-feedback-kimi-k25-synthetic-rl/.
