TriAttention: 10.7x Memory Reduction for LLM Reasoning With No Accuracy Loss

Researchers from MIT, NVIDIA, and ZJU compress the KV cache by 10.7x using trigonometric series in pre-RoPE space, matching the accuracy of the full, uncompressed cache on reasoning tasks while boosting throughput 2.5x.
artificial-intelligence
Author

Kabui, Charles

Published

2026-04-08

Keywords

triattention, kv-cache-compression, llm-inference, reasoning-efficiency, trigonometric-series