Google TurboQuant: Shrinking LLM Memory 6x at 3 Bits With Zero Quality Loss

Google Research’s TurboQuant compresses the LLM key-value cache to 3 bits with no accuracy loss and no retraining, cutting memory use by 6x and speeding up attention computation by 8x on H100 GPUs.
artificial-intelligence
Author

Kabui, Charles

Published

2026-04-02

Keywords

turboquant, kv-cache-compression, llm-inference, quantization, polar-coordinates