SLA2: Sparse-Linear Attention Achieves 18.6x Speedup in Video Diffusion Models

Researchers at Tsinghua University and UC Berkeley propose SLA2, a sparse-linear attention mechanism that reaches 97% attention sparsity and an 18.6x speedup in video diffusion models with no loss in generation quality.
artificial-intelligence
Author: Kabui, Charles

Published: 2026-03-04

Keywords: sparse-attention, linear-attention, diffusion-models, video-generation, quantization, inference-optimization