DiffusionGemma: Google’s Open Model That Writes Text in Blocks, Up to 4x Faster

Kabui, Charles

Google has released DiffusionGemma, an open model that generates text by diffusion instead of one token at a time. Most language models work like a typewriter, emitting one token (a word or word-piece) after another. DiffusionGemma instead drafts a 256-token block at once, then refines its placeholder tokens over a few passes, much as an image generator sharpens static into a picture. It is a 26-billion-parameter Mixture-of-Experts model that runs only a slice of itself, activating 3.8 billion parameters per step. It is built on Gemma 4 with a new diffusion head and released under the Apache 2.0 license. It generates text up to 4 times faster on GPUs, exceeding 1,000 tokens per second on a single NVIDIA H100.
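
The draft-then-refine loop can be sketched with a toy example. This is a minimal illustration, not DiffusionGemma's actual decoder: `toy_predict` is a hypothetical stand-in that already knows the answer and attaches random confidences, and committing the most confident slots on each pass is an assumed schedule of the kind commonly used in masked-diffusion decoding.

```python
import random

MASK = "<mask>"  # placeholder token for a slot that has not been decided yet


def toy_predict(block, target, rng):
    """Hypothetical stand-in for a model.

    For every masked slot, propose a token and a confidence score.
    A real model would predict from context; this toy simply looks
    up the target token and draws a random confidence.
    """
    return {
        i: (target[i], rng.random())
        for i, tok in enumerate(block)
        if tok == MASK
    }


def refine_block(target, passes=4, seed=0):
    """Fill a fully masked block over a fixed number of refinement passes."""
    rng = random.Random(seed)
    block = [MASK] * len(target)
    history = []
    for step in range(passes):
        proposals = toy_predict(block, target, rng)
        if not proposals:
            break  # nothing left to decide
        # Commit an equal share of the remaining slots on each pass
        # (ceiling division, so the final pass always finishes the block).
        quota = -(-len(proposals) // (passes - step))
        ranked = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:quota]:
            block[i] = proposals[i][0]
        history.append(list(block))
    return block, history


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog".split()
    final, steps = refine_block(words, passes=4)
    for n, snapshot in enumerate(steps, start=1):
        print(f"pass {n}: {' '.join(snapshot)}")
```

Each printed pass shows fewer placeholders than the one before, and the whole block is complete after four passes rather than nine token-by-token steps.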

Compressed to fit within 18 GB of video memory, it runs on a single high-end consumer GPU with no cloud bill or per-token fee. Its bidirectional attention, in which every token can see every other token, suits work that is not strictly left-to-right: in-line code editing, filling gaps in text, or reformatting a whole block in one go. Google states the catch plainly: output quality sits below standard Gemma 4, so the model is built for fast, interactive local work, not top-quality answers.
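
To see why bidirectional attention helps with gap filling, compare which positions a token may look at under each scheme. The sketch below is a generic illustration of attention masks, not DiffusionGemma's code; the function names and the `<gap>` token are made up for the example.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j.

    Left-to-right models only allow j <= i.
    """
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def visible_positions(mask, i):
    """List the positions that position i is allowed to see."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


tokens = ["def", "add", "(", "<gap>", ")", ":"]
gap = tokens.index("<gap>")

print(visible_positions(causal_mask(len(tokens)), gap))         # [0, 1, 2, 3]
print(visible_positions(bidirectional_mask(len(tokens)), gap))  # [0, 1, 2, 3, 4, 5]
```

Under the causal mask the gap cannot see the closing `)` and `:` that follow it; under the bidirectional mask it can, which is the context an in-line edit needs.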

Open diffusion language models are not new (LLaDA2.0-Uni took a similar route across text and images), but DiffusionGemma’s focus is local speed. Word-by-word generation leaves a single user’s GPU mostly idle; drafting in parallel keeps it busy. In high-volume cloud serving, where batching already saturates the hardware, that edge mostly disappears.
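
A rough way to picture the difference is to count model passes instead of measuring time. The sketch below is back-of-the-envelope arithmetic only: the 256-token block size comes from the description above, while `passes_per_block=8` is an assumed placeholder for "a few passes", not a published figure.

```python
import math


def autoregressive_passes(num_tokens):
    """One forward pass per generated token."""
    return num_tokens


def block_diffusion_passes(num_tokens, block_size=256, passes_per_block=8):
    """A fixed number of refinement passes per block of tokens.

    passes_per_block is a hypothetical value chosen for illustration.
    """
    blocks = math.ceil(num_tokens / block_size)
    return blocks * passes_per_block


n = 1024
print(autoregressive_passes(n))   # 1024
print(block_diffusion_passes(n))  # 32
```

Fewer passes do not translate one-to-one into speed, because each block pass processes far more tokens than a single autoregressive step. That extra work per pass is what fills an otherwise idle GPU for a single user, and it is also why the advantage shrinks once batching already keeps the hardware saturated.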

Reuse

GNU General Public License v3.0

Citation

BibTeX citation:

@misc{kabui2026,
  author = {{Kabui, Charles}},
  title = {DiffusionGemma: {Google’s} {Open} {Model} {That} {Writes}
    {Text} in {Blocks,} {Up} to 4x {Faster}},
  date = {2026-06-29},
  url = {https://toknow.ai/posts/diffusiongemma-open-text-diffusion-model-4x-faster/},
  langid = {en-GB}
}

For attribution, please cite this work as:

Kabui, Charles. 2026. “DiffusionGemma: Google’s Open Model That Writes Text in Blocks, Up to 4x Faster.” https://toknow.ai/posts/diffusiongemma-open-text-diffusion-model-4x-faster/.
