PaddleOCR-VL-1.6: A 0.9B Document Parser That Beats Models Hundreds of Times Its Size

Kabui, Charles

Baidu’s PaddlePaddle team released PaddleOCR-VL-1.6, a compact model that turns document images into structured Markdown and JSON, pulling out text, tables, formulas, charts, and seals across 109 languages. It works in two stages: a layout detector finds the regions on a page, and a 0.9B vision-language model reads each one. That vision-language model combines a dynamic-resolution image encoder with the small ERNIE-4.5-0.3B language model. On OmniDocBench v1.6, a standard document-parsing benchmark, it scores 96.33%, the top result on the leaderboard. That puts a 0.9B model ahead of the 1.2B MinerU2.5-Pro (95.75%), Gemini 3 Pro (92.91%), the 235B-parameter Qwen3-VL (89.78%), and GPT-5.2 (86.59%).
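To make the two-stage design concrete, here is a minimal sketch of how a layout stage and a region-reading stage could fit together. It is not the PaddleOCR-VL API: `Region`, `parse_page`, `fake_layout`, and `fake_reader` are hypothetical names, and the toy functions stand in for the real models so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Region:
    kind: str     # "text", "table", "formula", ... (illustrative labels)
    order: int    # reading order assigned by the layout stage
    crop: str     # stand-in for the cropped image of this region


def parse_page(
    page: str,
    detect_layout: Callable[[str], Iterable[Region]],
    read_region: Callable[[Region], str],
) -> str:
    """Stage 1 finds regions, stage 2 reads each one, and the
    results are joined in reading order as Markdown."""
    regions = sorted(detect_layout(page), key=lambda r: r.order)
    return "\n\n".join(read_region(r) for r in regions)


# Toy stand-ins (assumptions) so the sketch runs without any model.
def fake_layout(page: str) -> list[Region]:
    return [Region("table", 1, "| a | b |"), Region("text", 0, "Invoice 17")]


def fake_reader(region: Region) -> str:
    return region.crop if region.kind == "table" else f"# {region.crop}"


markdown = parse_page("page.png", fake_layout, fake_reader)
```

Splitting the work this way means the language model only ever sees one small region at a time, which is one plausible reason a 0.9B model can handle dense pages.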

Because the model is so small, you do not need a giant cloud model or a paid per-page API to convert documents into clean data. Quantized builds run locally in tools like Ollama and LM Studio, and the Apache 2.0 license lets you fine-tune and ship it commercially. For anyone digitizing contracts, invoices, medical records, or old books, that means accurate extraction of nested tables and dense math at close to zero cost, in writing systems from Arabic to Thai.

The gains did not come from more data. The team instead hunted for the old version’s weak spots: inputs where a tiny visual change flips the output, rare layouts the data barely covered, and training examples with wrong labels, checked against independent parsers like MinerU2.5-Pro. Fixing just those spots, plus a short reinforcement-learning pass, beat brute-force scaling. As with MinerU2.5-Pro, document AI now turns on data quality, not model size.
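One of those checks, comparing training labels against an independent parser, can be sketched in a few lines. This is an illustration under stated assumptions, not the team's actual procedure: the similarity measure (`difflib.SequenceMatcher`), the 0.8 threshold, and the sample records are all made up for the example.

```python
from difflib import SequenceMatcher


def agreement(a: str, b: str) -> float:
    """Similarity in [0, 1] after collapsing whitespace."""
    norm_a = " ".join(a.split())
    norm_b = " ".join(b.split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def flag_suspect_labels(samples, threshold=0.8):
    """Return ids whose training label disagrees with the output
    of an independent parser (threshold is an assumed value)."""
    return [
        s["id"]
        for s in samples
        if agreement(s["label"], s["independent"]) < threshold
    ]


# Hypothetical records: "label" is the training target,
# "independent" is what a second parser produced for the same image.
samples = [
    {"id": "p1", "label": "Total: 42.00 USD", "independent": "Total:  42.00 USD"},
    {"id": "p2", "label": "Total: 42.00 USD", "independent": "Subtotal 17.50 EUR"},
]
suspects = flag_suspect_labels(samples)
```

Here `p1` differs only in spacing and passes, while `p2` is flagged for review. A disagreement does not prove the label is wrong, since the second parser can also be mistaken, so flagged samples are candidates for inspection, not automatic deletion.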

Reuse

GNU GENERAL PUBLIC LICENSE v3.0

Citation

BibTeX citation:

@misc{kabui2026,
  author = {{Kabui, Charles}},
  title = {PaddleOCR-VL-1.6: {A} {0.9B} {Document} {Parser} {That}
    {Beats} {Models} {Hundreds} of {Times} {Its} {Size}},
  date = {2026-06-15},
  url = {https://toknow.ai/posts/paddleocr-vl-16-document-parsing-under-optimized-regions/},
  langid = {en-GB}
}

For attribution, please cite this work as:

Kabui, Charles. 2026. “PaddleOCR-VL-1.6: A 0.9B Document Parser That Beats Models Hundreds of Times Its Size.” https://toknow.ai/posts/paddleocr-vl-16-document-parsing-under-optimized-regions/.
