MinerU2.5-Pro: A 1.2B Model Beats 200x Larger Rivals by Fixing the Data, Not the Architecture

OpenDataLab’s 1.2B-parameter document parser scores 95.69 on OmniDocBench v1.6, outperforming models with 200x more parameters, purely through data engineering.
Category: artificial-intelligence
Author: Kabui, Charles
Published: 2026-04-26
Keywords: document-parsing, data-centric-ai, mineru, ocr, training-data