PP-OCRv6 Lightweight OCR Beats Vision-Language Models on Text Recognition with 34.5M Parameters
<cite index="67-2">PaddlePaddle has released PP-OCRv6, a lightweight OCR system whose medium tier reaches 83.2% recognition accuracy and 86.2% detection Hmean. That is +5.1% and +4.6% over PP-OCRv5_server, respectively, and it also surpasses Qwen3-VL-235B, GPT-5.5, and Gemini-3.1-Pro with orders of magnitude fewer parameters</cite>. <cite index="62-2">The system spans three model tiers (tiny, small, medium), ranging from 1.5M to 34.5M parameters, and is redesigned around a unified MetaFormer-style lightweight backbone (LCNetV4) with structural reparameterization</cite>.
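To make the headline numbers concrete, here is a minimal sketch of the arithmetic behind them. It assumes that "Hmean" is the usual harmonic mean of detection precision and recall, and it takes the 235B in the Qwen3-VL-235B name as that model's total parameter count. The precision and recall values are purely illustrative and are not figures from the release.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of detection precision and recall (the usual 'Hmean')."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def param_ratio(large_params: float, small_params: float) -> float:
    """How many times more parameters the larger model has."""
    return large_params / small_params


# Illustrative precision/recall values, NOT figures from the PP-OCRv6 release.
example_hmean = hmean(0.90, 0.80)

# Qwen3-VL-235B (assumed 235B total parameters) vs. the 34.5M-parameter medium tier.
ratio = param_ratio(235e9, 34.5e6)

print(f"Hmean for P=0.90, R=0.80: {example_hmean:.4f}")
print(f"Parameter ratio: about {ratio:,.0f}x")
```

Under these assumptions the ratio comes out to several thousand, which is what "orders of magnitude fewer parameters" amounts to for the largest named comparison model.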
<cite index="61-1,61-2">PP-OCRv6_medium covers 50 languages in a single model (Chinese, English, Japanese, and 46 Latin-script languages), so no model switching is needed. It also shows major improvements in specialized scenarios, including digital displays, dot-matrix characters, tire prints, and industrial text</cite>. <cite index="61-2">The tiny tier runs 3.9x faster than PP-OCRv5_mobile on an Intel Xeon CPU while maintaining comparable accuracy, and the full system achieves a 5.2x CPU speedup via OpenVINO and a 6.1x speedup on Apple M4</cite>.
<cite index="61-2">All PP-OCRv6 models are available on Hugging Face and ModelScope, with three tiers for edge, mobile, and server deployment scenarios</cite>. The system targets a gap in production OCR: for OCR tasks, general-purpose vision-language models suffer from hallucination, imprecise localization, and prohibitive compute cost. <cite index="68-1">PP-OCRv6 integrates deeply with AI agent ecosystems including Dify, RAGFlow, and Cherry Studio, positioning it as a core component for document-to-data pipelines in agentic workflows</cite>.
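As a rough illustration of what a document-to-data step might look like downstream of any OCR engine, the sketch below filters recognized lines by confidence and puts them in reading order. The `OcrLine` record, the `to_reading_order` helper, and the threshold values are all hypothetical and chosen for illustration; they are not PaddleOCR's result format or API.

```python
from dataclasses import dataclass


@dataclass
class OcrLine:
    """One recognized text line (hypothetical shape, not PaddleOCR's result type)."""
    text: str
    score: float  # recognition confidence in [0, 1]
    x: float      # left edge of the bounding box, in pixels
    y: float      # top edge of the bounding box, in pixels


def to_reading_order(lines: list[OcrLine], min_score: float = 0.5,
                     row_tolerance: float = 10.0) -> list[str]:
    """Drop low-confidence lines, then sort top-to-bottom and left-to-right."""
    kept = [ln for ln in lines if ln.score >= min_score]
    # Bucket y coarsely so boxes on roughly the same row are ordered by x
    # rather than by small vertical jitter. This is a crude heuristic.
    kept.sort(key=lambda ln: (round(ln.y / row_tolerance), ln.x))
    return [ln.text for ln in kept]


# Made-up sample data standing in for OCR output.
sample = [
    OcrLine("Total:", 0.98, x=20, y=102),
    OcrLine("42.00", 0.95, x=200, y=98),
    OcrLine("Invoice", 0.99, x=20, y=12),
    OcrLine("~~", 0.20, x=100, y=50),  # low confidence, filtered out
]

ordered = to_reading_order(sample)
print(ordered)
```

Keeping the bounding-box coordinates next to the text is what lets a pipeline recover layout explicitly, rather than relying on a model to describe it.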
Sources
- PP-OCRv6: From 1.5M to 34.5M Parameters, Surpassing Billion-Scale VLMs on OCR Tasks (primary source)
  “PP-OCRv6_medium achieves 83.2% recognition accuracy and 86.2% detection Hmean, outperforming PP-OCRv5_server by +5.1% and +4.6% respectively”
- PaddleOCR on GitHub
  “50 languages unified: Single model covers Chinese, English, Japanese, and 46 Latin-script languages”
- PP-OCRv6 Collection on Hugging Face
  “From 1.5M to 34.5M Parameters, Surpassing Billion-Scale VLMs on OCR Tasks”