U1-OCR-Parser
Light in weight, heavy on understanding
SOTA parsing across formats with one-step structured output.
U1-OCR-Parser: Light in weight, heavy on understanding
U1-OCR-Parser is a lightweight, high-performance document understanding model with just 0.9B parameters. Its core strengths are a compact size and high accuracy, purpose-built for enterprise document parsing. It accurately parses complex layouts, formulas, complex tables, and mixed-format documents, and natively outputs standardized structured results such as Markdown, JSON, and HTML, helping enterprises quickly complete content ingestion, retrieval, information extraction, and RAG knowledge base construction.
SOTA on authoritative benchmarks—industrial results you can measure and compare
OmniDocBench V1 overall score
D4LA overall F1 score
Much higher PDF parsing throughput
More efficient high-resolution image processing
Key strengths
Higher parsing accuracy
Industrial-grade OCR with leading complex layout and reading-order recovery.
Structured output, ready to use
Standard JSON/Markdown layout signals for human review, search, and downstream development.
Multilingual and multi-format
Chinese and multilingual documents; office files, PDFs, images, scans, and more.
Downstream friendly
A stable, high-quality structural base for extraction, retrieval, Q&A, and knowledge ingestion.
Technical highlights
Complex tables, tuned
Merged cells, multi-level headers, nested and irregular tables restored reliably.
Strong on formulas and mixed layouts
Papers, research reports, textbooks, exam sheets, and other dense layouts.
Unusual content supported
Handwriting, stamps, seals, code, and special symbols with stable recognition.
Multi-engine deployment
VLLM, FastDeploy, and SGLang for flexible deployment options.
Use cases
General document digitization
One-click parsing for Word, PPT, Excel, PDF, screenshots, photos, and scans—searchable content, faster.
Complex tables to structured data
Government, research, and finance ledgers structured in one step—less manual entry and reformatting.
RAG knowledge bases
Better ingestion quality; fewer issues from "fragmentation, errors, and messy layout" in QA and search.
Archives and compliance
Structured output for archiving, search, audit, and reuse.
Capabilities
1) Complex layout parsing
Recognizes titles, body text, chart regions, and how content is organized—supports multi-column and mixed layouts.
2) Formulas and complex tables
Structured restoration for merged cells, multi-level headers, nested and irregular tables.
3) Unconventional text
Handwriting, stamps, seals, code, and special symbols supported.
4) Structured output
Native Markdown, JSON, and HTML—clean, reusable results.
5) Multilingual
Major languages including Chinese, English, Russian, Arabic, and Hindi, plus minority scripts—ready for regional and multilingual document parsing.


