U1-OCR-Extract
Full-layout reading, precise text-image extraction
Precise segmentation, complete extraction, trustworthy and usable data.
U1-OCR-Extract is an industrial-grade document intelligence foundation model that understands diverse documents as accurately as a business expert. Beyond text recognition and layout parsing, it automatically classifies documents and efficiently extracts business-critical key information. The model supports coordinate back-tracing and evidence-chain visualization for auditing and review, and supports private, offline deployment to meet the high security requirements of government, healthcare, finance, and other regulated industries.
Key metrics
Nanonets-KIE
CC-OCR-KIE
OCRBench_v2_XIE_CN
Classification accuracy on 50+ common business document types
Stronger performance in healthcare scenarios
Medical document classification accuracy
Imaging information extraction accuracy
Medical record information extraction accuracy
ID / card document extraction accuracy
Key strengths
SOTA performance: leading on benchmarks and in production
Achieves state-of-the-art results on established public benchmarks, with strong comprehension and generalization. In real business tests, its extraction and document-classification accuracy exceed those of many mainstream general-purpose multimodal models.
Verifiable: traceable, auditable extraction
Fuses coordinates, text, and semantics so that every extracted field can be traced back to its source text through a reliable evidence chain, rather than requiring blind trust in black-box outputs.
Ready to use: business-aligned
Built-in industry knowledge and business rules enable cross-field checks (such as amount reconciliation and logical consistency), so structured output meets operational standards.
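To illustrate the kind of cross-field check described above, the sketch below verifies that extracted line-item amounts add up to the stated total. The field names (`line_items`, `amount`, `total`) and the one-cent tolerance are hypothetical assumptions for illustration, not U1-OCR-Extract's actual output format or rule set.

```python
from decimal import Decimal


def check_total(fields: dict, tolerance: Decimal = Decimal("0.01")) -> list[str]:
    """Return human-readable issues; an empty list means the amounts reconcile.

    `fields` is a hypothetical extraction result; its keys are illustrative.
    """
    issues = []
    items = fields.get("line_items", [])
    # Decimal avoids binary floating-point rounding errors on currency values.
    item_sum = sum((Decimal(str(item["amount"])) for item in items), Decimal("0"))
    total = Decimal(str(fields["total"]))
    if abs(item_sum - total) > tolerance:
        issues.append(f"line items sum to {item_sum}, but total is {total}")
    return issues


invoice = {
    "line_items": [{"amount": "120.50"}, {"amount": "79.50"}],
    "total": "200.00",
}
print(check_total(invoice))  # []
```

In practice such rules would run after extraction, and any returned issues would flag the document for human review rather than silently correcting values.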
Robust: stable on messy real-world documents
Even in complex long-tail scenarios such as non-standard captures, occlusion, blur from folds, multi-column layouts, and mixed-language content, it delivers stable, high-precision output.
Technical highlights
4B-scale + native multimodal architecture
Balances inference efficiency with strong semantic understanding.
Semantic-driven + dynamic alignment
Deep modeling of document structure and reading order for human-like extraction.
Architecture alignment and dynamic resolution
Handles complete tables and mixed text-image layouts with fine-grained detail.
Use cases
Healthcare intelligence
Medical records, lab reports, prescriptions, billing lists, receipts, and card verification.
Finance and invoicing automation
Invoices, receipts, expense claims, and line-item extraction with reconciliation.
Government and general office
Contracts, applications, and supporting materials, structured and archived.
Capabilities
1) Document Classification
Supports 50+ common document types and can be extended with business-specific types.
2) Information Extraction
General extraction: automatically identifies key attributes such as time, location, amount, organization, and business entities, with no predefined fields required.
Schema-guided extraction: define target fields, constraints, and formats with a standard JSON Schema for high-precision structured output.
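For schema-guided extraction, a target schema for an invoice might look like the one sketched below. The field names are hypothetical, and how the schema is actually passed to U1-OCR-Extract is not described here. The small checker covers only the `required` and per-property `type` keywords as an illustration; full validation would normally use a dedicated JSON Schema validator.

```python
import json

# Hypothetical target schema for an invoice, using standard JSON Schema keywords.
INVOICE_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "issue_date": {"type": "string", "format": "date"},
        "seller_name": {"type": "string"},
        "total_amount": {"type": "number", "minimum": 0},
    },
    "required": ["invoice_number", "issue_date", "total_amount"],
}

# Maps the JSON Schema type names used above to Python types.
_TYPES = {"string": str, "number": (int, float), "object": dict}


def missing_or_mistyped(output: dict, schema: dict) -> list[str]:
    """Check only `required` and per-property `type`; this is not a full validator."""
    problems = [f"missing: {name}" for name in schema.get("required", []) if name not in output]
    for name, spec in schema.get("properties", {}).items():
        expected = _TYPES.get(spec.get("type"))
        if name in output and expected and not isinstance(output[name], expected):
            problems.append(f"wrong type: {name}")
    return problems


# A hypothetical model response, received as a JSON string.
response = '{"invoice_number": "INV-001", "issue_date": "2024-05-01", "total_amount": 200.0}'
print(missing_or_mistyped(json.loads(response), INVOICE_SCHEMA))  # []
```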
3) Layout Understanding
Understands title hierarchy, body text, and the relationships between text and charts or tables, so it can still "read the right positions and extract the right fields" in complex layouts.
4) Coordinate Back-Trace
Outputs source coordinates for extracted fields, enabling highlight-based verification and evidence chains while reducing manual review effort.
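One way a review tool might use the returned coordinates is to collect the OCR words that fall inside a field's bounding box and confirm they actually contain the extracted value before highlighting it. The output shape assumed here (a value plus an `(x0, y0, x1, y1)` box in page pixels) and the word-level OCR tokens are hypothetical; the model's real output format may differ.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1), origin at top-left


@dataclass
class Word:
    text: str
    box: Box


def center_inside(inner: Box, outer: Box) -> bool:
    """True if the center point of `inner` lies within `outer`."""
    cx, cy = (inner[0] + inner[2]) / 2, (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def verify_field(value: str, field_box: Box, words: list[Word]) -> bool:
    """Check that the words located inside the field's box contain the value."""
    inside = [w for w in words if center_inside(w.box, field_box)]
    # Naive top-to-bottom, left-to-right ordering before joining the text.
    inside.sort(key=lambda w: (w.box[1], w.box[0]))
    found = " ".join(w.text for w in inside)
    return value.strip() in found


words = [
    Word("Total:", (10, 100, 60, 115)),
    Word("200.00", (70, 100, 120, 115)),
    Word("CNY", (125, 100, 150, 115)),
]
print(verify_field("200.00", (65, 95, 155, 120), words))  # True
```

Fields that fail this check are natural candidates for manual review, while passing fields can be shown as highlights on the page image.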


