U1-OCR-Extract

Full-layout reading, precise text-image extraction

Precise segmentation, complete extraction, and trustworthy, usable data.

U1-OCR-Extract is an industrial-grade foundation model for document intelligence, designed to understand diverse documents as accurately as a business expert would. Beyond text recognition and layout parsing, it classifies documents automatically and extracts business-critical key information. It supports coordinate back-tracing and evidence-chain visualization for auditing and review, and it can be deployed privately or offline to meet the high security requirements of government, healthcare, finance, and other regulated industries.

Key metrics

Nanonets-KIE: 93.4

CC-OCR-KIE: 94.86

OCRBench_v2_XIE_CN: 85.7

Classification accuracy on 50+ common business document types: >98%

Stronger performance in healthcare scenarios

Medical document classification accuracy: 98.2%

Imaging information extraction accuracy: 95.31%

Medical record information extraction accuracy: 95.65%

ID / card document extraction accuracy: 98.87%

Key strengths

SOTA performance: leading on benchmarks and in production

Achieves state-of-the-art results on authoritative public evaluations, with strong comprehension and generalization. In real business tests, its extraction and document-classification accuracy exceed those of many mainstream general-purpose multimodal models.

Verifiable: traceable, auditable extraction

Fuses coordinates, text, and semantics so that every extracted field traces back to its source text through a reliable evidence chain, removing the need for blind trust in black-box outputs.

Ready to use: business-aligned

Industry knowledge and business rules support multi-field checks (amounts, logical consistency) so structured output meets operational standards.
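As a rough illustration of the kind of multi-field check described above, the sketch below verifies that line-item amounts add up to the stated total and that two dates are in a plausible order. The field names (`line_items`, `total`, `issue_date`, `due_date`) and the function `check_invoice` are hypothetical and are not taken from U1-OCR-Extract's actual output format.

```python
from datetime import date
from decimal import Decimal


def check_invoice(fields: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Amount check: line items should add up to the stated total.
    # Decimal avoids binary floating-point rounding on currency values.
    item_sum = sum((Decimal(item["amount"]) for item in fields["line_items"]), Decimal("0"))
    total = Decimal(fields["total"])
    if item_sum != total:
        problems.append(f"line items sum to {item_sum}, but total is {total}")

    # Logical-consistency check: the due date cannot precede the issue date.
    issue = date.fromisoformat(fields["issue_date"])
    due = date.fromisoformat(fields["due_date"])
    if due < issue:
        problems.append(f"due date {due} is earlier than issue date {issue}")

    return problems
```

A record that fails either check could be routed to manual review instead of being written straight into a downstream system.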

Robust: stable on messy real-world documents

Delivers stable, high-precision output even in complex long-tail scenarios such as non-standard captures, partial occlusion, folds and blur, multi-column layouts, and mixed-language content.

Technical highlights

4B-scale + native multimodal architecture

Balances inference efficiency with strong semantic understanding.

Semantic-driven + dynamic alignment

Models deep document structure and reading order so that extraction follows the way a person reads the page.

Architecture alignment and dynamic resolution

Handles complete tables and mixed text-image layouts while preserving fine-grained detail.

Use cases

Healthcare intelligence

Medical records, lab reports, prescriptions, billing lists, receipts, and card verification.

Finance and invoicing automation

Invoices, receipts, expense claims, and line-item extraction with reconciliation.

Government and general office

Contracts, applications, and supporting materials, structured and archived.

Capabilities

1) Document Classification

Supports 50+ common document types and allows business-specific extension.

2) Information Extraction

General extraction: automatically detects key attributes such as time, location, amount, organization, and business entities without predefined fields.

Schema-guided extraction: define target fields, constraints, and formats via standard JSON Schema for high-precision structured output.
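To make the schema-guided mode more concrete, here is a minimal sketch of what a target schema might look like and how a caller could check returned JSON against it. The schema fields (`merchant`, `date`, `total`) and the helper `validate` are assumptions for illustration; the helper covers only the few JSON Schema keywords used here, and how a schema is actually submitted to U1-OCR-Extract is not described in this document.

```python
import json
import re

# Hypothetical target schema for a receipt; field names are illustrative only.
RECEIPT_SCHEMA = {
    "type": "object",
    "properties": {
        "merchant": {"type": "string"},
        "date": {"type": "string", "pattern": r"^\d{4}-\d{2}-\d{2}$"},
        "total": {"type": "number", "minimum": 0},
    },
    "required": ["merchant", "date", "total"],
}

_TYPES = {"string": str, "number": (int, float)}


def validate(record: dict, schema: dict) -> list[str]:
    """Check a record against the small subset of JSON Schema used above."""
    errors = [f"missing field: {name}" for name in schema.get("required", []) if name not in record]
    for name, rule in schema.get("properties", {}).items():
        if name not in record:
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, _TYPES[rule["type"]]) or isinstance(value, bool):
            errors.append(f"{name}: expected {rule['type']}")
            continue
        if "pattern" in rule and not re.search(rule["pattern"], value):
            errors.append(f"{name}: does not match {rule['pattern']}")
        if "minimum" in rule and value < rule["minimum"]:
            errors.append(f"{name}: below minimum {rule['minimum']}")
    return errors


# Example: a made-up model response whose date uses slashes instead of ISO format.
model_output = '{"merchant": "ACME", "date": "2024/03/01", "total": 12.5}'
print(validate(json.loads(model_output), RECEIPT_SCHEMA))
```

In practice a complete validator, such as the third-party `jsonschema` package, would likely replace the hand-written helper.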

3) Layout Understanding

Understands title hierarchy, body text, and chart/table relationships, and can still "read the right positions and extract the right fields" in complex layouts.

4) Coordinate Back-Trace

Outputs source coordinates for extracted fields, enabling highlight-based verification and evidence chains while reducing manual review effort.
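One way a review tool might consume such coordinates is sketched below: each extracted field carries a bounding box in source-image pixels, which is scaled to the size of an on-screen preview so the region can be highlighted. The `ExtractedField` structure, the `(x0, y0, x1, y1)` box convention, and the `to_highlight` function are assumptions for illustration, not the model's documented output format.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    page: int
    bbox: tuple[float, float, float, float]  # assumed (x0, y0, x1, y1) in source-image pixels


def to_highlight(field: ExtractedField, src_size: tuple[int, int], view_size: tuple[int, int]) -> dict:
    """Scale a field's source box to viewer coordinates so it can be drawn as an overlay."""
    sx = view_size[0] / src_size[0]  # horizontal scale factor
    sy = view_size[1] / src_size[1]  # vertical scale factor
    x0, y0, x1, y1 = field.bbox
    return {
        "label": f"{field.name}: {field.value}",
        "page": field.page,
        "left": x0 * sx,
        "top": y0 * sy,
        "width": (x1 - x0) * sx,
        "height": (y1 - y0) * sy,
    }
```

A reviewer can then click a field in the structured output and see the matching region highlighted on the page, which is what makes the evidence chain checkable by eye.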

Get started

Flexible pricing, tailored solutions, and private deployment