U2-ASR

Beyond hearing, truly understanding expression

Recognizes seven Chinese dialect groups, mixed-language speech, and industry terms

U2-ASR moves beyond rigid, word-by-word transcription by combining business and conversational context with intelligent disambiguation of professional terms. It progresses from simply hearing words to understanding intent, and adapts better to industry terminology, noisy environments, dialect accents, and multilingual code-switching, making it broadly applicable to study, office, and daily-life use cases.

90%

Recognition Accuracy in Complex Noise

99.2%

AISHELL-1 Test Set Score

98.4%

AISHELL-3 Test Set Score

98.4%

LibriSpeech Clean Test Set Score

Excels on industrial-grade dialect test sets, with overall recognition performance surpassing mainstream ASR models.

Industrial-grade dialect benchmark comparison chart

Also delivers outstanding results on public Chinese and English benchmarks including AISHELL, FLEURS, LibriSpeech, WenetSpeech Meeting, and KeSpeech.

Chinese and English public benchmark comparison chart

Core Capabilities

Bilingual, cross-dialect recognition that understands local accents

Supports both Chinese and English recognition, covering hundreds of dialects and regional accents, including Cantonese, Sichuanese, Shanghainese, Hokkien, and Hakka, and spanning all seven major Chinese dialect groups.

Context-aware understanding that goes beyond words

Replaces mechanical word-by-word recognition with contextual understanding and specialized-vocabulary optimization, adapting to noise, strong accents, technical terms, and code-switching to deliver transcripts that read naturally and stay true to the original meaning.

Long-audio transcription with one-click handling for extra-long recordings

Easily handles hours-long audio files. Whether for daily records or long-term audio archiving, it consistently delivers stable and efficient batch transcription.

Structured and standardized output, ready to use

Automatically generates high-quality standardized transcripts with speaker diarization, accurate timestamps, smart sentence segmentation, and automatic punctuation. The text is clean, well-structured, and ready to use without manual post-editing.

Highlights

Strong Noise Robustness

Optimized for real recording environments, maintaining stable recognition amid complex background sounds, such as in noisy malls and meeting rooms.

Broad Dialect and Language Coverage

In addition to Mandarin, it supports Cantonese and other dialects, as well as multilingual transcription for cross-region and cross-language scenarios.

Stronger Domain Semantic Understanding

Context and hotword injection improve recognition of domain terms in medical, automotive, and customer-service workflows.

Accurate Speaker-Separated Transcription

Speaker diarization, smart segmentation, punctuation prediction, and timestamps make transcripts clearer and easier to organize.

Use Cases

Office Document Input

Generate work docs, emails, and draft plans quickly from speech to speed up content input.

Medical Record Entry

Recognize large volumes of medical terminology in real time and support physician dictation for faster EMR creation.

Communication and Translation

Accurately captures and clearly records different accents and ways of expression.

Meeting Audio Transcription

Parse long recordings with one click and output structured, well-formatted transcripts with less manual effort.

Capabilities

  • Supports audio-to-text for long-form scenarios such as meetings, lectures, customer service, and business recordings.
  • Supports speaker diarization to distinguish segments from different speakers.
  • Supports smart sentence splitting and punctuation prediction to improve readability.
  • Supports timestamp output for subtitles, search, and audio-video alignment.
  • Supports context and hotword enhancement to improve recognition of proper nouns and industry terms.
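
To illustrate how timestamped, speaker-labeled output could feed a subtitle workflow, here is a minimal sketch that renders transcript segments as SRT cues. The `Segment` shape and its field names are assumptions made for this example; the page does not document U2-ASR's actual output schema, so adapt the fields to whatever the service returns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcript segment (hypothetical shape, not a documented U2-ASR schema)."""
    start: float   # segment start time, in seconds
    end: float     # segment end time, in seconds
    speaker: str   # diarization label, e.g. "Speaker 1"
    text: str      # punctuated transcript text


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT cues, prefixing each with its speaker label."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    # Illustrative data only; not real U2-ASR output.
    demo = [
        Segment(0.0, 2.5, "Speaker 1", "Good morning, everyone."),
        Segment(2.5, 5.0, "Speaker 2", "Let's get started."),
    ]
    print(to_srt(demo))
```

The same segment list could also drive transcript search or audio-video alignment, since each entry carries its own start and end time.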

Get Started

Flexible pricing, tailored solutions, and private deployment