Overview
Unisound Token Hub (maas.unisound.com) provides simple, flexible, and efficient large model APIs, enabling developers to quickly build intelligent applications at a lower cost and unleash AI innovation potential.
Model Overview
Unisound Token Hub integrates diverse large model capabilities, providing model services such as speech recognition, speech synthesis, visual text recognition, medical insurance text understanding, and intelligent decision-making, covering full-scenario applications including intelligent interaction, multimodal parsing, medical insurance supervision, commercial insurance risk control, and claims assistance.
Text Models
Text models currently focus on text understanding, semantic parsing, and intelligent decision-making in the medical insurance vertical. The platform provides core capabilities such as medical insurance policy Q&A, intelligent review, compliance supervision, risk identification, claims assistance, and multimodal medical text parsing, helping social and commercial insurance institutions rapidly build intelligent systems for medical insurance risk control, fund supervision, and insurance claims services.
U1-InsureMed is an industry-grade foundation model deeply customized for medical insurance vertical scenarios. It helps users understand medical records, examination reports, physical checkup results, as well as information related to medical insurance and commercial insurance. It is suitable for both general users to quickly grasp complex content and professional roles to perform key-point extraction, information comprehension, and assisted analysis.
Input Modality
Text, image
Output Modality
Text
Context Window
204K
Streaming Output
Supports real-time streaming responses to improve interactive experience
Speech Models
Speech models are used to understand and generate speech information. The platform currently provides ASR (Speech Recognition), TTS (Speech Synthesis), and TTS-Clone (Voice Cloning) capabilities, helping developers quickly build voice interaction and personalized voice applications.
U2-ASR is designed for real-world recording scenarios. It maintains stable recognition quality under challenging conditions such as complex noise, dialect accents, and mixed-language speech. It also supports long-audio asynchronous transcription and structured output that can be directly integrated into subtitle generation, quality inspection, search, and archiving workflows.
Input Modality
Audio
Output Modality
Text
Maximum Supported Audio Duration
5 hours
Vision Models
Vision models are used to extract and understand information from images. The platform currently provides OCR (Optical Character Recognition) capabilities, supporting efficient recognition of text content in images and documents to achieve information digitization and structured processing.
U1-OCR-Parser is a lightweight, high-performance document understanding model with only 0.9B parameters. Built around a "small size, high accuracy" design, it sets a new benchmark for document parsing and achieves state-of-the-art performance across mainstream document understanding benchmarks, including tables and formulas.
Positioning
Document understanding model (0.9B)
Context Window
16K
Input Modality
Image, text
Output Modality
Text