Create Async Speech Transcription Task
Creates an asynchronous speech transcription task. Supports multiple audio formats and can return timestamps and speaker information.
Authorization
- Security Scheme Type: http
- HTTP Authorization Scheme: Bearer API_key. The API key is used to verify account information and can be viewed in Project Management > API Key.
Request Header
Content-Type: the media type of the request body. Set it to application/json so that the request data is sent in the correct format.
Available options: application/json
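As a minimal sketch of the two header requirements above, the helper below builds the Authorization and Content-Type headers. The function name build_headers is hypothetical and not part of the API; only the Bearer scheme and the application/json media type come from this page.

```python
def build_headers(api_key: str) -> dict:
    """Return the request headers described above (hypothetical helper)."""
    key = (api_key or "").strip()
    if not key:
        raise ValueError("api_key must be a non-empty string")
    return {
        # Bearer scheme, using the API key from Project Management > API Key.
        "Authorization": f"Bearer {key}",
        # The only media type this page lists for the request body.
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed to whichever HTTP client you use.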
Request Body application/json
file_id long Required
Audio file ID (obtained through the file upload API).
The uploaded async speech recognition file must comply with the following specifications:
- Format: mp3, opus, wav, amr, m4a, ogg
- Duration: minimum 1 second, maximum 5 hours
- Size: no more than 1GB
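The file constraints above can be checked locally before uploading. The sketch below is illustrative only: check_audio_file is a hypothetical helper, the caller is assumed to already know the duration and size, and 1GB is assumed to mean 1024^3 bytes because this page does not say which definition applies.

```python
from pathlib import PurePath

ALLOWED_FORMATS = {"mp3", "opus", "wav", "amr", "m4a", "ogg"}
MIN_DURATION_S = 1                 # minimum 1 second
MAX_DURATION_S = 5 * 60 * 60       # maximum 5 hours
MAX_SIZE_BYTES = 1024 ** 3         # assumption: 1GB read as 1 GiB


def check_audio_file(filename: str, duration_s: float, size_bytes: int) -> list:
    """Return a list of constraint violations (empty if the file looks valid)."""
    problems = []
    ext = PurePath(filename).suffix.lower().lstrip(".")
    if ext not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if duration_s < MIN_DURATION_S:
        problems.append("duration is shorter than 1 second")
    elif duration_s > MAX_DURATION_S:
        problems.append("duration is longer than 5 hours")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 1GB")
    return problems
```

This check uses only the file extension to determine the format, so it does not detect a mislabeled file.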
model enum<string> Required
Available model code options: u2-asr
format string Required
Audio file type: mp3, opus, wav, amr, m4a, ogg.
sample_rate integer
Audio sample rate, default 16000.
enable_itn boolean
Whether to enable Arabic numeral conversion (e.g., convert "nineteen ninety-seven" to "1997"), default true.
channel integer
Number of audio channels, 1(mono) / 2(stereo), default is 1.
enable_speaker boolean
Whether to enable speaker separation. Valid only when channel is 1 (mono). Default false.
speaker_num integer
Number of speakers (valid only when speaker separation is enabled), defaults to automatic recognition.
word_info boolean
Whether to return word-level timestamps, default false.
context string
Context, used to specify contextual information for the model, limited to 500 characters.
hotwords string[]
List of hot words: at most 200 words, each at most 5 characters.
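To show how the request body fields fit together, here is a rough sketch that assembles the JSON payload and enforces the limits documented above. The function name build_request_body is hypothetical. This page does not say whether the server rejects or ignores options used outside their valid conditions, so failing early is this sketch's own choice.

```python
import json
from typing import List, Optional

ALLOWED_FORMATS = {"mp3", "opus", "wav", "amr", "m4a", "ogg"}


def build_request_body(file_id: int, audio_format: str, *,
                       sample_rate: int = 16000, enable_itn: bool = True,
                       channel: int = 1, enable_speaker: bool = False,
                       speaker_num: Optional[int] = None,
                       word_info: bool = False,
                       context: Optional[str] = None,
                       hotwords: Optional[List[str]] = None) -> dict:
    """Assemble the request body using the documented defaults and limits."""
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {audio_format}")
    if channel not in (1, 2):
        raise ValueError("channel must be 1 (mono) or 2 (stereo)")
    if enable_speaker and channel != 1:
        raise ValueError("speaker separation is only valid for mono audio")
    if speaker_num is not None and not enable_speaker:
        raise ValueError("speaker_num requires enable_speaker=True")
    if context is not None and len(context) > 500:
        raise ValueError("context is limited to 500 characters")
    words = list(hotwords or [])
    if len(words) > 200 or any(len(w) > 5 for w in words):
        raise ValueError("hotwords: max 200 words, max 5 characters each")

    body = {
        "file_id": file_id,
        "model": "u2-asr",  # the only model code listed on this page
        "format": audio_format,
        "sample_rate": sample_rate,
        "enable_itn": enable_itn,
        "channel": channel,
        "enable_speaker": enable_speaker,
        "word_info": word_info,
    }
    # Optional fields are omitted when unset so the server defaults apply.
    if speaker_num is not None:
        body["speaker_num"] = speaker_num
    if context:
        body["context"] = context
    if words:
        body["hotwords"] = words
    return body


if __name__ == "__main__":
    # The file_id value here is made up for illustration.
    print(json.dumps(build_request_body(12345, "wav", enable_speaker=True,
                                        speaker_num=2), indent=2))
```

Omitting speaker_num leaves the number of speakers to automatic recognition, as described above.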
Response Body Structure
task_id string
base_resp object
base_resp.status_code integer
base_resp.status_msg string
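A response with this structure could be read as in the sketch below. The class and function names are hypothetical, and the sample JSON values in the usage example are invented for illustration. This page does not define what each status_code value means, so the sketch only extracts the fields and does not interpret them.

```python
import json
from dataclasses import dataclass


@dataclass
class CreateTaskResult:
    task_id: str
    status_code: int
    status_msg: str


def parse_create_task_response(raw: str) -> CreateTaskResult:
    """Extract task_id and the base_resp fields from a JSON response body."""
    data = json.loads(raw)
    base = data.get("base_resp") or {}
    return CreateTaskResult(
        task_id=data.get("task_id", ""),
        # -1 is this sketch's placeholder for a missing status_code.
        status_code=base.get("status_code", -1),
        status_msg=base.get("status_msg", ""),
    )


if __name__ == "__main__":
    sample = '{"task_id": "example-id", "base_resp": {"status_code": 0, "status_msg": "example"}}'
    print(parse_create_task_response(sample))
```

Keep the returned task_id, since it identifies the asynchronous task in later requests.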