Create Async Speech Transcription Task

Creates an asynchronous speech transcription task. Supports multiple audio formats and can output timestamps and speaker information.

POST /v1/audio/asr/tasks

Authorization

Authorization string header Required
HTTP: Bearer Auth
  • Security Scheme Type: http
  • HTTP Authorization Scheme: Bearer API_key. The key is used to verify account information and can be viewed in Project Management > API Key.

Request Header

Content-Type enum<string> Default: application/json Required

The media type of the request body. Set it to application/json so that the request data is parsed in the correct format.

Available options: application/json
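The two required headers can be assembled in one place. The following is a minimal sketch; the helper name `build_headers` is hypothetical and is not part of the API.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return the headers this endpoint requires.

    `api_key` is the key shown under Project Management > API Key.
    """
    if not api_key or not api_key.strip():
        raise ValueError("api_key must be a non-empty string")
    return {
        # Bearer scheme, as described in the Authorization section.
        "Authorization": f"Bearer {api_key.strip()}",
        # The only media type listed as available for this endpoint.
        "Content-Type": "application/json",
    }
```

Failing early on an empty key avoids spending a request that would come back with an authentication error.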

Request Body application/json

file_id long Required

Audio file ID (obtained through the file upload API).

The uploaded async speech recognition file must comply with the following specifications:

  • Format: mp3, opus, wav, amr, m4a, ogg
  • Duration: minimum 1 second, maximum 5 hours
  • Size: no more than 1GB
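The specifications above can be checked locally before uploading. This is a sketch only: `check_audio_spec` is a hypothetical helper, the caller is assumed to already know the duration and size, and "1GB" is read conservatively as 1,000,000,000 bytes, which is an assumption.

```python
ALLOWED_FORMATS = {"mp3", "opus", "wav", "amr", "m4a", "ogg"}
MIN_DURATION_S = 1
MAX_DURATION_S = 5 * 60 * 60      # 5 hours
MAX_SIZE_BYTES = 1_000_000_000    # assumption: "1GB" read as 10^9 bytes


def check_audio_spec(fmt: str, duration_s: float, size_bytes: int) -> list[str]:
    """Return a list of violations; an empty list means the file fits the limits."""
    problems = []
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt!r}")
    if not (MIN_DURATION_S <= duration_s <= MAX_DURATION_S):
        problems.append(f"duration {duration_s}s is outside 1 second to 5 hours")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append(f"size {size_bytes} bytes exceeds 1GB")
    return problems
```

Returning every violation at once, rather than raising on the first, lets a caller report all problems with a file in a single pass.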

model enum<string> Required

Available model code options: u2-asr

format string Required

Audio file type: mp3, opus, wav, amr, m4a, ogg.

sample_rate integer

Audio sample rate, default 16000.

enable_itn boolean

Whether to enable inverse text normalization (ITN), which converts spoken numbers to Arabic numerals (e.g., "nineteen ninety-seven" becomes "1997"). Default: true.

channel integer

Number of audio channels: 1 (mono) or 2 (stereo). Default: 1.

enable_speaker boolean

Whether to enable speaker separation. Valid only when channel is 1 (mono). Default: false.

speaker_num integer

Number of speakers (valid only when speaker separation is enabled), defaults to automatic recognition.

word_info boolean

Whether to return word-level timestamps, default false.

context string

Context, used to specify contextual information for the model, limited to 500 characters.

hotwords string[ ]

List of hot words, max 200 words, max 5 characters per word.
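The request body fields and their documented constraints can be gathered into a single builder. The sketch below uses a hypothetical function, `build_task_payload`. Rejecting invalid combinations on the client side (for example, speaker separation with stereo audio) is a design choice made here; this page does not say whether the server rejects or ignores them.

```python
from typing import Any, Optional

ALLOWED_FORMATS = {"mp3", "opus", "wav", "amr", "m4a", "ogg"}


def build_task_payload(
    file_id: int,
    fmt: str,
    *,
    model: str = "u2-asr",
    sample_rate: Optional[int] = None,
    enable_itn: Optional[bool] = None,
    channel: Optional[int] = None,
    enable_speaker: Optional[bool] = None,
    speaker_num: Optional[int] = None,
    word_info: Optional[bool] = None,
    context: Optional[str] = None,
    hotwords: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build the JSON body for POST /v1/audio/asr/tasks."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if channel is not None and channel not in (1, 2):
        raise ValueError("channel must be 1 (mono) or 2 (stereo)")
    # Speaker separation is documented as valid for mono audio only.
    if enable_speaker and channel == 2:
        raise ValueError("enable_speaker requires mono audio (channel=1)")
    # speaker_num only has an effect when speaker separation is on.
    if speaker_num is not None and not enable_speaker:
        raise ValueError("speaker_num requires enable_speaker=True")
    if context is not None and len(context) > 500:
        raise ValueError("context is limited to 500 characters")
    if hotwords is not None:
        if len(hotwords) > 200:
            raise ValueError("hotwords is limited to 200 words")
        if any(len(word) > 5 for word in hotwords):
            raise ValueError("each hot word is limited to 5 characters")

    payload: dict[str, Any] = {"file_id": file_id, "model": model, "format": fmt}
    optional = {
        "sample_rate": sample_rate,
        "enable_itn": enable_itn,
        "channel": channel,
        "enable_speaker": enable_speaker,
        "speaker_num": speaker_num,
        "word_info": word_info,
        "context": context,
        "hotwords": hotwords,
    }
    # Omit unset fields so the documented server-side defaults apply.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Serializing the result with `json.dumps(payload)` gives a body suitable for this endpoint. Leaving optional fields out, instead of sending explicit defaults, keeps the request in line with whatever defaults the service applies.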

Response Body Structure

task_id string

Transcription task ID

base_resp object

Status code and details of this request

base_resp.status_code integer

Status code (0=normal; 100001=parameter error; 100101=authentication failed; 100501=triggered RPM rate limit; 100999=internal system error)

base_resp.status_msg string

Status details
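A client will typically check base_resp.status_code before using task_id. The following sketch assumes the response body is a JSON object with the fields listed above. The names `AsrTaskError` and `extract_task_id` are hypothetical, and treating codes 100501 and 100999 as retryable is an assumption, not something this page states.

```python
import json

# Status codes as documented for base_resp.status_code.
STATUS_MEANINGS = {
    0: "normal",
    100001: "parameter error",
    100101: "authentication failed",
    100501: "triggered RPM rate limit",
    100999: "internal system error",
}

# Assumption: rate-limit and internal errors may succeed on a later attempt.
RETRYABLE_CODES = {100501, 100999}


class AsrTaskError(Exception):
    """Raised when base_resp.status_code is not 0."""

    def __init__(self, status_code, status_msg):
        self.status_code = status_code
        self.status_msg = status_msg
        self.retryable = status_code in RETRYABLE_CODES
        meaning = STATUS_MEANINGS.get(status_code, "unknown status")
        super().__init__(f"{status_code} ({meaning}): {status_msg}")


def extract_task_id(response_text: str) -> str:
    """Return task_id from a response body, or raise AsrTaskError."""
    data = json.loads(response_text)
    base = data.get("base_resp") or {}
    code = base.get("status_code")
    if code != 0:
        raise AsrTaskError(code, base.get("status_msg", ""))
    return data["task_id"]
```

For example, a body such as `{"task_id": "t-1", "base_resp": {"status_code": 0, "status_msg": "ok"}}` (values made up for illustration) would yield `"t-1"`, while a non-zero status code raises `AsrTaskError` carrying the code and message.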