Text generation (OpenAI-compatible API)

Call the model through an OpenAI-compatible Chat API.

POST/v1/chat/completions

Authorization

AuthorizationstringheaderRequired
HTTP: Bearer Auth
  • Security Scheme Type: http
  • HTTP Authorization Scheme: Bearer API_key, used to verify account information and can be viewed in Project Management > API Key.

Request Header

Content-Typeenum<string>Default: application/jsonRequired

The media type of the request body. Set it to application/json to ensure the request data is JSON.

Available option: application/json

Request Bodyapplication/json

modelstringRequired

Model code. Available values: u2, u1-insuremed.

messagesarrayRequired

Context passed to the model, ordered by conversation sequence.

System MessageobjectOptional

System message used to set the model role, tone, task objective, or constraint. It is usually placed at the first position of the messages array.

contentstring | arrayRequired

System instruction used to define the model role, behavior rules, response style, and task constraints.

typestringRequired

Content type. Only the fixed value text is supported.

textstringRequired

Specific text content.

rolestringRequired

Role of the system message, fixed to system.

User MessageobjectRequired

User message used to pass questions, instructions, or context to the model.

contentstring | arrayRequired

Message content.

typestringRequired

Content type. Only the fixed value text is supported.

textstringRequired

Specific text content.

rolestringRequired

Role of the user message, fixed to user.

Assistant MessageobjectOptional

Reply from the model. It is usually sent back to the model as context in multi-turn conversations.

contentstring | arrayOptional

Text content of the model reply. When tool_calls is present, content can be empty; otherwise content is required.

typestringRequired

Content type. Only the fixed value text is supported.

textstringRequired

Specific text content.

reasoning_contentstringOptional

Reasoning chain content of the model.

rolestringRequired

Role of the assistant message, fixed to assistant.

tool_callsarrayOptional

Tool and argument information returned after Function Calling, containing one or more objects. It comes from the tool_calls field of the previous model response.

idstringRequired

ID of the tool response.

typestringRequired

Tool type. Currently only function is supported.

functionobjectRequired

Tool and argument information.

namestringRequired

Tool name.

argumentsstringRequired

Arguments in JSON string format.

indexintegerRequired

Index of the current tool information in the tool_calls array.

Tool MessageobjectOptional

Output information from the tool.

contentstring | arrayRequired

Output content of the tool function. If it is structured data, serialize it to a string.

typestringRequired

Content type. Only the fixed value text is supported.

textstringRequired

Specific text content.

rolestringRequired

Fixed to tool.

tool_call_idstringRequired

ID returned after Function Calling, obtained through completion.choices[0].message.tool_calls[$index].id, used to identify which tool the Tool Message corresponds to.

streambooleanOptionalDefault: false

Whether to reply in streaming mode.

  • false: return once after the model generates all content;
  • true: return incrementally while generating, and each partial result is returned as a data chunk. These chunks need to be read one by one in real time and concatenated into the full reply.

Setting it to true is recommended to improve reading experience and reduce timeout risk.

stream_optionsobjectOptional

Configuration items for streaming output. Effective only when stream is true.

include_usagebooleanOptionalDefault: false

Whether to include token usage information in the final data chunk of the response.

  • true: include it;
  • false: do not include it.
In streaming mode, token usage information can only appear in the last data chunk of the response.

temperaturefloatOptional

Sampling temperature, used to control output diversity. A higher temperature makes the generated text more diverse; a lower temperature makes it more deterministic.

Value range: [0,2).

Default value of temperature: 1.0.

top_kintegerOptional

Specifies the number of candidate tokens used during sampling. A larger value makes the output more random; a smaller value makes it more deterministic. The value must be an integer greater than or equal to 1.

Default value of top_k: 40.

This is a non-standard OpenAI parameter. When using the Python SDK, put it in the extra_body object. Configuration example: extra_body={"top_k":xxx}.

max_tokensintegerOptional

Used to limit the maximum number of output tokens. If the generated content exceeds this value, generation stops early and the returned finish_reason becomes length.

Useful when output length must be controlled, such as generating summaries or keywords, or for reducing cost and response time.

When max_tokens is triggered, the finish_reason field in the response becomes length.

max_tokens does not limit the length of the reasoning chain of thinking models.

thinkingobjectOptional

Whether to enable thinking mode. u2 enables thinking by default and does not support disabling it.

typestringRequired

Available values: enabled (thinking mode on), disabled (thinking mode off).

This is a non-standard OpenAI parameter. When using the Python SDK, put it in the extra_body object. Configuration example: extra_body={"thinking": "{"type":xxx}"}.

toolsarrayOptional

Array containing one or more tool objects for the model to call in Function Calling.

If tools is set and the model decides a tool call is needed, the response returns tool information through tool_calls.

typestringRequired

Tool type. Currently only function is supported.

functionobjectRequired

namestringRequired

Tool name. Only letters, digits, underscores (_), and hyphens (-) are allowed, with a maximum of 64 tokens.

descriptionstringRequired

Tool description that helps the model decide when and how to call the tool.

parametersobjectOptionalDefault: {}

Parameter description of the tool. It must be valid JSON Schema.

If the parameters field is empty, the tool has no arguments, such as a time query tool. For better tool-calling accuracy, passing parameters is recommended.

Chat response object (non-streaming)

idstring

Unique identifier for this request.

choicesarray

Array of model-generated content.

finish_reasonstring

Reason why the model stopped generating. There are three cases:

  • stop when the input stop parameter is triggered or the model stops naturally;
  • length when generation ends because the output is too long;
  • tool_calls when the model needs to call a tool.

indexinteger

Index of the current object in the choices array.

messageobject

Message output by the model.

contentstring

Reply content from the model.

reasoning_contentstring

Reasoning chain content of the model.

rolestring

Role of the message, fixed to assistant.

tool_callsarray

Tool and argument information generated by the model after Function Calling.

idstring

Unique identifier of this tool response.

typestring

Tool type. Currently only function is supported.

functionobject

Tool information.

namestring

Tool name.

argumentsstring

Arguments in JSON string format.

Because model responses are somewhat random, the generated arguments may not match the function signature. Validate argument validity before invocation.

createdinteger

Unix timestamp in seconds when the request was created.

modelstring

Model used for this request.

objectstring

Always chat.completion.

service_tierstring

This field is currently fixed to null.

system_fingerprintstring

This field is currently fixed to null.

usageobject

Token usage information for this request.

completion_tokensinteger

Number of output tokens.

prompt_tokensinteger

Number of input tokens.

total_tokensinteger

Total number of consumed tokens, equal to prompt_tokens plus completion_tokens.

Chat response chunk object (streaming)

idstring

Unique identifier of this request. Every chunk object has the same id.

choicesarray

Array of model-generated content, which may contain one or more objects. If include_usage is set to true, choices is an empty array in the last chunk.

deltaobject

Incremental object of the request.

contentstring

Incremental message content.

reasoning_contentstring

Incremental reasoning chain content.

rolestring

Role of the incremental message object. It only has a value in the first chunk.

tool_callsarray

Tool and argument information generated by the model after Function Calling.

indexinteger

Index of the current tool in the tool_calls array.

idstring

Unique identifier of this tool response.

functionobject

Information about the called tool.

argumentsstring

Incremental argument information. After concatenating arguments across all chunks, you get the complete arguments.

Because model responses are somewhat random, the generated arguments may not match the function signature. Validate argument validity before invocation.

namestring

Tool name, only present in the first chunk.

typestring

Tool type. Currently only function is supported.

finish_reasonstring

Reason why the model stopped generating. There are four cases:

  • stop when the input stop parameter is triggered or the model stops naturally;
  • null while generation has not finished;
  • length when generation ends because the output is too long;
  • tool_calls when the model needs to call a tool.

indexinteger

Index of the current response in the choices array. When the input parameter n is greater than 1, this value is required to merge the complete content of each different response.

createdinteger

Timestamp when this request was created. Each chunk has the same timestamp.

modelstring

Model used for this request.

objectstring

Always chat.completion.chunk.

service_tierstring

This field is currently fixed to null.

system_fingerprintstring

This field is currently fixed to null.

usageobject

Token usage of this request. It is shown only in the last chunk when include_usage is true.

completion_tokensinteger

Number of output tokens.

prompt_tokensinteger

Number of input tokens.

total_tokensinteger

Total number of tokens, equal to prompt_tokens plus completion_tokens.