Generative AI API (SaaS) OpenAI compatible API (1.0)


Create Chat Completion

A chat completion API compatible with OpenAI's API.

See https://platform.openai.com/docs/api-reference/chat/create for the API specification. This API mimics the OpenAI ChatCompletion API.

NOTE: The following features are not currently supported:
- function_call (users should implement this themselves)
- logit_bias (to be supported by the vLLM engine)

Authorizations:
bearerAuth
Request Body schema: application/json
required
messages
required
Messages (string) or Array of Messages (objects) (Messages)

A list of messages comprising the conversation so far.

model
required
string (Model)

ID of the model to use.

frequency_penalty
number (Frequency Penalty)
Default: 0

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

logit_bias
object (Logit Bias)
Default: null

Modify the likelihood of specified tokens appearing in the completion.

max_tokens
integer (Max Tokens)
Default: null

The maximum number of tokens that can be generated in the chat completion.

n
integer (N)
Default: 1

How many chat completion choices to generate for each input message.

presence_penalty
number (Presence Penalty)
Default: 0

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

stop
Array of Stop (strings) or Stop (string) (Stop)
Default: null

Up to 4 sequences where the API will stop generating further tokens.

stream
boolean (Stream)
Default: false

If set, partial message deltas will be sent, like in ChatGPT.

temperature
number (Temperature)
Default: 1

What sampling temperature to use, between 0 and 2.

top_p
number (Top P)
Default: 1

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.

user
string (User)
Default: null

A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.

Responses

Request samples

Content type
application/json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "model": "cotomi-fast-v2.0",
  "frequency_penalty": 0,
  "logit_bias": null,
  "max_tokens": null,
  "n": 1,
  "presence_penalty": 0,
  "stop": null,
  "stream": false,
  "temperature": 1,
  "top_p": 1,
  "user": null
}
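A request like the sample above can be assembled in Python. This is a minimal sketch: the base URL is a placeholder for your deployment's endpoint, and the `/chat/completions` path is assumed to follow OpenAI's layout, as the document states this API mimics the OpenAI ChatCompletion API.

```python
import json

# Placeholder -- substitute your deployment's actual endpoint.
BASE_URL = "https://api.example.com/v1"

def build_chat_request(api_key, model, messages, **params):
    """Assemble the URL, headers, and JSON body for a chat
    completion request against this OpenAI-compatible API."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # bearerAuth scheme
        "Content-Type": "application/json",
    }
    # Optional sampling parameters (temperature, top_p, n, ...)
    # are passed through unchanged.
    body = json.dumps({"model": model, "messages": messages, **params})
    return url, headers, body
```

The returned triple can then be sent with any HTTP client, e.g. `requests.post(url, headers=headers, data=body)`.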

Response samples

Content type
application/json
{
  "id": "string",
  "choices": [],
  "created": 0,
  "model": "string",
  "system_fingerprint": "string",
  "object": "string",
  "usage": {}
}
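When stream is set to true, partial message deltas arrive as server-sent events rather than a single JSON body. The sketch below assumes the stream follows OpenAI's chunk format (`data: {...}` lines ending with a `data: [DONE]` sentinel, each chunk carrying a `choices[0].delta`), which this API's OpenAI compatibility suggests but the page does not spell out.

```python
import json

def assemble_stream(lines):
    """Concatenate the message content from OpenAI-style SSE
    chunks ('data: {...}' lines) produced when stream=true."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":  # sentinel marking end of stream
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        # The first chunk typically carries only the role; later
        # chunks carry content fragments.
        parts.append(delta.get("content") or "")
    return "".join(parts)
```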

Create Embedding

Creates an embedding vector representing the input text.

Authorizations:
bearerAuth
Request Body schema: application/json
required
input
required
string or Array of strings or Array of integers or Array of arrays of integers (Input)

Input text to embed, encoded as a string or array of tokens. To embed multiple inputs in a single request, pass an array of strings or array of token arrays. The input must not exceed the max input tokens for the model (512 tokens for multilingual-e5-large), cannot be an empty string, and any array must be 2048 dimensions or less.

model
required
string
Value: "multilingual-e5-large"

ID of the model to use.

encoding_format
string
Default: "float"
Enum: "float" "base64"

The format to return the embeddings in. Can be either float or base64.

user
string

A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.

Responses

Request samples

Content type
application/json
{
  "input": "The quick brown fox jumped over the lazy dog",
  "model": "multilingual-e5-large",
  "encoding_format": "float",
  "user": "user-1234"
}
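The documented input constraints (non-empty, at most 2048 entries per array) can be enforced client-side before the request is sent. A minimal sketch; the base URL is a placeholder and the `/embeddings` path is assumed from OpenAI's layout. The 512-token limit per input is not checked here, since that requires the model's tokenizer.

```python
import json

# Placeholder -- substitute your deployment's actual endpoint.
BASE_URL = "https://api.example.com/v1"

def build_embedding_request(api_key, input_, model="multilingual-e5-large"):
    """Build the URL, headers, and JSON body for an embedding
    request, enforcing the documented input constraints."""
    if input_ == "" or input_ == []:
        raise ValueError("input must not be empty")
    if isinstance(input_, list) and len(input_) > 2048:
        raise ValueError("input array must have at most 2048 entries")
    url = f"{BASE_URL}/embeddings"
    headers = {
        "Authorization": f"Bearer {api_key}",  # bearerAuth scheme
        "Content-Type": "application/json",
    }
    body = json.dumps({"input": input_, "model": model})
    return url, headers, body
```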

Response samples

Content type
application/json
{
  "data": [],
  "model": "string",
  "object": "list",
  "usage": {}
}
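When encoding_format is "base64", each embedding in data arrives as a base64 string instead of a float array. The decoder below assumes little-endian float32 packing, which is how OpenAI's base64 embedding encoding works; this page does not state the packing explicitly.

```python
import base64
import struct

def decode_base64_embedding(b64):
    """Decode a base64-encoded embedding (encoding_format="base64")
    into a list of Python floats, assuming little-endian float32."""
    raw = base64.b64decode(b64)
    n = len(raw) // 4  # each float32 component is 4 bytes
    return list(struct.unpack(f"<{n}f", raw))
```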