Generative AI API (SaaS) OpenAI compatible API (1.0)


Create Chat Completion

A chat completion API compatible with OpenAI's API.

See https://platform.openai.com/docs/api-reference/chat/create for the API specification. This API mimics the OpenAI ChatCompletion API.

NOTE: The following features are not currently supported:
- function_call (users should implement this themselves)
- logit_bias (to be supported by the vLLM engine)

Authorizations:
bearerAuth
Request Body schema: application/json
required
messages
required
Messages (string) or Array of Messages (objects) (Messages)

A list of messages comprising the conversation so far.

model
required
string (Model)

ID of the model to use.

frequency_penalty
number (Frequency Penalty)
Default: 0

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

logit_bias
object (Logit Bias)
Default: null

Modify the likelihood of specified tokens appearing in the completion.

max_tokens
integer (Max Tokens)
Default: null

The maximum number of tokens that can be generated in the chat completion.

n
integer (N)
Default: 1

How many chat completion choices to generate for each input message.

presence_penalty
number (Presence Penalty)
Default: 0

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

stop
Array of Stop (strings) or Stop (string) (Stop)
Default: null

Up to 4 sequences where the API will stop generating further tokens.

stream
boolean (Stream)
Default: false

If set, partial message deltas will be sent, like in ChatGPT.

temperature
number (Temperature)
Default: 1

What sampling temperature to use, between 0 and 2.

top_p
number (Top P)
Default: 1

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.

user
string (User)
Default: null

A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.

Responses

Request samples

Content type
application/json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "model": "cotomi-fast-v2.0",
  "frequency_penalty": 0,
  "logit_bias": null,
  "max_tokens": null,
  "n": 1,
  "presence_penalty": 0,
  "stop": null,
  "stream": false,
  "temperature": 1,
  "top_p": 1,
  "user": null
}
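A request like the sample above can be assembled in Python. This is a minimal sketch: the base URL is a placeholder for your deployment's endpoint, and the `/chat/completions` path is assumed to follow OpenAI's layout, as the document states this API mimics the OpenAI ChatCompletion API.

```python
import json

# Placeholder -- substitute your deployment's actual endpoint.
BASE_URL = "https://api.example.com/v1"

def build_chat_request(api_key, model, messages, **params):
    """Assemble the URL, headers, and JSON body for a chat
    completion request against this OpenAI-compatible API."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # bearerAuth scheme
        "Content-Type": "application/json",
    }
    # Optional sampling parameters (temperature, top_p, n, ...)
    # are passed through unchanged.
    body = json.dumps({"model": model, "messages": messages, **params})
    return url, headers, body
```

The returned triple can then be sent with any HTTP client, e.g. `requests.post(url, headers=headers, data=body)`.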

Response samples

Content type
application/json
{
  "id": "string",
  "choices": [],
  "created": 0,
  "model": "string",
  "system_fingerprint": "string",
  "object": "string",
  "usage": {}
}
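When stream is set to true, partial message deltas arrive as server-sent events rather than a single JSON body. The sketch below assumes the stream follows OpenAI's chunk format (`data: {...}` lines ending with a `data: [DONE]` sentinel, each chunk carrying a `choices[0].delta`), which this API's OpenAI compatibility suggests but the page does not spell out.

```python
import json

def assemble_stream(lines):
    """Concatenate the message content from OpenAI-style SSE
    chunks ('data: {...}' lines) produced when stream=true."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":  # sentinel marking end of stream
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        # The first chunk typically carries only the role; later
        # chunks carry content fragments.
        parts.append(delta.get("content") or "")
    return "".join(parts)
```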

Create Embedding

Creates an embedding vector representing the input text.

Authorizations:
bearerAuth
Request Body schema: application/json
required
input
required
string or Array of strings or Array of integers or Array of arrays of integers (Input)

Input text to embed, encoded as a string or array of tokens. To embed multiple inputs in a single request, pass an array of strings or array of token arrays. The input must not exceed the max input tokens for the model (512 tokens for multilingual-e5-large), cannot be an empty string, and any array must be 2048 dimensions or less.

model
required
string
Value: "multilingual-e5-large"

ID of the model to use.

encoding_format
string
Default: "float"
Enum: "float" "base64"

The format to return the embeddings in. Can be either float or base64.

user
string

A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.

Responses

Request samples

Content type
application/json
{
  "input": "The quick brown fox jumped over the lazy dog",
  "model": "multilingual-e5-large",
  "encoding_format": "float",
  "user": "user-1234"
}
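The documented input constraints (non-empty, at most 2048 entries per array) can be enforced client-side before the request is sent. A minimal sketch; the base URL is a placeholder and the `/embeddings` path is assumed from OpenAI's layout. The 512-token limit per input is not checked here, since that requires the model's tokenizer.

```python
import json

# Placeholder -- substitute your deployment's actual endpoint.
BASE_URL = "https://api.example.com/v1"

def build_embedding_request(api_key, input_, model="multilingual-e5-large"):
    """Build the URL, headers, and JSON body for an embedding
    request, enforcing the documented input constraints."""
    if input_ == "" or input_ == []:
        raise ValueError("input must not be empty")
    if isinstance(input_, list) and len(input_) > 2048:
        raise ValueError("input array must have at most 2048 entries")
    url = f"{BASE_URL}/embeddings"
    headers = {
        "Authorization": f"Bearer {api_key}",  # bearerAuth scheme
        "Content-Type": "application/json",
    }
    body = json.dumps({"input": input_, "model": model})
    return url, headers, body
```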

Response samples

Content type
application/json
{
  "data": [],
  "model": "string",
  "object": "list",
  "usage": {}
}
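When encoding_format is "base64", each embedding in data arrives as a base64 string instead of a float array. The decoder below assumes little-endian float32 packing, which is how OpenAI's base64 embedding encoding works; this page does not state the packing explicitly.

```python
import base64
import struct

def decode_base64_embedding(b64):
    """Decode a base64-encoded embedding (encoding_format="base64")
    into a list of Python floats, assuming little-endian float32."""
    raw = base64.b64decode(b64)
    n = len(raw) // 4  # each float32 component is 4 bytes
    return list(struct.unpack(f"<{n}f", raw))
```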