POST /v1/chat/completions · Token Factory Docs

POST/v1/chat/completions

Generate a chat completion. OpenAI-compatible wire shape — existing OpenAI SDKs work with a base_url swap to https://api.tokenfactory.omniva.com/v1.

#Authentication

Bearer token in the Authorization header. See Authentication for how to mint a key.

#Parameters

Field	Type	Description
modelrequired	string	ID of the model to use (e.g. Omniva/Kimi-K2.6). Must match a model available to your workspace.
messagesrequired	Message[]	Conversation so far, as an ordered array of { role, content } objects. role is one of system, user, assistant, or tool.
temperature	number	Sampling temperature. Higher values produce more random output; lower values are more deterministic. Typical range 0 to 2. Default: `1`
max_tokens	integer	Hard cap on the number of tokens generated in the response. If unset, the model may fill its full context window.
stream	boolean	If true, the response is streamed as Server-Sent Events. See the streaming section below. Default: `false`
n	integer	Number of completion choices to generate. Not supported today — the handler ignores this field and returns a single choice.
top_p	number	Nucleus-sampling cutoff in the range 0 to 1. Not supported today — the handler ignores this field; use temperature instead.
presence_penalty	number	Penalty in the range -2 to 2 for tokens already present in the prompt. Not supported today — the handler ignores this field.
frequency_penalty	number	Penalty in the range -2 to 2 for tokens by their frequency so far. Not supported today — the handler ignores this field.
logit_bias	object	Map of token-id to bias value. Not supported today — the handler ignores this field.
logprobs	boolean	If true, return log probabilities of the output tokens. Not supported today — the handler ignores this field.
top_logprobs	integer	Number of most-likely tokens to return at each position when logprobs is set. Not supported today — the handler ignores this field.
seed	integer	Seed for best-effort deterministic sampling. Not supported today — the handler ignores this field.
stop	string \| string[]	Up to 4 strings where the model will stop generating. Not supported today — the handler ignores this field.
response_format	object	Constrain the output format (e.g. JSON mode). Not supported today — the handler ignores this field.
tools	Tool[]	Tool definitions the model may call. Not supported today — the handler ignores this field and returns no tool_calls.
tool_choice	string \| object	Control which tool, if any, the model picks. Not supported today — the handler ignores this field.
user	string	End-user identifier for abuse-tracking. Not supported today — the handler ignores this field.
service_tier	string	Service-tier hint (e.g. auto, default). Not supported today — the handler ignores this field.

#Message object

Field	Type	Description
`role`	`string`	One of `system`, `user`, `assistant`, `tool`.
`content`	`string \| ContentBlock[]`	Plain string for text-only messages, or an array of typed content blocks for multimodal (vision) input. See Vision.
`name`	`string` (optional)	Identifies the speaker in named conversations. Most clients leave this unset.
`tool_call_id`	`string`	Required when `role` is `"tool"`. References the `tool_calls[N].id` from the assistant message that requested the tool call.
`tool_calls`	`ToolCall[]`	On an assistant message, the array of tool invocations the model wants to perform. Each entry has `id`, `type` (typically `"function"`), and a `function: { name, arguments }`. Token Factory does not yet emit `tool_calls` — see Parameters above.

For content: ContentBlock[], each block is { type: "text", text: string } or { type: "image_url", image_url: { url: string } }. The full shape is documented in Vision.

#Vision

Vision-capable chat models accept images alongside text in a single messages[].content array. Each entry in the array is a typed block — the wire format matches OpenAI's content-array shape.

#Content blocks

`type`	Shape	Description
`text`	`{ type: "text", text: string }`	A text segment. Multiple text blocks are concatenated in order.
`image_url`	`{ type: "image_url", image_url: { url: string } }`	A reference to an image, either as an https URL or a base64 data URI.

The image_url.url field accepts two forms:

HTTPS URL — e.g. https://example.com/cat.png. The gateway fetches the image server-side. The URL must be publicly reachable.
Base64 data URI — e.g. data:image/png;base64,iVBORw0KGgo.... The image is sent inline in the request body. Useful when the image isn't hosted, but counts toward request size limits.

Supported image types are model-dependent — typically PNG, JPEG, WebP, and non-animated GIF. Check the model's entry in the Model Library for vision capability and any per-model constraints (max dimensions, file size, count per request).

#Example

The response shape is unchanged — choices[].message.content is a plain string regardless of whether the request was text-only or multimodal. Filter the Model Library by the Vision capability chip to find models that accept image input.

#Response

The top-level fields follow OpenAI's chat.completion shape: id, object, created (epoch seconds, UTC), model, choices[], usage. Each choices[i] has index, message ({ role, content }), and finish_reason (typically stop).

#Streaming response shape

When stream: true, the response is text/event-stream. Each event is a data: line carrying a JSON chunk in OpenAI's chat.completion.chunk format. Chunks accumulate via delta.content; the final chunk sets finish_reason: "stop". The stream terminates with a sentinel data: [DONE] frame — clients should stop reading at that point.

The on-the-wire shape is one SSE event per chunk, blank-line separated:

When a model emits tool calls (not yet supported by Token Factory — included here for wire-shape reference), the tool_calls array streams incrementally. Each chunk carries a partial function.arguments string that the client concatenates by index:

See the Streaming guide for client-side consumption patterns.

#Errors

Errors return a JSON body with an error field. The HTTP status indicates the class.

Status	Cause	What to do
`400 Bad Request`	Validation error — missing `model` or `messages`, or malformed payload	Inspect `error.param` when present; fix the request shape
`401 Unauthorized`	Missing or invalid Bearer token	Check `Authorization` header; see Authentication
`403 Forbidden`	Key revoked, or model not enabled for your workspace	Verify the key in the dashboard; check the model catalog
`404 Not Found`	Unknown model	Check the `model` value against the dashboard catalog
`408 Request Timeout`	Request took too long or upstream worker timed out	Retry with a shorter prompt or honor `Retry-After`
`409 Conflict`	Resource-state conflict on workspace or key state	Read the `error` string; refresh state and retry
`422 Unprocessable Entity`	Validation failed (e.g. invalid value range)	Read the `error` string for the offending field
`429 Too Many Requests`	Rate limit or quota exceeded	Back off, then retry; see Tokens, pricing & quotas
`500 Internal Server Error`	Unhandled server error	Retry with exponential backoff and jitter; file a support ticket if persistent
`502 Bad Gateway`	Upstream proxy issue	Retry with exponential backoff and jitter
`503 Service Unavailable`	No healthy upstream for this model right now	Retry; if persistent, pick an alternate from the catalog

See Errors for the full error catalog and retry guidance.

#Code samples

#Not yet supported

The Parameters table above marks each unsupported OpenAI field with "Not supported today" in its Description. The handler accepts these fields in the request body without erroring, but ignores them — the response will not change. Track support status in Models overview.

#What next

Chat completions guide

The base request shape, parameters, and stop reasons walked through with examples.

Streaming

Server-Sent Events, partial deltas, and incremental UI updates.

Models overview

Browse the catalog and pick an ID for the model field.

Errors & status codes

What 401, 403, 429, and 503 mean and how to recover.