Generate a chat completion. OpenAI-compatible wire shape — existing OpenAI SDKs work with a base_url swap to https://api.tokenfactory.omniva.com/v1.
#Authentication
Bearer token in the Authorization header. See Authentication for how to mint a key.
#Parameters
| Field | Type | Description |
|---|---|---|
modelrequired | string | ID of the model to use (e.g. Omniva/Kimi-K2.6). Must match a model available to your workspace. |
messagesrequired | Message[] | Conversation so far, as an ordered array of { role, content } objects. role is one of system, user, assistant, or tool. |
temperature | number | Sampling temperature. Higher values produce more random output; lower values are more deterministic. Typical range 0 to 2. Default: 1 |
max_tokens | integer | Hard cap on the number of tokens generated in the response. If unset, the model may fill its full context window. |
stream | boolean | If true, the response is streamed as Server-Sent Events. See the streaming section below. Default: false |
n | integer | Number of completion choices to generate. Not supported today — the handler ignores this field and returns a single choice. |
top_p | number | Nucleus-sampling cutoff in the range 0 to 1. Not supported today — the handler ignores this field; use temperature instead. |
presence_penalty | number | Penalty in the range -2 to 2 for tokens already present in the prompt. Not supported today — the handler ignores this field. |
frequency_penalty | number | Penalty in the range -2 to 2 for tokens by their frequency so far. Not supported today — the handler ignores this field. |
logit_bias | object | Map of token-id to bias value. Not supported today — the handler ignores this field. |
logprobs | boolean | If true, return log probabilities of the output tokens. Not supported today — the handler ignores this field. |
top_logprobs | integer | Number of most-likely tokens to return at each position when logprobs is set. Not supported today — the handler ignores this field. |
seed | integer | Seed for best-effort deterministic sampling. Not supported today — the handler ignores this field. |
stop | string | string[] | Up to 4 strings where the model will stop generating. Not supported today — the handler ignores this field. |
response_format | object | Constrain the output format (e.g. JSON mode). Not supported today — the handler ignores this field. |
tools | Tool[] | Tool definitions the model may call. Not supported today — the handler ignores this field and returns no tool_calls. |
tool_choice | string | object | Control which tool, if any, the model picks. Not supported today — the handler ignores this field. |
user | string | End-user identifier for abuse-tracking. Not supported today — the handler ignores this field. |
service_tier | string | Service-tier hint (e.g. auto, default). Not supported today — the handler ignores this field. |
#Message object
| Field | Type | Description |
|---|---|---|
role | string | One of system, user, assistant, tool. |
content | string | ContentBlock[] | Plain string for text-only messages, or an array of typed content blocks for multimodal (vision) input. See Vision. |
name | string (optional) | Identifies the speaker in named conversations. Most clients leave this unset. |
tool_call_id | string | Required when role is "tool". References the tool_calls[N].id from the assistant message that requested the tool call. |
tool_calls | ToolCall[] | On an assistant message, the array of tool invocations the model wants to perform. Each entry has id, type (typically "function"), and a function: { name, arguments }. Token Factory does not yet emit tool_calls — see Parameters above. |
For content: ContentBlock[], each block is { type: "text", text: string } or { type: "image_url", image_url: { url: string } }. The full shape is documented in Vision.
#Vision
Vision-capable chat models accept images alongside text in a single messages[].content array. Each entry in the array is a typed block — the wire format matches OpenAI's content-array shape.
#Content blocks
type | Shape | Description |
|---|---|---|
text | { type: "text", text: string } | A text segment. Multiple text blocks are concatenated in order. |
image_url | { type: "image_url", image_url: { url: string } } | A reference to an image, either as an https URL or a base64 data URI. |
The image_url.url field accepts two forms:
- HTTPS URL — e.g.
https://example.com/cat.png. The gateway fetches the image server-side. The URL must be publicly reachable. - Base64 data URI — e.g.
data:image/png;base64,iVBORw0KGgo.... The image is sent inline in the request body. Useful when the image isn't hosted, but counts toward request size limits.
Supported image types are model-dependent — typically PNG, JPEG, WebP, and non-animated GIF. Check the model's entry in the Model Library for vision capability and any per-model constraints (max dimensions, file size, count per request).
#Example
The response shape is unchanged — choices[].message.content is a plain string regardless of whether the request was text-only or multimodal. Filter the Model Library by the Vision capability chip to find models that accept image input.
#Response
The top-level fields follow OpenAI's chat.completion shape: id, object, created (epoch seconds, UTC), model, choices[], usage. Each choices[i] has index, message ({ role, content }), and finish_reason (typically stop).
#Streaming response shape
When stream: true, the response is text/event-stream. Each event is a data: line carrying a JSON chunk in OpenAI's chat.completion.chunk format. Chunks accumulate via delta.content; the final chunk sets finish_reason: "stop". The stream terminates with a sentinel data: [DONE] frame — clients should stop reading at that point.
The on-the-wire shape is one SSE event per chunk, blank-line separated:
When a model emits tool calls (not yet supported by Token Factory — included here for wire-shape reference), the tool_calls array streams incrementally. Each chunk carries a partial function.arguments string that the client concatenates by index:
See the Streaming guide for client-side consumption patterns.
#Errors
Errors return a JSON body with an error field. The HTTP status indicates the class.
| Status | Cause | What to do |
|---|---|---|
400 Bad Request | Validation error — missing model or messages, or malformed payload | Inspect error.param when present; fix the request shape |
401 Unauthorized | Missing or invalid Bearer token | Check Authorization header; see Authentication |
403 Forbidden | Key revoked, or model not enabled for your workspace | Verify the key in the dashboard; check the model catalog |
404 Not Found | Unknown model | Check the model value against the dashboard catalog |
408 Request Timeout | Request took too long or upstream worker timed out | Retry with a shorter prompt or honor Retry-After |
409 Conflict | Resource-state conflict on workspace or key state | Read the error string; refresh state and retry |
422 Unprocessable Entity | Validation failed (e.g. invalid value range) | Read the error string for the offending field |
429 Too Many Requests | Rate limit or quota exceeded | Back off, then retry; see Tokens, pricing & quotas |
500 Internal Server Error | Unhandled server error | Retry with exponential backoff and jitter; file a support ticket if persistent |
502 Bad Gateway | Upstream proxy issue | Retry with exponential backoff and jitter |
503 Service Unavailable | No healthy upstream for this model right now | Retry; if persistent, pick an alternate from the catalog |
See Errors for the full error catalog and retry guidance.
#Code samples
#Not yet supported
The Parameters table above marks each unsupported OpenAI field with "Not supported today" in its Description. The handler accepts these fields in the request body without erroring, but ignores them — the response will not change. Track support status in Models overview.
#What next
The base request shape, parameters, and stop reasons walked through with examples.
Server-Sent Events, partial deltas, and incremental UI updates.
Browse the catalog and pick an ID for the model field.
What 401, 403, 429, and 503 mean and how to recover.