Glossary · Token Factory Docs

Short definitions for the terms that recur across these docs. If a term you saw on a page isn't covered here, file an issue.

#API key

A secret credential of the form sk-oais-... that authenticates requests to the public /v1 API. Keys are scoped to a single workspace; quotas, billing, and Observability all attribute to the owning workspace.

#Bearer token

The HTTP authorization scheme used by /v1. Pass your API key as Authorization: Bearer sk-oais-... — there is no separate token-exchange step.

#GPU hourly pricing

The reserved-capacity pricing mode: billed per GPU-hour for the duration a dedicated deployment is live, regardless of token volume. Arrives with Dedicated Endpoints (coming soon); best fit for steady high-volume traffic.

#Inference runtime

The actual model server behind the gateway — vLLM, TGI, or another OpenAI-compatible runtime — that executes a request and returns tokens. The gateway is the contract; the runtime is the implementation.

#Model ID

The string identifier for a model in API requests, formatted as provider/model-name. The prefix carries meaning: an Omniva/… id (for example Omniva/Kimi-K2.6) is an Omniva-optimized build of an open model — tuned and quantized for low-latency serving — while the raw upstream open weights are offered under the author prefix (for example MiniMaxAI/MiniMax-M2.5). Echoed back in the model field of responses.

#Model Library

The browseable catalog of available models in the Token Factory app — filterable by provider, modality, and status. The same set is exposed programmatically once /v1/models is live.

#Playground

The in-app chat and completion UI for trying a model interactively without writing code. The Playground calls the same /v1 gateway as your API key, but proxied through a session — not a substitute for an API key in production.

#Production gateway

The deployed /v1 endpoint at https://api.tokenfactory.omniva.com/v1. The contract documented across these pages.

#Token-based pricing

The pay-as-you-go pricing mode: billed per 1,000 input + output tokens at a per-model rate. Best fit for variable traffic and prototyping. GA today via /v1/chat/completions and /v1/embeddings.

#Token Factory

The Omniva control plane for inference — UI, API-key management, workspace billing, and the OpenAI-compatible /v1 gateway in front of one or more inference runtimes. The production gateway is live for chat and embeddings.

#Workspace

The scope unit for everything that costs money or carries quota: API keys, monthly token / image / audio caps, billing, and Observability dashboards. Users belong to one or more workspaces; each request attributes to exactly one.

#What next

Quickstart

Make your first API call in minutes.

Models overview

Browse the catalog and pick a model.

Errors & status codes

What 401, 403, and 429 mean and how to recover.