Models overview · Token Factory Docs

Choose, compare, and launch the right model from the Model Library and Token Factory API.

Token Factory models are both dashboard-browsable and API-discoverable. Use this page when you need to pick a model, verify the facts that matter, and launch it through the right path.

Browse

Model Library in the Token Factory dashboard

List

GET /v1/models

Launch

Token-based pricing

#Pick by job, then verify by facts

Start from the workload. The catalog is useful because it connects task fit, model facts, benchmark context, and launch path in one place.

If you need	Start with	Verify before launch
Chat or agents	Chat-capable live models	Tool-use support, context window, price, and benchmark fit
Embeddings or RAG	Embedding models	`/v1/embeddings` support, vector dimensions, retrieval quality, and cost
Vision	Vision-capable chat models	Image-input support in the catalog and the chat completions wire shape
Image, speech, or video	Matching modality tabs	Current public API status and Playground support
Long context	Context filter `≥ 128K` or `≥ 256K`	Whether your workload actually needs the headroom

Profile, don't guess

Pick the smallest model that plausibly fits the task, then test against your real prompts, latency budget, and cost target. Premium-by-default burns spend you may not need.

#Compare live models in the dashboard

Open Model Library to browse the workspace catalog. The Model Library is a workspace-scoped catalog of every model available to your team. Each entry carries:

A model ID in the form provider/model-name. The prefix carries meaning: an Omniva/… id (for example, Omniva/Kimi-K2.6) is an Omniva-optimized build of an open model — tuned and quantized for low-latency serving — while the same base model may also be offered under its upstream author prefix (for example, MiniMaxAI/MiniMax-M2.5) as standard open weights. Always copy the exact id from the catalog.
Modality — text, vision, embedding, image generation, audio, video, or speech.
Context window length — token capacity for the model's input plus output.
Cost class — Token-based pricing.
Lifecycle stage — GA, Beta, or Coming Soon.
Deployment-option toggles — which deploy paths the model supports.
Benchmark and model-card notes — when published by the provider.

Filter by modality, sort by recently added, or open a model entry to see its complete specifications, including supported context length, function-calling capability, and any model-card notes from the provider.

Modality tabs are filter views over the same workspace catalog — switching tabs (OSS models, Vision, Images, Video, Speech, Embeddings) filters which models render, not which catalog you're browsing. The catalog is unified.

Benchmarks are selection evidence, not marketing claims

Use benchmark sections to compare live models when data is present. If a specific metric is missing, do not infer it from another model or provider. Treat missing metrics as unknown until Token Factory publishes a sourced value.

#Discover callable models from the API

Use GET /v1/models when your app needs to discover callable models programmatically from the public Token Factory API surface.

The public response is an OpenAI-compatible list, shaped for client discovery. The example below is trimmed to one entry for brevity — the actual response includes every model in your workspace catalog:

That API response is intentionally smaller than the dashboard catalog. The Model Library uses an internal catalog shape for richer UI fields such as capability chips, example pricing, benchmark sections, and deploy affordances. Do not build user-facing API clients against the internal catalog shape; use the public OpenAI-compatible list for programmatic model discovery.

Public list and dashboard catalog are related, not identical

GET /v1/models tells your app which model IDs are callable through the public API. The dashboard catalog tells a human how to compare, benchmark, and launch those models inside Token Factory.

#Launch with Token-based pricing

During Early Access, models launch through Token-based pricing: OpenAI-compatible /v1 traffic billed per token, with no dedicated deployment to manage. It keeps the request shape close to OpenAI-compatible calls and removes deployment management overhead.

Dedicated capacity and custom models are coming soon

Dedicated Endpoints — reserved capacity for supported open models, with custom-model hosting to follow — are coming soon. Interested? Contact sales.

#Models you can try today

The supported models available through Token Factory Early Access, all callable through GET /v1/models:

Omniva/Kimi-K2.6 (chat) — long-horizon, autonomous coding and multi-step agents. The canonical chat model used across these docs.
Omniva/MiniMax-M2.5 (chat) — frontier-level coding and tool-use agents at a fraction of frontier cost.
Omniva/DeepSeek-V4-Pro (chat) — deep coding and mathematical reasoning with high accuracy.

For embeddings, the embeddings guide uses BAAI/bge-large-en-v1.5 as its canonical example.

Filter the Model Library by modality or capability chip to see every model your workspace can call today.

#What next

Quickstart

Make the first Token Factory request with the OpenAI-compatible base URL.

Chat completions API

See the request and response shape for chat models, streaming, tools, and vision inputs.

Tokens, pricing & quotas

Understand token accounting, pricing modes, and workspace quota behavior.