Token Factory models are both dashboard-browsable and API-discoverable. Use this page when you need to pick a model, verify the facts that matter, and launch it through the right path.
Model Library in the Token Factory dashboard
GET /v1/models
Token-based pricing
#Pick by job, then verify by facts
Start from the workload. The catalog is useful because it connects task fit, model facts, benchmark context, and launch path in one place.
| If you need | Start with | Verify before launch |
|---|---|---|
| Chat or agents | Chat-capable live models | Tool-use support, context window, price, and benchmark fit |
| Embeddings or RAG | Embedding models | /v1/embeddings support, vector dimensions, retrieval quality, and cost |
| Vision | Vision-capable chat models | Image-input support in the catalog and the chat completions wire shape |
| Image, speech, or video | Matching modality tabs | Current public API status and Playground support |
| Long context | Context filter ≥ 128K or ≥ 256K | Whether your workload actually needs the headroom |
Pick the smallest model that plausibly fits the task, then test against your real prompts, latency budget, and cost target. Premium-by-default burns spend you may not need.
#Compare live models in the dashboard
Open Model Library to browse the workspace catalog. The Model Library is a workspace-scoped catalog of every model available to your team. Each entry carries:
- A model ID in the form
provider/model-name. The prefix carries meaning: anOmniva/…id (for example,Omniva/Kimi-K2.6) is an Omniva-optimized build of an open model — tuned and quantized for low-latency serving — while the same base model may also be offered under its upstream author prefix (for example,MiniMaxAI/MiniMax-M2.5) as standard open weights. Always copy the exact id from the catalog. - Modality — text, vision, embedding, image generation, audio, video, or speech.
- Context window length — token capacity for the model's input plus output.
- Cost class — Token-based pricing.
- Lifecycle stage — GA, Beta, or Coming Soon.
- Deployment-option toggles — which deploy paths the model supports.
- Benchmark and model-card notes — when published by the provider.
Filter by modality, sort by recently added, or open a model entry to see its complete specifications, including supported context length, function-calling capability, and any model-card notes from the provider.
Modality tabs are filter views over the same workspace catalog — switching tabs (OSS models, Vision, Images, Video, Speech, Embeddings) filters which models render, not which catalog you're browsing. The catalog is unified.
Use benchmark sections to compare live models when data is present. If a specific metric is missing, do not infer it from another model or provider. Treat missing metrics as unknown until Token Factory publishes a sourced value.
#Discover callable models from the API
Use GET /v1/models when your app needs to discover callable models programmatically from the public Token Factory API surface.
The public response is an OpenAI-compatible list, shaped for client discovery. The example below is trimmed to one entry for brevity — the actual response includes every model in your workspace catalog:
That API response is intentionally smaller than the dashboard catalog. The Model Library uses an internal catalog shape for richer UI fields such as capability chips, example pricing, benchmark sections, and deploy affordances. Do not build user-facing API clients against the internal catalog shape; use the public OpenAI-compatible list for programmatic model discovery.
GET /v1/models tells your app which model IDs are callable through the public API. The dashboard catalog tells a human how to compare, benchmark, and launch those models inside Token Factory.
#Launch with Token-based pricing
During Early Access, models launch through Token-based pricing: OpenAI-compatible
/v1 traffic billed per token, with no dedicated deployment to manage. It keeps the
request shape close to OpenAI-compatible calls and removes deployment management
overhead.
Dedicated Endpoints — reserved capacity for supported open models, with custom-model hosting to follow — are coming soon. Interested? Contact sales.
#Models you can try today
The supported models available through Token Factory Early Access, all callable through GET /v1/models:
Omniva/Kimi-K2.6(chat) — long-horizon, autonomous coding and multi-step agents. The canonical chat model used across these docs.Omniva/MiniMax-M2.5(chat) — frontier-level coding and tool-use agents at a fraction of frontier cost.Omniva/DeepSeek-V4-Pro(chat) — deep coding and mathematical reasoning with high accuracy.
For embeddings, the embeddings guide uses BAAI/bge-large-en-v1.5 as its canonical example.
Filter the Model Library by modality or capability chip to see every model your workspace can call today.