Synthesize speech audio from input text. The endpoint uses the OpenAI-compatible wire shape, so the openai SDKs work once you point base_url at https://api.tokenfactory.omniva.com/v1.
Text-to-speech is in Beta. Voices, formats, and parameters may change before general availability (GA).
# Authentication
Pass a Bearer token in the Authorization header (`Authorization: Bearer <API key>`). See Authentication for how to mint and rotate keys.
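A minimal sketch using only the Python standard library, showing how the base URL and the Bearer token fit together in one request. The `/audio/speech` path is an assumption based on the OpenAI-compatible wire shape; this page does not state it. `build_speech_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Base URL from this page. The "/audio/speech" path is an assumption based on
# the OpenAI-compatible wire shape, not something this page states.
BASE_URL = "https://api.tokenfactory.omniva.com/v1"
SPEECH_PATH = "/audio/speech"


def build_speech_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries the Bearer token."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + SPEECH_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the request to `urllib.request.urlopen` and read the body as bytes. With an openai SDK, the equivalent is constructing the client with the base_url above and your key.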
# Parameters
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | | ID of the text-to-speech model to use. Browse the Audio tab in the Model Library for the IDs available in your workspace. |
| input | string | Yes | | The text to synthesize. The length limit depends on the model; a typical cap is a few thousand characters per request. |
| voice | string | Yes | | Voice identifier. Each model exposes its own set of voices; check the model card. |
| response_format | string | No | mp3 | Audio container: "mp3", "opus", "aac", "flac", or "wav". A given model may not support every format. |
| speed | number | No | 1.0 | Playback speed multiplier. 1.0 is the model's natural cadence; the typical range is 0.25 to 4.0. |
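The sketch below assembles a request body and checks it client-side against the table above. It is an illustration, not the server's validation logic: the 0.25 to 4.0 speed bounds are only the *typical* range, and a model may support fewer formats than the five listed. `build_speech_payload` is a hypothetical helper name.

```python
# Formats listed in the parameter table. A given model may support only a subset.
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav"}


def build_speech_payload(model, voice, text, response_format="mp3", speed=1.0):
    """Assemble a request body, checking the constraints in the parameter table.

    The 0.25-4.0 speed bounds are the typical range from the table; a specific
    model may accept a different range, so this check is an assumption.
    """
    for name, value in (("model", model), ("voice", voice), ("input", text)):
        if not isinstance(value, str) or not value:
            raise ValueError(f"{name} is required and must be a non-empty string")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError(f"speed {speed} is outside the typical 0.25-4.0 range")

    payload = {"model": model, "voice": voice, "input": text}
    # Only send optional fields when they differ from the documented defaults.
    if response_format != "mp3":
        payload["response_format"] = response_format
    if speed != 1.0:
        payload["speed"] = speed
    return payload
```

Omitting optional fields that equal their defaults keeps the body minimal; sending them explicitly would also be valid.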
# Response
The response body is raw binary audio, not JSON. The Content-Type header reflects the requested response_format (for example, audio/mpeg for mp3 and audio/opus for opus).
Write the bytes to a file or stream them to a player; there is no JSON envelope to parse.
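A minimal sketch of saving the audio body without buffering it all in memory, choosing the file extension from Content-Type. Only audio/mpeg and audio/opus come from this page; the other MIME types in the mapping are common registrations and are assumptions about what the server sends. `extension_for` and `save_audio` are hypothetical helper names.

```python
import shutil
from typing import BinaryIO

# audio/mpeg and audio/opus come from this page; the rest are assumptions.
CONTENT_TYPE_EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/opus": ".opus",
    "audio/aac": ".aac",
    "audio/flac": ".flac",
    "audio/wav": ".wav",
    "audio/x-wav": ".wav",
}


def extension_for(content_type: str) -> str:
    # Drop parameters such as "; codecs=..." before looking up the MIME type.
    mime = content_type.split(";", 1)[0].strip().lower()
    return CONTENT_TYPE_EXTENSIONS.get(mime, ".bin")


def save_audio(stream: BinaryIO, content_type: str, stem: str) -> str:
    """Copy the raw audio body to <stem><ext>; the body is never parsed as JSON."""
    path = stem + extension_for(content_type)
    with open(path, "wb") as out:
        shutil.copyfileobj(stream, out)  # copies in fixed-size chunks
    return path
```

With `urllib.request`, the response object is itself a readable stream: `with urllib.request.urlopen(req) as resp: save_audio(resp, resp.headers.get("Content-Type", ""), "speech")`.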
# Errors
Error responses are JSON and use the standard error envelope.
| Status | Code | When |
|---|---|---|
| 400 | invalid_payload | model, input, or voice is missing, or response_format is not supported by the model. |
| 401 | unauthorized | Missing or invalid Bearer token. |
| 404 | model_not_found | Unknown TTS model, or unknown voice for the model. |
| 413 | payload_too_large | input exceeds the model's character limit. |
| 429 | rate_limited | Rate limit or quota exceeded. |
| 503 | upstream_unavailable | No healthy upstream is available; retry with backoff. |
See Errors for the full error envelope and retry guidance.
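A minimal retry sketch driven only by the status codes in the table above, using exponential backoff with "full jitter". Retrying 503 follows this page; also retrying 429 is an assumption that helps with short-term rate limits but not with an exhausted quota. `send` is a hypothetical zero-argument callable that performs one request and returns `(status, body)`; the other names are hypothetical too.

```python
import random
import time
from typing import Callable, Tuple

# 503 is documented as retryable; retrying 429 is an assumption.
RETRYABLE_STATUSES = {429, 503}


def should_retry(status: int) -> bool:
    # 400, 401, 404 and 413 describe problems with the request itself,
    # so resending the same request would fail the same way.
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter delay for 0-based retry `attempt`: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `send` until it succeeds, fails permanently, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if status < 400 or not should_retry(status) or last_attempt:
            return status, body
        sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the loop testable without real waits. Jitter spreads retries from many clients so they do not hit a recovering upstream at the same moment.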