POST /v1/audio/speech · Token Factory Docs

POST /v1/audio/speech

Synthesize speech audio from input text. The wire shape is OpenAI-compatible: the openai SDKs work once base_url is set to https://api.tokenfactory.omniva.com/v1.

Beta

Text-to-speech is in Beta. Voices, formats, and parameters may shift before GA.

#Authentication

Bearer token in the Authorization header. See Authentication for how to mint and rotate keys.

#Parameters

Field	Type	Description
model (required)	string	ID of the text-to-speech model to use. Browse the Audio tab in Model Library for available IDs in your workspace.
input (required)	string	The text to synthesize. Length limits depend on the model; the typical cap is a few thousand characters per request.
voice (required)	string	Voice identifier. Each model exposes its own voice set; check the model card.
response_format	string	Output audio format: "mp3", "opus", "aac", "flac", or "wav". Default: `mp3`
speed	number	Playback speed multiplier. 1.0 is the model's natural cadence. Typical range 0.25 to 4.0. Default: `1.0`
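
As a rough illustration of the table above, the request body can be assembled and checked client-side before sending. This is a minimal sketch, not part of any SDK: the helper name `build_speech_payload` and the model and voice IDs are hypothetical, and the check on `speed` only rejects non-positive values because the documented range is described as typical rather than enforced.

```python
# Formats listed in the parameter table above.
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav"}


def build_speech_payload(model, input_text, voice, response_format="mp3", speed=1.0):
    """Return the request body for POST /v1/audio/speech as a dict."""
    # model, input, and voice are required; the API answers 400 if one is missing.
    for name, value in (("model", model), ("input", input_text), ("voice", voice)):
        if not value:
            raise ValueError(f"{name} is required")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if speed <= 0:
        raise ValueError("speed must be a positive multiplier")
    return {
        "model": model,
        "input": input_text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


# Hypothetical IDs; real ones come from the Model Library and the model card.
payload = build_speech_payload("example-tts-model", "Hello, world.", "example-voice")
```

Catching a missing field locally avoids a round trip that would otherwise end in a 400 `invalid_payload`. A model may still reject a format from this list, so the server remains the final authority.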

#Response

The response body is binary audio bytes — not JSON. The Content-Type header reflects the requested response_format (e.g. audio/mpeg for mp3, audio/opus for opus).

Write the bytes to a file or stream them to a player. There is no JSON envelope to parse.
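
One way to handle the binary body is to pick a file extension from the Content-Type header and write the bytes unchanged. The sketch below uses hypothetical helper names (`extension_for`, `save_audio`) and stand-in bytes instead of a real response. Only audio/mpeg and audio/opus are confirmed by the text above; the remaining MIME types in the mapping are assumptions.

```python
import os
import tempfile

# audio/mpeg and audio/opus come from the text above; the other three
# MIME types are assumptions and should be checked against real responses.
EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/opus": ".opus",
    "audio/aac": ".aac",
    "audio/flac": ".flac",
    "audio/wav": ".wav",
}


def extension_for(content_type):
    """Map a Content-Type header value to a file extension."""
    # Drop any parameters after ';' and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return EXTENSIONS.get(mime, ".bin")


def save_audio(body, content_type, stem):
    """Write raw audio bytes to '<stem><ext>' and return the path."""
    path = stem + extension_for(content_type)
    with open(path, "wb") as f:  # binary mode: the body is not text
        f.write(body)
    return path


# Stand-in bytes; a real call would pass the HTTP response body.
out_dir = tempfile.mkdtemp()
saved = save_audio(b"\x00\x01fake-audio", "audio/mpeg", os.path.join(out_dir, "speech"))
```

Unknown content types fall back to `.bin` so that an unexpected response is still saved for inspection rather than silently mislabeled.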

#Errors

Error responses are JSON, with the standard error envelope.

Status	Code	When
400	`invalid_payload`	`model`, `input`, or `voice` missing; or `response_format` unsupported by the model.
401	`unauthorized`	Missing or invalid Bearer token.
404	`model_not_found`	Unknown TTS model, or unknown voice for the model.
413	`payload_too_large`	`input` exceeded the model's character limit.
429	`rate_limited`	Rate limit or quota exceeded.
503	`upstream_unavailable`	No healthy upstream — retry with backoff.

See Errors for the full error envelope and retry guidance.
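
The "retry with backoff" advice for 503 might be implemented along these lines. This is a sketch under stated assumptions: `call_with_retry` is a hypothetical helper, the attempt count and delays are arbitrary, and treating 429 as retryable is an assumption that the Errors page may refine. The upstream is simulated, so no network call is made.

```python
import time

# 503 is documented as retryable. Retrying 429 is an assumption here;
# the Errors page is the authority on retry guidance.
RETRYABLE = {429, 503}


def call_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning (status, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff: base_delay, then 2x, then 4x, ...
        sleep(base_delay * (2 ** attempt))


# Simulated upstream: two 503s, then success.
responses = iter([(503, b""), (503, b""), (200, b"audio-bytes")])
delays = []
result = call_with_retry(lambda: next(responses), sleep=delays.append)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. Errors such as 400, 401, 404, and 413 are returned immediately, since repeating the same request would not change the outcome.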

#Code samples
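
A minimal end-to-end sketch using only the Python standard library. It assumes the request body is JSON, as in the OpenAI API that this endpoint mirrors. The API key, model ID, and voice are placeholders, and the helper names `build_speech_request` and `synthesize_to_file` are hypothetical. The network call is left commented out.

```python
import json
import urllib.request

BASE_URL = "https://api.tokenfactory.omniva.com/v1"


def build_speech_request(api_key, model, text, voice, response_format="mp3"):
    """Build (but do not send) a POST /v1/audio/speech request."""
    body = json.dumps({
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/audio/speech",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(request, path):
    """Send the request and write the binary audio response to path."""
    # urlopen raises urllib.error.HTTPError on 4xx/5xx; the JSON error
    # envelope can then be read from the exception's read() method.
    with urllib.request.urlopen(request, timeout=60) as resp, open(path, "wb") as f:
        f.write(resp.read())


# Placeholder key, model ID, and voice; substitute real values before sending.
req = build_speech_request("YOUR_API_KEY", "example-tts-model", "Hello, world.", "example-voice")
# synthesize_to_file(req, "speech.mp3")  # uncomment to make the network call
```

The same request can be made through the openai SDKs by pointing base_url at the URL above; the standard-library version is shown here only to keep the example dependency-free.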

#What next

Models overview

Find audio-capable models in the catalog.