POST /v1/embeddings · Token Factory Docs

POST /v1/embeddings

Create one or more embeddings (vector representations) from input text. The wire format is OpenAI-compatible, so the openai SDKs work once base_url is set to https://api.tokenfactory.omniva.com/v1.

#Authentication

Bearer token in the Authorization header. See Authentication for how to mint and rotate keys.

#Parameters

Field	Type	Description
model (required)	string	ID of the embeddings model to use. Browse the Embeddings tab in Model Library for available IDs in your workspace.
input (required)	string | string[]	The text to embed. Pass a single string for one embedding, or an array of strings to embed a batch in a single call. Embeddings in the response preserve input order via the index field.
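As a minimal sketch of the two fields above, the helper below serializes a request body and rejects obviously malformed arguments before they reach the gateway. The function name `build_payload` is hypothetical, and rejecting an empty list is this sketch's own choice rather than documented gateway behavior.

```python
import json
from typing import Sequence, Union


def build_payload(model: str, texts: Union[str, Sequence[str]]) -> bytes:
    """Serialize the documented `model` and `input` fields into a JSON body."""
    if not isinstance(model, str) or not model:
        raise ValueError("model is required and must be a non-empty string")
    if isinstance(texts, str):
        body_input = texts  # a single string yields one embedding
    else:
        body_input = list(texts)  # a list yields one embedding per element
        # Assumption: an empty batch is treated as a client-side error here.
        if not body_input or not all(isinstance(t, str) for t in body_input):
            raise ValueError("input must be a string or a non-empty list of strings")
    return json.dumps({"model": model, "input": body_input}).encode("utf-8")
```

Validating locally is optional; the gateway reports a missing `model` or `input` itself, so this only saves a round trip.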

#Response

Returns a list object whose data contains one embedding per input, in the same order as the request.

In production, BAAI/bge-large-en-v1.5 returns 1024-dimensional embeddings. Other models in the catalog use different dimensionalities, so check the model's catalog entry. The example below is truncated for readability (... stands in for the remaining 1020 floats).

#Response fields

object — always "list".
data[].object — always "embedding".
data[].index — zero-based position matching the corresponding input.
data[].embedding — array of floats. Dimensionality depends on the model.
model — echoes the requested model ID.
usage.prompt_tokens / usage.total_tokens — token accounting for billing.
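The fields listed above can be checked defensively on the client. The sketch below validates the documented shape of an already-decoded response and summarizes it; `EmbeddingsSummary` and `summarize_response` are hypothetical names, and the check that all vectors in one response share a dimensionality is an assumption based on each request naming a single model.

```python
from typing import Any, Dict, NamedTuple


class EmbeddingsSummary(NamedTuple):
    model: str
    count: int          # number of entries in data
    dimensions: int     # length of each embedding vector
    prompt_tokens: int
    total_tokens: int


def summarize_response(body: Dict[str, Any]) -> EmbeddingsSummary:
    """Validate the documented response shape and return a short summary."""
    if body.get("object") != "list":
        raise ValueError("top-level object should be 'list'")
    data = body["data"]
    if not data:
        raise ValueError("data is empty")
    for item in data:
        if item.get("object") != "embedding":
            raise ValueError("each data entry should have object == 'embedding'")
    # Assumption: one request uses one model, so all vectors share a length.
    dims = {len(item["embedding"]) for item in data}
    if len(dims) != 1:
        raise ValueError("embeddings in one response should share a dimensionality")
    usage = body["usage"]
    return EmbeddingsSummary(
        model=body["model"],
        count=len(data),
        dimensions=dims.pop(),
        prompt_tokens=usage["prompt_tokens"],
        total_tokens=usage["total_tokens"],
    )
```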

#Batched input

Pass an array of strings to embed multiple inputs in a single call. The response data array preserves input order via index — if your downstream code depends on order, key off index rather than assuming positional alignment.
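One way to key off index, sketched under the assumption that the response has already been decoded into a dict: build a lookup from `index` to `embedding`, then walk the original inputs. The helper name `pair_inputs_with_embeddings` is hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple


def pair_inputs_with_embeddings(
    inputs: Sequence[str], response: Dict[str, Any]
) -> List[Tuple[str, List[float]]]:
    """Match each input string to its vector using data[].index."""
    data = response["data"]
    by_index = {item["index"]: item["embedding"] for item in data}
    # Every input position must appear exactly once in the response.
    if len(data) != len(inputs) or set(by_index) != set(range(len(inputs))):
        raise ValueError("response indices do not cover every input exactly once")
    return [(text, by_index[i]) for i, text in enumerate(inputs)]
```

This stays correct even if the entries in `data` arrive in a different order than the inputs were sent.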

Request:

Response:

#Errors

Status	Code	When
400	`invalid_payload`	`model` or `input` is missing from the request body.
401	`unauthorized`	Missing `Authorization` header or non-Bearer scheme.
401	`invalid_api_key`	Bearer token did not resolve to a live API key.
404	`model_not_found`	The requested `model` ID is not enabled for your workspace.
429	`rate_limited`	Workspace rate limit exceeded — back off and retry with jitter.
500	`internal_error`	Unexpected gateway failure — safe to retry.
503	`upstream_unavailable`	The upstream model provider is temporarily unreachable — retry with backoff.

See Errors for the full error envelope and retry guidance.
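The retry hints in the table can be sketched as a small policy: retry on 429, 500, and 503, and wait a randomized, exponentially growing delay between attempts. The base delay, the cap, and the choice of "full jitter" are assumptions of this sketch, not documented gateway requirements. Treating 400, 401, and 404 as non-retryable is this sketch's reading of the table.

```python
import random
from typing import Optional

# Statuses the table above describes as worth retrying.
RETRYABLE_STATUSES = frozenset({429, 500, 503})


def is_retryable(status: int) -> bool:
    """Return True for statuses the error table marks as retryable."""
    return status in RETRYABLE_STATUSES


def backoff_delay(
    attempt: int,
    base: float = 0.5,   # assumed starting delay in seconds
    cap: float = 30.0,   # assumed upper bound in seconds
    rng: Optional[random.Random] = None,
) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    source = rng if rng is not None else random
    return source.uniform(0.0, min(cap, base * (2 ** attempt)))
```

A caller would typically sleep for `backoff_delay(attempt)` after each retryable failure and give up after a fixed number of attempts.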

#Code samples
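A minimal end-to-end sketch using only the Python standard library, assuming the endpoint path is the base URL followed by `/embeddings`. The function name `embed` and its `opener` parameter are hypothetical; `opener` exists only so the call can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, List, Sequence, Union

BASE_URL = "https://api.tokenfactory.omniva.com/v1"


def embed(
    api_key: str,
    model: str,
    texts: Union[str, Sequence[str]],
    opener: Callable = urllib.request.urlopen,
) -> List[List[float]]:
    """POST to /v1/embeddings and return vectors ordered by data[].index."""
    payload = {"model": model, "input": texts if isinstance(texts, str) else list(texts)}
    request = urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with opener(request) as response:
        body = json.load(response)
    ordered = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in ordered]
```

With a valid key, `embed(key, "BAAI/bge-large-en-v1.5", ["first text", "second text"])` should return two vectors, one per input.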

#Not yet supported

The handler currently accepts only model and input. The following OpenAI request fields are recognized in the spec but not yet implemented — they are silently ignored if sent:

encoding_format — output is always a JSON array of floats. base64 encoding is not available.
dimensions — output dimensionality is fixed by the model; you cannot request a truncated vector.
user — end-user identifier for abuse tracking is not propagated.
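Because these fields are ignored without an error, a client may want to notice them before sending. The sketch below separates a payload into the fields the handler accepts and the ones it would drop; `split_supported` is a hypothetical helper, not part of any SDK.

```python
from typing import Any, Dict, List, Tuple

# The handler currently accepts only these two fields.
SUPPORTED_FIELDS = ("model", "input")


def split_supported(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (fields the handler accepts, sorted names of fields it would ignore)."""
    supported = {k: v for k, v in payload.items() if k in SUPPORTED_FIELDS}
    ignored = sorted(k for k in payload if k not in SUPPORTED_FIELDS)
    return supported, ignored
```

Logging the ignored names makes it easier to spot code that relies on `dimensions` or `encoding_format` having an effect.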

Coming

These parameters are on the roadmap for parity with the OpenAI embeddings API. Track progress in the changelog.

#What next

Embeddings guide

Common embedding workflows: indexing, retrieval, similarity, batching.

Models overview

Browse the embedding models available in your workspace.

Chat completions API

The sibling endpoint for text generation against the same gateway.

Tokens, pricing & quotas

Understand per-call costs and workspace caps for embedding traffic.