Create one or more embeddings (vector representations) from input text. The wire shape is OpenAI-compatible: the OpenAI SDKs work once base_url is pointed at https://api.tokenfactory.omniva.com/v1.
#Authentication
Bearer token in the Authorization header. See Authentication for how to mint and rotate keys.
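
As a minimal sketch of the header described above, the helper below builds the request headers from an API key. The function names and the `TOKENFACTORY_API_KEY` environment variable are hypothetical, chosen only for illustration; the Bearer scheme itself is what this page documents.

```python
import os


def auth_headers(api_key: str) -> dict[str, str]:
    """Build request headers that carry the key as a Bearer token."""
    key = api_key.strip()
    if not key:
        # Fail locally rather than sending a request that would return 401.
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


def headers_from_env(var: str = "TOKENFACTORY_API_KEY") -> dict[str, str]:
    # The variable name is an assumption; use whatever your deployment sets.
    return auth_headers(os.environ.get(var, ""))
```

Reading the key from the environment keeps it out of source control, which also makes rotation a configuration change rather than a code change.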
#Parameters
| Field | Type | Description |
|---|---|---|
| model (required) | string | ID of the embeddings model to use. Browse the Embeddings tab in Model Library for available IDs in your workspace. |
| input (required) | string or string[] | The text to embed. Pass a single string for one embedding, or an array of strings to embed a batch in a single call. Embeddings in the response preserve input order via the index field. |
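
The two fields above can be assembled and checked client-side before sending. The following is a sketch only: `build_payload` is a hypothetical helper, and rejecting an empty list is a client-side choice rather than documented gateway behavior.

```python
from typing import Sequence, Union


def build_payload(model: str, texts: Union[str, Sequence[str]]) -> dict:
    """Return the JSON body for an embeddings request."""
    if not isinstance(model, str) or not model:
        raise ValueError("model is required")
    if isinstance(texts, str):
        # A single string yields a single embedding.
        payload_input = texts
    else:
        # Any other sequence is sent as a batch (JSON array of strings).
        payload_input = list(texts)
        if not payload_input or not all(isinstance(t, str) for t in payload_input):
            # Assumption: an empty batch is treated as a caller error here.
            raise ValueError("input must be a string or a non-empty list of strings")
    return {"model": model, "input": payload_input}
```

Catching a missing `model` or `input` locally avoids a round trip that would end in a 400.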
#Response
Returns a list object whose data contains one embedding per input, in the same order as the request.
Production embeddings are 1024-dimensional for BAAI/bge-large-en-v1.5; other models in the catalog use different dimensionalities — check the model's catalog entry. The example below is truncated for readability (... stands in for the remaining 1020 floats).
#Response fields
- object: always "list".
- data[].object: always "embedding".
- data[].index: zero-based position matching the corresponding input.
- data[].embedding: array of floats. Dimensionality depends on the model.
- model: echoes the requested model ID.
- usage.prompt_tokens / usage.total_tokens: token accounting for billing.
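
A sketch of how a client might sanity-check these fields after decoding the JSON body. `summarize_response` and its `expected_dim` parameter are hypothetical names; the field names are the ones listed above.

```python
from typing import Optional


def summarize_response(body: dict, expected_dim: Optional[int] = None) -> dict:
    """Validate the documented response fields and return a small summary."""
    if body.get("object") != "list":
        raise ValueError("top-level object is not 'list'")
    dims = set()
    for item in body["data"]:
        if item.get("object") != "embedding":
            raise ValueError("data item is not an 'embedding'")
        dims.add(len(item["embedding"]))
    if len(dims) > 1:
        raise ValueError("embeddings in one response differ in dimensionality")
    dim = dims.pop() if dims else 0
    if expected_dim is not None and dim != expected_dim:
        # For example, pass expected_dim=1024 for BAAI/bge-large-en-v1.5.
        raise ValueError(f"expected {expected_dim} dimensions, got {dim}")
    usage = body.get("usage", {})
    return {
        "model": body["model"],
        "count": len(body["data"]),
        "dimensions": dim,
        "prompt_tokens": usage.get("prompt_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }
```

Checking dimensionality up front is useful when the vectors feed an index whose width was fixed at creation time.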
#Batched input
Pass an array of strings to embed multiple inputs in a single call. The response data array preserves input order via index — if your downstream code depends on order, key off index rather than assuming positional alignment.
Request:
Response:
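
Keying off index, as recommended above, might look like the sketch below. `pair_by_index` is a hypothetical helper that takes the original inputs and the decoded response body and returns each text with its vector.

```python
def pair_by_index(inputs: list[str], body: dict) -> list[tuple[str, list[float]]]:
    """Match each input to its embedding using data[].index, not list position."""
    by_index = {item["index"]: item["embedding"] for item in body["data"]}
    if set(by_index) != set(range(len(inputs))):
        # A gap or an extra index means the response does not cover the batch.
        raise ValueError("response indices do not cover every input")
    return [(text, by_index[i]) for i, text in enumerate(inputs)]
```

The result is in input order regardless of how the `data` array happens to be arranged.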
#Errors
| Status | Code | When |
|---|---|---|
| 400 | invalid_payload | model or input is missing from the request body. |
| 401 | unauthorized | Missing Authorization header or non-Bearer scheme. |
| 401 | invalid_api_key | Bearer token did not resolve to a live API key. |
| 404 | model_not_found | The requested model ID is not enabled for your workspace. |
| 429 | rate_limited | Workspace rate limit exceeded — back off and retry with jitter. |
| 500 | internal_error | Unexpected gateway failure — safe to retry. |
| 503 | upstream_unavailable | The upstream model provider is temporarily unreachable — retry with backoff. |
See Errors for the full error envelope and retry guidance.
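
The retry guidance in the table can be sketched as a small policy: retry 429, 500, and 503 with exponential backoff and full jitter, and fail fast on everything else. The function names, the base delay, the cap, and the attempt limit are all assumptions for illustration, not values published by the gateway.

```python
import random
import time

RETRYABLE = {429, 500, 503}  # statuses the table marks as retryable


def should_retry(status: int) -> bool:
    return status in RETRYABLE


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng=random.random) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(send, max_attempts: int = 5, sleep=time.sleep,
                    rng=random.random):
    """Call send() until it succeeds; send() must return (status, body)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status < 400:
            return body
        # 400, 401, and 404 will not succeed on retry, so stop immediately.
        if not should_retry(status) or attempt == max_attempts - 1:
            raise RuntimeError(f"request failed with status {status}")
        sleep(backoff_delay(attempt, rng=rng))
```

Injecting `sleep` and `rng` keeps the policy testable without real waiting.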
#Code samples
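
A minimal end-to-end sketch using only the Python standard library. It assumes the endpoint path is `/embeddings` under the base URL, inferred from OpenAI compatibility rather than stated on this page; `embed`, its `opener` parameter, and the timeout value are hypothetical. The OpenAI SDK with a swapped base_url is the alternative this page describes.

```python
import json
import urllib.request

BASE_URL = "https://api.tokenfactory.omniva.com/v1"


def embed(texts, model, api_key, opener=urllib.request.urlopen, timeout=30.0):
    """POST an embeddings request and return vectors in input order."""
    req = urllib.request.Request(
        f"{BASE_URL}/embeddings",  # assumed path, mirroring the OpenAI API
        data=json.dumps({"model": model, "input": texts}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with opener(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Sort by index so the result lines up with the inputs.
    ordered = sorted(body["data"], key=lambda d: d["index"])
    return [d["embedding"] for d in ordered]
```

With the default opener, error statuses surface as `urllib.error.HTTPError`; the `opener` parameter exists so the call can be exercised against a stub.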
#Not yet supported
The handler currently accepts only model and input. The following OpenAI request fields are recognized in the spec but not yet implemented — they are silently ignored if sent:
- encoding_format: output is always a JSON array of floats. base64 encoding is not available.
- dimensions: output dimensionality is fixed by the model; you cannot request a truncated vector.
- user: end-user identifier for abuse tracking is not propagated.
These parameters are on the roadmap for parity with the OpenAI embeddings API. Track progress in the changelog.
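
Because these fields are ignored silently, code ported from the OpenAI API may appear to work while, for example, returning full-size vectors despite a `dimensions` argument. One way to surface that is a client-side guard like the sketch below; `strip_ignored` is a hypothetical helper and the warning text is illustrative.

```python
import warnings

# Fields this page lists as accepted by OpenAI but not yet implemented here.
IGNORED_FIELDS = {"encoding_format", "dimensions", "user"}


def strip_ignored(params: dict) -> dict:
    """Drop fields the handler ignores, warning so the caller notices."""
    dropped = sorted(k for k in params if k in IGNORED_FIELDS)
    if dropped:
        warnings.warn(
            "embeddings fields not yet implemented and ignored: " + ", ".join(dropped),
            stacklevel=2,
        )
    # Every other key passes through unchanged.
    return {k: v for k, v in params.items() if k not in IGNORED_FIELDS}
```

Removing the guard later is a one-line change once the changelog reports parity.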
#What next
- Common embedding workflows: indexing, retrieval, similarity, batching.
- Browse the embedding models available in your workspace.
- The sibling endpoint for text generation against the same gateway.
- Understand per-call costs and workspace caps for embedding traffic.