Rate Limiting

Embedd.to uses per-API-key rate limiting to ensure fair usage.

Default Limits

| Tier | Rate limit |
|---|---|
| Default | 60 requests per minute |

Rate limits are applied per API key using a sliding window algorithm backed by Redis.
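Embedd.to's Redis-backed implementation is not documented here. As a rough illustration of the sliding-window idea, the sketch below uses one common variant, a sliding-window log: it keeps the timestamps of accepted requests per key and counts only those inside the trailing 60 seconds. The class name, its parameters, and the in-process storage are hypothetical; a limiter shared across several servers would keep this state in a shared store such as Redis instead.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Illustrative in-memory sliding-window-log limiter (not Embedd.to's Redis code)."""

    def __init__(self, limit: int = 60, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        # For each API key, timestamps of accepted requests still inside the window.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def _evict(self, api_key: str, now: float) -> Deque[float]:
        hits = self._hits[api_key]
        # Drop requests that have slid out of the trailing window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        return hits

    def allow(self, api_key: str, now: Optional[float] = None) -> bool:
        """Record and accept the request if the key is still under its limit."""
        now = time.monotonic() if now is None else now
        hits = self._evict(api_key, now)
        if len(hits) < self.limit:
            hits.append(now)
            return True
        return False

    def remaining(self, api_key: str, now: Optional[float] = None) -> int:
        """Requests the key can still make right now (analogous to X-RateLimit-Remaining)."""
        now = time.monotonic() if now is None else now
        return self.limit - len(self._evict(api_key, now))


limiter = SlidingWindowLimiter(limit=3, window_seconds=60)
print([limiter.allow("key-a", now=t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(limiter.remaining("key-a", now=3))                       # 0
print(limiter.allow("key-a", now=60.5))  # True: the t=0 request has left the window
```

Unlike a fixed window, capacity frees up gradually as individual requests age out, so a client cannot burst twice its limit across a window boundary.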

Every response includes rate limit headers:

| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests per minute |
| `X-RateLimit-Remaining` | Requests remaining in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
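A client can use these headers to pause before it runs out of quota. The helper below is a hypothetical sketch, not part of any Embedd.to SDK: it returns how many seconds to wait once `X-RateLimit-Remaining` reaches zero, treating `X-RateLimit-Reset` as a Unix timestamp in seconds.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """How long to pause before the next request, based on the rate limit headers.

    Returns 0.0 while requests remain in the window (or if the headers are absent).
    """
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    remaining = lowered.get("x-ratelimit-remaining")
    reset = lowered.get("x-ratelimit-reset")
    if remaining is None or reset is None or int(remaining) > 0:
        return 0.0
    now = time.time() if now is None else now
    # X-RateLimit-Reset is a Unix timestamp; never return a negative wait.
    return max(0.0, float(reset) - now)


headers = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1700000030",
}
print(seconds_until_reset(headers, now=1700000000))  # 30.0
```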

When you exceed the rate limit, the API responds with HTTP status `429 Too Many Requests` and an error body like this:

```json
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Too many requests",
    "resolution": "Retry after 45 seconds"
  }
}
```

The response includes a `Retry-After` header with the number of seconds to wait before retrying.
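A client might turn a 429 response into a wait time roughly as sketched below. The function names and the 60-second fallback (one full window, used when `Retry-After` is missing or not a plain number of seconds) are assumptions for illustration; the error-body fields match the example above.

```python
import json
from typing import Mapping, Optional

# Assumption: if Retry-After is missing or not a plain number of seconds,
# wait one full 60-second window before retrying.
FALLBACK_WAIT_SECONDS = 60.0


def retry_delay(status: int, headers: Mapping[str, str]) -> Optional[float]:
    """Seconds to wait before retrying, or None if the request was not rate limited."""
    if status != 429:
        return None
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_after = lowered.get("retry-after", "").strip()
    if retry_after.isdigit():
        return float(retry_after)
    return FALLBACK_WAIT_SECONDS


def error_code(body: str) -> Optional[str]:
    """Extract error.code (e.g. "rate_limit_exceeded") from a JSON error body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    return error.get("code") if isinstance(error, dict) else None


body = '{"error": {"code": "rate_limit_exceeded", "message": "Too many requests"}}'
print(error_code(body))                         # rate_limit_exceeded
print(retry_delay(429, {"Retry-After": "45"}))  # 45.0
```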

To stay within your limit:

- Implement exponential backoff: wait progressively longer between retries when you receive repeated 429 responses.
- Monitor rate limit headers: track `X-RateLimit-Remaining` so you can slow down before you hit the limit.
- Batch operations: use bulk operations where available instead of sending many individual requests.
- Cache responses: cache read-only responses to reduce the number of API calls.
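The backoff and header-monitoring advice can be combined in a small retry wrapper. The sketch below retries on 429 with capped exponential backoff and "full jitter" (a common technique, not something the Embedd.to API prescribes) and never waits less than the server's `Retry-After` value. The `send` callable, the `(status, headers)` response shape, and the default delays are hypothetical; adapt them to your HTTP client.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (HTTP status code, response headers).
Response = Tuple[int, Mapping[str, str]]


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying on 429 with capped exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        status, headers = send()
        if status != 429 or attempt == max_retries:
            return status, headers
        # Upper bound doubles each attempt (1s, 2s, 4s, ...) up to max_delay;
        # picking a random point below it spreads out retries from many clients.
        backoff = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        # Never retry sooner than the server asked for via Retry-After.
        lowered = {name.lower(): value for name, value in headers.items()}
        retry_after = lowered.get("retry-after", "").strip()
        server_wait = float(retry_after) if retry_after.isdigit() else 0.0
        sleep(max(backoff, server_wait))
    # Only reached when max_retries is negative and send() was never called.
    raise ValueError("max_retries must be >= 0")


# Usage with a fake sender: two 429s, then success. Sleeps are recorded, not slept.
responses = iter([(429, {"Retry-After": "2"}), (429, {}), (200, {})])
waits = []
status, _ = call_with_backoff(lambda: next(responses), sleep=waits.append)
print(status, len(waits))  # 200 2
```

Injecting `sleep` keeps the wrapper testable without real delays; in production the default `time.sleep` is used.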

The `/health` endpoint is not rate limited.