
Rate Limiting

Embedd.to uses per-API-key rate limiting to ensure fair usage.

Default Limits

Tier       Rate Limit
Default    60 requests per minute

Rate limits are applied per API key using a sliding window algorithm backed by Redis.
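
To make the sliding window idea concrete, here is a minimal in-memory sketch for a single API key. It is an illustration only: the actual service is backed by Redis and its implementation is not shown in this page, and the class name `SlidingWindowLimiter` and its parameters are hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    """Illustrative sliding-window log limiter for one API key (in memory, not Redis)."""

    def __init__(self, limit: int = 60, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._timestamps = deque()  # accepted request times, oldest first

    def allow(self, now: float) -> bool:
        """Return True and record the request if it fits in the window ending at `now`."""
        # Drop requests that have slid out of the window.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.limit:
            return False  # the caller would answer with 429 here
        self._timestamps.append(now)
        return True
```

Unlike a fixed window that resets on the minute boundary, this approach counts the requests made in the 60 seconds leading up to each call, so a burst straddling a boundary cannot exceed the limit.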

Rate Limit Headers

Every response includes rate limit headers:

Header                   Description
X-RateLimit-Limit        Maximum requests per minute
X-RateLimit-Remaining    Requests remaining in the current window
X-RateLimit-Reset        Unix timestamp when the window resets
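
A client can read these headers from any response. The sketch below assumes all three values are sent as integer strings and that the reset timestamp is in seconds; `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical helper names, not part of any Embedd.to SDK.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: int
    remaining: int
    reset_at: int  # Unix timestamp, assumed to be in seconds

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds left until the window resets, never negative."""
        current = time.time() if now is None else now
        return max(0.0, float(self.reset_at) - current)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitStatus(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
    )
```

For example, passing the headers of a response with 12 requests remaining would give `status.remaining == 12`, which a client could use to slow down before reaching zero.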

Handling Rate Limits

When you exceed the rate limit, the API returns a 429 Too Many Requests response:

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Too many requests",
    "resolution": "Retry after 45 seconds"
  }
}

The response includes a Retry-After header with the number of seconds to wait.
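
One way to act on this is to turn the status code and headers into a wait time. The following sketch assumes Retry-After carries a plain number of seconds, as described above; the function name `retry_delay_seconds` and the fallback `default` value are assumptions for illustration.

```python
from typing import Mapping, Optional


def retry_delay_seconds(
    status_code: int,
    headers: Mapping[str, str],
    default: float = 1.0,
) -> Optional[float]:
    """Return how long to wait before retrying, or None if the request was not rate limited."""
    if status_code != 429:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after")
    try:
        return max(0.0, float(raw))
    except (TypeError, ValueError):
        # Header missing or not a number: fall back to a conservative default.
        return default
```

The caller would sleep for the returned number of seconds and then resend the same request.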

Best Practices

  • Implement exponential backoff — Wait longer between retries on repeated 429s
  • Monitor rate limit headers — Track X-RateLimit-Remaining to avoid hitting limits
  • Batch operations — Use bulk operations where available instead of individual requests
  • Cache responses — Cache read-only responses to reduce API calls
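
The first practice above, exponential backoff, can be sketched as follows. This is one common variant (capped doubling with random jitter), not a scheme prescribed by Embedd.to; `send` is a hypothetical callable that performs the request and returns the status code together with the Retry-After value in seconds, if any.

```python
import random
import time
from typing import Callable, Optional, Tuple


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay ceiling for a 0-based retry attempt: base * 2**attempt, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(
    send: Callable[[], Tuple[int, Optional[float]]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-429 status or attempts run out.

    Returns the last status code seen.
    """
    status = 429
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Jitter spreads out clients that were limited at the same moment.
        delay = random.uniform(0, backoff_delay(attempt))
        # Never wait less than the server asked for.
        if retry_after is not None:
            delay = max(delay, retry_after)
        sleep(delay)
    return status
```

With the defaults, the delay ceilings grow as 1, 2, 4, and 8 seconds across the four waits between five attempts. The `sleep` parameter exists so the loop can be exercised in tests without real waiting.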

Exempt Endpoints

The /health endpoint is not rate limited.