Rate Limiting

Embedd.to enforces rate limits per organization. All API keys in an organization share a single requests-per-minute (RPM) pool, whose size is determined by the organization's subscription tier.

Limits by Tier

Tier        Rate Limit
Free        25 requests/minute
Pro         300 requests/minute
Business    1,000 requests/minute
Enterprise  Custom

Rate Limit Headers

Every response includes rate limit headers:

Header                 Description
X-RateLimit-Limit      Maximum requests per minute for your org
X-RateLimit-Remaining  Requests remaining in current window
X-RateLimit-Reset      Unix timestamp when the window resets
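
As an illustration of reading these headers on the client side, here is a minimal Python sketch. The names RateLimitStatus and parse_rate_limit_headers are hypothetical helpers, not part of any Embedd.to SDK, and the sketch assumes that all three header values are plain integers and that X-RateLimit-Reset is expressed in seconds.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: int      # from X-RateLimit-Limit
    remaining: int  # from X-RateLimit-Remaining
    reset_at: int   # from X-RateLimit-Reset (assumed: Unix time in seconds)

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds left in the current window, never negative."""
        current = time.time() if now is None else now
        return max(0.0, float(self.reset_at - current))


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Build a RateLimitStatus from a response's headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitStatus(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
    )
```

A client could call parse_rate_limit_headers on each response and pause for seconds_until_reset() once remaining reaches zero, rather than waiting for a 429.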

Handling Rate Limits

When you exceed the rate limit, the API returns a 429 Too Many Requests response:

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Too many requests",
    "resolution": "Retry after 45 seconds"
  }
}

The response includes a Retry-After header with the number of seconds to wait.
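
One way a client might interpret such a response is sketched below in Python. The function names and the one-second fallback are assumptions made for illustration. The sketch relies only on the 429 status, the Retry-After header, and the error.code field described above.

```python
import json
from typing import Mapping, Optional


def retry_delay_seconds(
    status_code: int,
    headers: Mapping[str, str],
    default: float = 1.0,  # assumed fallback, not an Embedd.to value
) -> Optional[float]:
    """Return how long to wait before retrying, or None if not rate limited."""
    if status_code != 429:
        return None
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after", "").strip()
    # Retry-After is documented here as a number of seconds; fall back to
    # the default if the header is missing or not a plain integer.
    if raw.isdigit():
        return float(raw)
    return default


def is_rate_limit_error(body: str) -> bool:
    """True if the JSON body carries the rate_limit_exceeded error code."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    return isinstance(error, dict) and error.get("code") == "rate_limit_exceeded"
```

Checking the status code and header first keeps the client working even if the body is empty or malformed; the body check is useful mainly for logging.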

Best Practices

  • Implement exponential backoff — Wait longer between retries on repeated 429s
  • Monitor rate limit headers — Track X-RateLimit-Remaining to avoid hitting limits
  • Batch operations — Use bulk operations where available instead of individual requests
  • Cache responses — Cache read-only responses to reduce API calls
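
The exponential backoff practice above can be sketched as follows. This is a minimal Python illustration, not an official client: the (status, headers, body) response tuple, the send callable, and the base, cap, and max_retries values are all assumptions chosen for the example. It uses the "full jitter" variant, which picks a random delay up to the exponential bound, and it prefers the server's Retry-After value when one is present.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape for this sketch: (status code, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying on 429 up to max_retries times."""
    response = send()
    for attempt in range(max_retries):
        status, headers, _body = response
        if status != 429:
            break
        retry_after = headers.get("Retry-After", "")
        # Prefer the server's hint; otherwise use jittered exponential backoff.
        delay = float(retry_after) if retry_after.isdigit() else backoff_delay(attempt)
        sleep(delay)
        response = send()
    # Either a non-429 response, or the last 429 once retries are exhausted.
    return response
```

Passing sleep as a parameter makes the retry loop easy to test without real delays. The jitter spreads retries from several clients in the same organization so they do not all hit the shared pool at the same instant.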

How It Works

Rate limiting uses a sliding window algorithm backed by Redis. The window is keyed by organization ID, so requests from every API key in an org count toward the same limit, and creating additional keys does not raise it.
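
To make the idea concrete, here is a simplified Python sketch of a sliding window keyed by organization ID. It is not the service's implementation: it keeps timestamps in process memory rather than Redis, it uses the "sliding window log" variant (the exact variant used by Embedd.to is not stated here), and the class and parameter names are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """In-memory sliding window log, one log of request times per org."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # Keyed by organization ID, so all keys in an org share one log.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, org_id: str, now: Optional[float] = None) -> bool:
        """Record a request for org_id and report whether it is allowed."""
        current = time.monotonic() if now is None else now
        hits = self._hits[org_id]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= current - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit; rejected requests are not logged
        hits.append(current)
        return True
```

Because the log is looked up by org_id alone, two requests made with different API keys from the same organization land in the same deque, which mirrors the shared-pool behavior described above.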

Exempt Endpoints

The /health endpoint is not rate limited.