Rate Limiting

Embedd.to enforces rate limits per organization. All API keys in an organization share a single requests-per-minute (RPM) pool, sized by the organization's subscription tier.

Limits by Tier

Tier	Rate Limit
Free	25 requests/minute
Pro	300 requests/minute
Business	1,000 requests/minute
Enterprise	Custom

Rate Limit Headers

Every response includes rate limit headers:

Header	Description
`X-RateLimit-Limit`	Maximum requests per minute for your org
`X-RateLimit-Remaining`	Requests remaining in current window
`X-RateLimit-Reset`	Unix timestamp when the window resets
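
As an illustration of reading these headers on the client side, here is a minimal Python sketch. The `RateLimitStatus` class and `parse_rate_limit_headers` function are hypothetical names introduced for this example, and the sketch assumes all three header values are integers (with `X-RateLimit-Reset` in whole seconds).

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: int      # X-RateLimit-Limit: maximum requests per minute for the org
    remaining: int  # X-RateLimit-Remaining: requests left in the current window
    reset_at: int   # X-RateLimit-Reset: Unix timestamp when the window resets

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds until the window resets, never negative."""
        current = time.time() if now is None else now
        return max(0.0, self.reset_at - current)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Build a RateLimitStatus from response headers (case-insensitive lookup)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # Assumption: all three values are integer strings.
    return RateLimitStatus(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
    )
```

A client could call `parse_rate_limit_headers(response.headers)` after each request and pause for `seconds_until_reset()` when `remaining` reaches zero.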

Handling Rate Limits

When you exceed the rate limit, the API returns a 429 Too Many Requests response:

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Too many requests",
    "resolution": "Retry after 45 seconds"
  }
}

The response includes a Retry-After header with the number of seconds to wait.
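
One way a client might turn a 429 response into a wait time is sketched below. The `retry_delay_seconds` helper and its `default` fallback are assumptions made for this example, not part of the API; the sketch only relies on `Retry-After` carrying a number of seconds, as described above.

```python
from typing import Mapping, Optional


def retry_delay_seconds(
    status_code: int,
    headers: Mapping[str, str],
    default: float = 1.0,  # hypothetical fallback when the header is missing or unparsable
) -> Optional[float]:
    """Return how long to wait before retrying, or None if the response is not a 429."""
    if status_code != 429:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after")
    if raw is None:
        return default
    try:
        # Retry-After is documented as a number of seconds.
        return max(0.0, float(raw))
    except ValueError:
        return default
```

Reading the delay from the header, rather than parsing the human-readable `resolution` string, keeps the client independent of the message wording.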

Best Practices

Implement exponential backoff — Wait longer between retries on repeated 429s
Monitor rate limit headers — Track X-RateLimit-Remaining to avoid hitting limits
Batch operations — Use bulk operations where available instead of individual requests
Cache responses — Cache read-only responses to reduce API calls
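
The first practice, exponential backoff, could look roughly like the following sketch. It is transport-agnostic: `send` is a hypothetical callable that performs one request and returns `(status, headers, body)`. The base delay, cap, retry count, and jitter are illustrative choices, not values prescribed by the API.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential delay for a zero-based attempt number, capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> Response:
    """Call `send`, retrying on 429 with exponentially growing waits."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        status, headers, body = send()
        # Return on success, on any non-429 error, or once retries are exhausted.
        if status != 429 or attempt >= max_retries:
            return status, headers, body
        delay = backoff_delay(attempt)
        # Honor Retry-After when the server asks for a longer wait.
        lowered = {name.lower(): value for name, value in headers.items()}
        retry_after = lowered.get("retry-after")
        if retry_after is not None and retry_after.isdigit():
            delay = max(delay, float(retry_after))
        # Jitter spreads out retries from clients that were limited together.
        sleep(delay + jitter())
        attempt += 1
```

Because all keys in an organization share one pool, several workers can hit the limit at the same moment; the random jitter helps keep them from retrying in lockstep.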

How It Works

Rate limiting uses a sliding window algorithm backed by Redis. The window is keyed by organization ID, so all API keys within an org count toward the same limit. This prevents circumventing limits by creating multiple keys.
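
To make the idea concrete, here is a small in-memory sketch of a sliding window keyed by organization ID. It is not the service's implementation: the text above does not say which sliding-window variant is used, so this example assumes a timestamp-log variant and swaps Redis for a process-local dictionary. `SlidingWindowLimiter` and `allow` are hypothetical names.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Sliding-window log limiter; an in-memory stand-in for a Redis-backed store."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # One timestamp log per organization, shared by all of that org's API keys.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, org_id: str, now: Optional[float] = None) -> bool:
        """Record a request for `org_id` and report whether it is within the limit."""
        current = time.monotonic() if now is None else now
        hits = self._hits[org_id]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= current - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the caller would respond with 429
        hits.append(current)
        return True
```

In this sketch a server would resolve each API key to its organization ID before calling `allow`, which is why creating additional keys does not raise the effective limit.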

Exempt Endpoints

The /health endpoint is not rate limited.
