Rate Limiting
Embedd.to enforces rate limits per organization. All API keys in an organization share a single requests-per-minute (RPM) pool, sized by the organization's subscription tier.
Limits by Tier
| Tier | Rate Limit |
|---|---|
| Free | 25 requests/minute |
| Pro | 300 requests/minute |
| Business | 1,000 requests/minute |
| Enterprise | Custom |
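
Because the pool is shared, clients that run several workers under one organization may want to pace themselves. The following is a minimal sketch of that arithmetic, not an official client: the `TIER_RPM` mapping and `min_interval_seconds` helper are hypothetical names, and the Enterprise limit is left as `None` because it is custom.

```python
from typing import Optional

# Requests per minute by tier, copied from the table above.
# Enterprise limits are custom, so None stands in for "unknown" here.
TIER_RPM = {"Free": 25, "Pro": 300, "Business": 1000, "Enterprise": None}


def min_interval_seconds(tier: str, workers: int = 1) -> Optional[float]:
    """Return the average spacing each worker needs between requests.

    `workers` is the number of clients sharing the organization's pool
    (an assumption of this sketch; the API does not report it).
    """
    rpm = TIER_RPM[tier]
    if rpm is None:
        return None  # custom limit: cannot be computed from the table
    return 60.0 * workers / rpm
```

For example, four workers sharing a Pro organization would each space requests roughly 0.8 seconds apart to stay within 300 requests/minute in total.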
Rate Limit Headers
Every response includes rate limit headers:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests per minute for your org |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
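
As a rough illustration of how a client might read these headers, the sketch below works on a plain mapping of header names to string values. The helper names `seconds_until_reset` and `should_pause`, and the `min_remaining` threshold, are assumptions made for this example; it also assumes the mapping uses the exact header casing shown in the table (or is case-insensitive, as many HTTP libraries' header objects are).

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds left until the current window resets (never negative)."""
    now = time.time() if now is None else now
    reset_at = float(headers["X-RateLimit-Reset"])  # Unix timestamp
    return max(0.0, reset_at - now)


def should_pause(headers: Mapping[str, str], min_remaining: int = 5) -> bool:
    """True when the remaining budget is at or below a chosen threshold."""
    remaining = int(headers["X-RateLimit-Remaining"])
    return remaining <= min_remaining
```

A client could call `should_pause` after each response and, when it returns true, sleep for `seconds_until_reset` before sending more requests.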
Handling Rate Limits
When you exceed the rate limit, the API returns a 429 Too Many Requests response:
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Too many requests",
    "resolution": "Retry after 45 seconds"
  }
}
The response includes a Retry-After header with the number of seconds to wait.
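
One way a client might turn a 429 response into a wait time is sketched below. The function name `retry_delay` and the `default` fallback are hypothetical; the sketch assumes `Retry-After` carries a number of seconds, as described above, and falls back to the default if the header is missing or not numeric.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], default: float = 1.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if no retry is needed."""
    if status != 429:
        return None
    raw = headers.get("Retry-After")
    try:
        return max(0.0, float(raw))
    except (TypeError, ValueError):
        # Header missing or not a number: fall back to a conservative default.
        return default
```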
Best Practices
- Implement exponential backoff: wait longer between retries on repeated 429s
- Monitor rate limit headers: track X-RateLimit-Remaining to avoid hitting limits
- Batch operations: use bulk operations where available instead of individual requests
- Cache responses: cache read-only responses to reduce API calls
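
The backoff practice can be sketched as follows. This is an illustrative pattern under stated assumptions, not a prescribed client: `call_with_backoff`, its parameters, and the `(status, headers)` tuple shape are invented for the example, and `send` stands in for whatever function actually performs the request. The delay doubles on each consecutive 429, is capped, honors a numeric `Retry-After` when it is longer, and adds a small random jitter so that clients do not retry in lockstep.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying on 429 with exponentially growing delays."""
    attempt = 0
    while True:
        status, headers = send()
        if status != 429 or attempt >= max_retries:
            return status, headers
        # Delay grows as base, 2x base, 4x base, ... up to max_delay.
        delay = min(max_delay, base_delay * 2 ** attempt)
        # If the server asks for a longer wait, respect it.
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = max(delay, float(retry_after))
        # Add up to 10% jitter on top of the computed delay.
        sleep(delay + random.uniform(0, delay * 0.1))
        attempt += 1
```

Passing `sleep` as a parameter is a design choice for testability: a test can record the requested delays instead of actually waiting.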
How It Works
Rate limiting uses a sliding window algorithm backed by Redis. The window is keyed by organization ID, so all API keys within an org count toward the same limit. This prevents circumventing limits by creating multiple keys.
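
To make the idea concrete, here is a minimal in-memory sketch of one sliding window variant (a timestamp log per organization). It is not the service's actual implementation, which is backed by Redis; the class name `SlidingWindowLimiter`, its parameters, and the in-process `deque` storage are all assumptions for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per organization in any rolling window."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        # Request timestamps keyed by organization ID, oldest first.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, org_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[org_id]
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: the caller would respond with 429
        hits.append(now)
        return True
```

Because the key is the organization ID rather than the API key, requests made with two different keys from the same organization land in the same queue and count toward the same limit.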
Exempt Endpoints
The /health endpoint is not rate limited.