Topic 09: Building Blocks
Rate Limiting
Protect your system from abuse, overload, and runaway clients.
Rate limiting caps how many requests a client can make in a time window. It protects your infrastructure from abuse and ensures fair resource distribution across all users.
Rate limiting algorithms
Five common approaches with different tradeoffs.
- Fixed window — count requests per window (e.g. 100 req/min). Simple. Edge case: a client can burst at the window boundary, sending nearly double the limit across two adjacent windows.
- Sliding window log — track the timestamp of each request. Accurate, but memory-heavy.
- Sliding window counter — a hybrid that weights two adjacent fixed windows. Good accuracy at low memory cost.
- Token bucket — tokens refill at a steady rate; each request consumes a token. Allows controlled bursts.
- Leaky bucket — requests are processed at a constant rate. Smooth output; the queue fills under burst.
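To make the token bucket concrete, here is a minimal single-process sketch (class and parameter names are illustrative): tokens refill continuously at `rate` per second up to `capacity`, and a request is allowed only if a whole token is available.

```python
import time

class TokenBucket:
    """Token bucket rate limiter: bursts up to `capacity`, sustained
    throughput limited to `rate` requests per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `rate=5, capacity=10`, a fresh bucket absorbs a burst of 10 requests, then throttles to 5 req/s — exactly the "allows bursts" property called out above.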
Where to enforce limits
Rate limiting can live at different layers.
- API gateway — best place for global limits, before any service logic runs
- Application layer — per-user or per-endpoint logic; needs shared state
- Redis — store counters with TTLs for distributed enforcement across servers
- CDN / edge — block abusive IPs before traffic hits your infrastructure
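A distributed fixed-window counter is typically a Redis `INCR` plus an `EXPIRE` matching the window. The sketch below uses an in-memory dict as a stand-in for Redis so the logic is self-contained; in production every API server would hit the same Redis keys instead.

```python
import time

class FixedWindowLimiter:
    """Fixed-window counter keyed per client. The dict mimics Redis:
    incrementing a key is INCR, and keys scoped to a window ID play
    the role of TTL-expiring counters."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters: dict[tuple[str, int], int] = {}

    def allow(self, client_id: str) -> bool:
        # All servers computing the same window ID see the same key.
        window_id = int(time.time()) // self.window
        key = (client_id, window_id)
        self.counters[key] = self.counters.get(key, 0) + 1  # INCR equivalent
        return self.counters[key] <= self.limit
```

Because the counter lives in shared storage keyed by client and window, any number of stateless API servers enforce one consistent limit.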
What to rate limit by
The granularity of your limit affects fairness and complexity.
- IP address — simplest, but shared IPs (NAT, offices) unfairly penalize many users at once
- User ID / API key — fairer, but requires authentication
- Endpoint-specific — different limits for cheap vs. expensive operations
- Tenant / organization — useful for B2B products with usage tiers
Interview tips
- Rate limiting comes up in any high-traffic or public API design
- Name the algorithm and justify it — don't just say "I'd rate limit"
- Address distributed systems: how do you share counters across servers?
- Mention graceful degradation: queue instead of reject where possible
Follow-up questions to expect
- How do you enforce rate limits consistently across 50 API servers?
- What do you return to the client when they're rate limited?
- How would you design different rate limits for free vs. paid users?
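For the second question, the conventional answer is HTTP 429 with a `Retry-After` header (standardized in RFC 9110). The sketch below builds such a response framework-agnostically; the `X-RateLimit-*` headers are a widely used convention rather than a formal standard.

```python
def rate_limited_response(retry_after_seconds: int, limit: int):
    """Return (status, headers, body) for a throttled client."""
    headers = {
        "Retry-After": str(retry_after_seconds),      # RFC 9110
        "X-RateLimit-Limit": str(limit),              # common convention
        "X-RateLimit-Remaining": "0",
    }
    body = {"error": "rate_limited", "retry_after": retry_after_seconds}
    return 429, headers, body
```

Telling the client exactly when to retry lets well-behaved SDKs back off automatically instead of hammering the API.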
TLDR
- Token bucket for API rate limiting — allows controlled bursts
- Store counters in Redis with TTLs for distributed enforcement
- Limit by user ID or API key, not just IP
- Return 429 Too Many Requests with a Retry-After header
- Rate limit at the API gateway, before requests reach your services