Topic 09: Building Blocks

Rate Limiting

Protect your system from abuse, overload, and runaway clients.

Rate limiting caps how many requests a client can make in a time window. It protects your infrastructure from abuse and ensures fair resource distribution across all users.

Rate limiting algorithms

Five common approaches with different tradeoffs.

›Fixed window — count requests per window (100 req/min). Simple. Edge case: a burst straddling a window boundary can let through up to twice the limit.
›Sliding window log — track timestamps of each request. Accurate, memory-heavy.
›Sliding window counter — weights the previous fixed window's count by its overlap with the sliding window and adds the current window's count. Good accuracy at low memory cost.
›Token bucket — tokens refill at a rate; each request consumes a token. Allows bursts.
›Leaky bucket — requests are queued and processed at a constant rate. Smooth output; under a burst the queue fills and further requests are dropped.
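
As an illustration of the token bucket described above, here is a minimal single-process sketch. The class name, the `cost` parameter, and the injectable `clock` are assumptions made for this example, not part of any particular library.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so the behaviour can be tested
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Add the tokens earned since the last call, never exceeding capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `capacity=3` and `rate=1.0`, a client can send three requests back to back, then roughly one per second after that. The capacity sets the burst size and the rate sets the sustained limit.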

Where to enforce limits

Rate limiting can live at different layers.

›API Gateway — best place for global limits, before any service logic runs
›Application layer — per-user or per-endpoint logic, needs shared state
›Redis — store counters in Redis with TTL for distributed enforcement
›CDN / Edge — block abusive IPs before traffic hits your infrastructure
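
One way the Redis option could look is a fixed-window counter built on an atomic increment plus a TTL. The sketch below assumes a `store` object exposing `incr(key)` and `expire(key, seconds)`, which are the method names redis-py's client uses. The key layout and the `FakeStore` stand-in are hypothetical and exist only so the example runs without a server.

```python
import time


def allow_request(store, client_id, limit, window_s, now=None):
    """Fixed-window counter; `store` needs incr(key) and expire(key, seconds)."""
    now = time.time() if now is None else now
    window = int(now // window_s)
    key = f"ratelimit:{client_id}:{window}"  # hypothetical key layout
    count = store.incr(key)  # atomic in Redis; creates the key with value 1
    if count == 1:
        # The TTL is cleanup only: the window number in the key already
        # separates one window from the next.
        store.expire(key, window_s * 2)
    return count <= limit


class FakeStore:
    """In-memory stand-in for a Redis client, for illustration only."""

    def __init__(self):
        self.counts = {}
        self.ttls = {}

    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def expire(self, key, seconds):
        self.ttls[key] = seconds
        return True
```

Because every API server increments the same key, they all see the same count. If a server crashes between `incr` and `expire`, the key is left without a TTL; wrapping both calls in a Lua script or a transaction is a common way to close that gap.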

What to rate limit by

The granularity of your limit affects fairness and complexity.

›IP address — simplest, but shared IPs (NAT, offices) unfairly affect many users
›User ID / API key — fairer, requires authentication
›Endpoint-specific — different limits for cheap vs expensive operations
›Tenant / organization — useful for B2B products with usage tiers
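
These granularities can be combined when deriving the counter key and the limit. The sketch below is one possible arrangement. The `Request` fields, the plan names, and every number in `LIMITS` are made up for illustration, and the tenant-then-key-then-IP preference is an assumption rather than a rule.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical requests-per-minute limits, keyed by (plan, endpoint class).
LIMITS = {
    ("free", "cheap"): 60,
    ("free", "expensive"): 5,
    ("paid", "cheap"): 600,
    ("paid", "expensive"): 60,
}


@dataclass
class Request:
    ip: str
    endpoint_class: str  # "cheap" or "expensive"
    api_key: Optional[str] = None
    tenant_id: Optional[str] = None
    plan: str = "free"


def limit_key(req):
    """Prefer the tenant, then the API key, and fall back to the IP address."""
    if req.tenant_id:
        identity = f"tenant:{req.tenant_id}"
    elif req.api_key:
        identity = f"key:{req.api_key}"
    else:
        identity = f"ip:{req.ip}"
    # Including the endpoint class gives each class its own counter.
    return f"{identity}:{req.endpoint_class}"


def limit_for(req):
    """Look up the limit for this request's plan and endpoint class."""
    return LIMITS[(req.plan, req.endpoint_class)]
```

An unauthenticated request falls back to a per-IP key such as `ip:1.2.3.4:cheap`, while an authenticated tenant shares one counter across all of its users.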

Interview tips

✓Rate limiting comes up in any high-traffic or public API design
✓Name the algorithm and justify it — don't just say 'I'd rate limit'
✓Address distributed systems: how do you share counters across servers?
✓Mention graceful degradation: queue instead of reject where possible

Follow-up questions to expect

?How do you enforce rate limits consistently across 50 API servers?
?What do you return to the client when they're rate limited?
?How would you design different rate limits for free vs paid users?

TLDR

›Token bucket for API rate limiting — allows controlled bursts
›Store counters in Redis with TTL for distributed enforcement
›Limit by user ID or API key, not just IP
›Return 429 Too Many Requests with a Retry-After header
›Rate limit at the API gateway before requests reach your services
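
A framework-neutral sketch of the 429 response mentioned above. The function name and the JSON body shape are assumptions for this example; only the status code and the `Retry-After` header come from HTTP itself.

```python
import json
import math


def too_many_requests(retry_after_s):
    """Build (status, headers, body) for a rate-limited request."""
    # The delay form of Retry-After is a whole number of seconds, so round up.
    seconds = max(1, math.ceil(retry_after_s))
    headers = {
        "Retry-After": str(seconds),
        "Content-Type": "application/json",
    }
    # Hypothetical body shape; clients should rely on the header.
    body = json.dumps({"error": "rate_limited", "retry_after_seconds": seconds})
    return 429, headers, body
```

Rounding up means a well-behaved client never retries before the limit has actually reset.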
