Topic 09: Building Blocks

Rate Limiting

Protect your system from abuse, overload, and runaway clients.

Rate limiting caps how many requests a client can make in a time window. It protects your infrastructure from abuse and ensures fair resource distribution across all users.

Rate limiting algorithms

Five common approaches, each with different tradeoffs.

  • Fixed window — count requests per fixed window (e.g. 100 req/min). Simple. Edge case: a client can send up to twice the limit by bursting at the end of one window and the start of the next.
  • Sliding window log — track timestamps of each request. Accurate, memory-heavy.
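  • Sliding window counter — estimates the count over a sliding window from the current and previous fixed-window counters, weighting the previous one by how much of it still overlaps. Good accuracy at low memory cost.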
  • Token bucket — tokens refill at a rate; each request consumes a token. Allows bursts.
  • Leaky bucket — requests drain from a queue at a constant rate. Smooth output; under a burst the queue fills and excess requests are dropped.
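The following is a minimal single-process sketch of the token bucket from the list above. The `TokenBucket` class, its `capacity` and `refill_rate` parameters, and the injectable `clock` (used only so the demo is deterministic) are illustrative assumptions, not any particular library's API. A multi-server deployment would keep this state in a shared store rather than in memory.

```python
import time
from typing import Callable


class TokenBucket:
    """Single-process token bucket (not thread-safe; illustrative only).

    Holds up to `capacity` tokens and refills at `refill_rate` tokens per
    second. Each request spends one token, so a full bucket allows a burst
    of `capacity` requests, after which throughput settles at `refill_rate`.
    """

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = float(capacity)   # start full so an idle client can burst
        self.last_refill = clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.last_refill
        # Credit tokens for the time elapsed, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens and return True, or return False if too few remain."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class FakeClock:
    """Manually advanced clock so the demo below is deterministic."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


clock = FakeClock()
bucket = TokenBucket(capacity=5, refill_rate=1.0, clock=clock)
print([bucket.allow() for _ in range(6)])  # → [True, True, True, True, True, False]
clock.t = 2.0                               # two seconds pass: two tokens refilled
print([bucket.allow() for _ in range(3)])  # → [True, True, False]
```

The bucket size controls how large a burst is tolerated. The refill rate controls the sustained throughput, which is why the two are usually tuned separately.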

Where to enforce limits

Rate limiting can live at different layers.

  • API Gateway — best place for global limits, before any service logic runs
  • Application layer — per-user or per-endpoint logic, needs shared state
  • Redis — store counters in Redis with TTL for distributed enforcement
  • CDN / Edge — block abusive IPs before traffic hits your infrastructure
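Here is a sketch of the Redis-backed option, assuming a fixed-window counter. Every server derives the same key for a given client and window, so they all increment one shared counter. The key naming scheme and the `allow_request` function are hypothetical. The code only calls `incr(key)` and `expire(key, seconds)` positionally, and a redis-py `redis.Redis` client provides both. The `InMemoryStore` stand-in exists only so the example runs without a Redis server.

```python
import time
from typing import Dict, Optional, Protocol


class CounterStore(Protocol):
    """The two commands this sketch needs (a redis-py client has both)."""

    def incr(self, key: str) -> int: ...
    def expire(self, key: str, seconds: int) -> bool: ...


def allow_request(store: CounterStore, client_id: str, limit: int,
                  window_seconds: int, now: Optional[float] = None) -> bool:
    """Fixed-window check shared by every server that talks to the same store."""
    now = time.time() if now is None else now
    window = int(now // window_seconds)          # every server computes the same window id
    key = f"ratelimit:{client_id}:{window}"      # hypothetical key naming scheme
    count = store.incr(key)                      # INCR is atomic in Redis
    if count == 1:
        # First request in this window: let the store delete the key after the window.
        store.expire(key, window_seconds)
    return count <= limit


class InMemoryStore:
    """Single-process stand-in for Redis, for local testing only (TTL not simulated)."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def incr(self, key: str) -> int:
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def expire(self, key: str, seconds: int) -> bool:
        return key in self.counts


store = InMemoryStore()
print([allow_request(store, "user-42", limit=3, window_seconds=60, now=120.0) for _ in range(4)])
# → [True, True, True, False]
print(allow_request(store, "user-42", limit=3, window_seconds=60, now=180.0))  # → True (new window)
```

There is one caveat. INCR and EXPIRE are separate commands, so a crash between them leaves a key with no TTL. Sending both in a MULTI/EXEC transaction (redis-py's `pipeline()`) or in a Lua script closes that gap.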

What to rate limit by

The granularity of your limit affects fairness and complexity.

  • IP address — simplest, but shared IPs (NAT, offices) unfairly affect many users
  • User ID / API key — fairer, requires authentication
  • Endpoint-specific — different limits for cheap vs expensive operations
  • Tenant / organization — useful for B2B products with usage tiers
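The granularities above can be combined when building counter keys. This is a sketch under assumptions: the endpoint names, the numbers in `ENDPOINT_LIMITS`, the preference for API key over user ID, and the 10x tenant multiplier are all hypothetical choices made for illustration. A request would be allowed only if every returned key is under its limit.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-endpoint budgets (requests per minute); expensive calls get less.
ENDPOINT_LIMITS: Dict[str, int] = {"GET /search": 100, "POST /export": 5}
DEFAULT_LIMIT = 60


def limit_keys(endpoint: str, ip: str, user_id: Optional[str] = None,
               api_key: Optional[str] = None,
               tenant_id: Optional[str] = None) -> List[Tuple[str, int]]:
    """Return (counter key, limit) pairs that all must pass for the request to proceed."""
    limit = ENDPOINT_LIMITS.get(endpoint, DEFAULT_LIMIT)

    # Prefer an authenticated identity; fall back to IP only for anonymous
    # traffic, since many users can share one IP behind NAT or an office proxy.
    if api_key:
        identity = f"key:{api_key}"
    elif user_id:
        identity = f"user:{user_id}"
    else:
        identity = f"ip:{ip}"

    keys = [(f"{identity}:{endpoint}", limit)]
    if tenant_id:
        # Org-wide budget shared by all of a tenant's users (hypothetical 10x multiplier).
        keys.append((f"tenant:{tenant_id}:{endpoint}", limit * 10))
    return keys


print(limit_keys("POST /export", ip="203.0.113.7", user_id="u1", tenant_id="acme"))
# → [('user:u1:POST /export', 5), ('tenant:acme:POST /export', 50)]
print(limit_keys("GET /health", ip="203.0.113.7"))
# → [('ip:203.0.113.7:GET /health', 60)]
```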

Interview tips

  • Rate limiting comes up in any high-traffic or public API design
  • Name the algorithm and justify it — don't just say 'I'd rate limit'
  • Address distributed systems: how do you share counters across servers?
  • Mention graceful degradation: queue instead of reject where possible
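"Queue instead of reject" can be sketched as a leaky-bucket-style scheduler. Requests drain at a fixed rate, and a burst is absorbed by delaying requests. Only requests that would wait longer than a bound are rejected. The class name and its `rate` and `max_delay` parameters are hypothetical. In an async server, the caller would sleep for the returned delay before handling the request.

```python
from typing import Optional


class LeakyBucketScheduler:
    """Leaky-bucket-style admission: delay bursts instead of rejecting them.

    Requests leave the "bucket" at a constant `rate` per second. A new request
    is scheduled at the next free slot; if that means waiting longer than
    `max_delay` seconds (the queue is effectively full), it is rejected.
    """

    def __init__(self, rate: float, max_delay: float) -> None:
        self.interval = 1.0 / rate   # seconds between processed requests
        self.max_delay = max_delay
        self.next_free = 0.0         # earliest time the next request may run

    def schedule(self, now: float) -> Optional[float]:
        """Return seconds to wait before handling the request, or None to reject it."""
        start = max(now, self.next_free)
        delay = start - now
        if delay > self.max_delay:
            return None              # too deep in the queue: reject (e.g. HTTP 429)
        self.next_free = start + self.interval
        return delay


scheduler = LeakyBucketScheduler(rate=2.0, max_delay=1.0)
print([scheduler.schedule(now=0.0) for _ in range(4)])  # → [0.0, 0.5, 1.0, None]
```

The effective queue depth is `rate * max_delay` requests. Raising `max_delay` trades latency for fewer rejections.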

Follow-up questions to expect

  • How do you enforce rate limits consistently across 50 API servers?
  • What do you return to the client when they're rate limited?
  • How would you design different rate limits for free vs paid users?
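The last two questions can be answered together in one sketch. It looks up a per-plan limit, and once that limit is exceeded it returns HTTP 429 Too Many Requests with a `Retry-After` header. The header value is the number of whole seconds until the window resets. The plan names, the numbers in `PLAN_LIMITS`, the fixed 60-second window, and the `TieredLimiter` class are illustrative assumptions. State here is in-memory and old windows are never evicted, so a real deployment would use a shared store with TTLs.

```python
import math
from typing import Dict, Tuple

# Hypothetical plan limits: requests allowed per fixed 60-second window.
PLAN_LIMITS: Dict[str, int] = {"free": 60, "paid": 600}
WINDOW_SECONDS = 60


class TieredLimiter:
    """In-memory fixed-window limiter whose limit depends on the caller's plan."""

    def __init__(self) -> None:
        self.counts: Dict[Tuple[str, int], int] = {}

    def check(self, user_id: str, plan: str, now: float) -> Tuple[int, Dict[str, str]]:
        """Return (HTTP status, response headers) for one request at time `now`."""
        limit = PLAN_LIMITS[plan]
        window = int(now // WINDOW_SECONDS)
        key = (user_id, window)
        count = self.counts.get(key, 0)
        if count >= limit:
            # Whole seconds until the current window ends, rounded up.
            retry_after = math.ceil((window + 1) * WINDOW_SECONDS - now)
            return 429, {"Retry-After": str(retry_after)}
        self.counts[key] = count + 1
        return 200, {}


limiter = TieredLimiter()
for _ in range(60):
    limiter.check("alice", "free", now=30.0)
print(limiter.check("alice", "free", now=30.0))  # → (429, {'Retry-After': '30'})
print(limiter.check("bob", "paid", now=30.0))    # → (200, {})
```

Keeping limits in a plan table means that upgrading a customer changes only the lookup, not the limiter logic.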

TLDR
  • Token bucket for API rate limiting — allows controlled bursts
  • Store counters in Redis with TTL for distributed enforcement
  • Limit by user ID or API key, not just IP
  • Return 429 Too Many Requests with a Retry-After header
  • Rate limit at the API gateway before requests reach your services