Topic 10: System Walkthroughs
Design a URL Shortener
Classic first system design. High read QPS, simple write path.
Design a service like bit.ly. Users submit a long URL and get a short code back. Anyone who visits the short URL is redirected to the original. Sounds simple — the interesting parts are scale and reliability.
Requirements
Clarify scope before designing.
- ›Functional: shorten a URL, redirect short URL to original, optionally track click analytics
- ›Non-functional: 100M URLs created/day, 100:1 read-to-write ratio, URLs live for 10 years
- ›Scale estimate: 10B redirects/day ≈ ~115K QPS reads, ~1.2K QPS writes
- ›Storage: 100M URLs/day × 365 × 10 years × ~500 bytes ≈ ~180 TB total
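The estimates above are easy to sanity-check with quick arithmetic. A minimal sketch using only the figures already stated:

```python
# Back-of-envelope check of the estimates above (pure arithmetic).
SECONDS_PER_DAY = 86_400

writes_per_day = 100_000_000            # 100M new URLs/day
reads_per_day = 10_000_000_000          # 10B redirects/day

write_qps = writes_per_day / SECONDS_PER_DAY   # ≈ 1.2K
read_qps = reads_per_day / SECONDS_PER_DAY     # ≈ 115K

bytes_per_record = 500                  # rough per-mapping footprint
total_bytes = writes_per_day * 365 * 10 * bytes_per_record
total_tb = total_bytes / 1e12           # ≈ 182 TB over 10 years

print(f"{write_qps:,.0f} write QPS, {read_qps:,.0f} read QPS, {total_tb:.0f} TB")
```

Memorizing 86,400 seconds per day (or rounding to 100K for mental math) makes these estimates fast to reproduce at the whiteboard.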
Core design
The key components and decisions.
- ›Short code generation: base62 encode a unique ID (a-z, A-Z, 0-9 = 62 chars, 7 chars = 3.5T combinations)
- ›ID generation: auto-increment DB ID, or a distributed ID generator (Snowflake)
- ›Storage: write URL mapping to PostgreSQL. Cache hot URLs in Redis.
- ›Redirect: 301 (permanent, browser caches — saves QPS) vs 302 (temporary — better for analytics)
- ›Read path: check Redis cache → if miss, hit DB → return redirect
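The base62 step above is a short loop over `divmod`. A minimal sketch (the alphabet ordering is an arbitrary choice; any fixed 62-character ordering works as long as encode and decode agree):

```python
import string

# a-z, A-Z, 0-9 → 62 characters
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits

def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a base62 short code."""
    if n == 0:
        return ALPHABET[0]
    code = []
    while n:
        n, rem = divmod(n, 62)
        code.append(ALPHABET[rem])
    return "".join(reversed(code))  # most significant digit first

def decode_base62(code: str) -> int:
    """Invert encode_base62: map a short code back to its integer ID."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Because the code is just the ID in another base, decoding on the read path is trivial and no separate lookup-by-code index is strictly required if IDs are the primary key.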
Scaling considerations
At 115K read QPS, the bottleneck is clear.
- ›Cache aggressively — 80% of redirects hit the same 20% of URLs
- ›Add read replicas for the database
- ›CDN for redirect responses if using 301 caching
- ›Shard by short code hash if write volume ever becomes a problem
Tradeoffs
Decisions worth discussing with the interviewer.
- ›301 vs 302 redirect: 301 saves server load; 302 enables per-click analytics
- ›Custom aliases: allow users to choose short code? Need uniqueness enforcement.
- ›Expiration: TTL on URLs adds complexity but controls storage growth
- ›Analytics: separate analytics write path from redirect path to avoid latency coupling
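Decoupling analytics usually means the redirect handler only enqueues an event and returns. A sketch using an in-process queue and worker thread (in production the queue would be Kafka or similar, and `click_counts` a real analytics store; both names here are illustrative):

```python
import queue
import threading
import time

# Fire-and-forget event queue: the redirect path never waits on analytics.
click_events: queue.Queue = queue.Queue()
click_counts: dict[str, int] = {}

def handle_redirect(short_code: str) -> None:
    """Redirect hot path: enqueue the click and return immediately."""
    click_events.put((short_code, time.time()))
    # ... issue the 302 redirect here, without touching analytics storage ...

def analytics_worker() -> None:
    """Background consumer: aggregates clicks off the hot path."""
    while True:
        code, ts = click_events.get()
        click_counts[code] = click_counts.get(code, 0) + 1
        click_events.task_done()

threading.Thread(target=analytics_worker, daemon=True).start()
```

The key property is that a slow or failing analytics pipeline degrades click counts, not redirect latency.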
Interview tips
- ✓Always estimate QPS before proposing architecture
- ✓Explain the 301 vs 302 tradeoff — it shows product thinking
- ✓Mention ID generation strategy: DB auto-increment vs Snowflake
- ✓Address hash collisions if you use hashing instead of encoding
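If you hash the URL instead of encoding an ID, collisions must be handled explicitly. A sketch that retries with a salt (a dict stands in for the database's unique index on the short-code column; note that truncated `hexdigest` gives base16 codes, not base62):

```python
import hashlib

# Stand-in for a table with a UNIQUE constraint on the short-code column.
TABLE: dict[str, str] = {}

def shorten_by_hash(long_url: str) -> str:
    """Derive a 7-char code from a hash; salt and retry on collision."""
    salt = 0
    while True:
        digest = hashlib.md5(f"{long_url}:{salt}".encode()).hexdigest()
        code = digest[:7]
        existing = TABLE.get(code)
        if existing is None:
            TABLE[code] = long_url   # unique insert succeeded
            return code
        if existing == long_url:
            return code              # same URL already shortened: reuse code
        salt += 1                    # collision with a different URL: retry
```

Compared with ID encoding, hashing makes codes deterministic per URL (handy for dedup) but trades away the collision-free guarantee, which is exactly the tradeoff worth naming aloud.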
Follow-up questions to expect
- ?How do you prevent someone from guessing other users' short URLs?
- ?How would you add real-time click analytics?
- ?How do you handle URL expiration at scale?
TLDR
- ›Short code = base62-encoded unique ID
- ›Read-heavy: cache aggressively in Redis with LRU
- ›301 saves server load; 302 enables analytics — pick based on requirement
- ›Separate the analytics write path from the critical redirect path
- ›At scale: read replicas + cache covers almost all read load