Topic 12 · System Walkthroughs
Design a Chat App
Real-time messaging, message history, and delivery guarantees.
Design a system like WhatsApp or Slack. Users send messages to individuals or groups, messages are delivered in real time, and history is persisted. The core challenge is low-latency delivery at scale: routing each message to whichever server holds the recipient's connection.
Requirements
Scoping the problem.
- ›1-to-1 and group messaging (up to 100 members per group)
- ›Messages delivered in real time when the recipient is online
- ›Messages persisted and delivered on reconnect when the recipient was offline
- ›50M DAU × 40 messages/day per user = 2B messages/day → ~23K messages/sec on average (peaks run higher)
- ›Message storage: 2B messages/day × ~100 bytes × 365 ≈ 73 TB/year of raw payload, before metadata and replication
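The estimate above can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch: the 100-byte average message size comes from the requirements, and the constant names are chosen here for illustration. Unrounded, the figures come out near 23,148 messages/sec and 73 TB/year.

```python
# Back-of-envelope capacity estimate using the figures from the requirements.
DAU = 50_000_000
MESSAGES_PER_USER_PER_DAY = 40
AVG_MESSAGE_BYTES = 100      # payload only; metadata and replication are extra
SECONDS_PER_DAY = 86_400

messages_per_day = DAU * MESSAGES_PER_USER_PER_DAY       # 2 billion
messages_per_sec = messages_per_day / SECONDS_PER_DAY    # average, not peak
bytes_per_year = messages_per_day * AVG_MESSAGE_BYTES * 365
tb_per_year = bytes_per_year / 1e12                      # decimal terabytes

print(f"{messages_per_sec:,.0f} messages/sec")  # 23,148 messages/sec
print(f"{tb_per_year:.0f} TB/year")             # 73 TB/year
```

In an interview it is usually enough to state the rounded numbers and note that peak traffic and replication multiply them.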
Real-time delivery
How messages get to recipients immediately.
- ›WebSocket — persistent bidirectional connection between client and chat server
- ›Each user maintains a WebSocket connection to a chat server
- ›Connection service maps user ID → which chat server holds their WebSocket
- ›Message arrives → sender's chat server asks the connection service where the recipient is connected → that server pushes the message over the open WebSocket
- ›Long polling as fallback for environments that don't support WebSockets
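The routing step described above can be sketched as an in-memory lookup. This is a minimal illustration, not a production design: `ChatServer`, `ConnectionRegistry`, and `route` are hypothetical names, the "socket" is simulated with a list of pushed frames, and a real connection service would more likely live in a shared store with entries that expire when a connection drops.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChatServer:
    """Stand-in for one chat server holding live client connections."""
    name: str
    # user_id -> frames pushed over that user's (simulated) WebSocket
    sockets: dict = field(default_factory=dict)

    def push(self, user_id: str, payload: str) -> None:
        self.sockets[user_id].append(payload)


class ConnectionRegistry:
    """Maps user ID -> the chat server currently holding that user's connection."""

    def __init__(self) -> None:
        self._by_user: dict = {}

    def connect(self, user_id: str, server: ChatServer) -> None:
        server.sockets[user_id] = []
        self._by_user[user_id] = server

    def disconnect(self, user_id: str) -> None:
        server = self._by_user.pop(user_id, None)
        if server is not None:
            server.sockets.pop(user_id, None)

    def lookup(self, user_id: str) -> Optional[ChatServer]:
        return self._by_user.get(user_id)


def route(registry: ConnectionRegistry, recipient_id: str, payload: str) -> bool:
    """Push to the recipient if online; return False so the caller can
    fall back to storing the message for offline delivery."""
    server = registry.lookup(recipient_id)
    if server is None:
        return False
    server.push(recipient_id, payload)
    return True
```

A caller would treat a `False` result as "recipient offline" and rely on the persisted copy of the message instead.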
Message storage and history
Persisting messages for retrieval.
- ›Cassandra for message storage — write-heavy, append-only, partition by conversation ID
- ›Each message gets a unique, time-sortable ID (Snowflake-style) for ordering: strictly increasing per generator, only roughly ordered across generators
- ›Offline delivery: message stored → when user reconnects, fetch unread messages
- ›Message sync: client tracks last seen message ID, fetches delta on reconnect
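The ID and sync bullets above can be illustrated together. The sketch below assumes the commonly described Snowflake layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence) and a custom epoch picked arbitrarily here. `MessageStore` is a hypothetical in-memory stand-in for a table partitioned by conversation ID and sorted by message ID; it is not a Cassandra client.

```python
import threading
import time
from bisect import bisect_right, insort

EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)


class SnowflakeLikeIds:
    """Generates 64-bit IDs: ms timestamp | 10-bit worker ID | 12-bit sequence."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self._worker_id = worker_id
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            # Never move backwards, even if the wall clock steps back.
            now = max(int(time.time() * 1000), self._last_ms)
            if now == self._last_ms:
                self._seq = (self._seq + 1) & 0xFFF
                if self._seq == 0:
                    now += 1  # sequence exhausted: borrow the next millisecond
            else:
                self._seq = 0
            self._last_ms = now
            return ((now - EPOCH_MS) << 22) | (self._worker_id << 12) | self._seq


class MessageStore:
    """Per-conversation message log kept sorted by message ID."""

    def __init__(self) -> None:
        self._ids: dict = {}     # conversation_id -> sorted list of message IDs
        self._bodies: dict = {}  # message_id -> body

    def append(self, conversation_id: str, message_id: int, body: str) -> None:
        # IDs from different workers can arrive slightly out of order.
        insort(self._ids.setdefault(conversation_id, []), message_id)
        self._bodies[message_id] = body

    def fetch_since(self, conversation_id: str, last_seen_id: int) -> list:
        """Return (id, body) pairs with id > last_seen_id: the reconnect delta."""
        ids = self._ids.get(conversation_id, [])
        start = bisect_right(ids, last_seen_id)
        return [(i, self._bodies[i]) for i in ids[start:]]
```

On reconnect, a client would send the last message ID it has seen for each conversation and apply whatever `fetch_since` returns. Because IDs from different workers are only roughly ordered, a strict per-conversation order would need something extra, such as a per-conversation sequence number.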
Group messaging
Fan-out to multiple recipients.
- ›Small groups: fan-out on write to all member connections
- ›Large groups (Slack-style channels, beyond the 100-member scope above): publish once to a message queue; each chat server with online members pulls and delivers locally
- ›Read receipts: separate lightweight event, kept off the critical path
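The two fan-out strategies above can be compared in a small simulation. Everything here is hypothetical scaffolding: the `FanOut` class, the 100-member threshold (borrowed from the group-size requirement), and the per-server deques standing in for a real message queue.

```python
from collections import defaultdict, deque

SMALL_GROUP_LIMIT = 100  # assumed threshold, taken from the stated group-size limit


class FanOut:
    def __init__(self, members: dict, server_of: dict) -> None:
        self.members = members      # group_id -> set of user IDs
        self.server_of = server_of  # user_id -> server name (online users only)
        self.inboxes = defaultdict(list)         # user_id -> delivered payloads
        self.server_queues = defaultdict(deque)  # server name -> queued work

    def send(self, group_id: str, sender: str, payload: str) -> str:
        recipients = self.members[group_id] - {sender}
        if len(self.members[group_id]) <= SMALL_GROUP_LIMIT:
            # Fan-out on write: push to every online member immediately.
            for user in recipients:
                if user in self.server_of:
                    self.inboxes[user].append(payload)
            return "write"
        # Large group: enqueue once per server that has online members.
        servers = {self.server_of[u] for u in recipients if u in self.server_of}
        for server in servers:
            self.server_queues[server].append((group_id, sender, payload))
        return "queue"

    def drain(self, server: str) -> int:
        """Run by a chat server: pull queued items and deliver to local members."""
        delivered = 0
        queue = self.server_queues[server]
        while queue:
            group_id, sender, payload = queue.popleft()
            for user in self.members[group_id] - {sender}:
                if self.server_of.get(user) == server:
                    self.inboxes[user].append(payload)
                    delivered += 1
        return delivered
```

The point of the queue path is that the sender's request does one enqueue per server rather than one push per member, so the cost of a send grows with the number of servers involved, not with group size.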
Interview tips
- ✓Default to WebSockets for real-time delivery; periodic HTTP polling as the primary transport is the wrong answer here, though long polling is a reasonable fallback
- ✓Address offline delivery — a common follow-up
- ✓Mention message ordering and deduplication
- ✓Separate the concerns: presence, delivery, storage, and notifications
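For the deduplication tip, one common approach is to have the client attach its own ID to each message so that a retry after a dropped connection can be recognised. The sketch below assumes that approach. `Deduplicator` and its bounded in-memory cache are illustrative only, and a real deployment would need the seen-set shared across chat servers.

```python
from collections import OrderedDict


class Deduplicator:
    """Remembers recently seen (sender, client message ID) pairs, bounded in size."""

    def __init__(self, capacity: int = 10_000) -> None:
        self._seen = OrderedDict()
        self._capacity = capacity

    def first_time(self, sender_id: str, client_msg_id: str) -> bool:
        """Return True if the message is new, False if it is a retry."""
        key = (sender_id, client_msg_id)
        if key in self._seen:
            self._seen.move_to_end(key)  # keep recently retried keys around
            return False
        self._seen[key] = None
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)  # evict the least recently seen key
        return True
```

A server would store and deliver a message only when `first_time` returns `True`, and simply re-acknowledge it otherwise.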
Follow-up questions to expect
- ?How do you handle message ordering across distributed servers?
- ?How do you implement end-to-end encryption?
- ?How do you scale to 1000-person group chats?
TLDR
- ›WebSockets for real-time delivery — persistent bidirectional connection
- ›Cassandra for message storage — write-heavy, append-only access pattern
- ›Connection service maps user ID to chat server for routing
- ›Offline users: store messages, deliver on reconnect
- ›Large groups: queue-based fan-out so one message does not trigger a burst of synchronous pushes to every member