Strengths
- Strong Start on Requirements: Quickly and clearly listed all core functional and non-functional requirements, with helpful numerical SLOs (e.g., covering latency, durability, and ordering).
- Great Time Management: Completed requirements in ~6 minutes and moved into high-level design by the 10-minute mark. Excellent pacing.
- Communication Protocol Justification: Solid comparison between WebSocket and alternatives (e.g., long polling), clearly explaining why WebSocket fits the group chat use case.
- Entity Modeling: Defined core entities early and clearly, then expanded the schemas for media handling and delivery logic as the discussion deepened.
- API Mapping: Neatly tied each FR to a corresponding API signature — well-organized and practical thinking.
- End-to-End CRUD Design: Walked through logical building blocks and used self-checks to revisit how each FR is satisfied — proactive and thoughtful.
- Upload Handling: Included S3 pre-signed URL pattern for media upload and mapping to media_id — good depth.
- Online Status Handling: Proposed heartbeat mechanism and device_session tracking, showing understanding of real-time presence.
- Offline Support: Added in-app notification service and inbox table for unsent messages — complete coverage of user delivery states.
- Logical Deep Dive Structure: Gradual, “shallow-to-deep” HLD → bottlenecks → design improvements flow worked really well.
- Scalability Strategies:
  - Used Kafka partitioning for message ordering.
  - Suggested Redis Pub/Sub with <chat_id, user_ids[]> mappings and TCP-based routing for message delivery.
  - Discussed hot channel detection, batching writes, cache usage, and sharded writes by chat_id.
- Delivery Semantics: Handled retries properly, using dead-letter queues to separate retriable from non-retriable cases.
- Storage Estimation: Walked through rough calculations to justify storage, read/write throughput, and horizontal scaling.
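
For reference, the partitioning point above (keying by chat_id so that each chat's messages stay in order) can be sketched roughly as follows. This is a minimal illustration, not what was drawn in the session: the partition count, the `partition_for` and `route` helpers, and the use of CRC32 as the hash are all assumptions. Kafka's own default partitioner uses a different hash, but the idea is the same: a stable hash of the key always maps one chat to one partition.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 8  # assumed value, for illustration only


def partition_for(chat_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a chat_id to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(chat_id.encode("utf-8")) % num_partitions


def route(messages):
    """Append each (chat_id, seq, body) tuple to its partition in arrival order."""
    partitions = defaultdict(list)
    for chat_id, seq, body in messages:
        partitions[partition_for(chat_id)].append((chat_id, seq, body))
    return partitions
```

Because every message for a given chat_id lands in the same partition, a single consumer of that partition sees the chat's messages in the order they were produced. Ordering across different chats is not guaranteed, which is usually acceptable for group chat.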
Suggestions for Improvement
- Entities Expansion: It would help to include channel_participants and device_sessions tables early in your entity modeling stage — they’re essential at scale.
- Mapping Correction: Be consistent in terminology: the correct mapping is user_id → websocket_id, not user_id → websocket_session_id.
- Inbox Design for Scale: When you introduce the inbox design for offline storage, proactively consider message duplication for large group chats (i.e., fan-out inbox vs. centralized log).
- Diagramming Opportunity: While the flow was clear verbally, including a sketch (even a rough one) during the explanation would further clarify the layered architecture (e.g., Kafka, WebSocket servers, Redis Pub/Sub, DB writes).
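
To make the inbox trade-off above concrete, here is a small sketch comparing the two options. It is a simplified in-memory model under stated assumptions: the class names `FanOutInbox` and `CentralLog`, their methods, and the cursor layout are hypothetical and stand in for whatever tables the real design would use.

```python
from collections import defaultdict


class FanOutInbox:
    """Write-time fan-out: one stored copy per offline recipient."""

    def __init__(self):
        self.inboxes = defaultdict(list)  # user_id -> pending messages

    def deliver(self, message, offline_user_ids):
        for user_id in offline_user_ids:
            self.inboxes[user_id].append(message)

    def stored_copies(self):
        return sum(len(msgs) for msgs in self.inboxes.values())


class CentralLog:
    """Read-time fan-out: one stored copy per chat, plus a per-user cursor."""

    def __init__(self):
        self.logs = defaultdict(list)    # chat_id -> ordered messages
        self.cursors = defaultdict(int)  # (user_id, chat_id) -> next unread index

    def append(self, chat_id, message):
        self.logs[chat_id].append(message)

    def unread(self, user_id, chat_id):
        # Everything at or after the user's cursor has not been read yet.
        return self.logs[chat_id][self.cursors[(user_id, chat_id)]:]

    def ack(self, user_id, chat_id):
        # Advance the cursor to the end of the log once messages are delivered.
        self.cursors[(user_id, chat_id)] = len(self.logs[chat_id])

    def stored_copies(self):
        return sum(len(msgs) for msgs in self.logs.values())
```

With 1,000 offline members, one message costs 1,000 inbox rows in the fan-out model but a single log entry in the centralized model, at the price of a cursor lookup on every read. Calling out that trade-off unprompted is the kind of scale anticipation this suggestion is asking for.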
Overall Assessment
Xi performed very well in this session — great depth, solid fundamentals, excellent structure and proactivity throughout. With some polish on data modeling and even more rigorous anticipation of scale edge cases, Xi is well-prepared for real-world senior-level system design interviews. Keep up the momentum!