Tony has provided the following feedback on your session:
Rafael demonstrated strong architectural thinking and communication throughout this complex system design mock focused on Slack-like messaging at OpenAI scale. He showed an ability to balance real-time messaging guarantees with practical tradeoffs in availability, latency, and complexity. The conversation was structured, with strong pacing and technical depth.
Areas of strength included:
- Solid assumptions and scalability awareness, including both peak and average load estimations.
- Fast and structured breakdown of functional requirements, plus thoughtful CAP theorem tradeoff discussions (e.g., high availability at system level vs. strong consistency in message delivery).
- Deep familiarity with websocket vs. long polling, and good initiative in focusing the discussion on online users first.
- Clear high-level design with workflow and entity expansion — modeling thread as parent of message, supporting both 1-1 DMs and channels.
- Strong understanding of entity relationships (user, message, channel, channel_participants, etc.) and how those power the backend.
- Great breakdown of websocket message events and payload structures.
- Excellent treatment of media message handling — pre-signed S3 URLs, media keys, upload parallelism, and why URLs don’t need to be tracked.
- Smart inclusion of CDNs and recipient-side media fetching using keys.
- Consideration for multi-device session management, local caching, and handling message history fetch with pagination.
- Strong modeling of offline inbox flows and flush-on-reconnect behavior.
- Good insight into hot channels (celebrity-style) — switching to pull rather than push.
- Nuanced discussion of message ordering, including lock-based service-level monotonic timestamps vs. Kafka queue partitioning by chat_id.
- Concrete breakdown of websocket connection flow and Redis TTL caching, including:
  - User connects to ws server
  - WS fetches chat_ids
  - WS caches mapping with TTL lease
  - WS subscribes to topic
  - WS unsubscribes on TTL expiry
- Device+user-level mapping for deduplication
- Good idempotency design using dedupe keys
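For reference, the connection flow and dedupe-key points above could be sketched roughly as follows. This is a minimal in-memory illustration, not what was drawn in the session: `LeaseRegistry`, its method names, and the injectable `clock` are hypothetical, and a plain dict stands in for Redis and the pub/sub topic layer.

```python
import time
from typing import Callable, Dict, List, Set, Tuple

DeviceKey = Tuple[str, str]  # (user_id, device_id)


class LeaseRegistry:
    """In-memory stand-in for the Redis mapping of device -> chat_ids with a TTL lease."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # (user_id, device_id) -> (subscribed chat_ids, lease expiry time)
        self.leases: Dict[DeviceKey, Tuple[Set[str], float]] = {}
        # Dedupe keys already delivered per device (unbounded in this sketch).
        self.seen: Dict[DeviceKey, Set[str]] = {}

    def connect(self, user_id: str, device_id: str, chat_ids: List[str]) -> None:
        """Cache the device's chat_ids under a fresh lease (connect, fetch, cache, subscribe)."""
        key = (user_id, device_id)
        self.leases[key] = (set(chat_ids), self.clock() + self.ttl)
        self.seen.setdefault(key, set())

    def heartbeat(self, user_id: str, device_id: str) -> bool:
        """Renew the lease if it is still live; return False if it already expired."""
        self.expire()
        key = (user_id, device_id)
        if key not in self.leases:
            return False
        chat_ids, _ = self.leases[key]
        self.leases[key] = (chat_ids, self.clock() + self.ttl)
        return True

    def expire(self) -> List[DeviceKey]:
        """Drop leases whose TTL has passed (the unsubscribe step)."""
        now = self.clock()
        expired = [k for k, (_, exp) in self.leases.items() if exp <= now]
        for k in expired:
            del self.leases[k]
        return sorted(expired)

    def deliver(self, chat_id: str, dedupe_key: str) -> List[DeviceKey]:
        """Return devices that should get the message, skipping any that already saw dedupe_key."""
        self.expire()
        targets = []
        for key, (chat_ids, _) in self.leases.items():
            if chat_id in chat_ids and dedupe_key not in self.seen[key]:
                self.seen[key].add(dedupe_key)
                targets.append(key)
        return sorted(targets)
```

Keying both the lease and the seen-set on (user, device) is what lets a retried send reach each device at most once while still fanning out to every device the user has online.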
Areas for improvement:
- Start with clearer assumptions, especially around initial channel creation and setup, before jumping into message sending logic.
- Reorder parts of the discussion: cover online/offline scenarios first, then deletion logic, to keep the narrative clean.
- For message history fetch, discuss both volume and latency challenges more deeply — e.g., timestamp and message_id-based indexing in metadata DB to preserve order.
- For celebrity channels, consider inbox message expiry after 30 days, and fetching from DB only on user-triggered access (with optional caching layer).
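To make the history-fetch suggestion concrete, one possible shape is keyset pagination over a composite (timestamp, message_id) index. The sketch below uses SQLite purely for illustration; the `message` table, its column names, and the `fetch_page` helper are assumptions, not a schema discussed in the session.

```python
import sqlite3
from typing import List, Optional, Tuple

Cursor = Tuple[int, int]  # (created_at, message_id) of the last row already returned


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The composite primary key doubles as the index that preserves order per chat.
    conn.execute(
        """CREATE TABLE message (
               chat_id TEXT NOT NULL,
               created_at INTEGER NOT NULL,
               message_id INTEGER NOT NULL,
               body TEXT NOT NULL,
               PRIMARY KEY (chat_id, created_at, message_id))"""
    )
    return conn


def fetch_page(
    conn: sqlite3.Connection, chat_id: str, limit: int, before: Optional[Cursor] = None
) -> Tuple[List[tuple], Optional[Cursor]]:
    """Return up to `limit` messages, newest first, strictly older than `before`."""
    sql = "SELECT created_at, message_id, body FROM message WHERE chat_id = ?"
    params: list = [chat_id]
    if before is not None:
        # message_id breaks ties between messages that share a timestamp.
        sql += " AND (created_at < ? OR (created_at = ? AND message_id < ?))"
        params += [before[0], before[0], before[1]]
    sql += " ORDER BY created_at DESC, message_id DESC LIMIT ?"
    params.append(limit)
    rows = conn.execute(sql, params).fetchall()
    # A short page means history is exhausted; otherwise hand back a cursor.
    next_cursor = (rows[-1][0], rows[-1][1]) if len(rows) == limit else None
    return rows, next_cursor
```

Compared with OFFSET-based paging, the cursor keeps each fetch an index range scan regardless of how deep into history the user scrolls, which speaks to both the volume and the latency concern.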