Design a Slack-like chat system that supports 1:1 and group messaging, threaded conversations, and message deletion.


Coach with Author

Book a 90-minute 1:1 coaching session with the author of this post and video — get tailored feedback, real-world insights, and system design strategy tips.

Let’s sharpen your skills and have some fun doing it!

Designing a Slack-like chat app isn't just about sending messages — it’s about scaling reliably, handling failure, and keeping user experience smooth under load. Before diving in, ask yourself:

  1. How would you ensure strict message ordering in group chats and across multiple clients when messages are sent concurrently from distributed devices?
  2. How would you design real-time notifications for hundreds of millions of users, supporting online (WebSocket) and offline (push) delivery without overwhelming your backend?
  3. How would you allow users to delete messages — and propagate that change to all clients — while supporting audit logs, retention policies, and long chat histories?
  4. How would you scale your infrastructure to handle users frequently going online and offline across data centers — while maintaining low-latency fanout and avoiding duplicate or missed messages?

Upgraded Challenge: How to design a ChatGPT-like ChatApp?

These aren’t toy problems — they’re real challenges from production-scale messaging systems. Think through your solutions, then follow along to see how we solve them, step by step.

Functional Requirements

1. Users can send and receive messages in 1:1 or group chats

This covers both private and group conversations, with messages delivered in order and stored persistently. Group chats can support dynamic membership and maintain message history.

2. Users can send rich media (e.g., images, videos, files) as part of a message

Messages can include text, attachments, or both. The system must support media upload, storage, and preview rendering across devices.

3. Users can receive real-time notifications for new messages

When a user receives a new message—either in a direct chat or group—they should be notified through in-app indicators, push notifications, or email (based on settings and availability).

4. Users can delete their own messages from a chat

A user should be able to remove a message they previously sent. The deletion may be visible (e.g., “This message was deleted”) or fully removed from view, depending on the UX and policy.

Non-Functional Requirements

1. High Scalability

The system should support 1B+ users and 100K+ concurrent messages per second, especially in high-traffic group chats.

Illustration: Imagine a global company’s #all-hands channel with 100k active users — your system must fan out messages quickly without bottlenecks.

When discussing non-functional requirements in interviews, don’t just say "high scalability" — you need to quantify it.

For example, instead of saying: "The system should handle lots of users", you can say:
"The system should support 10M+ users, with 5K concurrent messages per second in large chat rooms."

Adding numeric expectations (like target QPS, latency percentiles, or fan-out scale) shows that you understand real-world constraints.

This is a Staff+ signal — it shows not just that you can design systems, but that you've seen them operate at scale.

2. Low Latency in Chat Delivery

Aim for P95 end-to-end message latency under 200ms for online users.

Illustration: A user sends a message and expects to see it reflected across all devices in near real-time — delays over 500ms degrade the user experience.

Low latency in messaging systems isn’t about benchmarks — it’s about perceived real-time responsiveness. Once message delivery feels delayed, trust and engagement drop fast.

In practice, aiming for P95 latency under 200ms means thinking end-to-end:

  • Client → API Gateway: 30ms
  • Message Broker or Queue Delay: 10–50ms
  • Fan-out to Recipients (DB or WebSocket push): 50–80ms
  • Client render time: 20–40ms
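
To sanity-check that these components fit within the stated target, here is a tiny sketch; the numbers are the illustrative ranges above, not measurements:

# Rough end-to-end latency budget check; the ranges are illustrative assumptions.
budget_ms = {
    "client_to_gateway": (30, 30),
    "broker_or_queue": (10, 50),
    "fanout_to_recipients": (50, 80),
    "client_render": (20, 40),
}

best = sum(lo for lo, _ in budget_ms.values())    # 110 ms
worst = sum(hi for _, hi in budget_ms.values())   # 200 ms
print(f"End-to-end budget: {best}-{worst} ms (target: P95 < 200 ms)")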

What Staff+ candidates do well:

They don’t just say "low latency" — they outline the latency budget, identify hot paths, and explain what tradeoffs they’d make when things spike.

Bonus: If you can say “we’d fall back to degraded fan-out or batch sends at 500ms+ load,” that’s a signal you’ve seen the real fire.

3. CAP-Aware Messaging Guarantees

The system should prefer high availability, but must enforce strong ordering consistency within each chat room.

Illustration: Even if one server fails, the user should still be able to chat — and messages in a group must never arrive out of order.

CAP Theorem states that in any distributed system, you can only guarantee two out of the following three properties at the same time:

  1. Consistency: All clients see the same data, even when there are concurrent updates.
  2. Availability: Every request receives a response, even when some servers fail.
  3. Partition Tolerance: The system continues to operate even if network delays or failures split it into disconnected parts.

Since network partitions are inevitable in real-world systems, the practical choice is between consistency and availability.

Many materials simplify this by choosing availability over consistency. For example, they might accept out-of-order messages to keep the chat responsive. But in messaging systems, users often expect messages to appear in order — especially in group chats or support conversations.

Our recommendation for Staff+ engineers:

Instead of picking a default, use this as an opportunity for deeper discussion. Ask:

  • What level of consistency is truly required for this feature?
  • Can we isolate consistency boundaries (e.g., enforce strong ordering within each chat room)?
  • Can we design for availability while still guaranteeing a good user experience under partition?

Treat CAP not as a binary choice, but as a design framework. Good engineers make a decision. Great engineers justify it in context.

4. Message Durability

Messages must be safely persisted before confirming to the sender. No acknowledged message should be lost, even if the server crashes right after.

Illustration: If a user hits send and then immediately loses internet, the message must still exist when they reconnect.

Just because a message was sent — or even delivered — doesn’t mean it can be deleted. In real-world chat systems, a message must remain durably stored until one of the following conditions is met:

  • The sender explicitly deletes the message (user-initiated deletion).
  • The system enforces a retention policy, such as auto-deletion after 30 days (e.g., for compliance or storage limits).

Even if a recipient has "seen" the message, it must stay in backend storage for multi-device sync, scroll-back history, and recovery after app reload.

Staff+ engineers should treat durability and deletion as two separate concerns — never assume delivery means it’s safe to erase.

5. Multi-Device Consistency

Users often stay logged in on multiple devices — desktop, phone, tablet — at the same time. The system must ensure that messages, read receipts, and typing indicators remain synchronized across all active sessions. Events such as message delivery, deletion, or read status must be reflected in real-time across all clients.

Illustration: A user reads a message on their laptop — seconds later, their phone reflects it as read, clears the notification badge, and doesn’t re-alert them. We list this requirement last to make sure all of the functions above can be fulfilled first, and that cross-device sessions can then hold them all together.

It’s easy to overlook this, but messaging systems must account for users with multiple simultaneous sessions. Without proper sync, users may:

  • See stale read states
  • Get duplicate notifications
  • Lose trust in the product's reliability

Staff+ candidates stand out by calling out these edge cases and proposing:

  • Session tracking per device
  • Idempotent event delivery
  • Consistent state propagation across all logged-in clients

If you bring this up in an interview — you’re showing real-world maturity.

Requirement Summary

FRs | Description
1. Messaging in 1:1 and Group Chats | Users can send and receive messages in private or group conversations, with ordered delivery and persistent history.
2. Rich Media Support | Messages can include images, videos, and files; media is uploaded, stored, and rendered across devices.
3. Real-Time Notifications | Users receive in-app and push notifications when new messages are sent in chats they belong to.
4. Message Deletion | Users can delete their own messages; deletions must reflect across all devices and preserve conversation flow.

NFRs | Target / Guarantee
1. High Scalability | Support 1B+ MAU and 100K+ concurrent messages/sec across large chat rooms.
2. Low Latency | Ensure P95 end-to-end message delivery latency under 200ms.
3. CAP-Aware Messaging Guarantees | Prioritize availability, but enforce strong per-chat message ordering.
4. Message Durability | Persist messages before acknowledging; no acknowledged message should ever be lost.
5. Multi-Device Consistency | P95 sync latency < 300ms across devices; <1% inconsistency in read state or notifications.

Before diving into Entities & APIs, you may notice that we’ve spent a significant amount of time unpacking both functional and non-functional requirements. That’s intentional. In many existing materials, this critical phase is often rushed or oversimplified — but in real-world design and high-level interviews, clarity around what the system must do and how it must behave under pressure is what separates great solutions from generic ones. We aim to set a higher bar here, not just to define the problem well, but to invite thoughtful trade-off discussions, scalability considerations, and system behaviors that hold up in production. The better your foundation, the sharper your design decisions will be — and we’ll carry that mindset through the rest of this article.

Core Entities

Entity | Description
User | A registered participant who can chat and receive notifications.
Chat | A channel or direct conversation with a list of members.
Message | A text or media unit sent to a chat.
Media | Media asset linked to a message (image, video, file).
DeviceSession | Active WebSocket or push-notification endpoint for a user’s device.

Slack Threaded Messages: A Crucial Extension to the Message Entity

In Slack-like systems, Threads allow users to reply to a specific message — creating a nested conversation within a larger chat.

To support this, the Message entity should include an optional field:
parent_message_id (nullable)

If present, this indicates the message is a thread reply, and the original message becomes the thread root.

This design allows you to:

  • Group replies under a thread view
  • Fetch all replies for a given message via /chats/{id}/messages?parent=xyz
  • Trigger notifications only to participants of that thread

Staff+ engineers often raise this early — not because it's hard to implement, but because it has major UX, API, and performance implications down the line.

In our design, threads are modeled as messages with a parent_message_id, rather than a separate Thread entity. This keeps the model flexible while supporting both flat and threaded conversations.
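As a rough illustration, here is a minimal sketch of how a thread could be fetched under this model, assuming a SQL store with the messages columns described in this article (fetch_thread is a hypothetical helper, not part of any specific framework):

# Minimal sketch: fetching a thread (root + replies) from the messages table.
import sqlite3

def fetch_thread(conn: sqlite3.Connection, chat_id: str, root_message_id: str):
    """Return the thread root followed by its replies in send order."""
    root = conn.execute(
        "SELECT message_id, sender_id, content, created_at "
        "FROM messages WHERE chat_id = ? AND message_id = ?",
        (chat_id, root_message_id),
    ).fetchone()
    replies = conn.execute(
        "SELECT message_id, sender_id, content, created_at "
        "FROM messages WHERE chat_id = ? AND parent_message_id = ? "
        "ORDER BY created_at",
        (chat_id, root_message_id),
    ).fetchall()
    return root, replies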

While some materials refer to "clients" (like mobile or web apps) when modeling messaging systems, we explicitly use the term DeviceSession to emphasize active, trackable connections between users and devices.

A client describes the platform (e.g., iOS app or web browser), but it doesn’t distinguish between:

  • Online vs offline state
  • Multiple simultaneous logins
  • Real-time delivery routes like WebSocket or push tokens

DeviceSession allows us to model:

  • Which devices are currently online
  • How to route events (e.g., via WebSocket or FCM)
  • Cross-device consistency, like syncing read receipts or preventing duplicate notifications

This precision is especially important in Slack-like systems, where users often operate across multiple devices at once. Including DeviceSession as a core entity gives us the flexibility to support reliable real-time behavior at scale.

APIs

In traditional REST APIs, clients make requests and receive responses. But Slack-like chat apps require bi-directional, low-latency communication, which is best achieved with WebSocket.

In messaging systems, your choice of communication protocol affects latency, server load, and user experience. Let’s compare a few common options:

  • HTTP Polling: Client repeatedly asks the server “any new messages?”
    Simple but inefficient — introduces latency and unnecessary load.
  • HTTP Long Polling: Server holds the request open until there’s new data.
    More efficient, but still involves re-establishing the connection each time.
  • WebSocket: A persistent, full-duplex TCP connection that allows both server and client to push data anytime.
    Ideal for low-latency, bi-directional communication — and the de facto standard for real-time apps like Slack, Discord, or WhatsApp Web.

Why WebSocket is right for Slack-like systems:

  • Reduces latency — no need to re-connect or poll
  • Enables live typing, read receipts, message delivery in real-time
  • Supports multiple concurrent sessions per user (e.g., desktop and mobile)

Staff+ engineers know: picking WebSocket isn’t just about speed — it’s about system efficiency, user experience, and long-term scalability.

Once connected, the server and client exchange structured events, such as new messages, read receipts, typing indicators, etc. There’s no need to repeatedly poll. Here are the supported WebSocket events:

Event Name | Direction | Purpose
send_message | Client → Server | User sends a message to a chat
new_message | Server → Clients | Fan-out of message to all participants
message_deleted | Server → Clients | Notify when a message is deleted
user_typing | Client → Server | Typing indicator (debounced)
typing_started / typing_stopped | Server → Clients | Render live typing indicators
read_receipt | Client → Server | User read a message
presence_update | Server → Clients | Updates about online/offline status

Read receipts aren’t triggered just because a message is received — they’re triggered when the user actually views the message.

From clients:

  • Track which messages are visible in the viewport
  • Trigger a read_receipt event when the user scrolls to or opens a chat
  • Debounce or batch updates to reduce noise

Client sends:


{
  "event": "read_receipt",
  "payload": {
    "chat_id": "chat_123",
    "message_id": "msg_999"
  }
}

From servers:

  • Update the user's last read message in that chat
  • Broadcast a read_receipt event to other participants (excluding the sender)

This is how Slack or WhatsApp shows "Read" ticks or "Seen by" indicators — triggered by viewing, not just receiving.
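To make the server side concrete, here is a minimal sketch of read_receipt handling, assuming hypothetical read_state_store and broadcast_to_chat helpers (they stand in for whatever storage and fanout layer you choose):

# Sketch of server-side read_receipt handling; helper names are placeholders.
def handle_read_receipt(user_id: str, payload: dict, read_state_store, broadcast_to_chat):
    chat_id = payload["chat_id"]
    message_id = payload["message_id"]

    # 1. Record the user's last-read message for this chat (idempotent upsert).
    read_state_store.upsert(chat_id=chat_id, user_id=user_id,
                            last_read_message_id=message_id)

    # 2. Broadcast to other participants so they can render "Seen by" indicators.
    broadcast_to_chat(
        chat_id,
        event="read_receipt",
        payload={"chat_id": chat_id, "message_id": message_id, "reader_id": user_id},
        exclude_user=user_id,
    )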

High Level Design

FR1 - User can send and receive messages in 1:1 or group chats

Let’s walk through it step by step.

In this section, we assume:

  • All users are online; how offline users receive messages will be covered in FR3 shortly.
  • We use a group chat (one sender, many receivers) as the example in the diagram. A 1:1 chat is just a special case of a group chat: the number of recipients drops from many to one, which is the only change in the new_message events sent from the WebSocket servers back to the clients.

1. User Sends Message from Client

When a user sends a message in a Slack-like chat system, the action begins on the client side. The user types a message in either a 1:1 or group chat and hits “Send.” The client emits a send_message event over an established WebSocket connection, including the chat_id, message content, and optionally a parent_message_id if the message is a threaded reply.

2. Gateway Handles Authentication and Routing

This event is first handled by the Gateway, which acts as the system’s front door. The Gateway authenticates the user session, checks rate limits, and determines routing based on the target chat. Once validated, the event is forwarded to the appropriate Chat Server instance — typically determined by a consistent hashing or partitioning scheme that ensures messages for the same chat are handled in order.

3. Chat Server Validates Membership and Writes to DB

The Chat Server receives the request and performs core validation logic. It checks that the specified chat_id exists in the chats table and confirms that the sender is indeed a member of the chat by querying the chat_members table. This step guarantees that only authorized participants can write to the chat.

Once validated, the server generates a message_id and a timestamp. It writes a new record into the messages table using a straightforward schema: message_id, chat_id, sender_id, content, and created_at. If the message is part of a thread, the optional parent_message_id is also populated. This design allows for both flat and threaded message retrieval without needing a separate thread table.

4. Message is Broadcast to Participants via WebSocket

After persisting the message, the Chat Server initiates real-time delivery. It queries all current participants of the chat via chat_members, then looks up each participant’s active WebSocket connections (offline users we will cover in FR3). A new_message event is emitted to every connected device, allowing clients to instantly receive and render the new message in their chat window.

Because this flow uses WebSocket for delivery, online users experience near-instant feedback. The message arrives without polling or refresh, and appears with accurate sender, timestamp, and optional thread context. This seamless propagation is what gives Slack and similar apps their fluid, real-time feel.

💡 We haven’t covered how a group is created or how users are added to an existing group. These are fairly straightforward operations; try answering them yourself, paying attention to which database tables receive inserts.

FR2 - User can send rich media (e.g., images, videos, files) as part of a message

To support rich media, we extend the message-sending flow to decouple media upload from message delivery. Files are stored in an object store like S3, and the message contains only a reference (media ID or signed URL). This keeps our messaging system lightweight and responsive.

1. Client Requests Upload URL

When the user selects a media file, the client sends a /upload_media_req call to the media server. The media server checks the user’s auth, validates file type and size (optional), and returns a pre-signed URL for direct upload to S3. It may also reserve a media_id to track the object.
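Here is a minimal sketch of that step on the media server, assuming boto3 and a placeholder bucket name; the exact validation and media_id bookkeeping would live around this call:

import uuid
import boto3  # assumes AWS credentials are configured in the environment

s3 = boto3.client("s3")
BUCKET = "chat-media-bucket"  # placeholder bucket name

def create_upload_url(user_id: str, file_name: str, expires_seconds: int = 900):
    # Reserve a media_id and build the object key for this upload.
    media_id = str(uuid.uuid4())
    key = f"uploads/{user_id}/{media_id}/{file_name}"
    # Pre-signed PUT URL lets the client upload directly to S3.
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=expires_seconds,
    )
    return {"media_id": media_id, "upload_url": upload_url}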

2. Client Uploads File to S3

The client then uploads the file directly to S3 using the pre-signed URL. This avoids proxying the file through backend servers and offloads transfer to the object store. Upon success, the media server returns the final media_url (or the client constructs it from the media_id), which will be embedded in the outgoing message.

3. Client Sends Message with Media Reference

Once the upload completes successfully, the chat UI enables the “Send” button. The client emits a send_message event over WebSocket with the message content and the media_id or signed media_url. No binary data is transmitted through this path.

4. Gateway and Chat Server Handle Message

The Gateway forwards the message to the Chat Server. The Chat Server validates chat membership and that the media reference is owned by the sender. It then inserts the message (including media_id or media_url) into the messages table, and proceeds to fan it out like a regular message.

5. Clients Receive and Render Media

The WebSocket Server delivers a new_message event to all online recipients, which now includes the media_url. Clients render previews (thumbnails, play buttons, etc.) using the signed URL — valid only for a short time (e.g., 1 hour) to preserve security and access control.

FR3 - User can receive real-time notifications for new messages

Real-time messaging isn’t just about sending and receiving messages while the app is open. Users must be notified of new activity, even when they’re offline, the app is in the background, or they’re using another device. A scalable Slack-like system must support both WebSocket push for online users and push notifications for offline users — all while respecting user preferences and minimizing redundant alerts.

In messaging systems, online and offline are not just UI labels — they directly influence how messages and notifications are routed.

  • A user is considered online if they have an active WebSocket connection to the chat system — typically via an open app tab, background mobile session, or desktop client.
  • A user is offline when no active device sessions are tracked, meaning no open WebSocket connections exist.

This distinction is crucial:

  • Online users get instant message delivery via WebSocket — no delay, no need for push.
  • Offline users require fallback mechanisms, like push notifications or email alerts, to ensure they don’t miss messages.

Behind the scenes, the system uses a DeviceSession table (or Redis cache) to track current online sessions, and route events accordingly.

A push notification delivers metadata, not the actual message.

Here is a breakdown:

Step | What Happens | Is the message delivered?
Message is sent | Backend persists it + prepares delivery | ❌ Not yet delivered
User is offline | Backend sends push notification via APNs | ❌ Still not delivered
Notification is shown at client side | User sees a banner / lock screen message | ❌ Only the notification arrived
User taps & opens app | WebSocket connection is re-established | ✅ Message delivery happens here
Message is synced | Message is pulled or replayed from inbox | ✅ Now it's delivered

To capture this, the diagram below (blue part) shows two phases:

  1. The client is offline: the system detects the offline state, persists rows into the inbox table, and sends a notification via APNs (solid lines in the diagram denote the online state).
  2. The client comes back online via the notification tap: the message is actually delivered, and the inbox is cleaned up based on retention (dashed lines denote the offline state).

Here are the detailed steps based on the diagram:

1. Client Sends a Message via WebSocket (Same as previous)

The process starts when the sender (Client A) emits a send_message WebSocket event. This event includes fields like chat_id, content, and optionally media_id. It is routed through the Gateway for authentication and rate-limiting.

2. Gateway Authenticates and Routes the Event (Same as previous)

The Gateway validates the user’s identity, enforces rate limits, and forwards the request to the appropriate Chat Server based on routing logic like consistent hashing of the chat_id.

3. Chat Server Validates Chat Membership (Same as previous)

The Chat Server queries the chat_members table to confirm the sender belongs to the chat. It then fetches the list of all chat participants.

4. Chat Server Determines Online vs. Offline Users

Using the list of participants, the Chat Server queries the device_sessions table to identify which users are currently online. A session is considered active if the status is "active" and last_heartbeat is within an acceptable threshold.
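A small sketch of this online/offline split, assuming an in-memory view of device_sessions rows with status and last_heartbeat fields; the 30-second threshold is an assumption:

import time

HEARTBEAT_TTL_SECONDS = 30  # assumption: a session is stale after 30s without a heartbeat

def split_online_offline(participants, device_sessions):
    """Partition chat participants by whether they have a fresh, active session.

    `device_sessions` maps user_id -> list of rows with 'status' and
    'last_heartbeat' (epoch seconds), mirroring the table described above.
    """
    now = time.time()
    online, offline = set(), set()
    for user_id in participants:
        sessions = device_sessions.get(user_id, [])
        if any(s["status"] == "active" and now - s["last_heartbeat"] < HEARTBEAT_TTL_SECONDS
               for s in sessions):
            online.add(user_id)
        else:
            offline.add(user_id)
    return online, offline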

5. Message is Persisted in the Database (Same as previous)

A new record is inserted into the messages table with metadata such as message_id, chat_id, sender_id, content, and optional media_id. This ensures durable storage before any delivery.

6. Insert into Inbox for Offline Users

For every offline participant, a record is created in the inbox table. Each entry contains user_id, message_id, created_at, and an optional delivered_at timestamp once delivered.

7. Real-Time Fanout via WebSocket Server for Online Users

The Chat Server pushes the message to the WebSocket Server, which fans out a new_message event to all connected clients based on their ws_connection_id in device_sessions.

8. Push Notification for Offline Devices

If an offline user has a registered push_token, the Chat Server triggers a notification through the Push Notification Server (e.g., FCM, APNs). This is only a notification and does not mark the message as delivered.

9. Message Replay on Reconnect

When an offline user comes back online (e.g., reopens the app), the Gateway re-establishes the WebSocket connection. The WebSocket Server queries the inbox table and sends all undelivered messages to the client. Delivered messages are then marked with delivered_at.
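A minimal sketch of the replay step, assuming the inbox and messages tables summarized later in this article and a placeholder send_to_client function for the WebSocket push:

def replay_inbox(conn, user_id, send_to_client):
    """Deliver undelivered inbox entries to a freshly reconnected client."""
    rows = conn.execute(
        "SELECT i.message_id, m.chat_id, m.sender_id, m.content "
        "FROM inbox i JOIN messages m ON m.message_id = i.message_id "
        "WHERE i.user_id = ? AND i.delivered = 0 "
        "ORDER BY i.inserted_at",
        (user_id,),
    ).fetchall()
    for message_id, chat_id, sender_id, content in rows:
        # send_to_client stands in for the WebSocket push on this session.
        send_to_client({"event": "new_message", "payload": {
            "chat_id": chat_id, "message_id": message_id,
            "sender_id": sender_id, "content": content}})
        conn.execute(
            "UPDATE inbox SET delivered = 1 WHERE user_id = ? AND message_id = ?",
            (user_id, message_id),
        )
    conn.commit()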

10. Cleanup via Retention Policies

A background cron job routinely scans the inbox table to delete old undelivered messages (e.g., after 30 days) based on the system’s data retention policy. This ensures storage efficiency and limits stale state buildup.

Let’s say User B has been offline for more than 30 days, and your system’s inbox retention policy only keeps undelivered messages for that period. This can happen after long PTO or an extended app absence. What now?

  • The system deletes the entry from the inbox table via a cleanup job.
  • When User B comes back online, the system no longer retries fanout of those older messages.
  • However, the original messages still exist in the messages table — they’re just not pushed to the client automatically.
  • It becomes the client's responsibility to fetch full chat history (e.g., via scrollback or pagination APIs).

This separation ensures storage doesn’t grow unbounded, while still allowing access to historical messages via manual retrieval, not real-time delivery.

Staff+ Tip: During interviews, highlight the trade-off here — low storage vs. guaranteed delivery — and ask whether durability vs. push guarantees are critical for your use case (e.g., ephemeral chats vs. permanent rooms).

FR4 - User can delete their own messages from a chat

Now let’s answer one of the challenging questions we asked at the beginning of the article: how to support message deletion.

1. Client Sends delete_message Request

The deletion process begins when the sender (Client A) initiates a message deletion from the chat UI. The client emits a delete_message WebSocket event containing the message_id and chat_id in the payload. This request is routed through the Gateway, just like send_message, and forwarded to the appropriate Chat Server instance.

2. Gateway Authenticates and Routes the Deletion

The Gateway performs standard checks — user authentication, rate limiting, and routing — then forwards the delete_message request to the corresponding Chat Server. It does not modify the payload or perform authorization beyond the identity of the sender.

3. Chat Server Authorizes and Soft Deletes Message

Upon receiving the request, the Chat Server first validates that the message exists and that the requesting user is indeed the original sender. It queries the messages table by message_id and checks that sender_id == user_id. If the check passes, the Chat Server updates the message row to mark it as deleted. This is implemented as a soft delete by adding a new boolean column is_deleted (default: false) to the messages table and setting it to true.
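A minimal sketch of this authorize-and-soft-delete step, assuming an is_deleted column on the messages table as described (sqlite3 keeps the example self-contained):

def soft_delete_message(conn, chat_id, message_id, requesting_user_id):
    """Soft-delete a message only if the requester is its original sender."""
    updated = conn.execute(
        "UPDATE messages SET is_deleted = 1 "
        "WHERE message_id = ? AND chat_id = ? AND sender_id = ? AND is_deleted = 0",
        (message_id, chat_id, requesting_user_id),
    ).rowcount
    conn.commit()
    if updated == 0:
        raise PermissionError("message not found, already deleted, or not owned by requester")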

4. Chat Server Fans Out Deletion to Online Participants

Next, the Chat Server queries the chat_members table to identify all participants of the chat. It then cross-references the device_sessions table to determine which participants are currently online. For those with active WebSocket connections, a message_deleted event is immediately emitted from the WS Server, instructing the clients to remove the message with that message_id from their view.

5. WS Server Generates Inbox Records for Offline Users

For chat members who are offline (i.e., without active WebSocket sessions), the WebSocket Server inserts an entry into the inbox table for each message that needs to be delivered later. Each row contains the user_id, message_id, delivered = false, inserted_at, and to_be_deleted = false.

If the sender deletes a message while some recipients are still offline, the system checks whether it had already been delivered to each user. If the message was never delivered (i.e., delivered = false), the corresponding inbox row is either deleted or skipped — no message_deleted event is sent to the client. The recipient will never know that message existed.

Only messages that were previously delivered will emit a message_deleted event when deleted, ensuring the interface is updated accordingly. This approach keeps the system clean and intuitive for the recipient, and reduces unnecessary deletion traffic for messages they never saw.

6. Deleted Message is Omitted from Scrollback Queries

The deletion is visually enforced by excluding messages with is_deleted = true in any message list query. When offline clients come back online and load the chat history, the deleted message will be filtered out entirely, appearing as if it never existed.

Unlike some chat apps that leave behind a “This message was deleted” placeholder, Slack removes deleted messages entirely from the chat view — for all participants.

Here’s what happens:

  • When the backend confirms deletion, each online client receives a message_deleted WebSocket event.
  • Clients locate the message via message_id and remove it from the UI — no trace left.
  • If the user was offline, the message is missing from scrollback when they return (due to filtering is_deleted = true).

Why is this delightful? Because the message feels like it never existed — no awkward “(deleted)” clutter.

Interview Tip: Show you understand this UX detail. While the backend may retain data for auditing, Slack’s frontend performs a hard visual delete — clean and user-friendly.

For a Slack-like chat system, the ideal primary database should balance strong consistency, write throughput, and query flexibility.

Relational databases like MySQL or PostgreSQL are excellent for core entities like users, messages, chats, and chat_members, due to ACID support and predictable indexing.

However, scaling vanilla RDBMS for billions of messages is hard — which is why systems like Slack use Vitess, a sharding middleware that horizontally scales MySQL while preserving SQL semantics.

On the other hand, NoSQL solutions like DynamoDB optimize for scale and availability.

Benefits include:

  • ⚡ Single-digit millisecond latency
  • 📈 Auto-scaling throughput
  • 🔁 High availability and denormalized writes
  • 🔑 Composite keys like chat_id + created_at for ordered access

Trade-offs: weaker consistency, no joins, and complex schema evolution.

Interview tip: Great answers don’t just name a database — they explain trade-offs. Top candidates compare SQL and NoSQL trade-offs based on the system's scale, query patterns, and latency requirements.

Data Schema Summary

After walking through the high-level design and functional workflows, it's crucial to ground our system with a clear view of the data model. The tables presented below summarize the persistent entities that power our Slack-like chat system — from messages and media to session tracking and inbox management. This snapshot not only helps engineers understand what gets stored and queried behind each API or flow, but also serves as a bridge to implementation. Defining schema early also surfaces key tradeoffs around indexing, normalization, and consistency boundaries — especially at scale.

Users
Column | Type | Description
user_id | UUID | Primary key
name | String | User display name
email | String | Login or contact identifier
created_at | Timestamp | Time of account creation

Chats
Column | Type | Description
chat_id | UUID | Primary key
name | String | Optional group chat name
is_group | Boolean | Indicates group vs 1:1 chat
created_at | Timestamp | Chat creation time

Chat_members
Column | Type | Description
chat_id | UUID | Foreign key to chats
user_id | UUID | Foreign key to users
joined_at | Timestamp | When user joined the chat
role | String | e.g. 'member', 'admin'

Messages
Column | Type | Description
message_id | UUID | Primary key
chat_id | UUID | Foreign key to chats
sender_id | UUID | Foreign key to users
content | Text | Text body of the message
media_url | String | Optional signed URL for media
parent_message_id | UUID | If threaded, points to parent message
sequence_number | Integer | Monotonic ID per chat for ordering
created_at | Timestamp | When message was sent
deleted_at | Timestamp | Null unless soft-deleted

Media_objects
Column | Type | Description
media_id | UUID | Primary key
user_id | UUID | Uploader (foreign key to users)
file_name | String | Original file name
storage_url | String | S3 signed object URL
created_at | Timestamp | Upload time

Device_sessions
Column | Type | Description
session_id | UUID | Unique device session ID
user_id | UUID | Owner of session
ws_server | String | WebSocket server handling this session
connected_at | Timestamp | When connection started
last_heartbeat | Timestamp | For TTL / lease validity

Inbox
Column | Type | Description
user_id | UUID | Recipient (foreign key to users)
message_id | UUID | Message to be delivered
delivered | Boolean | True if delivered via push or WS
inserted_at | Timestamp | When inserted into inbox
to_be_deleted | Boolean | True if deleted by sender

Deep Dives

While the current design covers the end-to-end messaging, media handling, notifications, and deletion workflows, two complex areas remain unresolved and require deeper architectural consideration. First, ensuring strict message ordering and multi-device consistency becomes challenging when users operate from multiple clients — race conditions, duplicate receipts, and out-of-order views can easily arise without proper state coordination. Second, the system must scale to millions of users while maintaining real-time responsiveness, which puts pressure on the WebSocket infrastructure and active session lookup mechanism — especially when determining online/offline status at scale or routing fan-out efficiently. To address these bottlenecks and fulfill our original challenges, we’ll dedicate three deep dives to:

  1. Message Ordering in Group Chat at scale
  2. Scalability of WebSocket Infrastructure & Active Session Lookup
  3. Multi-Device Consistency

DD1 - Message Ordering Consistency

Where the Problem Surfaces in Our Design

In our architecture, messages flow from Client → Gateway → Chat Server → WebSocket Server → Devices, and are stored centrally in the messages table. In group chat scenarios, multiple users may send messages nearly simultaneously from different devices, networks, and regions.

Without a strict ordering mechanism, this can lead to message interleaving — where the order of messages received depends on network timing, not intent. For example:

  • User A sends “Let’s meet at 5.”
  • At the same time, User B replies “Sounds good!”

Depending on delivery race conditions, these messages could appear out of order on different user devices — disrupting the conversational flow and creating confusion in fast-moving chats.

This problem is amplified at scale when millions of users are participating across thousands of chats, potentially spanning distributed infrastructure.

To solve this problem and improve the performance of our current design, there are 3 options to discuss.

Option 1: Client-Side Timestamps

In this approach, the client attaches a timestamp when sending a message, and the server simply stores and uses that for ordering. While this may seem straightforward, it introduces significant risk in distributed multi-device scenarios: user devices often have unsynchronized clocks, leading to inconsistent ordering across clients. Worse, malicious or buggy clients could manipulate timestamps to reorder the conversation. This option is too fragile to support strong consistency expectations in real-time chat.

Option 2: Server-Side Sequence Number per Chat

Here, the Chat Server maintains an atomic, per-chat counter (e.g., chat_456 → seq=502) that increments with each new message. Each message is assigned a unique, monotonically increasing sequence_number during ingestion and persisted in the messages table. All queries, rendering, and read receipts use this sequence for deterministic order. This guarantees correct ordering even with concurrent multi-device sends. The downside is that atomic increments can become a bottleneck at high scale unless each chat is strictly routed to a single partition or server shard. Still, for many medium-scale systems, this option balances correctness with implementation simplicity.
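A minimal sketch of Option 2 using a Redis counter per chat; the key name is illustrative, and in practice the counter could equally live in the chat’s home shard:

# Sketch of Option 2: per-chat monotonic sequence numbers using Redis INCR.
import redis

r = redis.Redis()  # placeholder connection settings

def next_sequence_number(chat_id: str) -> int:
    # INCR is atomic, so concurrent senders in the same chat get distinct,
    # monotonically increasing sequence numbers.
    return r.incr(f"chat_seq:{chat_id}")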

Option 3: Kafka-Based Ordered Ingestion per Chat (Recommended)

This design routes all incoming messages through Kafka, mapping each chat to a specific Kafka partition using a consistent hash of chat_id. Kafka natively enforces strict message order per partition, so concurrent sends across devices are correctly serialized. A consumer service reads from each partition, assigns a sequence_number, and writes messages into the database in order. This architecture scales horizontally — as chats grow, more partitions can be added. It also enables async processing like analytics or moderation. The trade-off is increased infrastructure complexity (Kafka ops, offset management, retry logic). But for high-scale systems like Slack, this model decouples ingestion from persistence while providing strong ordering guarantees out-of-the-box.
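A minimal sketch of the producer side of Option 3 using kafka-python; the topic name and broker address are placeholders. Keying by chat_id is what pins a chat to one partition and gives per-chat ordering:

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # placeholder broker
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_message(chat_id: str, message: dict):
    # Same key -> same partition -> strict per-chat ordering from Kafka.
    producer.send("chat-messages", key=chat_id, value=message)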

Based on the previous discussion, our diagram improves as follows:

When Kafka partitions are based on chat_id, ultra-active chats like #all-hands, town hall channels, or celebrity support threads can cause a hot partition — where one Kafka partition and its consumer get overwhelmed while others stay idle.

This creates real production issues:

  • Back-pressure in the message pipeline
  • Increased tail latency for that chat only
  • Stuck fan-out or delayed delivery for online/offline recipients

To mitigate this, consider:

  • Dedicated Kafka topics for high-traffic chats (scales well but adds ops burden)
  • Hybrid fan-out models, where Redis is used for high-fanout rooms
  • Dedicated fan-out workers for hot chats to isolate and scale independently

Staff+ Candidates Tip:
Always flag this risk in interviews — hot shards are a real-world scalability killer, and handling them gracefully shows deep operational maturity.

DD2 - Scalability of WebSocket Infrastructure & Active Session Lookup

Where the Problem Lies

At moderate scale, it’s feasible to have the Chat Server or Fanout Worker query a central device_sessions table or in-memory cache (like Redis) to discover where users are connected. But at the scale of 100M+ concurrent WebSocket sessions, even doing lookups or broadcasting to the right subset becomes expensive.

Three core scalability issues emerge:

  • Active session population becomes a bottleneck. You can’t reliably do DB queries for every message just to find out which devices are online.
  • Churn from user connect/disconnect creates delivery uncertainty. As users rapidly go online/offline across devices or networks, ensuring correct fanout (and avoiding duplicate or missed messages) becomes difficult at scale.
  • Backend storage pressure (DB write & read load). High message throughput plus real-time reads can overwhelm underlying databases unless optimized with caching, denormalization, or async processing.

Let’s break down how to solve them.

Scalability Bottleneck 1: Active Session Population

At 100M+ concurrent users, the challenge isn’t just about querying where each user is — it’s about how WebSocket servers can efficiently populate and manage active session data in real time, without overloading Redis or requiring centralized lookup for every message.

To avoid this, we recommend shifting to a Pub/Sub fanout model, which scales much more effectively. We can publish all messages to chat-specific channels in a distributed Pub/Sub system (e.g., Redis Streams, Kafka, or NATS). Each WebSocket server dynamically subscribes to the chat channels for the rooms its connected users participate in.

  • Message Published Once: Fanout workers publish the message to chat_channel:<chat_id>.
  • Selective Subscription: Each WebSocket server subscribes only to the channels it needs — no global broadcasting.
  • Local Filtering & Delivery: When a message arrives, the server uses its local session map to forward it to connected users — no Redis or DB lookup needed.
  • Offline Handling: Users who are disconnected won't receive the message live. Their delivery falls back to inbox update or push notification.
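
Here is a minimal sketch of the subscribe-and-forward loop on a WebSocket server using Redis Pub/Sub; it assumes the published event carries the recipient list, and ws.send stands in for whatever your WebSocket library exposes:

# Sketch of the Pub/Sub fanout path on a WebSocket server.
import json
import redis

r = redis.Redis()
local_sessions = {}  # user_id -> list of live WebSocket connections on this node

def subscribe_and_forward(chat_ids):
    pubsub = r.pubsub()
    pubsub.subscribe(*[f"chat_channel:{cid}" for cid in chat_ids])
    for item in pubsub.listen():
        if item["type"] != "message":
            continue
        event = json.loads(item["data"])
        # Local filtering: deliver only to users connected to THIS server.
        for user_id in event["recipient_ids"]:
            for ws in local_sessions.get(user_id, []):
                ws.send(json.dumps({"event": "new_message", "payload": event}))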

Here is the updated design diagram with data flow illustration:

Scalability Bottleneck 2: WebSocket Server Churn (Users Moving Between Nodes)

Even with the recommended solution above, each WebSocket server in our current design dynamically subscribes to Redis Pub/Sub channels like chat_channel:<chat_id>, based on the chat rooms its connected users participate in. But if a user churns (e.g., disconnects and reconnects quickly on a different server), several issues arise:

  • Stale subscriptions: the old WS server is still subscribed to channels for a user it no longer owns.
  • Missed/unnecessary fanout: Redis may still push messages to the old WS server, wasting network and compute.
  • Race conditions: during rapid reconnects, both the old and new servers may subscribe at the same time.

So the goal here is to ensure only one WebSocket server is subscribed to a given user's chat channels — the one currently serving that user — and update this dynamically as users move.

Here are two options to consider:

Option 1: Always Subscribe to All Chat Channels on Connection

In this approach, when a user connects to a WebSocket server, that server looks up all the chats the user is part of (via chat_members table) and immediately subscribes to all corresponding Redis Pub/Sub channels (e.g., chat_channel:<chat_id>). This ensures that the server receives all relevant messages during the user's session. However, if the user reconnects and is routed to a different WebSocket server, both servers may now be subscribed to the same chat channels, potentially duplicating effort. There's no coordination to ensure only one server is actively managing the delivery. This makes the system simpler to implement but leads to wasteful fanout, unnecessary memory pressure, and risk of duplicate messages unless clients de-dupe.

Option 2: Leased Session Ownership with Redis TTL (Recommended)

This approach uses Redis to coordinate which WebSocket server owns a user's session at any given time. When a user connects, the WS server sets a key like user:<user_id> = <ws_id> with a short TTL (e.g., 10 seconds) and extends it periodically with a heartbeat (PEXPIRE). Only if a server holds the valid lease does it subscribe to the user’s chat_channel:<chat_id> channels. This guarantees that only one WS server is responsible for delivery per user, avoiding duplicates and reducing fanout overhead. It requires more coordination infrastructure and careful handling of edge cases like lease expiration or reconnect races, but it pays off in cleaner scalability and efficiency at very large user scale.
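A minimal sketch of the lease mechanics with redis-py; key names and the TTL are illustrative, and the check-then-extend heartbeat would typically be made atomic with a small Lua script in production:

# Sketch of Option 2: per-user session lease in Redis.
import redis

r = redis.Redis()
LEASE_TTL_SECONDS = 10

def try_acquire_session(user_id: str, ws_server_id: str) -> bool:
    # SET ... NX EX only succeeds if no other server currently holds the lease.
    return bool(r.set(f"user:{user_id}", ws_server_id, nx=True, ex=LEASE_TTL_SECONDS))

def heartbeat(user_id: str, ws_server_id: str) -> bool:
    # Extend the lease only if this server still owns it.
    if r.get(f"user:{user_id}") == ws_server_id.encode():
        return bool(r.pexpire(f"user:{user_id}", LEASE_TTL_SECONDS * 1000))
    return False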

Let’s walk through what happens behind the scenes when User A moves between WebSocket servers (e.g., mobile switch, network drop):

Step-by-step Flow:

  1. Initial Connection
    • User A connects to WebSocket Server ws-1.
    • ws-1 sets a Redis key: SET user:A ws-1 EX 10 (a 10-second lease).
    • ws-1 subscribes to all channels: chat_channel:<chat_id>.
    • Heartbeats (PEXPIRE) extend the lease.
  2. Unexpected Disconnect (e.g., network drop)
    • ws-1 stops receiving heartbeats from the client, so it stops extending the lease.
    • Once the TTL elapses, the Redis lease expires.
    • ws-1 can safely unsubscribe or ignore messages.
  3. Reconnect via New WebSocket Server
    • User A connects to ws-2.
    • ws-2 sets the lease: SET user:A ws-2 NX EX 10.
    • ws-2 becomes the session owner and subscribes to chat_channel:<chat_id>.
  4. Message Delivery Resumes
    • Redis Pub/Sub sends messages only to ws-2.
    • ws-2 delivers messages to User A.
    • No fanout to ws-1, which no longer holds the lease.

Based on Option 2, which we recommend, here is the updated diagram:

Scalability Bottleneck 3: Backend Storage Pressure (DB Write & Read Load)

At the scale of billions of users and hundreds of millions of concurrent sessions, your backend storage becomes a critical bottleneck — particularly around fanout writes, message reads, and metadata lookups during WebSocket reconnects or chat bootstrapping.

Two primary stress points:

  1. High write throughput to messages and inbox tables
    Every chat message triggers N inbox writes (for N recipients), plus fanout events, which — at large group scale — translates to millions of writes per second. If your database is not horizontally partitioned, it becomes overwhelmed by concurrent insert and update traffic.
  2. High read amplification for chat metadata and user session state
    Every WebSocket reconnection or fanout event requires access to chat_members, device_sessions, and recent messages. Doing these reads from a monolithic DB becomes infeasible under heavy churn or reconnect storms.

There are several strategies we could apply; in an interview, covering the two main points below should be enough.

Strategy 1: Horizontal Partitioning of Messages & Inbox Tables

To handle the massive write load — especially for group messages that fan out to many recipients — we recommend horizontally partitioning the messages and inbox tables. This can be done by sharding based on chat_id (for messages) and recipient_user_id (for inbox). This distributes the write and query load across multiple physical nodes or database instances. A NoSQL system (like DynamoDB with composite keys) can be used, depending on your durability and consistency requirements. The key tradeoff is managing cross-shard queries (e.g., listing a user’s recent chats), which must be handled carefully via query fanout or index denormalization.

Strategy 2: Read-Optimized Caching for Hot Metadata & Fanout Paths

For high-throughput lookups (e.g., chat_members, device_sessions, user presence), introducing dedicated caching layers like Redis or Memcached can significantly offload the primary database. For example:

  • Cache active chat membership per user (user:<id>:chats)
  • Cache online device sessions (user:<id>:devices)
  • Cache recent message slices (chat:<id>:recent_messages)

These caches should be asynchronously updated via change data capture (CDC) streams or near-real-time workers. They reduce DB read QPS during WebSocket reconnection, chat scrollback, or large fanouts. However, challenges include cache invalidation on membership changes and consistency under race conditions, so TTLs and refresh policies must be carefully tuned.
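A minimal cache-aside sketch for one of these lookups (chat membership); key names and the TTL are illustrative, and load_from_db stands in for the primary-store query:

# Cache-aside lookup for hot chat membership data.
import json
import redis

r = redis.Redis()
MEMBERSHIP_TTL_SECONDS = 300  # short TTL limits staleness after membership changes

def get_chat_members(chat_id: str, load_from_db):
    key = f"chat:{chat_id}:members"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    members = load_from_db(chat_id)  # fall back to the primary store on a miss
    r.set(key, json.dumps(members), ex=MEMBERSHIP_TTL_SECONDS)
    return members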

Hence, with all the recommended improvements, the design diagram looks like this:

DD3 - Multi-Device Management

Where the Problem Lies

In modern messaging platforms, it’s common for users to be logged in on multiple devices simultaneously — e.g., phone, laptop, tablet. This creates a new class of delivery and consistency challenges. Our system must ensure that:

  • All active devices receive real-time delivery for every message.
  • Devices stay in sync with message state (e.g., de-duping, ordering).
  • Disconnected sessions receive reliable replay when reconnecting.
  • We avoid duplicated fanout or missing messages.

At small scale, this seems straightforward. But as the number of connected devices grows (e.g., 3+ per user, 1B+ users), the operational overhead — from live fanout to reconnection handling — becomes massive.

Proposed Solution: Per-Device Session Registry + Inbox Replay

To support seamless multi-device experience, we break the problem into online and offline flows, and handle them with a mix of local mapping and durable persistence.

Step-by-Step Message Delivery Flow (Multi-Device Aware)

1. Device Connects

  • When a user device connects to the WebSocket server, it registers its session by writing device_sessions:{user_id}:{device_id} → ws_connection (typically stored in Redis or an in-memory map).

2. Message Fanout

  • When a new message is delivered (via Pub/Sub fanout), the WebSocket server looks up all active sessions for the recipient user.
  • It sends the message to each connected device (e.g., both phone and desktop).

3. Offline Fallback

  • If a device isn’t connected, the message is stored in the inbox table, keyed by user_id and device_id.
  • This ensures the message is queued for delivery when that device comes back online.

4. Device Reconnects

  • Upon reconnection, the server checks the inbox for that specific (user_id, device_id) and delivers any missed messages.
  • After successful delivery, the entries are cleared or marked as delivered.

Bonus Point – Client-side message de-duplication logic

In a multi-device setup, the same user might receive the same message twice — once via WebSocket on one device, and again via inbox replay on another device after reconnecting. Without deduplication, this results in duplicated messages, phantom notifications, or confusing message history.

How It Works:

  • Every message carries a globally unique message_id, assigned at ingestion.
  • Each client maintains a local cache of recently received message_ids (e.g., 1000 entries).
  • On message receive:
    • If message_id is new → render and store.
    • If already seen → skip render, silently drop.

This ensures consistent user experience across reconnects, devices, and delivery paths — without relying on perfect backend deduplication.
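A minimal sketch of that client-side cache, assuming a bound of roughly 1,000 recently seen message_ids as mentioned above:

from collections import OrderedDict

class MessageDeduper:
    """Client-side de-duplication over the last N message_ids."""

    def __init__(self, max_entries: int = 1000):
        self.seen = OrderedDict()
        self.max_entries = max_entries

    def should_render(self, message_id: str) -> bool:
        if message_id in self.seen:
            self.seen.move_to_end(message_id)
            return False  # duplicate delivery (e.g., WebSocket + inbox replay)
        self.seen[message_id] = True
        if len(self.seen) > self.max_entries:
            self.seen.popitem(last=False)  # evict the oldest entry
        return True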

Final Thoughts

Designing a Slack-like system goes far beyond just sending and storing messages. It’s about architecting a real-time, distributed, resilient messaging backbone that can support massive concurrency and device diversity — all while keeping latency low and user experience seamless.

In this article, we walked through every major building block — from creating chats, managing group members, to sending rich media, storing messages, and triggering WebSocket or push notifications. More importantly, we explored deep-dive design tradeoffs that help you answer four of the toughest system design challenges often brought up in interviews:

  1. Message Ordering in Group Chat: We use Kafka-based ingestion and per-chat sequence numbers to enforce a canonical timeline for all messages — ensuring consistent rendering even when messages are sent concurrently from distributed devices.
  2. Real-Time Notifications at 100M+ Scale: We designed an online + offline fanout mechanism using WebSocket and inbox tables, and built in notification delivery through Redis pub/sub and fallback logic for offline users — without overloading the backend.
  3. Deletion Propagation with Retention Guarantees: We built an audit-compliant deletion flow that respects sender-side deletion, ensures correct client updates across devices, and prevents resurfacing of deleted content via inbox filtering or replay rules.
  4. WebSocket Churn and Multi-Device Reconnects at Scale: We tackled infrastructure churn by introducing session leasing with Redis TTL, a pub-sub based fanout by chat channel, and client-side deduplication to handle reconnect storms — ensuring clean handoff and minimal duplication when users rapidly connect from multiple locations.

Together, these answers formed the backbone of a resilient, scalable chat infrastructure. Each challenge was covered with trade-offs between consistency, scalability, latency, and complexity — and our deep dives laid out concrete, production-ready solutions for each.

How can ShowOffer help you?

We've included callouts and open-ended design prompts throughout this write-up — perfect for self-practice or interview discussion. If you want to walk through this system design in a coaching session with the author, book a session at ShowOffer.io. We're here to help you sharpen your skills, gain confidence, and land your next big offer.

Coach + Mock
Practice with a Senior+ engineer who just got an offer from your dream (FAANG) company.
Schedule Now