Design a Slack-like chat system that supports 1:1 and group messaging, threaded conversations, and message deletion.


Coach with Author

Book a 90-minute 1:1 coaching session with the author of this post and video — get tailored feedback, real-world insights, and system design strategy tips.

Let’s sharpen your skills and have some fun doing it!

Designing a Slack-like chat app isn't just about sending messages — it’s about scaling reliably, handling failure, and keeping user experience smooth under load. Before diving in, ask yourself:

  1. How would you ensure strict message ordering in group chats and across multiple clients when messages are sent concurrently from distributed devices?
  2. How would you design real-time notifications for hundreds of millions of users, supporting online (WebSocket) and offline (push) delivery without overwhelming your backend?
  3. How would you allow users to delete messages — and propagate that change to all clients — while supporting audit logs, retention policies, and long chat histories?
  4. How would you scale your infrastructure to handle users frequently going online and offline across data centers — while maintaining low-latency fanout and avoiding duplicate or missed messages?

Upgraded Challenge: How to design a ChatGPT-like ChatApp?

These aren’t toy problems — they’re real challenges from production-scale messaging systems. Think through your solutions, then follow along to see how we solve them, step by step.

Functional Requirements

1. Users can send and receive messages in 1:1 or group chats

This covers both private and group conversations, with messages delivered in order and stored persistently. Group chats can support dynamic membership and maintain message history.

2. Users can send rich media (e.g., images, videos, files) as part of a message

Messages can include text, attachments, or both. The system must support media upload, storage, and preview rendering across devices.

3. Users can receive real-time notifications for new messages

When a user receives a new message—either in a direct chat or group—they should be notified through in-app indicators, push notifications, or email (based on settings and availability).

4. Users can delete their own messages from a chat

A user should be able to remove a message they previously sent. The deletion may be visible (e.g., “This message was deleted”) or fully removed from view, depending on the UX and policy.

Non-Functional Requirements

1. High Scalability

The system should support 1B+ users and 100K+ concurrent messages per second, especially in high-traffic group chats.

Illustration: Imagine a global company’s #all-hands channel with 100k active users — your system must fan out messages quickly without bottlenecks.

When discussing non-functional requirements in interviews, don’t just say "high scalability" — you need to quantify it.

For example, instead of saying: "The system should handle lots of users", you can say:
"The system should support 10M+ users, with 5K concurrent messages per second in large chat rooms."

Adding numeric expectations (like target QPS, latency percentiles, or fan-out scale) shows that you understand real-world constraints.

This is a Staff+ signal — it shows not just that you can design systems, but that you've seen them operate at scale.

2. Low Latency in Chat Delivery

Aim for P95 end-to-end message latency under 200ms for online users.

Illustration: A user sends a message and expects to see it reflected across all devices in near real-time — delays over 500ms degrade the user experience.

Low latency in messaging systems isn’t about benchmarks — it’s about perceived real-time responsiveness. Once message delivery feels delayed, trust and engagement drop fast.

In practice, aiming for P95 latency under 200ms means thinking end-to-end:

  • Client → API Gateway: 30ms
  • Message Broker or Queue Delay: 10–50ms
  • Fan-out to Recipients (DB or WebSocket push): 50–80ms
  • Client render time: 20–40ms
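
To sanity-check that these components fit within the stated target, here is a tiny sketch; the numbers are the illustrative ranges above, not measurements:

# Rough end-to-end latency budget check; the ranges are illustrative assumptions.
budget_ms = {
    "client_to_gateway": (30, 30),
    "broker_or_queue": (10, 50),
    "fanout_to_recipients": (50, 80),
    "client_render": (20, 40),
}

best = sum(lo for lo, _ in budget_ms.values())    # 110 ms
worst = sum(hi for _, hi in budget_ms.values())   # 200 ms
print(f"End-to-end budget: {best}-{worst} ms (target: P95 < 200 ms)")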

What Staff+ candidates do well:

They don’t just say "low latency" — they outline the latency budget, identify hot paths, and explain what tradeoffs they’d make when things spike.

Bonus: If you can say “we’d fall back to degraded fan-out or batch sends at 500ms+ load,” that’s a signal you’ve seen the real fire.

3. CAP-Aware Messaging Guarantees

The system should prefer high availability, but must enforce strong ordering consistency within each chat room.

Illustration: Even if one server fails, the user should still be able to chat — and messages in a group must never arrive out of order.

CAP Theorem states that in any distributed system, you can only guarantee two out of the following three properties at the same time:

  1. Consistency: All clients see the same data, even when there are concurrent updates.
  2. Availability: Every request receives a response, even when some servers fail.
  3. Partition Tolerance: The system continues to operate even if network delays or failures split it into disconnected parts.

Since network partitions are inevitable in real-world systems, the practical choice is between consistency and availability.

Many materials simplify this by choosing availability over consistency. For example, they might accept out-of-order messages to keep the chat responsive. But in messaging systems, users often expect messages to appear in order — especially in group chats or support conversations.

Our recommendation for Staff+ engineers:

Instead of picking a default, use this as an opportunity for deeper discussion. Ask:

  • What level of consistency is truly required for this feature?
  • Can we isolate consistency boundaries (e.g., enforce strong ordering within each chat room)?
  • Can we design for availability while still guaranteeing a good user experience under partition?

Treat CAP not as a binary choice, but as a design framework. Good engineers make a decision. Great engineers justify it in context.

4. Message Durability

Messages must be safely persisted before confirming to the sender. No acknowledged message should be lost, even if the server crashes right after.

Illustration: If a user hits send and then immediately loses internet, the message must still exist when they reconnect.

Just because a message was sent — or even delivered — doesn’t mean it can be deleted. In real-world chat systems, a message must remain durably stored until one of the following conditions is met:

  • The sender explicitly deletes the message (user-initiated deletion).
  • The system enforces a retention policy, such as auto-deletion after 30 days (e.g., for compliance or storage limits).

Even if a recipient has "seen" the message, it must stay in backend storage for multi-device sync, scroll-back history, and recovery after app reload.

Staff+ engineers should treat durability and deletion as two separate concerns — never assume delivery means it’s safe to erase.

5. Multi-Device Consistency

Users often stay logged in on multiple devices — desktop, phone, tablet — at the same time. The system must ensure that messages, read receipts, and typing indicators remain synchronized across all active sessions. Events such as message delivery, deletion, or read status must be reflected in real-time across all clients.

Illustration: A user reads a message on their laptop — seconds later, their phone reflects it as read, clears the notification badge, and doesn’t re-alert them. We list this requirement last to make sure all of the functions above can be fulfilled first, and that cross-device sessions can then hold them all together.

It’s easy to overlook this, but messaging systems must account for users with multiple simultaneous sessions. Without proper sync, users may:

  • See stale read states
  • Get duplicate notifications
  • Lose trust in the product's reliability

Staff+ candidates stand out by calling out these edge cases and proposing:

  • Session tracking per device
  • Idempotent event delivery
  • Consistent state propagation across all logged-in clients

If you bring this up in an interview — you’re showing real-world maturity.

Requirement Summary

FRs | Description
1. Messaging in 1:1 and Group Chats | Users can send and receive messages in private or group conversations, with ordered delivery and persistent history.
2. Rich Media Support | Messages can include images, videos, and files; media is uploaded, stored, and rendered across devices.
3. Real-Time Notifications | Users receive in-app and push notifications when new messages are sent in chats they belong to.
4. Message Deletion | Users can delete their own messages; deletions must reflect across all devices and preserve conversation flow.

NFRs | Target / Guarantee
1. High Scalability | Support 1B+ MAU and 100K+ concurrent messages/sec across large chat rooms.
2. Low Latency | Ensure P95 end-to-end message delivery latency under 200ms.
3. CAP-Aware Messaging Guarantees | Prioritize availability, but enforce strong per-chat message ordering.
4. Message Durability | Persist messages before acknowledging; no acknowledged message should ever be lost.
5. Multi-Device Consistency | P95 sync latency < 300ms across devices; <1% inconsistency in read state or notifications.

Before diving into Entities & APIs, you may notice that we’ve spent a significant amount of time unpacking both functional and non-functional requirements. That’s intentional. In many existing materials, this critical phase is often rushed or oversimplified — but in real-world design and high-level interviews, clarity around what the system must do and how it must behave under pressure is what separates great solutions from generic ones. We aim to set a higher bar here, not just to define the problem well, but to invite thoughtful trade-off discussions, scalability considerations, and system behaviors that hold up in production. The better your foundation, the sharper your design decisions will be — and we’ll carry that mindset through the rest of this article.

Core Entities

Entity | Description
User | A registered participant who can chat and receive notifications.
Chat | A channel or direct conversation with a list of members.
Message | A text or media unit sent to a chat.
Media | Media asset linked to a message (image, video, file).
DeviceSession | Active WebSocket or push-notification endpoint for a user’s device.

Slack Threaded Messages: A Crucial Extension to the Message Entity

In Slack-like systems, Threads allow users to reply to a specific message — creating a nested conversation within a larger chat.

To support this, the Message entity should include an optional field:
parent_message_id (nullable)

If present, this indicates the message is a thread reply, and the original message becomes the thread root.

This design allows you to:

  • Group replies under a thread view
  • Fetch all replies for a given message via /chats/{id}/messages?parent=xyz
  • Trigger notifications only to participants of that thread

Staff+ engineers often raise this early — not because it's hard to implement, but because it has major UX, API, and performance implications down the line.

In our design, threads are modeled as messages with a parent_message_id, rather than a separate Thread entity. This keeps the model flexible while supporting both flat and threaded conversations.
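As a rough illustration, here is a minimal sketch of how a thread could be fetched under this model, assuming a SQL store with the messages columns described in this article (fetch_thread is a hypothetical helper, not part of any specific framework):

# Minimal sketch: fetching a thread (root + replies) from the messages table.
import sqlite3

def fetch_thread(conn: sqlite3.Connection, chat_id: str, root_message_id: str):
    """Return the thread root followed by its replies in send order."""
    root = conn.execute(
        "SELECT message_id, sender_id, content, created_at "
        "FROM messages WHERE chat_id = ? AND message_id = ?",
        (chat_id, root_message_id),
    ).fetchone()
    replies = conn.execute(
        "SELECT message_id, sender_id, content, created_at "
        "FROM messages WHERE chat_id = ? AND parent_message_id = ? "
        "ORDER BY created_at",
        (chat_id, root_message_id),
    ).fetchall()
    return root, replies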

While some materials refer to "clients" (like mobile or web apps) when modeling messaging systems, we explicitly use the term DeviceSession to emphasize active, trackable connections between users and devices.

A client describes the platform (e.g., iOS app or web browser), but it doesn’t distinguish between:

  • Online vs offline state
  • Multiple simultaneous logins
  • Real-time delivery routes like WebSocket or push tokens

DeviceSession allows us to model:

  • Which devices are currently online
  • How to route events (e.g., via WebSocket or FCM)
  • Cross-device consistency, like syncing read receipts or preventing duplicate notifications

This precision is especially important in Slack-like systems, where users often operate across multiple devices at once. Including DeviceSession as a core entity gives us the flexibility to support reliable real-time behavior at scale.

APIs

In traditional REST APIs, clients make requests and receive responses. But Slack-like chat apps require bi-directional, low-latency communication, which is best achieved with WebSocket.

In messaging systems, your choice of communication protocol affects latency, server load, and user experience. Let’s compare a few common options:

  • HTTP Polling: Client repeatedly asks the server “any new messages?”
    Simple but inefficient — introduces latency and unnecessary load.
  • HTTP Long Polling: Server holds the request open until there’s new data.
    More efficient, but still involves re-establishing the connection each time.
  • WebSocket: A persistent, full-duplex TCP connection that allows both server and client to push data anytime.
    Ideal for low-latency, bi-directional communication — and the de facto standard for real-time apps like Slack, Discord, or WhatsApp Web.

Why WebSocket is right for Slack-like systems:

  • Reduces latency — no need to re-connect or poll
  • Enables live typing, read receipts, message delivery in real-time
  • Supports multiple concurrent sessions per user (e.g., desktop and mobile)

Staff+ engineers know: picking WebSocket isn’t just about speed — it’s about system efficiency, user experience, and long-term scalability.

Once connected, the server and client exchange structured events, such as new messages, read receipts, typing indicators, etc. There’s no need to repeatedly poll. Here are the supported WebSocket events:

Event Name | Direction | Purpose
send_message | Client → Server | User sends a message to a chat
new_message | Server → Clients | Fan-out of message to all participants
message_deleted | Server → Clients | Notify when a message is deleted
user_typing | Client → Server | Typing indicator (debounced)
typing_started / typing_stopped | Server → Clients | Render live typing indicators
read_receipt | Client → Server | User read a message
presence_update | Server → Clients | Updates about online/offline status

Read receipts aren’t triggered just because a message is received — they’re triggered when the user actually views the message.

From clients:

  • Track which messages are visible in the viewport
  • Trigger a read_receipt event when the user scrolls to or opens a chat
  • Debounce or batch updates to reduce noise

Client sends:


{
  "event": "read_receipt",
  "payload": {
    "chat_id": "chat_123",
    "message_id": "msg_999"
  }
}

From servers:

  • Update the user's last read message in that chat
  • Broadcast a read_receipt event to other participants (excluding the sender)

This is how Slack or WhatsApp shows "Read" ticks or "Seen by" indicators — triggered by viewing, not just receiving.
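To make the server side concrete, here is a minimal sketch of read_receipt handling, assuming hypothetical read_state_store and broadcast_to_chat helpers (they stand in for whatever storage and fanout layer you choose):

# Sketch of server-side read_receipt handling; helper names are placeholders.
def handle_read_receipt(user_id: str, payload: dict, read_state_store, broadcast_to_chat):
    chat_id = payload["chat_id"]
    message_id = payload["message_id"]

    # 1. Record the user's last-read message for this chat (idempotent upsert).
    read_state_store.upsert(chat_id=chat_id, user_id=user_id,
                            last_read_message_id=message_id)

    # 2. Broadcast to other participants so they can render "Seen by" indicators.
    broadcast_to_chat(
        chat_id,
        event="read_receipt",
        payload={"chat_id": chat_id, "message_id": message_id, "reader_id": user_id},
        exclude_user=user_id,
    )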

High Level Design

FR1 - User can send and receive messages in 1:1 or group chats

Let’s walk through it step by step.

In this section, we assume:

  • All users are online; how offline users receive messages will be covered in FR3 shortly.
  • We use a group chat (one sender, many receivers) as the example in the diagram. A 1:1 chat is just a special case of a group chat: the number of recipients drops from many to one, which is the only change in the new_message events sent from the WebSocket servers back to the clients.

1. User Sends Message from Client

When a user sends a message in a Slack-like chat system, the action begins on the client side. The user types a message in either a 1:1 or group chat and hits “Send.” The client emits a send_message event over an established WebSocket connection, including the chat_id, message content, and optionally a parent_message_id if the message is a threaded reply.

2. Gateway Handles Authentication and Routing

This event is first handled by the Gateway, which acts as the system’s front door. The Gateway authenticates the user session, checks rate limits, and determines routing based on the target chat. Once validated, the event is forwarded to the appropriate Chat Server instance — typically determined by a consistent hashing or partitioning scheme that ensures messages for the same chat are handled in order.

3. Chat Server Validates Membership and Writes to DB

The Chat Server receives the request and performs core validation logic. It checks that the specified chat_id exists in the chats table and confirms that the sender is indeed a member of the chat by querying the chat_members table. This step guarantees that only authorized participants can write to the chat.

Once validated, the server generates a message_id and a timestamp. It writes a new record into the messages table using a straightforward schema: message_id, chat_id, sender_id, content, and created_at. If the message is part of a thread, the optional parent_message_id is also populated. This design allows for both flat and threaded message retrieval without needing a separate thread table.

4. Message is Broadcast to Participants via WebSocket

After persisting the message, the Chat Server initiates real-time delivery. It queries all current participants of the chat via chat_members, then looks up each participant’s active WebSocket connections (offline users we will cover in FR3). A new_message event is emitted to every connected device, allowing clients to instantly receive and render the new message in their chat window.

Because this flow uses WebSocket for delivery, online users experience near-instant feedback. The message arrives without polling or refresh, and appears with accurate sender, timestamp, and optional thread context. This seamless propagation is what gives Slack and similar apps their fluid, real-time feel.

💡 We haven’t covered how a group is created or how users are added to an existing group. These are fairly straightforward operations; try answering them yourself, paying attention to which database tables receive inserts.

FR2 - User can send rich media (e.g., images, videos, files) as part of a message

To support rich media, we extend the message-sending flow to decouple media upload from message delivery. Files are stored in an object store like S3, and the message contains only a reference (media ID or signed URL). This keeps our messaging system lightweight and responsive.

1. Client Requests Upload URL

When the user selects a media file, the client sends a /upload_media_req call to the media server. The media server checks the user’s auth, validates file type and size (optional), and returns a pre-signed URL for direct upload to S3. It may also reserve a media_id to track the object.
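Here is a minimal sketch of that step on the media server, assuming boto3 and a placeholder bucket name; the exact validation and media_id bookkeeping would live around this call:

import uuid
import boto3  # assumes AWS credentials are configured in the environment

s3 = boto3.client("s3")
BUCKET = "chat-media-bucket"  # placeholder bucket name

def create_upload_url(user_id: str, file_name: str, expires_seconds: int = 900):
    # Reserve a media_id and build the object key for this upload.
    media_id = str(uuid.uuid4())
    key = f"uploads/{user_id}/{media_id}/{file_name}"
    # Pre-signed PUT URL lets the client upload directly to S3.
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=expires_seconds,
    )
    return {"media_id": media_id, "upload_url": upload_url}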

2. Client Uploads File to S3

The client then uploads the file directly to S3 using the pre-signed URL. This avoids proxying the file through backend servers and offloads transfer to the object store. Upon success, the media server returns the final media_url (or the client constructs it from the media_id), which will be embedded in the outgoing message.

3. Client Sends Message with Media Reference

Once the upload completes successfully, the chat UI enables the “Send” button. The client emits a send_message event over WebSocket with the message content and the media_id or signed media_url. No binary data is transmitted through this path.

4. Gateway and Chat Server Handle Message

The Gateway forwards the message to the Chat Server. The Chat Server validates chat membership and that the media reference is owned by the sender. It then inserts the message (including media_id or media_url) into the messages table, and proceeds to fan it out like a regular message.

5. Clients Receive and Render Media

The WebSocket Server delivers a new_message event to all online recipients, which now includes the media_url. Clients render previews (thumbnails, play buttons, etc.) using the signed URL — valid only for a short time (e.g., 1 hour) to preserve security and access control.

FR3 - User can receive real-time notifications for new messages

Real-time messaging isn’t just about sending and receiving messages while the app is open. Users must be notified of new activity, even when they’re offline, the app is in the background, or they’re using another device. A scalable Slack-like system must support both WebSocket push for online users and push notifications for offline users — all while respecting user preferences and minimizing redundant alerts.

In messaging systems, online and offline are not just UI labels — they directly influence how messages and notifications are routed.

  • A user is considered online if they have an active WebSocket connection to the chat system — typically via an open app tab, background mobile session, or desktop client.
  • A user is offline when no active device sessions are tracked, meaning no open WebSocket connections exist.

This distinction is crucial:

  • Online users get instant message delivery via WebSocket — no delay, no need for push.
  • Offline users require fallback mechanisms, like push notifications or email alerts, to ensure they don’t miss messages.

Behind the scenes, the system uses a DeviceSession table (or Redis cache) to track current online sessions, and route events accordingly.

A push notification delivers metadata, not the actual message.

Here is a breakdown:

Step | What Happens | Is the message delivered?
Message is sent | Backend persists it + prepares delivery | ❌ Not yet delivered
User is offline | Backend sends push notification via APNs | ❌ Still not delivered
Notification is shown at client side | User sees a banner / lock screen message | ❌ Only the notification arrived
User taps & opens app | WebSocket connection is re-established | ✅ Message delivery happens here
Message is synced | Message is pulled or replayed from inbox | ✅ Now it's delivered

To capture this, the diagram below (blue part) shows two phases:

  1. The client is offline: the system detects the offline state, persists rows into the inbox table, and sends a notification via APNs (solid lines in the diagram denote the online state).
  2. The client comes back online via the notification tap: the message is actually delivered, and the inbox is cleaned up based on retention (dashed lines denote the offline state).

Here are the detailed steps based on the diagram:

1. Client Sends a Message via WebSocket (Same as previous)

The process starts when the sender (Client A) emits a send_message WebSocket event. This event includes fields like chat_id, content, and optionally media_id. It is routed through the Gateway for authentication and rate-limiting.

2. Gateway Authenticates and Routes the Event (Same as previous)

The Gateway validates the user’s identity, enforces rate limits, and forwards the request to the appropriate Chat Server based on routing logic like consistent hashing of the chat_id.

3. Chat Server Validates Chat Membership (Same as previous)

The Chat Server queries the chat_members table to confirm the sender belongs to the chat. It then fetches the list of all chat participants.

4. Chat Server Determines Online vs. Offline Users

Using the list of participants, the Chat Server queries the device_sessions table to identify which users are currently online. A session is considered active if the status is "active" and last_heartbeat is within an acceptable threshold.
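A small sketch of this online/offline split, assuming an in-memory view of device_sessions rows with status and last_heartbeat fields; the 30-second threshold is an assumption:

import time

HEARTBEAT_TTL_SECONDS = 30  # assumption: a session is stale after 30s without a heartbeat

def split_online_offline(participants, device_sessions):
    """Partition chat participants by whether they have a fresh, active session.

    `device_sessions` maps user_id -> list of rows with 'status' and
    'last_heartbeat' (epoch seconds), mirroring the table described above.
    """
    now = time.time()
    online, offline = set(), set()
    for user_id in participants:
        sessions = device_sessions.get(user_id, [])
        if any(s["status"] == "active" and now - s["last_heartbeat"] < HEARTBEAT_TTL_SECONDS
               for s in sessions):
            online.add(user_id)
        else:
            offline.add(user_id)
    return online, offline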

5. Message is Persisted in the Database (Same as previous)

A new record is inserted into the messages table with metadata such as message_id, chat_id, sender_id, content, and optional media_id. This ensures durable storage before any delivery.

6. Insert into Inbox for Offline Users

For every offline participant, a record is created in the inbox table. Each entry contains user_id, message_id, created_at, and an optional delivered_at timestamp once delivered.

7. Real-Time Fanout via WebSocket Server for Online Users

The Chat Server pushes the message to the WebSocket Server, which fans out a new_message event to all connected clients based on their ws_connection_id in device_sessions.

8. Push Notification for Offline Devices

If an offline user has a registered push_token, the Chat Server triggers a notification through the Push Notification Server (e.g., FCM, APNs). This is only a notification and does not mark the message as delivered.

9. Message Replay on Reconnect

When an offline user comes back online (e.g., reopens the app), the Gateway re-establishes the WebSocket connection. The WebSocket Server queries the inbox table and sends all undelivered messages to the client. Delivered messages are then marked with delivered_at.
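A minimal sketch of the replay step, assuming the inbox and messages tables summarized later in this article and a placeholder send_to_client function for the WebSocket push:

def replay_inbox(conn, user_id, send_to_client):
    """Deliver undelivered inbox entries to a freshly reconnected client."""
    rows = conn.execute(
        "SELECT i.message_id, m.chat_id, m.sender_id, m.content "
        "FROM inbox i JOIN messages m ON m.message_id = i.message_id "
        "WHERE i.user_id = ? AND i.delivered = 0 "
        "ORDER BY i.inserted_at",
        (user_id,),
    ).fetchall()
    for message_id, chat_id, sender_id, content in rows:
        # send_to_client stands in for the WebSocket push on this session.
        send_to_client({"event": "new_message", "payload": {
            "chat_id": chat_id, "message_id": message_id,
            "sender_id": sender_id, "content": content}})
        conn.execute(
            "UPDATE inbox SET delivered = 1 WHERE user_id = ? AND message_id = ?",
            (user_id, message_id),
        )
    conn.commit()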

10. Cleanup via Retention Policies

A background cron job routinely scans the inbox table to delete old undelivered messages (e.g., after 30 days) based on the system’s data retention policy. This ensures storage efficiency and limits stale state buildup.

Let’s say User B has been offline for more than 30 days, and your system’s inbox retention policy only keeps undelivered messages for that period. This can happen after long PTO or an extended app absence. What now?

  • The system deletes the entry from the inbox table via a cleanup job.
  • When User B comes back online, the system no longer retries fanout of those older messages.
  • However, the original messages still exist in the messages table — they’re just not pushed to the client automatically.
  • It becomes the client's responsibility to fetch full chat history (e.g., via scrollback or pagination APIs).

This separation ensures storage doesn’t grow unbounded, while still allowing access to historical messages via manual retrieval, not real-time delivery.

Staff+ Tip: During interviews, highlight the trade-off here — low storage vs. guaranteed delivery — and ask whether durability vs. push guarantees are critical for your use case (e.g., ephemeral chats vs. permanent rooms).

FR4 - User can delete their own messages from a chat

Now let’s answer one of the challenging questions we asked at the beginning of the article: how to support message deletion.

1. Client Sends delete_message Request

The deletion process begins when the sender (Client A) initiates a message deletion from the chat UI. The client emits a delete_message WebSocket event containing the message_id and chat_id in the payload. This request is routed through the Gateway, just like send_message, and forwarded to the appropriate Chat Server instance.

2. Gateway Authenticates and Routes the Deletion

The Gateway performs standard checks — user authentication, rate limiting, and routing — then forwards the delete_message request to the corresponding Chat Server. It does not modify the payload or perform authorization beyond the identity of the sender.

3. Chat Server Authorizes and Soft Deletes Message

Upon receiving the request, the Chat Server first validates that the message exists and that the requesting user is indeed the original sender. It queries the messages table by message_id and checks that sender_id == user_id. If the check passes, the Chat Server updates the message row to mark it as deleted. This is implemented as a soft delete by adding a new boolean column is_deleted (default: false) to the messages table and setting it to true.
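A minimal sketch of this authorize-and-soft-delete step, assuming an is_deleted column on the messages table as described (sqlite3 keeps the example self-contained):

def soft_delete_message(conn, chat_id, message_id, requesting_user_id):
    """Soft-delete a message only if the requester is its original sender."""
    updated = conn.execute(
        "UPDATE messages SET is_deleted = 1 "
        "WHERE message_id = ? AND chat_id = ? AND sender_id = ? AND is_deleted = 0",
        (message_id, chat_id, requesting_user_id),
    ).rowcount
    conn.commit()
    if updated == 0:
        raise PermissionError("message not found, already deleted, or not owned by requester")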

4. Chat Server Fans Out Deletion to Online Participants

Next, the Chat Server queries the chat_members table to identify all participants of the chat. It then cross-references the device_sessions table to determine which participants are currently online. For those with active WebSocket connections, a message_deleted event is immediately emitted from the WS Server, instructing the clients to remove the message with that message_id from their view.

5. WS Server Generates Inbox Records for Offline Users

For chat members who are offline (i.e., without active WebSocket sessions), the WebSocket Server inserts an entry into the inbox table for each message that needs to be delivered later. Each row contains the user_id, message_id, delivered = false, inserted_at, and to_be_deleted = false.

If the sender deletes a message while some recipients are still offline, the system checks whether it had already been delivered to each user. If the message was never delivered (i.e., delivered = false), the corresponding inbox row is either deleted or skipped — no message_deleted event is sent to the client. The recipient will never know that message existed.

Only messages that were previously delivered will emit a message_deleted event when deleted, ensuring the interface is updated accordingly. This approach keeps the system clean and intuitive for the recipient, and reduces unnecessary deletion traffic for messages they never saw.

6. Deleted Message is Omitted from Scrollback Queries

The deletion is visually enforced by excluding messages with is_deleted = true in any message list query. When offline clients come back online and load the chat history, the deleted message will be filtered out entirely, appearing as if it never existed.

Unlike some chat apps that leave behind a “This message was deleted” placeholder, Slack removes deleted messages entirely from the chat view — for all participants.

Here’s what happens:

  • When the backend confirms deletion, each online client receives a message_deleted WebSocket event.
  • Clients locate the message via message_id and remove it from the UI — no trace left.
  • If the user was offline, the message is missing from scrollback when they return (due to filtering is_deleted = true).

Why is this delightful? Because the message feels like it never existed — no awkward “(deleted)” clutter.

Interview Tip: Show you understand this UX detail. While the backend may retain data for auditing, Slack’s frontend performs a hard visual delete — clean and user-friendly.

For a Slack-like chat system, the ideal primary database should balance strong consistency, write throughput, and query flexibility.

Relational databases like MySQL or PostgreSQL are excellent for core entities like users, messages, chats, and chat_members, due to ACID support and predictable indexing.

However, scaling vanilla RDBMS for billions of messages is hard — which is why systems like Slack use Vitess, a sharding middleware that horizontally scales MySQL while preserving SQL semantics.

On the other hand, NoSQL solutions like DynamoDB optimize for scale and availability.

Benefits include:

  • ⚡ Single-digit millisecond latency
  • 📈 Auto-scaling throughput
  • 🔁 High availability and denormalized writes
  • 🔑 Composite keys like chat_id + created_at for ordered access

Trade-offs: weaker consistency, no joins, and complex schema evolution.

Interview tip: Great answers don’t just name a database — they explain trade-offs. Top candidates compare SQL and NoSQL trade-offs based on the system's scale, query patterns, and latency requirements.

Data Schema Summary

After walking through the high-level design and functional workflows, it's crucial to ground our system with a clear view of the data model. The tables presented below summarize the persistent entities that power our Slack-like chat system — from messages and media to session tracking and inbox management. This snapshot not only helps engineers understand what gets stored and queried behind each API or flow, but also serves as a bridge to implementation. Defining schema early also surfaces key tradeoffs around indexing, normalization, and consistency boundaries — especially at scale.

Users
Column | Type | Description
user_id | UUID | Primary key
name | String | User display name
email | String | Login or contact identifier
created_at | Timestamp | Time of account creation

Chats
Column | Type | Description
chat_id | UUID | Primary key
name | String | Optional group chat name
is_group | Boolean | Indicates group vs 1:1 chat
created_at | Timestamp | Chat creation time

Chat_members
Column | Type | Description
chat_id | UUID | Foreign key to chats
user_id | UUID | Foreign key to users
joined_at | Timestamp | When user joined the chat
role | String | e.g. 'member', 'admin'

Messages
Column | Type | Description
message_id | UUID | Primary key
chat_id | UUID | Foreign key to chats
sender_id | UUID | Foreign key to users
content | Text | Text body of the message
media_url | String | Optional signed URL for media
parent_message_id | UUID | If threaded, points to parent message
sequence_number | Integer | Monotonic ID per chat for ordering
created_at | Timestamp | When message was sent
deleted_at | Timestamp | Null unless soft-deleted

Media_objects
Column | Type | Description
media_id | UUID | Primary key
user_id | UUID | Uploader (foreign key to users)
file_name | String | Original file name
storage_url | String | S3 signed object URL
created_at | Timestamp | Upload time

Device_sessions
Column | Type | Description
session_id | UUID | Unique device session ID
user_id | UUID | Owner of session
ws_server | String | WebSocket server handling this session
connected_at | Timestamp | When connection started
last_heartbeat | Timestamp | For TTL / lease validity

Inbox
Column | Type | Description
user_id | UUID | Recipient (foreign key to users)
message_id | UUID | Message to be delivered
delivered | Boolean | True if delivered via push or WS
inserted_at | Timestamp | When inserted into inbox
to_be_deleted | Boolean | True if deleted by sender

Deep Dives

While the current design covers the end-to-end messaging, media handling, notifications, and deletion workflows, two complex areas remain unresolved and require deeper architectural consideration. First, ensuring strict message ordering and multi-device consistency becomes challenging when users operate from multiple clients — race conditions, duplicate receipts, and out-of-order views can easily arise without proper state coordination. Second, the system must scale to millions of users while maintaining real-time responsiveness, which puts pressure on the WebSocket infrastructure and active session lookup mechanism — especially when determining online/offline status at scale or routing fan-out efficiently. To address these bottlenecks and fulfill our original challenges, we’ll dedicate three deep dives to:

  1. Message Ordering in Group Chat at scale
  2. Scalability of WebSocket Infrastructure & Active Session Lookup
  3. Multi-Device Consistency

DD1 - Message Ordering Consistency

Where the Problem Surfaces in Our Design

In our architecture, messages flow from Client → Gateway → Chat Server → WebSocket Server → Devices, and are stored centrally in the messages table. In group chat scenarios, multiple users may send messages nearly simultaneously from different devices, networks, and regions.

Without a strict ordering mechanism, this can lead to message interleaving — where the order of messages received depends on network timing, not intent. For example:

  • User A sends “Let’s meet at 5.”
  • At the same time, User B replies “Sounds good!”

Depending on delivery race conditions, these messages could appear out of order on different user devices — disrupting the conversational flow and creating confusion in fast-moving chats.

This problem is amplified at scale when millions of users are participating across thousands of chats, potentially spanning distributed infrastructure.

To solve this problem and improve the performance of our current design, there are 3 options to discuss.

Option 1: Client-Side Timestamps

In this approach, the client attaches a timestamp when sending a message, and the server simply stores and uses that for ordering. While this may seem straightforward, it introduces significant risk in distributed multi-device scenarios: user devices often have unsynchronized clocks, leading to inconsistent ordering across clients. Worse, malicious or buggy clients could manipulate timestamps to reorder the conversation. This option is too fragile to support strong consistency expectations in real-time chat.

Option 2: Server-Side Sequence Number per Chat

Here, the Chat Server maintains an atomic, per-chat counter (e.g., chat_456 → seq=502) that increments with each new message. Each message is assigned a unique, monotonically increasing sequence_number during ingestion and persisted in the messages table. All queries, rendering, and read receipts use this sequence for deterministic order. This guarantees correct ordering even with concurrent multi-device sends. The downside is that atomic increments can become a bottleneck at high scale unless each chat is strictly routed to a single partition or server shard. Still, for many medium-scale systems, this option balances correctness with implementation simplicity.
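A minimal sketch of Option 2 using a Redis counter per chat; the key name is illustrative, and in practice the counter could equally live in the chat’s home shard:

# Sketch of Option 2: per-chat monotonic sequence numbers using Redis INCR.
import redis

r = redis.Redis()  # placeholder connection settings

def next_sequence_number(chat_id: str) -> int:
    # INCR is atomic, so concurrent senders in the same chat get distinct,
    # monotonically increasing sequence numbers.
    return r.incr(f"chat_seq:{chat_id}")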

Option 3: Kafka-Based Ordered Ingestion per Chat (Recommended)

This design routes all incoming messages through Kafka, mapping each chat to a specific Kafka partition using a consistent hash of chat_id. Kafka natively enforces strict message order per partition, so concurrent sends across devices are correctly serialized. A consumer service reads from each partition, assigns a sequence_number, and writes messages into the database in order. This architecture scales horizontally — as chats grow, more partitions can be added. It also enables async processing like analytics or moderation. The trade-off is increased infrastructure complexity (Kafka ops, offset management, retry logic). But for high-scale systems like Slack, this model decouples ingestion from persistence while providing strong ordering guarantees out-of-the-box.
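A minimal sketch of the producer side of Option 3 using kafka-python; the topic name and broker address are placeholders. Keying by chat_id is what pins a chat to one partition and gives per-chat ordering:

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # placeholder broker
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_message(chat_id: str, message: dict):
    # Same key -> same partition -> strict per-chat ordering from Kafka.
    producer.send("chat-messages", key=chat_id, value=message)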

Based on the previous discussion, our diagram improves as follows:

When Kafka partitions are based on chat_id, ultra-active chats like #all-hands, town hall channels, or celebrity support threads can cause a hot partition — where one Kafka partition and its consumer get overwhelmed while others stay idle.

This creates real production issues:

  • Back-pressure in the message pipeline
  • Increased tail latency for that chat only
  • Stuck fan-out or delayed delivery for online/offline recipients

To mitigate this, consider:

  • Dedicated Kafka topics for high-traffic chats (scales well but adds ops burden)
  • Hybrid fan-out models, where Redis is used for high-fanout rooms
  • Dedicated fan-out workers for hot chats to isolate and scale independently

Staff+ Candidates Tip:
Always flag this risk in interviews — hot shards are a real-world scalability killer, and handling them gracefully shows deep operational maturity.

DD2 - Scalability of WebSocket Infrastructure & Active Session Lookup

Where the Problem Lies

At moderate scale, it’s feasible to have the Chat Server or Fanout Worker query a central device_sessions table or in-memory cache (like Redis) to discover where users are connected. But at the scale of 100M+ concurrent WebSocket sessions, even doing lookups or broadcasting to the right subset becomes expensive.

Three core scalability issues emerge:

  • Active session population becomes a bottleneck. You can’t reliably do DB queries for every message just to find out which devices are online.
  • Churn from user connect/disconnect creates delivery uncertainty. As users rapidly go online/offline across devices or networks, ensuring correct fanout (and avoiding duplicate or missed messages) becomes difficult at scale.
  • Backend storage pressure (DB write & read load). High message throughput plus real-time reads can overwhelm underlying databases unless optimized with caching, denormalization, or async processing.

Let’s break down how to solve them.

Scalability Bottleneck 1: Active Session Population

At 100M+ concurrent users, the challenge isn’t just about querying where each user is — it’s about how WebSocket servers can efficiently populate and manage active session data in real time, without overloading Redis or requiring centralized lookup for every message.

To avoid this, we recommend shifting to a Pub/Sub fanout model, which scales much more effectively. We can publish all messages to chat-specific channels in a distributed Pub/Sub system (e.g., Redis Streams, Kafka, or NATS). Each WebSocket server dynamically subscribes to the chat channels for the rooms its connected users participate in.

  • Message Published Once: Fanout workers publish the message to chat_channel:<chat_id>.
  • Selective Subscription: Each WebSocket server subscribes only to the channels it needs — no global broadcasting.
  • Local Filtering & Delivery: When a message arrives, the server uses its local session map to forward it to connected users — no Redis or DB lookup needed.
  • Offline Handling: Users who are disconnected won't receive the message live. Their delivery falls back to inbox update or push notification.
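
Here is a minimal sketch of the subscribe-and-forward loop on a WebSocket server using Redis Pub/Sub; it assumes the published event carries the recipient list, and ws.send stands in for whatever your WebSocket library exposes:

# Sketch of the Pub/Sub fanout path on a WebSocket server.
import json
import redis

r = redis.Redis()
local_sessions = {}  # user_id -> list of live WebSocket connections on this node

def subscribe_and_forward(chat_ids):
    pubsub = r.pubsub()
    pubsub.subscribe(*[f"chat_channel:{cid}" for cid in chat_ids])
    for item in pubsub.listen():
        if item["type"] != "message":
            continue
        event = json.loads(item["data"])
        # Local filtering: deliver only to users connected to THIS server.
        for user_id in event["recipient_ids"]:
            for ws in local_sessions.get(user_id, []):
                ws.send(json.dumps({"event": "new_message", "payload": event}))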

Here is the updated design diagram with data flow illustration:

Scalability Bottleneck 2: WebSocket Server Churn (Users Moving Between Nodes)

Even with the recommended solution above, each WebSocket server in our current design dynamically subscribes to Redis Pub/Sub channels like chat_channel:<chat_id>, based on the chat rooms its connected users participate in. But if a user churns (e.g., disconnects and reconnects quickly on a different server), several issues arise:

  • Stale subscriptions: the old WS server is still subscribed to channels for a user it no longer owns.
  • Missed/unnecessary fanout: Redis may still push messages to the old WS server, wasting network and compute.
  • Race conditions: during rapid reconnects, both the old and new servers may subscribe at the same time.

So the goal here is to ensure only one WebSocket server is subscribed to a given user's chat channels — the one currently serving that user — and update this dynamically as users move.

Here are two options to consider:

Option 1: Always Subscribe to All Chat Channels on Connection

In this approach, when a user connects to a WebSocket server, that server looks up all the chats the user is part of (via chat_members table) and immediately subscribes to all corresponding Redis Pub/Sub channels (e.g., chat_channel:<chat_id>). This ensures that the server receives all relevant messages during the user's session. However, if the user reconnects and is routed to a different WebSocket server, both servers may now be subscribed to the same chat channels, potentially duplicating effort. There's no coordination to ensure only one server is actively managing the delivery. This makes the system simpler to implement but leads to wasteful fanout, unnecessary memory pressure, and risk of duplicate messages unless clients de-dupe.

Option 2: Leased Session Ownership with Redis TTL (Recommended)

This approach uses Redis to coordinate which WebSocket server owns a user's session at any given time. When a user connects, the WS server sets a key like user:<user_id> = <ws_id> with a short TTL (e.g., 10 seconds) and extends it periodically with a heartbeat (PEXPIRE). Only if a server holds the valid lease does it subscribe to the user’s chat_channel:<chat_id> channels. This guarantees that only one WS server is responsible for delivery per user, avoiding duplicates and reducing fanout overhead. It requires more coordination infrastructure and careful handling of edge cases like lease expiration or reconnect races, but it pays off in cleaner scalability and efficiency at very large user scale.
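A minimal sketch of the lease mechanics with redis-py; key names and the TTL are illustrative, and the check-then-extend heartbeat would typically be made atomic with a small Lua script in production:

# Sketch of Option 2: per-user session lease in Redis.
import redis

r = redis.Redis()
LEASE_TTL_SECONDS = 10

def try_acquire_session(user_id: str, ws_server_id: str) -> bool:
    # SET ... NX EX only succeeds if no other server currently holds the lease.
    return bool(r.set(f"user:{user_id}", ws_server_id, nx=True, ex=LEASE_TTL_SECONDS))

def heartbeat(user_id: str, ws_server_id: str) -> bool:
    # Extend the lease only if this server still owns it.
    if r.get(f"user:{user_id}") == ws_server_id.encode():
        return bool(r.pexpire(f"user:{user_id}", LEASE_TTL_SECONDS * 1000))
    return False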

Let’s walk through what happens behind the scenes when User A moves between WebSocket servers (e.g., mobile switch, network drop):

Step-by-step Flow:

  1. Initial Connection
    • User A connects to WebSocket Server ws-1.
    • ws-1 sets a Redis key: SET user:A ws-1 EX 10 (a 10-second lease).
    • ws-1 subscribes to all channels: chat_channel:<chat_id>.
    • Heartbeats (PEXPIRE) extend the lease.
  2. Unexpected Disconnect (e.g., network drop)
    • ws-1 stops receiving heartbeats from the client, so it stops extending the lease.
    • Once the TTL elapses, the Redis lease expires.
    • ws-1 can safely unsubscribe or ignore messages.
  3. Reconnect via New WebSocket Server
    • User A connects to ws-2.
    • ws-2 sets the lease: SET user:A ws-2 NX EX 10.
    • ws-2 becomes the session owner and subscribes to chat_channel:<chat_id>.
  4. Message Delivery Resumes
    • Redis Pub/Sub sends messages only to ws-2.
    • ws-2 delivers messages to User A.
    • No fanout to ws-1, which no longer holds the lease.

Based on Option 2, which we recommend, here is the updated diagram:

Scalability Bottleneck 3: Backend Storage Pressure (DB Write & Read Load)

At the scale of billions of users and hundreds of millions of concurrent sessions, your backend storage becomes a critical bottleneck — particularly around fanout writes, message reads, and metadata lookups during WebSocket reconnects or chat bootstrapping.

Two primary stress points:

  1. High write throughput to messages and inbox tables
    Every chat message triggers N inbox writes (for N recipients), plus fanout events, which — at large group scale — translates to millions of writes per second. If your database is not horizontally partitioned, it becomes overwhelmed by concurrent insert and update traffic.
  2. High read amplification for chat metadata and user session state
    Every WebSocket reconnection or fanout event requires access to chat_members, device_sessions, and recent messages. Doing these reads from a monolithic DB becomes infeasible under heavy churn or reconnect storms.

There are several strategies we could apply; in an interview, covering the two main points below should be enough.

Strategy 1: Horizontal Partitioning of Messages & Inbox Tables

To handle the massive write load — especially for group messages that fan out to many recipients — we recommend horizontally partitioning the messages and inbox tables. This can be done by sharding based on chat_id (for messages) and recipient_user_id (for inbox). This distributes the write and query load across multiple physical nodes or database instances. A NoSQL system (like DynamoDB with composite keys) can be used, depending on your durability and consistency requirements. The key tradeoff is managing cross-shard queries (e.g., listing a user’s recent chats), which must be handled carefully via query fanout or index denormalization.

Strategy 2: Read-Optimized Caching for Hot Metadata & Fanout Paths

For high-throughput lookups (e.g., chat_members, device_sessions, user presence), introducing dedicated caching layers like Redis or Memcached can significantly offload the primary database. For example:

  • Cache active chat membership per user (user:<id>:chats)
  • Cache online device sessions (user:<id>:devices)
  • Cache recent message slices (chat:<id>:recent_messages)

These caches should be asynchronously updated via change data capture (CDC) streams or near-real-time workers. They reduce DB read QPS during WebSocket reconnection, chat scrollback, or large fanouts. However, challenges include cache invalidation on membership changes and consistency under race conditions, so TTLs and refresh policies must be carefully tuned.
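A minimal cache-aside sketch for one of these lookups (chat membership); key names and the TTL are illustrative, and load_from_db stands in for the primary-store query:

# Cache-aside lookup for hot chat membership data.
import json
import redis

r = redis.Redis()
MEMBERSHIP_TTL_SECONDS = 300  # short TTL limits staleness after membership changes

def get_chat_members(chat_id: str, load_from_db):
    key = f"chat:{chat_id}:members"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    members = load_from_db(chat_id)  # fall back to the primary store on a miss
    r.set(key, json.dumps(members), ex=MEMBERSHIP_TTL_SECONDS)
    return members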

Hence, with all the recommended improvements, the design diagram looks like this:

DD3 - Multi-Device Management

Where the Problem Lies

In modern messaging platforms, it’s common for users to be logged in on multiple devices simultaneously — e.g., phone, laptop, tablet. This creates a new class of delivery and consistency challenges. Our system must ensure that:

  • All active devices receive real-time delivery for every message.
  • Devices stay in sync with message state (e.g., de-duping, ordering).
  • Disconnected sessions receive reliable replay when reconnecting.
  • We avoid duplicated fanout or missing messages.

At small scale, this seems straightforward. But as the number of connected devices grows (e.g., 3+ per user, 1B+ users), the operational overhead — from live fanout to reconnection handling — becomes massive.

Proposed Solution: Per-Device Session Registry + Inbox Replay

To support seamless multi-device experience, we break the problem into online and offline flows, and handle them with a mix of local mapping and durable persistence.

Step-by-Step Message Delivery Flow (Multi-Device Aware)

1. Device Connects

  • When a user device connects to the WebSocket server, it registers its session by writing device_sessions:{user_id}:{device_id} → ws_connection (typically stored in Redis or an in-memory map).

2. Message Fanout

  • When a new message is delivered (via Pub/Sub fanout), the WebSocket server looks up all active sessions for the recipient user.
  • It sends the message to each connected device (e.g., both phone and desktop).

3. Offline Fallback

  • If a device isn’t connected, the message is stored in the inbox table, keyed by user_id and device_id.
  • This ensures the message is queued for delivery when that device comes back online.

4. Device Reconnects

  • Upon reconnection, the server checks the inbox for that specific (user_id, device_id) and delivers any missed messages.
  • After successful delivery, the entries are cleared or marked as delivered.

Bonus Point – Client-side message de-duplication logic

In a multi-device setup, the same user might receive the same message twice — once via WebSocket on one device, and again via inbox replay on another device after reconnecting. Without deduplication, this results in duplicated messages, phantom notifications, or confusing message history.

How It Works:

  • Every message carries a globally unique message_id, assigned at ingestion.
  • Each client maintains a local cache of recently received message_ids (e.g., 1000 entries).
  • On message receive:
    • If message_id is new → render and store.
    • If already seen → skip render, silently drop.

This ensures consistent user experience across reconnects, devices, and delivery paths — without relying on perfect backend deduplication.
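A minimal sketch of that client-side cache, assuming a bound of roughly 1,000 recently seen message_ids as mentioned above:

from collections import OrderedDict

class MessageDeduper:
    """Client-side de-duplication over the last N message_ids."""

    def __init__(self, max_entries: int = 1000):
        self.seen = OrderedDict()
        self.max_entries = max_entries

    def should_render(self, message_id: str) -> bool:
        if message_id in self.seen:
            self.seen.move_to_end(message_id)
            return False  # duplicate delivery (e.g., WebSocket + inbox replay)
        self.seen[message_id] = True
        if len(self.seen) > self.max_entries:
            self.seen.popitem(last=False)  # evict the oldest entry
        return True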

Final Thoughts

Designing a Slack-like system goes far beyond just sending and storing messages. It’s about architecting a real-time, distributed, resilient messaging backbone that can support massive concurrency and device diversity — all while keeping latency low and user experience seamless.

In this article, we walked through every major building block — from creating chats, managing group members, to sending rich media, storing messages, and triggering WebSocket or push notifications. More importantly, we explored deep-dive design tradeoffs that help you answer four of the toughest system design challenges often brought up in interviews:

  1. Message Ordering in Group Chat: We use Kafka-based ingestion and per-chat sequence numbers to enforce a canonical timeline for all messages — ensuring consistent rendering even when messages are sent concurrently from distributed devices.
  2. Real-Time Notifications at 100M+ Scale: We designed an online + offline fanout mechanism using WebSocket and inbox tables, and built in notification delivery through Redis pub/sub and fallback logic for offline users — without overloading the backend.
  3. Deletion Propagation with Retention Guarantees: We built an audit-compliant deletion flow that respects sender-side deletion, ensures correct client updates across devices, and prevents resurfacing of deleted content via inbox filtering or replay rules.
  4. WebSocket Churn and Multi-Device Reconnects at Scale: We tackled infrastructure churn by introducing session leasing with Redis TTL, a pub-sub based fanout by chat channel, and client-side deduplication to handle reconnect storms — ensuring clean handoff and minimal duplication when users rapidly connect from multiple locations.

Together, these answers formed the backbone of a resilient, scalable chat infrastructure. Each challenge was covered with trade-offs between consistency, scalability, latency, and complexity — and our deep dives laid out concrete, production-ready solutions for each.

How can ShowOffer help you?

We've included callouts and open-ended design prompts throughout this write-up — perfect for self-practice or interview discussion. If you want to walk through this system design in a coaching session with the author, book a session at ShowOffer.io. We're here to help you sharpen your skills, gain confidence, and land your next big offer.

Coach + Mock
Practice with a Senior+ engineer who just got an offer from your dream (FAANG) company.
Schedule Now