Design a real-time leaderboard service for an online game



Design a real-time leaderboard service for an online game that:

  • Continuously ingests players’ scores.
  • Lets players see how they rank:
    • Globally (top‑N and current rank).
    • Locally around them (K players above and below).
    • Within their social graph (friends‑only leaderboard).
    • Within time windows such as daily, weekly, and seasonal leaderboards.

Here are the scoping assumptions we'll work with; the design that follows aligns with them:

1. Game & Score Model

  • Score is an integer.
  • Each player has one active highest score (mode = “HIGH_SCORE”) per leaderboard (per game × season × region × period type) ⇒ Score is non-decreasing within a leaderboard period.
  • System is multi-game and multi-leaderboard (e.g., different modes), but each leaderboard is keyed separately.

2. Scale & Traffic Model

  • 100M DAU.
  • Writes:
    • 100–300M score updates / day.
    • Peak: ~10K score updates / second (across all regions).
  • Reads:
    • ~5× write volume → ≈ 1B leaderboard reads / day.
    • Mix of:
      • “My rank + K neighbors”.
      • Top-N (global / regional / time-windowed).
      • Friends-only leaderboards.

3. Leaderboard Scope (Multi-Dimension Support)

  • Global leaderboard — Per game / per season, across all regions.
  • Regional leaderboard — Regional splits such as NA / EU / APAC.
  • Time-window leaderboard
    • Daily / weekly / monthly leaderboards.
    • Each period starts at a fixed calendar boundary (e.g., the first day of the period); there are no rolling windows.
  • Friends leaderboard — Rankings restricted to the user’s friends list (social graph provided by a separate service).

Functional Requirements

FR1 – Ingest & Update Player Scores

  • The system must accept score updates for a player on a given leaderboard.
  • Input includes: player_id, leaderboard_id (game / mode / region / period), and new_score plus an idempotency key (e.g., match_id).
  • The system must maintain a non-decreasing integer score per (player_id, leaderboard_id).

FR2 – View My Global Rank & K Neighbors

  • Given player_id and leaderboard_id, the system must return:
    • The player’s current score.
    • The player’s current global rank.
    • Up to K players above and K players below the user on the global leaderboard.
  • Must handle edge cases (player near top/bottom → fewer than K neighbors).
  • Results should reflect recent updates in near real-time (tied to our <500 ms update SLO).

FR3 – View K Friends Around Me

  • Given player_id, leaderboard_id, and K, the system must return:
    • The player’s rank and score within their friends-only leaderboard.
    • Up to K friends above and K friends below them.
  • Uses an external friends/social graph as input; our service filters or indexes rankings based on that list.

FR4 – View Top-N Leaderboards (Global / Regional / Time-Windowed)

  • Given a leaderboard_id and N, the system must return the top-N entries (player_id, score, rank) for that leaderboard.
  • Must support multiple scopes via leaderboard_id:
    • Global (per game / per season, all regions).
    • Regional (e.g., NA / EU / APAC).
    • Time-windowed (daily / weekly / monthly / seasonal, each with a strict start date).
  • Used for classic “Top 100 / Top 1000” views and tournament pages, and should refresh near real-time as scores change.

These aren’t just “screens” the product wants — each FR encodes a distinct access pattern that drives the whole design:

  • FR1 (Ingest score) → ultra-high write throughput and idempotent updates per (player_id, leaderboard_id).
  • FR2 (My global rank + K neighbors) → fast range queries around a single user on a sorted index (the core “ranked set” problem).
  • FR3 (My rank among friends) → intersection of the global ranking with a dynamic friend set (social-graph flavored ranking).
  • FR4 (Top-N by scope/time window) → hot prefix reads on a sorted structure, plus multi-dimensional keys (game × region × period).

If you get these four access patterns right, most follow-up questions (sharding, caching, K-neighbors, hotspots, recovery) become specific implementation choices, not surprises.

Non-Functional Requirements

NFR1 – Correctness & Determinism

  • A player’s score on a leaderboard is never lost, mis-ordered, or double-counted.
  • Updates are monotonic and idempotent per (player_id, leaderboard_id) (e.g., keyed by match_id).

NFR2 – Real-time Freshness & Latency

  • Write → visible rank:
    • p95 ≤ 500 ms, p99 ≈ 1 s for the player’s own score.
  • Read APIs (my rank, neighbors, top-N), server-side:
    • p95 ≤ 100 ms, p99 ≤ 500 ms.
  • It’s fine if some pages (e.g., big top-N lists) sit closer to the tail; players experience this as “instant enough” as long as it’s clearly under ~0.5–1 s end-to-end including network.

NFR3 – Availability & Graceful Degradation

  • Leaderboard reads and score updates: ≥ 99.9% availability.
  • Under failures, prefer serving last-known data (cached rank / top-N) over errors.
  • If degraded, surface simple hints like “last updated X seconds ago” instead of breaking UX.

NFR4 – Scalability & Hotspot Resistance

  • Handle 100–300M score updates/day → peak around 10K updates/sec.
  • Handle ~1B leaderboard reads/day (~5× writes).
  • Sharding / caching must avoid any single hot shard (e.g., top of global board or celebrity profiles) becoming a bottleneck.
  • System scales horizontally by adding nodes, not by vertically beefing up a single box.
▶ 💡 Why This Order?
  • NFR1 (Correctness) – If the leaderboard shows the wrong score or rank for you, the product loses trust immediately; correctness for your own score is non‑negotiable.
  • NFR2 (Freshness & Latency) – Once it’s correct, it needs to feel real-time and snappy; otherwise players assume it’s buggy or laggy.
  • NFR3 (Availability) – A correct, fast leaderboard that’s often down is still useless, so we keep it up and degrading gracefully.
  • NFR4 (Scalability) – Finally, we make sure the same guarantees hold at 100M DAU and beyond, without hot shards melting the system.

High Level Design

FR1 – Ingest & Update Player Scores

API (Abstract)

We’ll assume the game backend calls us after each match.

1. Endpoint (abstract)

POST /leaderboards/{leaderboard_id}/players/{player_id}/score
Content-Type: application/json

{
  "new_score": 12345,
  "match_id": "match-uuid-123"    // idempotency key
}

2. Response (example)

{ "player_id": "player-123", "leaderboard_id": "lb-ranked-na-s12-week-2025-11-17", "score": 12345, "previous_score": 12000, "last_match_id": "match-uuid-123", // last match that actually change sco "updated_at": "2025-11-17T10:15:00Z", "update_applied": true // false if this was a pure idempotent retry }

Core Entities

1. leaderboards

  • Describes one concrete leaderboard: game + mode + region + season + time window.
  • Tells us where a score update should go and whether that leaderboard is active.
  • Encodes score policy so we know we’re in the HIGH_SCORE world (monotonic best-score).

2. score_events (append-only log of raw updates)

  • Logs every match event that tried to update a leaderboard.
  • Useful for audit, debugging, and later replay / rebuild if needed.
  • Enforces uniqueness on (leaderboard_id, player_id, match_id) so the same match can’t be ingested twice.

3. leaderboard_scores (canonical per-player state)

  • Holds the current visible score per (player, leaderboard).
  • This is the row all future FRs will query for:
    • My score, my rank, K neighbors, top-N, etc.
  • Encodes the HIGH_SCORE, monotonic semantics and idempotency.
▶ Score Table: leaderboard_scores

  • leaderboard_id – Leaderboard this score belongs to (PK part 1; FK to leaderboards).
  • player_id – Player identifier (PK part 2).
  • score – Current high score for this player on this leaderboard (monotonic under HIGH_SCORE policy).
  • last_match_id – The last match that actually changed this high score (used for idempotency and debugging).
  • last_update_ts – When this high score was last updated.
  • version – Optional version counter for optimistic locking / concurrency control.

Workflow

Step 1 - Match service sends score update

  • After a match, the match service POSTs leaderboard_id, player_id, match_id, and new_score to the API-GW.

Step 2 - API-GW routes to Score Ingestion Service

  • API-GW does auth / basic validation and forwards the request to the Score Ingestion Service.

Step 3 - Score Ingestion validates leaderboard & logs event

  • Checks that the leaderboard_id exists and is ACTIVE.
  • Inserts a row into score_events (leaderboard_id, player_id, match_id, new_score, event_ts).
  • The unique index on (leaderboard_id, player_id, match_id) enforces idempotency (duplicate match → treated as retry).

Step 4 - Apply HIGH_SCORE rule using leaderboard_scores

  • Reads current row for (leaderboard_id, player_id) from leaderboard_scores.
  • If match_id == last_match_id → retry, no change.
  • Else if new_score <= current score → not a new high score, no change.
  • Else → update leaderboard_scores with score = new_score, last_match_id = match_id, last_update_ts = now().
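
To make Step 4 concrete, here is a minimal Python sketch of the HIGH_SCORE + idempotency check. The db helper (fetch_one / execute) and the exact SQL dialect are assumptions, not a prescribed implementation; a production version would run the read-check-update inside a transaction or as a single conditional UPDATE.

# Sketch of FR1 Step 4 (HIGH_SCORE + idempotency); `db` is an assumed DB helper.
from datetime import datetime, timezone

def apply_high_score(db, leaderboard_id: str, player_id: str, match_id: str, new_score: int) -> dict:
    row = db.fetch_one(
        "SELECT score, last_match_id, version FROM leaderboard_scores "
        "WHERE leaderboard_id = %s AND player_id = %s",
        (leaderboard_id, player_id),
    )
    if row is None:
        db.execute(
            "INSERT INTO leaderboard_scores "
            "(leaderboard_id, player_id, score, last_match_id, last_update_ts, version) "
            "VALUES (%s, %s, %s, %s, %s, 1)",
            (leaderboard_id, player_id, new_score, match_id, datetime.now(timezone.utc)),
        )
        return {"score": new_score, "previous_score": None, "update_applied": True}

    if row["last_match_id"] == match_id:          # idempotent retry of the same match
        return {"score": row["score"], "previous_score": row["score"], "update_applied": False}
    if new_score <= row["score"]:                 # not a new high score -> no change
        return {"score": row["score"], "previous_score": row["score"], "update_applied": False}

    # Optimistic concurrency: only update if the version we read is still current.
    updated = db.execute(
        "UPDATE leaderboard_scores SET score = %s, last_match_id = %s, "
        "last_update_ts = %s, version = version + 1 "
        "WHERE leaderboard_id = %s AND player_id = %s AND version = %s",
        (new_score, match_id, datetime.now(timezone.utc),
         leaderboard_id, player_id, row["version"]),
    )
    if updated == 0:                              # someone else raced us; caller retries Step 4
        raise RuntimeError("concurrent update, retry")
    return {"score": new_score, "previous_score": row["score"], "update_applied": True}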

Step 5 - Respond back to match service

  • Score Ingestion Service returns final score, previous_score, last_match_id, and update_applied = true/false → API-GW → match service.

Design Diagram

FR2 – View My Global Rank & K Neighbors

▶ Assumption in High-Level Design

We first provide a working solution for the easy case: all players and their scores for a leaderboard fit in one table that can be sorted directly. In the deep dives on scalability, we extend this requirement to practical volumes.

APIs

GET /leaderboards/{leaderboard_id}/players/{player_id}/rank

Query params

  • k – how many neighbors above and below to return (e.g., k=2 → up to 2 above + 2 below).

Response (example)

{ "leaderboard_id": "lb-ranked-na-s12-2025-11-18", "player_id": "player-123", "score": 12345, "global_rank": 452, // 1-based rank "neighbors": [ { "player_id": "player-119", "score": 12410, "rank": 450 }, { "player_id": "player-402", "score": 12380, "rank": 451 }, { "player_id": "player-123", "score": 12345, "rank": 452, "self": true }, { "player_id": "player-777", "score": 12340, "rank": 453 }, { "player_id": "player-990", "score": 12320, "rank": 454 } ] }

Behavior guarantees

  • Uses the same HIGH_SCORE semantics as FR1 (score = best-ever in this period).
  • Ranks are deterministic:
    • Sort by: score DESC, last_update_ts ASC, player_id ASC.
  • Handles edges (top / bottom of leaderboard) by returning fewer than k neighbors where necessary.

Core Entities

We reuse the entities defined in FR1; no new tables are required. To support deterministic ordering and rank queries on the leaderboard_scores table, we rely on a composite index (e.g., idx_scores_rank on (leaderboard_id, score DESC, last_update_ts ASC, player_id ASC)).

Workflow

Step 1 - Client calls API-GW for my rank & neighbors

  • Sends: leaderboard_id, player_id, k to GET /leaderboards/{leaderboard_id}/players/{player_id}/rank.

Step 2 - API-GW routes to Leaderboard Read Service

  • Performs auth/basic checks, forwards to Leaderboard Read Service.

Step 3 - Read my current score & ordering inputs

  • Leaderboard Read Service queries leaderboard_scores for (leaderboard_id, player_id).
  • If no row → player is unranked (return 404 or rank = null, depending on product choice).
  • If row exists → capture my_score and my_last_update_ts.

Step 4 - Compute my global rank & K neighbors (Single Table Case)

  • Use a single ordered query over leaderboard_scores for this leaderboard_id with deterministic ordering:
    • ORDER BY score DESC, last_update_ts ASC, player_id ASC.
  • Conceptually (or via a window function):
    • Compute a row_number for every player; that row number is their rank.
    • Find my row_number (my_rank) by my player_id.
    • Return rows whose rank is between my_rank - k and my_rank + k.
  • This yields:
    • My own row (with rank = my_rank).
    • Up to k players above and k below.
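
For the single-table case, Step 4 can be expressed as one window-function query. The sketch below assumes a Postgres-style dialect and a hypothetical db.fetch_all helper; only the deterministic ordering and the my_rank ± k window are essential.

# Sketch of FR2 Step 4 (single-table case); `db.fetch_all` is an assumed DB helper.
RANK_WINDOW_SQL = """
WITH ranked AS (
    SELECT player_id, score, last_update_ts,
           ROW_NUMBER() OVER (
               ORDER BY score DESC, last_update_ts ASC, player_id ASC
           ) AS rnk
    FROM leaderboard_scores
    WHERE leaderboard_id = %(lb)s
),
me AS (
    SELECT rnk AS my_rank FROM ranked WHERE player_id = %(player)s
)
SELECT r.player_id, r.score, r.rnk
FROM ranked r, me
WHERE r.rnk BETWEEN me.my_rank - %(k)s AND me.my_rank + %(k)s
ORDER BY r.rnk;
"""

def my_rank_and_neighbors(db, leaderboard_id: str, player_id: str, k: int) -> list[dict]:
    # Returns [players above, me, players below]; an empty list means the player is unranked.
    rows = db.fetch_all(RANK_WINDOW_SQL, {"lb": leaderboard_id, "player": player_id, "k": k})
    return [{"player_id": r["player_id"], "score": r["score"], "rank": r["rnk"],
             "self": r["player_id"] == player_id} for r in rows]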

Step 5 - Return rank + neighbors to the client

  • Leaderboard Read Service builds the response:
    • global_rank = my_rank.
    • score = my_score.
    • neighbors = [players above, me, players below].
  • Sends back via API-GW to the client.

Again, this keeps FR2 simple and correct:

  • Single DB, no sharding/caching yet.
  • Deterministic ranking based on score, last_update_ts, player_id.
  • Clear vertical path ready to scale later (Redis sorted sets, sharding, etc.) when you get to the deep dives.

Design Diagram

FR3 – View K Friends Around Me

💡 Assumption in High Level Design

We’re solving the easy, single-partition case first:

  • All scores for this leaderboard live in one leaderboard_scores table.
  • Each player’s friend list is small enough that we can fetch it in one call from a separate Social Graph / Friends Service and process it in memory.
  • In deep dives, we’ll extend this to large graphs and sharded leaderboards (pre-materialized friend indexes, caching, etc.).

API

GET /leaderboards/{leaderboard_id}/players/{player_id}/friends-rank?k={k}

Query params

  • k – how many neighbors above and below to return within the friends set (e.g., k=2 → up to 2 friends above + 2 friends below).

Response (example)

{ "leaderboard_id": "lb-ranked-na-s12-2025-11-18", "player_id": "player-123", "score": 12345, "friends_rank": 7, // 1-based rank among friends only "friends_total": 23, // total friends who are on this leaderboard "neighbors": [ { "player_id": "friend-010", "score": 13010, "rank": 5 }, { "player_id": "friend-037", "score": 12800, "rank": 6 }, { "player_id": "player-123", "score": 12345, "rank": 7, "self": true }, { "player_id": "friend-222", "score": 12000, "rank": 8 }, { "player_id": "friend-555", "score": 11990, "rank": 9 } ] }

Behavior guarantees

  • Uses the same HIGH_SCORE semantics as FR1/FR2 (score = personal best in this period).
  • Ranking is done only among the player’s friends (plus optionally the player themselves).
  • Ordering is deterministic within the friend subset:
    • Sort by: score DESC, last_update_ts ASC, player_id ASC.
  • Handles edges:
    • If you’re near the top or bottom of your friend leaderboard, you may see fewer than k friends above/below.
  • Only friends who have a score row in this leaderboard_id are counted in friends_total and ranking.

Core Entities

We reuse existing leaderboard entities:

  • leaderboards – validate leaderboard_id and ensure reads are allowed.
  • leaderboard_scores – get scores for the player and their friends.

And we introduce a simple friends graph owned by a Social Graph / Friends Service (logically a separate system, but we’ll show its table so the FR feels concrete):

friendships

Stored and managed by the Social Graph service. Our leaderboard only reads from it.

📘 Friendship Table Schema

  • user_id – The player who owns this friend edge.
  • friend_id – The other player in the friendship (user_id’s friend).
  • status – Friendship status: ACCEPTED, PENDING, BLOCKED, etc.
  • created_at – When this friendship (or request) was created.
  • updated_at – Last time this friendship status was updated.
  • Primary key can be (user_id, friend_id).
  • For undirected friendships, you can either:
    • Store two rows (A,B and B,A), or
    • Store one row and query both directions; the exact choice isn’t critical for this HLD.
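
For illustration, here is a small sketch of the single-row option, querying both directions for ACCEPTED friendships; the db helper and SQL dialect are assumptions.

# Sketch: resolve a player's accepted friends when each friendship is stored as one row.
FRIENDS_SQL = """
SELECT friend_id AS fid FROM friendships
WHERE user_id = %(p)s AND status = 'ACCEPTED'
UNION
SELECT user_id AS fid FROM friendships
WHERE friend_id = %(p)s AND status = 'ACCEPTED';
"""

def get_friend_ids(db, player_id: str) -> set[str]:
    rows = db.fetch_all(FRIENDS_SQL, {"p": player_id})
    return {r["fid"] for r in rows}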

Workflow

Step 1 – Client calls API-GW for friends rank

  • Game client sends GET /leaderboards/{leaderboard_id}/players/{player_id}/friends-rank?k={k} to API-GW.

Step 2 – API-GW routes to Leaderboard Service

  • API-GW performs auth / basic validation.
  • Forwards the request to the Leaderboard Service.

Step 3 – Fetch friend list from Social Graph

  • Leaderboard Service calls Social Graph Service (backed by friendships table):
  • GET /players/{player_id}/friends.
  • Social Graph queries friendships for user_id = player_id and status = 'ACCEPTED'.
  • Returns a list of friend_ids.
  • Leaderboard Service builds friends_set = friend_ids ∪ {player_id}.

Step 4 – Compute friends-only rank and K neighbors

  • Query leaderboard_scores for this leaderboard_id and player_id IN friends_set.
  • Drop any friends with no row (they haven’t played this leaderboard).
  • Sort the remaining rows by:
  • score DESC, last_update_ts ASC, player_id ASC.
  • Assign 1-based rank within this list to get each friend’s friends_rank.
  • Find my_friends_rank for player_id.
  • Take entries where rank is between my_friends_rank - k and my_friends_rank + k.
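
A compact Python sketch of Step 4, assuming the friend list has already been fetched (Step 3) and using a hypothetical db helper; the sort key mirrors the deterministic ordering above.

# Sketch of FR3 Step 4: friends-only ranking computed in memory over a small friend set.
def friends_rank_and_neighbors(db, leaderboard_id: str, player_id: str,
                               friend_ids: set[str], k: int) -> dict:
    ids = list(friend_ids | {player_id})
    rows = db.fetch_all(
        "SELECT player_id, score, last_update_ts FROM leaderboard_scores "
        "WHERE leaderboard_id = %s AND player_id = ANY(%s)",
        (leaderboard_id, ids),
    )
    # Deterministic ordering: score DESC, last_update_ts ASC, player_id ASC.
    rows.sort(key=lambda r: (-r["score"], r["last_update_ts"], r["player_id"]))
    ranked = [{"player_id": r["player_id"], "score": r["score"], "rank": i + 1}
              for i, r in enumerate(rows)]

    if not any(e["player_id"] == player_id for e in ranked):
        # The player has no score row on this leaderboard yet.
        return {"friends_rank": None, "friends_total": len(ranked), "neighbors": []}

    my_idx = next(i for i, e in enumerate(ranked) if e["player_id"] == player_id)
    lo, hi = max(0, my_idx - k), min(len(ranked), my_idx + k + 1)
    return {
        "friends_rank": ranked[my_idx]["rank"],
        "friends_total": len(ranked),          # friends (plus me) with a score row here
        "neighbors": ranked[lo:hi],
    }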

Step 5 – Return friends rank + neighbors

  • Leaderboard Service builds the response:
    • friends_rank, friends_total, score, and the neighbors array (friends above, self, friends below).
  • Sends the response back to the client via API-GW.

Design Diagram

FR4 – View Top-N Leaderboards (Global / Regional / Time-Windowed)

📘 Foundation & Assumption

  • Global, regional, and time-windowed leaderboards are all represented as separate rows in leaderboards and addressed by a single leaderboard_id.
  • FR4 is always:
    “Given a leaderboard_id, return the Top-N players on that leaderboard.”
  • In this High Level Design, we still assume a single-partition case:
    • All scores for a leaderboard_id live in one leaderboard_scores table.
    • N is modest (Top 10), so a single ordered query is fine.

API

GET /leaderboards/{leaderboard_id}/top?n={N}

Query params

  • n – how many top entries to return (e.g., n=10).
    • Service enforces a max N (e.g., 100) and clamps larger values.

Response (example)

{ "leaderboard_id": "game1:HIGH_SCORE:NA:S12:DAILY:2025-11-18", "top_n": 5, "entries": [ { "player_id": "p1", "score": 20000, "rank": 1 }, { "player_id": "p2", "score": 19880, "rank": 2 }, { "player_id": "p3", "score": 19880, "rank": 3 }, { "player_id": "p4", "score": 19750, "rank": 4 }, { "player_id": "p5", "score": 19610, "rank": 5 } ] }

Behavior guarantees

  • Uses HIGH_SCORE semantics (score = personal best in this period).
  • Top-N is per leaderboard_id:
    • If that ID represents “Global / Season S12 / Daily 2025-11-18”, we only show that scope.
  • Ordering is deterministic:
    • score DESC, last_update_ts ASC, player_id ASC.
  • If n is missing → use a default (e.g., 10).
  • If n > max_n → clamp to max_n.

Core Entities

No new entities are needed; we reuse leaderboards and leaderboard_scores as defined earlier.

Workflow

Step 1 – Client calls API-GW for Top-N

  • Game client sends GET /leaderboards/{leaderboard_id}/top?n={N} to API-GW.

Step 2 – API-GW routes to Leaderboard Service

  • API-GW does auth / simple validation.
  • Routes the request to the Leaderboard Service.

Step 3 – Validate leaderboard and normalize N

  • Leaderboard Service looks up leaderboards by leaderboard_id:
    • Ensures it exists and status ∈ {ACTIVE, FROZEN}.
  • Optionally reads metadata (game, region, period) for logging / response.
  • Normalizes N:
    • Default if null (e.g., 10).
    • Clamp to max_n if too large.

Step 4 – Query Top-N rows from leaderboard_scores

  • Runs an ordered query on leaderboard_scores using idx_scores_rank:

SELECT player_id, score, last_update_ts
FROM leaderboard_scores
WHERE leaderboard_id = :leaderboard_id
ORDER BY score DESC, last_update_ts ASC, player_id ASC
LIMIT :N;

  • As results stream back, Leaderboard Service assigns 1-based ranks in order: 1, 2, …, N.

Step 5 – Build response and return to client

  • Builds the response:
    • leaderboard_id
    • top_n = N
    • entries = [{ player_id, score, rank }...]
  • Returns it to the client via API-GW.

Design Diagram

▶ Pagination for Long Leaderboards (Offset + Cursor)

So far FR4 returns a simple Top-N. In practice, you often need “next page” behavior (e.g., show ranks 1–50, then 51–100).

A simple, robust pattern is:

  • Request:
    GET /leaderboards/{leaderboard_id}/ranks?limit=50&cursor=XXXX
  • Response adds:
    • entries: [...]
    • next_cursor: "YYYY" (or null if no more)

Implementation idea:

  • On the first page, cursor is empty → we start from rank 1 (no OFFSET).
  • For each page, we remember the last item’s ordering key
    (score, last_update_ts, player_id) inside next_cursor.
  • On the next request, we resume from just after that tuple instead of doing
    OFFSET 50 (which gets slower as you go deeper).

Why this is good:

  • No big OFFSET scans – efficient even on very large leaderboards.
  • Stable ordering under writes – if someone jumps into the Top 10 between requests, your cursor still means “continue after this last (score, ts, id)”, not “skip 50 rows blindly.”
  • Fits nicely with our deterministic ordering:
    ORDER BY score DESC, last_update_ts ASC, player_id ASC
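
Here is a hedged sketch of that cursor logic: the cursor encodes the last (score, last_update_ts, player_id) tuple, and the WHERE clause seeks to strictly after that tuple under the ordering above. The base64-JSON cursor format and the db helper are illustrative choices.

# Sketch: cursor-based pagination over leaderboard_scores using the last (score, ts, id) tuple.
import base64, json

def encode_cursor(score: int, ts: str, player_id: str) -> str:
    return base64.urlsafe_b64encode(json.dumps([score, ts, player_id]).encode()).decode()

def decode_cursor(cursor: str):
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))

PAGE_SQL = """
SELECT player_id, score, last_update_ts
FROM leaderboard_scores
WHERE leaderboard_id = %(lb)s
  AND (score < %(s)s
       OR (score = %(s)s AND last_update_ts > %(ts)s)
       OR (score = %(s)s AND last_update_ts = %(ts)s AND player_id > %(pid)s))
ORDER BY score DESC, last_update_ts ASC, player_id ASC
LIMIT %(limit)s;
"""

def next_page(db, leaderboard_id: str, cursor: str, limit: int = 50):
    score, ts, pid = decode_cursor(cursor)
    rows = db.fetch_all(PAGE_SQL, {"lb": leaderboard_id, "s": score, "ts": ts,
                                   "pid": pid, "limit": limit})
    next_cursor = None
    if len(rows) == limit:                       # a full page means there may be more
        last = rows[-1]
        next_cursor = encode_cursor(last["score"], str(last["last_update_ts"]), last["player_id"])
    return rows, next_cursor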
▶ From Simple HLD → Deep Dives (Scaling to NFRs)

Everything we’ve drawn in HLD assumes the easy world:

  • Single-region, single score DB partition per leaderboard_id.
  • All reads (FR2/FR3/FR4) and writes (FR1) can hit that DB directly.

This is enough to show correctness and API/data flows, but it won’t hit our NFRs at 100M DAU.

In the deep dives, we’ll evolve each FR to satisfy:

  • Correctness under high write volume
    • Keep the Top-10 / Top-100 always accurate even with 10K+ updates/sec.
    • Use idempotent, log-backed ingestion so we never double-count or lose updates.
  • Low write→rank latency
    • Move ranking into in-memory indexes (e.g., Redis sorted sets / custom rank index) per leaderboard_id.
    • Make FR2 (“my rank + K neighbors”) a cache hit in the common case.
  • High write throughput
    • Add a stream / queue between Score Ingestion Service and the rank index so DB writes and rank updates are decoupled.
    • Shard leaderboards and hot ranges to avoid a single hot shard.
  • Fast, cheap reads
    • Cache Top-N per leaderboard_id and refresh on change, instead of recomputing from DB every time.
    • Use cursor-based pagination for deeper browsing without heavy OFFSET scans.

Think of the HLD as “correct single-node version.”

Deep dives will answer: How do we keep the exact same APIs and correctness rules while scaling to 100M DAU, billion reads/day, and heavy hotspots at the top of the board?

Deep Dives

DD1 – How to Maintain a Global Leaderboard at Billion Updates/Day Without a Hot Shard

▶ What breaks in the simple HLD?

In the HLD, for each leaderboard_id:

  • All FR1 writes hit a single leaderboard_scores partition.
  • FR2/FR4 reads run ORDER BY score DESC, last_update_ts ASC, player_id ASC on that same index.

At hundreds of millions / billions of updates per day, this violates our NFRs:

  • Hot shard – one DB node / partition per leaderboard_id becomes a write + read hotspot.
  • Latency – heavy writes contend with sorted reads (Top‑N, K‑neighbors) on the same index.
  • Availability – if that partition is slow, the entire leaderboard is degraded.

We need to move ranking out of the single DB node and spread load without losing exact ordering.

Options to Discuss

Option A – Shard by leaderboard_id (simple, preferred when it fits)

  • Each concrete leaderboard (game + mode + region + season + period) lives in one sorted set on one ranking node (e.g., Redis ZSET):
  • key = leaderboard_id.
  • All members of that leaderboard are on the same node:
    • Top-N: ZREVRANGE leaderboard_id 0 N-1.
    • “K around me”: ZREVRANK + ZREVRANGE.
  • We then distribute leaderboards across many nodes:
    • node = hash(leaderboard_id) mod R.

Good when:

  • A billion updates/day is spread across many leaderboards.
  • Each single leaderboard’s QPS + cardinality fit comfortably on one node.

Option B – Shard by user_id inside a leaderboard (only if one board is too big)

  • For a single massive leaderboard_id, partition players by player_id across S shards:
    • shard_id = hash(player_id) mod S.
  • Pros:
    • Great write distribution, matches Kafka partitioning by user.
  • Cons:
    • Global range queries (Top-N, K-neighbors) now require an extra global index / aggregator to merge shards.

Useful only when one leaderboard truly can’t live on a single node.

Final Solution: Ranking tier + per-leaderboard sharding.

Workflow

Step 1 - Score update comes in (same as HLD FR1 entry)

  • Match Service → POST /leaderboards/{leaderboard_id}/players/{player_id}/score → API-GW → Score Ingestion Service.

Step 2 - Score Ingestion enforces correctness + writes DB + emits event

  • Applies HIGH_SCORE + idempotency (FR1 logic).
  • Upserts leaderboard_scores in the score DB (canonical truth).
  • Publishes a message to Kafka topic score_updates with key = player_id, value = {leaderboard_id, player_id, new_score, last_update_ts}.

Step 3 - Ranking Updater consumes Kafka and routes by leaderboard

  • Ranking Updater workers consume score_updates.
  • For each event, compute ranking_node = hash(leaderboard_id) mod R.
  • Send an update (e.g., ZADD) to that node for key = leaderboard_id, member = player_id.
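
A minimal sketch of such a Ranking Updater worker, assuming redis-py and kafka-python clients; the node addresses, consumer group name, and use of ZADD GT (Redis 6.2+) to keep scores monotonic are all illustrative assumptions.

# Sketch of a Ranking Updater worker (assumes redis-py and kafka-python clients).
import json
import zlib
import redis
from kafka import KafkaConsumer

RANKING_NODES = [redis.Redis(host=h, port=6379) for h in
                 ["rank-node-0", "rank-node-1", "rank-node-2"]]   # R nodes (illustrative)

def node_for(leaderboard_id: str) -> redis.Redis:
    # node = hash(leaderboard_id) mod R (stable hash so every worker agrees)
    return RANKING_NODES[zlib.crc32(leaderboard_id.encode()) % len(RANKING_NODES)]

consumer = KafkaConsumer(
    "score_updates",
    bootstrap_servers="kafka:9092",
    group_id="ranking-updaters",
    value_deserializer=lambda b: json.loads(b),
)

for msg in consumer:
    event = msg.value  # {leaderboard_id, player_id, new_score, last_update_ts}
    r = node_for(event["leaderboard_id"])
    # ZADD GT only applies increases, matching the HIGH_SCORE (monotonic) policy.
    r.zadd(event["leaderboard_id"], {event["player_id"]: event["new_score"]}, gt=True)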

Step 4 - Ranking Cluster maintains in-memory sorted sets per leaderboard

  • Each node in the Ranking Cluster holds many sorted sets, one per leaderboard_id it owns.
  • These sets are the authoritative in-memory rank index (exact order for Top-N and K-neighbors).

Step 5 - Read APIs (FR2/FR4) hit Ranking Cluster first

  • Leaderboard Service (for “my rank + K neighbors” and “Top-N”) now queries the relevant leaderboard_id sorted set in the Ranking Cluster.
  • DB is used for durability, cold recovery, and consistency checks—not for hot ranking queries.

So with our solution:

  • FR2/FR3/FR4 will now read primarily from the Ranking Cluster, not directly from DB.
  • This lets us handle billion+ updates/day across many leaderboards without a single hot shard, while keeping exact order and K-neighbor semantics.

Design Diagram

▶ Further Discussion – Why Redis Sorted Sets for Ranking (vs. Just Using a DB?)

Our design uses two storage layers with different jobs:

  • A durable database (leaderboard_scores in Postgres/Cassandra/DynamoDB) as the source of truth.
  • An in-memory ranking tier (Redis ZSETs) as the sorted index optimized for rank/Top-N.

Why not just use the DB for everything?

  • Relational DB with indexes:
    • Great for point lookups and small queries.
    • But ROW_NUMBER() OVER (ORDER BY ...) or "rank + neighbors" needs full index scans or heavy window functions.
    • At 100M DAU / 1B reads/day, this hits CPU/latency limits and creates hot shards on leaderboard_id.
  • Wide-column / key-value stores (Cassandra, DynamoDB):
    • Excellent for key-based reads and time-series.
    • Poor fit for “give me Top-N” or “20 around this user” without extra in-memory index.

Why Redis ZSETs fit our access patterns:

  • Native sorted set abstraction:
    • ZADD to upsert a player’s score.
    • ZREVRANK to get a player’s rank.
    • ZREVRANGE to get Top-N or a rank window (neighbors).
    • ZMSCORE / ZSCORE to fetch scores for friends in batch.
  • All of these are in-memory, O(log N + K) operations:
    • p95 ≤ 100 ms server-side for “my rank + neighbors”.
    • Low tail latency even under heavy write load.
  • We don’t put all data in Redis:
    • Only active leaderboards live in the ZSETs.
    • Full history and totals live in the DB; Redis is a rebuildable, shared index on top.
▶ Redis ZSETs Functions
  • ZADD
    ZADD game1:HIGH_SCORE:NA:S12:DAILY:2025-11-18 12345 player-123
  • ZREVRANK
    ZREVRANK game1:HIGH_SCORE:NA:S12:DAILY:2025-11-18 player-123
  • ZREVRANGE (Top 10)
    ZREVRANGE game1:HIGH_SCORE:NA:S12:DAILY:2025-11-18 0 9 WITHSCORES
  • ZMSCORE (friends batch)
    ZMSCORE game1:HIGH_SCORE:NA:S12:DAILY:2025-11-18 fr-1 fr-2 fr-3

DD2 – How to Fetch K-Neighbors Around a User (Global + Friends) Fast

▶ What breaks if we stick to the naive approach?

If we tried to serve “K neighbors” directly from the score DB:

  • Global neighbors (FR2)
    • We’d need a ROW_NUMBER() OVER (ORDER BY …) window query + a window around the player.
    • On a big leaderboard this is O(N) work per call and fights with writes → doesn’t hit our p95 ≤ 100 ms target at scale.
  • Friends neighbors (FR3)
    • We’d have to join leaderboard_scores with friendships and still do a windowed rank per user.
    • That’s heavy, hard to cache, and explodes with large friend graphs.

Even with DD1’s ranking tier, we still need to choose how to use it:

  • We don’t want to scan the whole sorted set for each request.
  • We want O(log N + K)-ish reads, even at high QPS.

Options to Discuss

Read Path A. For global neighbors (FR2)

  1. DB window query per request – already rejected (slow, expensive).
  2. Use ranking cluster ZSETs
    • ZREVRANK to get the user’s index (rank).
    • ZREVRANGE on [rank-K, rank+K] to get neighbors.
    • Time: ~O(log N + K) and all in-memory.

Read Path B. For friends neighbors (FR3)

  1. Pre-materialized friend leaderboards per user
    • Every score update fans out to all friends’ private boards.
    • Completely impractical at 100M DAU → huge write amplification.
  2. Scan the leaderboard and filter to friends
    • Obviously too slow (scan entire sorted set).
  3. On-demand intersection using ranking cluster + social graph
    • Fetch friend_ids from Social Graph.
    • Fetch each friend’s score from the same ZSET (batched).
    • Sort friends locally by score and compute rank + K neighbors.
    • Works because friend lists are typically small (O(10–100s)) even if the leaderboard has millions of entries.

We’ll pick:

  • Global neighbors → ZSET rank + range.
  • Friends neighbors → intersection of friend_ids with ZSET, done on demand.

Workflow – Global & Friends K-Neighbors (Combined)

  1. Client → API-GW → Leaderboard Service
    • Global: GET /leaderboards/{lb}/players/{p}/rank?k=K
    • Friends: GET /leaderboards/{lb}/players/{p}/friends-rank?k=K
  2. Build the target ID set
    • Global: id_set = { player_id } (we only care about this player for rank).
    • Friends: Leaderboard Service calls Social Graph → gets friend_ids
    • id_set = { player_id } ∪ friend_ids.
  3. Talk to Ranking Cluster for this leaderboard
    • Find ranking node: node = hash(leaderboard_id) mod R.
    • Global:
      • idx = ZREVRANK(lb, player_id)
      • neighbors = ZREVRANGE(lb, idx-K, idx+K, WITHSCORES)
    • Friends:
      • scores = ZMSCORE(lb, id_set) (or pipelined ZSCORE)
      • Drop IDs with null score.
  4. Compute ranks + neighbors
    • Global:
      • global_rank = idx + 1
      • Neighbors = slice around idx from neighbors list.
    • Friends:
      • Locally sort (id, score, last_update_ts) by
      • score DESC, last_update_ts ASC, player_id ASC.
      • Assign friends_rank = position in sorted list.
      • Take K above / K below the player.
  5. Return response via API-GW
    • Global: { score, global_rank, neighbors[] }
    • Friends: { score, friends_rank, friends_total, neighbors[] }.
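
The same flow as a Python sketch against one ranking node (redis-py assumed). For readability it stores the raw integer score in the ZSET and simplifies tie-breaking; a real index would encode the full (score, last_update_ts) ordering.

# Sketch of DD2 read paths against one ranking node (assumes redis-py; tie-breaks simplified).
import redis

r = redis.Redis(host="rank-node-0", port=6379)   # node = hash(leaderboard_id) mod R

def global_rank_and_neighbors(lb: str, player_id: str, k: int):
    idx = r.zrevrank(lb, player_id)
    if idx is None:
        return None                                   # unranked on this leaderboard
    lo = max(0, idx - k)
    window = r.zrevrange(lb, lo, idx + k, withscores=True)
    neighbors = [{"player_id": m.decode(), "score": int(s), "rank": lo + i + 1}
                 for i, (m, s) in enumerate(window)]
    return {"global_rank": idx + 1, "neighbors": neighbors}

def friends_rank_and_neighbors(lb: str, player_id: str, friend_ids: list[str], k: int):
    ids = list(set(friend_ids) | {player_id})
    scores = r.zmscore(lb, ids)                       # None for friends not on this leaderboard
    present = [(pid, int(s)) for pid, s in zip(ids, scores) if s is not None]
    present.sort(key=lambda t: (-t[1], t[0]))         # score DESC, player_id ASC (ts omitted here)
    ranked = [{"player_id": pid, "score": s, "rank": i + 1} for i, (pid, s) in enumerate(present)]
    my_idx = next(i for i, e in enumerate(ranked) if e["player_id"] == player_id)
    return {"friends_rank": my_idx + 1, "friends_total": len(ranked),
            "neighbors": ranked[max(0, my_idx - k): my_idx + k + 1]}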

This keeps DD2’s story tight:

  • One ranking path (Ranking Cluster) for both global and friends.
  • The only extra step for friends is the Social Graph lookup + local sort on a much smaller set.

Design Diagram

DD3 - How to Handle Hotspots at the Top & Celebrity Users

💡 What breaks if we stick to the naive approach?

Even after DD1 + DD2 (ranking cluster + ZSET per leaderboard), some cases are still scary:

  • The main global leaderboard for your flagship mode gets hammered:
    • Every lobby screen shows Top-10 / Top-100.
    • Streams / tournaments trigger constant polling.
  • Celebrity players (streamers, pros) attract:
    • Tons of “what’s their rank now?” calls.
    • Lots of friends/neighbors lookups around the same (leaderboard_id, player_id).

Symptoms:

  • One ranking node (the one that owns that leaderboard_id) gets disproportionate read traffic.
  • Many calls hit the same hot range (top scores) or the same hot player, wasting work recomputing the same neighbors over and over.
  • If we do nothing, we blow our latency & hotspot NFRs even though the architecture is “correct”.

Options to Discuss

  1. Just scale up the hot ranking node
    • Bigger instance, more CPU, maybe Redis replicas.
    • Works to a point, but you’re still fundamentally centralizing all hot reads.
  2. Split the leaderboard further or multi-shard it
    • Product-level splits: add regions/brackets; fewer players per board.
    • Infra-level: promote that leaderboard_id to a multi-shard configuration (user-id sharded + global aggregator, from DD1).
    • Powerful but more complex, and you don’t always need this.
  3. Add dedicated caches for “hot ranges” and “hot users”
    • Precompute & cache Top-N / Top-K ranges off the ZSET and serve them from a lighter cache tier.
    • Cache “celebrity: rank + neighbors” responses for a short TTL so thousands of fans share the same result.
    • Back these with rate limiting / request coalescing so the ranking node doesn’t get hammered.

We’ll choose Option 3 as the default, with Option 2 as an escalation path if one leaderboard is still too hot.

Solution - Hot Range Cache + Celebrity Cache

💡 Core Idea

Keep the ranking ZSET as the single source of truth, but put cheap caches in front of it so hotspots don’t hammer that node. Caches like:

  • a Top-N/Top-K cache for hot ranges, and
  • a short-TTL per-user “rank + neighbors” cache for celebrities.

1. Hot-range cache (for Top-N and “near the top”)

For each leaderboard_id we maintain a small top window (e.g., top 1000 players):

  • When an update affects someone in the top window, the ranking node refreshes that slice once from the ZSET and writes it into a TopRange Cache (Redis / in-memory list).
  • The TopRange Cache then serves:
    • FR4 Top-N:
      • If N ≤ hot_window_size, just read from the TopRange Cache; no ZSET call needed.
    • FR2 global neighbors near the top:
      • If the user’s rank is within the hot window:
        • Use ZREVRANK once to get idx.
        • Slice [idx-K, idx+K] from the cached top window in memory (instead of ZREVRANGE on the ZSET).
  • Effect:
    • Home-page “Top 10 / Top 100” traffic hits the cheap cache.
    • The ZSET is used mainly for updates and occasional cache refreshes, not every read.
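
A sketch of how such a TopRange Cache could be wired: a per-leaderboard top-window slice refreshed from the ZSET and consulted before any range query. The in-process dict and helper names are illustrative.

# Sketch of a TopRange Cache: one cached top-window slice per leaderboard (illustrative).
HOT_WINDOW_SIZE = 1000
top_range_cache: dict[str, list[dict]] = {}   # leaderboard_id -> [{player_id, score, rank}, ...]

def refresh_top_window(r, lb: str):
    # Called when an update lands inside the top window.
    window = r.zrevrange(lb, 0, HOT_WINDOW_SIZE - 1, withscores=True)
    top_range_cache[lb] = [{"player_id": m.decode(), "score": int(s), "rank": i + 1}
                           for i, (m, s) in enumerate(window)]

def top_n_from_cache(lb: str, n: int):
    cached = top_range_cache.get(lb)
    return cached[:n] if cached and n <= HOT_WINDOW_SIZE else None   # None -> fall back to ZSET

def neighbors_from_cache(r, lb: str, player_id: str, k: int):
    idx = r.zrevrank(lb, player_id)           # one cheap rank lookup
    cached = top_range_cache.get(lb)
    if idx is None or cached is None or idx >= len(cached):
        return None                           # outside hot window -> fall back to DD2 path
    return cached[max(0, idx - k): idx + k + 1]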

2. Celebrity “rank + neighbors” cache

At the Leaderboard Service layer, we add a tiny, short-lived cache keyed by:

(leaderboard_id, player_id, k)
  • Value: the full FR2 result: score, rank, neighbors[].
  • TTL: very short (e.g., 200–500 ms).

So when a celebrity is hot:

  • Thousands of viewers share the same cached response in that TTL window.
  • The ranking node only sees one real query per window, not one per viewer.

We can do the same for the friends view of a celebrity: cache (leaderboard_id, celeb_id, friends_view) → friends_rank + neighbors with a short TTL.
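
A minimal sketch of this short-TTL cache at the service layer; the in-process dict, lock, and 300 ms TTL are illustrative (a shared cache with millisecond expiry behaves the same), and concurrent misses may still recompute unless request coalescing is added.

# Sketch of a short-TTL "rank + neighbors" cache keyed by (leaderboard_id, player_id, k).
import time, threading

_TTL_SECONDS = 0.3                     # 200-500 ms per the text
_cache: dict[tuple, tuple[float, dict]] = {}
_lock = threading.Lock()

def cached_rank_view(lb: str, player_id: str, k: int, compute):
    key = (lb, player_id, k)
    now = time.monotonic()
    with _lock:
        hit = _cache.get(key)
        if hit and now - hit[0] < _TTL_SECONDS:
            return hit[1]              # thousands of fans share this result within the TTL
    result = compute()                 # real ZSET / DD2 query, roughly once per TTL window
    with _lock:
        _cache[key] = (now, result)
    return result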

3. Rate limiting + “upgrade” if still too hot

  • Add per-user / per-IP QPS limits on FR2/FR4. If someone spams refresh, they get cached data or a gentle “slow down” response.
  • If a specific leaderboard_id is still too hot even with:
    • Top-range cache, and
    • Celebrity cache, and
    • Rate limiting
    then we promote that leaderboard to the heavier design:
    • Shard its players by player_id across multiple ranking nodes.
    • Use a small aggregator to merge shard top lists into a global Top-N and handle K-neighbors.

That way, we handle most hot cases with simple caching, and only pay the complexity of multi-shard ranking for the truly extreme leaderboards.

Workflow - Hotspots

Step 1 – Writes: same as DD1 + tiny extra

  • Match → API-GW → Score Ingestion → Kafka → Ranking Updaters → Redis ZSET per leaderboard_id.
  • If an update lands in the top window (e.g., top 1000), that ranking node:
    • Refreshes top_window once from the ZSET.
    • Stores it in TopRange Cache[leaderboard_id].

Step 2 – Reads: Leaderboard Service checks caches first

For FR2 / FR3 / FR4:

  1. Client → API-GW → Leaderboard Service.
  2. Celebrity cache (L1 at service layer):
    • Key: (leaderboard_id, player_id, K).
    • If hit → return rank + neighbors immediately.
  3. Top-range cache (for Top-N + near-top):
    • If N or the player’s rank is within the top window → slice from TopRange Cache.
  4. If neither cache helps → use DD2 logic against Redis:
    • Global: ZREVRANK + ZREVRANGE.
    • Friends: ZMSCORE + local sort.
  5. On miss, write result back to celebrity cache (short TTL) and respond.

Step 3 – Safety valves

  • Per-user / per-IP rate limiting on FR2/FR4.
  • If a specific leaderboard_id is still too hot, promote just that board to the heavier multi-shard ranking design.

Design Diagram

DD4 - How to Recover from Cache / Ranking Loss

💡 What’s the failure & why it matters?

In our design:

  • Redis cluster (ZSETs) = live rank index (FR2/FR3/FR4).
  • TopRange Cache & Celebrity Cache = extra hot caches on top.
  • Score DB (leaderboard_scores) + Kafka score_updates = durable truth + ordered log.

If a Redis node (or the whole cluster) wipes its data:

  • We lose:
    • All ZSETs (global rank, neighbors, Top-N).
    • TopRange & celebrity caches (they’re cheap to rebuild).
  • If we do nothing, all reads fail or become wildly wrong → violates correctness & availability.

We need a way to:

  • Rebuild ranking state quickly and deterministically, and
  • Serve something reasonable while rebuilding.

Options to Discuss

  1. Rebuild only from DB (leaderboard_scores)
    • Bulk scan each leaderboard and ZADD into Redis with the right ordering.
    • Simple, correct, but can be slow if we have many leaderboards / players.
  2. Rebuild only from Kafka (score_updates)
    • Start from earliest offsets and re-apply all updates.
    • Perfect ordering, but replaying months of events can be too slow.
  3. Hybrid: DB snapshot + Kafka catch-up
    • Use DB to quickly get a current snapshot.
    • Then replay only the recent tail from Kafka to catch up.
    • Apply idempotent logic using version / match_id.

We’ll use Option 3.

If the Redis ranking cluster (and caches) are lost, we:

  1. Rebuild ZSETs from the canonical DB.
    • For each leaderboard_id on the failed node:
      • Scan leaderboard_scores in DB and ZADD (player_id, score) back into a fresh ZSET in the correct order.
    • This gives us a current snapshot of the leaderboard in Redis again.
  2. Replay recent updates from Kafka to catch up.
    • Ranking updaters consume the score_updates topic from “snapshot time” onward.
    • For each event {leaderboard_id, player_id, new_score, version}, they update Redis only if version is newer than what’s stored.
    • When lag is near zero, the leaderboard is fully consistent again.
  3. Serve reads from DB while Redis is rebuilding.
    • While a leaderboard is rebuilding:
      • FR4 Top-N: query leaderboard_scores directly with an ORDER BY … LIMIT N.
      • FR2/FR3: compute rank (and neighbors if we want) from DB + Social Graph, and optionally show a “syncing leaderboard” hint.
    • Once Redis is caught up, we switch reads back to the normal Redis + TopRange / Celebrity caches path.

That’s it: DB = source of truth, Kafka = ordered log, Redis = rebuildable cache.

Workflow

Assume one Redis node (or the whole ranking cluster) loses its data.

Step 0 – Normal operation (baseline)

  • Score Ingestion → DB (leaderboard_scores) + Kafka score_updates.
  • Ranking updaters consume Kafka → Redis ZSETs per leaderboard_id.
  • Leaderboard Service reads from Redis + TopRange Cache + Celebrity Cache.

Step 1 – Detect failure & mark affected leaderboards

  1. Redis node fails / restarts empty.
  2. Health checks / monitoring detect:
    • either node down or ZSETs missing for the leaderboards that node owns.
  3. A small coordinator (or config) marks those leaderboard_ids as:
    • status = DEGRADED_REBUILDING in an internal map.

Effect: Leaderboard Service knows “Redis data for these leaderboards is not trustworthy right now.”

Step 2 – Switch reads to DB fallback for those leaderboards

While a leaderboard_id is DEGRADED_REBUILDING:

  • FR4 Top-N
    • Leaderboard Service runs a DB query:
SELECT player_id, score
FROM leaderboard_scores
WHERE leaderboard_id = :lb
ORDER BY score DESC, last_update_ts ASC, player_id ASC
LIMIT :N;
  • FR2 (my global rank + K neighbors) – simple, correct fallback:
    1. Read my row from leaderboard_scores.
    2. Compute my rank as:
       SELECT COUNT(*) + 1
       FROM leaderboard_scores
       WHERE leaderboard_id = :lb
         AND (score > my_score
              OR (score = my_score AND last_update_ts < my_ts)
              OR (score = my_score AND last_update_ts = my_ts AND player_id < my_id));
    3. Optionally fetch a small window of neighbors around me from DB using ORDER BY ... LIMIT ... OFFSET ... (we accept higher latency here).
    4. Add a “leaderboard is syncing” flag in the response.
  • FR3 (friends rank)
    • Same as before but using DB instead of Redis:
      • Social Graph → friend_ids.
      • Query leaderboard_scores for those friends.
      • Sort in memory and compute friends_rank + neighbors.

Reads stay correct, just slower. UX can show a tiny “syncing” hint.

Step 3 – Rebuild Redis ZSETs from DB snapshot

In parallel, a Rebuild Worker for that Redis node:

  1. Enumerates all leaderboard_ids that belong to the failed node.
  2. For each lb:
  • Scans leaderboard_scores:
SELECT player_id, score, last_update_ts
FROM leaderboard_scores
WHERE leaderboard_id = :lb;
  • For each row, ZADD into Redis:
key = lb, member = player_id, score = composite(score, last_update_ts, player_id)
  • This reconstructs the full sorted set for that leaderboard.

After this step, Redis has a correct snapshot as of “snapshot time” (when DB was read).
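
One hedged way to realize composite(score, last_update_ts, player_id) is to pack the integer score into the high bits of the ZSET score and an inverted timestamp into the low bits, staying under 2^53 so the double stays exact; residual player_id ties then fall back to Redis member ordering. The packing below is an assumption about encoding, not something the design mandates, and it only illustrates the rebuild path.

# Sketch: pack (score, last_update_ts) into one Redis ZSET score. Assumes score < 2^21 and
# timestamps within a ~136-year window so the packed value stays below 2^53 (exact in a double).
from datetime import datetime, timezone

EPOCH = datetime(2020, 1, 1, tzinfo=timezone.utc)
MAX_TS = 2**32 - 1                           # seconds since EPOCH, 32-bit window

def composite(score: int, last_update_ts: datetime) -> int:
    ts = int((last_update_ts - EPOCH).total_seconds())
    # Higher score wins; for equal scores, an earlier last_update_ts yields a larger packed value.
    return score * (2**32) + (MAX_TS - ts)

def rebuild_leaderboard(r, db, lb: str):
    rows = db.fetch_all(
        "SELECT player_id, score, last_update_ts FROM leaderboard_scores "
        "WHERE leaderboard_id = %s", (lb,))
    mapping = {row["player_id"]: composite(row["score"], row["last_update_ts"]) for row in rows}
    if mapping:
        r.zadd(lb, mapping)                  # reconstructs the full sorted set for this leaderboard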

Step 4 – Catch up from Kafka tail

To close the small gap between “snapshot time” and “now”:

  1. Ranking updaters start consuming score_updates for those leaderboards from snapshot time onward.
  2. Each event contains {leaderboard_id, player_id, new_score, version} (the version counter maintained in leaderboard_scores).
  3. For each event:
    • Read current version for that player in Redis.
    • If event.version > redis.version → apply update (ZADD / score change).
    • Else → ignore (stale / duplicate).

When lag to the head of Kafka is near zero:

  • Mark each rebuilt leaderboard_id as HEALTHY in the coordinator.
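
A sketch of the version-guarded apply in Step 4. Tracking the last applied version in a small companion hash next to the ZSET is an assumption (a production version would make the check-and-set atomic, e.g., via a Lua script).

# Sketch of DD4 Step 4: apply a Kafka event only if its version is newer than what Redis holds.
import redis

r = redis.Redis(host="rank-node-0", port=6379)

def apply_catchup_event(event: dict) -> bool:
    lb, pid = event["leaderboard_id"], event["player_id"]
    ver_key = f"{lb}:versions"                 # companion hash: player_id -> last applied version
    current = r.hget(ver_key, pid)
    if current is not None and int(current) >= event["version"]:
        return False                           # stale or duplicate event: ignore
    r.zadd(lb, {pid: event["new_score"]})      # apply the newer score
    r.hset(ver_key, pid, event["version"])
    return True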

Step 5 – Flip reads back to Redis + warm caches

Once a leaderboard is HEALTHY again:

  1. Leaderboard Service resumes normal DD2/DD3 read path for that leaderboard_id:
    • Celebrity Cache → TopRange Cache → Redis ZSET.
  2. TopRange Cache is re-warmed automatically:
    • As soon as top ranks get updated, the ranking node refreshes the top window again.
  3. Celebrity Cache repopulates on demand from live reads.

Design Diagram

💡 You may ask – what about dedicated “High Scalability” requirements?

Fair point! We’ve been addressing scalability the whole time, but we haven’t wrapped it into a single “Scalability” section yet. Here’s a recap of how the design handles the “100M DAU / 1B reads per day” scenario.

  • Horizontally scaled write path
    • Score Ingestion Service is stateless → scale by adding instances behind API-GW.
    • Kafka score_updates topic is partitioned by player_id → we can add partitions as write QPS grows.
    • Ranking Updater workers are a consumer group → we scale workers linearly with partitions.
  • Sharded ranking tier (no single hot box)
    • Each concrete leaderboard_id (game × mode × region × period) maps to one Redis node; many leaderboards → many nodes.
    • Peak ~10K updates/sec across all leaderboards turns into hundreds of updates/sec per node, not tens of thousands.
  • Efficient read patterns
    • Global K-neighbors: ZREVRANK + ZREVRANGE → O(log N + K) in memory.
    • Friends K-neighbors: Social Graph + batched ZSCORE + local sort over O(F) friends, where F ≪ total players.
    • Top-N: ZREVRANGE (or TopRange Cache for hot boards) → constant-time-ish per request.
  • Hotspot protection
    • TopRange Cache for the top window per leaderboard (e.g., top 100–1000) handles most Top-N / near-top traffic.
    • Celebrity cache at the service layer turns thousands of fan requests into one real query per 200–500 ms.
    • Per-user / per-IP rate limiting prevents abuse; extreme leaderboards can be promoted to a multi-shard ranking config.
  • Storage & rebuild
    • leaderboard_scores and score_events live in partitioned DB tables (by game/region/period) to keep any one partition manageable.
    • Redis cluster is rebuildable from DB snapshot + Kafka tail, so we can reshard/replace nodes without impacting long-term scale.
