Functional Requirements
- Users should be able to retrieve the Top-K most popular videos based on predefined ranking criteria.
- Users should be able to filter Top-K video queries by geographic location.
- Users should be able to define an arbitrary time window for which the Top-K videos are computed.
Non-Functional Requirements
For user-facing Top-K use cases like YouTube Trending, approximate accuracy is the preferred approach. It enables real-time responsiveness, scalability, and cost efficiency — while maintaining enough precision to reflect actual trends. Exact accuracy can be reserved for offline use cases where correctness is critical, such as creator revenue reports or audits.
Therefore, the non-functional requirements below, and the discussion in the following chapters, assume the approximate-accuracy design strategy by default.
- Low Latency: Query response time < 10 seconds
- High Scalability: Handle ≥ 5M events/sec across 10B+ videos
- Approx. Precision: ≤ ±1% error with > 95% Top-K accuracy
- Cost Efficiency: < 1MB memory per stream using sketches
- Data Freshness: Update trends every 30–60 seconds
Data Flow Stages
- Event Ingestion
- Input: Raw video events from user activity logs (views, likes, etc.)
- Output: Cleaned event stream with extracted dimensions (video ID, timestamp, location); a sample record is sketched after this list
- Pre-Aggregation & Count Tracking
- Input: Event stream
- Output: Approximate counts using structures like Count-Min Sketch or Heavy Hitters per (location, time window) partition
- Top-K Materialization
- Input: Aggregated approximate counters
- Output: Precomputed Top-K lists (e.g., top 10 per location per hour), stored in fast-access storage like Redis or a key-value store
- Serving Layer
- Input: Top-K cache or store queried by (location, time window)
- Output: Final sorted list of top K videos, surfaced in UIs or downstream services (e.g., home feed, dashboards)
- Offline Audit (Optional)
- Input: Full raw event history and materialized counts
- Output: Exact Top-K rankings for billing, experimentation, or historical reporting
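For concreteness, the cleaned event coming out of ingestion can be pictured as a small record carrying only these dimensions. The following is a minimal Python sketch; the field names are assumptions, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VideoEvent:
    """A cleaned event keeping only what the pipeline needs downstream."""
    video_id: str      # e.g., "abc123"
    location: str      # e.g., "US", "IN", "JP"
    event_time: int    # Unix timestamp (seconds); later bucketed to the minute
    score: int = 1     # default weight of a single view/like event

# One raw view event after ingestion and cleaning:
event = VideoEvent(video_id="abc123", location="US", event_time=1_700_000_000)
```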
High Level Design
Here is the high-level design diagram that addresses each functional requirement and maps to the data flow stages:

Functional Requirement 1 - Retrieve Top-K most popular videos by view count
The process starts with user activity logs (e.g., video views, likes) emitted from clients. These logs are streamed into Kafka, which buffers and forwards them to a stream processor (Flink in this case).
Each event is parsed and transformed into a lightweight format, keeping only what’s necessary: video_id, location, event_time (bucketed to the minute), and a default score of 1.
The stream processor then performs three key steps:
- Grouping by (video_id, location, time)
- In-memory counting via a hash map
- Periodic emission of Top-K, once per minute per region
This creates a minute-by-minute snapshot of trending content — capturing popularity dynamics with high temporal resolution.
This satisfies the first functional requirement by delivering a ranking of videos by popularity, computed in near real time.
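To make those three steps concrete, here is a minimal single-process sketch of the aggregation loop. It is plain Python rather than actual Flink operators, and the function names are illustrative:

```python
import heapq
from collections import defaultdict

K = 10  # size of the Top-K list emitted per (location, minute)

def minute_bucket(event_time: int) -> int:
    """Truncate a Unix timestamp (seconds) to the start of its minute."""
    return event_time - (event_time % 60)

# Steps 1 and 2: group by (video_id, location, minute) and count in memory.
counts = defaultdict(int)  # (video_id, location, minute) -> view count

def on_event(video_id: str, location: str, event_time: int, score: int = 1) -> None:
    counts[(video_id, location, minute_bucket(event_time))] += score

# Step 3: once per minute, emit Top-K per (location, minute).
def emit_top_k(minute: int) -> dict:
    per_region = defaultdict(list)  # location -> [(count, video_id), ...]
    for (video_id, location, m), count in counts.items():
        if m == minute:
            per_region[location].append((count, video_id))
    return {loc: heapq.nlargest(K, pairs) for loc, pairs in per_region.items()}
```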
Functional Requirement 2 - Filter Top-K by geographic location
The geographic filter is integrated early in the pipeline. When events are first received, each includes a location field (e.g., "US", "IN", "JP"). This field is preserved through transformation and used as part of the aggregation key in the stream processor.
The counters are stored with this location dimension as part of their key, as sketched below.
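A minimal illustration, with a plain Python dict standing in for Flink keyed state:

```python
from collections import defaultdict

# (video_id, location, minute_bucket) -> count
# Because location is part of the key, every emitted Top-K list is
# regional by construction.
counters = defaultdict(int)

counters[("abc123", "US", 1_700_000_040)] += 1
counters[("abc123", "IN", 1_700_000_040)] += 1  # same video, separate regional counter
```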
As a result, every emitted Top-K object is location-specific. The final Redis KV store and UI layers query by location and serve the right trending content for that user’s region.
Functional Requirement 3 - Support arbitrary time window for queries
The system writes Top-K results for every minute into a Redis KV store under a composite key that combines location and the minute bucket.
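The exact key format is an implementation detail; one plausible scheme, shown purely for illustration, is:

```python
def topk_key(location: str, minute_epoch: int) -> str:
    """Composite key for one (location, minute) Top-K snapshot.

    Any encoding that makes both dimensions explicit works,
    e.g. "topk:US:1700000040".
    """
    return f"topk:{location}:{minute_epoch}"
```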
When a user queries a longer window (e.g., past 24 hours), the Trend Query Layer fetches and merges all relevant keys. It aggregates counts across time slices, sorts them, and returns a Top-K over that range.
This time-bucketed design supports composable queries across arbitrary ranges, as the data is already broken down into fine-grained, non-overlapping time units.
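A simplified version of that merge, with an in-memory dict standing in for the Redis store and the key scheme above assumed, might look like:

```python
from collections import Counter
from typing import Dict, List, Tuple

# Stand-in for Redis: "topk:{location}:{minute}" -> [(video_id, count), ...]
TopKStore = Dict[str, List[Tuple[str, int]]]

def query_top_k(store: TopKStore, location: str,
                start_minute: int, end_minute: int, k: int = 10) -> List[Tuple[str, int]]:
    """Merge per-minute Top-K slices into one Top-K over [start, end)."""
    merged = Counter()
    for minute in range(start_minute, end_minute, 60):   # minute buckets in epoch seconds
        for video_id, count in store.get(f"topk:{location}:{minute}", []):
            merged[video_id] += count                     # aggregate across time slices
    # Note: merging per-bucket Top-K lists is itself approximate; a video that is
    # just below K in every slice can be missed.
    return merged.most_common(k)
```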
Summary
This design satisfies all three functional requirements in a modular and extensible way:
- FR1: Popularity is ranked per minute using in-memory aggregation
- FR2: Results are partitioned and stored by location
- FR3: Minute-level buckets support any user-specified query window
While this design works well for small to moderate scale, the deep dives will show how to evolve it for real-world production loads — introducing Count-Min Sketch, heavy hitters, and scaling strategies to meet non-functional demands.
Deep Dives
Deep Dive 1 - How to Efficiently Support Large-Scale Counting?
Resolution: Use Approximate Data Structures Like Count-Min Sketch (CMS)
Count-Min Sketch offers a compelling alternative to keeping an exact counter per key. Instead of tracking billions of distinct keys individually, we trade a small amount of accuracy for dramatic gains in space and performance.
For near-real-time trending, we don’t need perfect accuracy — we just need to know which videos are among the most viewed. Count-Min Sketch gives us bounded error (~±1%) while enabling a massive scale-up in capacity with minimal infrastructure overhead. However, CMS does not store keys, so you cannot list or sort items.
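For reference, a textbook-style Count-Min Sketch fits in a few lines. This is a minimal sketch for illustration only; the width and depth below are arbitrary and would be tuned to the target error bound:

```python
import hashlib

class CountMinSketch:
    """Minimal Count-Min Sketch: fixed memory, overestimates only."""

    def __init__(self, width: int = 2 ** 16, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, key: str, row: int) -> int:
        digest = hashlib.md5(f"{row}:{key}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._hash(key, row)] += count

    def estimate(self, key: str) -> int:
        # The minimum across rows bounds the overestimate caused by collisions.
        return min(self.table[row][self._hash(key, row)] for row in range(self.depth))

cms = CountMinSketch()
cms.add("abc123")
print(cms.estimate("abc123"))  # >= true count, usually equal at this scale
```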
Improvement: Pair CMS with a Heavy Hitters Tracker
Let’s say CMS tracks 10 million video IDs with ±1% error.
CMS alone:
- Can answer: “How many views for abc123?”
- Can’t answer: “Which 10 videos are trending?”
CMS + HH:
- HH maintains a 100-item Top-K list in memory.
- As videos get views, CMS estimates counts and HH updates its heap.
- You can now query “Top 10 videos in US in the last hour” in milliseconds, with no reprocessing required; a minimal sketch of this pairing follows below.
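Here is a minimal sketch of that pairing. The class and method names are assumptions, and the estimator is pluggable (for example, the Count-Min Sketch above):

```python
import heapq

class HeavyHitters:
    """Keep an approximate Top-K candidate list next to a frequency estimator."""

    def __init__(self, estimator, k: int = 100):
        self.estimator = estimator  # anything with add(key) / estimate(key) -> int
        self.k = k
        self.tracked = {}           # video_id -> latest estimated count (~k entries)

    def add(self, video_id: str) -> None:
        self.estimator.add(video_id)
        self.tracked[video_id] = self.estimator.estimate(video_id)
        if len(self.tracked) > self.k:
            # Heap-based selection: keep only the k largest estimates.
            self.tracked = dict(heapq.nlargest(
                self.k, self.tracked.items(), key=lambda kv: kv[1]))

    def top(self, n: int = 10):
        """Answer "which n videos are trending?" without scanning the sketch."""
        return heapq.nlargest(n, self.tracked.items(), key=lambda kv: kv[1])

# Usage with any estimator; a dict-backed exact counter works as a stand-in here.
class ExactEstimator:
    def __init__(self):
        self.counts = {}
    def add(self, key, n=1):
        self.counts[key] = self.counts.get(key, 0) + n
    def estimate(self, key):
        return self.counts.get(key, 0)

hh = HeavyHitters(ExactEstimator(), k=100)
for _ in range(5):
    hh.add("abc123")
print(hh.top(3))  # [('abc123', 5)]
```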
Therefore, the high-level diagram with CMS + HH can be drawn as:

Deep Dive 2 - How to Handle Time Windows?
To support queries like “Top 10 videos in the US from 6:00 PM to 6:45 PM”, the system must allow users to specify arbitrary time windows while keeping aggregation efficient and query performance low-latency. This brings up a classic system design challenge: how do we structure time in the system to support flexible queries, while keeping ingestion and storage cost manageable?
There are three commonly used strategies for handling time windows in large-scale aggregation systems: tumbling windows, sliding windows, and event-time bucketing.
- Tumbling windows divide the timeline into fixed, non-overlapping intervals (e.g., every 15 minutes). Events are grouped strictly within each interval, and each window produces one independent aggregation. This approach is simple and efficient but lacks flexibility — if a user asks for data from 6:10 to 6:45, and your windows are aligned to 15-minute boundaries, there’s no way to answer the query without either reprocessing raw data or serving partially relevant results.
- Sliding windows maintain overlapping aggregations that move forward in time with a fixed step (e.g., a 15-minute window that slides every 5 minutes). This helps with freshness and trend tracking, but introduces redundancy: the same event contributes to multiple overlapping windows. Sliding windows are also expensive to maintain in a streaming system due to frequent state updates and re-computations.
- Event-time bucketing takes a more flexible and composable approach. It groups events into small, fine-grained buckets — usually by minute — using the event’s timestamp. This makes it possible to dynamically combine buckets at query time to match any arbitrary time range. This approach introduces more work at query time but gives the system the most flexibility without blowing up ingestion cost or storage footprint.
During the interview, you should present a thorough comparison and justify the decision you make for this design. Here is the comparison of the three strategies:
Among the three strategies, event-time bucketing is the most flexible and scalable solution for supporting arbitrary time window queries. By assigning each event to a small, fixed bucket (such as 1 minute) based on its event timestamp, we avoid the rigidity of tumbling windows and the overhead of overlapping sliding windows. This structure creates a clean, non-overlapping timeline of buckets that can be composed on demand — allowing the system to handle queries like “Top-K from 6:07 to 6:42” without any waste or reprocessing.
This approach is also ideal for approximate counting techniques like Count-Min Sketch, since each minute-level bucket can independently maintain its own compact frequency map. On the ingestion side, the stream processor groups events by (video_id, location, minute) and updates the corresponding sketch or counter. These buckets are then stored in sorted time order, making them easy to scan and merge during query time. The result is a system that is efficient, composable, and fully supports time flexibility, all without requiring complex state duplication or downstream reprocessing.
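A compact way to picture that ingestion-side state, with a plain Counter standing in for the per-bucket sketch:

```python
from collections import Counter, defaultdict

# (location, minute_bucket) -> per-bucket frequency map of video_id -> count.
# In the real pipeline each value would be a compact Count-Min Sketch plus a
# small heavy-hitters list rather than an exact Counter.
buckets = defaultdict(Counter)

def ingest(video_id: str, location: str, event_time: int, score: int = 1) -> None:
    minute = event_time - (event_time % 60)   # event-time bucketing to the minute
    buckets[(location, minute)][video_id] += score
```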
To improve the performance of event-time bucketing, we apply the classic two-pointer technique. Since all time buckets are naturally stored in sorted order, we can use two pointers to identify the exact time range the query requires:
- The left pointer finds the first bucket greater than or equal to the query’s start time
- The right pointer finds the last bucket before the query’s end time
With these two pointers, the system efficiently slices the relevant segment of the bucket stream, fetching only what’s needed and avoiding unnecessary scans. This allows the query engine to focus on a bounded and ordered slice of time, drastically improving performance for long-range queries. After the scan, we merge the partial Top-K results from each bucket, aggregate the scores (summing across time), and re-rank to return the final Top-K list — all while keeping memory and latency under control.
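Concretely, the slicing and merging can be sketched as follows, using Python's bisect for the two boundary pointers; the data structures are simplified stand-ins for the bucket store:

```python
import bisect
from collections import Counter
from typing import Dict, List, Tuple

def top_k_over_range(bucket_minutes: List[int],
                     bucket_topk: Dict[int, List[Tuple[str, int]]],
                     start: int, end: int, k: int = 10) -> List[Tuple[str, int]]:
    """Merge per-minute Top-K partials for all buckets in [start, end)."""
    # Two pointers over the sorted bucket timeline:
    left = bisect.bisect_left(bucket_minutes, start)   # first bucket >= start
    right = bisect.bisect_left(bucket_minutes, end)    # slice stops before end

    merged = Counter()
    for minute in bucket_minutes[left:right]:          # bounded, ordered slice
        for video_id, count in bucket_topk.get(minute, []):
            merged[video_id] += count                  # sum scores across time
    return merged.most_common(k)                       # re-rank for the final Top-K
```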
Here is the updated logical diagram based on this deep dive at a high level:

Additionally, here is a summary of the architectural changes in each component:
Deep Dive 3 - How to Extend to Strong Accuracy Without Approximation?
The design we’ve explored so far leverages approximate data structures like Count-Min Sketch and Heavy Hitters to efficiently compute Top-K videos at scale. This strategy is well-suited for real-time, user-facing use cases like YouTube Trending — where high responsiveness, scalability, and acceptable precision (≥95% accuracy) are more important than exact counts.
However, some systems — such as competitive leaderboards, financial reporting, or creator payout tracking — demand exact accuracy. In these scenarios, even small ranking errors are unacceptable, and approximate methods can’t provide the guarantees required.
To support strong accuracy, the architecture needs to shift significantly:
- Use exact counters rather than sketches
- Maintain strict ordering and full ranking consistency
- Rely on centralized coordination or precise aggregation (e.g., distributed heaps, mergeable sorted sets)
These topics introduce additional trade-offs: increased memory, higher coordination cost, and more complex state management — all of which must be considered carefully in design.
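For contrast with the approximate path, the exact-accuracy variant in its simplest single-node form is just full counters plus a complete sort. This is purely illustrative; the real difficulty is doing it consistently across partitions:

```python
from collections import Counter
from typing import List, Tuple

exact_counts = Counter()  # video_id -> exact view count; memory grows with key count

def record_view(video_id: str) -> None:
    exact_counts[video_id] += 1

def exact_top_k(k: int = 10) -> List[Tuple[str, int]]:
    # Deterministic tie-break on video_id so rankings are reproducible for audits.
    return sorted(exact_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```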
👉 For a detailed exploration of how to design a system that supports Top-K with strong, exact accuracy, refer to our upcoming system design article on Leaderboard Design. We’ll cover distributed heaps, sorted aggregators, tie-break strategies, and how to reconcile real-time updates with ranking integrity.