Functional Requirements
- Users should be able to retrieve the Top-K most popular videos based on predefined ranking criteria.
- Users should be able to filter Top-K video queries by geographic location.
- Users should be able to define an arbitrary time window for which the Top-K videos are computed.
Non-Functional Requirements
For user-facing Top-K use cases like YouTube Trending, approximate accuracy is the preferred approach. It enables real-time responsiveness, scalability, and cost efficiency — while maintaining enough precision to reflect actual trends. Exact accuracy can be reserved for offline use cases where correctness is critical, such as creator revenue reports or audits.
Therefore, the non-functional requirements below, and the discussion in the following chapters, assume the approximate-accuracy design strategy by default.
- Low Latency: Query response time < 10 seconds
- High Scalability: Handle ≥ 5M events/sec across 10B+ videos
- Approx. Precision: ≤ ±1% error with > 95% Top-K accuracy
- Cost Efficiency: < 1MB memory per stream using sketches
- Data Freshness: Update trends every 30–60 seconds
Data Flow Stages
- Event Ingestion
- Input: Raw video events from user activity logs (views, likes, etc.)
- Output: Cleaned event stream with extracted dimensions (video ID, timestamp, location); a sample record is sketched after this list
- Pre-Aggregation & Count Tracking
- Input: Event stream
- Output: Approximate counts using structures like Count-Min Sketch or Heavy Hitters per (location, time window) partition
- Top-K Materialization
- Input: Aggregated approximate counters
- Output: Precomputed Top-K lists (e.g., top 10 per location per hour), stored in fast-access storage like Redis or a key-value store
- Serving Layer
- Input: Top-K cache or store queried by (location, time window)
- Output: Final sorted list of top K videos, surfaced in UIs or downstream services (e.g., home feed, dashboards)
- Offline Audit (Optional)
- Input: Full raw event history and materialized counts
- Output: Exact Top-K rankings for billing, experimentation, or historical reporting
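For concreteness, the cleaned event coming out of ingestion can be pictured as a small record carrying only these dimensions. The following is a minimal Python sketch; the field names are assumptions, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VideoEvent:
    """A cleaned event keeping only what the pipeline needs downstream."""
    video_id: str      # e.g., "abc123"
    location: str      # e.g., "US", "IN", "JP"
    event_time: int    # Unix timestamp (seconds); later bucketed to the minute
    score: int = 1     # default weight of a single view/like event

# One raw view event after ingestion and cleaning:
event = VideoEvent(video_id="abc123", location="US", event_time=1_700_000_000)
```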
High Level Design
Here is the high-level design diagram that addresses each functional requirement and maps to the data flow stages:

Functional Requirement 1 - Retrieve Top-K most popular videos by view count
The process starts with user activity logs (e.g., video views, likes) emitted from clients. These logs are streamed into Kafka, which buffers and forwards them to a stream processor (Flink in this case).
Each event is parsed and transformed into a lightweight format, keeping only what’s necessary: video_id, location, event_time (bucketed to the minute), and a default score of 1.
The stream processor then performs three key steps:
- Grouping by (video_id, location, time)
- In-memory counting via a hash map
- Periodic emission of Top-K, once per minute per region
This creates a minute-by-minute snapshot of trending content — capturing popularity dynamics with high temporal resolution.
This satisfies the first functional requirement by delivering a ranking of videos by popularity, computed in near real time.
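To make those three steps concrete, here is a minimal single-process sketch of the aggregation loop. It is plain Python rather than actual Flink operators, and the function names are illustrative:

```python
import heapq
from collections import defaultdict

K = 10  # size of the Top-K list emitted per (location, minute)

def minute_bucket(event_time: int) -> int:
    """Truncate a Unix timestamp (seconds) to the start of its minute."""
    return event_time - (event_time % 60)

# Steps 1 and 2: group by (video_id, location, minute) and count in memory.
counts = defaultdict(int)  # (video_id, location, minute) -> view count

def on_event(video_id: str, location: str, event_time: int, score: int = 1) -> None:
    counts[(video_id, location, minute_bucket(event_time))] += score

# Step 3: once per minute, emit Top-K per (location, minute).
def emit_top_k(minute: int) -> dict:
    per_region = defaultdict(list)  # location -> [(count, video_id), ...]
    for (video_id, location, m), count in counts.items():
        if m == minute:
            per_region[location].append((count, video_id))
    return {loc: heapq.nlargest(K, pairs) for loc, pairs in per_region.items()}
```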
Functional Requirement 2 - Filter Top-K by geographic location
The geographic filter is integrated early in the pipeline. When events are first received, each includes a location field (e.g., "US", "IN", "JP"). This field is preserved through transformation and used as part of the aggregation key in the stream processor.
The counters are stored with this location dimension as part of their key, as sketched below.
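A minimal illustration, with a plain Python dict standing in for Flink keyed state:

```python
from collections import defaultdict

# (video_id, location, minute_bucket) -> count
# Because location is part of the key, every emitted Top-K list is
# regional by construction.
counters = defaultdict(int)

counters[("abc123", "US", 1_700_000_040)] += 1
counters[("abc123", "IN", 1_700_000_040)] += 1  # same video, separate regional counter
```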
As a result, every emitted Top-K object is location-specific. The final Redis KV store and UI layers query by location and serve the right trending content for that user’s region.
Functional Requirement 3 - Support arbitrary time window for queries
The system writes Top-K results for every minute into a Redis KV store under a composite key that combines location and the minute bucket.
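The exact key format is an implementation detail; one plausible scheme, shown purely for illustration, is:

```python
def topk_key(location: str, minute_epoch: int) -> str:
    """Composite key for one (location, minute) Top-K snapshot.

    Any encoding that makes both dimensions explicit works,
    e.g. "topk:US:1700000040".
    """
    return f"topk:{location}:{minute_epoch}"
```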
When a user queries a longer window (e.g., past 24 hours), the Trend Query Layer fetches and merges all relevant keys. It aggregates counts across time slices, sorts them, and returns a Top-K over that range.
This time-bucketed design supports composable queries across arbitrary ranges, as the data is already broken down into fine-grained, non-overlapping time units.
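A simplified version of that merge, with an in-memory dict standing in for the Redis store and the key scheme above assumed, might look like:

```python
from collections import Counter
from typing import Dict, List, Tuple

# Stand-in for Redis: "topk:{location}:{minute}" -> [(video_id, count), ...]
TopKStore = Dict[str, List[Tuple[str, int]]]

def query_top_k(store: TopKStore, location: str,
                start_minute: int, end_minute: int, k: int = 10) -> List[Tuple[str, int]]:
    """Merge per-minute Top-K slices into one Top-K over [start, end)."""
    merged = Counter()
    for minute in range(start_minute, end_minute, 60):   # minute buckets in epoch seconds
        for video_id, count in store.get(f"topk:{location}:{minute}", []):
            merged[video_id] += count                     # aggregate across time slices
    # Note: merging per-bucket Top-K lists is itself approximate; a video that is
    # just below K in every slice can be missed.
    return merged.most_common(k)
```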
Summary
This design satisfies all three functional requirements in a modular and extensible way:
- FR1: Popularity is ranked per minute using in-memory aggregation
- FR2: Results are partitioned and stored by location
- FR3: Minute-level buckets support any user-specified query window
While this design works well for small to moderate scale, the deep dives will show how to evolve it for real-world production loads — introducing Count-Min Sketch, heavy hitters, and scaling strategies to meet non-functional demands.
Deep Dives
Deep Dive 1 - How to Efficiently Support Large-Scale Counting?
Resolution: Use Approximate Data Structures Like Count-Min Sketch (CMS)
Count-Min Sketch offers a compelling alternative to keeping an exact counter per key. Instead of tracking billions of distinct keys individually, we trade a small amount of accuracy for dramatic gains in space and performance.
For near-real-time trending, we don’t need perfect accuracy — we just need to know which videos are among the most viewed. Count-Min Sketch gives us bounded error (~±1%) while enabling a massive scale-up in capacity with minimal infrastructure overhead. However, CMS does not store keys, so you cannot list or sort items.
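For reference, a textbook-style Count-Min Sketch fits in a few lines. This is a minimal sketch for illustration only; the width and depth below are arbitrary and would be tuned to the target error bound:

```python
import hashlib

class CountMinSketch:
    """Minimal Count-Min Sketch: fixed memory, overestimates only."""

    def __init__(self, width: int = 2 ** 16, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, key: str, row: int) -> int:
        digest = hashlib.md5(f"{row}:{key}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._hash(key, row)] += count

    def estimate(self, key: str) -> int:
        # The minimum across rows bounds the overestimate caused by collisions.
        return min(self.table[row][self._hash(key, row)] for row in range(self.depth))

cms = CountMinSketch()
cms.add("abc123")
print(cms.estimate("abc123"))  # >= true count, usually equal at this scale
```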
Improvement: Pair CMS with a Heavy Hitters Tracker
Let’s say CMS tracks 10 million video IDs with ±1% error.
CMS alone:
- Can answer: “How many views for abc123?”
- Can’t answer: “Which 10 videos are trending?”
CMS + HH:
- HH maintains a 100-item Top-K list in memory.
- As videos get views, CMS estimates counts and HH updates its heap.
- You can now query “Top 10 videos in US in the last hour” in milliseconds, with no reprocessing required; a minimal sketch of this pairing follows below.
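Here is a minimal sketch of that pairing. The class and method names are assumptions, and the estimator is pluggable (for example, the Count-Min Sketch above):

```python
import heapq

class HeavyHitters:
    """Keep an approximate Top-K candidate list next to a frequency estimator."""

    def __init__(self, estimator, k: int = 100):
        self.estimator = estimator  # anything with add(key) / estimate(key) -> int
        self.k = k
        self.tracked = {}           # video_id -> latest estimated count (~k entries)

    def add(self, video_id: str) -> None:
        self.estimator.add(video_id)
        self.tracked[video_id] = self.estimator.estimate(video_id)
        if len(self.tracked) > self.k:
            # Heap-based selection: keep only the k largest estimates.
            self.tracked = dict(heapq.nlargest(
                self.k, self.tracked.items(), key=lambda kv: kv[1]))

    def top(self, n: int = 10):
        """Answer "which n videos are trending?" without scanning the sketch."""
        return heapq.nlargest(n, self.tracked.items(), key=lambda kv: kv[1])

# Usage with any estimator; a dict-backed exact counter works as a stand-in here.
class ExactEstimator:
    def __init__(self):
        self.counts = {}
    def add(self, key, n=1):
        self.counts[key] = self.counts.get(key, 0) + n
    def estimate(self, key):
        return self.counts.get(key, 0)

hh = HeavyHitters(ExactEstimator(), k=100)
for _ in range(5):
    hh.add("abc123")
print(hh.top(3))  # [('abc123', 5)]
```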
Therefore, the high-level diagram with CMS + HH can be drawn as:

Deep Dive 2 - How to Handle Time Windows?
To support queries like “Top 10 videos in the US from 6:00 PM to 6:45 PM”, the system must allow users to specify arbitrary time windows while keeping aggregation efficient and query performance low-latency. This brings up a classic system design challenge: how do we structure time in the system to support flexible queries, while keeping ingestion and storage cost manageable?
There are three commonly used strategies for handling time windows in large-scale aggregation systems: tumbling windows, sliding windows, and event-time bucketing.
- Tumbling windows divide the timeline into fixed, non-overlapping intervals (e.g., every 15 minutes). Events are grouped strictly within each interval, and each window produces one independent aggregation. This approach is simple and efficient but lacks flexibility — if a user asks for data from 6:10 to 6:45, and your windows are aligned to 15-minute boundaries, there’s no way to answer the query without either reprocessing raw data or serving partially relevant results.
- Sliding windows maintain overlapping aggregations that move forward in time with a fixed step (e.g., a 15-minute window that slides every 5 minutes). This helps with freshness and trend tracking, but introduces redundancy: the same event contributes to multiple overlapping windows. Sliding windows are also expensive to maintain in a streaming system due to frequent state updates and re-computations.
- Event-time bucketing takes a more flexible and composable approach. It groups events into small, fine-grained buckets — usually by minute — using the event’s timestamp. This makes it possible to dynamically combine buckets at query time to match any arbitrary time range. This approach introduces more work at query time but gives the system the most flexibility without blowing up ingestion cost or storage footprint.
During the interview, you should present a thorough comparison and justify the decision you make for this design. Here is the comparison of the three strategies:
Among the three strategies, event-time bucketing is the most flexible and scalable solution for supporting arbitrary time window queries. By assigning each event to a small, fixed bucket (such as 1 minute) based on its event timestamp, we avoid the rigidity of tumbling windows and the overhead of overlapping sliding windows. This structure creates a clean, non-overlapping timeline of buckets that can be composed on demand — allowing the system to handle queries like “Top-K from 6:07 to 6:42” without any waste or reprocessing.
This approach is also ideal for approximate counting techniques like Count-Min Sketch, since each minute-level bucket can independently maintain its own compact frequency map. On the ingestion side, the stream processor groups events by (video_id, location, minute) and updates the corresponding sketch or counter. These buckets are then stored in sorted time order, making them easy to scan and merge during query time. The result is a system that is efficient, composable, and fully supports time flexibility, all without requiring complex state duplication or downstream reprocessing.
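A compact way to picture that ingestion-side state, with a plain Counter standing in for the per-bucket sketch:

```python
from collections import Counter, defaultdict

# (location, minute_bucket) -> per-bucket frequency map of video_id -> count.
# In the real pipeline each value would be a compact Count-Min Sketch plus a
# small heavy-hitters list rather than an exact Counter.
buckets = defaultdict(Counter)

def ingest(video_id: str, location: str, event_time: int, score: int = 1) -> None:
    minute = event_time - (event_time % 60)   # event-time bucketing to the minute
    buckets[(location, minute)][video_id] += score
```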
To improve the performance of event-time bucketing, we apply the classic two-pointer technique. Since all time buckets are naturally stored in sorted order, we can use two pointers to identify the exact time range the query requires:
- The left pointer finds the first bucket greater than or equal to the query’s start time
- The right pointer finds the last bucket before the query’s end time
With these two pointers, the system efficiently slices the relevant segment of the bucket stream, fetching only what’s needed and avoiding unnecessary scans. This allows the query engine to focus on a bounded and ordered slice of time, drastically improving performance for long-range queries. After the scan, we merge the partial Top-K results from each bucket, aggregate the scores (summing across time), and re-rank to return the final Top-K list — all while keeping memory and latency under control.
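Concretely, the slicing and merging can be sketched as follows, using Python's bisect for the two boundary pointers; the data structures are simplified stand-ins for the bucket store:

```python
import bisect
from collections import Counter
from typing import Dict, List, Tuple

def top_k_over_range(bucket_minutes: List[int],
                     bucket_topk: Dict[int, List[Tuple[str, int]]],
                     start: int, end: int, k: int = 10) -> List[Tuple[str, int]]:
    """Merge per-minute Top-K partials for all buckets in [start, end)."""
    # Two pointers over the sorted bucket timeline:
    left = bisect.bisect_left(bucket_minutes, start)   # first bucket >= start
    right = bisect.bisect_left(bucket_minutes, end)    # slice stops before end

    merged = Counter()
    for minute in bucket_minutes[left:right]:          # bounded, ordered slice
        for video_id, count in bucket_topk.get(minute, []):
            merged[video_id] += count                  # sum scores across time
    return merged.most_common(k)                       # re-rank for the final Top-K
```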
Here is the updated logical diagram based on this deep dive at a high level:

Additionally, here is a summary of the architectural changes in each component:
Deep Dive 3 - How to Extend to Strong Accuracy Without Approximation?
The design we’ve explored so far leverages approximate data structures like Count-Min Sketch and Heavy Hitters to efficiently compute Top-K videos at scale. This strategy is well-suited for real-time, user-facing use cases like YouTube Trending — where high responsiveness, scalability, and acceptable precision (≥95% accuracy) are more important than exact counts.
However, some systems — such as competitive leaderboards, financial reporting, or creator payout tracking — demand exact accuracy. In these scenarios, even small ranking errors are unacceptable, and approximate methods can’t provide the guarantees required.
To support strong accuracy, the architecture needs to shift significantly:
- Use exact counters rather than sketches
- Maintain strict ordering and full ranking consistency
- Rely on centralized coordination or precise aggregation (e.g., distributed heaps, mergeable sorted sets)
These topics introduce additional trade-offs: increased memory, higher coordination cost, and more complex state management — all of which must be considered carefully in design.
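For contrast with the approximate path, the exact-accuracy variant in its simplest single-node form is just full counters plus a complete sort. This is purely illustrative; the real difficulty is doing it consistently across partitions:

```python
from collections import Counter
from typing import List, Tuple

exact_counts = Counter()  # video_id -> exact view count; memory grows with key count

def record_view(video_id: str) -> None:
    exact_counts[video_id] += 1

def exact_top_k(k: int = 10) -> List[Tuple[str, int]]:
    # Deterministic tie-break on video_id so rankings are reproducible for audits.
    return sorted(exact_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```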
👉 For a detailed exploration of how to design a system that supports Top-K with strong, exact accuracy, refer to our upcoming system design article on Leaderboard Design. We’ll cover distributed heaps, sorted aggregators, tie-break strategies, and how to reconcile real-time updates with ranking integrity.