
To build a scalable newsfeed system, we'll explore three approaches, evolving from a simple solution to an optimized one that satisfies all of the non-functional requirements.

When a user requests their feed, the Newsfeed Service performs three operations:

To optimize this simple approach, we implement several improvements:

Database Index.
- For the post table, we can use userId as the partition key and createdAt as the sort key. All posts created by a given user are then stored in the same partition, which makes retrieving that user's recent posts efficient. Additionally, we can create local secondary indexes on engagement metrics (such as view count or like count) to support alternative sort orders.
- For the follower table, we have two access patterns:
  - Query 1: Find all users that X follows (who does X follow?)
```
SELECT toUser FROM Follower WHERE fromUser = X
```
  - Query 2: Find all followers of X (who follows X?)
```
SELECT fromUser FROM Follower WHERE toUser = X
```
  - We can use indexes to optimize both access patterns. For the follower table, we use fromUser as the partition key and toUser as the sort key. All toUser entries for a given fromUser are stored together, so Query 1 (find all users that X follows) is fast.
  - We then add a global secondary index (GSI) whose partition key is toUser and whose sort key is fromUser. This GSI makes Query 2 (find all followers of a given user) efficient.
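The two access patterns can be illustrated with a small in-memory model. This is only a sketch: the `FollowerTable` class and its method names are hypothetical, and two dictionaries stand in for the base table and the GSI of a real database.

```python
from collections import defaultdict


class FollowerTable:
    """In-memory stand-in for a follower table with a base index and a GSI."""

    def __init__(self):
        # Base table: partition key fromUser, sort key toUser.
        self._following = defaultdict(set)
        # GSI: partition key toUser, sort key fromUser.
        self._followers = defaultdict(set)

    def follow(self, from_user, to_user):
        # Each write updates both the base table and the secondary index.
        self._following[from_user].add(to_user)
        self._followers[to_user].add(from_user)

    def following_of(self, user):
        # Query 1: who does `user` follow? Served by the base table.
        return sorted(self._following.get(user, ()))

    def followers_of(self, user):
        # Query 2: who follows `user`? Served by the GSI.
        return sorted(self._followers.get(user, ()))


table = FollowerTable()
table.follow("alice", "bob")
table.follow("alice", "carol")
table.follow("dave", "bob")
```

Both lookups touch a single partition, which is the property the index design is after. The cost is that every follow is written twice.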
Cache Layer. We cache post data and follower data.
- The follower cache stores user follow relationships, reducing database load for social graph queries (e.g., find all users that a given user follows, or find all followers of a given user).
- The post cache holds recently accessed posts, improving read performance for popular content and reducing read load on the database.
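A minimal cache-aside sketch for the post cache is shown below. The `PostCache` class is hypothetical, a plain dictionary stands in for both the cache and the database, and eviction and expiry are left out for brevity.

```python
class PostCache:
    """Cache-aside reads over a slower post store (a dict stands in for the DB)."""

    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.db_reads = 0  # counts how often we fall through to the database

    def get_post(self, post_id):
        # Serve from the cache when possible.
        if post_id in self.cache:
            return self.cache[post_id]
        # Cache miss: read from the database and populate the cache.
        self.db_reads += 1
        post = self.db.get(post_id)
        if post is not None:
            self.cache[post_id] = post
        return post


post_cache = PostCache({"p1": {"author": "bob", "text": "hello"}})
first = post_cache.get_post("p1")   # miss: goes to the database
second = post_cache.get_post("p1")  # hit: served from the cache
```

A production cache would also need a size bound and an invalidation or expiry policy, which this sketch does not attempt.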

To further reduce latency and system load when generating feeds, we can pre-compute feeds offline, and do so adaptively.

Using pre-compute to create feeds offline adaptively

Keyword-1: Precompute. We can employ multiple newsfeed generation workers. With a job scheduler, we can set these workers to run at a fixed frequency to fetch user data and relevant post data offline (before the user opens the app or requests a new feed).
Keyword-2: Adaptively. Users behave differently and inconsistently. Some users log in and request their newsfeed several times a day. Others log in only once a day, or only every few days. To make the best use of compute resources and strike the right balance between latency and compute cost, we can pre-compute feeds adaptively. For example:
- For users who haven't logged in for more than 30 days ("cold users"), we stop pre-computing feeds. Their newsfeed is generated the next time they log in.
- For users who log in daily or request feeds several times a day, we can run the newsfeed workers at a higher frequency to keep the wait for a new feed low.
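The adaptive policy can be expressed as a function that maps a user's activity to a refresh interval. In this sketch only the 30-day cutoff comes from the text. The function name, the weekly login threshold, and the 15-minute and 6-hour intervals are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def refresh_interval(last_login, logins_last_7_days, now):
    """Return how often to pre-compute a user's feed, or None to skip it."""
    if now - last_login > timedelta(days=30):
        # Cold user: stop pre-computing and build the feed on next login.
        return None
    if logins_last_7_days >= 7:
        # Roughly daily or more: refresh often (assumed value).
        return timedelta(minutes=15)
    # Occasional user: refresh less often (assumed value).
    return timedelta(hours=6)


now = datetime(2024, 3, 1)
cold = refresh_interval(datetime(2024, 1, 1), 0, now)
daily = refresh_interval(datetime(2024, 2, 29), 9, now)
occasional = refresh_interval(datetime(2024, 2, 20), 2, now)
```

A job scheduler could call such a function per user, or per user segment, to decide which feeds to enqueue on each run.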

With all of these improvements (database indexes + cache + pre-compute), this option works but does not scale to Facebook or Twitter level. Why?

To address those challenges, we introduce solution 2: fanout-on-write (the "push model").

Unlike fanout-on-read (where the feed generation request fans out to fetch posts from all users the given user follows), fanout-on-write pushes each new post to the feeds of all of the author's followers. The following diagram explains the workflow for fanout-on-write.
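As a rough sketch of the push model, the code below writes each new post ID into every follower's feed at publish time, so a read is a single lookup. The `followers` map, the `publish` and `read_feed` helpers, and the `FEED_LIMIT` cap are all hypothetical stand-ins for the real services.

```python
from collections import defaultdict, deque

FEED_LIMIT = 100  # assumed cap on stored feed entries per user

# Hypothetical follower graph: author -> list of followers.
followers = {"bob": ["alice", "dave"]}

# Per-user feeds, newest first; old entries fall off the end.
feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))


def publish(author, post_id):
    # Write path: push the new post ID into every follower's feed.
    for follower in followers.get(author, []):
        feeds[follower].appendleft(post_id)


def read_feed(user):
    # Read path: one lookup, no fan-out across followed users.
    return list(feeds.get(user, ()))


publish("bob", "p1")
publish("bob", "p2")
```

The work moves from read time to write time: one publish costs as many writes as the author has followers.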

The benefit of fanout-on-write is that users don't have to walk through the list of accounts they follow to assemble a newsfeed, so the number of read operations drops significantly. However, this approach introduces new challenges, particularly the "celebrity problem", where a post from a user with millions of followers triggers a massive number of writes.

"Celebrity Problem"

A celebrity could have millions of followers. With fanout-on-write, each new post from a celebrity must be written to millions of followers' feeds. One option is to batch the write requests (grouping 100 ~ 200 writes into one batch). With strong hardware (ample memory), Redis can handle up to 50 K writes per second, and more with multiple Redis clusters and sharding.
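To get a feel for the numbers, the sketch below splits a follower list into batches and estimates how long one post's fan-out takes at a given write rate. The helper names are hypothetical. The batch size of 200 and the 50 K writes per second figure are taken from the paragraph above.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fanout_seconds(num_followers, writes_per_second):
    """Time to write one post to every follower's feed at a fixed rate."""
    return num_followers / writes_per_second


follower_ids = list(range(1000))
chunks = list(batches(follower_ids, 200))

# One post to 1 M followers at 50 K writes/s on a single node.
one_million = fanout_seconds(1_000_000, 50_000)
```

At these figures a single post to one million followers occupies a node for about 20 seconds, and the cost grows linearly with follower count. Batching reduces per-request overhead but not the total number of writes.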

Our hybrid approach combines the advantages of both the pull and push models to create a scalable and efficient newsfeed system. The key insight is that different types of users require different feed generation strategies. We can classify users into three categories based on their characteristics and apply a distinct newsfeed generation strategy to each:

In summary, to support this hybrid model, we implement a dynamic feed generation service that works as follows: When a user requests their feed, the service first retrieves their pre-computed feed containing posts from regular users they follow. It then augments this feed by fetching and merging recent posts from any celebrity accounts they follow. The final feed is sorted by timestamp before being returned to the user.
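The read path described above can be sketched as a merge of already-sorted lists. The `build_feed` function and the `(timestamp, post_id)` tuple layout are assumptions for illustration. Each input list is expected to be sorted newest first.

```python
import heapq
from itertools import islice


def build_feed(precomputed, celebrity_posts, limit=50):
    """Merge newest-first lists of (timestamp, post_id) into one feed."""
    merged = heapq.merge(
        precomputed,
        *celebrity_posts,
        key=lambda post: post[0],
        reverse=True,  # inputs are sorted from newest to oldest
    )
    # Take only the first `limit` entries; the merge is lazy.
    return [post_id for _, post_id in islice(merged, limit)]


# Pre-computed feed from regular users the reader follows.
precomputed = [(105, "r3"), (101, "r1")]
# Recent posts fetched at read time, one list per followed celebrity.
celebrity_posts = [[(110, "c2"), (100, "c1")]]

feed = build_feed(precomputed, celebrity_posts)
```

Because every input is already sorted, a k-way merge avoids re-sorting the whole feed and can stop as soon as it has produced enough entries.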

Deep Dive

The system should be fast in serving view_feed requests (P90 < 500 ms)

We achieve fast feed retrieval through a combination of (1) pre-computation, (2) caching, and (3) content delivery optimization with a CDN.

The system should be highly available in serving requests (99.99% uptime)

The system should be scalable to support X users/posts.

Assume we have 1 B DAU and each user requests their feed 5 times a day. Approximating a day as 10^5 seconds (it is 86,400), our QPS is 10^9 * 5 / 10^5 = 50 K. To support approximately 50 K QPS, we can implement scalability at every layer:

(Long-Version) Scale up Redis cluster to serve 1B DAU

Estimation:

The feed cache stores only a list of post IDs (references) plus a small amount of necessary metadata, which we estimate at 100 bytes per entry. Each feed contains 50 ~ 100 posts, so the feed cache takes 5 ~ 10 KB per user.
The post cache stores the actual content and metadata (estimated at 1 KB per post). 50 ~ 100 posts * 1 KB per post comes to 50 ~ 100 KB per user.
Summing these, each user takes 55 ~ 110 KB (we use 80 KB for estimation).
A typical Redis cluster node has 128 GB of RAM. We use 110 GB for data storage (leaving the rest for system overhead and operational headroom). The maximum number of users one Redis node can handle is then 110 GB / 80 KB ≈ 1.4 M.
Assume 20% of the 1 B users are active, so #nodes_needed = 200 M users / 1.4 M users per node ≈ 143 master nodes. With 2 replicas per master, 143 * 3 ≈ 430 total nodes.
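The arithmetic in this section can be checked with a few lines of code. This is a back-of-envelope sketch that uses decimal units (1 KB = 1000 bytes) and the inputs stated above. The variable names are illustrative only.

```python
import math

KB = 1000
GB = 1000**3

# Request rate: 1 B DAU, 5 feed requests per user per day.
dau = 1_000_000_000
requests_per_user_per_day = 5
seconds_per_day = 86_400
qps = dau * requests_per_user_per_day / seconds_per_day

# Redis sizing: 80 KB per user, 110 GB usable per node.
per_user_bytes = 80 * KB
usable_bytes_per_node = 110 * GB
users_per_node = usable_bytes_per_node // per_user_bytes

# 20% of 1 B users are assumed active.
active_users = 1_000_000_000 * 20 // 100
masters = math.ceil(active_users / users_per_node)
total_nodes = masters * 3  # each master plus 2 replicas
```

Without rounding, the request rate comes to roughly 58 K QPS and the cluster to 146 masters (438 nodes in total). These are close to the rounded 50 K QPS and 143 masters (430 nodes) above, and the small gap is well within the precision of such an estimate.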
