Social Media App

Introduction

Social media apps (Facebook, Twitter, Instagram) are a frequent topic in system design interviews. Rather than asking you to design an entire social platform, interviewers typically focus on specific components such as news feed generation, post creation and engagement (likes, comments), or post search. Understanding these core components and how they interact is crucial for tackling such questions.

Social media apps operate as large-scale distributed systems that must handle billions of users sharing, interacting with, and consuming content in real time. These systems are predominantly read-heavy: for example, Twitter sees approximately 1000 reads for every write operation. This read-heavy nature drives key architectural decisions around caching, replication, and content delivery.

In this module, we will work on the designs for 3 real-world social media app features/components: Facebook Posts, Newsfeeds and Instagram.

Common Technical Challenges

Before diving into those three system design questions, let's first understand the common technical challenges associated with social media apps.

  1. Read Scalability: Systems must efficiently serve hundreds of millions of concurrent users reading their feeds and accessing content.
  2. Feed Generation: Quickly assembling personalized feeds from multiple sources while maintaining reasonable latency.
  3. Content Distribution ("Fan-out"): When new content is created (especially from high-follower accounts), the system needs to efficiently make it available to all followers.
  4. Engagement patterns: Users can interact with content in multiple ways (view, like, comment, share), each requiring different workflows and optimization strategies.
  5. Global content delivery: Content must be quickly accessible regardless of user location, requiring caching and distribution systems (CDN).
  6. Data consistency management: Systems must balance immediate content availability against ensuring that all users eventually see the same content state.

Related Concepts

Post

The primary content unit in social media systems, which can include:

  • Text content
  • Media attachments (images, videos)
  • Metadata (creation time, author, visibility settings)
  • Engagement metrics (likes, shares, view counts)

Social Graph

Represents relationships between users and how content flows through the network:

  • Bidirectional (Facebook-style friendships where both users must agree)
  • Unidirectional (Twitter-style following where one user can follow another)
  • Groups and communities (Users belonging to common groups or communities)
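
The two edge types above can be illustrated with a small in-memory sketch. The `SocialGraph` class and its method names are hypothetical, chosen only for illustration; a production system would keep these edges in a dedicated graph store rather than in Python dictionaries.

```python
from collections import defaultdict


class SocialGraph:
    """In-memory illustration only; not a production graph store."""

    def __init__(self):
        self._following = defaultdict(set)  # user -> accounts the user follows
        self._followers = defaultdict(set)  # user -> accounts following the user

    def follow(self, follower, followee):
        """Unidirectional edge (Twitter-style): no approval required."""
        self._following[follower].add(followee)
        self._followers[followee].add(follower)

    def befriend(self, a, b):
        """Bidirectional edge (Facebook-style), stored as two directed edges.

        Assumes the friend request has already been accepted by both sides.
        """
        self.follow(a, b)
        self.follow(b, a)

    def followers_of(self, user):
        return set(self._followers[user])

    def following_of(self, user):
        return set(self._following[user])

    def are_friends(self, a, b):
        """True only when edges exist in both directions."""
        return b in self._following[a] and a in self._following[b]
```

Storing a friendship as two directed edges lets the same follower lookup serve both relationship styles, which is one reasonable design choice among several.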

News Feed

A personalized, continuously updated list of content (posts, updates, activities) from connected users or followed entities, typically displayed in reverse chronological order.

Fan-out

Methods for delivering new content to relevant users:

  1. Fan-out on Read (Pull Model)
    • Content is aggregated from followed accounts when a user requests their feed
    • Advantages: Efficient for handling viral content and high-follower accounts
    • Disadvantages: Higher latency when generating feeds
    • Best for: High-follower accounts (celebrities, news outlets)
  2. Fan-out on Write (Push Model)
    • When content is created, it's immediately copied to all follower feeds
    • Advantages: Fast read times, consistent feed viewing experience
    • Disadvantages: Resource intensive for users with many followers
    • Best for: Users with moderate follower counts (< 10k followers)
  3. Hybrid Approaches
    • Combine push and pull models based on user characteristics
    • Example: Push updates for regular users, pull for celebrity accounts
    • Dynamically adjust based on metrics like follower count and engagement rate

Building Blocks

Storage Layer (DB + Cache + Blob store)

  1. Primary Content Storage
    • Purpose: Store user-generated content and associated metadata
    • Technologies: Distributed databases like Cassandra or DynamoDB
    • Key features:
      • Horizontal scaling to handle growing data volumes
      • Flexible schema for different content types
  2. Graph Storage
    • Purpose: Manage user relationships and content connections
    • Technologies: Specialized graph databases.
    • Key features:
      • Efficient relationship traversal
      • Support for complex friend-of-friend queries
  3. Cache Layer
    • Purpose: Improve read performance and reduce database load
    • Technologies: In-memory stores like Redis or Memcached
    • Key components:
      • Frequently accessed content cache
      • Social graph caching for faster feed generation
      • Content metadata caching
  4. Blob Store (S3)
    • Purpose: Store media content (large text, photos, videos)
    • Technologies: S3
    • Key features:
      • Simple, scalable file storage and access
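
How the cache layer sits in front of primary storage can be sketched with the cache-aside pattern, which is one common option rather than the only one. The `CacheAsideStore` class is hypothetical, and plain dictionaries stand in for both the in-memory cache and the database; the 60-second TTL is an arbitrary assumption.

```python
import time


class CacheAsideStore:
    """Cache-aside reads over dict-backed stand-ins for a cache and a database."""

    def __init__(self, database, ttl_seconds=60.0, clock=time.monotonic):
        self.database = database   # stand-in for the primary content store
        self.cache = {}            # stand-in for the cache: key -> (expires_at, value)
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.db_reads = 0          # counter kept only to show cache effectiveness

    def get_post(self, post_id):
        cached = self.cache.get(post_id)
        if cached is not None and cached[0] > self.clock():
            return cached[1]       # cache hit: no database access
        # Cache miss or expired entry: read from the database and repopulate.
        self.db_reads += 1
        post = self.database.get(post_id)
        if post is not None:
            self.cache[post_id] = (self.clock() + self.ttl_seconds, post)
        return post

    def update_post(self, post_id, post):
        # Write to the database first, then invalidate the cached copy so the
        # next read reloads fresh data.
        self.database[post_id] = post
        self.cache.pop(post_id, None)
```

Invalidate-on-write keeps the sketch simple; the TTL bounds how long a stale entry could survive if an invalidation were ever missed.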

Content Distribution (CDN + Message Queue)

  1. Delivery Infrastructure (CDN)
    • Global CDN networks for fast content access
    • Edge caching strategies to reduce origin load
  2. Real-time Systems
    • Message queues for asynchronous processing (e.g., building a search index)
    • Event streaming for real-time updates
    • Notification systems for user engagement (e.g., "a user just liked or commented on your post")
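
The asynchronous search-indexing example above can be sketched with Python's standard `queue` and `threading` modules standing in for a real message broker. The names `post_events`, `search_index`, `create_post`, `index_worker`, and `start_worker` are hypothetical, and the whitespace tokenizer is a deliberate simplification.

```python
import queue
import threading
from collections import defaultdict

post_events = queue.Queue()      # stand-in for a durable message queue
search_index = defaultdict(set)  # token -> ids of posts containing it


def create_post(post_id, text):
    """Write path: enqueue an event and return without waiting for indexing."""
    post_events.put((post_id, text))


def index_worker():
    """Consumer: builds the inverted index off the request path."""
    while True:
        event = post_events.get()
        try:
            if event is None:  # sentinel value used to stop the worker
                return
            post_id, text = event
            for token in text.lower().split():
                search_index[token].add(post_id)
        finally:
            post_events.task_done()


def start_worker():
    worker = threading.Thread(target=index_worker, daemon=True)
    worker.start()
    return worker
```

A short usage example: call `start_worker()`, publish a few posts with `create_post(...)`, then call `post_events.join()` to wait until every queued event has been indexed. In a real deployment the producer would not wait; the index simply becomes consistent shortly after the write.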

Processing Subsystems

  1. Feed Generation
    • Aggregate content from followed sources
    • Implement pagination
    • Handle real-time updates
  2. Search Infrastructure
    • Index content for quick retrieval
    • Support complex queries (hashtags, mentions, keywords)
    • Real-time index updates
    • Relevance-based result ranking
  3. Analytics Engine
    • Track user engagement metrics (e.g., top-K most engaged posts)
    • Identify trending content
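
Two of the steps above, feed pagination and top-K engagement tracking, can be sketched as follows. Both functions are hypothetical illustrations: `paginate` assumes the feed is already a newest-first list of `(timestamp, post_id)` tuples, and `top_k_posts` assumes engagement arrives as `(post_id, kind)` events small enough to count exactly in memory.

```python
import heapq
from collections import Counter


def paginate(feed, limit, cursor=None):
    """Cursor-based pagination over a newest-first feed.

    `cursor` is the last entry of the previous page (exclusive). Returns the
    page and the cursor for the next page, or None when no entries remain.
    """
    remaining = [entry for entry in feed if cursor is None or entry < cursor]
    page = remaining[:limit]
    next_cursor = page[-1] if len(remaining) > limit else None
    return page, next_cursor


def top_k_posts(engagement_events, k):
    """Return the k most-engaged posts as (post_id, count) pairs.

    Ties are broken by post_id so the result is deterministic.
    """
    counts = Counter(post_id for post_id, _kind in engagement_events)
    return heapq.nlargest(k, counts.items(), key=lambda item: (item[1], item[0]))
```

Cursor-based pagination stays stable when new posts arrive at the head of the feed, which offset-based pagination does not. At very large scale, exact counting would likely give way to approximate or windowed counting, which this sketch does not attempt.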
