Users can view and search coding questions by various criteria, solve them in multiple languages, and submit solutions to receive results.

LeetCode

Written by: Staff Eng at Meta
Published on: April 12, 2025

Functional Requirements

  • View and search a list of coding questions (by difficulty level, topic/tag, question name, or completion status);
  • View a question and code a solution in multiple languages;
  • Submit a solution and get the submission result;
  • View a live leaderboard for a competition;

[Interview time is limited] Design discussion should prioritize submission processing and leaderboard features over problem viewing and code editing interface. While important, viewing problems and editing code can be implemented using standard web application patterns. The submission system requires careful consideration of code isolation, execution safety, and handling 10K concurrent submissions. The leaderboard system needs to process real-time submission results while maintaining consistency under high load. These two components present the main technical challenges and should be our focus.

Non-Functional Requirements

  • The system should be highly available for common operations, including viewing/searching questions, accepting/validating code submissions, and returning submission results.
  • The system should be scalable to support 10K concurrent submissions (peak: 5x);
  • The system should return submission results within 5 seconds;
  • The system should support real-time update/view of a leaderboard;
  • [Must-have] The system should support isolation and security when executing user-submitted code (to prevent malicious code submissions from taking down the system);

Security and Isolation Requirements

Security and isolation are critical non-functional requirements in this LeetCode-style system. The fundamental principle is that each worker must operate in complete isolation, ensuring that a failure in one worker cannot affect others, effectively containing the blast radius to a single runtime environment. For code execution specifically, all user submissions must run in sandboxed environments with strict resource limits on CPU, memory, and network access to protect against malicious code. This isolation-first design is essential for maintaining system stability and preventing single points of failure from compromising the entire judging infrastructure.

Below the line

  • Recommend questions based on a user's browsing or submission history
  • User registration (premium) and management
  • Share or post questions into social media apps

API

1. GET /v1/problems
   Description: List/search problems
   Query Parameters:
     - difficulty: string
     - tag: string
     - cursor: string (optional)
     - limit: integer (default: 20, max: 100)
   Response:
     {
       problems: [{ id: string, title: string, difficulty: string, tags: string[] }],
       next_cursor: string,  // null if no more results
       has_more: boolean
     }

2. GET /v1/problems/:id
   Description: Get problem details
   Response:
     {
       id: string,
       title: string,
       description: string,
       difficulty: string,
       tags: string[],
       examples: [{ input: string, output: string }],
       constraints: string
     }

3. POST /v1/submissions
   Description: Submit a solution
   Headers:
     - Authorization: Bearer
   Body:
     { problem_id: string, language: string, code: string }
   Response:
     { submission_id: string, status: string ("pending" | "succeeded" | "error" | "timeout") }

4. GET /v1/submissions/:id/check
   Description: Poll the status/result of a submission (see the submission workflow below)
   Response:
     { submission_id: string, status: string ("pending" | "succeeded" | "error" | "timeout") }

5. GET /v1/competitions/:id/leaderboard
   Description: Get competition leaderboard
   Query Parameters:
     - cursor: string (optional)
     - limit: integer (default: 100)
   Response:
     {
       rankings: [{ rank: integer, user_id: string, username: string, total_score: integer,
                    solved_count: integer, last_submission_time: timestamp }],
       next_cursor: string
     }

High Level Design

View and search a list of coding questions

  • Clients send REST API requests to retrieve a list of coding questions based on search terms such as difficulty_level, tag, or completion_status.
  • Upon receiving a request, the API Server first queries the cache (e.g., Redis) for the results and falls back to the database if the cache does not contain the requested data. The results are then returned to the client.

How to make search fast?

[1] Add database index.

  • Add secondary indexes on popular columns like tag and difficulty_level to improve query performance at the database level.

[2] Cache the results for frequently executed queries.

  • Cache the results of commonly executed queries in a high-performance in-memory store like Redis.
  • Refresh the cached data periodically (e.g., every 24 hours) to balance query performance with data freshness.
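
A minimal sketch of this cache-aside pattern, assuming a Redis client and a hypothetical query_problems_from_db helper; the key scheme and TTL are illustrative:

```python
import json
import hashlib
import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
CACHE_TTL_SECONDS = 24 * 60 * 60  # refresh roughly every 24 hours

def search_problems(difficulty=None, tag=None):
    # Build a deterministic cache key from the query parameters.
    key_raw = json.dumps({"difficulty": difficulty, "tag": tag}, sort_keys=True)
    cache_key = "problems:search:" + hashlib.sha1(key_raw.encode()).hexdigest()

    cached = cache.get(cache_key)
    if cached is not None:
        return json.loads(cached)  # cache hit

    # Cache miss: fall back to the database (hypothetical helper),
    # then populate the cache for subsequent identical queries.
    results = query_problems_from_db(difficulty=difficulty, tag=tag)
    cache.set(cache_key, json.dumps(results), ex=CACHE_TTL_SECONDS)
    return results
```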

[3] Cursor-based pagination.

For optimal performance with large result sets, we can use cursor-based pagination in the API responses. Each response includes a cursor token for the next page, ensuring consistent results even when new problems are added to the system.
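
As a rough illustration, the cursor can be an opaque, base64-encoded reference to the last item returned; the encoding scheme and the query_problems_after helper below are assumptions:

```python
import base64
import json

PAGE_LIMIT_DEFAULT = 20

def encode_cursor(last_problem_id: str) -> str:
    # The cursor is opaque to clients; here it simply wraps the last seen id.
    payload = json.dumps({"last_id": last_problem_id}).encode()
    return base64.urlsafe_b64encode(payload).decode()

def decode_cursor(cursor: str) -> str:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["last_id"]

def list_problems(cursor: str | None = None, limit: int = PAGE_LIMIT_DEFAULT) -> dict:
    last_id = decode_cursor(cursor) if cursor else None
    # Hypothetical DB helper: fetch `limit + 1` rows after `last_id` to detect more pages.
    rows = query_problems_after(last_id=last_id, count=limit + 1)
    page, has_more = rows[:limit], len(rows) > limit
    return {
        "problems": page,
        "next_cursor": encode_cursor(page[-1]["id"]) if has_more else None,
        "has_more": has_more,
    }
```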

Do we need Elasticsearch?

For the search use cases in this LeetCode-like system, we lean towards not using Elasticsearch. Reasons:

  • Small Search Space. The total number of coding questions (~3000 questions or rows in the database) is relatively small, making database-based searches performant enough.
  • Structured Metadata. LeetCode questions are curated and updated manually by admins. Each question has predefined fields like tag and difficulty_level, and these fields can be indexed directly in the database without needing Elasticsearch's indexing capabilities.
  • No Full-Text Search Use Case. Elasticsearch is optimized for full-text search, such as searching within question descriptions. In this system, queries are more structured, like retrieving all questions tagged as "dynamic programming", rather than searching for the words "dynamic programming" within descriptions.
  • Operational Overhead. Setting up and maintaining an Elasticsearch cluster introduces additional complexity and failure points. For a use case of this scale and nature, the trade-offs do not justify the added overhead.

(Summary) Although Elasticsearch excels at full-text search and handling large, complex datasets, for a LeetCode-like system, the combination of database indexes and caching offers a more straightforward, efficient, and maintainable search solution.

View a question, and be able to code a solution in multiple languages

  • Client sends a request to the API server (GET /problem/:id) to view a specific question.
  • The API server returns question details, including the description, initial code stubs, etc. In the browser, we can use CodeMirror or Monaco Editor to render an IDE-like editor so that users can code a solution.

Both CodeMirror and Monaco Editor are good options. There is no need to spend time comparing them in the interview (it's a less interesting topic).

(preferred) CodeMirror:
  • lightweight (fast to load, better for low-bandwidth connections)
  • supports basic auto-completion and syntax highlighting
  • works on most (old and new) browsers
  • open-source

Monaco Editor:
  • heavier weight (slow on resource-constrained devices)
  • rich features (built-in support for advanced language tooling)
  • focuses on modern browsers
  • backed by Microsoft (the VS Code team)

Submit a solution and get submission result

Workflow

When a client submits code, the API server creates a submission record in the database and publishes a message to a language-specific SQS queue. The server returns a submission ID to the client immediately. The client periodically sends requests (/v1/submissions/{id}/check) to retrieve submission status. Upon receiving a check request, the API server queries the submission table and returns the current status to the client.
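
A hedged sketch of this submit path, assuming boto3, a DynamoDB submissions table, and per-language SQS queue URLs; all names and the queue mapping are illustrative:

```python
import json
import time
import uuid
import boto3

dynamodb = boto3.resource("dynamodb")
sqs = boto3.client("sqs")
submissions_table = dynamodb.Table("submissions")  # assumed table name

# Assumed mapping from language to its dedicated queue (see the queue options below).
LANGUAGE_QUEUE_URLS = {
    "python": "https://sqs.us-east-1.amazonaws.com/123456789012/submissions-python",
    "java": "https://sqs.us-east-1.amazonaws.com/123456789012/submissions-java",
}

def submit_solution(user_id: str, problem_id: str, language: str, code: str) -> dict:
    submission_id = str(uuid.uuid4())

    # 1. Persist a "pending" submission record.
    submissions_table.put_item(Item={
        "submission_id": submission_id,
        "user_id": user_id,
        "problem_id": problem_id,
        "language": language,
        "code": code,
        "status": "pending",
        "created_at": int(time.time()),
    })

    # 2. Publish a message to the language-specific queue for asynchronous judging.
    sqs.send_message(
        QueueUrl=LANGUAGE_QUEUE_URLS[language],
        MessageBody=json.dumps({
            "submission_id": submission_id,
            "problem_id": problem_id,
            "language": language,
        }),
    )

    # 3. Return immediately; the client polls /v1/submissions/{id}/check for the result.
    return {"submission_id": submission_id, "status": "pending"}
```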

Design Options

(1) Worker Type (VM vs Docker vs Lambda)

Choosing the right runtime environment is crucial for a LeetCode-like system. Based on the non-functional requirements, AWS Lambda is the preferred option because it aligns well with blast-radius containment and low latency for setting up and tearing down environments:

  1. Failures should be isolated to ensure one worker’s issue doesn’t affect others ("blast radius" containment).
  2. The system must handle user submissions efficiently, returning results within 5 seconds, so the environment should be easy to spawn and dispose of with minimal setup/teardown latency.

Here's how Lambda satisfies these needs:

  • Isolates Failures (Blast Radius Containment)
    • Each Lambda invocation runs in a separate, isolated environment with dedicated compute, memory, and ephemeral storage. Failures in one worker (e.g., runtime errors) don't impact others.
    • Stateless execution ensures that subsequent invocations are unaffected by previous ones.
  • Fast Setup and Teardown
    • Lambda can horizontally scale almost instantly, spawning thousands of environments in parallel to handle high traffic.
    • AWS automatically handles lifecycle management, creating and disposing of environments with virtually no manual effort.
  • Minimal Management Overhead
    • No need to manage VMs, containers, or infrastructure.
    • Each invocation gets a clean, temporary storage space (/tmp directory, up to 10GB), automatically cleaned after execution.
  • (Bonus) Seamless integration with AWS SQS and DynamoDB.

While Lambda is an excellent fit for this scenario, there are a few things to consider:

  • Control Network Access. Configure security settings (VPC Security Groups, NACLs) to block unauthorized outbound traffic, preventing malicious external network calls.
  • Set Resource Limits. Use memory allocation settings to control resource usage and enforce strict execution timeouts (e.g., 5 minutes) to terminate runaway or malicious code.
  • Mitigate Cold Starts:
    • Use Provisioned Concurrency to keep environments pre-warmed for latency-sensitive workloads. Alternatively, invoke the function periodically to minimize cold start impact.

(2) Single queue vs multiple queues (or single topic vs multiple topics if using Kafka)

  1. [Option 1] We recommend using separate (language-specific) queues, or separate topics if Kafka is used. Workers for each language runtime can scale independently. For example, we can have dedicated queues for popular languages (Java, Python, etc.), monitor each queue (message count, throughput, latency, etc.), and scale workers dynamically.
  2. [Option 2] An alternative is to use a single queue with event filtering. The API server publishes all messages to the same queue. Lambda polls the SQS queue and evaluates each message against the filter criteria defined in the Event Source Mapping (a Lambda configuration; see the sketch below). If the message matches the filter criteria, Lambda invokes the function to execute it; if not, Lambda skips it and continues polling.
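
For Option 2, a rough sketch of attaching a message filter to the Lambda event source mapping with boto3; the queue ARN, function name, and filter pattern are assumptions:

```python
import json
import boto3

lambda_client = boto3.client("lambda")

# Only deliver SQS messages whose JSON body has language == "python" to this worker.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:submissions",  # assumed queue ARN
    FunctionName="python-judge-worker",                               # assumed function name
    FilterCriteria={
        "Filters": [
            {"Pattern": json.dumps({"body": {"language": ["python"]}})}
        ]
    },
    BatchSize=1,
)
```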

(3) How does a worker report execution result/status?

When a worker completes code execution, it writes the result directly to the submission table in the database. The client then picks up the result through the polling flow described above: it periodically calls the check endpoint (e.g., every second), and the API server reads the current status from the submission table and returns it. A sketch of this worker flow follows.
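
Here is a minimal, illustrative Lambda worker: it runs the submitted code in a subprocess with a timeout and writes the outcome back to the submissions table. The table name, the per-run limit, and the simplified "run the file and check the exit code" harness (rather than real test cases) are all assumptions:

```python
import json
import subprocess
import tempfile
import boto3

submissions_table = boto3.resource("dynamodb").Table("submissions")  # assumed table name
EXECUTION_TIMEOUT_SECONDS = 10                                       # assumed per-run limit

def handler(event, context):
    for record in event["Records"]:  # SQS event source delivers messages in batches
        message = json.loads(record["body"])
        submission = submissions_table.get_item(
            Key={"submission_id": message["submission_id"]})["Item"]

        # Write the user code to a temp file and execute it in a child process.
        with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as f:
            f.write(submission["code"].encode())
            path = f.name
        try:
            proc = subprocess.run(
                ["python3", path],
                capture_output=True, text=True,
                timeout=EXECUTION_TIMEOUT_SECONDS,  # kill runaway code
            )
            status = "succeeded" if proc.returncode == 0 else "error"
            output = proc.stdout or proc.stderr
        except subprocess.TimeoutExpired:
            status, output = "timeout", ""

        # Report the result by updating the submission record; the client
        # picks it up on its next poll of the check endpoint.
        submissions_table.update_item(
            Key={"submission_id": message["submission_id"]},
            UpdateExpression="SET #s = :s, #o = :o",
            ExpressionAttributeNames={"#s": "status", "#o": "output"},
            ExpressionAttributeValues={":s": status, ":o": output},
        )
```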

Why polling? (not WebSocket or SSE)

Polling proves to be the superior choice for submission status updates compared to WebSocket or Server-Sent Events (SSE) for several reasons.

  • First, code execution typically completes within 1 - 5 seconds, meaning only a few poll requests are needed per submission. This makes the additional complexity of maintaining persistent connections with WebSocket or SSE unnecessary.
  • Second, polling provides natural load balancing since each request can be handled by any API server, whereas WebSocket/SSE require sticky sessions and connection state management.
  • Finally, polling is more resilient - if a client disconnects or encounters network issues, they can resume polling without complex reconnection logic. The slight delay in receiving results (1-2 seconds) is acceptable for this use case and worth the significant reduction in system complexity.

In fact, LeetCode uses this polling approach for checking submission results. You can verify it with your browser's developer tools: when you click "Submit", a submit request is sent and the API response contains a submission_id. After that, the client keeps sending check requests until the submission finishes processing.
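
On the client side, the check loop can be as simple as the following sketch (the base URL, endpoint path, and timing are illustrative):

```python
import time
import requests

def wait_for_result(submission_id: str, base_url: str = "https://api.example.com") -> dict:
    # Poll roughly once per second until the submission leaves the "pending" state;
    # code execution usually finishes within a few seconds.
    for _ in range(30):  # give up after ~30 seconds
        resp = requests.get(f"{base_url}/v1/submissions/{submission_id}/check")
        result = resp.json()
        if result["status"] != "pending":
            return result
        time.sleep(1)
    raise TimeoutError("submission did not complete in time")
```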

View a live leaderboard for competition

To support a real-time competition leaderboard that ranks participants by total score and submission time, we need a data store that excels at both frequent updates and ordered data retrieval. We will re-use the existing submission table in DynamoDB and add a new leaderboard service responsible for calculating scores and maintaining rankings. Redis sorted sets will handle real-time ranking operations, providing efficient ordered data access.

Why Redis sorted set?

Redis sorted set provides ideal characteristics for our ranking system:

  1. Maintains elements in sorted order with O(log N) insertion and update time. Each member's score can be updated independently without resorting the entire set
  2. Supports efficient range queries through ZREVRANGE with O(log N + M) complexity, where M is the number of elements to return

Data Storage

  • To reuse the existing submission table for storing submission details for a competition (vs. practice), we can add a new column, competitionId.
  • Redis SortedSet
competition:{id}:rankings (Sorted Set)
Score: total_score
Member: user_id

Ranking Update workflow

When a submission in a competition is processed, DynamoDB Streams captures the submission update. A Leaderboard Lambda processes the stream events:

  • Filters for competition submissions
  • Calculates score from submission test results
  • Gets user's current total score from Redis (ZSCORE command) and then updates Redis sorted set with new total_score.
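
A sketch of that Leaderboard Lambda, assuming DynamoDB Streams records with NEW_IMAGE data and the Redis key layout above. Score calculation is simplified to reading an assumed per-problem score attribute, and ZINCRBY is used here because it performs the ZSCORE-then-update read-modify-write described above in a single atomic step:

```python
import redis

r = redis.Redis(host="leaderboard-redis", port=6379, decode_responses=True)  # assumed endpoint

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        new_image = record["dynamodb"].get("NewImage", {})

        # Only competition submissions that finished successfully affect the leaderboard.
        if "competitionId" not in new_image or new_image["status"]["S"] != "succeeded":
            continue

        competition_id = new_image["competitionId"]["S"]
        user_id = new_image["user_id"]["S"]
        score = int(new_image["score"]["N"])  # assumed per-problem score attribute

        # Add this problem's score to the user's running total in the sorted set.
        r.zincrby(f"competition:{competition_id}:rankings", score, user_id)
```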

Leaderboard Query workflow

  • Serve rankings from the Redis sorted set using ZREVRANGE
  • Fall back to calculating rankings from DynamoDB if Redis is unavailable

Common queries:

  • Fetching a user's rank: ZREVRANK competition:{id}:rankings {user_id}
  • Retrieving the top 5 users: ZREVRANGE competition:{id}:rankings 0 4 WITHSCORES
  • Showing a user's position and surrounding ranks: ZREVRANGE competition:{id}:rankings {user_rank-5} {user_rank+5}
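
The same queries in redis-py form, as a rough sketch (the key name follows the layout above):

```python
import redis

r = redis.Redis(decode_responses=True)

def top_n(competition_id: str, n: int = 5):
    # Highest scores first, with each member's score attached.
    return r.zrevrange(f"competition:{competition_id}:rankings", 0, n - 1, withscores=True)

def user_rank(competition_id: str, user_id: str):
    rank = r.zrevrank(f"competition:{competition_id}:rankings", user_id)
    return None if rank is None else rank + 1  # convert 0-based rank to 1-based

def around_user(competition_id: str, user_id: str, window: int = 5):
    key = f"competition:{competition_id}:rankings"
    rank = r.zrevrank(key, user_id)
    if rank is None:
        return []
    start = max(rank - window, 0)
    return r.zrevrange(key, start, rank + window, withscores=True)
```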

Deep Dive

High Availability for Serving View, Search, and Submit Requests

  • Multi-instance API Servers: Deploying multiple instances of the API server behind an AWS API Gateway or an Application Load Balancer (ALB) ensures redundancy. If one instance fails, traffic is automatically routed to healthy instances.
  • AWS Lambda & DynamoDB: These managed services provide built-in fault tolerance across multiple availability zones. DynamoDB offers automatic replication and failover, ensuring data persistence and availability.
  • Handle Redis failure. When the Redis cluster is down due to node or network failures, the leaderboard service can fall back to computing rankings directly from DynamoDB, and the Redis sorted set can later be rebuilt from DynamoDB (or by replaying Redis AOF persistence logs once the cluster recovers).
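
A rough sketch of that rebuild path, assuming a DynamoDB GSI on competitionId; the index name and the score/status attribute names are illustrative:

```python
import boto3
import redis
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("submissions")  # assumed table name
r = redis.Redis(decode_responses=True)

def rebuild_leaderboard(competition_id: str) -> None:
    totals: dict[str, int] = {}
    kwargs = {
        "IndexName": "competitionId-index",  # assumed GSI name
        "KeyConditionExpression": Key("competitionId").eq(competition_id),
    }
    # Page through all submissions for the competition and sum scores per user.
    while True:
        page = table.query(**kwargs)
        for item in page["Items"]:
            if item.get("status") == "succeeded":
                totals[item["user_id"]] = totals.get(item["user_id"], 0) + int(item.get("score", 0))
        if "LastEvaluatedKey" not in page:
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

    # Repopulate the sorted set in one shot.
    key = f"competition:{competition_id}:rankings"
    r.delete(key)
    if totals:
        r.zadd(key, totals)  # mapping of {member: score}
```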

Support 10K concurrent submissions (peak: 5X)

Handling up to 10K concurrent submissions requires efficient workload distribution and rapid function execution. This can be satisfied by the following:

  • Language-specific queues. Using AWS SQS with separate queues for different programming languages allows independent scaling of workers. Popular languages like Python and Java can have dedicated, dynamically scalable queues.
  • Scaling AWS Lambda.
    • Increase the concurrency limit. By default, AWS Lambda can handle up to 1K concurrent invocations (per AWS account per region). For higher concurrency, we can request a limit increase to accommodate the expected peak load.
    • Monitor and adjust dynamically. Lambda processes messages from the SQS queue, which acts as a buffer to absorb sudden spikes in submissions without overwhelming the system. When Lambda is throttled due to concurrency limits, SQS retains the messages and Lambda processes them as capacity becomes available. We can monitor SQS and Lambda metrics (such as number of visible messages, age of the oldest message, and number of concurrent executions) and adjust capacity dynamically.

Return submission results within ≤ 5 seconds

Use Provisioned Concurrency to optimize code execution environments. We use Lambda as the compute layer to execute user-submitted code. We can set provisioned concurrency beforehand to "pre-warm" Lambda instances and minimize cold-start delay for predictable workloads (e.g., when hosting a competition).
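
For example, provisioned concurrency can be configured on a published function version or alias ahead of a scheduled contest; the function name, alias, and count below are assumptions:

```python
import boto3

lambda_client = boto3.client("lambda")

# Pre-warm 500 execution environments for the Python judge before a contest starts.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="python-judge-worker",  # assumed function name
    Qualifier="live",                    # assumed alias pointing at the current version
    ProvisionedConcurrentExecutions=500,
)
```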

[Security Control] Support isolation and security when executing user-submitted code

  • Sandboxed executions and limit the max resource consumption. Each submission runs in an isolated AWS Lambda environment, containing potential failures or security breaches within a single invocation. Strict CPU, memory, and execution time limits prevent resource exhaustion attacks.
  • Enforce timeout on code execution. Enforce timeout or max run-time and terminate the execution to prevent infinite loops or excessive recursion.

[Performance] Support real-time update/view of a leaderboard

Maintaining a live leaderboard requires an efficient ranking system that supports frequent updates and real-time queries. The approach includes:

  • Redis Sorted Sets for Fast Ranking Updates:
    • Each competition has a dedicated sorted set, competition:{id}:rankings.
    • The total score is the primary ranking criterion, while the submission timestamp is used to break ties.
    • Updates occur via the ZADD command, ensuring an efficient O(log N) time complexity.
  • Real-time updates through DynamoDB stream:
    • Submission records are stored in DynamoDB with an indexed competitionId field.
    • Leaderboard Lambda functions process DynamoDB Streams, updating Redis rankings in real-time.