
Design a web service that helps customers purchase the cheapest available copy of a book from a network of online booksellers.

Bookstore



A customer submits a request including the book title, their shipping and payment details, and a maximum price they're willing to pay.

The service then queries multiple booksellers:

If any offer the book at or below the specified price, it automatically purchases the cheapest option and later informs the customer of the seller and final price.
If no offer meets the price threshold, the service notifies the customer of the lowest available price instead.

Use Case Assumptions

Book names are unique (like a UID)
The customer only ever buys one copy of a book at a time
The booksellers may or may not all sell the same books
For any specific book, each bookseller may charge a different price
Sellers provide heterogeneous APIs (different request/response formats)

Functional Requirements

FR1. Users should be able to submit a book purchase request with a maximum price limit

“As a user, I want to request a book by providing its name, my shipping and payment information, and a maximum price I’m willing to pay.”

FR2. Users should be able to retrieve a price resolution result (either success or lowest available price)

“As a user, I want to know whether the system found a seller within my budget — or at least what the lowest price available is.”

FR3. Users should be able to trigger an automatic purchase when the offer is acceptable

“As a user, I want the system to automatically purchase the book if the price is acceptable — without additional confirmation.”

Non-Functional Requirements

The NFRs below are ranked by priority for this specific design.

NFR1. Efficiency – avoid unnecessary work and reduce cost per request

NFR2. Latency – p90 < 5 seconds

NFR3. Idempotency – avoid duplicate charges or purchases

NFR4. Scalability – up to 200 purchase requests/sec; assuming roughly 100 seller calls per request, that is about 20K seller calls/sec

NFR5. Reliability – tolerate partial failures and degraded seller availability

NFR6. Observability – end-to-end traceability and real-time metrics

High-Level Design

💡 How to structure your High Level Design?

When presenting your High-Level Design, a clear and effective strategy is to go vertically, one Functional Requirement at a time — and walk through a working, end-to-end solution for each.

For every FR, answer:

What APIs are called or exposed at this stage?
What entity or schema design is needed to support the flow?
What are the core steps and flow from user input to system output?
What components are involved (e.g., API gateway, queue, fan-out workers, storage, aggregator)?

This vertical approach keeps each requirement self-contained and actionable, making it easier to read, review, and debug. For documentation, you can co-locate entity and API definitions within each FR, or collect them in a single section at the beginning — both are valid depending on interview style or presentation format.

FR1 - Users should be able to submit a book purchase request with a maximum price limit

API Design - POST /purchase-request

Request Body:

{
  "book_name": "Designing Data-Intensive Applications",
  "max_price": 30.00,
  "customer_info": {
    "shipping_address": "...",
    "credit_card_info": "..."
  }
}
    

Response Body:

{
  "request_id": "req_abc123",
  "status": "PENDING"
}
    

Core Entities


Entity	Description
User	End-customer
Book	Unique book identifier
PurchaseRequest	Tracks the async resolution workflow

Workflow

Step 1: API Gateway receives the POST /purchase-request call and forwards it to the backend service.

Step 2: RequestHandler Service performs:

Validation of input fields (e.g., price format, required fields).
Generation of request_id.

Step 3: Persistence of a new PurchaseRequest entity into the database with status PENDING.

Step 4: The same service enqueues a message with request_id into an internal RequestQueue (e.g., Kafka, SQS, or Pub/Sub; the choice can be discussed later).

Step 5: The system returns a 202 Accepted response with the request_id to the user.

The actual resolution (seller fan-out and purchase) happens asynchronously (covered in FR2).
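The five steps above can be sketched as a single handler. This is a minimal illustration, not the post's implementation: the in-memory `purchase_requests` dict and `request_queue` stand in for the database and the RequestQueue, and the function name and the `req_` id format are assumptions.

```python
import queue
import uuid

# In-memory stand-ins for the database and RequestQueue (assumptions for this sketch).
purchase_requests: dict = {}
request_queue: queue.Queue = queue.Queue()

REQUIRED_FIELDS = ("book_name", "max_price", "customer_info")


def submit_purchase_request(body: dict) -> tuple:
    """Validate, persist as PENDING, enqueue, and return (http_status, response_body)."""
    # Step 2: validate required fields and the price format.
    missing = [field for field in REQUIRED_FIELDS if field not in body]
    if missing:
        return 400, {"error": "missing fields: " + ", ".join(missing)}
    price = body["max_price"]
    if isinstance(price, bool) or not isinstance(price, (int, float)) or price <= 0:
        return 400, {"error": "max_price must be a positive number"}

    # Step 2: generate the request_id.
    request_id = "req_" + uuid.uuid4().hex[:12]

    # Step 3: persist the PurchaseRequest with status PENDING.
    purchase_requests[request_id] = {**body, "request_id": request_id, "status": "PENDING"}

    # Step 4: enqueue only the request_id for the asynchronous fan-out worker.
    request_queue.put(request_id)

    # Step 5: 202 Accepted, because resolution happens later.
    return 202, {"request_id": request_id, "status": "PENDING"}
```

Persisting before enqueueing means a worker never picks up a request_id that has no stored PurchaseRequest behind it.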

FR2 – Users should be able to retrieve a price resolution result (either success or lowest available price)

API Design – GET /request-status

GET /request-status?request_id=req_abc123

Response – if match found (price within max_price):

{
  "status": "SUCCESS",
  "resolved_price": 24.99,
  "resolved_seller": "Booktopia"
}
    

Response – if no seller met price cap:

{
  "status": "NOT_AVAILABLE",
  "lowest_available_price": 34.50
}
    

Response – if still resolving:

{
  "status": "PROCESSING"
}
    

💡 This is a polling-based API that lets the user check whether their price constraint was met, regardless of whether the purchase has completed yet. The purchase itself happens separately and automatically (see FR3).

Core Entities:

Entity	Description
`PurchaseRequest`	Tracks the user's original request, including `book_name`, `max_price`, resolution status ( `PENDING`, `PROCESSING`, `RESOLVED`, `SUCCESS`, `NOT_AVAILABLE`, `PURCHASE_FAILED` ), and final outcome
`SellerOffer`	Temporary or cached offer received directly from an external seller’s API, including normalized fields like `price`, `availability`, `inventory`, and `response_time_ms`
`BookSeller`	Configuration metadata for each external seller, including direct API endpoints ( `check_offer_url` ), HTTP method, headers template, request/response mappings, retry policies, and ranking metadata (e.g. success rate, p95 latency)
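To make the request/response mappings in `BookSeller` concrete, here is a hedged sketch of response normalization driven by per-seller config. The seller names, the JSON paths, and the `SELLER_CONFIG` layout are invented for illustration; a real system would load them from the BookSeller table.

```python
from typing import Any

# Hypothetical BookSeller rows: each maps a normalized field to a key path
# inside that seller's JSON response. Names and paths are made up.
SELLER_CONFIG = {
    "seller_a": {
        "price": ["offer", "amount"],
        "availability": ["offer", "in_stock"],
        "inventory": ["offer", "qty"],
    },
    "seller_b": {
        "price": ["price_usd"],
        "availability": ["available"],
        "inventory": ["stock_count"],
    },
}


def dig(payload: dict, path: list) -> Any:
    """Follow a list of keys into a nested dict."""
    value: Any = payload
    for key in path:
        value = value[key]
    return value


def normalize_offer(seller_id: str, raw: dict) -> dict:
    """Convert a seller-specific response into the normalized offer shape."""
    mapping = SELLER_CONFIG[seller_id]
    return {
        "seller_id": seller_id,
        "price": float(dig(raw, mapping["price"])),
        "availability": bool(dig(raw, mapping["availability"])),
        "inventory": int(dig(raw, mapping["inventory"])),
    }
```

Keeping the mapping in data rather than code means onboarding a new seller is a config change, which is the point of storing it in `BookSeller`.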

Workflow:

Once the PurchaseRequest is enqueued (from FR1), the system kicks off seller resolution via a background worker.

Step 1: Fan-out Worker consumes the request_id from the RequestQueue.

It retrieves the associated PurchaseRequest and book_name.

Step 2: Fan-out Worker queries each external bookseller’s public API directly, using seller-specific config from the BookSeller table:

Constructs the HTTP request (URL, headers, payload) using the seller’s metadata
Sends the HTTP request
Parses the response based on expected JSON/XML paths

Each call returns a normalized {seller_id, price, availability, inventory} tuple.

Step 3: Fan-out Worker filters out unavailable or overpriced offers.

It selects the cheapest seller whose price is within the user’s max_price.

Step 4a: If a valid match is found, the worker updates the PurchaseRequest with:

status = RESOLVED
resolved_price
resolved_seller

⇒ And proceeds to automatically trigger the purchase (see FR3).

Step 4b: If no seller meets the price, the worker updates the PurchaseRequest with:

status = NOT_AVAILABLE
lowest_available_price from all seller responses

Step 5: User polls GET /request-status to retrieve the final outcome:

If status is RESOLVED (or SUCCESS once the purchase in FR3 has completed), show matched seller and price
If status is NOT_AVAILABLE, return lowest available price as fallback
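Steps 3 to 4b reduce to a small pure function over the normalized offers. The sketch below assumes each offer is a dict with `seller_id`, `price`, `availability`, and `inventory`; the function name is hypothetical.

```python
def resolve_offers(offers: list, max_price: float) -> dict:
    """Pick the cheapest available offer within max_price, else report the lowest price seen."""
    # Step 3: drop offers that cannot be bought at all.
    available = [o for o in offers if o["availability"] and o["inventory"] > 0]
    if not available:
        return {"status": "NOT_AVAILABLE", "lowest_available_price": None}

    cheapest = min(available, key=lambda o: o["price"])

    # Step 4a: the cheapest available offer fits the budget.
    if cheapest["price"] <= max_price:
        return {
            "status": "RESOLVED",
            "resolved_price": cheapest["price"],
            "resolved_seller": cheapest["seller_id"],
        }

    # Step 4b: nothing fits, so report the lowest available price as the fallback.
    return {"status": "NOT_AVAILABLE", "lowest_available_price": cheapest["price"]}
```

Because the cheapest available offer is computed once, the same value serves both outcomes: it is either the match or the fallback price.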

FR3 - Users should be able to trigger an automatic purchase when the offer is acceptable

This flow continues from where FR2 ends. Once the system identifies a seller offering the book within the user's maximum price, it automatically triggers a purchase on the user's behalf.

API Design - Direct Seller Purchase Call

POST https://api.booktopia.com/purchase

Request template:

{
  "isbn": "...",
  "price": 12.50,
  "shipping": { ... },
  "payment_token": "tok_abc123",
  "client_reference_id": "req_001A"
}
    

Response parsing and normalization also happen in-worker.

Workflow

Step 1: Fan-out Worker (from FR2) selects the best valid offer from seller responses, where price ≤ max_price.

Step 2: Fan-out Worker initiates a direct HTTPS call to the selected bookseller’s purchase API, using configuration stored in the BookSeller entity:

Builds the HTTP request using:
- api_endpoint (purchase URL)
- HTTP method (e.g. POST)
- headers template (e.g. API keys, content-type)
- request body template (filled with dynamic fields like isbn, price, shipping info, etc.)

Step 3: Fan-out Worker sends the request to the external seller’s system, handles retries, parses the raw response, and normalizes it to extract:

order_id, final_price, delivery_eta, or error code/message

Step 4.a: On success, the worker updates the PurchaseRequest with:

status = SUCCESS
resolved_price, resolved_seller, purchase_timestamp

Step 4.b: On failure, the system may retry, attempt the next best valid offer (if any), or mark the request as PURCHASE_FAILED — based on business rules and error type.
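One possible reading of Step 4.b is sketched below: retry retryable errors a bounded number of times, then fall back to the next cheapest valid offer. `SellerError`, its `retryable` flag, and the `place_order` callable are assumptions standing in for the seller-specific purchase call.

```python
from typing import Callable


class SellerError(Exception):
    """Raised by the hypothetical purchase call; `retryable` is an assumed flag."""

    def __init__(self, message: str, retryable: bool = False):
        super().__init__(message)
        self.retryable = retryable


def purchase_with_fallback(
    offers: list,
    max_price: float,
    place_order: Callable[[dict], dict],
    max_retries: int = 2,
) -> dict:
    """Try valid offers from cheapest to most expensive until one purchase succeeds."""
    valid = sorted((o for o in offers if o["price"] <= max_price), key=lambda o: o["price"])
    for offer in valid:
        for _attempt in range(max_retries + 1):
            try:
                order = place_order(offer)
            except SellerError as err:
                if err.retryable:
                    continue  # transient failure: retry the same seller
                break  # permanent failure: move on to the next best offer
            return {
                "status": "SUCCESS",
                "resolved_price": order["final_price"],
                "resolved_seller": offer["seller_id"],
            }
    return {"status": "PURCHASE_FAILED"}
```

Retries are only safe if every attempt against the same seller carries the same `client_reference_id`, so the seller can recognize a repeated order instead of charging twice.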

Deep Dives

DD1 - How do we make fan-out efficient and cost-effective?

💡 In a system where each purchase request may involve contacting 50–200 booksellers, making fan-out more efficient is critical — not just to reduce cost, but to meet tight latency goals and avoid overwhelming external dependencies. The strategies below focus on eliminating unnecessary work, prioritizing high-value sellers, and reusing trustworthy data wherever possible.

Together, these techniques form the foundation for a scalable, cost-efficient resolution layer.

Strategy 1: Early Exit Once a Valid Price is Found

Stop querying other sellers once we’ve already found a seller that meets the user’s max price constraint.

Instead of waiting for all seller responses, the system can short-circuit fan-out once a valid price (≤ max_price) is returned by any seller. The Fan-out Controller monitors incoming responses from external bookseller APIs, and as soon as a valid offer is identified, it immediately:

Triggers the purchase via direct HTTPS request to that seller
Cancels, skips, or deprioritizes remaining seller calls

This approach minimizes outbound calls, lowers purchase latency, and improves throughput — especially when fast and reliable sellers respond early.

Design Implications:

Fan-out workers need interruptible call execution (e.g., cancellable threads, flag-driven short-circuiting)
Centralized decision point or shared coordination logic across threads
Seller-specific purchase logic must be ready to fire immediately on match

Tradeoffs:

You may miss slightly better prices from slower sellers
Requires thoughtful ordering of sellers and timeout management
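A minimal sketch of early exit using `asyncio`, assuming a hypothetical `query_seller` coroutine that returns a normalized offer dict. It returns the first valid offer to arrive and cancels whatever calls are still in flight.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def fan_out_early_exit(
    seller_ids: list,
    query_seller: Callable[[str], Awaitable[dict]],
    max_price: float,
    timeout_s: float = 5.0,
) -> Optional[dict]:
    """Return the first offer at or below max_price and cancel the remaining calls."""
    tasks = [asyncio.create_task(query_seller(s)) for s in seller_ids]
    try:
        # as_completed yields results in completion order, not submission order.
        for next_done in asyncio.as_completed(tasks, timeout=timeout_s):
            try:
                offer = await next_done
            except asyncio.TimeoutError:
                break  # overall deadline reached
            except Exception:
                continue  # one flaky seller should not fail the whole request
            if offer["availability"] and offer["price"] <= max_price:
                return offer  # short-circuit: good enough, stop waiting
        return None
    finally:
        # Cancel anything still running; a no-op for tasks that already finished.
        for task in tasks:
            task.cancel()
```

With sellers that answer in 5 ms at $35, 10 ms at $28, and 200 ms at $20 and a cap of $30, this returns the $28 offer and never sees the $20 one, which is exactly the tradeoff listed above.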

Strategy 2: Adaptive Fan-out (Seller Prioritization Tiers)

Query sellers in prioritized tiers based on reliability, historical success, and pricing behavior.

Rather than querying all sellers at once, the system should fan out in smart waves:

Tier 1: High-trust sellers (fast, reliable, low-latency, low-failure)
Tier 2: Moderate sellers
Tier 3: Long-tail or low-quality sellers

The Fan-out Controller consults a Seller Ranking Engine to retrieve the appropriate seller tier list. It begins with Tier 1, and only escalates to lower tiers if:

No valid offer is found
All Tier 1 calls fail or time out

This staged approach improves success rate with minimal cost and avoids wasting resources on low-yield sellers.

Design Implications:

Seller metadata (stored in DB) must include tier ranking and fan-out priority
Fan-out workers need logic to escalate tier-by-tier with fallback
Tier assignments must be continuously refreshed

Tradeoffs:

Slightly increases total resolution latency in failure cases
Requires ongoing seller performance monitoring and tier tuning
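The escalation logic can be sketched as a loop over tiers. For brevity the sketch queries sellers within a tier sequentially, whereas a real worker would issue them concurrently; `query_seller` and the tier lists are hypothetical inputs that would come from the Seller Ranking Engine.

```python
from typing import Callable, Optional


def tiered_fan_out(
    tiers: list,
    query_seller: Callable[[str], Optional[dict]],
    max_price: float,
) -> tuple:
    """Query tier by tier and stop at the first tier that yields a valid offer.

    Returns (best_offer_or_None, number_of_seller_calls_made).
    """
    calls = 0
    for tier in tiers:
        valid = []
        for seller_id in tier:
            calls += 1
            try:
                offer = query_seller(seller_id)
            except Exception:
                continue  # failure or timeout: treat as no offer from this seller
            if offer and offer["availability"] and offer["price"] <= max_price:
                valid.append(offer)
        if valid:
            # Cheapest valid offer within the current tier; lower tiers are skipped.
            return min(valid, key=lambda o: o["price"]), calls
    return None, calls
```

Returning the call count alongside the offer makes the saving measurable: in the success case the long-tail tiers are never contacted.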

Strategy 3: Cache Valid Offers to Avoid Redundant Calls

If a seller recently returned a valid offer for a book with sufficient inventory, we may not need to query that seller again.

The system can reduce redundant work by caching recent seller responses in a BookOffer Cache (e.g., Redis, in-memory, or TTL-backed store). Each cache entry stores:

isbn, seller_id
price, availability
last_checked_at

During fan-out, the Fan-out Controller checks this cache before issuing new API calls. If a cached offer:

Was returned recently (within TTL)
Is still under the user’s max_price
Has high enough inventory (e.g., inventory > 100)

Then the controller can reuse the cached offer and skip that seller’s API call altogether.

Design Implications:

Offer cache must be keyed by isbn + seller_id
TTL values must be per-seller (some refresh more often)
Final purchase call still validates price to avoid staleness

Tradeoffs:

Slight risk of price drift or stockout
Requires per-seller TTL tuning
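A hedged in-memory sketch of the reuse check follows. The class and field names are illustrative; a production version would more likely sit on Redis with per-key expiry. It applies the three conditions above: freshness within a per-seller TTL, price under the cap, and inventory above a threshold.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedOffer:
    isbn: str
    seller_id: str
    price: float
    inventory: int
    last_checked_at: float  # epoch seconds


class BookOfferCache:
    """Offer cache keyed by (isbn, seller_id) with per-seller TTLs."""

    def __init__(self, ttl_by_seller: dict, default_ttl_s: float = 60.0, min_inventory: int = 100):
        self._ttl_by_seller = ttl_by_seller
        self._default_ttl_s = default_ttl_s
        self._min_inventory = min_inventory
        self._entries: dict = {}

    def put(self, offer: CachedOffer) -> None:
        self._entries[(offer.isbn, offer.seller_id)] = offer

    def get_reusable(
        self, isbn: str, seller_id: str, max_price: float, now: Optional[float] = None
    ) -> Optional[CachedOffer]:
        """Return the cached offer only if it is fresh, affordable, and well stocked."""
        offer = self._entries.get((isbn, seller_id))
        if offer is None:
            return None
        now = time.time() if now is None else now
        ttl = self._ttl_by_seller.get(seller_id, self._default_ttl_s)
        is_fresh = now - offer.last_checked_at <= ttl
        if is_fresh and offer.price <= max_price and offer.inventory > self._min_inventory:
            return offer
        return None  # caller falls back to a live seller API call
```

A `None` result is not an error; it simply means the Fan-out Controller should query that seller directly.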

Design Diagram

DD2 - How do we handle in-flight deduplication of requests for the same book?

Strategy 1: Collapse Duplicate Lookups for the Same Book

Avoid redundant fan-out when many users request the same book — group requests by (isbn, max_price) and resolve them together.

Let’s say 3 users request “Designing Data-Intensive Applications” within a few seconds, all with max_price ≤ $30. Instead of triggering 3 separate 200-seller fan-outs, we coalesce them into a single resolution round.

This is managed via a Request Coalescing Layer, built on top of a temporary in-flight registry (e.g. Redis or in-memory map), keyed by (isbn, max_price_bucket).

How it works:

When a new request comes in:
- If no active fan-out for the (isbn, price) bucket → initiate new resolution
- If fan-out in progress → attach as a dependent
When resolution completes:
- If a valid offer is found → share the same result (price, seller) with all dependents, each of which then proceeds to its own purchase
- If no match → all receive the same fallback info (e.g. lowest available price)

This mechanism avoids redundant API calls and boosts efficiency in bursty or high-demand scenarios.

Example Timeline

Time	Event
t0	User A requests book `ISBN=X`, `max_price = $30`
t1	System starts fan-out for `(X, $30)`
t2	User B and C also request `X`, `$30`
t2	They are attached to the same fan-out (deduped)
t3	System finds offer from Seller Y at $28
t4	All 3 users receive the same result → each purchase proceeds via Seller Y at $28
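The in-flight registry can be sketched with `asyncio` tasks: the first request for a bucket starts the resolution, and later requests await the same task. The `resolve` callable, the bucket size, and the choice to round the cap down to the bucket floor are assumptions made for this sketch.

```python
import asyncio
from typing import Awaitable, Callable


class RequestCoalescer:
    """In-flight registry keyed by (isbn, max_price_bucket)."""

    def __init__(self, resolve: Callable[[str, float], Awaitable[dict]], bucket_size: float = 5.0):
        self._resolve = resolve
        self._bucket_size = bucket_size
        self._in_flight: dict = {}

    def _key(self, isbn: str, max_price: float) -> tuple:
        # Round down to the bucket floor so the shared lookup never uses a cap
        # higher than any attached user's max_price.
        bucket = (max_price // self._bucket_size) * self._bucket_size
        return isbn, bucket

    async def resolve(self, isbn: str, max_price: float) -> dict:
        key = self._key(isbn, max_price)
        task = self._in_flight.get(key)
        if task is None:
            # No active fan-out for this bucket: start one and register it.
            task = asyncio.create_task(self._resolve(isbn, key[1]))
            self._in_flight[key] = task
            task.add_done_callback(lambda _t: self._in_flight.pop(key, None))
        # shield() keeps one cancelled dependent from cancelling the shared fan-out.
        return await asyncio.shield(task)
```

Rounding down is conservative: a user whose cap sits above the bucket floor may miss an offer priced between the floor and their cap, so the bucket size trades deduplication rate against match quality.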

Strategy 2: Idempotent Purchase Guarantees Across Users

Prevent double-purchasing the same seller offer while still allowing safe parallelism when inventory is sufficient.

When several users attempt to purchase the same book, the system must:

Avoid over-committing the same seller offer
Prevent duplicate orders (especially for last-copy scenarios)
Allow parallel purchases when inventory is high

To do this, we enforce idempotency and introduce smart concurrency control.

1. Construct a Deterministic Purchase Idempotency Key

Before calling the seller’s HTTPS endpoint, the worker computes:

purchase_key = hash(isbn + seller_id + price)

This key ensures that duplicate purchase attempts for the same seller offer are detected:

If the same key is already in-flight → block or wait
If key is completed → reuse the result (success or fail)

This avoids accidental double purchases on 1-unit inventory sellers.

2. Controlled Parallelism for High-Inventory Sellers

If the seller's offer indicates sufficient inventory (e.g., >10 units), we allow parallel purchases while maintaining idempotency by adding a sequence field to the idempotency key:

purchase_key = hash(isbn + seller_id + price + seq)

Here, seq:

Increments per user request
Respects the known inventory cap
Helps isolate each parallel buy attempt

We ensure atomicity using Redis or distributed locks to avoid exceeding known inventory.
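A single-process sketch of sequence reservation is shown below. The `threading.Lock` stands in for the Redis atomic counter or distributed lock mentioned above, and the class name, key layout, and SHA-256 choice are assumptions rather than the post's specification.

```python
import hashlib
import threading
from typing import Optional


class PurchaseSlots:
    """Hands out seq numbers per (isbn, seller_id, price), capped by known inventory."""

    def __init__(self) -> None:
        # Stand-in for a Redis atomic counter or a distributed lock.
        self._lock = threading.Lock()
        self._next_seq: dict = {}

    def reserve(self, isbn: str, seller_id: str, price: float, inventory: int) -> Optional[str]:
        """Return a purchase_key for the next free slot, or None if inventory is exhausted."""
        offer = (isbn, seller_id, f"{price:.2f}")
        with self._lock:
            seq = self._next_seq.get(offer, 0)
            if seq >= inventory:
                return None  # reserving more would exceed the known inventory
            self._next_seq[offer] = seq + 1
        # purchase_key = hash(isbn + seller_id + price + seq)
        raw = "|".join((*offer, str(seq)))
        return hashlib.sha256(raw.encode()).hexdigest()
```

Formatting the price to two decimals before hashing keeps the key deterministic even when the same price arrives as `28`, `28.0`, or `28.00`.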

3. Revalidate Offers Just Before Purchase

Even with cached data, the system should double-check the offer before calling the seller API:

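latest_offer = fetch_current_offer(seller_id, isbn)
# Abort if the price moved since the cached offer or the seller is out of stock.
if latest_offer.price != cached_offer.price or latest_offer.inventory <= 0:
    abort()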
    

This revalidation ensures:

No overpayment or oversell due to a stale price or stale inventory
Protection from volatile sellers
Safety against eventual consistency issues

Tradeoffs:

Benefit	Cost
Prevents double-purchase of the same copy	Requires purchase lock store
Enables high concurrency on high stock	More complex idempotency model
Ensures consistency under load	Adds latency for revalidation step

Design Diagram

DD3 - How do we prevent overload in a high-fan-out architecture?

💡 In a system where one user request can fan out to 200+ sellers, high QPS can quickly exhaust both your system capacity and external seller APIs.

This deep dive explores two defensive strategies to throttle or delay low-priority fan-out traffic, ensuring stability during bursts or load spikes:

Throttled Fan-out Admission
Global Rate Limiting by User, Seller, or Region

Strategy 1: Throttled Fan-out Admission

Introduce a Throttling Controller and Fan-out Admission Queue to limit how many fan-out requests are in-flight at once.

This strategy enforces controlled concurrency — instead of letting every purchase request trigger outbound calls immediately, we gate entry into fan-out execution based on capacity.

This strategy uses two new components in the flow:

1. Throttling Controller

Applies real-time caps to control concurrency and request volume.
Limits can be:
- Global (e.g., max 500 concurrent seller calls)
- Per-seller (e.g., max 50 to Seller A)
Dynamically adjusts based on:
- Historical latency
- Recent error rates
- 429 rate-limit signals from sellers

2. Fan-out Admission Queue

A bounded queue that temporarily holds eligible requests.
Prevents request spikes from overwhelming the system.
Configurable policies:
- Drop, delay, or retry requests when queue is full
- Enqueue high-priority users first

Workflow

Fan-out Controller sends requests to Throttling Controller
If within quota → forwarded to Fan-out Admission Queue
Once admitted → executed by a Fan-out Worker

This model keeps the system stable, even under extreme concurrency.
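The admission flow can be sketched with a bounded `asyncio.Queue` drained by a fixed pool of workers. The sketch simplifies the Throttling Controller to a static concurrency cap and uses a drop policy when the queue is full; class and method names are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


class FanOutAdmission:
    """Bounded admission queue drained by a fixed number of fan-out workers."""

    def __init__(self, max_concurrent: int, queue_size: int):
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=queue_size)
        self._max_concurrent = max_concurrent
        self.in_flight = 0
        self.peak_in_flight = 0

    def try_admit(self, job: Callable[[], Awaitable[None]]) -> bool:
        """Enqueue a fan-out job, or drop it if the queue is full."""
        try:
            self._queue.put_nowait(job)
            return True
        except asyncio.QueueFull:
            return False  # drop policy; delay or retry are the alternatives

    async def _worker(self) -> None:
        while True:
            job = await self._queue.get()
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            try:
                await job()
            except Exception:
                pass  # a failed job must not kill the worker
            finally:
                self.in_flight -= 1
                self._queue.task_done()

    async def run_until_drained(self) -> None:
        """Start the workers, wait for the queue to empty, then stop them."""
        workers = [asyncio.create_task(self._worker()) for _ in range(self._max_concurrent)]
        await self._queue.join()
        for worker in workers:
            worker.cancel()
        await asyncio.gather(*workers, return_exceptions=True)
```

The number of workers is the concurrency cap and the queue size is the burst buffer, so the two limits can be tuned independently.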

Strategy 2: Global Rate Limiting by User, Seller, or Region

Prevent overload across shared global dimensions like user traffic, seller quotas, or regional constraints.

Rather than relying on worker-level throttle, this strategy introduces centralized rate limit guards to apply global traffic shaping across key axes:

User-level: e.g., max 10 requests/min per user ID
Seller-level: e.g., no more than 500 QPS to Seller B
Region-level: e.g., total QPS ≤ 5K for US-West

This strategy uses two new components in the fan-out flow:

Global Rate Limiting Gateway
- Library or service (sidecar or shared module)
- Applies request caps using token/leaky bucket algorithms
- Keys by (user_id, seller_id, region)
Rate Limit Store (e.g. Redis)
- Stores counters and TTL for all rate buckets
- Supports atomic increment, quota reset, expiration

Workflow:

Fan-out Controller contacts Global Rate Limiting Gateway before each seller API call.
Gateway checks current counters in Rate Limit Store.
Outcome:
- if within quota → proceed to seller API
- if over the limit → drop, delay, or degrade gracefully

This ensures systemic protection before any outbound fan-out is executed, shielding downstream sellers and preserving internal budgets.
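A minimal token-bucket sketch of the gateway check follows, with in-process state standing in for the Rate Limit Store. The class names and the `limits` layout are assumptions. All three buckets are checked before any token is consumed, so a rejection on one axis does not burn quota on the others.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenBucket:
    rate_per_s: float
    capacity: float
    tokens: float
    last_refill: float

    def refill(self, now: float) -> None:
        """Add tokens for the elapsed time, never exceeding capacity."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_s)
        self.last_refill = now


class RateLimitGateway:
    """Checks the user, seller, and region buckets before each seller API call."""

    def __init__(self, limits: dict):
        self._limits = limits  # dimension -> (rate_per_s, capacity)
        self._buckets: dict = {}

    def allow(self, user_id: str, seller_id: str, region: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        buckets = []
        for dimension, ident in (("user", user_id), ("seller", seller_id), ("region", region)):
            rate, capacity = self._limits[dimension]
            bucket = self._buckets.setdefault(
                (dimension, ident), TokenBucket(rate, capacity, capacity, now)
            )
            bucket.refill(now)
            buckets.append(bucket)
        if any(bucket.tokens < 1.0 for bucket in buckets):
            return False  # over the limit on at least one axis
        for bucket in buckets:
            bucket.tokens -= 1.0
        return True
```

In a multi-instance deployment the refill-check-consume sequence would need to run atomically in the shared store, which this single-process sketch does not attempt.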

Final Thought

Designing this system highlights a central engineering principle: optimize for the common case, but protect for the worst case.

The happy path — resolving a cheap price and triggering a purchase — should be fast, cost-efficient, and resilient to noise. That’s where strategies like early-exit fan-out, seller tiering, and offer caching shine. But we also prepare for messy realities: bursty traffic, duplicate requests, flaky sellers, and overload scenarios. That’s where deduplication, idempotency, and global rate limits come in.

Ultimately, this system balances user satisfaction, platform stability, and cost efficiency — and the design is flexible enough to evolve as seller ecosystems grow or customer needs shift. It’s a practical foundation for building trustworthy automation into real-world commerce flows.

‍
