Design a web service that helps customers purchase the cheapest available copy of a book from a network of online booksellers.

Bookstore

A customer submits a request including the book title, their shipping and payment details, and a maximum price they're willing to pay.

The service then queries multiple booksellers:

  • If any offer the book at or below the specified price, it automatically purchases the cheapest option and later informs the customer of the seller and final price.
  • If no offer meets the price threshold, the service notifies the customer of the lowest available price instead.

Use Case Assumptions

  • Book names are unique (like a UID)
  • The customer only ever buys one copy of a book at a time
  • The booksellers may or may not all sell the same books
  • For any specific book, each bookseller may charge a different price
  • Sellers provide heterogeneous APIs (different request/response formats)

Functional Requirements

FR1. Users should be able to submit a book purchase request with a maximum price limit

“As a user, I want to request a book by providing its name, my shipping and payment information, and a maximum price I’m willing to pay.”

FR2. Users should be able to retrieve a price resolution result (either success or lowest available price)

“As a user, I want to know whether the system found a seller within my budget — or at least what the lowest price available is.”

FR3. Users should be able to trigger an automatic purchase when the offer is acceptable

“As a user, I want the system to automatically purchase the book if the price is acceptable — without additional confirmation.”

Non-Functional Requirements

We rank this list of NFRs based on priorities tied to this specific design.

NFR1. Efficiency – avoid unnecessary work and reduce cost per request

Think about:

  • Direct seller API normalization utilities
  • Early-exit fan-out
  • Tiered seller groups (call fast/reliable sellers first)
  • Deduping concurrent fan-outs
  • Stale-offer reuse (cache)
  • Dynamic skipping of slow or unstable sellers

Without adapters, we must embed seller-specific call logic directly into workers — making efficiency even more important.

NFR2. Latency – p90 < 5 seconds

NFR3. Idempotency – avoid duplicate charges or purchases

NFR4. Scalability – up to 200 QPS, 20K seller calls/sec

Assume the following scale:
  • 1M DAU
  • Average: ~0.01–0.1 requests/user/day (users search only occasionally) ⇒ at the upper bound, 1M DAU × 0.1 requests/user/day ÷ ~10^5 s/day ≈ 1 QPS on average; we round this up to a conservative baseline of 10 QPS (Low)
  • Estimated traffic:
    • Low: 10 QPS
    • Peak: 10–20x ⇒ 100–200 QPS sustained (e.g., promo days or peak hours)
  • Each request → 50–200 seller fan-out ⇒ 5K–20K external calls/sec at 100 QPS (up to 40K/sec at 200 QPS if every request fans out to all 200 sellers)

NFR5. Reliability – tolerate partial failures and degraded seller availability

NFR6. Observability – end-to-end traceability and real-time metrics

We rank Efficiency highest because the system’s core value is to help users find a cheap price fast — and that means minimizing unnecessary work: exiting early, prioritizing high-value sellers, batching calls, and de-duplicating in-flight lookups. Latency is next, as timely responses drive user trust and satisfaction.

Idempotency comes third because this system involves real purchases — without safeguards, retries could trigger double charges or duplicate orders. Scalability is essential to handle peak traffic and fan-out volume. Reliability ensures users still get useful results even when sellers fail.

Observability, while operationally critical, has the least direct impact on the user experience and can be layered in post-MVP.

High-Level Design

💡 How to structure your High Level Design?

When presenting your High-Level Design, a clear and effective strategy is to go vertically, one Functional Requirement at a time — and walk through a working, end-to-end solution for each.

For every FR, answer:

  • What APIs are called or exposed at this stage?
  • What entity or schema design is needed to support the flow?
  • What are the core steps and flow from user input to system output?
  • What components are involved (e.g., API gateway, queue, fan-out workers, storage, aggregator)?

This vertical approach keeps each requirement self-contained and actionable, making it easier to read, review, and debug. For documentation, you can co-locate entity and API definitions within each FR, or collect them in a single section at the beginning — both are valid depending on interview style or presentation format.

FR1 - Users should be able to submit a book purchase request with a maximum price limit

API Design - POST /purchase-request

Request Body:

{ "book_name": "Designing Data-Intensive Applications", "max_price": 30.00, "customer_info": { "shipping_address": "...", "credit_card_info": "..." } }

Response Body:

{ "request_id": "req_abc123", "status": "PENDING" }

The final outcome (purchase result or lowest price) is not returned here, but made available via a separate GET /request-status API (covered in FR2).

Although we use HTTP polling here, other delivery options, such as webhooks, could replace or complement it.

Core Entities

Entity | Description
--- | ---
User | End-customer
Book | Unique book identifier
PurchaseRequest | Tracks the async resolution workflow

Workflow

Step 1: API Gateway receives the POST /purchase-request call and forwards it to the backend service.

Step 2: RequestHandler Service performs:

  • Validation of input fields (e.g., price format, required fields).
  • Generation of request_id.

Step 3: Persistence of a new PurchaseRequest entity into the database with status PENDING.

Step 4: The same service enqueues a message with request_id into an internal RequestQueue (e.g., Kafka, SQS, or Pub/Sub; the choice can be discussed later).

Step 5: The system returns a 202 Accepted response with the request_id to the user.

The actual resolution (seller fan-out and purchase) happens asynchronously (covered in FR2).
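The five steps above can be sketched as a single handler. This is a minimal illustration, not a definitive implementation: the in-memory `DB` dict and `REQUEST_QUEUE` are hypothetical stand-ins for the database and the RequestQueue, and `handle_purchase_request` is an assumed name.

```python
import queue
import uuid
from dataclasses import dataclass


@dataclass
class PurchaseRequest:
    request_id: str
    book_name: str
    max_price: float
    customer_info: dict
    status: str = "PENDING"


# Hypothetical in-memory stand-ins for the database and the RequestQueue.
DB: dict[str, PurchaseRequest] = {}
REQUEST_QUEUE: "queue.Queue[str]" = queue.Queue()


def handle_purchase_request(body: dict) -> tuple[int, dict]:
    """Return (http_status, response_body) for POST /purchase-request."""
    # Step 2: validate required fields and the price format.
    for field in ("book_name", "max_price", "customer_info"):
        if field not in body:
            return 400, {"error": f"missing field: {field}"}
    max_price = body["max_price"]
    if (isinstance(max_price, bool)
            or not isinstance(max_price, (int, float))
            or max_price <= 0):
        return 400, {"error": "max_price must be a positive number"}

    # Step 2: generate the request_id.
    request_id = f"req_{uuid.uuid4().hex[:12]}"

    # Step 3: persist the new PurchaseRequest with status PENDING.
    DB[request_id] = PurchaseRequest(
        request_id, body["book_name"], float(max_price), body["customer_info"]
    )

    # Step 4: enqueue only the request_id; workers load the rest from the DB.
    REQUEST_QUEUE.put(request_id)

    # Step 5: acknowledge with 202 Accepted.
    return 202, {"request_id": request_id, "status": "PENDING"}
```

Enqueuing only the request_id keeps shipping and payment details out of the queue; whether that matters depends on the queue's retention and access policy.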

FR2 – Users should be able to retrieve a price resolution result (either success or lowest available price)

API Design – GET /request-status

GET /request-status?request_id=req_abc123

Response – if match found (price within max_price):

{ "status": "SUCCESS", "resolved_price": 24.99, "resolved_seller": "Booktopia" }

Response – if no seller met price cap:

{ "status": "NOT_AVAILABLE", "lowest_available_price": 34.50 }

Response – if still resolving:

{ "status": "PROCESSING" }

💡 This is a polling-based API that lets the user check whether their price constraint was met, regardless of whether a purchase has been triggered yet. The actual purchase happens separately and automatically (see FR3).

Core Entities:

Entity | Description
--- | ---
PurchaseRequest | Tracks the user's original request, including book_name, max_price, resolution status (PENDING, PROCESSING, RESOLVED, SUCCESS, NOT_AVAILABLE, PURCHASE_FAILED), and final outcome
SellerOffer | Temporary or cached offer received directly from an external seller’s API, including normalized fields like price, availability, inventory, and response_time_ms
BookSeller | Configuration metadata for each external seller, including direct API endpoints (check_offer_url), HTTP method, headers template, request/response mappings, retry policies, and ranking metadata (e.g., success rate, p95 latency)

Workflow:

Once the PurchaseRequest is enqueued (from FR1), the system kicks off seller resolution via a background worker.

Step 1: Fan-out Worker consumes the request_id from the RequestQueue.

It retrieves the associated PurchaseRequest and book_name.

Step 2: Fan-out Worker queries each external bookseller’s public API directly, using seller-specific config from the BookSeller table:

  • Constructs the HTTP request (URL, headers, payload) using the seller’s metadata
  • Sends the HTTP request
  • Parses the response based on expected JSON/XML paths

Each call returns a normalized {seller_id, price, availability, inventory} tuple.
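One way to picture config-driven normalization is a per-seller mapping from normalized field names to paths in the raw response. The sketch below assumes JSON responses and dotted paths; the seller IDs, field paths, and the `SELLER_MAPPINGS` structure are all hypothetical, not taken from any real seller API.

```python
from typing import Any

# Hypothetical response mappings, as they might be stored per seller
# in the BookSeller table (normalized field -> dotted path in raw JSON).
SELLER_MAPPINGS = {
    "seller_a": {
        "price": "data.offer.amount",
        "availability": "data.in_stock",
        "inventory": "data.qty",
    },
    "seller_b": {
        "price": "price_usd",
        "availability": "available",
        "inventory": "stock_count",
    },
}


def get_path(payload: dict, dotted_path: str) -> Any:
    """Walk a nested dict along a dotted path such as 'data.offer.amount'."""
    value: Any = payload
    for key in dotted_path.split("."):
        value = value[key]
    return value


def normalize_offer(seller_id: str, raw: dict) -> dict:
    """Convert a seller-specific response into the normalized offer tuple."""
    mapping = SELLER_MAPPINGS[seller_id]
    return {
        "seller_id": seller_id,
        "price": float(get_path(raw, mapping["price"])),
        "availability": bool(get_path(raw, mapping["availability"])),
        "inventory": int(get_path(raw, mapping["inventory"])),
    }
```

Keeping the mapping in data rather than code means onboarding a seller is a config change, at the cost of weaker validation than a hand-written parser.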

Step 3: Fan-out Worker filters out unavailable or overpriced offers.

It selects the cheapest seller whose price is within the user’s max_price.

Step 4a: If a valid match is found, the worker updates the PurchaseRequest with:

  • status = RESOLVED
  • resolved_price
  • resolved_seller

⇒ And proceeds to automatically trigger the purchase (see FR3).

Step 4b: If no seller meets the price, the worker updates the PurchaseRequest with:

  • status = NOT_AVAILABLE
  • lowest_available_price from all seller responses

Step 5: User polls GET /request-status to retrieve the final outcome:

  • If status is RESOLVED (or SUCCESS once the purchase in FR3 has completed), show matched seller and price
  • If status is NOT_AVAILABLE, return lowest available price as fallback
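Steps 3 to 4b reduce to a small pure function over the normalized offers. The sketch below assumes each offer is a dict with the seller_id, price, availability, and inventory fields named earlier; the function name `resolve` is an assumption.

```python
def resolve(offers: list[dict], max_price: float) -> dict:
    """Pick the cheapest available offer and compare it with the price cap."""
    # Step 3: drop sellers that cannot actually ship the book.
    available = [o for o in offers if o["availability"] and o["inventory"] > 0]
    if not available:
        return {"status": "NOT_AVAILABLE", "lowest_available_price": None}

    cheapest = min(available, key=lambda o: o["price"])

    # Step 4a: the cheapest offer overall is within the user's max_price.
    if cheapest["price"] <= max_price:
        return {
            "status": "RESOLVED",
            "resolved_price": cheapest["price"],
            "resolved_seller": cheapest["seller_id"],
        }

    # Step 4b: nothing fits the cap, so report the lowest price seen.
    return {"status": "NOT_AVAILABLE", "lowest_available_price": cheapest["price"]}
```

Because the cheapest available offer is computed first, the same value serves both branches: it is either the winning offer or the fallback price.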

FR3 - Users should be able to trigger an automatic purchase when the offer is acceptable

This flow continues from where FR2 ends. Once the system identifies a seller offering the book within the user's maximum price, it automatically triggers a purchase on the user's behalf.

API Design - Direct Seller Purchase Call

POST https://api.booktopia.com/purchase

Request template:

{ "isbn": "...", "price": 12.50, "shipping": { ... }, "payment_token": "tok_abc123", "client_reference_id": "req_001A" }

Response parsing and normalization also happen in-worker.

Workflow

Step 1: Fan-out Worker (from FR2) selects the best valid offer from seller responses, where price ≤ max_price.

Step 2: Fan-out Worker initiates a direct HTTPS call to the selected bookseller’s purchase API, using configuration stored in the BookSeller entity:

  • Builds the HTTP request using:
    • api_endpoint (purchase URL)
    • HTTP method (e.g. POST)
    • headers template (e.g. API keys, content-type)
    • request body template (filled with dynamic fields like isbn, price, shipping info, etc.)

Step 3: Fan-out Worker sends the request to the external seller’s system, handles retries, parses the raw response, and normalizes it to extract:

  • order_id, final_price, delivery_eta, or error code/message

Step 4.a: On success, the worker updates the PurchaseRequest with:

  • status = SUCCESS
  • resolved_price, resolved_seller, purchase_timestamp

Step 4.b: On failure, the system may retry, attempt the next best valid offer (if any), or mark the request as PURCHASE_FAILED — based on business rules and error type.
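The retry-then-fallback behavior in Step 4.b could look roughly like the sketch below. `SellerError`, `place_order`, and `max_retries` are hypothetical names, and the retry loop assumes the seller deduplicates on client_reference_id; without that guarantee, retrying a purchase call is not safe.

```python
from typing import Callable


class SellerError(Exception):
    """Hypothetical normalized seller failure; `retryable` marks transient errors."""

    def __init__(self, message: str, retryable: bool) -> None:
        super().__init__(message)
        self.retryable = retryable


def purchase_with_fallback(
    valid_offers: list[dict],
    place_order: Callable[[dict], dict],
    max_retries: int = 2,
) -> dict:
    """Try offers from cheapest to most expensive until one purchase succeeds."""
    for offer in sorted(valid_offers, key=lambda o: o["price"]):
        for _attempt in range(1 + max_retries):
            try:
                order = place_order(offer)
            except SellerError as err:
                if err.retryable:
                    continue  # transient error: retry the same seller
                break  # permanent error (e.g. out of stock): next best offer
            return {
                "status": "SUCCESS",
                "resolved_seller": offer["seller_id"],
                "resolved_price": order["final_price"],
                "order_id": order["order_id"],
            }
    return {"status": "PURCHASE_FAILED"}
```

Which errors count as retryable is a business rule per seller; the flag here only marks where that decision plugs in.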

Deep Dives

DD1 - How do we make fan-out efficient and cost-effective?

💡 In a system where each purchase request may involve contacting 50–200 booksellers, making fan-out more efficient is critical — not just to reduce cost, but to meet tight latency goals and avoid overwhelming external dependencies. The strategies below focus on eliminating unnecessary work, prioritizing high-value sellers, and reusing trustworthy data wherever possible.

Together, these techniques form the foundation for a scalable, cost-efficient resolution layer.

Strategy 1: Early Exit Once a Valid Price is Found

Stop querying other sellers once we’ve already found a seller that meets the user’s max price constraint.

Instead of waiting for all seller responses, the system can short-circuit fan-out once a valid price (≤ max_price) is returned by any seller. The Fan-out Controller monitors incoming responses from external bookseller APIs, and as soon as a valid offer is identified, it immediately:

  • Triggers the purchase via direct HTTPS request to that seller
  • Cancels, skips, or deprioritizes remaining seller calls

This approach minimizes outbound calls, lowers purchase latency, and improves throughput — especially when fast and reliable sellers respond early.

Design Implications:

  • Fan-out workers need interruptible call execution (e.g., cancellable threads, flag-driven short-circuiting)
  • Centralized decision point or shared coordination logic across threads
  • Seller-specific purchase logic must be ready to fire immediately on match

Tradeoffs:

  • You may miss slightly better prices from slower sellers
  • Requires thoughtful ordering of sellers and timeout management

Missing a better price is a real possibility. In practice, though, the latency vs. price tradeoff is a core business decision. If the first match is good enough (e.g., $25 on a $30 cap), waiting for a hypothetical $24.50 offer isn’t worth the added delay.

To tune this, consider:

  • Setting a grace window (e.g., wait 200ms after first match)
  • Adding a price margin threshold (only early-exit if offer is 10–15% below user max)
  • Prioritizing sellers by historical best-price frequency

This balances price savings with user experience.
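Early exit with a grace window can be sketched with asyncio: all seller calls start concurrently, the first valid offer shortens the deadline to the grace window, and whatever is still pending at the deadline is cancelled. The function name, the offer dict shape, and the default timeouts are assumptions for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def fan_out_early_exit(
    seller_calls: list[Callable[[], Awaitable[dict]]],
    max_price: float,
    grace_window_s: float = 0.2,
    overall_timeout_s: float = 5.0,
) -> Optional[dict]:
    """Return the best valid offer seen before the (possibly shortened) deadline."""
    loop = asyncio.get_running_loop()
    pending = {asyncio.ensure_future(call()) for call in seller_calls}
    deadline = loop.time() + overall_timeout_s
    best: Optional[dict] = None
    try:
        while pending:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            done, pending = await asyncio.wait(
                pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED
            )
            if not done:
                break  # deadline reached with calls still outstanding
            for task in done:
                if task.exception() is not None:
                    continue  # a failed seller call is treated as "no offer"
                offer = task.result()
                if offer["price"] > max_price:
                    continue
                if best is None:
                    # First valid match: only wait the grace window from now on.
                    deadline = min(deadline, loop.time() + grace_window_s)
                    best = offer
                elif offer["price"] < best["price"]:
                    best = offer
        return best
    finally:
        # Early exit: cancel every seller call that is still running.
        for task in pending:
            task.cancel()
```

Setting `grace_window_s=0` gives the strict "first valid offer wins" behavior; a larger window trades latency for a chance at a lower price.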

Strategy 2: Adaptive Fan-out (Seller Prioritization Tiers)

Query sellers in prioritized tiers based on reliability, historical success, and pricing behavior.

Rather than querying all sellers at once, the system should fan out in smart waves:

  • Tier 1: High-trust sellers (fast, reliable, low-latency, low-failure)
  • Tier 2: Moderate sellers
  • Tier 3: Long-tail or low-quality sellers

The Fan-out Controller consults a Seller Ranking Engine to retrieve the appropriate seller tier list. It begins with Tier 1, and only escalates to lower tiers if:

  • No valid offer is found
  • All Tier 1 calls fail or time out

This staged approach improves success rate with minimal cost and avoids wasting resources on low-yield sellers.

Design Implications:

  • Seller metadata (stored in DB) must include tier ranking and fan-out priority
  • Fan-out workers need logic to escalate tier-by-tier with fallback
  • Tier assignments must be continuously refreshed

Tradeoffs:

  • Slightly increases total resolution latency in failure cases
  • Requires ongoing seller performance monitoring and tier tuning

Seller tiers are typically computed using:

  • Success rate (2xx purchase confirmations)
  • p95 latency and timeout frequency
  • Inventory match rate per ISBN
  • User feedback (e.g., refund rates)

These stats should be collected in real time and aggregated hourly or daily, depending on request volume. A dedicated Seller Ranking Engine (batch job or streaming pipeline) can:

  • Update tier assignments
  • Decay outdated signals
  • Promote/demote sellers based on freshness windows

Also consider:

  • Manual overrides for known bad actors
  • Shadowing new sellers in Tier 3 before promotion
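The tier-by-tier escalation can be sketched as a loop that stops at the first tier producing a valid offer. For brevity the sketch queries sellers within a tier sequentially, whereas a real worker would call them concurrently; `query_seller` and the tier lists are hypothetical inputs.

```python
from typing import Callable, Optional


def tiered_fan_out(
    tiers: list[list[str]],
    query_seller: Callable[[str], Optional[dict]],
    max_price: float,
) -> tuple[Optional[dict], list[str]]:
    """Return (best valid offer or None, list of sellers actually called)."""
    called: list[str] = []
    for tier in tiers:
        valid: list[dict] = []
        for seller_id in tier:
            called.append(seller_id)
            try:
                offer = query_seller(seller_id)
            except Exception:
                continue  # a failure or timeout counts as "no offer"
            if offer is not None and offer["price"] <= max_price:
                valid.append(offer)
        if valid:
            # This tier produced a match, so lower tiers are never contacted.
            return min(valid, key=lambda o: o["price"]), called
    return None, called
```

The second return value makes the saving visible: sellers in lower tiers that were never called do not appear in it.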

Strategy 3: Cache Valid Offers to Avoid Redundant Calls

If a seller recently returned a valid offer for a book with sufficient inventory, we might not need to re-query them again.

The system can reduce redundant work by caching recent seller responses in a BookOffer Cache (e.g., Redis, in-memory, or TTL-backed store). Each cache entry stores:

  • isbn, seller_id
  • price, availability
  • last_checked_at

During fan-out, the Fan-out Controller checks this cache before issuing new API calls. If a cached offer:

  • Was returned recently (within TTL)
  • Is still under the user’s max_price
  • Has high enough inventory (e.g., inventory > 100)

Then the controller can reuse the cached offer and skip that seller’s API call altogether.

Design Implications:

  • Offer cache must be keyed by isbn + seller_id
  • TTL values must be per-seller (some refresh more often)
  • Final purchase call still validates price to avoid staleness

Tradeoffs:

  • Slight risk of price drift or stockout
  • Requires per-seller TTL tuning

To maintain accuracy:

  • Set a short TTL (e.g., 30–90 seconds) for cached book offers
  • Require a minimum inventory threshold to trust reuse (e.g., only reuse if stock > 100)
  • Attach a last_checked_at timestamp for visibility and monitoring
  • Optionally, re-validate before actual purchase (fetch latest price before confirming)

Some sellers may include explicit expiry timestamps in their responses — always honor those if available.

Caching is about avoiding redundant API calls, not guaranteeing up-to-the-millisecond accuracy.
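The reuse rules above (fresh within a per-seller TTL, under max_price, inventory above a threshold) fit in a small cache class. This sketch uses an in-process dict as a stand-in for Redis, and the class and method names are assumptions; the injectable `clock` only exists to make the TTL logic easy to test.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CachedOffer:
    price: float
    inventory: int
    last_checked_at: float


class BookOfferCache:
    """In-process stand-in for the BookOffer Cache, keyed by (isbn, seller_id)."""

    def __init__(
        self,
        default_ttl_s: float = 60.0,
        min_inventory: int = 100,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._default_ttl_s = default_ttl_s
        self._min_inventory = min_inventory
        self._clock = clock
        self._ttl_by_seller: dict[str, float] = {}
        self._entries: dict[tuple[str, str], CachedOffer] = {}

    def set_seller_ttl(self, seller_id: str, ttl_s: float) -> None:
        """Override the TTL for sellers whose prices change faster or slower."""
        self._ttl_by_seller[seller_id] = ttl_s

    def put(self, isbn: str, seller_id: str, price: float, inventory: int) -> None:
        self._entries[(isbn, seller_id)] = CachedOffer(price, inventory, self._clock())

    def get_reusable(self, isbn: str, seller_id: str, max_price: float) -> Optional[CachedOffer]:
        """Return the cached offer only if it is safe to skip the seller call."""
        entry = self._entries.get((isbn, seller_id))
        if entry is None:
            return None
        ttl = self._ttl_by_seller.get(seller_id, self._default_ttl_s)
        fresh = self._clock() - entry.last_checked_at <= ttl
        if fresh and entry.price <= max_price and entry.inventory > self._min_inventory:
            return entry
        return None
```

A miss from `get_reusable` simply means "call the seller"; the cache never answers that a book is unavailable.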

DD2 - How do we handle in-flight deduplication of requests for the same book?

Strategy 1: Collapse Duplicate Lookups for the Same Book

Avoid redundant fan-out when many users request the same book — group requests by (isbn, max_price) and resolve them together.

Let’s say 3 users request “Designing Data-Intensive Applications” within a few seconds, all with max_price ≤ $30. Instead of triggering 3 separate 200-seller fan-outs, we coalesce them into a single resolution round.

This is managed via a Request Coalescing Layer, built on top of a temporary in-flight registry (e.g. Redis or in-memory map), keyed by (isbn, max_price_bucket).

How it works:

  1. When a new request comes in:
    • If no active fan-out for the (isbn, price) bucket → initiate new resolution
    • If fan-out in progress → attach as a dependent
  2. When resolution completes:
    • If a valid purchase is made → return the same result (price, seller) to all dependents
    • If no match → all receive the same fallback info (e.g. lowest available price)

This mechanism avoids redundant API calls and boosts efficiency in bursty or high-demand scenarios.
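A minimal in-memory version of the in-flight registry might look like the following. The `RequestCoalescer` class, the `resolve` callback, and the choice to floor max_price to the bucket's lower bound (so a shared result never exceeds any dependent's cap) are assumptions of this sketch, not something the design above prescribes.

```python
import asyncio
import math
from typing import Awaitable, Callable


class RequestCoalescer:
    """Share one resolution per (isbn, max_price_bucket) among concurrent requests."""

    def __init__(
        self,
        resolve: Callable[[str, float], Awaitable[dict]],
        bucket_size: float = 5.0,
    ) -> None:
        self._resolve = resolve
        self._bucket_size = bucket_size
        self._in_flight: dict[tuple[str, float], asyncio.Task] = {}

    def _key(self, isbn: str, max_price: float) -> tuple[str, float]:
        # Floor to the bucket's lower bound so the shared fan-out uses a
        # price cap that is valid for every request in the bucket.
        bucket = math.floor(max_price / self._bucket_size) * self._bucket_size
        return isbn, bucket

    async def get_result(self, isbn: str, max_price: float) -> dict:
        key = self._key(isbn, max_price)
        task = self._in_flight.get(key)
        if task is None:
            # No active fan-out for this bucket: start one and register it.
            task = asyncio.create_task(self._resolve(isbn, key[1]))
            self._in_flight[key] = task
            task.add_done_callback(lambda _t: self._in_flight.pop(key, None))
        # shield() keeps one cancelled dependent from cancelling the shared task.
        return await asyncio.shield(task)
```

In a multi-process deployment the registry would have to live in a shared store such as Redis, with dependents notified through polling or pub/sub rather than an in-process task.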

Example Timeline

Time | Event
--- | ---
t0 | User A requests book ISBN=X, max_price = $30
t1 | System starts fan-out for (X, $30)
t2 | User B and C also request X, $30
t2 | They are attached to the same fan-out (deduped)
t3 | System finds offer from Seller Y at $28
t4 | All 3 users receive the same result → purchase via Seller Y at $28

Strategy 2: Idempotent Purchase Guarantees Across Users

Prevent double-purchasing the same seller offer while still allowing safe parallelism when inventory is sufficient.

When several users attempt to purchase the same book, the system must:

  • Avoid overspending on the same seller offer
  • Prevent duplicate orders (especially for last-copy scenarios)
  • Allow parallel purchases when inventory is high

To do this, we enforce idempotency and introduce smart concurrency control.

1. Construct a Deterministic Purchase Idempotency Key

Before calling the seller’s HTTPS endpoint, the worker computes:

purchase_key = hash(isbn + seller_id + price)

This key ensures that duplicate purchase attempts for the same seller offer are detected:

  • If the same key is already in-flight → block or wait
  • If key is completed → reuse the result (success or fail)

This avoids accidental double purchases on 1-unit inventory sellers.

2. Controlled Parallelism for High-Inventory Sellers

If the seller's offer indicates sufficient inventory (e.g., >10 units), we allow parallel purchases while maintaining idempotency by adding a sequence field (seq) to the idempotency key:

purchase_key = hash(isbn + seller_id + price + seq)

Here, seq:

  • Increments per user request
  • Respects the known inventory cap
  • Helps isolate each parallel buy attempt

We ensure atomicity using Redis or distributed locks to avoid exceeding known inventory.
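The sequence-based key can be sketched as follows. A `threading.Lock` stands in for the atomic Redis operation or distributed lock, and `PurchaseSlots`, `reserve`, and the exact key layout are hypothetical; the point is only that each parallel buyer gets a distinct key and that no more keys are handed out than the known inventory.

```python
import hashlib
import threading
from typing import Optional


def purchase_key(isbn: str, seller_id: str, price: float, seq: int = 0) -> str:
    """Deterministic idempotency key for one unit of one seller offer."""
    raw = f"{isbn}|{seller_id}|{price:.2f}|{seq}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class PurchaseSlots:
    """In-memory stand-in for an atomic counter per offer (e.g. in Redis)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._next_seq: dict[tuple[str, str, str], int] = {}

    def reserve(self, isbn: str, seller_id: str, price: float, inventory: int) -> Optional[str]:
        """Hand out the next sequence slot, or None once inventory is exhausted."""
        offer = (isbn, seller_id, f"{price:.2f}")
        with self._lock:
            seq = self._next_seq.get(offer, 0)
            if seq >= inventory:
                return None  # known inventory cap reached: block this attempt
            self._next_seq[offer] = seq + 1
        return purchase_key(isbn, seller_id, price, seq)
```

With `inventory=1` this degenerates to the single-key case: the first caller gets a key and every later caller is blocked.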

3. Revalidate Offers Just Before Purchase

Even with cached data, the system should double-check the offer before calling the seller API:

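# cached_offer is the offer selected during fan-out
latest_offer = fetch_current_offer(seller_id, isbn)
price_changed = latest_offer.price != cached_offer.price
if price_changed or latest_offer.inventory <= 0:
    abort()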

This revalidation ensures:

  • No oversell due to stale price
  • Protection from volatile sellers
  • Safety against eventual consistency issues

Here are the tradeoffs:

Benefit | Cost
--- | ---
Prevents double-purchase of the same copy | Requires purchase lock store
Enables high concurrency on high stock | More complex idempotency model
Ensures consistency under load | Adds latency for revalidation step

Block parallelism when:

  • Seller inventory is ≤ 1
  • Offer TTL is very short or volatile
  • Seller is known for eventual consistency
  • Duplicate purchases cause financial risk

Allow concurrency when:

  • Inventory is reliable and high
  • TTL and offer metadata are recent
  • Seller API is stable and honors purchase locks

This hybrid policy gives us both efficiency and correctness under concurrency.

DD3 - How to prevent overload in a High-Fan-Out Architecture?

💡 In a system where one user request can fan out to 200+ sellers, high QPS can quickly exhaust both your system capacity and external seller APIs.
This deep dive explores two defensive strategies to throttle or delay low-priority fan-out traffic, ensuring stability during bursts or load spikes:
  1. Throttled Fan-out Admission
  2. Global Rate Limiting by User, Seller, or Region

Strategy 1: Throttled Fan-out Admission

Introduce a Throttling Controller and Fan-out Admission Queue to limit how many fan-out requests are in-flight at once.

This strategy enforces controlled concurrency — instead of letting every purchase request trigger outbound calls immediately, we gate entry into fan-out execution based on capacity.

This strategy uses two new components in the flow:

1. Throttling Controller

  • Applies real-time caps to control concurrency and request volume.
  • Limits can be:
    • Global (e.g., max 500 concurrent seller calls)
    • Per-seller (e.g., max 50 to Seller A)
  • Dynamically adjusts based on:
    • Historical latency
    • Recent error rates
    • 429 rate-limit signals from sellers

2. Fan-out Admission Queue

  • A bounded queue that temporarily holds eligible requests.
  • Prevents request spikes from overwhelming the system.
  • Configurable policies:
    • Drop, delay, or retry requests when queue is full
    • Enqueue high-priority users first

Workflow

  • Fan-out Controller sends requests to Throttling Controller
  • If within quota → forwarded to Fan-out Admission Queue
  • Once admitted → executed by a Fan-out Worker

This model keeps the system stable, even under extreme concurrency.
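The admission decision can be sketched as a concurrency cap plus a bounded waiting queue. This sketch folds the Throttling Controller and the Fan-out Admission Queue into one hypothetical `FanOutAdmission` class with static limits; dynamic adjustment from latency, error rates, or 429 signals is left out.

```python
import threading
from collections import deque
from typing import Optional


class FanOutAdmission:
    """Concurrency cap plus bounded queue for fan-out requests."""

    def __init__(self, max_in_flight: int, max_queued: int) -> None:
        self._max_in_flight = max_in_flight
        self._max_queued = max_queued
        self._in_flight = 0
        self._waiting: deque[str] = deque()
        self._lock = threading.Lock()

    def submit(self, request_id: str) -> str:
        """Return 'ADMITTED', 'QUEUED', or 'REJECTED' for a new fan-out."""
        with self._lock:
            if self._in_flight < self._max_in_flight:
                self._in_flight += 1
                return "ADMITTED"
            if len(self._waiting) < self._max_queued:
                self._waiting.append(request_id)
                return "QUEUED"
            # Queue full: the caller can answer HTTP 429 with Retry-After.
            return "REJECTED"

    def complete(self) -> Optional[str]:
        """Call when a fan-out finishes; returns the next request to run, if any."""
        with self._lock:
            if self._waiting:
                # The freed slot passes directly to the oldest queued request.
                return self._waiting.popleft()
            self._in_flight -= 1
            return None
```

Swapping the deque for a priority queue would give the "high-priority users first" policy without changing the admission logic.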

When the admission queue is full or the quota is exhausted, several fallback options exist:

  1. Reject & Retry
    Return HTTP 429 with Retry-After header — useful for bursty clients with exponential backoff.
  2. Graceful Degradation
    Query only Tier-1 sellers (e.g., top 10), skip long-tail.
  3. Queue Prioritization
    Use max_price, seller rank, or user tier to sort requests.
  4. Elastic Scaling
    Auto-scale fan-out workers or buffers temporarily (e.g., K8s HPA).

Strategy 2: Global Rate Limiting by User, Seller, or Region

Prevent overload across shared global dimensions like user traffic, seller quotas, or regional constraints.

Rather than relying on worker-level throttle, this strategy introduces centralized rate limit guards to apply global traffic shaping across key axes:

  • User-level: e.g., max 10 requests/min per user ID
  • Seller-level: e.g., no more than 500 QPS to Seller B
  • Region-level: e.g., total QPS ≤ 5K for US-West

This strategy uses two new components in the fan-out flow:

  1. Global Rate Limiting Gateway
    • Library or service (sidecar or shared module)
    • Applies request caps using token/leaky bucket algorithms
    • Keys by (user_id, seller_id, region)
  2. Rate Limit Store (e.g. Redis)
    • Stores counters and TTL for all rate buckets
    • Supports atomic increment, quota reset, expiration

Workflow:

  • Fan-out Controller contacts Global Rate Limiting Gateway before each seller API call.
  • Gateway checks current counters in Rate Limit Store.
  • Outcome:
    • if within quota → proceed to seller API
    • if over the limit → drop, delay, or degrade gracefully

This ensures systemic protection before any outbound fan-out is executed, shielding downstream sellers and preserving internal budgets.
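A token bucket per (dimension, value) pair is one way to implement the gateway check. The sketch below keeps buckets in process memory as a stand-in for the Rate Limit Store; the class names, the `limits` layout, and the example rates are assumptions, and the injectable `clock` exists only to make refill behavior testable.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, rate_per_s: float, capacity: float, clock: Callable[[], float]) -> None:
        self._rate_per_s = rate_per_s
        self._capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._updated_at = clock()

    def has_token(self) -> bool:
        # Refill lazily, based on the time elapsed since the last check.
        now = self._clock()
        elapsed = now - self._updated_at
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate_per_s)
        self._updated_at = now
        return self._tokens >= 1.0

    def take(self) -> None:
        self._tokens -= 1.0


class GlobalRateLimiter:
    """Allow a call only if every dimension (user, seller, region) has quota."""

    def __init__(
        self,
        limits: dict[str, tuple[float, float]],  # dimension -> (rate_per_s, capacity)
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limits = limits
        self._clock = clock
        self._buckets: dict[tuple[str, str], TokenBucket] = {}

    def allow(self, **keys: str) -> bool:
        buckets = []
        for dimension, value in keys.items():
            bucket = self._buckets.get((dimension, value))
            if bucket is None:
                rate, capacity = self._limits[dimension]
                bucket = TokenBucket(rate, capacity, self._clock)
                self._buckets[(dimension, value)] = bucket
            buckets.append(bucket)
        # Check all dimensions first so a denial never consumes tokens.
        if all(b.has_token() for b in buckets):
            for b in buckets:
                b.take()
            return True
        return False
```

A call such as `limiter.allow(user_id="u1", seller_id="B", region="us-west")` maps directly onto the three axes listed above. In a shared store, the check-then-take step would need to be a single atomic operation (for example a server-side script), which this single-process sketch does not have to worry about.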

To keep these limits from unfairly penalizing legitimate traffic:

  • Use priority-based quotas (Gold > Silver > Free users)
  • Apply adaptive token buckets that refill faster for trusted clients
  • Add fallback paths (e.g., use cached offers if limit exceeded)
  • Employ circuit breakers for failing sellers to avoid cascading retries

The goal is not just throttling, but graceful control.

Final Thought

Designing this system highlights a central engineering principle: optimize for the common case, but protect for the worst case.

The happy path — resolving a cheap price and triggering a purchase — should be fast, cost-efficient, and resilient to noise. That’s where strategies like early-exit fan-out, seller tiering, and offer caching shine. But we also prepare for messy realities: bursty traffic, duplicate requests, flaky sellers, and overload scenarios. That’s where deduplication, idempotency, and global rate limits come in.

Ultimately, this system balances user satisfaction, platform stability, and cost efficiency — and the design is flexible enough to evolve as seller ecosystems grow or customer needs shift. It’s a practical foundation for building trustworthy automation into real-world commerce flows.
