Functional Requirements
FR1. Users should be able to submit a book purchase request with a maximum price limit
“As a user, I want to request a book by providing its name, my shipping and payment information, and a maximum price I’m willing to pay.”
FR2. Users should be able to retrieve a price resolution result (either success or lowest available price)
“As a user, I want to know whether the system found a seller within my budget — or at least what the lowest price available is.”
FR3. Users should be able to trigger an automatic purchase when the offer is acceptable
“As a user, I want the system to automatically purchase the book if the price is acceptable — without additional confirmation.”
Non-Functional Requirements
The NFRs below are ranked by priority for this specific design.
NFR1. Efficiency – avoid unnecessary work and reduce cost per request
NFR2. Latency – p90 < 5 seconds
NFR3. Idempotency – avoid duplicate charges or purchases
NFR4. Scalability – up to 200 QPS, 20K seller calls/sec
NFR5. Reliability – tolerate partial failures and degraded seller availability
NFR6. Observability – end-to-end traceability and real-time metrics
High-Level Design
FR1 - Users should be able to submit a book purchase request with a maximum price limit
API Design - POST /purchase-request
Request Body:
Response Body:
Core Entities
Workflow
Step 1: API Gateway receives the POST /purchase-request call and forwards it to the backend service.
Step 2: RequestHandler Service performs:
- Validation of input fields (e.g., price format, required fields).
- Generation of request_id.
Step 3: Persistence of a new PurchaseRequest entity into the database with status PENDING.
Step 4: The same service enqueues a message with request_id into an internal RequestQueue (e.g., Kafka, SQS, or Pub/Sub; the choice can be discussed later).
Step 5: The system returns a 202 Accepted response with the request_id to the user.
The actual resolution (seller fan-out and purchase) happens asynchronously (covered in FR2).
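The five steps above can be sketched as a single handler. This is a minimal illustration, not the service's actual code: the in-memory dict and deque stand in for the database and the RequestQueue, and the field names (book_name, shipping, payment, max_price) are assumptions based on the FR1 user story.

```python
import uuid
from collections import deque
from decimal import Decimal, InvalidOperation

# Hypothetical in-memory stand-ins for the database and the RequestQueue.
purchase_requests = {}
request_queue = deque()

# Assumed field names; the real request body may differ.
REQUIRED_FIELDS = ("book_name", "shipping", "payment", "max_price")


def handle_purchase_request(body):
    """Validate, persist as PENDING, enqueue, and acknowledge with 202."""
    missing = [f for f in REQUIRED_FIELDS if f not in body]
    if missing:
        return 400, {"error": "missing fields: " + ", ".join(missing)}
    try:
        max_price = Decimal(str(body["max_price"]))
    except InvalidOperation:
        return 400, {"error": "max_price must be a number"}
    if not max_price.is_finite() or max_price <= 0:
        return 400, {"error": "max_price must be a positive amount"}

    request_id = str(uuid.uuid4())                      # Step 2: generate request_id
    purchase_requests[request_id] = {                   # Step 3: persist as PENDING
        **body, "max_price": max_price, "status": "PENDING",
    }
    request_queue.append(request_id)                    # Step 4: enqueue for fan-out
    return 202, {"request_id": request_id}              # Step 5: acknowledge
```

Returning 202 rather than 200 signals that the request was accepted but not yet resolved, which matches the asynchronous flow described above.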

FR2 – Users should be able to retrieve a price resolution result (either success or lowest available price)
API Design – GET /request-status
Response – if match found (price within max_price):
Response – if no seller met price cap:
Response – if still resolving:
Core Entities:
Workflow:
Once the PurchaseRequest is enqueued (from FR1), the system kicks off seller resolution via a background worker.
Step 1: Fan-out Worker consumes the request_id from the RequestQueue.
It retrieves the associated PurchaseRequest and book_name.
Step 2: Fan-out Worker queries each external bookseller’s public API directly, using seller-specific config from the BookSeller table:
- Constructs the HTTP request (URL, headers, payload) using the seller’s metadata
- Sends the HTTP request
- Parses the response based on expected JSON/XML paths
Each call returns a normalized {seller_id, price, availability, inventory} tuple.
Step 3: Fan-out Worker filters out unavailable or overpriced offers.
It selects the cheapest seller whose price is within the user’s max_price.
Step 4a: If a valid match is found, the worker updates the PurchaseRequest with:
- status = RESOLVED
- resolved_price
- resolved_seller
⇒ And proceeds to automatically trigger the purchase (see FR3).
Step 4b: If no seller meets the price, the worker updates the PurchaseRequest with:
- status = NOT_AVAILABLE
- lowest_available_price from all seller responses
Step 5: User polls GET /request-status to retrieve the final outcome:
- If status is RESOLVED, show matched seller and price
- If status is NOT_AVAILABLE, return lowest available price as fallback
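Steps 3, 4a, and 4b amount to a filter followed by a minimum. The sketch below assumes a hypothetical Offer record shaped like the normalized tuple from Step 2, and it assumes the fallback price is taken from in-stock offers only, which the workflow does not spell out.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Offer:
    # Hypothetical shape of the normalized seller response.
    seller_id: str
    price: Decimal
    available: bool
    inventory: int


def resolve(offers, max_price):
    """Pick the cheapest in-stock offer within max_price, else report the lowest price."""
    in_stock = [o for o in offers if o.available and o.inventory > 0]
    valid = [o for o in in_stock if o.price <= max_price]
    if valid:
        best = min(valid, key=lambda o: o.price)
        return {
            "status": "RESOLVED",
            "resolved_price": best.price,
            "resolved_seller": best.seller_id,
        }
    # Assumption: the fallback ignores out-of-stock sellers.
    lowest = min((o.price for o in in_stock), default=None)
    return {"status": "NOT_AVAILABLE", "lowest_available_price": lowest}
```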

FR3 - Users should be able to trigger an automatic purchase when the offer is acceptable
This flow continues from where FR2 ends. Once the system identifies a seller offering the book within the user's maximum price, it automatically triggers a purchase on the user's behalf.
API Design - Direct Seller Purchase Call
Request template:
Response parsing and normalization also happen in-worker.
Workflow
Step 1: Fan-out Worker (from FR2) selects the best valid offer from seller responses, where price ≤ max_price.
Step 2: Fan-out Worker initiates a direct HTTPS call to the selected bookseller’s purchase API, using configuration stored in the BookSeller entity:
- Builds the HTTP request using:
  - api_endpoint (purchase URL)
  - HTTP method (e.g. POST)
  - headers template (e.g. API keys, content-type)
  - request body template (filled with dynamic fields like isbn, price, shipping info, etc.)
Step 3: Fan-out Worker sends the request to the external seller’s system, handles retries, parses the raw response, and normalizes it to extract:
order_id, final_price, delivery_eta, or error code/message
Step 4.a: On success, the worker updates the PurchaseRequest with:
- status = SUCCESS
- resolved_price, resolved_seller, purchase_timestamp
Step 4.b: On failure, the system may retry, attempt the next best valid offer (if any), or mark the request as PURCHASE_FAILED — based on business rules and error type.
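One possible reading of Step 4.b is: retry the same seller a bounded number of times when the error looks transient, then move to the next cheapest valid offer, and only then give up. The sketch below illustrates that policy. The place_order callable, its result fields (ok, retryable, order_id, final_price), and the attempt limit are all hypothetical; real business rules may differ.

```python
def purchase_with_fallback(offers, max_price, place_order, attempts_per_seller=2):
    """Try valid offers cheapest-first, retrying transient failures per seller."""
    candidates = sorted(
        (o for o in offers if o["price"] <= max_price),
        key=lambda o: o["price"],
    )
    for offer in candidates:
        for _ in range(attempts_per_seller):
            result = place_order(offer)  # hypothetical seller purchase call
            if result["ok"]:
                return {
                    "status": "SUCCESS",
                    "resolved_seller": offer["seller_id"],
                    "resolved_price": result["final_price"],
                    "order_id": result["order_id"],
                }
            if not result.get("retryable", False):
                break  # permanent error: move on to the next best offer
    return {"status": "PURCHASE_FAILED"}
```

Retrying a purchase call is only safe if the seller honors an idempotency key, which is the subject of DD2.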

Deep Dives
DD1 - How do we make fan-out efficient and cost-effective?
Strategy 1: Early Exit Once a Valid Price is Found
Stop querying other sellers once we’ve already found a seller that meets the user’s max price constraint.
Instead of waiting for all seller responses, the system can short-circuit fan-out once a valid price (≤ max_price) is returned by any seller. The Fan-out Controller monitors incoming responses from external bookseller APIs, and as soon as a valid offer is identified, it immediately:
- Triggers the purchase via direct HTTPS request to that seller
- Cancels, skips, or deprioritizes remaining seller calls
This approach minimizes outbound calls, lowers purchase latency, and improves throughput — especially when fast and reliable sellers respond early.
Design Implications:
- Fan-out workers need interruptible call execution (e.g., cancellable threads, flag-driven short-circuiting)
- Centralized decision point or shared coordination logic across threads
- Seller-specific purchase logic must be ready to fire immediately on match
Tradeoffs:
- You may miss slightly better prices from slower sellers
- Requires thoughtful ordering of sellers and timeout management
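A minimal sketch of early exit, assuming the seller calls are asyncio coroutines that each return a dict with seller_id and price. The fake_seller helper is purely illustrative and stands in for a real HTTP call.

```python
import asyncio


async def first_valid_offer(seller_calls, max_price):
    """Return the first offer priced at or below max_price, cancelling the rest."""
    tasks = [asyncio.create_task(call) for call in seller_calls]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                offer = await finished
            except Exception:
                continue  # a failed seller call must not abort the fan-out
            if offer["price"] <= max_price:
                return offer
        return None
    finally:
        # Short-circuit: cancel whatever is still pending and let it settle.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


async def fake_seller(seller_id, price, delay):
    # Illustrative stand-in for an outbound seller API call.
    await asyncio.sleep(delay)
    return {"seller_id": seller_id, "price": price}
```

With this shape, a slow seller offering a lower price is never seen once a faster seller returns a valid one, which is exactly the first tradeoff listed above.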
Strategy 2: Adaptive Fan-out (Seller Prioritization Tiers)
Query sellers in prioritized tiers based on reliability, historical success, and pricing behavior.
Rather than querying all sellers at once, the system should fan out in smart waves:
- Tier 1: High-trust sellers (fast, reliable, low-latency, low-failure)
- Tier 2: Moderate sellers
- Tier 3: Long-tail or low-quality sellers
The Fan-out Controller consults a Seller Ranking Engine to retrieve the appropriate seller tier list. It begins with Tier 1, and only escalates to lower tiers if:
- No valid offer is found
- All Tier 1 calls fail or time out
This staged approach improves success rate with minimal cost and avoids wasting resources on low-yield sellers.
Design Implications:
- Seller metadata (stored in DB) must include tier ranking and fan-out priority
- Fan-out workers need logic to escalate tier-by-tier with fallback
- Tier assignments must be continuously refreshed
Tradeoffs:
- Slightly increases total resolution latency in failure cases
- Requires ongoing seller performance monitoring and tier tuning
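The escalation logic can be sketched as a loop over tiers that stops at the first tier producing a valid offer. For brevity the calls inside a tier run sequentially here; in practice they would likely run concurrently. The query_seller callable is a hypothetical stand-in that returns a price or None.

```python
def tiered_resolve(tiers, query_seller, max_price):
    """Query tiers in order; return (price, seller_id) from the first tier with a valid offer."""
    for tier in tiers:
        offers = []
        for seller_id in tier:
            try:
                price = query_seller(seller_id)  # hypothetical seller lookup
            except Exception:
                continue  # a failed or timed-out seller just yields no offer
            if price is not None:
                offers.append((price, seller_id))
        valid = [offer for offer in offers if offer[0] <= max_price]
        if valid:
            return min(valid)  # cheapest valid offer in this tier
        # No valid offer in this tier: escalate to the next one.
    return None
```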
Strategy 3: Cache Valid Offers to Avoid Redundant Calls
If a seller recently returned a valid offer for a book with sufficient inventory, we may not need to query them again.
The system can reduce redundant work by caching recent seller responses in a BookOffer Cache (e.g., Redis, in-memory, or TTL-backed store). Each cache entry stores:
- isbn, seller_id
- price, availability
- last_checked_at
During fan-out, the Fan-out Controller checks this cache before issuing new API calls. If a cached offer:
- Was returned recently (within TTL)
- Is still under the user’s max_price
- Has high enough inventory (e.g., inventory > 100)
Then the controller can reuse the cached offer and skip that seller’s API call altogether.
Design Implications:
- Offer cache must be keyed by isbn + seller_id
- TTL values must be per-seller (some refresh more often)
- Final purchase call still validates price to avoid staleness
Tradeoffs:
- Slight risk of price drift or stockout
- Requires per-seller TTL tuning
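The three reuse conditions can be sketched with a small in-process cache. This is an illustration only: the class and method names are hypothetical, the entry stores an inventory count as an assumption (the inventory check needs one), and a production version would more likely sit in Redis with per-key expiry.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CachedOffer:
    price: float
    inventory: int
    last_checked_at: float


class BookOfferCache:
    """Hypothetical offer cache keyed by (isbn, seller_id) with per-seller TTLs."""

    def __init__(self, default_ttl: float = 60.0, min_inventory: int = 100,
                 clock: Callable[[], float] = time.monotonic):
        self._entries: Dict[Tuple[str, str], CachedOffer] = {}
        self._ttl_by_seller: Dict[str, float] = {}
        self._default_ttl = default_ttl
        self._min_inventory = min_inventory
        self._clock = clock

    def set_ttl(self, seller_id: str, ttl: float) -> None:
        self._ttl_by_seller[seller_id] = ttl

    def put(self, isbn: str, seller_id: str, price: float, inventory: int) -> None:
        self._entries[(isbn, seller_id)] = CachedOffer(price, inventory, self._clock())

    def reusable(self, isbn: str, seller_id: str, max_price: float) -> Optional[CachedOffer]:
        """Return the cached offer only if it is fresh, affordable, and well stocked."""
        entry = self._entries.get((isbn, seller_id))
        if entry is None:
            return None
        ttl = self._ttl_by_seller.get(seller_id, self._default_ttl)
        fresh = self._clock() - entry.last_checked_at <= ttl
        if fresh and entry.price <= max_price and entry.inventory > self._min_inventory:
            return entry
        return None
```

A None result means the Fan-out Controller should call the seller as usual.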
Design Diagram

DD2 - How do we handle in-flight deduplication of requests for the same book?
Strategy 1: Collapse Duplicate Lookups for the Same Book
Avoid redundant fan-out when many users request the same book — group requests by (isbn, max_price) and resolve them together.
Let’s say 3 users request “Designing Data-Intensive Applications” within a few seconds, all with max_price ≤ $30. Instead of triggering 3 separate 200-seller fan-outs, we coalesce them into a single resolution round.
This is managed via a Request Coalescing Layer, built on top of a temporary in-flight registry (e.g. Redis or in-memory map), keyed by (isbn, max_price_bucket).
How it works:
- When a new request comes in:
  - If no active fan-out for the (isbn, price) bucket → initiate new resolution
  - If fan-out in progress → attach as a dependent
- When resolution completes:
  - If a valid purchase is made → return the same result (price, seller) to all dependents
  - If no match → all receive the same fallback info (e.g. lowest available price)
This mechanism avoids redundant API calls and boosts efficiency in bursty or high-demand scenarios.
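A minimal single-process sketch of the in-flight registry, using asyncio tasks as the shared result. The class name, the bucket width of 5, and the choice to resolve against the lower bound of the bucket (so the result is within every attached user's limit) are assumptions made for illustration. A multi-node deployment would need a shared store such as Redis instead of a local dict.

```python
import asyncio


class RequestCoalescer:
    """Hypothetical coalescing layer keyed by (isbn, max_price bucket)."""

    def __init__(self, resolve, bucket_size=5):
        self._resolve = resolve          # async callable: (isbn, price_cap) -> result
        self._bucket_size = bucket_size
        self._in_flight = {}             # (isbn, bucket) -> running asyncio.Task

    async def get(self, isbn, max_price):
        bucket = int(max_price // self._bucket_size)
        key = (isbn, bucket)
        task = self._in_flight.get(key)
        if task is None:
            # No active fan-out for this bucket: start one, capped at the
            # bucket's lower bound so the result suits every dependent.
            price_cap = bucket * self._bucket_size
            task = asyncio.create_task(self._resolve(isbn, price_cap))
            self._in_flight[key] = task
            task.add_done_callback(lambda _: self._in_flight.pop(key, None))
        # Dependents await the same task; shield keeps one caller's
        # cancellation from cancelling the shared resolution.
        return await asyncio.shield(task)
```

Resolving at the bucket's lower bound is conservative: a user with a limit of 34 in the 30 to 35 bucket may be told no match exists even though a seller offers 33. Narrower buckets reduce that loss at the cost of fewer coalesced requests.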
Example Timeline
Strategy 2: Idempotent Purchase Guarantees Across Users
Prevent double-purchasing the same seller offer while still allowing safe parallelism when inventory is sufficient.
When several users attempt to purchase the same book, the system must:
- Avoid overspending on the same seller offer
- Prevent duplicate orders (especially for last-copy scenarios)
- Allow parallel purchases when inventory is high
To do this, we enforce idempotency and introduce smart concurrency control.
1. Construct a Deterministic Purchase Idempotency Key
Before calling the seller’s HTTPS endpoint, the worker computes a deterministic idempotency key for the purchase attempt.
This key ensures that duplicate purchase attempts for the same seller offer are detected:
- If the same key is already in-flight → block or wait
- If key is completed → reuse the result (success or fail)
This avoids accidental double purchases on 1-unit inventory sellers.
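The exact fields that go into the key are not specified here. As one hypothetical choice, the sketch below hashes seller_id, isbn, and the offer price in cents, plus an optional sequence number, and tracks each key's state in a local dict. A real deployment would need an atomic set-if-absent in a shared store (for example Redis SET with the NX option) rather than this single-process registry.

```python
import hashlib


def purchase_key(seller_id, isbn, price_cents, sequence=0):
    """Deterministic key: the same inputs always produce the same digest."""
    # Assumed key fields; the real key may include other attributes.
    raw = f"{seller_id}|{isbn}|{price_cents}|{sequence}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class IdempotencyRegistry:
    """Hypothetical single-process registry of purchase attempts."""

    def __init__(self):
        self._state = {}  # key -> ("IN_FLIGHT", None) or ("DONE", result)

    def begin(self, key):
        """Return START to purchase, WAIT if in flight, or REUSE with the stored result."""
        state = self._state.get(key)
        if state is None:
            self._state[key] = ("IN_FLIGHT", None)
            return "START", None
        if state[0] == "IN_FLIGHT":
            return "WAIT", None
        return "REUSE", state[1]

    def complete(self, key, result):
        # Store both successes and failures so later callers can reuse them.
        self._state[key] = ("DONE", result)
```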
2. Controlled Parallelism for High-Inventory Sellers
If the seller's offer indicates sufficient inventory (e.g., >10 units), we allow parallel purchases while maintaining idempotency by adding a sequence field to the idempotency key.
Here, sequence:
- Increments per user request
- Respects the known inventory cap
- Helps isolate each parallel buy attempt
We ensure atomicity using Redis or distributed locks to avoid exceeding known inventory.
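A sketch of how sequence numbers might be handed out without exceeding the known inventory. It uses a thread lock inside one process as a stand-in for the Redis or distributed-lock coordination mentioned above; the class and the offer_id format are hypothetical.

```python
import threading


class SequenceAllocator:
    """Hypothetical allocator: one sequence number per parallel purchase slot."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = {}  # offer_id -> next unused sequence number

    def reserve(self, offer_id, known_inventory):
        """Return the next sequence number, or None once the inventory cap is reached."""
        with self._lock:  # check-and-increment must be atomic
            sequence = self._next.get(offer_id, 0)
            if sequence >= known_inventory:
                return None
            self._next[offer_id] = sequence + 1
            return sequence
```

Each reserved sequence number then becomes part of that request's idempotency key, so two users buying from the same offer get distinct keys while a retried request reuses its own.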
3. Revalidate Offers Just Before Purchase
Even with cached data, the system should double-check the offer before calling the seller API.
This revalidation ensures:
- No oversell due to stale price
- Protection from volatile sellers
- Safety against eventual consistency issues
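As a rough illustration, revalidation can be a single fresh lookup compared against the user's limit. The fetch_current callable and the offer fields are hypothetical; it is assumed to return the seller's current offer as a dict, or None if the book is no longer listed.

```python
def revalidate_offer(cached, fetch_current, max_price):
    """Re-fetch the offer and return it only if it is still purchasable within max_price."""
    current = fetch_current(cached["seller_id"], cached["isbn"])  # hypothetical lookup
    if current is None or current["inventory"] < 1:
        return None  # stocked out since the offer was cached
    if current["price"] > max_price:
        return None  # price drifted above the user's limit
    return current
```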
Tradeoffs:
Design Diagram

DD3 - How to prevent overload in a High-Fan-Out Architecture?
Strategy 1: Throttled Fan-out Admission
Introduce a Throttling Controller and Fan-out Admission Queue to limit how many fan-out requests are in-flight at once.
This strategy enforces controlled concurrency — instead of letting every purchase request trigger outbound calls immediately, we gate entry into fan-out execution based on capacity.
This strategy uses two new components in the flow:
1. Throttling Controller
- Applies real-time caps to control concurrency and request volume.
- Limits can be:
  - Global (e.g., max 500 concurrent seller calls)
  - Per-seller (e.g., max 50 to Seller A)
- Dynamically adjusts based on:
  - Historical latency
  - Recent error rates
  - 429 rate-limit signals from sellers
2. Fan-out Admission Queue
- A bounded queue that temporarily holds eligible requests.
- Prevents request spikes from overwhelming the system.
- Configurable policies:
  - Drop, delay, or retry requests when queue is full
  - Enqueue high-priority users first
Workflow
- Fan-out Controller sends requests to Throttling Controller
- If within quota → forwarded to Fan-out Admission Queue
- Once admitted → executed by a Fan-out Worker
This model keeps the system stable, even under extreme concurrency.
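The admission path can be sketched with a counter-based controller and a bounded standard-library queue. The class and function names are hypothetical, the limits are static rather than dynamically adjusted, and a worker is assumed to call release once its seller call finishes.

```python
import queue
import threading


class ThrottlingController:
    """Hypothetical controller enforcing global and per-seller concurrency caps."""

    def __init__(self, global_limit, per_seller_limit):
        self._lock = threading.Lock()
        self._global_limit = global_limit
        self._per_seller_limit = per_seller_limit
        self._in_flight_total = 0
        self._in_flight_by_seller = {}

    def try_acquire(self, seller_id):
        with self._lock:
            current = self._in_flight_by_seller.get(seller_id, 0)
            if (self._in_flight_total >= self._global_limit
                    or current >= self._per_seller_limit):
                return False
            self._in_flight_total += 1
            self._in_flight_by_seller[seller_id] = current + 1
            return True

    def release(self, seller_id):
        with self._lock:
            self._in_flight_total -= 1
            self._in_flight_by_seller[seller_id] -= 1


def admit(controller, admission_queue, seller_id, request_id):
    """Gate a seller call: check quota first, then try the bounded queue."""
    if not controller.try_acquire(seller_id):
        return "THROTTLED"
    try:
        admission_queue.put_nowait((seller_id, request_id))
    except queue.Full:
        controller.release(seller_id)  # give the slot back if the queue is full
        return "QUEUE_FULL"
    return "ADMITTED"
```

What the caller does with THROTTLED or QUEUE_FULL (drop, delay, or retry) corresponds to the configurable policies listed above.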
Strategy 2: Global Rate Limiting by User, Seller, or Region
Prevent overload across shared global dimensions like user traffic, seller quotas, or regional constraints.
Rather than relying on worker-level throttle, this strategy introduces centralized rate limit guards to apply global traffic shaping across key axes:
- User-level: e.g., max 10 requests/min per user ID
- Seller-level: e.g., no more than 500 QPS to Seller B
- Region-level: e.g., total QPS ≤ 5K for US-West
This strategy uses two new components in the fan-out flow:
- Global Rate Limiting Gateway
  - Library or service (sidecar or shared module)
  - Applies request caps using token/leaky bucket algorithms
  - Keys by (user_id, seller_id, region)
- Rate Limit Store (e.g. Redis)
  - Stores counters and TTL for all rate buckets
  - Supports atomic increment, quota reset, expiration
Workflow:
- Fan-out Controller contacts Global Rate Limiting Gateway before each seller API call.
- Gateway checks current counters in Rate Limit Store.
- Outcome:
  - if within quota → proceed to seller API
  - if over the limit → drop, delay, or degrade gracefully
This ensures systemic protection before any outbound fan-out is executed, shielding downstream sellers and preserving internal budgets.
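A token bucket is one of the algorithms named above. The sketch below keeps bucket state in a local dict as a stand-in for the Rate Limit Store; the class name, the key format, and the injectable clock are assumptions for illustration. A shared deployment would need the refill-and-take step to run atomically in the store.

```python
import time


class TokenBucketLimiter:
    """Hypothetical token bucket limiter keyed by an arbitrary tuple."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self._rate = rate_per_sec
        self._capacity = capacity
        self._clock = clock
        self._buckets = {}  # key -> (tokens, last_refill_time)

    def allow(self, key):
        """Refill the bucket for elapsed time, then try to take one token."""
        now = self._clock()
        tokens, last = self._buckets.get(key, (self._capacity, now))
        tokens = min(self._capacity, tokens + (now - last) * self._rate)
        if tokens >= 1:
            self._buckets[key] = (tokens - 1, now)
            return True
        self._buckets[key] = (tokens, now)
        return False
```

A key such as ("seller", "B") or ("user", user_id) selects the axis being limited, so one limiter class can serve user-level, seller-level, and region-level quotas with different rates.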

Final Thought
Designing this system highlights a central engineering principle: optimize for the common case, but protect against the worst case.
The happy path — resolving a cheap price and triggering a purchase — should be fast, cost-efficient, and resilient to noise. That’s where strategies like early-exit fan-out, seller tiering, and offer caching shine. But we also prepare for messy realities: bursty traffic, duplicate requests, flaky sellers, and overload scenarios. That’s where deduplication, idempotency, and global rate limits come in.
Ultimately, this system balances user satisfaction, platform stability, and cost efficiency — and the design is flexible enough to evolve as seller ecosystems grow or customer needs shift. It’s a practical foundation for building trustworthy automation into real-world commerce flows.

