Functional Requirements
1. Merchant can submit a payment request
The system should allow a merchant to initiate a payment with amount, currency, and customer details.
2. Customer can complete the payment using a payment method
The system should accept and process a customer's payment method and trigger authorization with the external provider.
3. Merchant can view the current status of any payment
The system should return real-time or persisted status updates of a payment, including authorization outcome and settlement state.
4. System should settle all authorized payments once daily
The system should batch authorized payments at the end of each day and submit them for final settlement with the external provider.
Non-Functional Requirements
1. Security — No confidential data leak
The system must protect all sensitive payment data by enforcing industry-standard security measures: encrypting all traffic with TLS 1.2 or higher, encrypting stored data with AES-256, and enforcing role-based access control and audit logging. This is essential to prevent data breaches, meet regulatory requirements, and build trust with users and financial partners.
2. Strong Consistency — Exactly-once processing, no duplicate settlement
The system must ensure that each payment is processed exactly once and moves through a well-defined state machine (e.g., pending → authorized → settled) without duplication or loss. Inaccurate payment states can lead to financial loss, double charges, or regulatory issues, which are unacceptable in a production payment system.
3. Durability — ≥ 99.9999999% (nine 9s) data durability
All accepted payments must be written to a durable, replicated data store before acknowledging the client. A payment system cannot lose data due to crashes, network failures, or region outages; persistence guarantees are foundational to user trust and financial correctness.
4. High Scalability — ≥ 10,000 QPS sustained throughput
The system must be able to ingest and process at least 10,000 payments per second at steady load, with the ability to scale horizontally across stateless services and background workers. This ensures the system can handle real-world traffic spikes, global usage, and future growth without becoming a bottleneck.
Core Entities
1. Merchant
Represents a business integrating with the payment platform to accept payments.
2. Payment
Represents a payment request initiated by a merchant and fulfilled by a customer.
3. Transaction
Represents a discrete step taken to process a payment — such as authorization or settlement.
API Design
1. Merchant submits a payment request
Endpoint:
POST /payments
Request Body:
Response:
This endpoint registers a Payment intent. No funds are authorized at this stage; it simply prepares the system for a future customer confirmation.
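As a rough illustration of this endpoint's contract, the sketch below models a request carrying amount, currency, and a customer reference, and a response carrying the new payment's ID and status. The field names, the currency subset, and the pay_ prefix are assumptions made for illustration, not a schema defined by this design.

```python
from dataclasses import dataclass
import uuid

SUPPORTED_CURRENCIES = {"USD", "EUR", "GBP"}  # assumption: illustrative subset


@dataclass(frozen=True)
class CreatePaymentRequest:
    amount: int        # minor units (e.g. cents) to avoid float rounding
    currency: str      # ISO 4217 code
    customer_id: str   # hypothetical reference to the customer details


@dataclass(frozen=True)
class CreatePaymentResponse:
    payment_id: str
    status: str


def create_payment(req: CreatePaymentRequest) -> CreatePaymentResponse:
    """Validate the request and register a payment intent; no funds move here."""
    if req.amount <= 0:
        raise ValueError("amount must be a positive integer in minor units")
    if req.currency not in SUPPORTED_CURRENCIES:
        raise ValueError(f"unsupported currency: {req.currency}")
    # A real service would persist the record before returning it.
    return CreatePaymentResponse(payment_id=f"pay_{uuid.uuid4().hex}", status="PENDING")
```

Keeping the amount as an integer in minor units is a common choice for money handling, since it sidesteps floating-point rounding.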
2. Customer completes the payment using a payment method
Endpoint:
POST /payments/{payment_id}/confirm
Request Body:
Response:
This endpoint triggers authorization with the external provider. It creates a Transaction of type AUTHORIZATION, updates the Payment status, and holds funds on success.
3. Merchant views the current status of any payment
Endpoint:
GET /payments/{payment_id}
Response:
Returns the current state of a payment and its related transactions, useful for merchant dashboards and auditing.
4. System settles all authorized payments once daily
Internal Endpoint (scheduled job):
POST /settlement_batches
Request Body:
Response:
Triggered by a scheduled task, this endpoint collects all AUTHORIZED payments, submits them for final settlement with the external provider, and creates SETTLEMENT transactions. Payment statuses are updated to SETTLED.
State Machine
We use a state machine to model the business lifecycle of a payment and of a transaction, from creation to completion or failure.

Payment States
Transaction States
Retry Policy Notes:
- Retryable errors: timeouts, network failures, rate limits
- Non-retryable errors: invalid card, expired method, fraud flags
- Use exponential backoff and dead-letter queue for robust retry strategy
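The retry notes above could be sketched roughly as follows. The error codes, the run_with_retries helper, and the use of a plain list as the dead-letter queue are all assumptions for illustration; a real system would likely use a durable queue and provider-specific error types.

```python
import random
import time

RETRYABLE = {"timeout", "network_failure", "rate_limited"}          # hypothetical codes
NON_RETRYABLE = {"invalid_card", "expired_method", "fraud_flagged"}  # hypothetical codes


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def run_with_retries(operation, dead_letter: list, max_attempts: int = 5, sleep=time.sleep) -> str:
    """Call operation() until it succeeds; it returns None on success or an error code."""
    for attempt in range(max_attempts):
        error = operation()
        if error is None:
            return "succeeded"
        if error not in RETRYABLE:
            return "failed"  # non-retryable: surface immediately, retrying cannot help
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    dead_letter.append(operation)  # retries exhausted: park the work for inspection
    return "dead_lettered"
```

Jitter is included so that many workers retrying at once do not hit the provider in synchronized waves.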
High Level Design
The system is composed of stateless API services, asynchronous background workers, persistent storage, and external integrations with third-party payment providers. Each functional requirement maps to a specific request flow that spans multiple components.
1. Merchant submits a payment request
When a merchant initiates a payment, the system validates the request, persists it as a Payment in the database, and prepares it for later customer confirmation.
Flow Description
- Merchant sends a POST /payments request.
- API Gateway authenticates the merchant via API key.
- Payment Service:
  - Validates merchant status and request data
  - Persists a new Payment record with status PENDING
  - Returns a payment_id for the customer to use in the next step.

2. Customer completes the payment using a payment method
Once the customer is ready to pay, they confirm the payment using the payment_id. The system triggers authorization by calling an external provider, updates state based on the outcome, and records a Transaction.
Flow Description
- Customer submits POST /payments/{payment_id}/confirm with card or wallet details.
- API validates the payment and forwards the request to an async processing queue.
- Authorization Worker:
  - Fetches the payment
  - Calls the external provider for authorization
  - Records the outcome as a Transaction (type AUTHORIZATION)
  - Updates the Payment status to AUTHORIZED or DECLINED
- Result is eventually available via GET /payments/{id}.
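A minimal sketch of the Authorization Worker's core step is shown below, assuming an in-memory Payment object and a provider_authorize callable standing in for the external provider. The transaction status values SUCCEEDED and FAILED are hypothetical; the design itself only names the payment statuses.

```python
from dataclasses import dataclass, field


@dataclass
class Payment:
    payment_id: str
    status: str = "PENDING"
    transactions: list = field(default_factory=list)


def authorize(payment: Payment, provider_authorize) -> Payment:
    """Run one authorization attempt and record its outcome.

    provider_authorize is a stand-in for the external provider call; it is
    assumed to return True when funds were held and False when declined.
    """
    if payment.status != "PENDING":
        return payment  # already processed: do not call the provider again
    approved = provider_authorize(payment.payment_id)
    payment.transactions.append(
        {"type": "AUTHORIZATION", "status": "SUCCEEDED" if approved else "FAILED"}
    )
    payment.status = "AUTHORIZED" if approved else "DECLINED"
    return payment
```

The early return on a non-PENDING payment is a first, simple guard against a redelivered queue message triggering a second provider call.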

3. Merchant views the current status of any payment
Merchants can check the real-time status of any payment, including authorization state and transaction history.
Flow Description
- Merchant sends GET /payments/{payment_id}.
- Payment Service fetches:
  - The Payment record
  - All related Transaction records (AUTHORIZATION, SETTLEMENT, etc.)
- Response is assembled and returned.

4. System settles all authorized payments once daily
At the end of each day, a scheduled job aggregates all authorized payments into a batch and sends them to the external provider for final settlement.
Flow Description
- Batch Processor triggers at scheduled time (e.g., midnight UTC).
- Payment Service queries all AUTHORIZED payments.
- Creates a settlement batch and sends it to the external provider.
- For each settled payment:
  - Records a Transaction (type SETTLEMENT)
  - Updates the Payment status to SETTLED

Deep Dives
Deep Dive 1 - How Does the System Stay Secure?
In the high-level design, we describe a system where merchants submit payments, customers confirm them with payment methods, and backend workers authorize and settle payments using external providers. At a functional level, this works.
However, it’s not sufficient from a security standpoint. The current architecture leaves open two major risks that would be unacceptable in a production-grade payment platform like Stripe:
- Handling raw payment information, which is tightly governed by PCI DSS, and
- Allowing internal services to interact without proper authentication, which can lead to privilege escalation or lateral movement if compromised.
Risk 1: Raw Card Data Can Flow Through Application Infrastructure
In the current flow, the customer submits a card to POST /payments/{id}/confirm. This implies the payment method (e.g., card number, CVC) flows through your public API, service layer, queue, and worker, even if only briefly.
Why is this a problem? Handling raw card data directly pulls every service that stores, processes, or transmits it into PCI DSS scope, which means each of those services must go through costly audits, strict isolation, and compliance checks. Even worse, if card details accidentally end up in logs, error dumps, or unprotected Kafka topics due to a misconfiguration, the result could be a serious compliance breach that exposes sensitive information and puts the company at legal and financial risk.
✅ Solution: Vault-Based Tokenization (Don’t Handle Cards Directly)
Use tokenization: offload card handling to a vaulted, PCI-compliant provider like Stripe, Braintree, or a custom HSM-backed vault.
Updated Payment Authorization Flow with Tokenization
- Customer enters card details via frontend SDK
  - Example: Stripe.js, Braintree Hosted Fields
  - The SDK securely collects raw card details in the browser (never hitting your backend)
- SDK sends the card details directly to the payment provider’s vault
  - This bypasses your backend infrastructure entirely
  - Stripe or another PCI-compliant vault handles encryption, validation, and token issuance
- Vault returns a payment_method_token (e.g., pm_abc123)
  - This token is a safe reference to the actual payment method stored in the provider’s secure system
- Browser sends the token to your backend
  - POST /payments/{payment_id}/confirm with body containing only the token
  - No raw payment data touches your application infrastructure
- Your backend uses the token to authorize payment via provider APIs
  - For example, POST /v1/payment_intents/{id}/confirm
Production Tip: Even when using your own vault, isolate the tokenization service in a separate VPC, restrict access by service identity, and log only token references.
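One way the confirm endpoint could enforce the token-only rule is sketched below. The raw-card field names and the pm_ token pattern are assumptions based on the example token above, not a format guaranteed by any provider.

```python
import re

RAW_CARD_FIELDS = {"card_number", "cvc", "expiry", "pan"}  # hypothetical field names
TOKEN_PATTERN = re.compile(r"^pm_[A-Za-z0-9]+$")           # assumes tokens look like pm_abc123


def parse_confirm_body(body: dict) -> str:
    """Return the payment method token, rejecting anything that looks like card data."""
    leaked = RAW_CARD_FIELDS.intersection(body)
    if leaked:
        # Report only field names, never values, so nothing sensitive reaches the logs.
        raise ValueError(f"raw card fields are not accepted: {sorted(leaked)}")
    token = body.get("payment_method_token", "")
    if not isinstance(token, str) or not TOKEN_PATTERN.match(token):
        raise ValueError("payment_method_token is missing or malformed")
    return token
```

Rejecting raw card fields outright, rather than silently dropping them, makes a misbehaving client visible early instead of letting card data drift into request logs.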
Risk 2: Internal Services Trust Each Other Implicitly
In the high-level architecture, background workers and internal services communicate with each other and external providers (e.g., auth-worker → provider, cron job → settlement API) but there is no mention of internal authentication or authorization boundaries. These services operate as if they fully trust one another.
In a real-world microservices environment, trust boundaries must be enforced. If one internal service is compromised — either due to a software bug or a security incident — it could impersonate another service, call sensitive internal APIs, or write directly to shared databases.
Without internal authentication and access control:
- Any service could issue sensitive operations like refunds or settlements.
- A misconfigured worker could write corrupted data into payments or transactions.
- Attackers exploiting one service could move laterally across your infrastructure.
✅ Solution: Mutual TLS and Service Identity-Based Authorization
Introduce mutual TLS (mTLS) and per-service identity propagation to harden internal service communication.
How It Works in Practice
Every internal service (like auth-worker, payment-api, or batch-processor) is given a unique identity, similar to a verified name tag.
When one service wants to talk to another (e.g., auth-worker wants to update a payment), two checks take place:
- The caller proves who it is using a secure certificate, which is like showing an official badge.
- The receiving service (e.g., payment-db or settlement-api) checks the badge and verifies:
  - Is this service really who it says it is?
  - Is it allowed to perform this action?
This process is enforced by the system automatically using mutual TLS (mTLS) — where both sides of the connection authenticate each other, and all traffic is encrypted.
For example:
- auth-worker shows its identity: auth-worker.prod.internal
- payment-db only allows writes from trusted services like auth-worker
- If an unknown or untrusted service tries to access it, the connection is blocked
This setup prevents internal services from impersonating each other and stops any compromised part of the system from moving laterally or accessing sensitive data.
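The two checks described above (authenticate the caller, then authorize the action) might look roughly like this using Python's standard ssl module. The ALLOWED_ACTIONS policy, the action names, and the choice of the DNS subjectAltName as the service identity are assumptions for illustration; in practice a service mesh or sidecar proxy often handles this.

```python
import ssl

# Hypothetical policy: which caller identities may perform which actions.
ALLOWED_ACTIONS = {
    "auth-worker.prod.internal": {"payments:write", "transactions:write"},
    "batch-processor.prod.internal": {"settlements:write"},
}


def build_server_context(cert_file: str, key_file: str, ca_file: str) -> ssl.SSLContext:
    """TLS server context that refuses clients lacking a certificate from our CA."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # requiring a client cert makes the TLS mutual
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def caller_identity(peer_cert: dict) -> str:
    """Extract the first DNS subjectAltName from SSLSocket.getpeercert() output."""
    for kind, value in peer_cert.get("subjectAltName", ()):
        if kind == "DNS":
            return value
    raise PermissionError("client certificate carries no DNS identity")


def is_allowed(peer_cert: dict, action: str) -> bool:
    """Authorize an action based on the authenticated caller identity."""
    return action in ALLOWED_ACTIONS.get(caller_identity(peer_cert), set())
```

Authentication (the TLS handshake) and authorization (the policy lookup) are kept separate on purpose: a valid certificate alone proves identity, not permission.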
Updated Payment Entity Table
Recall that the “Core Entities” section introduced the Payment entity and left its details for later discussion. With tokenization in place, we can now update it accordingly:
- ✅ Added payment_method_token: this replaces raw card data
- ❌ Removed payment_info (card number, CVC, etc.): not safe to store and a common interview red flag

Deep Dive 2 – How Does the System Guarantee Exactly-Once Processing?
In the high-level design, we allow merchants to initiate payments, customers to confirm them, and workers to interact with external payment providers for authorization and settlement. But that design — while functionally complete — falls short when it comes to correctness guarantees, especially around exactly-once execution.
This is critical in payment systems. Unlike many domains, where duplicate writes or retries are acceptable, double authorizing or settling a payment is a catastrophic bug.
Where the Current Design Falls Short
In a naive system, retries, crashes, or race conditions can cause:
- Duplicate authorization charges to a customer card
- Payments marked as AUTHORIZED even when the external provider failed
- A payment settled twice because the batch processor was re-run
- Conflicting state updates due to concurrent workers
None of this is acceptable. Stripe, PayPal, and similar systems spend enormous engineering effort ensuring that every payment moves through a strictly valid state machine — and only once per transition.
Why This Is a Problem
Modern systems use retries and asynchronous processing heavily. That’s good for availability, but without safeguards, retries can re-trigger real-world actions like charging a card. The following issues arise:
- Stateless APIs may re-authorize a card if the response is lost or retried.
- Worker crashes before persisting a Transaction may leave the system in limbo.
- Settlement workers scanning the same AUTHORIZED rows may run twice due to timeouts, leading to double settlement.
These bugs are hard to detect, cause financial loss, and undermine user trust.
✅ Solution: Idempotency, Transactional Outbox, and State Enforcement
1. Use Idempotency Keys at API Boundaries
Each request to confirm a payment (e.g., POST /payments/{id}/confirm) should include an idempotency key: a unique identifier for this intent.
- Store the key with a hash of the input parameters and the final result.
- If the same key is reused, return the cached result — do not reprocess.
- Stripe does this with a client-provided Idempotency-Key header.
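The bullets above can be sketched as a small store that maps each key to a hash of the request parameters and the cached result. IdempotencyStore is a hypothetical in-memory stand-in; a production version would live in a database with a unique constraint on the key so that concurrent requests cannot both pass the check.

```python
import hashlib
import json


class IdempotencyStore:
    """In-memory stand-in for a table keyed by idempotency key."""

    def __init__(self):
        self._records = {}  # key -> (request_hash, result)

    def execute(self, key: str, params: dict, handler):
        request_hash = hashlib.sha256(
            json.dumps(params, sort_keys=True).encode()
        ).hexdigest()
        if key in self._records:
            stored_hash, result = self._records[key]
            if stored_hash != request_hash:
                raise ValueError("idempotency key reused with different parameters")
            return result  # replay: return the cached result without reprocessing
        result = handler(params)  # first time: do the real work once
        self._records[key] = (request_hash, result)
        return result
```

Storing the parameter hash alongside the result lets the server distinguish a genuine retry from a client bug that reuses one key for two different requests.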
2. Implement Transactional Outbox Pattern
Don’t directly call the external provider from your DB transaction. Instead:
- Write a message (e.g., "authorize this payment") into an outbox table.
- Commit this message in the same transaction that updates payment status.
- A worker polls this table and processes it asynchronously.
This avoids “write succeeded, provider call failed” and vice versa — both of which break consistency.
3. Enforce State Transitions at the Database Level
Add guards around updates to ensure only valid transitions occur:
- From PENDING → AUTHORIZED
- From AUTHORIZED → SETTLED
- Never allow reverting or skipping intermediate states
Use optimistic locking (e.g., version numbers or timestamps) to reject concurrent conflicting updates.
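One possible shape for such a guard is a conditional UPDATE that only succeeds when both the current status and the version match what the caller read. The version column, the SCHEMA constant, and the transition helper are assumptions for this sketch; PENDING → DECLINED is included because the authorization flow can end in DECLINED.

```python
import sqlite3

SCHEMA = "CREATE TABLE payments (payment_id TEXT PRIMARY KEY, status TEXT, version INTEGER)"

VALID_TRANSITIONS = {
    ("PENDING", "AUTHORIZED"),
    ("PENDING", "DECLINED"),
    ("AUTHORIZED", "SETTLED"),
}


def transition(conn: sqlite3.Connection, payment_id: str,
               expected_version: int, old: str, new: str) -> bool:
    """Apply old -> new only if the row is still in `old` at `expected_version`."""
    if (old, new) not in VALID_TRANSITIONS:
        raise ValueError(f"illegal transition {old} -> {new}")
    with conn:
        cur = conn.execute(
            "UPDATE payments SET status = ?, version = version + 1 "
            "WHERE payment_id = ? AND status = ? AND version = ?",
            (new, payment_id, old, expected_version),
        )
    # rowcount == 0 means another writer got there first; the caller should re-read.
    return cur.rowcount == 1
```

The check and the write happen in a single statement, so two concurrent workers cannot both observe PENDING and both succeed.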
4. De-duplicate Settlements with Batch Token
Every settlement batch should have a unique batch ID and each payment should record which batch settled it. This prevents re-processing if the batch is re-run or partially failed.
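A rough sketch of batch-level de-duplication follows: payments are first claimed for a batch, and only claimed rows are submitted and settled. The column names and the submit callback are hypothetical, and the sketch assumes the external provider deduplicates on the batch ID if the same batch is submitted twice after a crash.

```python
import sqlite3


def settle_batch(conn: sqlite3.Connection, batch_id: str, submit) -> int:
    """Claim unclaimed AUTHORIZED payments for batch_id, submit, then mark SETTLED."""
    with conn:
        # Claim step: only rows not yet assigned to any batch are picked up.
        conn.execute(
            "UPDATE payments SET batch_id = ? "
            "WHERE status = 'AUTHORIZED' AND batch_id IS NULL",
            (batch_id,),
        )
    ids = [row[0] for row in conn.execute(
        "SELECT payment_id FROM payments "
        "WHERE batch_id = ? AND status = 'AUTHORIZED' ORDER BY payment_id",
        (batch_id,),
    )]
    if not ids:
        return 0  # nothing left to settle for this batch: a re-run is a no-op
    submit(batch_id, ids)  # assumption: the provider deduplicates on batch_id
    with conn:
        conn.execute(
            "UPDATE payments SET status = 'SETTLED' "
            "WHERE batch_id = ? AND status = 'AUTHORIZED'",
            (batch_id,),
        )
    return len(ids)
```

Re-running the job with the same or a different batch ID after a successful run finds no claimable rows, so no payment is submitted twice.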
The schema changes below summarize what these safeguards add to the data model.
Updated Payment Entity Table
- ✅ Added batch_id to track settlement inclusion and prevent reprocessing in repeated batch runs.
Updated Transaction Entity Table
- ✅ Added idempotency_key to allow the system to safely deduplicate API retries.
New Table: Outbox
- ✅ Added to decouple DB writes from external side effects, ensuring at-least-once delivery and enabling retries without duplication.

Deep Dive 3 - Payment System with Webhook
Modern payment systems like Stripe support webhooks to enable event-driven integrations for merchants. While APIs allow merchants to poll payment status, this is inefficient for near-real-time use cases like triggering order fulfillment or updating accounting systems. Webhooks solve this by pushing structured event notifications to merchants as critical state changes occur in the payment lifecycle.
For our existing payment system here, webhook publishing is triggered after core payment events (like authorization or settlement). Rather than synchronously notifying merchants during payment processing, the system uses an event outbox model for durability and decoupling:
- After a critical event (e.g., status = SETTLED), the Payment Service writes a webhook_event record into a persistent table.
- A Webhook Dispatcher worker reads undelivered events and sends signed POST requests to the merchant’s registered webhook URL.
- Success responses (2xx) mark the event as DELIVERED. Failures are retried with exponential backoff.
- If delivery fails persistently, events are sent to a dead-letter queue or marked FAILED.
We use the Transactional Outbox Pattern to ensure state updates (e.g., updating payment to SETTLED) and webhook event creation happen atomically — guaranteeing exactly-once webhook generation, even if the system crashes.
Updated webhook_events Table
Optional: add a merchant_webhook_config table to store endpoint URL, secret for HMAC signing, etc.
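To illustrate how the stored secret could be used, the sketch below signs a webhook body with HMAC-SHA256 and verifies it on the merchant side. The timestamp-prefixed message format, the 300-second tolerance, and the function names are assumptions for this example rather than a scheme prescribed by this design.

```python
import hashlib
import hmac
import json
import time


def sign_payload(secret: bytes, payload: bytes, timestamp: int) -> str:
    """HMAC-SHA256 over '<timestamp>.<payload>' so the timestamp is covered too."""
    message = str(timestamp).encode() + b"." + payload
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, payload: bytes, timestamp: int,
                     signature: str, now=None, tolerance: int = 300) -> bool:
    """Merchant-side check: reject stale timestamps, then compare in constant time."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > tolerance:
        return False  # too old or too far in the future: possible replay
    expected = sign_payload(secret, payload, timestamp)
    return hmac.compare_digest(expected, signature)
```

Signing the timestamp together with the body means a captured request cannot simply be replayed later, and compare_digest avoids leaking information through comparison timing.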

Final Thoughts – Why Many Candidates Fail This System Design Interview
Designing a Stripe-like payment system is not just about drawing boxes and arrows — it’s about building trust, ensuring correctness, and handling failure gracefully in a financial environment where mistakes cost real money.
Many candidates fail this interview because they stop at the happy path: a merchant submits a payment, a customer confirms it, and everything settles. But what interviewers look for is how you handle edge cases, enforce exactly-once guarantees, protect sensitive data, and deliver webhook events reliably. Weak answers skip security (e.g., raw card handling), ignore retries and duplicates, or fail to define clear state transitions and database integrity. Others forget that real-world systems need observability, failure recovery, and scalability beyond 10K QPS. What separates a great candidate is not just technical knowledge, but their ability to design a system that is robust, auditable, and production-ready — end to end.