The calendar system enables users to manage events, send invitations, view schedules, and check availability with strong consistency, real-time multi-device sync, high scalability, and low-latency responses optimized for read-heavy workloads.

Google Calendar

Functional Requirements

FR1 – Event Management (Create / Modify / Delete)

Users can create events on their calendars with basic details (title, time, all-day flag, description, location), update those details later, and delete events they own. Only the event organizer (or calendar owner) can modify or delete an event.

FR2 – Invitations and RSVP

Users can add guests to an event and the system records each guest’s response as Yes/No/Maybe. Guests can update their own response, and the organizer can always see the current RSVP status for all invitees.

FR3 – Calendar Viewing (Weekly by Default)

Users can view their schedule in a calendar UI, with the weekly view as the default. The weekly view shows all events for that week and allows users to navigate to other weeks, open event details, and start creating new events from empty time slots.

FR4 – Free/Busy and Availability Checking

Users can check other users’ availability over a given time window, seeing when they are free or busy without necessarily seeing event details. This free/busy view helps organizers choose suitable time slots for meetings while respecting each user’s visibility settings.

Non-Functional Requirements

NFR1 – Strong Consistency (CAP: Consistency over Availability)

The calendar backend is a single source of truth: once an event or RSVP is saved, all clients should see the same canonical state. Under network partition or quorum loss, we prefer failing writes rather than accepting conflicting versions of the same event.

  • CAP stance: For core event data, we choose Consistency over Availability; if we can’t safely commit, the write fails.
  • Guarantee: Read-your-own-writes is guaranteed for core APIs within a region.
💡 Why Consistency over Availability in this design? (Click to expand)
  • Impact of errors is worse than impact of failures: if an event silently “forks” (two times/guest lists both accepted), you get double-bookings and missed meetings. That feels like data corruption and destroys trust in the calendar. A transient “failed to save, please retry” is annoying but recoverable.
  • Partitions are rare, events are core: DB/quorum partitions are exceptional; event correctness is exercised constantly. Optimizing for the rare case (AP) at the cost of everyday correctness (C) is the wrong trade here.
  • Our other features depend on a single truth: free/busy computation, time suggestions, and multi-device sync all assume one canonical version of each event. If we allowed divergent writes, we'd need complex reconciliation logic that leaks into RSVP state, availability results, and device views.
  • Offline is handled at the edge, not by weakening server guarantees: when a user is offline, we queue changes locally and reconcile on reconnect with version checks. We do not relax the server’s consistency just because clients may be disconnected; this keeps the core store clean and predictable.

NFR2 – Multi-Device Real-Time Sync

The same account can be signed in on multiple devices, and changes must stay aligned across them with predictable, near-real-time behavior.

  • Cross-device propagation: Changes made on one online device appear on other online devices within 3–5 seconds (p95).
  • Offline edits: Edits made offline are queued and reconciled on reconnect, with explicit conflict surfacing if a newer version exists.

NFR3 – High Scalability

The system must sustain large user bases and long event histories while maintaining performance as read-heavy traffic grows.

  • Scale assumption: Up to tens of millions of users, each with hundreds of recurring series and thousands of events per year.
  • Traffic profile: Optimized for reads ≫ writes, with strong spikes during business hours and meeting-heavy periods.

NFR4 – Low Latency

Common user interactions should feel instant, especially viewing schedules and checking availability.

  • Views: Day/Week/Month view responses should be < 200 ms (p95).
  • Availability: Free/busy queries for up to ~10 users over a 1–2 week window should be < 500 ms (p95).

APIs & Entities

💡Note on Structure – Why APIs & Entities Are Separate This Time
In most ShowOffer delivery frameworks, we walk each FR vertically (FR → APIs → Entities → Workflow → Diagram). For this calendar system, I’ve intentionally pulled APIs and Entities into shared sections instead of repeating them inside each FR.

There are two main reasons:
  1. Heavy cross-FR reuse of the same entities and APIs
    In this problem, almost all FRs depend on the same core objects: Event, EventException, Invitation, FreeBusyBlock, and ChangeLog. The same APIs (e.g., GET /v1/events, GET /v1/events?start_ts=…, POST /v1/freebusy) are used by multiple FRs: FR1 for CRUD, FR3 for views, FR4 for free/busy. If we followed the standard per-FR layout, we’d either duplicate these definitions four times or constantly say “same as FR1,” which adds noise but no insight.
  2. Entities evolve as later FRs are introduced
    The data model for a calendar is central and grows over time: FR1 defines Event and EventException, FR2 adds Invitation, FR4 adds FreeBusyBlock, and all of them write to ChangeLog. If entity tables lived inside each FR section, every refinement (e.g., adding rsvp_status or changing recurrence_rule) would require editing multiple places, increasing the risk of inconsistent diagrams and descriptions. A single Core Data Model section keeps entities canonical and lets FR sections focus on how they use those entities, not re-explain what they are.
Practically, this structure is still interview-friendly: verbally, I can walk each FR vertically (APIs → key entities → workflow), but in the written version I centralize APIs and Entities to avoid duplication and make it easier for the interviewer to see the big picture of the system at a glance.

APIs

Each FR maps to one or more API endpoints. Here we just define signatures, not full payloads.

FR1 – Event Management (Create / Modify / Delete)

  • Create event
    Creates a new event on the authenticated user’s calendar. If recurrence_rule is omitted or null, this creates a one-time event; if recurrence_rule is provided (RRULE-style), this creates a recurring event series. Recurring-event semantics are covered in Deep Dive 1, after the High-Level Design.
POST /v1/events
  • Get event
    Returns full details of a single event.
GET /v1/events/{event_id}
  • Update event
    Updates event fields; scope controls how recurring events are affected.
PATCH /v1/events/{event_id}?scope=single|this_and_future|series
  • Delete event
    Deletes/cancels an event; scope controls recurring behavior.
DELETE /v1/events/{event_id}?scope=single|this_and_future|series

FR2 – Invitations and RSVP

  • Add or update invitees for an event (organizer)
    Adds new guests or updates the guest list for an event.
POST /v1/events/{event_id}/invitations
  • List invitees and RSVP status (organizer)
    Returns all guests and their current RSVP status.
GET /v1/events/{event_id}/invitations
  • Update RSVP for an event (guest)
    Authenticated user sets or updates their RSVP (Yes/No/Maybe) for the event.
PATCH /v1/events/{event_id}/rsvp

FR3 – Calendar Viewing (Weekly by Default)

  • List events in a time range (used for day/week/month views)
    Returns all events on the authenticated user’s calendar within the given time window; the client uses this for weekly (default) and other views.
GET /v1/events?start_ts={start}&end_ts={end}
  • Get single event
    (Same as FR1) Used when user clicks an event in the calendar view to see details.
GET /v1/events/{event_id}

FR4 – Free/Busy and Availability Checking

  • Free/busy for multiple users
    Given a list of user IDs and a time window, returns each user’s busy intervals (no event details). This is used to power free/busy views in the UI and to manually choose suitable time slots for meetings.
POST /v1/availability
💡Why POST, not GET, for checking availability?
Semantically it’s a read, so we could expose GET /v1/availability with query params, but in practice we’d use POST because the request shape (multiple users, time windows, optional constraints) fits a JSON body much better and avoids URL length / encoding headaches.

Entities

User

Represents an account in the system; the owner of exactly one calendar. All APIs infer the acting user from auth.

| Name | Comment |
|------|---------|
| user_id | Primary key; unique identifier for the user. |
| email | User’s login / contact email. |
| name | Display name. |
| default_timezone | Default timezone for events and views. |

Calendar

Represents the user’s single personal calendar (1:1 with User). All /v1/events APIs operate on this calendar for the authenticated user.

| Name | Comment |
|------|---------|
| calendar_id | Primary key; unique identifier for the calendar. |
| owner_user_id | FK → User.user_id; the user who owns this calendar (1:1 with User). |

Event

Backs all event CRUD APIs and calendar views (POST/GET/PATCH/DELETE /v1/events, GET /v1/events?start_ts&end_ts). A row can represent a single event or a recurring series; recurrence semantics and per-occurrence overrides are detailed in Deep Dive 1.

| Name | Comment |
|------|---------|
| event_id | Primary key; unique identifier for the event (or recurring series). |
| calendar_id | FK → Calendar.calendar_id; calendar this event belongs to. |
| organizer_user_id | FK → User.user_id; the event organizer. |
| title | Event title. |
| description | Optional free-text description. |
| location | Optional location. |
| is_all_day | Boolean all-day flag. |
| start_ts | Start time in UTC (for a recurring series, the first occurrence). |
| end_ts | End time in UTC. |
| timezone | Timezone the event was created in; used for recurrence expansion. |
| recurrence_rule | Optional RRULE-style rule; null for one-time events. |
| status | ACTIVE or CANCELLED. |
| version | Optimistic-concurrency version; incremented on every update. |

Note: For recurring events, this Event row acts as the series definition. Per-occurrence overrides (e.g., “this instance only moved/cancelled”) are modeled via additional tables introduced in Deep Dive 1 (e.g., EventException).

Invitation

Represents each invitee (including guests) for an event, with their RSVP state; this is how FR2 APIs attach invitees to events.

| Name | Comment |
|------|---------|
| invitation_id | Primary key; identifier for this invitation record. |
| event_id | FK → Event.event_id; event the guest is invited to. |
| guest_user_id | FK → User.user_id; invited user (nullable if external only). |
| guest_email | Email of invited guest (used for external/lookup). |
| rsvp_status | YES, NO, MAYBE, or PENDING. |
| response_ts | Timestamp of the last RSVP update. |
| is_organizer | Boolean; true if this row represents the organizer as an attendee. |

In short: the organizer is referenced on the Event row, while all invitees (including the organizer, if desired) live in Invitation. That keeps invitee data normalized and lets FR2 evolve without bloating the Event row.

FreeBusyBlock (Optional)

In the simplest design, FR4 POST /v1/availability can compute free/busy directly from Event + Invitation by expanding relevant events in the requested window. At scale, we introduce a derived FreeBusyBlock store to precompute busy intervals per user and keep availability queries fast.

| Name | Comment |
|------|---------|
| user_id | FK → User.user_id; user this busy block belongs to. |
| start_ts | Start of busy interval in UTC. |
| end_ts | End of busy interval in UTC. |
| status | Busy type (e.g., BUSY, OOO, TENTATIVE). |
| source | Optional reference (e.g., event_id or type) for debugging. |

High-Level Design

This section explains how the APIs and entities work together in a running system. All four FRs share the same core architecture; each subsection then highlights the extra logic needed for that specific capability.

Architecture Overview

At a high level, the calendar looks like a single backend service fronted by an API gateway and backed by a relational event store:

  • Clients (Web / Mobile)
    • Render the calendar UI (event forms, views, availability).
    • Call the APIs defined earlier (/v1/events, /invitations, /rsvp, /availability).
  • API Gateway
    • Terminates TLS and validates authentication.
    • Extracts user_id from the token and forwards it to the backend.
    • Routes all calendar traffic to Calendar Service.
  • Calendar Service (single logical backend)
    • Implements all FRs:
      • Event module – FR1 (create/get/modify/delete).
      • Invitation module – FR2 (guest list + RSVP).
      • View module – FR3 (day/week/month views).
      • Availability module – FR4 (free/busy for multiple users).
    • Applies business rules (ownership checks, scope for recurring events, RSVP rules).
    • Uses version for optimistic concurrency to enforce strong consistency.
  • DB Store (Relational)
    • Strongly consistent store for the entities defined earlier: User, Calendar, Event, Invitation.
    • All FRs are expressed as reads/writes over this shared schema.

The diagrams for FR1–FR4 all follow this pattern:

clients → API-GW → calendar service → DB (User / Calendar / Event / Invitation)

Only the logic inside Calendar Service changes per FR.

FR1 – Event Management (Create / Get / Modify / Delete)

FR1 is the foundation: it defines how events are created, read, updated, and deleted on a user’s calendar. All later features (views, invites, availability) depend on this event model.

Calendar Service responsibilities (FR1):

  • Resolve the caller’s calendar from user_id.
  • Validate basic event shape (time range, timezone, optional recurrence_rule).
  • Enforce ownership: only the organizer/calendar owner can modify or delete.
  • Interpret scope=single|this_and_future|series for recurring events (details in DD1).
  • Use the version column for optimistic concurrency; stale updates return 409 Conflict.

Core flows (see FR1 diagram):

  1. Create event – POST /v1/events
    • Client sends event details.
    • Calendar Service looks up the user’s calendar, validates fields, and inserts a new Event row with:
      • organizer_user_id = user_id, status = ACTIVE, version = 1, optional recurrence_rule.
    • Response returns the canonical event with event_id and version.
  2. Get event – GET /v1/events/{event_id}
    • Calendar Service loads the Event by event_id, checks the caller has access, and returns the canonical event object.
    • No recurrence expansion here; this is just “show me the event details.”
  3. Update event – PATCH /v1/events/{event_id}?scope=...
    • Calendar Service loads the event, checks organizer ownership, applies changes.
    • For non-recurring or scope=series, updates the base Event.
    • For recurring with scope=single/this_and_future, delegates to recurrence logic (DD1) but still as one atomic update.
    • Uses version to detect conflicts; if versions mismatch, returns 409 and the latest state.
  4. Delete event – DELETE /v1/events/{event_id}?scope=...
    • Calendar Service loads the event, checks organizer ownership.
    • Non-recurring or scope=series: marks status = CANCELLED and increments version.
    • Recurring with scope=single/this_and_future: cancels only the relevant occurrence(s) while preserving past history (DD1).
    • Again, version is checked to avoid lost updates.
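
To make the version checks above concrete, here is a minimal sketch of the optimistic-concurrency update, assuming a relational event table with the event_id and version columns described earlier and a psycopg-style connection; the table and function names are illustrative, not a committed schema.

```python
class StaleVersionError(Exception):
    """The caller's version no longer matches the stored row (maps to 409 Conflict)."""


def update_event(conn, event_id: str, expected_version: int, changes: dict) -> int:
    """Apply field changes only if the stored version still matches."""
    # In a real service, the keys of `changes` would be validated against an
    # allow-list of editable columns before being interpolated into SQL.
    set_clause = ", ".join(f"{column} = %s" for column in changes)
    with conn.cursor() as cur:
        cur.execute(
            f"UPDATE event SET {set_clause}, version = version + 1 "
            "WHERE event_id = %s AND version = %s",
            [*changes.values(), event_id, expected_version],
        )
        if cur.rowcount == 0:
            # Row missing or version stale: the API layer turns this into a
            # 409 Conflict carrying the latest canonical event state.
            raise StaleVersionError(event_id)
    conn.commit()
    return expected_version + 1
```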

Design Diagram

Because the FR1 HLD breaks each sub-flow out separately, we use multiple colors in the diagram to make it easier to follow. In the rest of the design we omit this sub-flow-level detail to leave room for the more complicated diagrams.

FR2 – Invitations and RSVP

Building on FR1’s event model, FR2 adds invitees and their responses via the Invitation table. The same Calendar Service now also manages guest lists and RSVP state.

Calendar Service responsibilities (FR2):

  • For guest management APIs, ensure the caller is the event organizer.
  • For RSVP APIs, ensure the caller is an invited guest.
  • Keep Invitation rows in sync with events, enforcing strong consistency for RSVP status.

Core flows (see FR2 diagram):

  1. Organizer adds/updates guests – POST /v1/events/{event_id}/invitations
    • Calendar Service loads Event by event_id, checks organizer_user_id == user_id.
    • For each guest in the payload, upserts an Invitation:
      • New guest → create row with rsvp_status = PENDING, response_ts = null.
      • Existing guest → update fields (e.g., email) or mark removed if supported.
    • Response returns the updated guest list or a success status.
  2. Organizer views guest list – GET /v1/events/{event_id}/invitations
    • Calendar Service verifies the caller is the organizer.
    • Reads all Invitation rows for that event_id, optionally joining with User to show names/emails.
    • Returns each guest’s rsvp_status and response_ts.
  3. Guest updates RSVP – PATCH /v1/events/{event_id}/rsvp
    • Calendar Service finds the Invitation row for (event_id, guest_user_id = user_id).
    • If none exists, the request is rejected.
    • Otherwise, it sets rsvp_status = YES/NO/MAYBE and updates response_ts to “now”.
    • Because this is a single-row update in a consistent DB, organizers immediately see the new status.

Design Diagram

FR3 – Calendar Viewing (Weekly by Default)

FR3 uses the same canonical data (Event + Invitation) to render a time-bounded view (day/week/month). Weekly is the default, but the pattern is identical for other ranges.

Calendar Service responsibilities (FR3):

  • For a requested window [start_ts, end_ts], compute all events relevant to the user:
    • Events on their own calendar.
    • Events where they are invited and haven’t declined.
  • Expand recurring series into concrete occurrences in that window (DD1).
  • Merge and sort results so the client can draw them on a calendar grid.

Weekly view flow (see FR3 diagram):

  1. Client requests window – GET /v1/events?start_ts={start}&end_ts={end}
    • User opens or navigates the calendar; client computes the window for that week and calls the API.
  2. Calendar Service resolves context
    • Looks up the user’s calendar_id using owner_user_id = user_id.
  3. Load owned + invited events
    • Owned: all Event rows on that calendar_id overlapping [start_ts, end_ts].
    • Invited: Invitation rows where guest_user_id = user_id, joined to Event rows and filtered to the same window; optionally excludes rsvp_status = NO.
  4. Expand recurrence and apply overrides
    • For each event with recurrence_rule, expands it into concrete occurrences within the window and applies per-occurrence overrides (DD1).
    • Combines these with one-off events.
  5. Merge, sort, and return
    • Produces a list of event instances (each with event_id, concrete start_ts/end_ts, title, etc.), ordered and grouped by day.
    • Client renders them into the week grid. Clicking an event uses FR1’s GET /v1/events/{event_id} to show details.

Design Diagram

FR4 – Free/Busy and Availability Checking

FR4 builds directly on FR3’s “what events exist in a window” logic, but changes the output: instead of full events, we return busy intervals per user across multiple calendars.

Calendar Service responsibilities (FR4):

  • Validate multi-user availability requests (window length, max number of users).
  • For each target user, compute when they are busy in the specified window, based on:
    • Events they organize.
    • Events they’re invited to and have not declined.
  • Reuse the same recurrence expansion logic as FR3, then compress to busy blocks.

Free/busy flow (see FR4 diagram):

  1. Client requests availability – POST /v1/availability
    • Organizer selects target user_ids and a window (start_ts, end_ts) and calls the API.
  2. Calendar Service validates request
    • Ensures start_ts < end_ts, window within bounds (e.g., ≤ 2 weeks), and number of users within limits (e.g., ≤ 10).
  3. Resolve calendars and load events
    • For each target user_id, finds their calendar_id.
    • Loads:
      • Owned events on those calendars overlapping the window.
      • Invited events from Invitation where guest_user_id is in the target set, joined to Event, filtered to the window, ignoring rsvp_status = NO if desired.
  4. Expand recurrence and compute busy blocks
    • For each event, expands recurring series inside [start_ts, end_ts] and applies overrides (Deep Dive 1).
    • For each user, collects all their event instances and converts them into [busy_start, busy_end] intervals.
    • Merges overlapping intervals to produce a compact list of busy blocks.
  5. Return per-user busy intervals
    • Response maps each user_id to a list of busy intervals (no event details).
    • Client uses these blocks to visually show free slots and help the organizer pick a time.
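
The interval merge in step 4 is simple enough to sketch. Here is a minimal, self-contained version (timestamps shown as plain numbers for brevity; in practice these would be UTC timestamps); a helper like this is also reusable when DD3 precomputes busy blocks:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (busy_start, busy_end); epoch seconds in practice


def merge_busy_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or touching busy intervals into a compact sorted list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous block: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Two overlapping meetings and one later meeting become two busy blocks.
assert merge_busy_intervals([(9, 10), (12, 13), (9, 11)]) == [(9, 11), (12, 13)]
```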

Design Diagram

Deep Dives

DD1 - How do we deal with recurring events?

The high-level design already touched on the core idea behind recurring events, and this is one of the most common follow-up deep dives in interviews. In this section, we will cover:

  • What options do we have for handling recurring events, and which one is best?
  • Which recurring-event corner cases haven’t we covered in the HLD?
💡 Option 1 – Fully Materialize Occurrences (e.g. 1-Year Horizon) (Click to expand)

When a recurring event is created, immediately generate one row per occurrence for some horizon, e.g.:

  • On create: write all occurrences for the next 12 months into Event (or an EventInstance) table.
  • A background job periodically extends the horizon (e.g., always keep “next 6–12 months” materialized).
  • “This instance only” edits just update that one row.

Example:

If you create a weekly standup starting Jan 6, 2025:

  • You insert ~52 rows (one per week) covering Jan 2025–Jan 2026.
  • For FR3 & FR4 just query these instance rows by time range.

Pros

  • Very simple reads for FR3/FR4:
    • Views and availability are just “give me events in [start_ts, end_ts]”.
    • No recurrence expansion logic on the critical read path.
  • Simple per-occurrence edits:
    • “This instance only” = update that row.
    • “Cancel this instance” = mark that row cancelled.
  • Debuggable: What you see in DB is close to what users see in UI (one row per actual calendar cell).

Cons

  • Write amplification & storage blow-up:
    • Long-running series + many users = lots of rows.
    • Every “forever” series becomes N rows/year/user.
  • Painful pattern changes (“this and future”):
    • You may need to update or delete many instance rows (potentially thousands) to reflect series changes.
    • Increases lock time and risk of conflicts.
  • Horizon edge cases:
    • Beyond 1 year, the series “disappears” unless you pre-extend the horizon.
    • Background job failures → future meetings not visible.

In short, this option is great for very simple systems or short-lived recurrences, but doesn’t scale nicely for “forever” series and complex edits. Not a good fit for a Google Calendar–like product at tens of millions of users.

💡 Option 2 – Series Only, On-the-Fly Expansion (No Per-Instance State) (Click to expand)

During creation, store only the series definition and expand on read:

  • Event entity has start_ts, timezone, recurrence_rule.
  • FR3/FR4 always compute occurrences on-the-fly inside the requested window.
  • There is no per-occurrence state, such as exceptions.

To keep this option coherent, we’d relax the product a bit:

  • Only support series-level edits (scope=series).
  • “This instance only” and “this and future” are not supported, or are internally treated as “change the whole series”.

Pros

  • Simplest data model:
    • Just one Event row for the series; no exceptions, no extra tables.
  • Writes are cheap:
    • Changing a series = update a single row.
  • Reads are predictable:
    • Expansion cost depends on the window size, not on how many exceptions exist.

Cons

  • Does not meet our product FR1:
    • We already committed to scope=single|this_and_future|series.
    • Users expect “just this instance” and “this and future” like Google Calendar.
  • No true historical preservation:
    • If you change the series, you change the interpretation of past occurrences as well.
    • That breaks “don’t mutate history” and makes auditing messy.
  • User-surprising behavior:
    • “I just wanted to move next week’s standup, but all past ones now look like they were at that new time.”

Overall, this is a nice “teaching” design for a simpler product, but it cannot satisfy our FR1 requirements (single-instance edits, this-and-future semantics, preserving history). We discard it given our spec.

💡 Option 3 – Series + EventException Table (Chosen Design)

The core idea is to introduce an exception entity that captures changes relative to the originally defined event series:

  • Event row = series definition (base start time + recurrence_rule).
  • EventException rows = overrides or cancellations for specific occurrences, identified by event_id + original_start_ts.

On read (FR3/FR4):

  • Expand series occurrences within the window.
  • For each occurrence, look up a matching exception:
    • status = CANCELLED → drop it.
    • Otherwise, override fields (start_ts, end_ts, title, etc.) from the exception.

On write (FR1):

  • scope=series → update the Event row.
  • scope=single → insert/update a single EventException.
  • scope=this_and_future → typically split series into old + new, plus a few exceptions if needed.

Pros

  • Supports full product semantics:
    • “This instance only”, “this and future”, “series”.
    • Can move/cancel a single occurrence without rewriting everything.
  • Preserves history:
    • Past occurrences remain interpretable using the original rule + exceptions.
    • Middle edits don’t rewrite past rows.
  • Storage-efficient:
    • One row per series + one row per overridden occurrence; unchanged instances don’t consume more space.
  • Good for reads with bounded windows:
    • Expansion is limited to the requested [start_ts, end_ts].
    • Exceptions are typically sparse, so per-window cost is reasonable.
  • Aligns with Consistency:
    • Small, well-scoped writes (one series row + one exception row).
    • Easy to keep strong consistency with version on series and transactional updates including exceptions.

Cons

  • More complex logic than the previous two options:
    • You need a recurrence engine and an “apply exceptions” step.
    • FR1 edits must carefully write both Event and EventException in one transaction.
  • Slightly heavier reads than Option 1:
    • Need to: expand series → join with exceptions → apply overrides.
    • Needs thoughtful indexing and careful window limits to meet latency SLOs.
  • Edge cases (e.g. “this and future” splits) require careful design:
    • You’ll need clear rules on when to split series vs when to add exceptions.

Overall, option 3 hits the best balance for a Google Calendar–like system:

  • Meets our functional requirements (scope variants).
  • Respects strong consistency + history preservation.
  • Scales better than fully materialized instances (Option 1).
  • More complex than Option 2, but the complexity is localized to:
    • Event + EventException data model, and
    • a clear “expand + override” algorithm.

This is the design we choose for the rest of DD1 and the overall system.

How should we implement this change?

To make Option 3 concrete, we introduce a dedicated EventException entity. This table is only needed for recurring series and lives in the deep-dive, not the core HLD entities:

  • Each Event row with a recurrence_rule defines a series (e.g., “every Monday at 9am”).
  • EventException represents “this specific occurrence is different”:
    • It happens at a different time, or
    • It has a different title/location, or
    • It is cancelled.

During expansion, we match each computed occurrence to an exception (if any) and either override or drop that occurrence. Here is the concrete schema for EventException:

| Name | Comment |
|------|---------|
| exception_id | Primary key for the exception row. |
| event_id | FK → Event.event_id; the recurring series this exception belongs to. |
| original_start_ts | The expected start time (UTC) of the occurrence according to the series rule (before any override). This is how we identify which instance is being changed. |
| override_start_ts | New start time (UTC) for this occurrence, if we move it. Nullable if time is unchanged. |
| override_end_ts | New end time (UTC) for this occurrence, if we change the duration. Nullable if unchanged. |
| override_title | Per-occurrence title override. Nullable; when null, use the series title. |
| override_location | Per-occurrence location override. Nullable; when null, use the series location. |
| status | ACTIVE or CANCELLED. If CANCELLED, this occurrence is skipped entirely. |
| created_at | Timestamp when the exception was created (for audit/debug; not critical to logic). |

Read-time behavior (FR3 / FR4):

  • Expand the recurring series into candidate occurrences within the requested window.
  • For each candidate (event_id, instance_start_ts):
    • Look up EventException rows where event_id matches and original_start_ts == instance_start_ts (within some tolerance).
    • If no exception → use the series fields as-is.
    • If exception:
      • If status = CANCELLED → drop this occurrence.
      • Otherwise → override any non-null override_* columns (time/title/location).
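
Here is a minimal sketch of this expand-and-override step. It assumes recurrence_rule strings that dateutil.rrule can parse and an in-memory dict of exception rows keyed by original_start_ts; the dict shapes for events and exceptions are illustrative.

```python
from dateutil.rrule import rrulestr  # parses rules like "FREQ=WEEKLY;BYDAY=MO"


def expand_with_exceptions(event, exceptions_by_start, window_start, window_end):
    """Yield concrete occurrences of a recurring event inside the window,
    applying per-occurrence overrides and dropping cancelled instances."""
    rule = rrulestr(event["recurrence_rule"], dtstart=event["start_ts"])
    duration = event["end_ts"] - event["start_ts"]
    for occurrence_start in rule.between(window_start, window_end, inc=True):
        exc = exceptions_by_start.get(occurrence_start)  # exact match for simplicity
        if exc and exc["status"] == "CANCELLED":
            continue  # this single instance was cancelled
        start = (exc or {}).get("override_start_ts") or occurrence_start
        end = (exc or {}).get("override_end_ts") or (start + duration)
        yield {
            "event_id": event["event_id"],
            "title": (exc or {}).get("override_title") or event["title"],
            "start_ts": start,
            "end_ts": end,
        }
```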

Write-time behavior (FR1):

  • scope=series → update the Event row (series definition).
  • scope=single → write or update one EventException row keyed by (event_id, original_start_ts).
  • scope=this_and_future → usually split series:
    • Old Event truncated to end before the cut.
    • New Event starts at the cut with the new rule.
    • Optionally add a few exceptions for edge cases around the cut.

All of this happens in a single transaction to respect our strong consistency stance.

💡 Summary – What Challenges Does ‘Recurrence Rules + EventException’ Actually Solve?

We have discussed how Option 3 solves the following use cases:

  • “This instance only” edits without breaking the whole series
  • Mid-series changes without altering history
  • One-off cancellations for individual instances
  • Handling no-expiry / long-running series without infinite rows
💡 Mini Example – Handling “This and Future” with Series Split

So far, EventException handles sparse, one-off changes (“this instance only”). For larger pattern changes (“this and future”), we combine series split + exceptions.

Let’s walk through a scenario to see how this works in practice:

  • Original series:
    • Title: “Team Sync”
    • Rule: every Monday 09:00–09:30, starting 2025‑01‑06, no end date.
    • Stored as one Event row: evt_123 with recurrence_rule="FREQ=WEEKLY;BYDAY=MO"
  • On 2025‑03‑10, the user chooses in UI:
    “Edit → This and future” → move to 10:00–10:30

First, what do we do on write?

Step 1 - Find the cut occurrence

  • Compute the occurrence being edited:
    • Original instance start: 2025-03-10T09:00 in America/Los_Angeles → 2025-03-10T16:00:00Z (PDT, UTC-7).
    • This is the first occurrence that should follow the new pattern.

Step 2 - Truncate the original series (evt_123)

  • Update evt_123’s recurrence to stop before the cut:
    • Add an UNTIL or equivalent limit so the last occurrence is 2025‑03‑03.
    • All history up to (and including) 2025‑03‑03 stays governed by this row.

Step 3 - Create a new series starting at the cut (evt_456)

  • Insert a new Event row:
| Field         | Value                                        |
|---------------|----------------------------------------------|
| `event_id`    | `evt_456`                                    |
| `start_ts`    | `2025‑03‑10T17:00:00Z` (10:00 local)         |
| `end_ts`      | `2025‑03‑10T17:30:00Z`                        |
| `timezone`    | `America/Los_Angeles`                        |
| `recurrence_rule` | `FREQ=WEEKLY;BYDAY=MO` (same weekly rule)|
| `status`      | `ACTIVE`                                     |
| `version`     | `1`                                          |
    
  • Copy over other fields (title, description, location, organizer, etc.).
  • If there are invitations (FR2), we also clone the guest list from evt_123 to evt_456.

Step 4 - No exceptions needed in this simple case

  • Because the change applies cleanly at a boundary (“this date and all future”), we don’t need EventException for this example.
  • For more complex changes (e.g., shifting days with overlapping patterns), we might still use a small number of exceptions around the split.
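
Below is a hedged sketch of this write-side split under a few assumptions: db is an illustrative data-access layer with a transaction context, and the series/new_fields dicts mirror the Event columns above. It shows the approach, not a committed interface.

```python
from datetime import timedelta


def split_series_this_and_future(db, series: dict, cut_start, new_fields: dict) -> str:
    """'This and future' edit: truncate the old series before the cut occurrence,
    then create a new series starting at the cut. Runs as one transaction."""
    with db.transaction():
        # 1. Old series (e.g. evt_123) stops before the cut: add an UNTIL bound.
        until = (cut_start - timedelta(seconds=1)).strftime("%Y%m%dT%H%M%SZ")
        db.update_event(
            series["event_id"],
            expected_version=series["version"],
            changes={"recurrence_rule": f"{series['recurrence_rule']};UNTIL={until}"},
        )

        # 2. New series (e.g. evt_456) starts at the cut with the edited fields,
        #    copying title/description/location/organizer from the old series.
        new_event = {k: v for k, v in series.items() if k not in ("event_id", "version")}
        new_event.update(new_fields, start_ts=cut_start, status="ACTIVE")
        new_event_id = db.insert_event(new_event)

        # 3. Clone the FR2 guest list so invitees carry over to the new series.
        db.clone_invitations(from_event=series["event_id"], to_event=new_event_id)
    return new_event_id
```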

Second, how do reads behave after the split?

  • Past weeks (before 2025‑03‑10):
    • FR3/FR4 expansion sees only evt_123, whose recurrence now ends before the cut.
    • All those occurrences remain at 09:00–09:30.
  • From 2025‑03‑10 onward:
    • Expansion sees evt_456 starting on 2025‑03‑10 at 10:00–10:30 and weekly afterwards.
    • The calendar shows the new time from that date forward.

Design Diagram

In this deep dive, we introduce a Recurrence Engine that encapsulates all recurring-event logic:

  • Calendar Service → Recurrence Engine
    • For FR3 (views) and FR4 (availability), the Calendar Service calls the Recurrence Engine to:
      1. Expand series into concrete instances for a given [ts_start, ts_end] window.
      2. Apply EventException to override or cancel specific occurrences.
    • All recurring behavior is centralized here, so both views and free/busy stay consistent.
  • Recurrence Workers → DB + Recurrence Engine
    • Background recurrence workers periodically scan the DB (Event, EventException, Invitation, Calendar) for changes.
    • They call the same Recurrence Engine to expand only the near-future horizon (e.g., next 24 hours) and:
      • Schedule reminders/notifications now,
      • (In DD3) optionally prewarm caches or derived free/busy data.
  • DB (Event / User / Invitation / Calendar / EventException)
    • Event holds the series definition (start_ts, timezone, recurrence_rule, etc.).
    • EventException stores per-occurrence overrides/cancellations keyed by (event_id, original_start_ts).
    • Invitation and Calendar are reused as in HLD; no new services, just one new table and one shared engine.

DD2 - How do we keep multiple devices in sync quickly?

💡 We want the same user, signed in on multiple devices, to see changes propagate in ≤ 3–5 seconds (p95) while still respecting our Consistency over Availability stance.

Key behaviors:

  • Device A edits an event → Device B & C should reflect it quickly without reloading the whole calendar.
  • Devices can go offline, make edits, then reconcile when back online.
  • The server remains the single source of truth; devices are just caches.
💡 Option 1 – Simple Periodic Polling

Each device periodically polls:

  • Either GET /v1/events?start_ts&end_ts for the visible window, or
  • A light /v1/sync endpoint with “changes since X”.

Polling interval might be 10–30 seconds.

Pros

  • Very simple to implement:
    • No extra infra beyond existing APIs.
    • No WebSockets / push infra.
  • Fully leverages strong server state:
    • Every poll gets canonical data from the DB.

Cons

  • Latency tied to the polling interval:
    • If we poll every 30 seconds, worst-case propagation is ~30s.
    • To reach 3–5s, we’d need very aggressive polling → lots of wasted traffic.
  • Expensive at scale:
    • Tens of millions of users × frequent polls = massive QPS, most of which return “no changes”.
  • Battery / bandwidth unfriendly on mobile.

Summary
Good as a baseline and fallback, but doesn’t hit our latency + efficiency goals cleanly.
We’ll keep a slow poll as a safety net, but not rely on it as the primary mechanism.

💡 Option 2 – Polling with Delta (/sync) + ChangeLog (No Push)

Improve Option 1 by adding a change log and a delta API:

  • Maintain a ChangeLog (or EventChange) table with a monotonically increasing sequence.
  • Expose a GET /v1/sync?cursor=...:
    • Input: last seen cursor.
    • Output: list of changes since that cursor (create/update/delete, event_ids) + new cursor.
  • Devices poll /sync periodically instead of refetching full views.

Pros

  • Much cheaper than refetching everything:
    • Devices only pull changes, not full windows.
  • Good offline story:
    • When a device comes back, it calls /sync with its last cursor and gets all missed changes.
  • Still uses strong server state as source of truth.

Cons

  • Still limited by polling interval:
    • If /sync is polled every 15–30s, multi-device lag is still too high.
  • Extra complexity on the backend:
    • Need to maintain ChangeLog and cursors.
  • Doesn’t exploit the fact that many devices are online and connected and could be notified immediately.

Summary
Better than pure polling and useful as a core primitive (we’ll reuse /sync), but not enough on its own to reach “near real-time” without aggressive polling.

💡 Option 3 – Push Invalidation + Delta Sync (Chosen Design)

Now how about we combine:

  • A ChangeLog + /sync API (Option 2), and
  • A push-based invalidation channel

So the flow looks like:

  1. When Calendar Service commits a change, it writes to ChangeLog and publishes a “user X has updates” message.
  2. Online devices for that user maintain a push connection (WebSocket / SSE / mobile push).
  3. When they receive an invalidation (“there are changes after cursor K”), they immediately call /v1/sync?cursor=K to pull the actual deltas.
  4. Offline devices simply call /v1/sync when they come back.

Pros

  • Near real-time for online devices:
    • Push invalidation reaches the device quickly; the only latency is one /sync call.
  • Efficient:
    • Push payloads are tiny (“you have changes”), data comes from /sync which only returns deltas.
  • Great offline story:
    • When offline, device misses pushes but can catch up using /sync and its cursor.
  • Aligns with strong consistency:
    • Server is the canonical state; push is just a trigger, not data truth.

Cons

  • More infra:
    • Need a Notification/Sync Service (or reuse an existing one).
    • Require a durable message channel (e.g., Kafka / PubSub) between Calendar Service and Notification Service.
  • More moving parts:
    • Need to handle dropped connections, reconnects, and push failures gracefully.

Summary
Option 3 best matches our NFR2 (multi-device real-time) + NFR3 (scalability):

  • Use /sync + ChangeLog as the core abstraction.
  • Layer push invalidation on top to get <5s propagation for online devices.
  • Fall back to periodic polling /sync for robustness.

We’ll adopt Option 3.

How should we implement this change?

We first introduce another new entity for this deep dive: ChangeLog.

Like EventException in Deep Dive 1, ChangeLog is a deep-dive-only entity used for sync. It does not appear in the core HLD entities. Here is the schema for a ChangeLog row:

| Name | Comment |
|------|---------|
| change_id | Monotonically increasing sequence (bigint); acts as the global cursor. |
| user_id | User affected by this change (organizer or invitee). We write one row per affected user. |
| event_id | Event whose state changed. |
| change_type | CREATED, UPDATED, DELETED, RSVP_UPDATED, etc. |
| changed_at | Timestamp of the change (for debugging, not primary cursor). |
| source | Optional: WEB, MOBILE, WORKER, etc. (useful for debugging). |

Every time an event changes in a way that affects user X’s calendar, we append a row with an increasing change_id for user X. Devices use change_id as a cursor to request “all changes after N”.

Then, we need to add a delta sync endpoint (not in the main APIs table to keep it focused; introduced here in DD2):

GET /v1/sync?cursor={last_seen_change_id}

Example responses:

  • First sync (no cursor):
GET /v1/sync

Response:

{ "cursor": 12345, "events": [ /* initial snapshot of relevant events for default windows */ ] }
  • Incremental sync:
GET /v1/sync?cursor=12345

Response:

{ "cursor": 12360, "changes": [ { "event_id": "evt_1", "change_type": "UPDATED" }, { "event_id": "evt_2", "change_type": "DELETED" }, { "event_id": "evt_3", "change_type": "CREATED" } ] }

On receiving changes, the client:

  • For each event_id:
    • Fetches updated event details if needed (or, for small payloads, the /sync response could already inline the event docs).
  • Updates its in-memory / local cache for current and upcoming windows.
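
On the server side, /v1/sync is essentially a cursor-bounded scan of ChangeLog for the calling user. A hedged sketch, assuming the ChangeLog columns above, a change_log table name, and a psycopg-style connection (names are illustrative):

```python
def get_sync_changes(conn, user_id: str, cursor: int, limit: int = 500) -> dict:
    """Return changes for this user after `cursor`, plus the new cursor value."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT change_id, event_id, change_type FROM change_log "
            "WHERE user_id = %s AND change_id > %s "
            "ORDER BY change_id LIMIT %s",
            (user_id, cursor, limit),
        )
        rows = cur.fetchall()
    return {
        # If nothing changed, the cursor stays where it was.
        "cursor": rows[-1][0] if rows else cursor,
        "changes": [
            {"event_id": event_id, "change_type": change_type}
            for _change_id, event_id, change_type in rows
        ],
    }
```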

New Component: Notification / Sync Service

We add one new logical component next to Calendar Service:

  • Notification / Sync Service
    • Maintains online connections for devices (WebSockets / SSE / mobile push registration).
    • Subscribes to a message topic fed by Calendar Service (e.g., user-updates partitioned by user_id).
    • On each message (user_id, latest_change_id):
      • Pushes a small invalidation to all online devices for that user_id:
        • e.g., { "type": "calendar_updates", "cursor_hint": latest_change_id }.
    • Devices then call GET /v1/sync?cursor=last_seen.

This service doesn’t know event semantics; it just knows “user X has updates up to cursor Y”.

💡 Example: Write Path with Sync

Let’s say a user on Laptop edits an event time; Phone and Tablet should update.

  1. Laptop → API-GW → Calendar Service
    • PATCH /v1/events/{event_id} with new time, version, scope, etc.
    • Calendar Service validates, applies business logic, writes to Event (+ EventException / Invitation if needed) in a single transaction.
  2. Calendar Service writes ChangeLog
    • After the transaction commits, for each affected user (organizer + invitees):
    • Insert into ChangeLog:
    (change_id = NEXTVAL,
     user_id = guest_or_organizer,
     event_id = evt_123,
     change_type = UPDATED,
     changed_at = now)
            
    • For each inserted row, publish a message to the message bus or directly to Notification Service:
    { "user_id": ..., "latest_change_id": ... }
            
  3. Notification / Sync Service pushes invalidation
    • Receives the message and looks up active connections for that user_id (Laptop, Phone, Tablet).
    • Sends a small push: “Your calendar has changed; latest cursor ≥ 12360”.
  4. Other devices call /sync
    • Phone and Tablet receive the push.
    • Each calls: GET /v1/sync?cursor=their_last_seen_cursor.
    • Calendar Service:
      • Reads ChangeLog rows for that user_id with change_id > cursor.
      • Returns a list of changes (and optionally the updated event docs).
    • Devices update their local view/cache.

Propagation time is dominated by:

  • Calendar write latency + ChangeLog insert,
  • Message bus + push latency,
  • One /sync call.

This keeps us well within the 3–5s p95 target.

Offline & Conflict Handling

Let’s take a look at offline device behavior:

  • Each device stores its last cursor locally (e.g., in SQLite).
  • When offline:
    • It can still show the last synced view from local cache.
    • If the user edits events offline, the client:
      • Writes locally.
      • Queues outbound operations (e.g., “PATCH event X”) to be sent later.
  • When it comes back online:
    1. It first calls /v1/sync?cursor=last_cursor to pull missed server-side updates.
    2. Then replays its queued writes:
      • If writes are against stale versions, Calendar Service returns 409 Conflict.
      • Client can:
        • Reload server state,
        • Merge or re-ask user (e.g., “Meeting was changed elsewhere; do you still want to move it?”).

Because the server is canonical and every write is version-checked, we never have silent divergence: either the offline edit cleanly applies, or we surface a conflict.
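
Client-side, the reconnect sequence is “pull first, then replay.” The sketch below assumes a requests.Session for HTTP and a local_store object (an illustrative abstraction) that tracks the last cursor, queued operations, and surfaced conflicts:

```python
import requests  # assumed HTTP client for this sketch


def reconcile_on_reconnect(base_url: str, session: requests.Session, local_store) -> None:
    """Pull missed server-side changes first, then replay queued offline edits."""
    # 1. Catch up with the server using the last locally stored cursor.
    resp = session.get(f"{base_url}/v1/sync", params={"cursor": local_store.last_cursor})
    resp.raise_for_status()
    payload = resp.json()
    local_store.apply_changes(payload.get("changes", []))
    local_store.last_cursor = payload["cursor"]

    # 2. Replay queued offline writes; each carries the version it was based on.
    for op in local_store.pending_operations():
        r = session.request(op.method, f"{base_url}{op.path}", json=op.body)
        if r.status_code == 409:
            # The event changed elsewhere: keep server state and ask the user.
            local_store.mark_conflict(op, server_state=r.json())
        else:
            r.raise_for_status()
            local_store.mark_applied(op)
```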

Design Diagram

DD3 - How do we quickly check availability and suggest event times for a group of users?

💡 Deep Dive: Why We Might Need FreeBusyBlock (Click to Expand)

In this deep dive, we want to:

  • Quickly check a group of users’ availability/conflicts in < 500 ms p95 over a 1–2 week window for interactive “find a time” flows.
  • Find a suggested time from the system based on availability checks.

Recall from HLD that POST /v1/availability currently walks Event + Invitation, uses the Recurrence Engine from DD1, and computes busy intervals online.

In the entities section, we marked FreeBusyBlock as optional. In this deep dive we’ll explore whether we actually need such a derived store, and if so, how to keep it fresh while preserving our Consistency over Availability stance.

💡 Option 1 – Compute Free/Busy On-the-Fly (Baseline)

For each check with /v1/availability request:

  1. For each target user:
    • Query Event + Invitation overlapping [start_ts, end_ts].
    • Use the Recurrence Engine to expand recurring series in that window and apply EventException.
    • Merge instances into busy intervals in memory.
  2. Return per-user busy blocks; suggestions (if any) are computed by intersecting these blocks on the fly.

Pros

  • Simple & fully consistent:
    Always reflects current Event + EventException + Invitation state.
  • No extra storage beyond existing tables.
  • Easy to reason about correctness (only one source of truth).

Cons

  • CPU & DB heavy:
    • Same recurrence expansion work is redone on each call.
    • Availability for the same people/window is recomputed again and again.
  • Latency grows with:
    • Number of events in the window,
    • Complexity of recurrence rules,
    • Number of attendees.
  • Hard to guarantee < 500 ms p95 at high scale and during peak times.

Summary
Great as a baseline and fallback, but too expensive to be the hot path for a large calendar product.

💡 Option 2 – Batch-Precomputed Free/Busy (Nightly/Hourly)

Using offline batch jobs, we can:

  • Run a periodic batch (e.g., nightly or hourly) that:
    • Expands events for each user for the next N days (say 30–60).
    • Materializes per-user busy intervals into a FreeBusyBlock table.
  • /v1/availability queries FreeBusyBlock instead of raw events.

Pros

  • Very fast reads:
    • Availability queries become simple range scans over pre-merged blocks.
  • Easy to compute suggestions:
    • Work on a small set of intervals instead of all events & recurrences.

Cons

  • Staleness:
    • Any event changes after the batch won’t show in free/busy until the next run.
  • Tradeoff is unpleasant:
    • Either accept noticeably stale answers (“this time looks free but was just booked”),
    • Or add real-time corrections on top, which complicates the system anyway.
  • Batch cost grows with user base; recomputing everything each run can be expensive.

Summary
Nice for a small or low-change system, but the staleness is too high for a Google-Calendar–like product that expects near real-time availability.

💡 Option 3 – Incremental FreeBusy Store + Cache (Chosen)

Use a derived “busy blocks” store per user that is:

  • Incrementally updated when events change, driven by the same ChangeLog/Kafka pipeline from DD2.
  • Backed by the Recurrence Engine from DD1 to recompute only the affected days.
  • Served via a cache (e.g., Redis) so /v1/availability reads are fast and predictable.

The canonical truth is still Event + EventException + Invitation; FreeBusyBlock is a materialized view tuned for FR4.

We keep Option 1 as a fallback (misses, edge cases), but most requests hit the derived store.

Good, we now commit to using FreeBusyBlock at scale. So for a given (user_id, day), the FreeBusyBlock rows are the merged busy intervals for that user for that day, across owned and accepted events. It’s okay if FreeBusyBlock lags the canonical state by a few seconds; we’ll talk about how we cap that staleness next.

Design Diagram

Now let’s dig into what the write path and read path look like in the design diagram.

1. Write Path: Keeping FreeBusyBlock Fresh
Goal: when an event or RSVP changes, update derived busy blocks (and cache) for all affected users and days.

🛠️ Step-by-Step: FR1/FR2 Write Flow (Click to Expand)
  1. Client → API-GW (FR1/FR2 write)
    • User creates/updates/deletes an event or RSVP.
    • Client calls one of the existing APIs:
    POST   /v1/events
    PATCH  /v1/events/{event_id}
    DELETE /v1/events/{event_id}
    PATCH  /v1/events/{event_id}/rsvp
  2. API-GW → Calendar Service
    • API-GW authenticates the request, extracts user_id, and forwards it.
  3. Calendar Service → Apply core business logic (canonical write)
    • Validates payload, permissions, and recurrence (from DD1).
    • Applies FR1/FR2 logic in a strong-consistency transaction to tables:
      • Event, EventException, Invitation, and optionally Calendar.
  4. Calendar Service → Append ChangeLog rows (DD2)
    • After transaction commits, for each affected user (organizer + invitees):
    • Insert into ChangeLog:
      {
        change_id,
        user_id,
        event_id,
        change_type,
        changed_at
      }
  5. Calendar Service → Kafka (user updates topic)
    • For each ChangeLog row, publish a small message:
    • {"user_id": "u123", "event_id": "evt_456", "change_type": "UPDATED"}
    • Topic is partitioned by user_id.
  6. FreeBusy Updater Worker ← Kafka (DD3)
    • Subscribes to the user-updates topic.
    • For each message:
      • a. Find affected days: Look up events to find impacted calendar days.
      • b. Rebuild FreeBusy for (user_id, day):
        • Query: Event, EventException, Invitation
        • Use recurrence engine to expand, filter for visible attendees, and merge blocks.
      • c. Write FreeBusyBlock:
        • Delete existing rows for (user_id, D).
        • Insert merged rows + set last_rebuilt_at.
  7. FreeBusy Updater → FreeBusyCache (optional)
    • If (user_id, day) is in the configured “hot window” (e.g. next 4 weeks):
      • Update/invalidate corresponding entry in FreeBusyCache (Redis / in-memory).
      • Keeps FreeBusyBlock + cache aligned with canonical change.
  8. End State
    • Canonical tables remain source of truth.
    • FreeBusyBlock (and optionally cache) are up-to-date, ready for /v1/availability.
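
A hedged sketch of the updater’s per-message work follows. The db, cache, and recurrence_engine objects (and the day bounds) are illustrative abstractions, and merge_busy_intervals refers to the helper sketched in the FR4 section:

```python
def handle_user_update(message: dict, db, cache, recurrence_engine) -> None:
    """Rebuild derived busy blocks for the days touched by one change message."""
    user_id, event_id = message["user_id"], message["event_id"]

    # a. Find which calendar days the changed event touches for this user.
    for day in db.days_touched_by_event(user_id, event_id):
        # b. Recompute busy blocks for (user_id, day) from canonical tables,
        #    using the DD1 recurrence engine to expand series + exceptions.
        instances = recurrence_engine.expand_for_user(user_id, day.start, day.end)
        blocks = merge_busy_intervals([(i["start_ts"], i["end_ts"]) for i in instances])

        # c. Replace derived rows for that user/day, then keep the hot-window
        #    cache (e.g., next 4 weeks) aligned with the canonical change.
        db.replace_free_busy_blocks(user_id, day, blocks)
        if cache.is_hot(user_id, day):
            cache.set_busy_blocks(user_id, day, blocks)
```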

2. Read Path: Serving /v1/availability Fast
Goal: answer free/busy requests and suggestions in < 500 ms p95 using the derived store.

💡 Step by Step with Explanation (Click to expand)
  1. Client → API-GW (POST /v1/availability)
    • Organizer opens “Find a time” and selects attendees + a date range (e.g., next 2 weeks).
    • Client calls:
    {
      "user_ids": ["u1", "u2", "u3"],
      "start_ts": "...",
      "end_ts": "..."
    }
    • Request is authenticated at API-GW and forwarded to Calendar Service.
  2. Calendar Service → Normalize request window
    • Validates start_ts < end_ts and window length limits.
    • Computes the set of calendar days D that intersect [start_ts, end_ts].
  3. Calendar Service → FreeBusyCache (fast path)
    • For each requested user_id:
    • For each day d ∈ D:
      • Tries to read busy blocks from FreeBusyCache (e.g., key like freebusys:{user_id}:{day}).
  4. Cache miss → DB (FreeBusyBlock)
    • For (user_id, d) entries missing in cache:
      • Query FreeBusyBlock rows from the DB for that user and day.
      • Populate cache entries with a short TTL (e.g., a few minutes), so subsequent requests are fast.
  5. Merge & trim per user
    • For each user:
      • Combine all their busy blocks across days in D.
      • Re-merge if needed and trim intervals to the exact [start_ts, end_ts] window.
      • Prepare a per-user list:
      {
        "user_id": "u1",
        "busys": [
          { "start_ts": "...", "end_ts": "..." },
          ...
        ]
      }
  6. (Optional) Fallback on-the-fly for gaps
    • In rare cases where FreeBusyBlock doesn’t exist yet for a (user_id, day):
      • Compute free/busy on-the-fly from canonical Event + EventException + Invitation.
      • Optionally backfill FreeBusyBlock and cache that day.
  7. Meeting suggestions (if needed)
    • Calendar Service can:
      • Derive free intervals per user inside [start_ts, end_ts].
      • Intersect across attendees to find candidate slots.
      • Return raw busy data and suggestions.
  8. Response → Client
    • Calendar Service returns a JSON payload like:
    • {
        "users": [
          { "user_id": "u1", "busys": [ ... ] },
          { "user_id": "u2", "busys": [ ... ] }
        ],
        "suggested_slots": [
          { "start_ts": "...", "end_ts": "..." },
          ...
        ]
      }
    • Client renders colored free/busy strips and time suggestions.
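
The cache-then-DB read in steps 3–5 can be sketched as follows, assuming a redis-py client, the illustrative freebusys:{user_id}:{day} key format from step 3, and an assumed db helper over the FreeBusyBlock table:

```python
import json


def load_busy_blocks(redis_client, db, user_id: str, days, ttl_seconds: int = 300):
    """Read per-day busy blocks from cache, falling back to FreeBusyBlock rows."""
    blocks = []
    for day in days:
        key = f"freebusys:{user_id}:{day}"
        cached = redis_client.get(key)
        if cached is not None:
            blocks.extend(json.loads(cached))
            continue
        # Cache miss: load from the derived store and backfill with a short TTL.
        day_blocks = db.get_free_busy_blocks(user_id, day)
        redis_client.setex(key, ttl_seconds, json.dumps(day_blocks))
        blocks.extend(day_blocks)
    return sorted(blocks)  # step 5 then re-merges and trims to [start_ts, end_ts]
```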

Final Thoughts

Overall, this calendar design doc hangs together really well: it starts from clear, interview-ready FRs (events, invitations/RSVP, calendar views, free/busy) and a tight NFR stack that explicitly picks Consistency over Availability, then cleanly factors APIs and Entities as shared building blocks before walking through HLD for each FR.

The deep dives feel coherent rather than bolted on: DD1 introduces a realistic recurring-events model with Event + EventException and a recurrence engine; DD2 layers on multi-device sync via ChangeLog, Kafka, and /v1/sync; DD3 then reuses the same primitives (ChangeLog, recurrence engine) to build a scalable FreeBusyBlock + cache path for fast availability checks, with separate write/read diagrams that make the data-flow story easy to follow.

The result is a consistent narrative: one canonical event store, a small set of reusable services/components, and progressively richer derived views (exceptions, sync, free/busy) that satisfy both product needs and the NFRs without overcomplicating the core.
