Proximity

Introduction

Location-based services (proximity systems) are foundational components of many modern applications, such as:

  • Ride-sharing (Uber, Lyft)
  • Food delivery (DoorDash)
  • Navigation (Google Maps)
  • Local business discovery (Yelp)
  • Dating applications (Tinder)

These systems face two core technical challenges:

  1. Efficiently identifying relevant nearby entities (users, businesses or locations) within specified geographic boundaries
  2. Managing and processing real-time location updates at scale

In this module, we'll first explore the core technologies that power these location-based services and then walk through three practical examples to help you master the principles of proximity system design.

Related Concepts

Geo Point or Geo Location

A Geo point precisely defines a location on Earth using two coordinates:

  • Latitude (lat): Angular distance north or south of the equator
  • Longitude (lng/lon): Angular distance east or west of the prime meridian
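To make the two coordinates concrete, here is a minimal Python sketch of a geo point plus the haversine formula, a common way to turn two points into a great-circle distance for radius checks. `GeoPoint` and `haversine_km` are illustrative names, not from any library. The formula assumes a spherical Earth (mean radius 6371 km), so results can differ from an ellipsoidal model by a fraction of a percent.

```python
import math
from dataclasses import dataclass

# Mean Earth radius; the spherical model is an approximation (sub-percent error).
EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class GeoPoint:
    lat: float  # degrees; positive = north of the equator
    lng: float  # degrees; positive = east of the prime meridian

    def __post_init__(self) -> None:
        # Reject coordinates outside the valid ranges early.
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError(f"latitude out of range: {self.lat}")
        if not -180.0 <= self.lng <= 180.0:
            raise ValueError(f"longitude out of range: {self.lng}")


def haversine_km(a: GeoPoint, b: GeoPoint) -> float:
    """Great-circle distance between two geo points, in kilometers."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    d_phi = phi2 - phi1
    d_lambda = math.radians(b.lng - a.lng)
    h = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


if __name__ == "__main__":
    sf = GeoPoint(37.7749, -122.4194)
    oakland = GeoPoint(37.8044, -122.2712)
    print(f"{haversine_km(sf, oakland):.1f} km")  # ≈ 13.4 km
```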

QuadTree vs GeoHash

QuadTree

A QuadTree is a spatial data structure that recursively partitions 2D space into four equal quadrants. Each internal node has exactly four children, representing northwest (NW), northeast (NE), southwest (SW), and southeast (SE) regions. The division process continues until each leaf node contains no more than a predefined number of points.
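The splitting rule can be illustrated with a small point QuadTree. This is a minimal sketch, not a production index: `Box`, `QuadTree`, `capacity`, and `max_depth` are hypothetical names chosen for illustration, coordinates are treated as a flat (x, y) plane (use (lng, lat) for geo data), and there is no deletion or node merging. A leaf splits into NW/NE/SW/SE children once it holds more than `capacity` points, and a range query prunes any quadrant that does not overlap the search box.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y); for geo data use (lng, lat)


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle with min corner (x0, y0) and max corner (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, p: Point) -> bool:
        return self.x0 <= p[0] <= self.x1 and self.y0 <= p[1] <= self.y1

    def intersects(self, other: "Box") -> bool:
        return not (other.x0 > self.x1 or other.x1 < self.x0
                    or other.y0 > self.y1 or other.y1 < self.y0)


class QuadTree:
    def __init__(self, box: Box, capacity: int = 4, depth: int = 0, max_depth: int = 20):
        self.box = box
        self.capacity = capacity    # max points a leaf holds before splitting
        self.depth = depth
        self.max_depth = max_depth  # stops endless splitting on duplicate points
        self.points: List[Point] = []
        self.children: Optional[List["QuadTree"]] = None  # [NW, NE, SW, SE] once split

    def insert(self, p: Point) -> None:
        if not self.box.contains(p):
            raise ValueError(f"point {p} is outside {self.box}")
        if self.children is not None:  # internal node: delegate to a quadrant
            self._child_for(p).insert(p)
            return
        self.points.append(p)  # leaf: store, then split if over capacity
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self) -> None:
        b = self.box
        mx, my = (b.x0 + b.x1) / 2, (b.y0 + b.y1) / 2
        kw = dict(capacity=self.capacity, depth=self.depth + 1, max_depth=self.max_depth)
        self.children = [
            QuadTree(Box(b.x0, my, mx, b.y1), **kw),  # NW
            QuadTree(Box(mx, my, b.x1, b.y1), **kw),  # NE
            QuadTree(Box(b.x0, b.y0, mx, my), **kw),  # SW
            QuadTree(Box(mx, b.y0, b.x1, my), **kw),  # SE
        ]
        points, self.points = self.points, []
        for q in points:  # push existing points down into the new quadrants
            self._child_for(q).insert(q)

    def _child_for(self, p: Point) -> "QuadTree":
        b = self.box
        mx, my = (b.x0 + b.x1) / 2, (b.y0 + b.y1) / 2
        index = (0 if p[1] >= my else 2) + (1 if p[0] >= mx else 0)
        return self.children[index]

    def query(self, r: Box) -> List[Point]:
        """Return all stored points inside r, skipping quadrants that miss r."""
        if not self.box.intersects(r):
            return []
        if self.children is None:
            return [p for p in self.points if r.contains(p)]
        found: List[Point] = []
        for child in self.children:
            found.extend(child.query(r))
        return found


if __name__ == "__main__":
    world = QuadTree(Box(-180, -90, 180, 90), capacity=2)
    for p in [(-122.42, 37.77), (-122.27, 37.80), (-73.99, 40.73), (2.35, 48.86)]:
        world.insert(p)
    print(world.query(Box(-123.0, 37.0, -122.0, 38.0)))  # [(-122.42, 37.77), (-122.27, 37.8)]
```

In the usage example, the two Bay Area points end up in a deeper leaf than the lone point in the NE quadrant, which is the "adaptive to data density" behavior described below.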

Key advantages:

  • Highly memory-efficient for non-uniform data distribution (e.g., dense urban centers vs. sparse rural areas)
  • Excels at spatial relationship queries (containment, intersection)
  • Efficient for complex geometric queries (e.g., finding points within irregular polygons)
  • Adaptive to data density variations

Trade-off: Adding or removing points can split or merge nodes, restructuring the tree, which can hurt performance in highly dynamic scenarios (e.g., tracking constantly moving drivers).

GeoHash

GeoHash is a hierarchical spatial indexing scheme that encodes latitude/longitude coordinates into short base-32 strings. It creates a grid system where longer strings represent smaller, more precise areas. Each additional character encodes 5 more bits and divides the previous cell into 32 subcells, so points that share a longer prefix are usually close to each other.

For example:

  • "9q8" is a large cell (about 1.4° of latitude by 1.4° of longitude) that contains San Francisco
  • "9q8yyk" narrows this down to a cell roughly 1 km × 0.6 km within San Francisco
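Encoding is repeated halving: each bit records whether the point lies in the upper or lower half of the current longitude or latitude range, the bits alternate between the two (starting with longitude), and every 5 bits become one base-32 character. A minimal Python sketch follows; `geohash_encode` and `geohash_bbox` are illustrative names, not a library API. For example, the coordinates 37.7749, -122.4194 encode to "9q8yyk".

```python
from typing import Tuple

# Base-32 alphabet used by GeoHash (omits "a", "i", "l", "o").
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    """Encode a point by repeatedly halving the lng/lat ranges.

    Bits alternate lng, lat, lng, ... and every 5 bits become one character.
    """
    lat_range = [-90.0, 90.0]
    lng_range = [-180.0, 180.0]
    chars = []
    value, n_bits, use_lng = 0, 0, True
    while len(chars) < precision:
        rng, coord = (lng_range, lng) if use_lng else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1  # point is in the upper half
            rng[0] = mid
        else:
            value <<= 1  # point is in the lower half
            rng[1] = mid
        use_lng = not use_lng
        n_bits += 1
        if n_bits == 5:
            chars.append(BASE32[value])
            value, n_bits = 0, 0
    return "".join(chars)


def geohash_bbox(gh: str) -> Tuple[float, float, float, float]:
    """Return (lat_min, lat_max, lng_min, lng_max) of the cell named by gh."""
    lat_range = [-90.0, 90.0]
    lng_range = [-180.0, 180.0]
    use_lng = True
    for ch in gh:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # most significant bit first
            rng = lng_range if use_lng else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lng = not use_lng
    return lat_range[0], lat_range[1], lng_range[0], lng_range[1]


if __name__ == "__main__":
    print(geohash_encode(37.7749, -122.4194, 6))  # 9q8yyk
    print(geohash_bbox("9q8"))  # (36.5625, 37.96875, -123.75, -122.34375)
```

Because a longer hash always extends a shorter one for the same point, `geohash_encode(lat, lng, 6)` starts with `geohash_encode(lat, lng, 3)`; that prefix property is what makes sharding and caching by prefix straightforward.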

Key advantages:

  • Well suited to real-time location updates, because encoding and decoding are simple and cheap
  • Efficient for proximity searches using prefix matching
  • Easy to shard/partition data by prefix
  • Excellent for write-heavy workloads: an update only recomputes a string key, whereas a QuadTree may need to split or merge nodes on writes
  • Easy to cache popular regions

Trade-off:

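  • Edge cases at grid boundaries can be tricky: two points only meters apart can fall into adjacent cells with very different GeoHash prefixes, so a query usually has to check the neighboring cells as well.
  • Rectangular vs. circular search: GeoHash cells are rectangular, but most "find nearby" queries want results within a radius (e.g., within 1 mile of my current location), so candidates from the covering cells must be filtered by actual distance.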

As a result, GeoHash can incur extra processing overhead (e.g., fetching and filtering additional cells), especially for complex geometric queries.
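Both trade-offs are commonly handled the same way: search the query point's cell plus its eight neighbors, then filter the candidates by true distance. The sketch below uses an in-memory dict as a stand-in for a prefix-keyed store; `encode`, `bbox`, `neighbors`, `build_index`, and `nearby` are hypothetical helpers, and it assumes the search radius is smaller than one cell, so a 3x3 block of cells covers the whole circle. The example uses two points about 14 m apart on either side of the prime meridian in London, whose 5-character hashes ("gcpuz" and "u10hb") do not even share a first character.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
Entry = Tuple[str, float, float]  # (id, lat, lng)


def encode(lat: float, lng: float, precision: int) -> str:
    lat_r, lng_r = [-90.0, 90.0], [-180.0, 180.0]
    bits = []
    for i in range(precision * 5):  # even bits refine lng, odd bits refine lat
        r, c = (lng_r, lng) if i % 2 == 0 else (lat_r, lat)
        mid = (r[0] + r[1]) / 2
        upper = c >= mid
        bits.append("1" if upper else "0")
        r[0 if upper else 1] = mid  # keep the half that contains c
    return "".join(BASE32[int("".join(bits[k:k + 5]), 2)] for k in range(0, len(bits), 5))


def bbox(gh: str) -> Tuple[float, float, float, float]:
    """(lat_min, lat_max, lng_min, lng_max) of a cell."""
    lat_r, lng_r = [-90.0, 90.0], [-180.0, 180.0]
    bits = "".join(format(BASE32.index(ch), "05b") for ch in gh)
    for i, bit in enumerate(bits):
        r = lng_r if i % 2 == 0 else lat_r
        r[0 if bit == "1" else 1] = (r[0] + r[1]) / 2
    return lat_r[0], lat_r[1], lng_r[0], lng_r[1]


def neighbors(gh: str) -> Set[str]:
    """The cell itself plus up to 8 surrounding cells of the same length."""
    lat_min, lat_max, lng_min, lng_max = bbox(gh)
    d_lat, d_lng = lat_max - lat_min, lng_max - lng_min
    c_lat, c_lng = (lat_min + lat_max) / 2, (lng_min + lng_max) / 2
    cells = set()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            lat = c_lat + dy * d_lat
            if not -90.0 <= lat <= 90.0:
                continue  # no neighbor beyond the poles
            lng = (c_lng + dx * d_lng + 180.0) % 360.0 - 180.0  # wrap at the antimeridian
            cells.add(encode(lat, lng, len(gh)))
    return cells


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    p1, p2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def build_index(entries: List[Entry], precision: int) -> Dict[str, List[Entry]]:
    index: Dict[str, List[Entry]] = defaultdict(list)
    for e in entries:
        index[encode(e[1], e[2], precision)].append(e)
    return index


def nearby(index: Dict[str, List[Entry]], lat: float, lng: float,
           radius_km: float, precision: int) -> List[str]:
    # Assumes radius_km is smaller than the cell's shorter side.
    hits = []
    for cell in neighbors(encode(lat, lng, precision)):
        for pid, p_lat, p_lng in index.get(cell, []):
            d = haversine_km(lat, lng, p_lat, p_lng)
            if d <= radius_km:  # drop rectangle corners outside the circle
                hits.append((d, pid))
    return [pid for _, pid in sorted(hits)]


if __name__ == "__main__":
    print(encode(51.5, -0.0001, 5), encode(51.5, 0.0001, 5))  # gcpuz u10hb
    idx = build_index([("west", 51.5, -0.0001), ("east", 51.5, 0.0001)], precision=5)
    print(nearby(idx, 51.5, -0.0001, radius_km=0.1, precision=5))  # ['west', 'east']
```

A pure prefix match on "gcpuz" would miss the "east" point entirely; the neighbor lookup finds it, and the distance filter removes anything in the rectangular cells that lies outside the circle.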

💡 The choice between QuadTree and GeoHash typically depends on your specific use case.

  • QuadTrees excel at complex spatial queries and handling uneven distribution.
  • GeoHash is preferred for real-time updates and simple proximity searches.

In a system design interview, your interviewer is unlikely to ask you to implement a QuadTree or GeoHash. Focus on understanding these trade-offs, and on knowing when to use each, rather than on implementation details.

Building Blocks

  • Geocoding
    • Geocoding is the process (usually provided as a service) of translating an address or place name into geo coordinates. Well-known geocoding providers include the Google Geocoding API, Mapbox, and Amazon Location Service.
    • Why is this needed? When doing a location-based search, users rarely provide coordinates (e.g., "lat": 37.7749, "lng": -122.4194). They are more likely to search by an address or place name (e.g., find all Thai restaurants within 1 mile of Mission Street). So the backend must first translate the user-provided location into a geo point, and then run processing and database queries using that point.
  • SQL Database (Postgres + PostGIS extension)
    • PostGIS extends PostgreSQL with geographic objects and functions
    • Supports complex spatial queries, geometric operations, and indexing
    • Example: storing static delivery zones, service areas, or other complex geographic boundaries
  • NoSQL Database (MongoDB)
    • Native support for the GeoJSON format and 2dsphere indexes
    • Good for simple proximity queries and high write throughput
    • Example: storing user location data.
  • In-Memory (Redis)
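    • Redis supports geospatial indexing through the GEOADD and GEOSEARCH commands (GEORADIUS in older versions; it is deprecated as of Redis 6.2). Internally, Redis stores locations in a sorted set scored by a 52-bit GeoHash.
    • Extremely fast for simple radius queries, since the data lives in memory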
    • Example: Real-time proximity searches, caching frequent location queries
  • Elasticsearch + Geo Support
    • Supports two geo field types: geo_point (latitude/longitude points) and geo_shape (shapes such as polygons)
    • Efficient for large-scale text search combined with location filtering
    • Provides flexible relevance scoring based on distance (e.g., distance-decay functions)
    • Best for: Location-aware search, combining full-text search with proximity
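As an example of a "complex spatial operation", consider checking whether a customer's location falls inside a delivery zone polygon. In PostGIS this is typically a query using ST_Contains or ST_Covers on an indexed geometry column; the Python sketch below only illustrates the underlying idea with the classic ray-casting algorithm. `point_in_polygon` and the L-shaped zone are hypothetical, and the sketch treats (lng, lat) as flat coordinates, which is a reasonable approximation only for small zones that do not cross the antimeridian.

```python
from typing import List, Tuple

LngLat = Tuple[float, float]  # (lng, lat): x first, as in GeoJSON


def point_in_polygon(pt: LngLat, ring: List[LngLat]) -> bool:
    """Ray casting: shoot a ray from pt toward +x and count edge crossings.

    An odd count means pt is inside. Coordinates are treated as planar, and
    points exactly on an edge may be classified either way.
    """
    x, y = pt
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies on the ray, to the right of pt
                inside = not inside
    return inside


if __name__ == "__main__":
    # Hypothetical L-shaped (concave) delivery zone, in toy coordinates.
    zone = [(0, 0), (4, 0), (4, 4), (2, 4), (2, 2), (0, 2)]
    print(point_in_polygon((1, 1), zone))  # True
    print(point_in_polygon((3, 3), zone))  # True
    print(point_in_polygon((1, 3), zone))  # False (inside the notch)
```

Concave shapes like this one are exactly where a simple bounding-box or radius check gives wrong answers, which is why irregular zones are usually delegated to a spatial database.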

💡 Key Consideration for Tool Selection:

  1. Query Patterns
    • Simple radius search → Redis, MongoDB, or PostGIS
    • Complex spatial operations → PostGIS
    • Full-text + location search → Elasticsearch
  2. Scale Requirements
    • High write throughput → MongoDB or Redis
    • Complex queries at scale → Elasticsearch
    • Accurate spatial analysis → PostGIS
  3. Data Consistency Needs
    • Real-time updates → Redis
    • ACID compliance → PostgreSQL/PostGIS
    • Eventually consistent → MongoDB/Elasticsearch
