Design a Distributed Cache

viaLeetCode

Problem Design a distributed cache (Memcached/Redis-cluster-style) used by a fleet of microservices.

Functional requirements

get/put/delete with TTL; cluster client routing; optional atomic ops (incr, CAS). Cache-aside as the primary usage pattern.

Non-functional requirements

Sub-millisecond reads at high QPS; horizontal scale by adding nodes with minimal data movement; graceful behavior when nodes die — the cache must degrade, not cascade failures into dependents.

Key components

Partitioning: CONSISTENT HASHING with virtual nodes — adding/removing a node remaps only ~1/N keys; contrast naive mod-N (full reshuffle). Client-side routing vs proxy layer.
In-node engine: hashmap + LRU/LFU eviction (approximate LRU sampling à la Redis), TTL wheel/lazy expiry, memory limits.
Replication: async replica per shard for read scaling and failover; leader election / gossip for membership; CAP stance — caches choose AP: stale reads acceptable, availability paramount.
Invalidation: TTL as backstop; explicit delete-on-write from services; pub/sub invalidation broadcast; versioned keys as an alternative to deletes.

Deep dives / trade-offs

Failure containment: hot-key problem (local mini-cache, key splitting), thundering herd on miss (request coalescing / single-flight, jittered TTLs against synchronized expiry), cache stampede after node loss.
CIRCUIT BREAKER at the client: on cache-cluster degradation trip open and serve from source (or degrade features) instead of piling timeouts — half-open probes to recover; plus bulkheads/timeouts so a slow cache can't exhaust caller threads.
Consistency talk: read-your-writes gaps in cache-aside, write-through vs write-back trade-offs.

Add a follow-up question they asked

No follow-ups yet. Be the first to add one.

asked …