Design a Distributed Cache
viaLeetCode
Problem Design a distributed cache (Memcached/Redis-cluster-style) used by a fleet of microservices.
Functional requirements
- get/put/delete with TTL; cluster client routing; optional atomic ops (incr, CAS). Cache-aside as the primary usage pattern.
Non-functional requirements
- Sub-millisecond reads at high QPS; horizontal scale by adding nodes with minimal data movement; graceful behavior when nodes die — the cache must degrade, not cascade failures into dependents.
Key components
- Partitioning: CONSISTENT HASHING with virtual nodes — adding/removing a node remaps only ~1/N keys; contrast naive mod-N (full reshuffle). Client-side routing vs proxy layer.
- In-node engine: hashmap + LRU/LFU eviction (approximate LRU sampling à la Redis), TTL wheel/lazy expiry, memory limits.
- Replication: async replica per shard for read scaling and failover; leader election / gossip for membership; CAP stance — caches choose AP: stale reads acceptable, availability paramount.
- Invalidation: TTL as backstop; explicit delete-on-write from services; pub/sub invalidation broadcast; versioned keys as an alternative to deletes.
Deep dives / trade-offs
- Failure containment: hot-key problem (local mini-cache, key splitting), thundering herd on miss (request coalescing / single-flight, jittered TTLs against synchronized expiry), cache stampede after node loss.
- CIRCUIT BREAKER at the client: on cache-cluster degradation trip open and serve from source (or degrade features) instead of piling timeouts — half-open probes to recover; plus bulkheads/timeouts so a slow cache can't exhaust caller threads.
- Consistency talk: read-your-writes gaps in cache-aside, write-through vs write-back trade-offs.
asked …