Design a Rate Limiter
viaLeetCode
Problem Design a rate limiter restricting each client to N requests per window — first the algorithms, then the distributed deployment.
Functional requirements
- allow(clientId) verdict per request; per-client (and per-API/tier) configurable limits; informative rejections (429 + Retry-After headers).
Non-functional requirements
- Sub-millisecond decision latency on the request hot path; millions of clients; limiter failure must not take down traffic (fail-open vs fail-closed — discuss which and when).
Key components / algorithms
- Token bucket: refill at rate r up to burst B — allows bursts, O(1) state (tokens, lastRefill); the industry default.
- Leaky bucket: fixed outflow rate — smooths bursts into a queue; compare semantics.
- Fixed window counter: simple but 2x boundary spike; sliding window LOG: exact but O(N) memory per client; sliding window COUNTER: weighted interpolation of two fixed windows — the practical compromise.
- Placement: API gateway/middleware layer, with per-service overrides; client identification (API key, user id, IP with its NAT caveats).
Deep dives / trade-offs
- Distributed state: centralized Redis with atomic Lua scripts (correct but adds a network hop + hot-key risk) vs local in-memory buckets with async sync (fast, slightly over-admits) vs sticky routing per client; the consistency-vs-latency trade is the core senior discussion.
- Race conditions on read-modify-write (why INCR/Lua atomicity matters); Redis failure policy (fail-open with local fallback limits); multi-region clients hitting different regions.
- Response ergonomics (headers, jittered retry guidance) and tiered/dynamic limits under global overload (load shedding).
asked …