Design a Rate Limiter

viaLeetCode

Problem Design a rate limiter restricting each client to N requests per window — first the algorithms, then the distributed deployment.

Functional requirements

allow(clientId) verdict per request; per-client (and per-API/tier) configurable limits; informative rejections (429 + Retry-After headers).

Non-functional requirements

Sub-millisecond decision latency on the request hot path; millions of clients; limiter failure must not take down traffic (fail-open vs fail-closed — discuss which and when).

Key components / algorithms

Token bucket: refill at rate r up to burst B — allows bursts, O(1) state (tokens, lastRefill); the industry default.
Leaky bucket: fixed outflow rate — smooths bursts into a queue; compare semantics.
Fixed window counter: simple but 2x boundary spike; sliding window LOG: exact but O(N) memory per client; sliding window COUNTER: weighted interpolation of two fixed windows — the practical compromise.
Placement: API gateway/middleware layer, with per-service overrides; client identification (API key, user id, IP with its NAT caveats).

Deep dives / trade-offs

Distributed state: centralized Redis with atomic Lua scripts (correct but adds a network hop + hot-key risk) vs local in-memory buckets with async sync (fast, slightly over-admits) vs sticky routing per client; the consistency-vs-latency trade is the core senior discussion.
Race conditions on read-modify-write (why INCR/Lua atomicity matters); Redis failure policy (fail-open with local fallback limits); multi-region clients hitting different regions.
Response ergonomics (headers, jittered retry guidance) and tiered/dynamic limits under global overload (load shedding).

Add a follow-up question they asked

No follow-ups yet. Be the first to add one.

asked …