Design a highly scalable chatbot service
viaGlassdoor
Problem Design a highly scalable chatbot service that supports real-time, streamed conversations for a large number of concurrent users.
Functional requirements
- Real-time bidirectional messaging with streamed (token-by-token) responses.
- Per-user session and conversation-context management.
- Pluggable LLM/backend for generating responses.
Non-functional requirements
- Scale to a large number of concurrent connections; tolerate spiky load.
- Low latency to first token; high availability.
Key components
- WebSocket gateway for streaming, session service, context cache (Redis), LLM backend / inference pool, message queue, horizontal autoscaling behind a load balancer.
Deep dives / trade-offs
- Connection management: sticky sessions vs a shared session store; reconnection handling.
- Caching conversational context in Redis (TTL, size limits, truncation/summarisation of long histories).
- Horizontal scaling of stateful WebSocket nodes; backpressure when the backend is saturated.
- Handling spiky load: queueing, autoscaling signals, and graceful degradation.
asked …