
Design a highly scalable chatbot service

via Glassdoor

Problem

Design a highly scalable chatbot service that supports real-time, streamed conversations for a large number of concurrent users.

Functional requirements

  • Real-time bidirectional messaging with streamed (token-by-token) responses.
  • Per-user session and conversation-context management.
  • Pluggable LLM/backend for generating responses.
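One way to read the streaming and pluggable-backend requirements together is as a small interface that the gateway depends on, sketched below. Every name here (LLMBackend, EchoBackend, relay, send) is hypothetical and chosen for illustration. The echo backend stands in for a real model, and send stands in for whatever the WebSocket layer exposes.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Protocol


class LLMBackend(Protocol):
    """Hypothetical interface: any backend that can stream tokens for a prompt."""

    def stream(self, prompt: str) -> AsyncIterator[str]: ...


class EchoBackend:
    """Stand-in backend that 'generates' by echoing the prompt word by word."""

    async def stream(self, prompt: str) -> AsyncIterator[str]:
        for token in prompt.split():
            await asyncio.sleep(0)  # yield control, as a real model call would
            yield token


async def relay(
    backend: LLMBackend,
    prompt: str,
    send: Callable[[str], Awaitable[None]],
) -> int:
    """Forward each token to the client as soon as the backend produces it."""
    count = 0
    async for token in backend.stream(prompt):
        await send(token)
        count += 1
    return count


async def demo() -> list[str]:
    received: list[str] = []

    async def send(token: str) -> None:
        # Stands in for a WebSocket send call (assumption).
        received.append(token)

    await relay(EchoBackend(), "hello streamed world", send)
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because relay only depends on the stream method, swapping the echo backend for a real inference client would not change the gateway code, which is the point of keeping the backend pluggable.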

Non-functional requirements

  • Scale to a large number of concurrent connections; tolerate spiky load.
  • Low latency to first token; high availability.

Key components

  • WebSocket gateway for streaming.
  • Session service.
  • Context cache (Redis).
  • LLM backend / inference pool.
  • Message queue.
  • Horizontal autoscaling behind a load balancer.
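To make the context cache component more concrete, here is a minimal sketch of the behaviour it would need: a per-session history with an expiry time and a size budget. It uses an in-memory dictionary as a stand-in for Redis, which is an assumption made so the example runs without a server. The class name ContextCache, its parameters, and the whitespace word count used as a token proxy are all hypothetical.

```python
import time
from collections import deque


class ContextCache:
    """In-memory stand-in for a Redis-backed context cache (assumption)."""

    def __init__(self, max_tokens=50, ttl_seconds=1800.0, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # session_id -> (expiry time, history of (role, text) turns)
        self._sessions = {}

    def append(self, session_id, role, text):
        history = self.get(session_id)
        history.append((role, text))
        # Drop the oldest turns until under budget; always keep the newest turn.
        while len(history) > 1 and self._size(history) > self.max_tokens:
            history.popleft()
        # Writing refreshes the expiry, so active sessions stay cached.
        self._sessions[session_id] = (self._clock() + self.ttl_seconds, history)

    def get(self, session_id):
        entry = self._sessions.get(session_id)
        if entry is None or entry[0] <= self._clock():
            # Missing or expired: forget the session and start fresh.
            self._sessions.pop(session_id, None)
            return deque()
        return entry[1]

    @staticmethod
    def _size(history):
        # Whitespace word count as a crude proxy for model tokens.
        return sum(len(text.split()) for _, text in history)


if __name__ == "__main__":
    cache = ContextCache(max_tokens=5)
    cache.append("s1", "user", "one two three")
    cache.append("s1", "assistant", "four five")
    cache.append("s1", "user", "six")
    print(list(cache.get("s1")))
```

This sketch truncates by dropping the oldest turns. Summarising old turns instead would keep more information at the cost of an extra model call.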

Deep dives / trade-offs

  • Connection management: sticky sessions vs a shared session store; reconnection handling.
  • Caching conversational context in Redis (TTL, size limits, truncation/summarisation of long histories).
  • Horizontal scaling of stateful WebSocket nodes; backpressure when the backend is saturated.
  • Handling spiky load: queueing, autoscaling signals, and graceful degradation.
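The backpressure and graceful-degradation points above can be sketched as an admission controller placed in front of the inference pool. It caps the number of in-flight backend calls, lets a bounded number of requests wait, and sheds the rest with an explicit error. This is a single-process illustration under stated assumptions, not a full design: AdmissionController, Overloaded, and the two limits are hypothetical names, and a production system would likely combine this with a message queue and autoscaling signals.

```python
import asyncio


class Overloaded(Exception):
    """Raised when a request is shed instead of queued."""


class AdmissionController:
    def __init__(self, max_in_flight, max_waiting):
        self._slots = asyncio.Semaphore(max_in_flight)
        self._max_waiting = max_waiting
        self._waiting = 0

    async def run(self, make_call):
        # Shed load only when every slot is busy and the wait line is full.
        if self._slots.locked() and self._waiting >= self._max_waiting:
            raise Overloaded("backend saturated; retry later")
        self._waiting += 1
        try:
            await self._slots.acquire()
        finally:
            self._waiting -= 1
        try:
            return await make_call()
        finally:
            self._slots.release()


async def demo():
    controller = AdmissionController(max_in_flight=1, max_waiting=1)
    release = asyncio.Event()

    async def slow_call():
        # Stands in for a backend call that is slow under load.
        await release.wait()
        return "ok"

    first = asyncio.create_task(controller.run(slow_call))
    second = asyncio.create_task(controller.run(slow_call))
    await asyncio.sleep(0)  # let both tasks start: one runs, one waits

    try:
        third = await controller.run(slow_call)
    except Overloaded:
        third = "shed"  # the gateway could answer with a "busy" message here

    release.set()
    return [await first, await second, third]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Rejecting early keeps queueing delay bounded for the requests that are admitted. The waiting count is also a candidate autoscaling signal, since it grows before latency does.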