2dbi
Home/Palantir/Word Frequency / Streaming Aggregation
PPalantir·DSASWE-2Technical Phone Screen

Word Frequency / Streaming Aggregation

Problem

Given a large stream of log lines, return the top-k most frequent tokens.

Example

logs -> top 3 tokens by count

Constraints

  • Stream may not fit in memory

Approach

Hash map of counts + heap; discuss approximate counting (count-min sketch) if memory-bound.

added 6 days ago
LeadersAccount