Hotels with Most Preferred Words in Reviews
via LeetCode
Problem
- Given a list of preferred words and a list of hotels, each with one or more review texts, return the hotel id(s) whose reviews contain the most occurrences of preferred words.
Input / Output
- Input: string[] preferredWords; list of (hotelId, review text) pairs (a hotel may have many reviews).
- Output: the hotel id(s) with the maximum preferred-word count. Clarify tie handling up front; common choices are the smallest id or all tied ids.
Constraints
- Total review text up to ~10^6 chars; matching must be whole-word and case-insensitive (clarify punctuation stripping).
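A minimal sketch of whole-word, case-insensitive matching, assuming words are runs of ASCII letters and everything else (punctuation, digits, whitespace) acts as a separator. The helper name `tokenize` is hypothetical, and the treatment of apostrophes or non-English letters is exactly the kind of detail to clarify.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its maximal runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())

review = "Clean room, very quiet. Cleanliness was fine."
tokens = tokenize(review)

# Whole-word check: only the standalone token "clean" counts.
whole_word_hits = sum(1 for t in tokens if t == "clean")

# Substring check: also counts the "clean" inside "cleanliness".
substring_hits = review.lower().count("clean")

print(whole_word_hits, substring_hits)
```

Under this assumption a contraction such as "don't" splits into two tokens, which is harmless for this problem unless a preferred word itself contains an apostrophe.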
Example
- preferred = ["clean","quiet"], reviews: h1 "Clean room, very quiet.", h2 "quiet street" → h1 (2 vs 1).
Expected approach
- Put the preferred words (lowercased) in a hash set. For each review, tokenize on non-letters, lowercase the tokens, and count those present in the set, accumulating the count per hotel in a map; then take the max. This takes O(total tokens) time, i.e. linear in the review text.
- Talking points: word-boundary correctness ("cleanliness" shouldn't match "clean", which is why substring contains() is wrong), stemming as an extension, and top-k via a heap if a ranked list of several hotels is requested.
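The approach above can be sketched as follows. This is one possible reading, not a reference solution: it assumes ASCII-letter tokens, returns all tied ids in sorted order, and breaks top-k ties by smaller id. The function names `count_preferred`, `best_hotels`, and `top_k_hotels` are hypothetical.

```python
import heapq
import re
from collections import defaultdict

def count_preferred(preferred_words, reviews):
    """Map each hotel id to the number of preferred-word tokens in its reviews."""
    preferred = {w.lower() for w in preferred_words}
    counts = defaultdict(int)
    for hotel_id, text in reviews:
        counts[hotel_id] += 0  # register hotels even if they have no matches
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in preferred:
                counts[hotel_id] += 1
    return dict(counts)

def best_hotels(preferred_words, reviews):
    """Return every hotel id that reaches the maximum count (assumed tie rule)."""
    counts = count_preferred(preferred_words, reviews)
    if not counts:
        return []
    best = max(counts.values())
    return sorted(h for h, c in counts.items() if c == best)

def top_k_hotels(preferred_words, reviews, k):
    """Return the k highest-count hotel ids, ties broken by smaller id."""
    counts = count_preferred(preferred_words, reviews)
    # nsmallest on (-count, id) yields highest counts first without a full sort.
    return heapq.nsmallest(k, counts, key=lambda h: (-counts[h], h))

reviews = [("h1", "Clean room, very quiet."), ("h2", "quiet street")]
print(best_hotels(["clean", "quiet"], reviews))
```

`heapq.nsmallest` keeps the ranking step at roughly O(n log k) for n hotels, which only matters when k is much smaller than n; for a single winner the plain `max` is enough.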
asked …