Data Model for Hotel Reviews Dashboard
viaLeetCode
Problem Design the data model that consolidates hotel reviews from multiple sources and powers an analytics dashboard (scores over time, by hotel/region/source, review volumes, moderation stats).
Functional requirements
- Clarify the dashboard first: which metrics (avg rating, volume, sentiment), which dimensions (hotel, brand, region, source platform, time), what grain (daily) and freshness (daily batch vs streaming).
- Support drill-down from region → hotel and filtering by source and date range.
Non-functional requirements
- Analytical workload: wide scans and aggregations, not point lookups — a star schema in a warehouse, not 3NF OLTP.
- Volumes to discuss: millions of reviews, years of history; dashboard queries under a few seconds.
Key components
- Fact table fact_review (review_id, hotel_key, date_key, source_key, rating, sentiment_score, review_length, is_verified) at one-row-per-review grain, plus an aggregated daily fact for dashboard speed.
- Dimensions: dim_hotel (hotel → brand → region hierarchy, SCD Type 2 for attribute changes), dim_date, dim_source, dim_traveler_segment.
Deep dives / trade-offs
- Grain choice: raw fact enables new metrics later; pre-aggregated daily fact serves the dashboard — keep both, derive the rollup.
- Slowly changing dimensions: hotel rebrands/reclassification (Type 2) so history reports under the attributes at review time.
- Late/duplicate review feeds from sources: natural-key dedupe, load audit columns, incremental loads partitioned by date.
asked …