2dbi

Data Model for Hotel Reviews Dashboard

viaLeetCode

Problem Design the data model that consolidates hotel reviews from multiple sources and powers an analytics dashboard (scores over time, by hotel/region/source, review volumes, moderation stats).

Functional requirements

  • Clarify the dashboard first: which metrics (avg rating, volume, sentiment), which dimensions (hotel, brand, region, source platform, time), what grain (daily) and freshness (daily batch vs streaming).
  • Support drill-down from region → hotel and filtering by source and date range.

Non-functional requirements

  • Analytical workload: wide scans and aggregations, not point lookups — a star schema in a warehouse, not 3NF OLTP.
  • Volumes to discuss: millions of reviews, years of history; dashboard queries under a few seconds.

Key components

  • Fact table fact_review (review_id, hotel_key, date_key, source_key, rating, sentiment_score, review_length, is_verified) at one-row-per-review grain, plus an aggregated daily fact for dashboard speed.
  • Dimensions: dim_hotel (hotel → brand → region hierarchy, SCD Type 2 for attribute changes), dim_date, dim_source, dim_traveler_segment.

Deep dives / trade-offs

  • Grain choice: raw fact enables new metrics later; pre-aggregated daily fact serves the dashboard — keep both, derive the rollup.
  • Slowly changing dimensions: hotel rebrands/reclassification (Type 2) so history reports under the attributes at review time.
  • Late/duplicate review feeds from sources: natural-key dedupe, load audit columns, incremental loads partitioned by date.
Add a follow-up question they asked
No follow-ups yet. Be the first to add one.
asked …
LeaderboardSalary
Language
Account