DATA MODELING
🧱 DATA MODELING – SENIOR DATA ENGINEER VIEW
1️⃣ Types of Data Models (They WILL ask)
| Model | Purpose | When to Use |
|---|---|---|
| Conceptual | Business view | Early discussions |
| Logical | Entities & relationships | Platform-agnostic |
| Physical | Tables, indexes | Implementation |
🗣 Say:
“I usually start with logical modeling and then optimize physically based on query patterns.”
2️⃣ OLTP vs OLAP (Very Important)
| OLTP | OLAP |
|---|---|
| Normalized (3NF) | Denormalized |
| Frequent inserts/updates | Read-heavy |
| Small transactions | Large scans |
| App-driven | Analytics-driven |
🗣 Senior answer:
“For analytics, I prefer dimensional models to reduce join complexity.”
3️⃣ Dimensional Modeling (⭐ MUST KNOW)
🔹 Star Schema (Most Preferred)
Components
Fact Table → Measures (sales, amount)
Dimension Tables → Context (customer, product, date)
FACT_SALES
-----------
date_key
customer_key
product_key
amount
quantity
🗣 Say:
“Star schema improves query performance and is BI-friendly.”
🔹 Snowflake Schema
🗣 Say:
“Snowflake reduces redundancy but increases joins — I use it only when dimensions are large.”
4️⃣ Fact Table Types (Interview Favorite)
| Fact Type | Example |
|---|---|
| Transaction Fact | Each sale |
| Snapshot Fact | Daily account balance |
| Accumulating Snapshot | Order lifecycle |
🗣 Say:
“Accumulating snapshots are useful for SLA and process tracking.”
5️⃣ Slowly Changing Dimensions (🔥 MUST)
🔹 SCD Types Summary
| Type | Description |
|---|---|
| Type 0 | No change |
| Type 1 | Overwrite |
| Type 2 | History |
| Type 3 | Limited history |
🔹 SCD Type 2 Example
customer_key | name | city | start_date | end_date | is_current
🗣 Senior line:
“SCD2 enables point-in-time analysis and auditability.”
6️⃣ Keys in Data Modeling
| Key | Purpose |
|---|---|
| Natural Key | Business ID |
| Surrogate Key | Warehouse ID |
| Composite Key | Multi-column |
🗣 Say:
“I always use surrogate keys in dimensions for stability.”
7️⃣ Handling Time in Models (Important)
🔹 Date Dimension
date_key | date | year | quarter | month | day | is_weekend
🗣 Say:
“Avoids date functions at query time and improves performance.”
8️⃣ Data Modeling for Azure / Modern DW
🔹 Bronze–Silver–Gold Layers
| Layer | Purpose |
|---|---|
| Bronze | Raw data |
| Silver | Cleaned, conformed |
| Gold | Business models |
🗣 Say:
“Gold layer typically follows dimensional modeling.”
9️⃣ Common Data Modeling Interview Questions
❓ Q1: When do you denormalize?
“When query performance and simplicity outweigh storage cost.”
❓ Q2: How do you model many-to-many?
“Using a bridge table.”
❓ Q3: How do you handle late-arriving facts?
“Load facts with unknown dimension keys and update later.”
❓ Q4: How do you model deletes?
“Soft deletes using flags or end_date.”
10️⃣ Performance-Oriented Modeling (Senior Signal 🚀)
✔ Partition fact tables by date
✔ Index surrogate keys
✔ Minimize fact table width
✔ Avoid snowflake unless needed
🗣 Say:
“Physical design depends on query access patterns.”
11️⃣ How to Answer Modeling Questions in Interview
Use this structure:
Business requirement
Grain definition
Fact & dimensions
SCD strategy
Performance & scalability
💬 Example:
“First I define the grain as one row per transaction, then identify dimensions, then decide SCD strategy based on reporting needs.”
🔥 5-Minute MOCK QUESTION (Try This)
Question:
Design a data model for e-commerce orders.
Expected Answer Outline:
Fact_Order
Dim_Customer (SCD2)
Dim_Product
Dim_Date
Accumulating snapshot for order lifecycle
Comments
Post a Comment