DATA MODELING

 

🧱 DATA MODELING – SENIOR DATA ENGINEER VIEW


1️⃣ Types of Data Models (They WILL ask)

ModelPurposeWhen to Use
ConceptualBusiness viewEarly discussions
LogicalEntities & relationshipsPlatform-agnostic
PhysicalTables, indexesImplementation

🗣 Say:

“I usually start with logical modeling and then optimize physically based on query patterns.”


2️⃣ OLTP vs OLAP (Very Important)

OLTPOLAP
Normalized (3NF)Denormalized
Frequent inserts/updatesRead-heavy
Small transactionsLarge scans
App-drivenAnalytics-driven

🗣 Senior answer:

“For analytics, I prefer dimensional models to reduce join complexity.”


3️⃣ Dimensional Modeling (⭐ MUST KNOW)

🔹 Star Schema (Most Preferred)

Components

  • Fact Table → Measures (sales, amount)

  • Dimension Tables → Context (customer, product, date)

FACT_SALES
-----------
date_key
customer_key
product_key
amount
quantity

🗣 Say:

“Star schema improves query performance and is BI-friendly.”


🔹 Snowflake Schema

🗣 Say:

“Snowflake reduces redundancy but increases joins — I use it only when dimensions are large.”


4️⃣ Fact Table Types (Interview Favorite)

Fact TypeExample
Transaction FactEach sale
Snapshot FactDaily account balance
Accumulating SnapshotOrder lifecycle

🗣 Say:

“Accumulating snapshots are useful for SLA and process tracking.”


5️⃣ Slowly Changing Dimensions (🔥 MUST)

🔹 SCD Types Summary

TypeDescription
Type 0No change
Type 1Overwrite
Type 2History
Type 3Limited history

🔹 SCD Type 2 Example

customer_key | name | city | start_date | end_date | is_current

🗣 Senior line:

“SCD2 enables point-in-time analysis and auditability.”


6️⃣ Keys in Data Modeling

KeyPurpose
Natural KeyBusiness ID
Surrogate KeyWarehouse ID
Composite KeyMulti-column

🗣 Say:

“I always use surrogate keys in dimensions for stability.”


7️⃣ Handling Time in Models (Important)

🔹 Date Dimension

date_key | date | year | quarter | month | day | is_weekend

🗣 Say:

“Avoids date functions at query time and improves performance.”


8️⃣ Data Modeling for Azure / Modern DW

🔹 Bronze–Silver–Gold Layers

LayerPurpose
BronzeRaw data
SilverCleaned, conformed
GoldBusiness models

🗣 Say:

“Gold layer typically follows dimensional modeling.”


9️⃣ Common Data Modeling Interview Questions

❓ Q1: When do you denormalize?

“When query performance and simplicity outweigh storage cost.”

❓ Q2: How do you model many-to-many?

“Using a bridge table.”

❓ Q3: How do you handle late-arriving facts?

“Load facts with unknown dimension keys and update later.”

❓ Q4: How do you model deletes?

“Soft deletes using flags or end_date.”


10️⃣ Performance-Oriented Modeling (Senior Signal 🚀)

✔ Partition fact tables by date
✔ Index surrogate keys
✔ Minimize fact table width
✔ Avoid snowflake unless needed

🗣 Say:

“Physical design depends on query access patterns.”


11️⃣ How to Answer Modeling Questions in Interview

Use this structure:

  1. Business requirement

  2. Grain definition

  3. Fact & dimensions

  4. SCD strategy

  5. Performance & scalability

💬 Example:

“First I define the grain as one row per transaction, then identify dimensions, then decide SCD strategy based on reporting needs.”


🔥 5-Minute MOCK QUESTION (Try This)

Question:
Design a data model for e-commerce orders.

Expected Answer Outline:

  • Fact_Order

  • Dim_Customer (SCD2)

  • Dim_Product

  • Dim_Date

  • Accumulating snapshot for order lifecycle

Comments

Popular posts from this blog

SyBase Database Migration to SQL Server

Basics of US Healthcare -Medical Billing