SCD TYPE 2 – INTERVIEW QUESTIONS + MERGE CODE
Below is a COMPLETE, interview-ready guide to SCD Type 2 using MERGE, covering ALL POSSIBLE QUESTIONS (theory + edge cases) AND production-grade SQL code.
This is curated for senior data engineer / Azure / Databricks / SQL Server interviews.
🔹 SCD TYPE 2 – INTERVIEW QUESTIONS + MERGE CODE
1️⃣ What is SCD Type 2?
Answer:
SCD Type 2 maintains full historical changes by:
-
Expiring old records
-
Inserting a new row for every change
Typical columns:
-
effective_start_date -
effective_end_date -
is_current -
version(optional)
2️⃣ SCD Type 2 Table Design (Interview MUST)
🎯 Dimension Table
CREATE TABLE dim_customer (
customer_sk INT IDENTITY(1,1),
customer_id INT,
name VARCHAR(100),
city VARCHAR(100),
effective_start_date DATE,
effective_end_date DATE,
is_current CHAR(1)
);
🎯 Staging Table
CREATE TABLE stg_customer (
customer_id INT,
name VARCHAR(100),
city VARCHAR(100),
load_date DATE
);
3️⃣ BASIC SCD TYPE 2 MERGE (MOST IMPORTANT)
MERGE dim_customer AS tgt
USING stg_customer AS src
ON tgt.customer_id = src.customer_id
AND tgt.is_current = 'Y'
WHEN MATCHED
AND (
tgt.name <> src.name
OR tgt.city <> src.city
)
THEN UPDATE SET
tgt.effective_end_date = DATEADD(day, -1, src.load_date),
tgt.is_current = 'N'
WHEN NOT MATCHED BY TARGET
THEN INSERT (
customer_id, name, city,
effective_start_date, effective_end_date, is_current
)
VALUES (
src.customer_id, src.name, src.city,
src.load_date, '9999-12-31', 'Y'
);
4️⃣ What happens internally?
| Scenario | Action |
|---|---|
| New customer | INSERT |
| Attribute change | UPDATE old row + INSERT new |
| No change | NO ACTION |
5️⃣ How do you INSERT the new version after UPDATE?
🔥 Classic interview trick question
Correct approach: 2-step MERGE
-- Step 1: Expire old records
MERGE dim_customer tgt
USING stg_customer src
ON tgt.customer_id = src.customer_id
AND tgt.is_current = 'Y'
WHEN MATCHED
AND (
tgt.name <> src.name
OR tgt.city <> src.city
)
THEN UPDATE SET
tgt.effective_end_date = DATEADD(day, -1, src.load_date),
tgt.is_current = 'N';
-- Step 2: Insert new records
INSERT INTO dim_customer (
customer_id, name, city,
effective_start_date, effective_end_date, is_current
)
SELECT
s.customer_id, s.name, s.city,
s.load_date, '9999-12-31', 'Y'
FROM stg_customer s
LEFT JOIN dim_customer d
ON s.customer_id = d.customer_id
AND d.is_current = 'Y'
WHERE d.customer_id IS NULL
OR (
d.name <> s.name
OR d.city <> s.city
);
6️⃣ How do you detect changes efficiently?
Answer: Hash comparison
HASHBYTES('SHA2_256',
CONCAT(name, '|', city))
Used in large tables to avoid column-by-column comparison.
7️⃣ SCD Type 2 Using HASH (Enterprise Pattern)
MERGE dim_customer tgt
USING (
SELECT *,
HASHBYTES('SHA2_256',
CONCAT(name, '|', city)) AS hash_val
FROM stg_customer
) src
ON tgt.customer_id = src.customer_id
AND tgt.is_current = 'Y'
WHEN MATCHED
AND tgt.hash_val <> src.hash_val
THEN UPDATE SET
tgt.effective_end_date = src.load_date - 1,
tgt.is_current = 'N'
WHEN NOT MATCHED BY TARGET
THEN INSERT (
customer_id, name, city,
effective_start_date, effective_end_date, is_current, hash_val
)
VALUES (
src.customer_id, src.name, src.city,
src.load_date, '9999-12-31', 'Y', src.hash_val
);
8️⃣ How do you handle DELETE in SCD Type 2?
WHEN NOT MATCHED BY SOURCE
AND tgt.is_current = 'Y'
THEN UPDATE SET
tgt.effective_end_date = GETDATE(),
tgt.is_current = 'N';
9️⃣ How do you handle late-arriving data?
effective_start_date = src.load_date
effective_end_date = LEAD(src.load_date) OVER (...)
Or back-dated insert with date comparison.
🔟 How do you prevent duplicates?
CREATE UNIQUE INDEX ux_dim_customer
ON dim_customer(customer_id, is_current)
WHERE is_current = 'Y';
1️⃣1️⃣ How do you validate SCD Type 2?
-- More than one current record (ERROR)
SELECT customer_id, COUNT(*)
FROM dim_customer
WHERE is_current = 'Y'
GROUP BY customer_id
HAVING COUNT(*) > 1;
1️⃣2️⃣ SCD Type 2 in Databricks (Delta MERGE)
MERGE INTO dim_customer tgt
USING stg_customer src
ON tgt.customer_id = src.customer_id
AND tgt.is_current = true
WHEN MATCHED
AND tgt.hash_val <> src.hash_val
THEN UPDATE SET
tgt.is_current = false,
tgt.effective_end_date = current_date()
WHEN NOT MATCHED
THEN INSERT *
1️⃣3️⃣ COMMON INTERVIEW QUESTIONS (RAPID FIRE)
| Question | Expected Answer |
|---|---|
| Difference SCD1 vs SCD2 | Overwrite vs History |
| Why MERGE is risky? | Concurrency issues |
| MERGE vs INSERT+UPDATE | Control & audit |
| Surrogate key purpose | History uniqueness |
Why 9999-12-31 | Open-ended record |
| How to handle nulls | ISNULL() |
| How to rollback | Transactions |
| SCD Type 2 performance | Hash + indexes |
🔥 FINAL INTERVIEW STATEMENT
“In production, I prefer a two-step SCD Type 2 using hash comparison, unique indexes on current records, and Delta MERGE for ACID compliance.”
Comments
Post a Comment