SCD TYPE 2 – INTERVIEW QUESTIONS + MERGE CODE

 Below is a COMPLETE, interview-ready guide to SCD Type 2 using MERGE, covering ALL POSSIBLE QUESTIONS (theory + edge cases) AND production-grade SQL code.

This is curated for senior data engineer / Azure / Databricks / SQL Server interviews.


🔹 SCD TYPE 2 – INTERVIEW QUESTIONS + MERGE CODE


1️⃣ What is SCD Type 2?

Answer:
SCD Type 2 maintains full historical changes by:

  • Expiring old records

  • Inserting a new row for every change

Typical columns:

  • effective_start_date

  • effective_end_date

  • is_current

  • version (optional)


2️⃣ SCD Type 2 Table Design (Interview MUST)

🎯 Dimension Table

CREATE TABLE dim_customer (
customer_sk INT IDENTITY(1,1),
customer_id INT,
name VARCHAR(100),
city VARCHAR(100),
effective_start_date DATE,
effective_end_date DATE,
is_current CHAR(1)
);

🎯 Staging Table

CREATE TABLE stg_customer (
customer_id INT,
name VARCHAR(100),
city VARCHAR(100),
load_date DATE
);

3️⃣ BASIC SCD TYPE 2 MERGE (MOST IMPORTANT)

MERGE dim_customer AS tgt
USING stg_customer AS src
ON tgt.customer_id = src.customer_id
AND tgt.is_current = 'Y'

WHEN MATCHED
AND (
tgt.name <> src.name
OR tgt.city <> src.city
)
THEN UPDATE SET
tgt.effective_end_date = DATEADD(day, -1, src.load_date),
tgt.is_current = 'N'

WHEN NOT MATCHED BY TARGET
THEN INSERT (
customer_id, name, city,
effective_start_date, effective_end_date, is_current
)
VALUES (
src.customer_id, src.name, src.city,
src.load_date, '9999-12-31', 'Y'
);

4️⃣ What happens internally?

ScenarioAction
New customerINSERT
Attribute changeUPDATE old row + INSERT new
No changeNO ACTION

5️⃣ How do you INSERT the new version after UPDATE?

🔥 Classic interview trick question

Correct approach: 2-step MERGE

-- Step 1: Expire old records
MERGE dim_customer tgt
USING stg_customer src
ON tgt.customer_id = src.customer_id
AND tgt.is_current = 'Y'
WHEN MATCHED
AND (
tgt.name <> src.name
OR tgt.city <> src.city
)
THEN UPDATE SET
tgt.effective_end_date = DATEADD(day, -1, src.load_date),
tgt.is_current = 'N';

-- Step 2: Insert new records
INSERT INTO dim_customer (
customer_id, name, city,
effective_start_date, effective_end_date, is_current
)
SELECT
s.customer_id, s.name, s.city,
s.load_date, '9999-12-31', 'Y'
FROM stg_customer s
LEFT JOIN dim_customer d
ON s.customer_id = d.customer_id
AND d.is_current = 'Y'
WHERE d.customer_id IS NULL
OR (
d.name <> s.name
OR d.city <> s.city
);

6️⃣ How do you detect changes efficiently?

Answer: Hash comparison

HASHBYTES('SHA2_256',
CONCAT(name, '|', city))

Used in large tables to avoid column-by-column comparison.


7️⃣ SCD Type 2 Using HASH (Enterprise Pattern)

MERGE dim_customer tgt
USING (
SELECT *,
HASHBYTES('SHA2_256',
CONCAT(name, '|', city)) AS hash_val
FROM stg_customer
) src
ON tgt.customer_id = src.customer_id
AND tgt.is_current = 'Y'

WHEN MATCHED
AND tgt.hash_val <> src.hash_val
THEN UPDATE SET
tgt.effective_end_date = src.load_date - 1,
tgt.is_current = 'N'

WHEN NOT MATCHED BY TARGET
THEN INSERT (
customer_id, name, city,
effective_start_date, effective_end_date, is_current, hash_val
)
VALUES (
src.customer_id, src.name, src.city,
src.load_date, '9999-12-31', 'Y', src.hash_val
);

8️⃣ How do you handle DELETE in SCD Type 2?

WHEN NOT MATCHED BY SOURCE
AND tgt.is_current = 'Y'
THEN UPDATE SET
tgt.effective_end_date = GETDATE(),
tgt.is_current = 'N';

9️⃣ How do you handle late-arriving data?

effective_start_date = src.load_date
effective_end_date = LEAD(src.load_date) OVER (...)

Or back-dated insert with date comparison.


🔟 How do you prevent duplicates?

CREATE UNIQUE INDEX ux_dim_customer
ON dim_customer(customer_id, is_current)
WHERE is_current = 'Y';

1️⃣1️⃣ How do you validate SCD Type 2?

-- More than one current record (ERROR)
SELECT customer_id, COUNT(*)
FROM dim_customer
WHERE is_current = 'Y'
GROUP BY customer_id
HAVING COUNT(*) > 1;

1️⃣2️⃣ SCD Type 2 in Databricks (Delta MERGE)

MERGE INTO dim_customer tgt
USING stg_customer src
ON tgt.customer_id = src.customer_id
AND tgt.is_current = true

WHEN MATCHED
AND tgt.hash_val <> src.hash_val
THEN UPDATE SET
tgt.is_current = false,
tgt.effective_end_date = current_date()

WHEN NOT MATCHED
THEN INSERT *

1️⃣3️⃣ COMMON INTERVIEW QUESTIONS (RAPID FIRE)

QuestionExpected Answer
Difference SCD1 vs SCD2Overwrite vs History
Why MERGE is risky?Concurrency issues
MERGE vs INSERT+UPDATEControl & audit
Surrogate key purposeHistory uniqueness
Why 9999-12-31Open-ended record
How to handle nullsISNULL()
How to rollbackTransactions
SCD Type 2 performanceHash + indexes

🔥 FINAL INTERVIEW STATEMENT

“In production, I prefer a two-step SCD Type 2 using hash comparison, unique indexes on current records, and Delta MERGE for ACID compliance.”

Comments

Popular posts from this blog

SyBase Database Migration to SQL Server

Basics of US Healthcare -Medical Billing