DE Interview: Pivot / Unpivot

 

A complete interview-ready guide for
🔹 PATTERN 13: Pivot / Unpivot
🎯 Core skill: Transform rows ↔ columns

Designed for engineers with 6–8 years of experience, with real interview questions and solutions in:

  • ✅ SQL Basic

  • ✅ SQL Advanced

  • ✅ PySpark Basic

  • ✅ PySpark Advanced / Optimized


🔹 Assumed Interview Dataset

sales

order_id | order_date | region | product | metric   | value
------------------------------------------------------------
1        | 2024-01-01 | East   | Laptop  | revenue  | 1000
2        | 2024-01-01 | East   | Laptop  | quantity | 2
3        | 2024-01-01 | West   | Mobile  | revenue  | 800

1️⃣ Pivot Product Rows into Columns (Total Revenue)

🟢 SQL – Basic (CASE WHEN)

SELECT
    region,
    SUM(CASE WHEN product = 'Laptop' THEN value END) AS laptop_revenue,
    SUM(CASE WHEN product = 'Mobile' THEN value END) AS mobile_revenue
FROM sales
WHERE metric = 'revenue'
GROUP BY region;

📌 Most common interview pivot


🔵 SQL – Advanced (PIVOT)

SELECT *
FROM (
    SELECT region, product, value
    FROM sales
    WHERE metric = 'revenue'
) src
PIVOT (
    SUM(value) FOR product IN ('Laptop', 'Mobile')
);

🟢 PySpark – Basic

from pyspark.sql import functions as F

df = (sales_df
      .filter("metric = 'revenue'")
      .groupBy("region")
      .pivot("product")
      .sum("value"))

🔵 PySpark – Advanced (Null-safe + rename)

df = (sales_df
      .filter(F.col("metric") == "revenue")
      .groupBy("region")
      .pivot("product", ["Laptop", "Mobile"])
      .agg(F.sum("value"))
      .fillna(0)
      .withColumnRenamed("Laptop", "laptop_revenue")
      .withColumnRenamed("Mobile", "mobile_revenue"))
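With the three sample rows above, this null-safe pivot produces one row per region, with missing region/product combinations filled as 0:

region | laptop_revenue | mobile_revenue
-----------------------------------------
East   | 1000           | 0
West   | 0              | 800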

2️⃣ Pivot Multiple Metrics (Revenue & Quantity)

🟢 SQL – Basic

SELECT
    region,
    product,
    SUM(CASE WHEN metric = 'revenue' THEN value END) AS revenue,
    SUM(CASE WHEN metric = 'quantity' THEN value END) AS quantity
FROM sales
GROUP BY region, product;

🔵 SQL – Advanced (Double Pivot Pattern)

SELECT *
FROM (
    SELECT region, product, metric, value
    FROM sales
) src
PIVOT (
    SUM(value) FOR metric IN ('revenue', 'quantity')
);

📌 Very common BI interview question


🟢 PySpark – Basic

df = (sales_df
      .groupBy("region", "product")
      .pivot("metric")
      .sum("value"))

🔵 PySpark – Advanced

df = (sales_df
      .groupBy("region", "product")
      .pivot("metric", ["revenue", "quantity"])
      .agg(F.sum("value")))

3️⃣ Monthly Pivot (Rows → Month Columns)

🟢 SQL – Basic

SELECT
    product,
    SUM(CASE WHEN MONTH(order_date) = 1 THEN value END) AS jan,
    SUM(CASE WHEN MONTH(order_date) = 2 THEN value END) AS feb
FROM sales
WHERE metric = 'revenue'
GROUP BY product;

🔵 SQL – Advanced

-- SQL Server syntax: FORMAT() for month labels, bracketed names in the PIVOT IN list
SELECT *
FROM (
    SELECT product, FORMAT(order_date, 'MMM') AS month, value
    FROM sales
    WHERE metric = 'revenue'
) src
PIVOT (
    SUM(value) FOR month IN ([Jan], [Feb], [Mar])
);

🟢 PySpark – Basic

df = (sales_df
      .filter("metric = 'revenue'")
      .withColumn("month", F.month("order_date"))
      .groupBy("product")
      .pivot("month")
      .sum("value"))

🔵 PySpark – Advanced

df = (sales_df
      .filter("metric = 'revenue'")
      .withColumn("month", F.date_format("order_date", "MMM"))
      .groupBy("product")
      .pivot("month")
      .agg(F.sum("value")))

4️⃣ Unpivot Columns Back to Rows (Very Common!)

🟢 SQL – Basic (UNION ALL)

SELECT product, 'Jan' AS month, jan AS revenue FROM sales_summary
UNION ALL
SELECT product, 'Feb', feb FROM sales_summary;

🔵 SQL – Advanced (UNPIVOT)

SELECT product, month, revenue
FROM sales_summary
UNPIVOT (
    revenue FOR month IN (jan, feb, mar)
);

📌 Senior-level question


🟢 PySpark – Basic

df = sales_summary_df.selectExpr(
    "product",
    "stack(3, 'Jan', jan, 'Feb', feb, 'Mar', mar) as (month, revenue)"
)
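Here stack(3, ...) emits three (month, revenue) rows per input row, pairing each literal month label with the value of the corresponding column.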

🔵 PySpark – Advanced (Explode)

df = (sales_summary_df
      .select(
          "product",
          F.explode(
              F.array(
                  # All structs must share the same field names so array() resolves to one type
                  F.struct(F.lit("Jan").alias("month"), F.col("jan").alias("revenue")),
                  F.struct(F.lit("Feb").alias("month"), F.col("feb").alias("revenue")),
                  F.struct(F.lit("Mar").alias("month"), F.col("mar").alias("revenue"))
              )
          ).alias("kv")
      )
      .select("product", "kv.*"))

5️⃣ Dynamic Pivot (Unknown Values) – Tricky Interview Question

🔵 SQL – Conceptual

Standard SQL cannot pivot on values that are unknown when the query is written; the column list must be generated at runtime with dynamic SQL and then executed, as sketched below.
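A minimal sketch of how this looks in T-SQL (assuming SQL Server 2017+ for STRING_AGG; other engines need their own procedural dialect):

-- Hypothetical T-SQL sketch: build the pivot column list at runtime
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- 1) Collect the distinct product values as a bracketed column list
SELECT @cols = STRING_AGG(QUOTENAME(product), ', ')
FROM (SELECT DISTINCT product FROM sales WHERE metric = 'revenue') AS p;

-- 2) Inject the column list into a PIVOT statement and execute it
SET @sql = N'
SELECT region, ' + @cols + N'
FROM (SELECT region, product, value FROM sales WHERE metric = ''revenue'') src
PIVOT (SUM(value) FOR product IN (' + @cols + N')) AS pvt;';

EXEC sp_executesql @sql;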


🔵 PySpark – Advanced (Dynamic)

pivot_cols = [row[0] for row in sales_df.select("product").distinct().collect()]

df = (sales_df
      .filter("metric = 'revenue'")
      .groupBy("region")
      .pivot("product", pivot_cols)
      .sum("value"))
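Collecting the distinct values costs an extra job, but passing them to pivot() explicitly spares Spark from computing them itself and keeps the output column order deterministic.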

📌 Very strong Spark interview answer


🔥 Interview Traps & Talking Points

Topic          | Interview Insight
---------------|----------------------------
CASE vs PIVOT  | CASE works everywhere
Dynamic pivot  | Easy in Spark, hard in SQL
NULL handling  | Always COALESCE / fillna
Performance    | Pivot = shuffle-heavy
Wide tables    | Bad for analytics
Unpivot        | Needed before ML/BI
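A quick illustration of the NULL-handling point, reusing the CASE-based pivot from question 1 (the PySpark equivalent is the fillna(0) shown earlier):

SELECT
    region,
    COALESCE(SUM(CASE WHEN product = 'Laptop' THEN value END), 0) AS laptop_revenue,
    COALESCE(SUM(CASE WHEN product = 'Mobile' THEN value END), 0) AS mobile_revenue
FROM sales
WHERE metric = 'revenue'
GROUP BY region;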

🧠 When to Pivot vs Not Pivot

  • ❌ Avoid pivot in raw / bronze layers

  • ✅ Use pivot in gold / reporting layers

  • ✅ Unpivot before ML / analytics
