
DE Interview: Pivot / Unpivot

A complete interview-ready guide for šŸ”¹ PATTERN 13: Pivot / Unpivot. šŸŽÆ Core skill: transform rows ↔ columns. Designed for 6–8 years of experience, with real interview questions + solutions in:
✅ SQL Basic
✅ SQL Advanced
✅ PySpark Basic
✅ PySpark Advanced / Optimized

šŸ”¹ Assumed Interview Dataset: sales

order_id | order_date | region | product | metric   | value
------------------------------------------------------------
1        | 2024-01-01 | East   | Laptop  | revenue  | 1000
2        | 2024-01-01 | East   | Laptop  | quantity | 2
3        | 2024-01-01 | West   | Mobile  | revenue  | 800

1️⃣ Pivot Product Rows into Columns (Total Revenue)

🟢 SQL – Basic (CASE WHEN)

SELECT
    region,
    SUM(CASE WHEN product = 'Laptop' THEN value END) AS laptop_revenue,
    SUM(CASE WHEN product = 'Mobile' THEN value END) AS mobile_revenue
FROM sales
WHERE metric = 'revenue'
GROUP BY region;

šŸ“Œ Most common interview pivot...
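The preview cuts off before the PySpark solution, so here is a minimal PySpark sketch of the same pivot. The DataFrame name sales_df and the inline sample rows are assumptions for illustration, built from the dataset shown above:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical DataFrame matching the assumed interview dataset
sales_df = spark.createDataFrame(
    [(1, "2024-01-01", "East", "Laptop", "revenue", 1000),
     (2, "2024-01-01", "East", "Laptop", "quantity", 2),
     (3, "2024-01-01", "West", "Mobile", "revenue", 800)],
    ["order_id", "order_date", "region", "product", "metric", "value"],
)

# Keep only revenue rows, then pivot product values into columns
pivoted = (
    sales_df
    .filter(F.col("metric") == "revenue")
    .groupBy("region")
    .pivot("product", ["Laptop", "Mobile"])  # listing expected values avoids an extra pass to discover them
    .sum("value")
)
pivoted.show()

Passing the explicit value list to pivot() is the usual optimization talking point: without it Spark must first scan the data to collect distinct products.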

DE Interview: Ranking & Ties

šŸ”¹ PATTERN 3: Ranking & Ties

šŸ“˜ Assumed Table: employees
emp_id | emp_name | department | salary | join_date

1️⃣ Basic Ranking (No Partition)

✅ SQL – Basic

SELECT
    emp_id,
    emp_name,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
    RANK()       OVER (ORDER BY salary DESC) AS rank_val,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank_val
FROM employees;

šŸ“Œ Interview must-know
ROW_NUMBER → unique sequence
RANK → gaps after ties
DENSE_RANK → no gaps

✅ PySpark – Basic

from pyspark.sql import functions as F
from pyspark.sql.window import Window

w = Window.orderBy(F.col("salary").desc())

df = employees_df.select(
    "*",
    F.row_number().over(w).alias("row_num"),
    F.rank().over(w).alias("rank_val"),
    F.dense_rank().over(w).alias("dense_rank_val")
)

2️⃣ Ranking Within Groups (Department-wise)

✅ SQL – Basic
SELECT emp_i...
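The department-wise section is cut off in the preview; a minimal PySpark sketch of that idea, assuming the same employees_df as above, simply partitions the window by department:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Rank employees within each department by salary (assumes employees_df exists)
w_dept = Window.partitionBy("department").orderBy(F.col("salary").desc())

dept_ranked = employees_df.select(
    "*",
    F.dense_rank().over(w_dept).alias("dept_salary_rank"),
)

# Typical follow-up: top-3 earners per department
top3_per_dept = dept_ranked.filter(F.col("dept_salary_rank") <= 3)

Whether to use row_number, rank, or dense_rank here depends on how the interviewer wants ties handled, exactly as in the must-know summary above.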

DE Interview: Self-Join (Hierarchy)

A complete, senior-level (6–8 yrs) coverage of šŸ”¹ PATTERN: Self-Join (Hierarchy). šŸŽÆ Core skill: joining a table to itself.

šŸ“˜ Assumed Table: employees
emp_id | emp_name | manager_id | department | salary
manager_id → references emp_id
CEO has manager_id = NULL

1️⃣ Employee → Manager Mapping

✅ SQL – Basic

SELECT
    e.emp_id,
    e.emp_name,
    m.emp_name AS manager_name
FROM employees e
LEFT JOIN employees m
    ON e.manager_id = m.emp_id;

šŸ“Œ Basic self-join concept

✅ SQL – Advanced (Include CEO)

SELECT
    e.emp_id,
    e.emp_name,
    COALESCE(m.emp_name, 'CEO') AS manager_name
FROM employees e
LEFT JOIN employees m
    ON e.manager_id = m.emp_id;

✅ PySpark – Basic

emp = employees_df.alias("e")
mgr = employees_df.alias("m")

df = emp.join(
    mgr, emp.manager_id == mgr.emp_id, "left"
).select("e.emp_id", "e.emp_name", ...
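The PySpark snippet is truncated above; a minimal sketch of the same employee → manager mapping, again assuming employees_df with the columns listed, could look like this (string-qualified column references avoid the column-ambiguity pitfalls of self-joins):

from pyspark.sql import functions as F

# Self-join employees to their managers (assumes employees_df exists)
emp = employees_df.alias("e")
mgr = employees_df.alias("m")

emp_with_mgr = (
    emp.join(mgr, F.col("e.manager_id") == F.col("m.emp_id"), "left")
       .select(
           F.col("e.emp_id"),
           F.col("e.emp_name"),
           # Mirror the SQL COALESCE: label the row with no manager explicitly
           F.coalesce(F.col("m.emp_name"), F.lit("CEO")).alias("manager_name"),
       )
)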

DE Interview: Running Totals & Moving Averages

šŸŽÆ Running Totals & Moving Averages, written at a 6–8 years experience level, with basic + advanced / optimized approaches.

šŸ”¹ Assumed Sample Tables

sales
sale_id | customer_id | product_id | sale_date | amount | status

transactions
txn_id | account_id | txn_date | txn_type | amount   -- credit/debit

1️⃣ Running Total of Daily Sales

✅ SQL – Basic

SELECT
    sale_date,
    SUM(amount) AS daily_sales,
    SUM(SUM(amount)) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM sales
GROUP BY sale_date
ORDER BY sale_date;

šŸ“Œ Interview Tip
Aggregate to the date grain first, then apply the window function on top of that aggregate.

✅ SQL – Advanced (Partitioned Running Total)

SELECT
    customer_id,
    sale_date,
    SUM(amount) AS daily_sales,
    SUM(SUM(amount)) OVER (
        PARTITION BY customer_id
        ORDER BY sale_date
    ) AS running_total
FROM sales
GROUP BY customer_i...
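The preview ends before the PySpark and moving-average parts, so here is a minimal PySpark sketch of both under stated assumptions: daily_sales_df is a hypothetical DataFrame with columns (customer_id, sale_date, daily_sales), i.e. sales already aggregated to one row per customer per day, as the tip above recommends:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Running total: everything from the start of the partition up to the current row
w_running = (
    Window.partitionBy("customer_id")
          .orderBy("sale_date")
          .rowsBetween(Window.unboundedPreceding, Window.currentRow)
)

# 7-row moving average: the current row plus the 6 preceding rows
w_moving = (
    Window.partitionBy("customer_id")
          .orderBy("sale_date")
          .rowsBetween(-6, Window.currentRow)
)

result = daily_sales_df.select(
    "customer_id",
    "sale_date",
    "daily_sales",
    F.sum("daily_sales").over(w_running).alias("running_total"),
    F.avg("daily_sales").over(w_moving).alias("moving_avg_7d"),
)

Note the window frame here is row-based; if the data can have gaps between dates, a range- or date-based frame is the usual follow-up discussion.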