DE-interview Questions: Running Totals & Moving Averages
🎯 Running Totals & Moving Averages
— written at 6–8 years experience level, with basic + advanced / optimized approaches.
🔹 Assumed Sample Tables
sales
transactions
1️⃣ Running Total of Daily Sales
✅ SQL – Basic
📌 Interview Tip
Always aggregate before applying window functions when dealing with dates.
✅ SQL – Advanced (Partitioned Running Total)
✅ PySpark – Basic
✅ PySpark – Optimized (Partition + Sort)
2️⃣ Running Balance (Credit / Debit)
✅ SQL – Advanced (Conditional Running Total)
✅ PySpark – Optimized
3️⃣ Excluding Current Row (Previous Running Total)
✅ SQL
📌 Very common interview trap
✅ PySpark
4️⃣ Moving Average (3-Day Rolling)
✅ SQL – Basic
✅ SQL – RANGE (Date-Aware)
📌 ROWS vs RANGE is a must-know interview topic
✅ PySpark – Rolling Average
5️⃣ Moving Average Per Customer
✅ SQL
✅ PySpark
6️⃣ Year-to-Date (YTD) Sales
✅ SQL – Advanced
✅ PySpark
7️⃣ Restart Running Total When Condition Changes
(e.g., status changes)
✅ SQL – Advanced
✅ PySpark
8️⃣ Performance-Optimized Spark Pattern (Very Important)
❌ BAD (multiple window scans)
✅ GOOD (single window spec)
📌 Senior-level optimization insight
Comments
Post a Comment