Data Engineering Interview: Ultra Hard (CTE + Window + GROUP BY mix)
🔥 ULTRA HARD – Pattern 1 Mixed With CTE + Window + GROUP BY
Level: Senior Data Engineer (8+ years)
These are the kinds of questions asked in interviews at product companies, fintechs, and FAANG-level firms.
These problems mix:
- GROUP BY
- HAVING
- CTEs
- Window functions
- Multi-step aggregations
- Global vs partition comparisons
- Ratio & ranking logic
We’ll use 5 realistic datasets.
🧩 DATASET 1: BANKING – Fraud Signal Detection
Table: transactions
🔥 Q1. Find accounts whose total debit is above the 90th percentile of total debit across all accounts
✅ SQL (CTE + Window + GROUP BY)
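A minimal sketch of one way to answer Q1, run through Python's built-in sqlite3 module so it is self-contained (it needs SQLite 3.25+ for window functions). The transactions schema (account_id, txn_type, amount) and the sample rows are assumptions, since the original table definition is not shown. SQLite has no PERCENTILE_CONT, so PERCENT_RANK() > 0.9 is used as a stand-in for "above the 90th percentile"; the two agree when the totals are distinct, but may differ when there are ties.

```python
import sqlite3

# Hypothetical schema and sample data; the post's real table is not shown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "txn_id INTEGER PRIMARY KEY, account_id TEXT, txn_type TEXT, amount REAL)"
)
rows = []
for i in range(1, 11):
    account = f"A{i:02d}"
    rows.append((account, "DEBIT", i * 50))
    rows.append((account, "DEBIT", i * 50))   # total debit = i * 100
    rows.append((account, "CREDIT", 999))     # credits must be ignored
conn.executemany(
    "INSERT INTO transactions (account_id, txn_type, amount) VALUES (?, ?, ?)",
    rows,
)

query = """
WITH account_debit AS (              -- step 1: GROUP BY per account
    SELECT account_id, SUM(amount) AS total_debit
    FROM transactions
    WHERE txn_type = 'DEBIT'
    GROUP BY account_id
),
ranked AS (                          -- step 2: window over the aggregates
    SELECT account_id,
           total_debit,
           PERCENT_RANK() OVER (ORDER BY total_debit) AS pct_rank
    FROM account_debit
)
SELECT account_id, total_debit       -- step 3: keep the top tail
FROM ranked
WHERE pct_rank > 0.9
ORDER BY total_debit DESC
"""
high_debit_accounts = conn.execute(query).fetchall()
print(high_debit_accounts)
```

With these ten sample accounts, only A10 (total debit 1000) clears the cutoff. On engines that support it, PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY total_debit) computed in a separate CTE is the more direct formulation.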
✅ PySpark Advanced
🧩 DATASET 2: HR – Compensation Outlier Detection
Table: employees
🔥 Q2. Find employees earning more than department average AND above overall average
✅ SQL
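A minimal sketch for Q2, again run through Python's sqlite3 module (SQLite 3.25+). The employees columns (emp_id, name, dept, salary) and the sample rows are assumptions made for illustration. It computes the partition average and the global average side by side with two window functions, then filters on both.

```python
import sqlite3

# Hypothetical schema and sample data; the post's real table is not shown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "ENG", 150000),
        (2, "Ben", "ENG", 100000),
        (3, "Chen", "ENG", 110000),   # above overall avg, below ENG avg
        (4, "Dia", "OPS", 70000),     # above OPS avg, below overall avg
        (5, "Eli", "OPS", 50000),
        (6, "Fay", "OPS", 60000),
    ],
)

query = """
WITH scored AS (
    SELECT name,
           dept,
           salary,
           AVG(salary) OVER (PARTITION BY dept) AS dept_avg,     -- partition
           AVG(salary) OVER ()                  AS overall_avg   -- global
    FROM employees
)
SELECT name, dept, salary
FROM scored
WHERE salary > dept_avg
  AND salary > overall_avg
ORDER BY name
"""
outliers = conn.execute(query).fetchall()
print(outliers)
```

In this sample the ENG average is 120000, the OPS average is 60000, and the overall average is 90000, so only Asha passes both tests. Chen and Dia each pass exactly one, which is the case the question is designed to catch.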
✅ PySpark Advanced
🧩 DATASET 3: E-COMMERCE – Power Users
Table: orders
🔥 Q3. Find customers whose monthly spend is consistently above monthly average for at least 3 months
✅ SQL
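A minimal sketch for Q3 using Python's sqlite3 module (SQLite 3.25+). The orders columns (customer_id, order_date, amount) and the sample rows are assumptions. Two interpretation choices are also assumptions: the "monthly average" is taken as the mean of per-customer spend among customers who ordered that month, and "at least 3 months" is read as any 3 months, not necessarily consecutive. A consecutive-months variant would need a gaps-and-islands step on top of this.

```python
import sqlite3

# Hypothetical schema and sample data; the post's real table is not shown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id TEXT, order_date TEXT, amount REAL)"
)
rows = []
for month in ("2024-01", "2024-02", "2024-03"):
    rows.append(("C1", f"{month}-05", 100))
    rows.append(("C1", f"{month}-20", 200))   # C1 spends 300 every month
    rows.append(("C2", f"{month}-10", 100))   # C2 spends 100 every month
rows.append(("C3", "2024-01-15", 350))        # C3 has one strong month only
rows.append(("C3", "2024-02-15", 50))
rows.append(("C3", "2024-03-15", 50))
conn.executemany(
    "INSERT INTO orders (customer_id, order_date, amount) VALUES (?, ?, ?)", rows
)

query = """
WITH monthly_spend AS (              -- step 1: customer x month totals
    SELECT customer_id,
           strftime('%Y-%m', order_date) AS order_month,
           SUM(amount) AS spend
    FROM orders
    GROUP BY customer_id, strftime('%Y-%m', order_date)
),
with_avg AS (                        -- step 2: average per month
    SELECT customer_id,
           order_month,
           spend,
           AVG(spend) OVER (PARTITION BY order_month) AS month_avg
    FROM monthly_spend
)
SELECT customer_id, COUNT(*) AS months_above   -- step 3: count winning months
FROM with_avg
WHERE spend > month_avg
GROUP BY customer_id
HAVING COUNT(*) >= 3
ORDER BY customer_id
"""
power_users = conn.execute(query).fetchall()
print(power_users)
```

Here C1 beats the monthly average in all three months, while C3 beats it only in January, so only C1 is returned. Note that strftime is SQLite-specific; other engines would use DATE_TRUNC or a similar function.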
✅ PySpark Advanced
🧩 DATASET 4: Platform Logs – Heavy Users
Table: user_sessions
🔥 Q4. Find users whose 7-day average session time exceeds global daily average
✅ SQL (Window + CTE + GROUP BY)
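A minimal sketch for Q4 using Python's sqlite3 module (SQLite 3.25+). The user_sessions columns (user_id, session_date, duration_min) and the sample rows are assumptions. It also assumes that the "global daily average" means the average of per-user daily totals, that each user has at most one aggregated row per day with no gaps (so a 7-row window equals 7 days), and that only full 7-day windows count.

```python
import sqlite3

# Hypothetical schema and sample data; the post's real table is not shown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sessions ("
    "session_id INTEGER PRIMARY KEY, user_id TEXT, session_date TEXT, duration_min REAL)"
)
rows = []
for d in range(1, 8):
    day = f"2024-01-{d:02d}"
    rows += [("U1", day, 30), ("U1", day, 30), ("U2", day, 20)]
for d in range(1, 4):
    rows.append(("U3", f"2024-01-{d:02d}", 100))  # heavy, but only 3 days
conn.executemany(
    "INSERT INTO user_sessions (user_id, session_date, duration_min) VALUES (?, ?, ?)",
    rows,
)

query = """
WITH daily AS (                      -- step 1: per-user daily totals
    SELECT user_id, session_date, SUM(duration_min) AS daily_min
    FROM user_sessions
    GROUP BY user_id, session_date
),
rolling AS (                         -- step 2: rolling window + global average
    SELECT user_id,
           AVG(daily_min) OVER (
               PARTITION BY user_id ORDER BY session_date
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS avg_7d,
           COUNT(*) OVER (
               PARTITION BY user_id ORDER BY session_date
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS days_in_window,
           AVG(daily_min) OVER () AS global_daily_avg
    FROM daily
)
SELECT DISTINCT user_id              -- step 3: full windows above the global mean
FROM rolling
WHERE days_in_window = 7
  AND avg_7d > global_daily_avg
ORDER BY user_id
"""
heavy_users = conn.execute(query).fetchall()
print(heavy_users)
```

In this sample the global daily average is 860 / 17, roughly 50.6 minutes. U1 averages 60 over a full week and qualifies; U2 averages 20; U3 never fills a 7-day window. If days can be missing, a RANGE-based frame over a numeric day offset or a join against a calendar table would be safer than a ROWS frame.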
🧩 DATASET 5: Payments – Merchant Risk Tiering
Table: payments
🔥 Q5. Classify merchants into HIGH / MEDIUM / LOW risk based on failure ratio percentile
✅ SQL
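A minimal sketch for Q5 using Python's sqlite3 module (SQLite 3.25+). The payments columns (merchant_id, status), the 'FAILED' status value, and the tier cutoffs (PERCENT_RANK of 0.9 and above for HIGH, 0.5 and above for MEDIUM) are all assumptions chosen for illustration; the original does not state them.

```python
import sqlite3

# Hypothetical schema and sample data; the post's real table is not shown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments ("
    "payment_id INTEGER PRIMARY KEY, merchant_id TEXT, status TEXT)"
)
failures = {"M1": 0, "M2": 1, "M3": 2, "M4": 3, "M5": 8}  # failed out of 10
rows = []
for merchant, failed in failures.items():
    for n in range(10):
        rows.append((merchant, "FAILED" if n < failed else "SUCCESS"))
conn.executemany("INSERT INTO payments (merchant_id, status) VALUES (?, ?)", rows)

query = """
WITH merchant_stats AS (             -- step 1: failure ratio per merchant
    SELECT merchant_id,
           AVG(CASE WHEN status = 'FAILED' THEN 1.0 ELSE 0.0 END) AS failure_ratio
    FROM payments
    GROUP BY merchant_id
),
ranked AS (                          -- step 2: percentile position of the ratio
    SELECT merchant_id,
           failure_ratio,
           PERCENT_RANK() OVER (ORDER BY failure_ratio) AS pct_rank
    FROM merchant_stats
)
SELECT merchant_id,                  -- step 3: bucket by assumed cutoffs
       CASE
           WHEN pct_rank >= 0.9 THEN 'HIGH'
           WHEN pct_rank >= 0.5 THEN 'MEDIUM'
           ELSE 'LOW'
       END AS risk_tier
FROM ranked
ORDER BY merchant_id
"""
risk_tiers = conn.execute(query).fetchall()
print(risk_tiers)
```

With five sample merchants the percentile ranks are 0, 0.25, 0.5, 0.75, and 1.0, so M1 and M2 land in LOW, M3 and M4 in MEDIUM, and M5 in HIGH. NTILE(n) is a common alternative when equal-sized buckets matter more than fixed percentile cutoffs.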