DE Interview: Ranking & Ties
🔹 PATTERN 3: Ranking & Ties
📘 Assumed Table: employees
1️⃣ Basic Ranking (No Partition)
✅ SQL – Basic
📌 Interview must-know
-
ROW_NUMBER→ unique sequence -
RANK→ gaps after ties -
DENSE_RANK→ no gaps
✅ PySpark – Basic
2️⃣ Ranking Within Groups (Department-wise)
✅ SQL – Basic
✅ PySpark – Basic
3️⃣ Top-N per Group (Very Common Interview Pattern)
✅ SQL – Basic (Top 3 salaries per department)
📌 Use DENSE_RANK to include ties
✅ SQL – Advanced (Exact 3 rows per dept)
📌 Use ROW_NUMBER for strict limits
✅ PySpark – Basic
4️⃣ Find Employees With Same Rank (Ties Detection)
✅ SQL – Advanced
✅ SQL – Ranking-based
✅ PySpark
5️⃣ Nth Highest Salary (Classic Trap Question)
✅ SQL – Advanced (3rd highest salary)
📌 RANK ❌ (can skip)
📌 DENSE_RANK ✅
✅ PySpark
6️⃣ Remove Duplicates Using ROW_NUMBER
✅ SQL – Advanced
📌 ROW_NUMBER is the only safe choice
✅ PySpark – Dedup
7️⃣ Rank by Multiple Columns
✅ SQL – Advanced
✅ PySpark
8️⃣ Performance & Architecture (Senior-Level)
🔥 Key Interview Talking Points
-
Ranking functions require shuffle
-
Always:
-
Partition carefully
-
Select only needed columns
-
-
Prefer
ROW_NUMBERwhen ties don’t matter -
Use
DENSE_RANKfor Top-N per group -
Avoid multiple window scans:
🧠 Quick Decision Table (Interview Gold)
| Requirement | Use |
|---|---|
| Unique sequence | ROW_NUMBER |
| Allow gaps | RANK |
| No gaps | DENSE_RANK |
| Top-N per group | DENSE_RANK |
| Deduplication | ROW_NUMBER |
| Nth highest | DENSE_RANK |
Comments
Post a Comment