Posts

Top 50 SQL Questions

  🔥 1. Basic → Intermediate (Warm-up but important)
     1. Find duplicate records in a table
     2. Delete duplicate records (keep latest)
     3. Find second highest salary
     4. Find Nth highest salary
     5. Get employees earning more than their manager
     6. Count employees in each department
     7. Departments with more than 5 employees
     8. Get employees who joined in last 30 days
     9. Find records with NULL values in specific columns
     10. Replace NULL values with default

  ⚡ 2. Joins & Relationships (Very common)
     11. Find customers who never placed an order
     12. Find orders without matching customers
     13. Self join to find employee-manager hierarchy
     14. Find mutual friends (self join problem)
     15. Cross join to generate combinations
     16. Find missing IDs in a sequence
     17. Find unmatched records between two tables
     18. Anti-join using NOT EXISTS
     19. Compare two tables and find differences
     20. Join 3+ tables and aggregate results

  🚀 3. Window Functions (VERY IMPORTAN...
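  As a rough illustration of questions 1, 3, and 11/18 above, the following sketch runs three queries against an in-memory SQLite database from Python. The table names, columns, and sample rows (`employees`, `customers`, `orders`) are assumptions made up for this example, not part of the original list, and other SQL dialects may prefer different idioms (for instance window functions for the Nth highest salary).

```python
import sqlite3

# Hypothetical schema and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, email TEXT, salary INTEGER);
INSERT INTO employees (name, email, salary) VALUES
  ('Ann', 'ann@example.com', 300),
  ('Bob', 'bob@example.com', 200),
  ('Bob', 'bob@example.com', 200),
  ('Cy',  'cy@example.com',  100);
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Dee'), (2, 'Eli');
INSERT INTO orders VALUES (10, 1);
""")

# Q1: find duplicate records (here: the same email appearing more than once).
duplicates = conn.execute("""
    SELECT email, COUNT(*) AS cnt
    FROM employees
    GROUP BY email
    HAVING COUNT(*) > 1
""").fetchall()

# Q3: second highest distinct salary, via a subquery on the maximum.
second_highest = conn.execute("""
    SELECT MAX(salary)
    FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
""").fetchone()[0]

# Q11 / Q18: customers who never placed an order (anti-join with NOT EXISTS).
never_ordered = conn.execute("""
    SELECT c.name
    FROM customers AS c
    WHERE NOT EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
""").fetchall()

print(duplicates, second_highest, never_ordered)
```

  Grouping on the columns that define "the same record" is the key design choice in the duplicate check; which columns those are depends on the table at hand.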

PySpark Data Skew Handling – Complete Guide

  🔴 1. Problem Statement: Skewed Aggregation

      df.groupBy("user_id").count()

  ❗ Issue
  - One user_id contains ~40% of total data
  - Spark sends same key → same partition
  - Result:
    - One task becomes extremely heavy
    - Other tasks finish early
    - Straggler problem → slow job

  🧠 2. Why Skew Happens
  Spark distributes data based on keys:
  - user_id = A → goes to one partition
  - user_id = B → another partition
  If: A = 40% of data
  Then: Partition for A = huge → bottleneck

  🔍 3. How to Identify Skew (Practical Approach)

  ✅ Method 1: Distribution Check

      from pyspark.sql.functions import count

      df.groupBy("user_id") \
          .agg(count("*").alias("cnt")) \
          .orderBy("cnt", ascending=False) \
          .show(10)

  👉 Example Output:

      user_id   cnt
      A         40,000,000   ← skew
      B         1,000
      C         900

  ✅ Method 2: Percentile Analysis

      df.groupBy("user_id") \
          .count() \
          .selectExpr( ...
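  The "same key → same partition" behaviour described above can be imitated without a cluster. The following is a minimal pure-Python sketch, not Spark code: `partition_for` is a hypothetical helper that uses CRC32 as a stand-in for Spark's own hash partitioner, and the 1,000-row data set is invented so that key `A` holds 40% of the rows.

```python
import zlib
from collections import Counter

def partition_for(key: str, num_partitions: int) -> int:
    """Hypothetical stand-in for hash partitioning: same key, same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Toy data (an assumption): key "A" is 40% of rows, the rest are unique keys.
rows = ["A"] * 400 + [f"user_{i}" for i in range(600)]
num_partitions = 8

# Distribution check: which key dominates, and what share of the data it holds.
key_counts = Counter(rows)
top_key, top_count = key_counts.most_common(1)[0]
top_share = top_count / len(rows)

# Rows per partition after "shuffling" by key.
partition_sizes = Counter(partition_for(k, num_partitions) for k in rows)
hot_partition = partition_for(top_key, num_partitions)

print(f"top key {top_key!r} holds {top_share:.0%} of rows")
print(f"partition {hot_partition} has {partition_sizes[hot_partition]} rows")
print("all partition sizes:", sorted(partition_sizes.values(), reverse=True))
```

  Every row for `A` lands in a single partition, so that partition holds at least 400 rows no matter how many partitions are configured; this is the straggler task the excerpt describes.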

DAX QUERIES

  🧮 DAX QUERIES – COMPLETE INTERVIEW GUIDE (AAS / Power BI)
  Applies to Azure Analysis Services and Power BI

  1️⃣ DAX BASICS (They expect this instantly)

  ✅ Total Sales

      Total Sales = SUM ( FactSales[SalesAmount] )

  🗣 Say: “This is a simple aggregation evaluated in filter context.”

  ✅ Total Orders

      Total Orders = COUNT ( FactSales[OrderID] )

  2️⃣ CALCULATE – MOST IMPORTANT DAX FUNCTION 🔥

  ✅ Sales for Current Year

      Sales CY = CALCULATE ( [Total Sales], DimDate[Year] = YEAR ( TODAY() ) )

  🗣 Senior explanation: “CALCULATE modifies the filter context before evaluating the measure.”

  ✅ Sales for a Specific Region

      Sales US = CALCULATE ( [Total Sales], DimRegion[Country] = "USA" )

  3️⃣ TIME INTELLIGENCE (GUARANTEED QUESTIONS)

  ✅ Year-to-Date (YTD)

      Sales YTD = TOTALYTD ( [Total Sales], DimDate[Date] )

  ✅ Month-to-Date (MTD)

      Sales MTD = TOTALMTD ( [Total Sales], DimDate[Date] )

  ✅ Previous Year Sales

      Sales PY = CALCULATE ( [T...
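  To make the year-to-date idea concrete outside DAX, here is a small Python sketch of what a YTD total computes: the sum of sales from 1 January of the selected year up to the selected date. It only illustrates the arithmetic and is not how the DAX engine evaluates `TOTALYTD`; the sample rows and the helpers `total_sales` and `sales_ytd` are assumptions made up for this example, and a calendar year ending on 31 December is assumed.

```python
from datetime import date

# Hypothetical fact rows: (order date, sales amount).
sales = [
    (date(2023, 12, 30), 50.0),
    (date(2024, 1, 15), 100.0),
    (date(2024, 2, 10), 200.0),
    (date(2024, 3, 5), 400.0),
]

def total_sales(rows):
    """Plain aggregation over whatever rows survive the current filter."""
    return sum(amount for _, amount in rows)

def sales_ytd(rows, as_of):
    """Restrict rows to 1 January..as_of of the same year, then aggregate."""
    year_start = date(as_of.year, 1, 1)
    in_range = [(d, a) for d, a in rows if year_start <= d <= as_of]
    return total_sales(in_range)

print(total_sales(sales))                  # all rows
print(sales_ytd(sales, date(2024, 2, 29))) # January and February 2024 only
```

  The shape mirrors the "modify the filter, then evaluate the measure" explanation quoted above: the date restriction is applied first, and the same aggregation runs on the rows that remain.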

Azure Analysis Services

  🔷 Azure Analysis Services (AAS) – TIERS EXPLAINED

  1️⃣ What Are AAS Tiers?
  Azure Analysis Services tiers define:
  - Compute power
  - Memory capacity
  - Concurrent users
  - Query performance
  - Cost
  🗣 Senior line: “AAS tiers help balance performance, concurrency, and cost based on BI workload.”

  2️⃣ Azure Analysis Services Tier Types
  AAS has three main pricing tiers:

      Tier            Purpose
      Developer (D)   Development / Testing
      Basic (B)       Small production workloads
      Standard (S)    Enterprise production

  3️⃣ Developer Tier (D1)

      Feature    Details
      Use Case   Dev / QA
      SLA        ❌ No SLA
      Scale      Fixed
      Cost       Low

  🗣 Interview explanation: “Developer tier is used for model development and testing, never production.”

  4️⃣ Basic Tier (B1–B2)

      Feature       B1           B2
      Use Case      Small prod   Medium prod
      Memory        Low          Medium
      Concurrency   Limited      Moderate
      SLA           ✅ Yes

  🗣 When to use: “For limited users and simpler models.”

  5️⃣ Standard Tier (S0–S9) ⭐ MOST IMPORTANT

      Feature    Description
      Use Case   Enterprise workloads
      Me...