Posts

Showing posts from June, 2025

SQL Technical Questions

SQL interviews can be tricky! Here are some thought-provoking questions that tested my knowledge:

1️⃣ How do you calculate the median salary in a department using SQL? (See the sketch after this list.)
2️⃣ What’s the difference between correlated and non-correlated subqueries?
3️⃣ How can you update data in one table based on values from another?
4️⃣ What are window functions and how are they different from GROUP BY?
5️⃣ How do you find gaps in a sequence (like missing invoice numbers)?
6️⃣ How would you delete every nth row from a table?
7️⃣ How can you transpose columns into rows (unpivot) in SQL?
8️⃣ What’s the purpose of the COALESCE function, and how is it different from ISNULL or NVL?
9️⃣ How do you calculate the difference in days, months, or years between two dates?
🔟 How do you write a SQL query to show cumulative percentage of sales per region?
1️⃣1️⃣ How do you rank items within partitions based on a custom sorting logic?
1️⃣2️⃣ What’s the difference between EXISTS and IN?
1️⃣3️⃣ How can you det...
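For question 1️⃣, here is a minimal sketch of one common approach, computing the per-department median with window functions. It is wrapped in PySpark's SQL interface only so the snippet runs end to end; the employees table and its department/salary columns are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("median-salary").getOrCreate()

# Hypothetical employees(department, salary) table registered as a view.
median_per_department = spark.sql("""
    WITH ranked AS (
        SELECT
            department,
            salary,
            ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary) AS rn,
            COUNT(*)     OVER (PARTITION BY department)                 AS cnt
        FROM employees
    )
    SELECT
        department,
        AVG(salary) AS median_salary  -- averages the two middle rows when cnt is even
    FROM ranked
    WHERE rn IN (FLOOR((cnt + 1) / 2), CEIL((cnt + 1) / 2))
    GROUP BY department
""")
median_per_department.show()
```

The same SELECT works on most engines that support window functions; engines that support PERCENTILE_CONT offer a shorter route.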

PySpark Interview Questions with Detailed Answers

  Spark Internals & Execution 1. Explain the full lifecycle of a Spark job — from code to DAG to task execution. Definition: The Spark job lifecycle begins when the driver program initiates an action (e.g., .collect() , .count() ). Here's a step-by-step breakdown: User Code: You write transformations and actions in PySpark. Logical Plan: Spark creates a logical plan based on your transformations. Optimization: Catalyst optimizer optimizes the logical plan. Physical Plan: Spark creates a physical plan with stages. DAG (Directed Acyclic Graph): Stages are translated into a DAG of tasks. Task Scheduling: Tasks are distributed across executors. Execution: Executors run the tasks in parallel. Healthcare Example: In processing electronic health records (EHR), you may filter patients by diagnosis, join with lab results, and then aggregate. Spark translates this pipeline into optimized stages and tasks. 2. What is the difference between transformations and actions? Why are tra...

Must-Practice Azure Data Factory (ADF) Scenarios for 2025 Interviews

Real-world examples, optimal solutions, and expert insights for Azure Data Engineer candidates.

🔍 1. Dynamic Pipelines
Q: How would you design a single pipeline to process multiple country files from different folders in ADLS?
Answer: Use parameterization and a ForEach activity to loop through country folders dynamically. Leverage Get Metadata and Lookup to fetch file names and directory structure, then pass those values into a single generic Copy Activity.
Example: Imagine folders like /data/US/, /data/IN/, /data/UK/ — all with daily CSVs. Create a parameter for CountryName. Use a Lookup to fetch the list of countries (e.g., from a control table or config file). Use ForEach to iterate and process files using a dataset parameterized by @pipeline().parameters.CountryName.

2. Schema Drift Handling
Q: What if incoming files have columns frequently added or removed?
Answer: Enable Schema Drift in ADF’s mapping data flows. Use the Auto-Mapping feat...

ADVANCED SQL + DELTA LAKE INTERVIEW SET

Interview (2+ YOE). Difficulty level: Intermediate

1. Write a SQL query using Delta tables to compute cumulative revenue per day (see the sketch after this list)
2. Retrieve the first and last login per user from a huge event log table
3. Find the top 3 customers by spend in each region (use window functions)
4. Detect duplicate records in a transactional table and delete extras safely
5. Get users who made purchases in 3 consecutive months
6. Identify skewed joins in SQL and propose fixes using broadcast hints
7. Compute a 7-day moving average of product sales on Delta Lake
8. Pivot daily sales into month-wise columns
9. Find customers who bought products every month in a year
10. Rank products by sales per year, resetting rank each year
11. Find employees earning more than their department average
12. Find the median transaction amount (no built-in median)
13. Get all users who placed their first order in the last 30 days
14. Compare price change between two dates for each pr...
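For item 1, a minimal sketch assuming a hypothetical Delta table sales(order_date, revenue) already registered in the metastore: daily totals are aggregated first, then summed cumulatively with a window.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-cumulative-revenue").getOrCreate()

# Hypothetical Delta table: sales(order_date DATE, revenue DOUBLE).
cumulative = spark.sql("""
    SELECT
        order_date,
        SUM(revenue) AS daily_revenue,
        SUM(SUM(revenue)) OVER (
            ORDER BY order_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative_revenue
    FROM sales
    GROUP BY order_date
    ORDER BY order_date
""")
cumulative.show()
```

The nested SUM(SUM(revenue)) works because the window is evaluated after the GROUP BY; the same pattern answers the 7-day moving average question with a ROWS BETWEEN 6 PRECEDING AND CURRENT ROW frame.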

Amazon and Microsoft asked these SQL questions in their interviews.

1/ Find the second highest salary without using LIMIT, OFFSET, or TOP. (See the sketch after this list.)
2/ Identify customers who made transactions in every month of the year.
3/ Calculate the 7-day moving average of sales for each product.
4/ Find users who logged in for at least three consecutive days.
5/ Use window functions to rank orders by order value per customer and return the top 3.
6/ Retrieve employees who earn more than their managers.
7/ Find duplicate rows in a large table and delete only the extras.
8/ Optimize a slow-performing query with multiple joins and aggregations.
9/ Get the first order for each customer, handling tie-breakers properly.
10/ Find products never purchased by any customer.
11/ Retrieve users who made purchases in 2 consecutive months but not the 3rd.
12/ Find the department with the highest total salary payout.
13/ Identify employees with the same salary in the same department.
14/ Calculate median salary without using built-in functions.
15/ Write a query to pivot rows into ...
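For question 1/, one common answer takes the MAX of everything below the overall maximum, which avoids LIMIT, OFFSET, and TOP entirely. A minimal sketch through PySpark's SQL interface, with a hypothetical employees(employee_id, salary) view:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("second-highest-salary").getOrCreate()

# Hypothetical employees(employee_id, salary) table registered as a view.
second_highest = spark.sql("""
    SELECT MAX(salary) AS second_highest_salary
    FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
""")
second_highest.show()
```

A DENSE_RANK-based variant generalizes the same idea to the Nth highest salary while still avoiding the banned keywords.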