💎 ETL Failures & Monitoring (Azure-specific)

🔹 How do you handle job failures in an ETL pipeline?

Tools: Azure Data Factory, Azure Monitor, Log Analytics, Application Insights
Approach:

  • Use ADF's built-in alerts for activity failures (email, Logic Apps, Azure Monitor).

  • Enable diagnostic logging to send logs to Log Analytics.

  • Use ADF’s retry policy to auto-retry transient failures.

  • Use custom error handling with IfCondition, Switch, and Fail activities.

Example:
An ADF copy activity failed due to a schema mismatch. The pipeline’s OnFailure path was configured to log the error to a Log Analytics table and notify the team via an Azure Logic App (email + Teams).
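
To illustrate the custom error-handling bullet above, here is a minimal sketch (not the original pipeline) of a Databricks validation step that fails fast on a schema mismatch so the pipeline's failure path can log and notify; the path, table, and column names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical expected schema for the incoming file (illustrative names/types).
expected_cols = {"order_id": "int", "order_date": "date", "amount": "decimal(18,2)"}

df = spark.read.parquet("abfss://raw@mydatalake.dfs.core.windows.net/orders/")
actual_cols = {f.name: f.dataType.simpleString() for f in df.schema.fields}

missing = set(expected_cols) - set(actual_cols)
mismatched = {c: (expected_cols[c], actual_cols[c])
              for c in expected_cols
              if c in actual_cols and actual_cols[c] != expected_cols[c]}

if missing or mismatched:
    # Raising here fails the notebook activity, which triggers the pipeline's
    # failure path (log to Log Analytics, notify via Logic App).
    raise ValueError(f"Schema validation failed. Missing: {missing}, mismatched: {mismatched}")
```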


🔹 What steps do you take when a pipeline is running slower than expected?

Tools: ADF Integration Runtime, Azure Monitor, Azure Databricks, Synapse
Approach:

  • Analyze ADF activity run times via Monitor tab or exported logs.

  • Check IR (Integration Runtime) performance (self-hosted or Azure-hosted).

  • Optimize Databricks notebooks (caching, right-sized clusters).

  • Partition data in Synapse / Databricks to improve load throughput.

Example:
A pipeline that copied 100 GB to Synapse slowed down. I increased the Data Integration Units (DIUs) on the Azure IR, batched the files, and parallelized the copy using a ForEach activity with concurrency enabled.
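
For the Databricks tuning bullets, a minimal sketch of caching a reused DataFrame and repartitioning before a partitioned write; paths, column names, and the partition count are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder path; tune partition counts to cluster cores and data volume.
df = spark.read.parquet("abfss://bronze@mydatalake.dfs.core.windows.net/sales/")

# Cache only when the same DataFrame feeds multiple downstream actions.
df.cache()
print(df.filter("amount > 0").count())          # first action materializes the cache
print(df.select("region").distinct().count())   # second action reuses it

# Repartition before a wide write so parallelism matches the cluster,
# and partition the output by the column most queries filter on.
(df.repartition(64, "sale_date")
   .write.mode("overwrite")
   .partitionBy("sale_date")
   .parquet("abfss://silver@mydatalake.dfs.core.windows.net/sales/"))
```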


🔹 What if a scheduled job didn't trigger?

Tools: ADF Triggers, Azure Monitor Alerts
Approach:

  • Validate trigger status in ADF (enabled, conditions met).

  • Check last trigger run logs.

  • Set alerts using Azure Monitor to notify if a pipeline hasn’t run on schedule.

Example:
A pipeline with a tumbling window trigger missed its window because the previous run had failed. I configured dependency chaining with dependsOn and added alerting so the first failure is caught early.


🔹 How do you manage data pipeline dependencies?

Tools: ADF dependency conditions, Data Factory pipeline orchestration
Approach:

  • Use ADF’s tumbling window trigger dependencies and activity dependency conditions (Succeeded, Failed, Completed, Skipped).

  • Use Wait, IfCondition, or Until activities to manage flows.

  • Use Azure Event Grid + Logic App for event-driven dependencies.

Example:
A downstream ADF pipeline waits for a file to land in ADLS from an upstream process. I used a storage event (Event Grid) trigger to launch the pipeline on blob creation.


🔹 How do you monitor SLAs for pipelines?

Tools: Azure Monitor, Log Analytics, Application Insights
Approach:

  • Track pipeline execution time using ADF logs in Log Analytics.

  • Create custom Kusto queries to find SLA breaches.

  • Send alerts if a pipeline exceeds expected runtime.

Example:
I created an Azure Monitor workbook that tracks all pipeline runs and highlights those exceeding the SLA. Alerts are sent via email and Teams.
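
A sketch of how such a check could be automated with the azure-monitor-query SDK, assuming diagnostic logs flow to the resource-specific ADFPipelineRun table and a 30-minute SLA; the workspace ID is a placeholder, and the table and column names depend on your diagnostic settings.

```python
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# KQL against the resource-specific ADFPipelineRun table; a 30-minute SLA is assumed.
kql = """
ADFPipelineRun
| where Status in ('Succeeded', 'Failed')
| extend DurationMin = datetime_diff('minute', End, Start)
| where DurationMin > 30
| project PipelineName, RunId, Status, Start, End, DurationMin
"""

response = client.query_workspace(
    "<log-analytics-workspace-id>",   # placeholder workspace ID
    kql,
    timespan=timedelta(days=1),
)

for table in response.tables:
    for row in table.rows:
        print(row)   # each row is an SLA breach to alert on
```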


💎 Data Quality, Validation & Duplicates (Azure-specific)


🔹 How do you automate data validation during ingestion?

Tools: ADF, Azure Data Lake, Azure Databricks, Azure Purview
Approach:

  • Use custom validation notebooks in Azure Databricks.

  • In ADF, use Stored Procedure or Data Flow to validate counts, nulls, types.

  • Use Azure Purview for profiling and monitoring data quality rules.

Example:
After ingestion, I used a Databricks notebook to check for duplicate IDs and nulls in critical columns and pushed the results to a Power BI dashboard.
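
A minimal sketch of that kind of post-ingestion check; table and column names are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.read.table("bronze.customers")   # illustrative table name
total = df.count()
duplicate_ids = total - df.select("customer_id").distinct().count()

# Null counts for the critical columns (illustrative list).
null_counts = df.select([
    F.sum(F.col(c).isNull().cast("int")).alias(c)
    for c in ["customer_id", "email", "created_at"]
]).collect()[0].asDict()

# Persist the metrics so a Power BI dashboard (or alerting) can pick them up.
results = spark.createDataFrame(
    [(total, duplicate_ids, str(null_counts))],
    ["row_count", "duplicate_ids", "null_counts"],
)
results.write.mode("append").saveAsTable("audit.data_quality_metrics")
```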


🔹 What would you do if there’s a mismatch in record counts?

Tools: ADF, SQL Database, Azure Synapse
Approach:

  • Use Lookup and IfCondition in ADF to compare source vs. destination.

  • Trigger alerts or rerun the pipeline if mismatch exceeds threshold.

Example:
In a CDC-based pipeline, I compared source (SQL DB) and sink (Synapse) row counts after ingestion. If mismatch >5%, the pipeline failed intentionally for manual review.
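
A sketch of the count reconciliation from a Databricks notebook, assuming JDBC access to the source SQL DB and a staging table for the sink; connection details and names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Source count via JDBC (placeholder connection settings).
source_count = (spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<server>.database.windows.net;database=<db>")
    .option("dbtable", "dbo.orders")
    .option("user", "<user>").option("password", "<password>")
    .load()
    .count())

sink_count = spark.read.table("synapse_stage.orders").count()   # illustrative sink table

mismatch_pct = abs(source_count - sink_count) / max(source_count, 1) * 100

if mismatch_pct > 5:
    # Failing the notebook fails the ADF activity, flagging the run for manual review.
    raise ValueError(f"Row count mismatch {mismatch_pct:.2f}% "
                     f"(source={source_count}, sink={sink_count})")
```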


🔹 How do you handle duplicates?

Tools: Azure Databricks, Synapse, Data Flow
Approach:

  • Use Data Flow with Surrogate Key and Window functions for deduplication.

  • In Databricks, use dropDuplicates() or ROW_NUMBER() window logic (both sketched below).
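
A minimal PySpark sketch of both options; table and column names are illustrative.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.table("bronze.orders")   # illustrative source

# Option 1: drop exact duplicates on the business key.
deduped = df.dropDuplicates(["order_id"])

# Option 2: keep only the latest record per key using a ROW_NUMBER-style window.
w = Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
latest = (df.withColumn("rn", F.row_number().over(w))
            .filter("rn = 1")
            .drop("rn"))
```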


💎 Performance & Optimization (Azure-specific)


🔹 How do you handle Spark job memory issues in Azure Databricks?

Tools: Azure Databricks
Approach:

  • Monitor jobs via Spark UI on Databricks.

  • Tune cluster configs: memory, cores, autoscaling.

  • Use persist() and broadcast() wisely.

Example:
A job failed with an out-of-memory error during a join. I broadcast the smaller table and repartitioned the larger dataset to resolve the memory issue.
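
A sketch of the broadcast-plus-repartition pattern; the table names and partition count are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

large = spark.read.table("silver.transactions")   # large fact table (illustrative)
small = spark.read.table("silver.dim_customer")   # small dimension table

# Broadcast the small table so the join avoids shuffling the large side;
# repartition the large side to spread work evenly before the join.
joined = (large.repartition(200, "customer_id")
               .join(F.broadcast(small), "customer_id", "left"))
```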


🔹 Skewed data in Spark?

Approach:

  • Apply salting: add a random prefix to keys.

  • Repartition before joins.

Example:
For a large sales table joined on customer_id, 10% of the records shared a single customer ID. I salted the key (hash mod N buckets) and recombined the results after the join.
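
A sketch of the salting pattern; this variant uses rand() for the salt (a hash-mod on a secondary column works the same way), and the salt count and table names are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
SALTS = 16   # number of salt buckets; tune to the degree of skew

sales = spark.read.table("silver.sales")           # skewed on customer_id
customers = spark.read.table("silver.customers")   # one row per customer_id

# Add a salt to the skewed side, and explode the other side across all salt values
# so every salted key still finds its match.
sales_salted = sales.withColumn("salt", (F.rand() * SALTS).cast("int"))
customers_salted = customers.withColumn(
    "salt", F.explode(F.array([F.lit(i) for i in range(SALTS)]))
)

joined = sales_salted.join(customers_salted, ["customer_id", "salt"]).drop("salt")
```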


💎 Schema & Metadata Management (Azure-specific)


🔹 How do you handle schema evolution in ADLS (Parquet/Avro)?

Tools: Azure Data Lake Gen2, Databricks Delta Lake
Approach:

  • Use Delta Lake with schema evolution (mergeSchema in Databricks).

  • Store schemas in Azure Purview or a custom schema registry.

Example:
Schema changes in a Parquet source were handled via Delta Lake’s automatic schema evolution when writing to the Bronze layer.
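
A minimal sketch of a Bronze write with schema evolution enabled; the paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A new Parquet drop may contain extra columns; mergeSchema lets the Delta Bronze
# table absorb them instead of failing the write. Paths are placeholders.
incoming = spark.read.parquet("abfss://landing@mydatalake.dfs.core.windows.net/events/")

(incoming.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("abfss://bronze@mydatalake.dfs.core.windows.net/events/"))
```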


🔹 How do you manage metadata across tools (e.g., ADF, Synapse)?

Tools: Azure Purview, Azure Data Catalog
Approach:

  • Use Azure Purview for cataloging and lineage tracking.

  • Automate metadata ingestion from Synapse, SQL, ADLS.


💎 Partitioning & Scalability (Azure-specific)


🔹 How do you decide partitioning strategy for a Delta table?

Tools: Delta Lake on Azure Databricks
Approach:

  • Partition by date or region based on query pattern.

  • Avoid high-cardinality columns.

  • Periodically run OPTIMIZE with Z-ORDER (see the sketch below).
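
A minimal sketch of a date-partitioned Delta write followed by OPTIMIZE with Z-ORDER; table and column names are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.table("bronze.events")   # illustrative source

# Partition by a low-cardinality column that matches the dominant filter (date),
# and Z-ORDER within partitions by a higher-cardinality lookup column.
(df.write.format("delta")
   .mode("overwrite")
   .partitionBy("event_date")
   .saveAsTable("silver.events"))

spark.sql("OPTIMIZE silver.events ZORDER BY (customer_id)")
```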


🔹 What if ingestion from a third-party API fails?

Tools: ADF Web Activity, REST Connector, Logic Apps
Approach:

  • Use the retry policy on the Web activity (or REST-based Copy activity) in ADF.

  • Log failure to Azure Monitor.

  • Add an Until activity or conditional looping for retries (a notebook-side retry wrapper is sketched below).
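
If the call is made from a Databricks notebook rather than an ADF Web activity, a simple retry wrapper with exponential backoff might look like this; the endpoint and retry settings are placeholders.

```python
import time
import requests

def fetch_with_retry(url: str, retries: int = 3, backoff_seconds: int = 30) -> dict:
    """Call a third-party API with retries and exponential backoff."""
    for attempt in range(1, retries + 1):
        try:
            response = requests.get(url, timeout=60)
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            if attempt == retries:
                # Let the notebook (and hence the ADF activity) fail so alerts fire.
                raise
            time.sleep(backoff_seconds * 2 ** (attempt - 1))

data = fetch_with_retry("https://api.example.com/v1/orders")   # placeholder endpoint
```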


🔹 Handle malformed files in ADLS?

Tools: ADF, Databricks
Approach:

  • Enable fault tolerance (skip incompatible rows) on ADF’s Copy activity, or error row handling in Data Flows.

  • Use badRecordsPath in Spark to capture corrupted records (see the sketch below).
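
A sketch of the badRecordsPath option (Databricks-specific) quarantining unparseable rows; the paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# badRecordsPath is Databricks-specific: rows that fail to parse are written as
# JSON files under the given path instead of failing the whole job.
df = (spark.read
      .option("badRecordsPath", "abfss://quarantine@mydatalake.dfs.core.windows.net/orders/")
      .option("header", "true")
      .csv("abfss://landing@mydatalake.dfs.core.windows.net/orders/"))
```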


🔹 Late-arriving data in streaming jobs?

Tools: Azure Event Hubs + Azure Databricks Structured Streaming
Approach:

  • Use watermarks and windowed aggregations.

  • Write late events to a separate table or reprocess them batch-wise (see the streaming sketch below).
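
A sketch of watermarking with a windowed aggregation in Structured Streaming; the Event Hubs connector options are omitted, and the source/sink names, window, and watermark durations are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Assume a streaming source with event_time (timestamp) and amount columns;
# the Event Hubs connector configuration is omitted for brevity.
events_stream = spark.readStream.table("bronze.events_stream")

agg = (events_stream
    .withWatermark("event_time", "30 minutes")             # tolerate 30 min of lateness
    .groupBy(F.window("event_time", "10 minutes"))
    .agg(F.sum("amount").alias("total_amount")))

(agg.writeStream
    .outputMode("append")
    .option("checkpointLocation", "abfss://checkpoints@mydatalake.dfs.core.windows.net/agg/")
    .toTable("silver.windowed_totals"))
```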


💎 Consistency, Security, & Recovery (Azure-specific)


🔹 How do you resolve data consistency issues?

Tools: Azure SQL, Synapse, Data Factory
Approach:

  • Use hash comparison between source and sink (sketched below).

  • Use CDC or snapshots for delta reconciliation.
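
A sketch of the hash comparison in PySpark; the tables and key column are illustrative. Note that concat_ws skips nulls, so strict comparisons may need a coalesce to a sentinel value.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

def with_row_hash(df):
    # Hash every column into one fingerprint per row (column order must match on both sides).
    return df.withColumn(
        "row_hash",
        F.sha2(F.concat_ws("||", *[F.col(c).cast("string") for c in df.columns]), 256),
    )

source = with_row_hash(spark.read.table("staging.orders_source"))   # illustrative
sink = with_row_hash(spark.read.table("dw.orders"))                 # illustrative

# Rows present in the source but missing (or different) in the sink.
drift = source.select("order_id", "row_hash").exceptAll(sink.select("order_id", "row_hash"))
print(drift.count())
```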


🔹 What if your pipeline deletes/overwrites production data?

Approach:

  • Implement soft deletes or backups in Synapse.

  • Version your data with Delta Lake time travel (see the sketch below).

  • Use ADF's global parameters to separate environments and avoid accidental prod writes.
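
A sketch of recovering with Delta Lake time travel; the table name and version number are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Inspect the table history to find the last good version.
spark.sql("DESCRIBE HISTORY silver.orders").show(truncate=False)

# Read the table as of that version to verify the data (42 is a placeholder version)...
good = spark.sql("SELECT * FROM silver.orders VERSION AS OF 42")

# ...and roll the table back in place.
spark.sql("RESTORE TABLE silver.orders TO VERSION AS OF 42")
```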


🔹 How do you secure sensitive data like PAN/Aadhaar?

Tools: Azure Key Vault, Data Masking, Encryption
Approach:

  • Store secrets in Azure Key Vault (e.g., via Key Vault-backed secret scopes in Databricks; see the sketch below).

  • Encrypt data at rest using Azure Storage Encryption.

  • Apply Dynamic Data Masking in SQL/Synapse.
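
A Databricks-side sketch of pulling a secret from a Key Vault-backed secret scope and masking an identifier column before it reaches the analytics layer; the scope, key, table, and column names are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Pull a connection secret from a Key Vault-backed Databricks secret scope
# (dbutils is available in Databricks notebooks; scope/key names are placeholders).
jdbc_password = dbutils.secrets.get(scope="kv-scope", key="sql-password")

# Mask Aadhaar-style values: keep the last 4 characters for display,
# hash the full value so records stay joinable, then drop the raw column.
df = spark.read.table("bronze.customers")
masked = (df
    .withColumn("aadhaar_masked", F.concat(F.lit("XXXX-XXXX-"), F.substring("aadhaar", -4, 4)))
    .withColumn("aadhaar_hash", F.sha2(F.col("aadhaar"), 256))
    .drop("aadhaar"))
```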


🔹 How do you backfill historical data?

Tools: ADF, Databricks, Azure SQL
Approach:

  • Use parameterized pipelines for date ranges.

  • Load in chunks (e.g., monthly).

  • Use IfCondition to skip already-loaded data.

Example:
To backfill 3 years of logs into Synapse, I created a parameterized ADF pipeline that took start_date and end_date and looped over monthly ranges with a ForEach activity.
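
A notebook-side sketch of the same monthly, idempotent backfill pattern; the date range, tables, and column names are placeholders.

```python
from datetime import date
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Date range the ADF pipeline would pass as parameters (placeholders here).
start, end = date(2021, 1, 1), date(2024, 1, 1)

current = start
while current < end:
    # First day of the next month.
    nxt = date(current.year + (current.month // 12), current.month % 12 + 1, 1)
    monthly = (spark.read.table("bronze.logs")   # illustrative source
               .filter(f"log_date >= '{current}' AND log_date < '{nxt}'"))
    # Overwrite only this month's slice so re-running the backfill is idempotent.
    (monthly.write.format("delta")
        .mode("overwrite")
        .option("replaceWhere", f"log_date >= '{current}' AND log_date < '{nxt}'")
        .saveAsTable("silver.logs_history"))
    current = nxt
```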
