💎 ETL Failures & Monitoring (Azure-specific)

🔹 How do you handle job failures in an ETL pipeline?

Tools: Azure Data Factory, Azure Monitor, Log Analytics, Application Insights
Approach:

  • Use ADF's built-in alerts for activity failures (email, Logic Apps, Azure Monitor).

  • Enable diagnostic logging to send logs to Log Analytics.

  • Use ADF’s retry policy to auto-retry transient failures.

  • Use custom error handling with IfCondition, Switch, and Fail activities.

Example:
An ADF copy activity failed due to a schema mismatch. The pipeline’s OnFailure path was configured to log the error to a Log Analytics table and notify the team via an Azure Logic App (email + Teams).
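
To illustrate the custom error-handling bullet above, here is a minimal sketch (not the original pipeline) of a Databricks validation step that fails fast on a schema mismatch so the pipeline's failure path can log and notify; the path, table, and column names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical expected schema for the incoming file (illustrative names/types).
expected_cols = {"order_id": "int", "order_date": "date", "amount": "decimal(18,2)"}

df = spark.read.parquet("abfss://raw@mydatalake.dfs.core.windows.net/orders/")
actual_cols = {f.name: f.dataType.simpleString() for f in df.schema.fields}

missing = set(expected_cols) - set(actual_cols)
mismatched = {c: (expected_cols[c], actual_cols[c])
              for c in expected_cols
              if c in actual_cols and actual_cols[c] != expected_cols[c]}

if missing or mismatched:
    # Raising here fails the notebook activity, which triggers the pipeline's
    # failure path (log to Log Analytics, notify via Logic App).
    raise ValueError(f"Schema validation failed. Missing: {missing}, mismatched: {mismatched}")
```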


🔹 What steps do you take when a pipeline is running slower than expected?

Tools: ADF Integration Runtime, Azure Monitor, Azure Databricks, Synapse
Approach:

  • Analyze ADF activity run times via Monitor tab or exported logs.

  • Check IR (Integration Runtime) performance (self-hosted or Azure-hosted).

  • Optimize Databricks notebooks (caching, right-sized clusters).

  • Partition data in Synapse / Databricks to improve load throughput.

Example:
A pipeline that copied 100 GB to Synapse slowed down. I increased the Data Integration Units (DIUs) on the Azure IR, batched the files, and parallelized the copy using a ForEach activity with concurrency enabled.
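
For the Databricks tuning bullets, a minimal sketch of caching a reused DataFrame and repartitioning before a partitioned write; paths, column names, and the partition count are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder path; tune partition counts to cluster cores and data volume.
df = spark.read.parquet("abfss://bronze@mydatalake.dfs.core.windows.net/sales/")

# Cache only when the same DataFrame feeds multiple downstream actions.
df.cache()
print(df.filter("amount > 0").count())          # first action materializes the cache
print(df.select("region").distinct().count())   # second action reuses it

# Repartition before a wide write so parallelism matches the cluster,
# and partition the output by the column most queries filter on.
(df.repartition(64, "sale_date")
   .write.mode("overwrite")
   .partitionBy("sale_date")
   .parquet("abfss://silver@mydatalake.dfs.core.windows.net/sales/"))
```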


🔹 What if a scheduled job didn't trigger?

Tools: ADF Triggers, Azure Monitor Alerts
Approach:

  • Validate trigger status in ADF (enabled, conditions met).

  • Check last trigger run logs.

  • Set alerts using Azure Monitor to notify if a pipeline hasn’t run on schedule.

Example:
A pipeline with a tumbling window trigger missed its window because the previous run had failed. I configured dependency chaining with dependsOn and added alerting so the first failure is caught early.


🔹 How do you manage data pipeline dependencies?

Tools: ADF dependency conditions, Data Factory pipeline orchestration
Approach:

  • Use ADF’s tumbling window trigger dependencies and activity dependency conditions (Succeeded, Failed, Completed, Skipped).

  • Use Wait, IfCondition, or Until activities to manage flows.

  • Use Azure Event Grid + Logic App for event-driven dependencies.

Example:
A downstream ADF pipeline waits for a file to land in ADLS from an upstream process. I used a storage event (Event Grid) trigger to launch the pipeline on blob creation.


🔹 How do you monitor SLAs for pipelines?

Tools: Azure Monitor, Log Analytics, Application Insights
Approach:

  • Track pipeline execution time using ADF logs in Log Analytics.

  • Create custom Kusto queries to find SLA breaches.

  • Send alerts if a pipeline exceeds expected runtime.

Example:
I created an Azure Monitor workbook that tracks all pipeline runs and highlights those exceeding the SLA. Alerts are sent via email and Teams.
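
A sketch of how such a check could be automated with the azure-monitor-query SDK, assuming diagnostic logs flow to the resource-specific ADFPipelineRun table and a 30-minute SLA; the workspace ID is a placeholder, and the table and column names depend on your diagnostic settings.

```python
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# KQL against the resource-specific ADFPipelineRun table; a 30-minute SLA is assumed.
kql = """
ADFPipelineRun
| where Status in ('Succeeded', 'Failed')
| extend DurationMin = datetime_diff('minute', End, Start)
| where DurationMin > 30
| project PipelineName, RunId, Status, Start, End, DurationMin
"""

response = client.query_workspace(
    "<log-analytics-workspace-id>",   # placeholder workspace ID
    kql,
    timespan=timedelta(days=1),
)

for table in response.tables:
    for row in table.rows:
        print(row)   # each row is an SLA breach to alert on
```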


💎 Data Quality, Validation & Duplicates (Azure-specific)


🔹 How do you automate data validation during ingestion?

Tools: ADF, Azure Data Lake, Azure Databricks, Azure Purview
Approach:

  • Use custom validation notebooks in Azure Databricks.

  • In ADF, use Stored Procedure or Data Flow to validate counts, nulls, types.

  • Use Azure Purview for profiling and monitoring data quality rules.

Example:
After ingestion, I used a Databricks notebook to check for duplicate IDs and nulls in critical columns and pushed the results to a Power BI dashboard.
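
A minimal sketch of that kind of post-ingestion check; table and column names are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.read.table("bronze.customers")   # illustrative table name
total = df.count()
duplicate_ids = total - df.select("customer_id").distinct().count()

# Null counts for the critical columns (illustrative list).
null_counts = df.select([
    F.sum(F.col(c).isNull().cast("int")).alias(c)
    for c in ["customer_id", "email", "created_at"]
]).collect()[0].asDict()

# Persist the metrics so a Power BI dashboard (or alerting) can pick them up.
results = spark.createDataFrame(
    [(total, duplicate_ids, str(null_counts))],
    ["row_count", "duplicate_ids", "null_counts"],
)
results.write.mode("append").saveAsTable("audit.data_quality_metrics")
```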


🔹 What would you do if there’s a mismatch in record counts?

Tools: ADF, SQL Database, Azure Synapse
Approach:

  • Use Lookup and IfCondition in ADF to compare source vs. destination.

  • Trigger alerts or rerun the pipeline if mismatch exceeds threshold.

Example:
In a CDC-based pipeline, I compared source (SQL DB) and sink (Synapse) row counts after ingestion. If mismatch >5%, the pipeline failed intentionally for manual review.
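
A sketch of the count reconciliation from a Databricks notebook, assuming JDBC access to the source SQL DB and a staging table for the sink; connection details and names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Source count via JDBC (placeholder connection settings).
source_count = (spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<server>.database.windows.net;database=<db>")
    .option("dbtable", "dbo.orders")
    .option("user", "<user>").option("password", "<password>")
    .load()
    .count())

sink_count = spark.read.table("synapse_stage.orders").count()   # illustrative sink table

mismatch_pct = abs(source_count - sink_count) / max(source_count, 1) * 100

if mismatch_pct > 5:
    # Failing the notebook fails the ADF activity, flagging the run for manual review.
    raise ValueError(f"Row count mismatch {mismatch_pct:.2f}% "
                     f"(source={source_count}, sink={sink_count})")
```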


🔹 How do you handle duplicates?

Tools: Azure Databricks, Synapse, Data Flow
Approach:

  • Use Data Flow with Surrogate Key and Window functions for deduplication.

  • In Databricks, use dropDuplicates() or ROW_NUMBER() window logic (both sketched below).
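
A minimal PySpark sketch of both options; table and column names are illustrative.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.table("bronze.orders")   # illustrative source

# Option 1: drop exact duplicates on the business key.
deduped = df.dropDuplicates(["order_id"])

# Option 2: keep only the latest record per key using a ROW_NUMBER-style window.
w = Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
latest = (df.withColumn("rn", F.row_number().over(w))
            .filter("rn = 1")
            .drop("rn"))
```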


💎 Performance & Optimization (Azure-specific)


🔹 How do you handle Spark job memory issues in Azure Databricks?

Tools: Azure Databricks
Approach:

  • Monitor jobs via Spark UI on Databricks.

  • Tune cluster configs: memory, cores, autoscaling.

  • Use persist() and broadcast() wisely.

Example:
A job failed with an out-of-memory error during a join. I broadcast the smaller table and repartitioned the larger dataset to resolve the memory issue.
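
A sketch of the broadcast-plus-repartition pattern; the table names and partition count are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

large = spark.read.table("silver.transactions")   # large fact table (illustrative)
small = spark.read.table("silver.dim_customer")   # small dimension table

# Broadcast the small table so the join avoids shuffling the large side;
# repartition the large side to spread work evenly before the join.
joined = (large.repartition(200, "customer_id")
               .join(F.broadcast(small), "customer_id", "left"))
```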


🔹 Skewed data in Spark?

Approach:

  • Apply salting: add a random prefix to keys.

  • Repartition before joins.

Example:
For a large sales table joined on customer_id, 10% of the records shared a single customer ID. I salted the key (hash mod N buckets) and recombined the results after the join.
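
A sketch of the salting pattern; this variant uses rand() for the salt (a hash-mod on a secondary column works the same way), and the salt count and table names are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
SALTS = 16   # number of salt buckets; tune to the degree of skew

sales = spark.read.table("silver.sales")           # skewed on customer_id
customers = spark.read.table("silver.customers")   # one row per customer_id

# Add a salt to the skewed side, and explode the other side across all salt values
# so every salted key still finds its match.
sales_salted = sales.withColumn("salt", (F.rand() * SALTS).cast("int"))
customers_salted = customers.withColumn(
    "salt", F.explode(F.array([F.lit(i) for i in range(SALTS)]))
)

joined = sales_salted.join(customers_salted, ["customer_id", "salt"]).drop("salt")
```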


💎 Schema & Metadata Management (Azure-specific)


🔹 How do you handle schema evolution in ADLS (Parquet/Avro)?

Tools: Azure Data Lake Gen2, Databricks Delta Lake
Approach:

  • Use Delta Lake with schema evolution (mergeSchema in Databricks).

  • Store schemas in Azure Purview or a custom schema registry.

Example:
Schema changes in a Parquet source were handled via Delta Lake’s automatic schema evolution when writing to the Bronze layer.
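
A minimal sketch of a Bronze write with schema evolution enabled; the paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A new Parquet drop may contain extra columns; mergeSchema lets the Delta Bronze
# table absorb them instead of failing the write. Paths are placeholders.
incoming = spark.read.parquet("abfss://landing@mydatalake.dfs.core.windows.net/events/")

(incoming.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("abfss://bronze@mydatalake.dfs.core.windows.net/events/"))
```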


🔹 How do you manage metadata across tools (e.g., ADF, Synapse)?

Tools: Azure Purview, Azure Data Catalog
Approach:

  • Use Azure Purview for cataloging and lineage tracking.

  • Automate metadata ingestion from Synapse, SQL, ADLS.


💎 Partitioning & Scalability (Azure-specific)


🔹 How do you decide partitioning strategy for a Delta table?

Tools: Delta Lake on Azure Databricks
Approach:

  • Partition by date or region based on query pattern.

  • Avoid high-cardinality columns.

  • Periodically run OPTIMIZE with Z-ORDER (see the sketch below).
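
A minimal sketch of a date-partitioned Delta write followed by OPTIMIZE with Z-ORDER; table and column names are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.table("bronze.events")   # illustrative source

# Partition by a low-cardinality column that matches the dominant filter (date),
# and Z-ORDER within partitions by a higher-cardinality lookup column.
(df.write.format("delta")
   .mode("overwrite")
   .partitionBy("event_date")
   .saveAsTable("silver.events"))

spark.sql("OPTIMIZE silver.events ZORDER BY (customer_id)")
```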


🔹 What if ingestion from a third-party API fails?

Tools: ADF Web Activity, REST Connector, Logic Apps
Approach:

  • Use the retry policy on the Web activity (or REST-based Copy activity) in ADF.

  • Log failure to Azure Monitor.

  • Add an Until activity or conditional looping for retries (a notebook-side retry wrapper is sketched below).
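
If the call is made from a Databricks notebook rather than an ADF Web activity, a simple retry wrapper with exponential backoff might look like this; the endpoint and retry settings are placeholders.

```python
import time
import requests

def fetch_with_retry(url: str, retries: int = 3, backoff_seconds: int = 30) -> dict:
    """Call a third-party API with retries and exponential backoff."""
    for attempt in range(1, retries + 1):
        try:
            response = requests.get(url, timeout=60)
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            if attempt == retries:
                # Let the notebook (and hence the ADF activity) fail so alerts fire.
                raise
            time.sleep(backoff_seconds * 2 ** (attempt - 1))

data = fetch_with_retry("https://api.example.com/v1/orders")   # placeholder endpoint
```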


🔹 Handle malformed files in ADLS?

Tools: ADF, Databricks
Approach:

  • Enable fault tolerance (skip incompatible rows) on ADF’s Copy activity, or error row handling in Data Flows.

  • Use badRecordsPath in Spark to capture corrupted records (see the sketch below).
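
A sketch of the badRecordsPath option (Databricks-specific) quarantining unparseable rows; the paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# badRecordsPath is Databricks-specific: rows that fail to parse are written as
# JSON files under the given path instead of failing the whole job.
df = (spark.read
      .option("badRecordsPath", "abfss://quarantine@mydatalake.dfs.core.windows.net/orders/")
      .option("header", "true")
      .csv("abfss://landing@mydatalake.dfs.core.windows.net/orders/"))
```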


🔹 Late-arriving data in streaming jobs?

Tools: Azure Event Hubs + Azure Databricks Structured Streaming
Approach:

  • Use watermarks and windowed aggregations.

  • Write late events to a separate table or reprocess them batch-wise (see the streaming sketch below).
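
A sketch of watermarking with a windowed aggregation in Structured Streaming; the Event Hubs connector options are omitted, and the source/sink names, window, and watermark durations are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Assume a streaming source with event_time (timestamp) and amount columns;
# the Event Hubs connector configuration is omitted for brevity.
events_stream = spark.readStream.table("bronze.events_stream")

agg = (events_stream
    .withWatermark("event_time", "30 minutes")             # tolerate 30 min of lateness
    .groupBy(F.window("event_time", "10 minutes"))
    .agg(F.sum("amount").alias("total_amount")))

(agg.writeStream
    .outputMode("append")
    .option("checkpointLocation", "abfss://checkpoints@mydatalake.dfs.core.windows.net/agg/")
    .toTable("silver.windowed_totals"))
```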


💎 Consistency, Security, & Recovery (Azure-specific)


🔹 How do you resolve data consistency issues?

Tools: Azure SQL, Synapse, Data Factory
Approach:

  • Use hash comparison between source and sink (sketched below).

  • Use CDC or snapshots for delta reconciliation.
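
A sketch of the hash comparison in PySpark; the tables and key column are illustrative. Note that concat_ws skips nulls, so strict comparisons may need a coalesce to a sentinel value.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

def with_row_hash(df):
    # Hash every column into one fingerprint per row (column order must match on both sides).
    return df.withColumn(
        "row_hash",
        F.sha2(F.concat_ws("||", *[F.col(c).cast("string") for c in df.columns]), 256),
    )

source = with_row_hash(spark.read.table("staging.orders_source"))   # illustrative
sink = with_row_hash(spark.read.table("dw.orders"))                 # illustrative

# Rows present in the source but missing (or different) in the sink.
drift = source.select("order_id", "row_hash").exceptAll(sink.select("order_id", "row_hash"))
print(drift.count())
```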


🔹 What if your pipeline deletes/overwrites production data?

Approach:

  • Implement soft deletes or backups in Synapse.

  • Version your data with Delta Lake time travel (see the sketch below).

  • Use ADF's global parameters to separate environments and avoid accidental prod writes.
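
A sketch of recovering with Delta Lake time travel; the table name and version number are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Inspect the table history to find the last good version.
spark.sql("DESCRIBE HISTORY silver.orders").show(truncate=False)

# Read the table as of that version to verify the data (42 is a placeholder version)...
good = spark.sql("SELECT * FROM silver.orders VERSION AS OF 42")

# ...and roll the table back in place.
spark.sql("RESTORE TABLE silver.orders TO VERSION AS OF 42")
```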


🔹 How do you secure sensitive data like PAN/Aadhaar?

Tools: Azure Key Vault, Data Masking, Encryption
Approach:

  • Store secrets in Azure Key Vault (e.g., via Key Vault-backed secret scopes in Databricks; see the sketch below).

  • Encrypt data at rest using Azure Storage Encryption.

  • Apply Dynamic Data Masking in SQL/Synapse.
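
A Databricks-side sketch of pulling a secret from a Key Vault-backed secret scope and masking an identifier column before it reaches the analytics layer; the scope, key, table, and column names are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Pull a connection secret from a Key Vault-backed Databricks secret scope
# (dbutils is available in Databricks notebooks; scope/key names are placeholders).
jdbc_password = dbutils.secrets.get(scope="kv-scope", key="sql-password")

# Mask Aadhaar-style values: keep the last 4 characters for display,
# hash the full value so records stay joinable, then drop the raw column.
df = spark.read.table("bronze.customers")
masked = (df
    .withColumn("aadhaar_masked", F.concat(F.lit("XXXX-XXXX-"), F.substring("aadhaar", -4, 4)))
    .withColumn("aadhaar_hash", F.sha2(F.col("aadhaar"), 256))
    .drop("aadhaar"))
```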


🔹 How do you backfill historical data?

Tools: ADF, Databricks, Azure SQL
Approach:

  • Use parameterized pipelines for date ranges.

  • Load in chunks (e.g., monthly).

  • Use IfCondition to skip already-loaded data.

Example:
To backfill 3 years of logs into Synapse, I created a parameterized ADF pipeline that took start_date and end_date and looped over monthly ranges with a ForEach activity.
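
A notebook-side sketch of the same monthly, idempotent backfill pattern; the date range, tables, and column names are placeholders.

```python
from datetime import date
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Date range the ADF pipeline would pass as parameters (placeholders here).
start, end = date(2021, 1, 1), date(2024, 1, 1)

current = start
while current < end:
    # First day of the next month.
    nxt = date(current.year + (current.month // 12), current.month % 12 + 1, 1)
    monthly = (spark.read.table("bronze.logs")   # illustrative source
               .filter(f"log_date >= '{current}' AND log_date < '{nxt}'"))
    # Overwrite only this month's slice so re-running the backfill is idempotent.
    (monthly.write.format("delta")
        .mode("overwrite")
        .option("replaceWhere", f"log_date >= '{current}' AND log_date < '{nxt}'")
        .saveAsTable("silver.logs_history"))
    current = nxt
```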
