Must-Practice Azure Data Factory (ADF) Scenarios for 2025 Interviews

 Real-world examples, optimal solutions, and expert insights for Azure Data Engineer candidates.


🔁 1. Dynamic Pipelines

Q: How would you design a single pipeline to process multiple country files from different folders in ADLS?

Answer:
Use parameterization and a ForEach activity to loop through country folders dynamically. Use Get Metadata and Lookup to fetch file names and the folder structure, then pass those values into a single, generic Copy Activity.

Example:
Imagine folders like /data/US/, /data/IN/, /data/UK/—all with daily CSVs.

  • Create a parameter for CountryName.

  • Use a Lookup to fetch the list of countries (e.g., from a control table or config file).

  • Use ForEach to iterate over that list and process files with a parameterized dataset, passing the current country into the folder path (e.g., @item().CountryName inside the loop, or @pipeline().parameters.CountryName when the pipeline is invoked once per country), as sketched below.
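
A minimal sketch of the ForEach wiring, assuming the Lookup is named LookupCountries and returns rows with a CountryName column, and that a source dataset GenericCountryCsv exposes a CountryName parameter used in its folder path (all names here are illustrative):

json

{
  "name": "ForEachCountry",
  "type": "ForEach",
  "dependsOn": [ { "activity": "LookupCountries", "dependencyConditions": [ "Succeeded" ] } ],
  "typeProperties": {
    "items": { "value": "@activity('LookupCountries').output.value", "type": "Expression" },
    "activities": [
      {
        "name": "CopyCountryFiles",
        "type": "Copy",
        "inputs": [
          {
            "referenceName": "GenericCountryCsv",
            "type": "DatasetReference",
            "parameters": { "CountryName": "@item().CountryName" }
          }
        ],
        "outputs": [ { "referenceName": "CuratedParquet", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "ParquetSink" }
        }
      }
    ]
  }
}

Inside the dataset, the folder path would be an expression such as data/@{dataset().CountryName}/ so one dataset serves every country.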


 2. Schema Drift Handling

Q: What if incoming files have columns frequently added or removed?

Answer:
Enable schema drift (“Allow schema drift”) on the source and sink transformations in ADF’s mapping data flows, and rely on auto-mapping in the sink so structural changes don’t break the pipeline.

Example:
Ingest dynamic CSVs where some days include optional fields like “PromoCode” or “Discount”.

  • In your Source transformation, enable schema drift.

  • Use Derived Column or Select transformations (with rule-based mapping) to shape the known fields and let drifted columns pass through.

  • The sink can auto-map, or write to a staging table with a flexible schema (a JSON column or a wide table); a trimmed data flow definition follows.
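
A trimmed data flow definition showing where schema drift is enabled; the dataset and stream names (DynamicCsv, StagingTable, srcCsv, snkStage) are illustrative:

json

{
  "name": "IngestDynamicCsvFlow",
  "properties": {
    "type": "MappingDataFlow",
    "typeProperties": {
      "sources": [ { "name": "srcCsv", "dataset": { "referenceName": "DynamicCsv", "type": "DatasetReference" } } ],
      "sinks": [ { "name": "snkStage", "dataset": { "referenceName": "StagingTable", "type": "DatasetReference" } } ],
      "scriptLines": [
        "source(allowSchemaDrift: true,",
        "    validateSchema: false) ~> srcCsv",
        "srcCsv sink(allowSchemaDrift: true,",
        "    skipDuplicateMapInputs: true,",
        "    skipDuplicateMapOutputs: true) ~> snkStage"
      ]
    }
  }
}

Drifted columns flow through by name; if a Derived Column needs to touch one explicitly, byName('PromoCode') is the usual pattern.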


 3. Incremental Load Design

Q: How do you load only the new records daily without reprocessing everything?

Answer:
Use Watermark Columns or Change Tracking logic. Store the last processed timestamp and query only newer records.

Example:
If a SQL source table has a LastModifiedDate column:

  • Store the last max timestamp in a metadata table or ADF variable.

  • Use a parameterized query:

sql

SELECT * FROM Orders WHERE LastModifiedDate > @lastWatermark

  • Update the watermark value post-load (a sketch of the parameterized Copy source follows).
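
A sketch of the Copy activity source using that query, assuming a preceding Lookup named LookupOldWatermark that returns the stored value in a WatermarkValue column (names are illustrative):

json

{
  "name": "CopyNewOrders",
  "type": "Copy",
  "inputs": [ { "referenceName": "OrdersSource", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "OrdersStaging", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": {
        "value": "SELECT * FROM Orders WHERE LastModifiedDate > '@{activity('LookupOldWatermark').output.firstRow.WatermarkValue}'",
        "type": "Expression"
      }
    },
    "sink": { "type": "ParquetSink" }
  }
}

After the copy succeeds, a Stored Procedure (or Script) activity writes the new maximum LastModifiedDate back to the metadata table so the next run picks up from there.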


 4. Retry Mechanism

Q: One activity (e.g., API call) fails sometimes. How do you make it retry automatically?

Answer:
Use the built-in retry policy available on ADF activities: set a custom retry count and retry interval.

Example:
If calling a third-party REST API that’s unstable:

  • Set Retry = 3 and Retry interval = 60 seconds (retryIntervalInSeconds in the JSON) in the Web activity settings, as in the snippet below.

  • Add a failure alert via Webhook or Email if all retries fail.
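
A minimal Web activity definition with the retry policy applied; the URL is a placeholder:

json

{
  "name": "CallPartnerApi",
  "type": "WebActivity",
  "policy": {
    "retry": 3,
    "retryIntervalInSeconds": 60,
    "timeout": "0.00:10:00"
  },
  "typeProperties": {
    "url": "https://api.example.com/v1/orders",
    "method": "GET"
  }
}

On the activity’s Failure path, chain a Web/Webhook call (for example to a Logic App) or an email notification so exhausted retries are surfaced rather than silently logged.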


 5. Pipeline Reusability

Q: You’ve built 10 pipelines with similar logic. How would you modularize this?

Answer:
Use ADF pipeline templates, or call a shared pipeline through the Execute Pipeline activity with parameters.

Example:
If multiple pipelines ingest files from different sources but follow the same transformation logic:

  • Put the shared logic in one generic child pipeline with parameters like SourcePath and TableName.

  • A master (orchestrator) pipeline calls it through Execute Pipeline, passing different parameter values per source (see the snippet below).

  • This reduces maintenance and code duplication.
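
A sketch of the master pipeline calling the shared child via Execute Pipeline; the pipeline name and parameter values are illustrative:

json

{
  "name": "RunGenericIngest_Sales",
  "type": "ExecutePipeline",
  "typeProperties": {
    "pipeline": { "referenceName": "PL_Generic_Ingest", "type": "PipelineReference" },
    "parameters": {
      "SourcePath": "landing/sales/",
      "TableName": "dbo.Sales"
    },
    "waitOnCompletion": true
  }
}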


 6. Metadata-Driven Pipelines

Q: How would you load 50+ tables using metadata instead of hardcoding everything?

Answer:
Build a control table in Azure SQL DB or a config file in ADLS with metadata like table name, source path, column mapping, etc. Use Lookup + ForEach to drive pipeline logic.

Example:

  • Your config table could have: TableName, SourcePath, SinkTable, PrimaryKey, IncrementalColumn.

  • A single pipeline reads the metadata and processes every table using dynamic datasets and parameterized queries, as sketched below.
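
A sketch of the two driving entries from the pipeline’s activities array, assuming the control table is dbo.ETLControl in Azure SQL DB (table, column, and dataset names are illustrative); the Copy inside the loop, omitted here, would reference @item().SourcePath, @item().SinkTable, and so on:

json

[
  {
    "name": "LookupTableConfig",
    "type": "Lookup",
    "typeProperties": {
      "source": {
        "type": "AzureSqlSource",
        "sqlReaderQuery": "SELECT TableName, SourcePath, SinkTable, PrimaryKey, IncrementalColumn FROM dbo.ETLControl"
      },
      "dataset": { "referenceName": "ControlDb", "type": "DatasetReference" },
      "firstRowOnly": false
    }
  },
  {
    "name": "ForEachTable",
    "type": "ForEach",
    "dependsOn": [ { "activity": "LookupTableConfig", "dependencyConditions": [ "Succeeded" ] } ],
    "typeProperties": {
      "items": { "value": "@activity('LookupTableConfig').output.value", "type": "Expression" },
      "activities": []
    }
  }
]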


7. File Validation Before Load

Q: What if a file is dropped, but it’s incomplete or empty?

Answer:
Implement a Validation Step using Get Metadata, If Condition, and possibly a checksum check.

Example:

  • Use Get Metadata to check file size. If size = 0, skip processing or trigger an alert.

  • For partial loads, include a control file with a record count or hash.

  • Validate before ingestion using expressions like:

json

@if(greater(activity('Get Metadata').output.size, 0), true, false)
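
The expression assumes a Get Metadata activity literally named Get Metadata. Its definition and the If Condition it feeds could look like the two activity entries below (the dataset name is illustrative); note the If Condition only needs a boolean, so @greater(...) on its own is equivalent to the @if(...) wrapper above:

json

[
  {
    "name": "Get Metadata",
    "type": "GetMetadata",
    "typeProperties": {
      "dataset": { "referenceName": "LandingFile", "type": "DatasetReference" },
      "fieldList": [ "exists", "size" ]
    }
  },
  {
    "name": "IfFileHasData",
    "type": "IfCondition",
    "dependsOn": [ { "activity": "Get Metadata", "dependencyConditions": [ "Succeeded" ] } ],
    "typeProperties": {
      "expression": { "value": "@greater(activity('Get Metadata').output.size, 0)", "type": "Expression" },
      "ifTrueActivities": [],
      "ifFalseActivities": []
    }
  }
]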

 8. Trigger Timing Issue

Q: What if your pipeline runs at midnight but the source file isn’t ready until 2AM?

Answer:
Use Event-based triggers or implement a Wait-and-Poll pattern with timeout logic.

Example:

  • Use an Until loop to check every 10 minutes whether the file exists, using a Get Metadata activity with the exists field.

  • Set the Until activity’s timeout (for example, 3 hours) and put a Wait activity between checks so the loop can’t poll indefinitely; a sketch of this pattern follows the list.

  • Alternatively, configure a Blob storage event trigger to run the pipeline as soon as the file lands.
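
A sketch of the wait-and-poll pattern with an Until activity; the dataset name ExpectedSourceFile and the 3-hour timeout are illustrative:

json

{
  "name": "WaitForSourceFile",
  "type": "Until",
  "typeProperties": {
    "expression": { "value": "@activity('CheckFile').output.exists", "type": "Expression" },
    "timeout": "03:00:00",
    "activities": [
      {
        "name": "CheckFile",
        "type": "GetMetadata",
        "typeProperties": {
          "dataset": { "referenceName": "ExpectedSourceFile", "type": "DatasetReference" },
          "fieldList": [ "exists" ]
        }
      },
      {
        "name": "WaitTenMinutes",
        "type": "Wait",
        "dependsOn": [ { "activity": "CheckFile", "dependencyConditions": [ "Completed" ] } ],
        "typeProperties": { "waitTimeInSeconds": 600 }
      }
    ]
  }
}

An event trigger avoids the polling cost entirely; the Until pattern is mainly useful when event triggers aren’t available for the source.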


 9. Secure Access

Q: How do you securely store and access credentials for ADF pipelines?

Answer:
Use Azure Key Vault to store secrets (e.g., passwords, API keys, connection strings). Link Key Vault to ADF via managed identity.

Example:

  • Create a Key Vault secret like SQLPassword.

  • In the ADF linked service, choose Azure Key Vault for the password/secret field instead of entering the value inline.

  • In the linked service JSON, the secret is referenced as an AzureKeyVaultSecret (a Key Vault store reference plus a secretName) rather than stored in the definition, so it is resolved at runtime, as in the sketch below.
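
A sketch of an Azure SQL linked service whose password comes from the SQLPassword secret; server, database, and linked service names are illustrative:

json

{
  "name": "AzureSqlDb_LS",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:myserver.database.windows.net,1433;Database=SalesDb;User ID=etl_user",
      "password": {
        "type": "AzureKeyVaultSecret",
        "store": { "referenceName": "AzureKeyVault_LS", "type": "LinkedServiceReference" },
        "secretName": "SQLPassword"
      }
    }
  }
}

ADF’s managed identity needs permission to read secrets in the vault (a Get/List access policy or the Key Vault Secrets User role).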


 10. Post-Load Cleanup

Q: Once the file is loaded, how do you delete or archive it in ADLS?

Answer:
Use the Delete activity for cleanup. ADF has no standalone Move activity, so archiving a file means copying it to the archive path and then deleting the original.

Example:

  • After the main Copy activity, copy the file from /landing/ to /archive/yyyy/MM/dd/ (the dated path can be built with formatDateTime(utcnow(), 'yyyy/MM/dd')), then delete the original from /landing/ (see the sketch after this list).

  • If periodic cleanup is needed, use a Delete activity to remove files older than X days, driven by Get Metadata + Filter.
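
A sketch of the Delete step that removes the landing copy once the archive copy succeeds; ArchiveCopy and LandingFile are illustrative names:

json

{
  "name": "DeleteLandingFile",
  "type": "Delete",
  "dependsOn": [ { "activity": "ArchiveCopy", "dependencyConditions": [ "Succeeded" ] } ],
  "typeProperties": {
    "dataset": { "referenceName": "LandingFile", "type": "DatasetReference" },
    "enableLogging": false
  }
}

Alternatively, a binary Copy activity can be configured to delete source files after completion, which collapses the move into a single activity.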


Pro Tip for 2025 Interviews
Recruiters want to see clarity, adaptability, and problem-solving. Be ready to whiteboard your logic, explain trade-offs, and connect your design to cost-efficiency and monitoring.
