Must-Practice Azure Data Factory (ADF) Scenarios for 2025 Interviews
Real-world examples, optimal solutions, and expert insights for Azure Data Engineer candidates.
🔁 1. Dynamic Pipelines
Q: How would you design a single pipeline to process multiple country files from different folders in ADLS?
Answer:
Use parameterization and ForEach activity to loop through country folders dynamically. Leverage Get Metadata and Lookup to fetch file names and directory structure, then pass those values into a single generic Copy Activity.
Example:
Imagine folders like /data/US/, /data/IN/, /data/UK/—all with daily CSVs.
- Create a parameter for CountryName.
- Use a Lookup to fetch the list of countries (e.g., from a control table or config file).
- Use ForEach to iterate and process files using a dataset parameterized by @pipeline().parameters.CountryName.
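As a rough sketch, the wiring might look like the fragment below (the dataset names GenericCountryCsv and StagingTable and the lookup activity name LookupCountries are placeholders, and only the relevant properties are shown):

```json
{
  "name": "ForEachCountry",
  "type": "ForEach",
  "description": "Iterate over the country list returned by the Lookup activity",
  "typeProperties": {
    "items": { "value": "@activity('LookupCountries').output.value", "type": "Expression" },
    "isSequential": false,
    "activities": [
      {
        "name": "CopyCountryFiles",
        "type": "Copy",
        "inputs": [
          {
            "referenceName": "GenericCountryCsv",
            "type": "DatasetReference",
            "parameters": { "CountryName": "@item().CountryName" }
          }
        ],
        "outputs": [
          { "referenceName": "StagingTable", "type": "DatasetReference" }
        ]
      }
    ]
  }
}
```

Inside the ForEach, @item() carries the current country; if the copy is delegated to a child pipeline instead, that value arrives there as @pipeline().parameters.CountryName. The GenericCountryCsv dataset would then use its CountryName parameter in the folder path, e.g. data/@{dataset().CountryName}/.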
2. Schema Drift Handling
Q: What if incoming files have columns frequently added or removed?
Answer:
Enable Schema Drift in ADF’s mapping data flows. Use the Auto-Mapping feature to accommodate structural changes without breaking pipelines.
Example:
Ingest dynamic CSVs where some days include optional fields like “PromoCode” or “Discount”.
- In your Source transformation, enable schema drift.
- Use derived columns or select() to manipulate known fields and ignore unknowns.
- The Sink can be set to auto-map or to write to a staging table with a flexible schema (like a JSON column or wide table).
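In mapping data flow script form, a drift-tolerant flow might look roughly like the sketch below (the stream names RawCsvSource, AddFlags, and StagingSink are illustrative; the derived column simply flags whether the optional PromoCode field arrived that day):

```
source(allowSchemaDrift: true,
    validateSchema: false) ~> RawCsvSource
RawCsvSource derive(HasPromoCode = !isNull(byName('PromoCode'))) ~> AddFlags
AddFlags sink(allowSchemaDrift: true,
    validateSchema: false,
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true) ~> StagingSink
```

Because both source and sink allow drift, columns that appear or disappear flow through without breaking the pipeline, while byName() lets you reference optional columns safely.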
3. Incremental Load Design
Q: How do you load only the new records daily without reprocessing everything?
Answer:
Use Watermark Columns or Change Tracking logic. Store the last processed timestamp and query only newer records.
Example:
If a SQL source table has a LastModifiedDate column:
- Store the last max timestamp in a metadata table or ADF variable.
- Use a parameterized source query that filters on the watermark (see the sketch after this list).
- Update the watermark value post-load.
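A minimal sketch of the Copy activity source, assuming two Lookup activities (hypothetically named LookupOldWatermark and LookupNewWatermark) that return the stored watermark and the current max LastModifiedDate, and a hypothetical dbo.Orders source table:

```json
{
  "source": {
    "type": "AzureSqlSource",
    "sqlReaderQuery": {
      "value": "@concat('SELECT * FROM dbo.Orders WHERE LastModifiedDate > ''', activity('LookupOldWatermark').output.firstRow.WatermarkValue, ''' AND LastModifiedDate <= ''', activity('LookupNewWatermark').output.firstRow.NewWatermark, '''')",
      "type": "Expression"
    }
  }
}
```

After the copy succeeds, a Stored Procedure activity (or a simple UPDATE against the metadata table) writes the new watermark back, which is the "update the watermark value post-load" step.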
4. Retry Mechanism
Q: One activity (e.g., API call) fails sometimes. How do you make it retry automatically?
Answer:
Use the built-in Retry Policy in ADF activities. Set custom retry count and interval.
Example:
If calling a third-party REST API that’s unstable:
- Set Retry = 3 and Retry interval = 00:01:00 in the Web activity settings.
- Add a failure alert via Webhook or Email if all retries fail.
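In the activity JSON, the retry policy sits alongside the call itself; a minimal sketch (the URL is a placeholder):

```json
{
  "name": "CallPartnerApi",
  "type": "WebActivity",
  "policy": {
    "retry": 3,
    "retryIntervalInSeconds": 60,
    "timeout": "0.00:10:00"
  },
  "typeProperties": {
    "url": "https://api.example.com/v1/orders",
    "method": "GET"
  }
}
```

If all three retries fail, the activity's failure dependency path can feed a Web/Webhook activity or a Logic App that raises the alert.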
5. Pipeline Reusability
Q: You’ve built 10 pipelines with similar logic. How would you modularize this?
Answer:
Use ADF Pipeline Templates or Invoke Pipeline (Execute Pipeline) with parameters.
Example:
If multiple pipelines ingest files from different sources but follow the same transformation logic:
- Create a Master Pipeline with parameters like SourcePath and TableName.
- Each child pipeline becomes a configurable, reusable module.
- This reduces maintenance and code duplication.
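A sketch of how the master pipeline might invoke a shared module via Execute Pipeline (the child pipeline name pl_generic_ingest is a placeholder):

```json
{
  "name": "RunGenericIngest",
  "type": "ExecutePipeline",
  "typeProperties": {
    "pipeline": { "referenceName": "pl_generic_ingest", "type": "PipelineReference" },
    "parameters": {
      "SourcePath": "@pipeline().parameters.SourcePath",
      "TableName": "@pipeline().parameters.TableName"
    },
    "waitOnCompletion": true
  }
}
```

Setting waitOnCompletion to true makes the master pipeline block until the module finishes, so downstream steps can depend on its outcome.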
6. Metadata-Driven Pipelines
Q: How would you load 50+ tables using metadata instead of hardcoding everything?
Answer:
Build a control table in Azure SQL DB or a config file in ADLS with metadata like table name, source path, column mapping, etc. Use Lookup + ForEach to drive pipeline logic.
Example:
- Your config table could have: TableName, SourcePath, SinkTable, PrimaryKey, IncrementalColumn.
- A single pipeline reads the metadata and processes tables using dynamic datasets and parameterized queries.
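A sketch of the driving Lookup (the control table etl.ControlTable and dataset ControlDb are placeholders); the ForEach then iterates over @activity('LookupTableConfig').output.value:

```json
{
  "name": "LookupTableConfig",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": "SELECT TableName, SourcePath, SinkTable, PrimaryKey, IncrementalColumn FROM etl.ControlTable WHERE IsEnabled = 1"
    },
    "dataset": { "referenceName": "ControlDb", "type": "DatasetReference" },
    "firstRowOnly": false
  }
}
```

Inside the ForEach, each column is then available as @item().TableName, @item().SourcePath, and so on, feeding parameterized datasets and queries.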
7. File Validation Before Load
Q: What if a file is dropped, but it’s incomplete or empty?
Answer:
Implement a Validation Step using Get Metadata, If Condition, and possibly a checksum check.
Example:
- Use Get Metadata to check file size. If the size is 0, skip processing or trigger an alert.
- For partial loads, include a control file with a record count or hash.
- Validate before ingestion using an expression such as the one sketched below.
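For instance, an If Condition guarding the load might use an expression like this (the Get Metadata activity is assumed to be named GetFileMetadata and to request the Size field):

```json
{
  "name": "IfFileHasData",
  "type": "IfCondition",
  "typeProperties": {
    "expression": {
      "value": "@greater(activity('GetFileMetadata').output.size, 0)",
      "type": "Expression"
    }
  }
}
```

The true branch runs the Copy; the false branch raises the alert (branch activities omitted here for brevity).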
8. Trigger Timing Issue
Q: What if your pipeline runs at midnight but the source file isn’t ready until 2AM?
Answer:
Use Event-based triggers or implement a Wait-and-Poll pattern with timeout logic.
Example:
- Use an Until loop to check every 10 minutes whether the file exists, using Get Metadata.
- Pace each iteration with a Wait activity and set the Until timeout to 3 hours to avoid indefinite looping.
- Alternatively, configure a blob storage event trigger to run the pipeline as soon as the file lands.
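A sketch of the wait-and-poll pattern (the dataset SourceFile is a placeholder; the Until exits as soon as Get Metadata reports the file exists, or after the 3-hour timeout):

```json
{
  "name": "WaitForSourceFile",
  "type": "Until",
  "typeProperties": {
    "expression": {
      "value": "@activity('CheckFileExists').output.exists",
      "type": "Expression"
    },
    "timeout": "0.03:00:00",
    "activities": [
      {
        "name": "CheckFileExists",
        "type": "GetMetadata",
        "typeProperties": {
          "dataset": { "referenceName": "SourceFile", "type": "DatasetReference" },
          "fieldList": [ "exists" ]
        }
      },
      {
        "name": "WaitTenMinutes",
        "type": "Wait",
        "typeProperties": { "waitTimeInSeconds": 600 },
        "dependsOn": [
          { "activity": "CheckFileExists", "dependencyConditions": [ "Succeeded" ] }
        ]
      }
    ]
  }
}
```

The event-trigger route avoids polling entirely and is usually cheaper, but the Until pattern is useful when the storage account or trigger setup is outside your control.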
9. Secure Access
Q: How do you securely store and access credentials for ADF pipelines?
Answer:
Use Azure Key Vault to store secrets (e.g., passwords, API keys, connection strings). Link Key Vault to ADF via managed identity.
Example:
- Create a Key Vault secret such as SQLPassword.
- In the ADF Linked Service, choose Azure Key Vault (instead of an inline value) for the credential.
- The secret is then resolved at runtime through the Key Vault linked service reference, so the password is never stored in ADF itself.
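In the linked service JSON this ends up looking roughly like the fragment below (server, database, and linked service names are placeholders):

```json
{
  "name": "AzureSqlDb",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:myserver.database.windows.net,1433;Database=SalesDb;User ID=etl_user;",
      "password": {
        "type": "AzureKeyVaultSecret",
        "store": { "referenceName": "AzureKeyVaultLS", "type": "LinkedServiceReference" },
        "secretName": "SQLPassword"
      }
    }
  }
}
```

The AzureKeyVaultLS linked service itself authenticates with ADF's managed identity, which needs Get permission on secrets in the vault's access policy (or the equivalent RBAC role).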
10. Post-Load Cleanup
Q: Once the file is loaded, how do you delete or archive it in ADLS?
Answer:
Use the Delete activity, and archive by copying the file to an archive path before deleting the original. ADF has no native Move activity; the "Move files" solution template chains Copy + Delete for exactly this purpose.
Example:
- After the Copy Activity, copy the file from /landing/ to /archive/yyyy/MM/dd/ and then delete it from /landing/.
- If periodic cleanup is needed, use the Delete activity to remove files older than X days, driven by Get Metadata + Filter.
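A sketch of the cleanup step, assuming a parameterized LandingFile dataset pointing at the file that was just copied and a hypothetical CopyToArchive activity that wrote it to the archive path:

```json
{
  "name": "DeleteProcessedFile",
  "type": "Delete",
  "description": "Remove the source file from /landing/ once the archive copy has succeeded",
  "typeProperties": {
    "dataset": { "referenceName": "LandingFile", "type": "DatasetReference" },
    "enableLogging": false
  },
  "dependsOn": [
    { "activity": "CopyToArchive", "dependencyConditions": [ "Succeeded" ] }
  ]
}
```

The archive sink's folder path can be built with an expression such as @concat('archive/', formatDateTime(utcnow(), 'yyyy/MM/dd')) to produce the /archive/yyyy/MM/dd/ layout.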
Pro Tip for 2025 Interviews
Recruiters want to see clarity, adaptability, and problem-solving. Be ready to whiteboard your logic, explain trade-offs, and connect your design to cost-efficiency and monitoring.