Must-Practice Azure Data Factory (ADF) Scenarios for 2025 Interviews
Real-world examples, optimal solutions, and expert insights for Azure Data Engineer candidates.
🔁 1. Dynamic Pipelines
Q: How would you design a single pipeline to process multiple country files from different folders in ADLS?
Answer:
Use parameterization and ForEach activity to loop through country folders dynamically. Leverage Get Metadata and Lookup to fetch file names and directory structure, then pass those values into a single generic Copy Activity.
Example:
Imagine folders like /data/US/, /data/IN/, /data/UK/—all with daily CSVs.
- Create a parameter for CountryName.
- Use a Lookup to fetch the list of countries (e.g., from a control table or config file).
- Use ForEach to iterate and process files using a dataset parameterized by @pipeline().parameters.CountryName.
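As a rough sketch, the wiring might look like the fragment below (the dataset names GenericCountryCsv and StagingTable and the lookup activity name LookupCountries are placeholders, and only the relevant properties are shown):

```json
{
  "name": "ForEachCountry",
  "type": "ForEach",
  "description": "Iterate over the country list returned by the Lookup activity",
  "typeProperties": {
    "items": { "value": "@activity('LookupCountries').output.value", "type": "Expression" },
    "isSequential": false,
    "activities": [
      {
        "name": "CopyCountryFiles",
        "type": "Copy",
        "inputs": [
          {
            "referenceName": "GenericCountryCsv",
            "type": "DatasetReference",
            "parameters": { "CountryName": "@item().CountryName" }
          }
        ],
        "outputs": [
          { "referenceName": "StagingTable", "type": "DatasetReference" }
        ]
      }
    ]
  }
}
```

Inside the ForEach, @item() carries the current country; if the copy is delegated to a child pipeline instead, that value arrives there as @pipeline().parameters.CountryName. The GenericCountryCsv dataset would then use its CountryName parameter in the folder path, e.g. data/@{dataset().CountryName}/.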
2. Schema Drift Handling
Q: What if incoming files have columns frequently added or removed?
Answer:
Enable Schema Drift in ADF’s mapping data flows. Use the Auto-Mapping feature to accommodate structural changes without breaking pipelines.
Example:
Ingest dynamic CSVs where some days include optional fields like “PromoCode” or “Discount”.
- In your Source transformation, enable schema drift.
- Use derived columns or select() to manipulate known fields and ignore unknowns.
- The Sink can be set to auto-map or to write to a staging table with a flexible schema (like a JSON column or wide table).
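In mapping data flow script form, a drift-tolerant flow might look roughly like the sketch below (the stream names RawCsvSource, AddFlags, and StagingSink are illustrative; the derived column simply flags whether the optional PromoCode field arrived that day):

```
source(allowSchemaDrift: true,
    validateSchema: false) ~> RawCsvSource
RawCsvSource derive(HasPromoCode = !isNull(byName('PromoCode'))) ~> AddFlags
AddFlags sink(allowSchemaDrift: true,
    validateSchema: false,
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true) ~> StagingSink
```

Because both source and sink allow drift, columns that appear or disappear flow through without breaking the pipeline, while byName() lets you reference optional columns safely.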
3. Incremental Load Design
Q: How do you load only the new records daily without reprocessing everything?
Answer:
Use Watermark Columns or Change Tracking logic. Store the last processed timestamp and query only newer records.
Example:
If a SQL source table has a LastModifiedDate column:
- Store the last max timestamp in a metadata table or ADF variable.
- Use a parameterized source query that filters on the watermark (see the sketch after this list).
- Update the watermark value post-load.
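A minimal sketch of the Copy activity source, assuming two Lookup activities (hypothetically named LookupOldWatermark and LookupNewWatermark) that return the stored watermark and the current max LastModifiedDate, and a hypothetical dbo.Orders source table:

```json
{
  "source": {
    "type": "AzureSqlSource",
    "sqlReaderQuery": {
      "value": "@concat('SELECT * FROM dbo.Orders WHERE LastModifiedDate > ''', activity('LookupOldWatermark').output.firstRow.WatermarkValue, ''' AND LastModifiedDate <= ''', activity('LookupNewWatermark').output.firstRow.NewWatermark, '''')",
      "type": "Expression"
    }
  }
}
```

After the copy succeeds, a Stored Procedure activity (or a simple UPDATE against the metadata table) writes the new watermark back, which is the "update the watermark value post-load" step.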
4. Retry Mechanism
Q: One activity (e.g., API call) fails sometimes. How do you make it retry automatically?
Answer:
Use the built-in Retry Policy in ADF activities. Set custom retry count and interval.
Example:
If calling a third-party REST API that’s unstable:
- Set Retry = 3 and Retry interval = 00:01:00 in the Web activity settings.
- Add a failure alert via Webhook or Email if all retries fail.
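In the activity JSON, the retry policy sits alongside the call itself; a minimal sketch (the URL is a placeholder):

```json
{
  "name": "CallPartnerApi",
  "type": "WebActivity",
  "policy": {
    "retry": 3,
    "retryIntervalInSeconds": 60,
    "timeout": "0.00:10:00"
  },
  "typeProperties": {
    "url": "https://api.example.com/v1/orders",
    "method": "GET"
  }
}
```

If all three retries fail, the activity's failure dependency path can feed a Web/Webhook activity or a Logic App that raises the alert.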
5. Pipeline Reusability
Q: You’ve built 10 pipelines with similar logic. How would you modularize this?
Answer:
Use ADF Pipeline Templates or Invoke Pipeline (Execute Pipeline) with parameters.
Example:
If multiple pipelines ingest files from different sources but follow the same transformation logic:
- Create a Master Pipeline with parameters like SourcePath and TableName.
- Each child pipeline becomes a configurable, reusable module.
- This reduces maintenance and code duplication.
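A sketch of how the master pipeline might invoke a shared module via Execute Pipeline (the child pipeline name pl_generic_ingest is a placeholder):

```json
{
  "name": "RunGenericIngest",
  "type": "ExecutePipeline",
  "typeProperties": {
    "pipeline": { "referenceName": "pl_generic_ingest", "type": "PipelineReference" },
    "parameters": {
      "SourcePath": "@pipeline().parameters.SourcePath",
      "TableName": "@pipeline().parameters.TableName"
    },
    "waitOnCompletion": true
  }
}
```

Setting waitOnCompletion to true makes the master pipeline block until the module finishes, so downstream steps can depend on its outcome.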
6. Metadata-Driven Pipelines
Q: How would you load 50+ tables using metadata instead of hardcoding everything?
Answer:
Build a control table in Azure SQL DB or a config file in ADLS with metadata like table name, source path, column mapping, etc. Use Lookup + ForEach to drive pipeline logic.
Example:
- Your config table could have: TableName, SourcePath, SinkTable, PrimaryKey, IncrementalColumn.
- A single pipeline reads the metadata and processes tables using dynamic datasets and parameterized queries.
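A sketch of the driving Lookup (the control table etl.ControlTable and dataset ControlDb are placeholders); the ForEach then iterates over @activity('LookupTableConfig').output.value:

```json
{
  "name": "LookupTableConfig",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": "SELECT TableName, SourcePath, SinkTable, PrimaryKey, IncrementalColumn FROM etl.ControlTable WHERE IsEnabled = 1"
    },
    "dataset": { "referenceName": "ControlDb", "type": "DatasetReference" },
    "firstRowOnly": false
  }
}
```

Inside the ForEach, each column is then available as @item().TableName, @item().SourcePath, and so on, feeding parameterized datasets and queries.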
7. File Validation Before Load
Q: What if a file is dropped, but it’s incomplete or empty?
Answer:
Implement a Validation Step using Get Metadata, If Condition, and possibly a checksum check.
Example:
- Use Get Metadata to check file size. If the size is 0, skip processing or trigger an alert.
- For partial loads, include a control file with a record count or hash.
- Validate before ingestion using an expression such as the one sketched below.
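For instance, an If Condition guarding the load might use an expression like this (the Get Metadata activity is assumed to be named GetFileMetadata and to request the Size field):

```json
{
  "name": "IfFileHasData",
  "type": "IfCondition",
  "typeProperties": {
    "expression": {
      "value": "@greater(activity('GetFileMetadata').output.size, 0)",
      "type": "Expression"
    }
  }
}
```

The true branch runs the Copy; the false branch raises the alert (branch activities omitted here for brevity).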
8. Trigger Timing Issue
Q: What if your pipeline runs at midnight but the source file isn’t ready until 2AM?
Answer:
Use Event-based triggers or implement a Wait-and-Poll pattern with timeout logic.
Example:
- Use an Until loop to check every 10 minutes whether the file exists, using Get Metadata.
- Pace each iteration with a Wait activity and set the Until timeout to 3 hours to avoid indefinite looping.
- Alternatively, configure a blob storage event trigger to run the pipeline as soon as the file lands.
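A sketch of the wait-and-poll pattern (the dataset SourceFile is a placeholder; the Until exits as soon as Get Metadata reports the file exists, or after the 3-hour timeout):

```json
{
  "name": "WaitForSourceFile",
  "type": "Until",
  "typeProperties": {
    "expression": {
      "value": "@activity('CheckFileExists').output.exists",
      "type": "Expression"
    },
    "timeout": "0.03:00:00",
    "activities": [
      {
        "name": "CheckFileExists",
        "type": "GetMetadata",
        "typeProperties": {
          "dataset": { "referenceName": "SourceFile", "type": "DatasetReference" },
          "fieldList": [ "exists" ]
        }
      },
      {
        "name": "WaitTenMinutes",
        "type": "Wait",
        "typeProperties": { "waitTimeInSeconds": 600 },
        "dependsOn": [
          { "activity": "CheckFileExists", "dependencyConditions": [ "Succeeded" ] }
        ]
      }
    ]
  }
}
```

The event-trigger route avoids polling entirely and is usually cheaper, but the Until pattern is useful when the storage account or trigger setup is outside your control.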
9. Secure Access
Q: How do you securely store and access credentials for ADF pipelines?
Answer:
Use Azure Key Vault to store secrets (e.g., passwords, API keys, connection strings). Link Key Vault to ADF via managed identity.
Example:
- Create a Key Vault secret such as SQLPassword.
- In the ADF Linked Service, choose Azure Key Vault (instead of an inline value) for the credential.
- The secret is then resolved at runtime through the Key Vault linked service reference, so the password is never stored in ADF itself.
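In the linked service JSON this ends up looking roughly like the fragment below (server, database, and linked service names are placeholders):

```json
{
  "name": "AzureSqlDb",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:myserver.database.windows.net,1433;Database=SalesDb;User ID=etl_user;",
      "password": {
        "type": "AzureKeyVaultSecret",
        "store": { "referenceName": "AzureKeyVaultLS", "type": "LinkedServiceReference" },
        "secretName": "SQLPassword"
      }
    }
  }
}
```

The AzureKeyVaultLS linked service itself authenticates with ADF's managed identity, which needs Get permission on secrets in the vault's access policy (or the equivalent RBAC role).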
10. Post-Load Cleanup
Q: Once the file is loaded, how do you delete or archive it in ADLS?
Answer:
Use the Delete activity, and archive by copying the file to an archive path before deleting the original. ADF has no native Move activity; the "Move files" solution template chains Copy + Delete for exactly this purpose.
Example:
- After the Copy Activity, copy the file from /landing/ to /archive/yyyy/MM/dd/ and then delete it from /landing/.
- If periodic cleanup is needed, use the Delete activity to remove files older than X days, driven by Get Metadata + Filter.
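A sketch of the cleanup step, assuming a parameterized LandingFile dataset pointing at the file that was just copied and a hypothetical CopyToArchive activity that wrote it to the archive path:

```json
{
  "name": "DeleteProcessedFile",
  "type": "Delete",
  "description": "Remove the source file from /landing/ once the archive copy has succeeded",
  "typeProperties": {
    "dataset": { "referenceName": "LandingFile", "type": "DatasetReference" },
    "enableLogging": false
  },
  "dependsOn": [
    { "activity": "CopyToArchive", "dependencyConditions": [ "Succeeded" ] }
  ]
}
```

The archive sink's folder path can be built with an expression such as @concat('archive/', formatDateTime(utcnow(), 'yyyy/MM/dd')) to produce the /archive/yyyy/MM/dd/ layout.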
Pro Tip for 2025 Interviews
Recruiters want to see clarity, adaptability, and problem-solving. Be ready to whiteboard your logic, explain trade-offs, and connect your design to cost-efficiency and monitoring.