For Data Engineers with 4–10 Years of Experience

Objective

This guide is designed to:

Build strong Azure Data Factory fundamentals
Understand enterprise ETL orchestration
Learn real-world pipeline implementations
Master integration and automation concepts
Prepare for senior-level interviews
Build a scalable cloud data engineering mindset

Target Audience:

Data Engineers
Azure Data Engineers
ETL Developers
Cloud Data Platform Engineers
Integration Engineers

Daily Time Commitment:

3 Hours Per Day
15 Days Total

Learning Strategy:

20% Theory
80% Hands-On Practice

Goal:

Build enterprise-grade ETL pipelines
Understand orchestration deeply
Handle production workflows
Implement incremental processing
Build scalable cloud integrations

Daily Learning Structure

Hour 1 – Learn Concepts

Focus on:

Understanding WHY ADF exists
Pipeline orchestration concepts
Integration architecture
Real-world enterprise use cases

Avoid:

Memorizing UI clicks blindly
Watching endless tutorials

Hour 2 – Hands-On Development

Focus on:

Building pipelines
Configuring linked services
Creating dynamic ETL workflows
Parameterization

Hour 3 – Real-World Scenarios

Focus on:

Pipeline failures
Monitoring
Incremental loads
Optimization
Debugging
Enterprise architecture understanding

SECTION 1 – AZURE DATA FACTORY BASICS

Topics:

What is ADF
Why ADF
ETL vs ELT
ADF Components
Data Integration Concepts

WHAT IS ADF

Azure Data Factory is a cloud-based data integration and orchestration service.

Purpose:

Build ETL pipelines
Move data between systems
Orchestrate workflows
Automate data processing

WHY ADF IS USED

Problems ADF Solves:

Complex data movement
ETL orchestration
Cloud integration
Scheduling workflows
Incremental processing
Enterprise automation

ETL VS ELT

Critical Interview Topic.

Understand:

ETL processing
ELT processing
Transformation location
Cloud optimization
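
The difference in transformation location can be sketched in a few lines. This is a minimal illustration, not an ADF pipeline: it assumes an in-memory SQLite database as a stand-in for the target store, and the table names and sample rows are hypothetical. ETL cleans the rows before loading them; ELT loads the raw rows first and lets the target engine do the cleaning.

```python
import sqlite3

raw_rows = [("alice", "10.5"), ("bob", "n/a"), ("carol", "7")]

def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False

def run_etl(conn):
    # ETL: filter and cast outside the target, then load only clean rows.
    clean = [(name, float(amount)) for name, amount in raw_rows if is_number(amount)]
    conn.execute("CREATE TABLE sales_etl (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", clean)
    return conn.execute("SELECT name, amount FROM sales_etl ORDER BY name").fetchall()

def run_elt(conn):
    # ELT: load raw rows unchanged, then transform with the target engine's SQL.
    conn.execute("CREATE TABLE sales_raw (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", raw_rows)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT name, CAST(amount AS REAL) AS amount FROM sales_raw "
        "WHERE amount GLOB '[0-9]*'"
    )
    return conn.execute("SELECT name, amount FROM sales_elt ORDER BY name").fetchall()

conn = sqlite3.connect(":memory:")
etl_result = run_etl(conn)
elt_result = run_elt(conn)
print(etl_result == elt_result)
```

Both paths end with the same clean rows. What changes is where the compute runs, which is the point interviewers usually probe.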

ADF COMPONENTS

Topics:

Pipelines
Activities
Datasets
Linked Services
Integration Runtime
Triggers
Data Flows

SECTION 2 – ADF ARCHITECTURE

Topics:

Control Flow
Data Flow
Integration Runtime
Linked Services
Datasets

CONTROL FLOW

Purpose:
Pipeline orchestration.

Activities:

Execute pipeline
If condition
Switch
ForEach
Until
Wait

DATA FLOW

Purpose:
Transformation logic.

Use Cases:

Cleansing
Aggregations
Joins
Derived columns

INTEGRATION RUNTIME (IR)

Critical Topic.

Types:

Azure IR
Self-hosted IR
Azure-SSIS IR

Purpose:

Data movement
Compute execution
Connectivity

LINKED SERVICES

Purpose:
Connection management.

Examples:

Azure SQL
Blob Storage
ADLS
SQL Server
REST APIs
Databricks

DATASETS

Purpose:
Represent the structure and location of data within a linked data store.

Examples:

CSV files
JSON files
Database tables

SECTION 3 – PIPELINES

Topics:

Pipeline creation
Activities
Dependency management
Parameterization
Variables

PIPELINES

Purpose:
Logical grouping of activities.

Real-World Usage:

ETL orchestration
Batch processing
Data synchronization

ACTIVITIES

Most Important Topic.

Types:

Copy activity
Lookup activity
Stored procedure activity
Execute pipeline
Web activity
Databricks notebook activity
Data flow activity

COPY ACTIVITY

Critical Interview Topic.

Purpose:
Move data between systems.

Practice:

SQL to Blob
Blob to SQL
API to ADLS
CSV to Parquet
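
For the CSV to Parquet exercise, it helps to see roughly what a Copy activity looks like in pipeline JSON. The sketch below builds that JSON as a Python dictionary. The pipeline and dataset names are hypothetical, store and format settings are omitted, and the source and sink type names should be checked against the current connector documentation before use.

```python
import json

# Hypothetical names; store/format settings are left out of this sketch.
copy_activity = {
    "name": "CopyCsvToParquet",
    "type": "Copy",
    "inputs": [{"referenceName": "SalesCsvDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "SalesParquetDataset", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},  # reads the CSV dataset
        "sink": {"type": "ParquetSink"},            # writes the Parquet dataset
    },
}

pipeline = {"name": "pl_copy_sales", "properties": {"activities": [copy_activity]}}
pipeline_json = json.dumps(pipeline, indent=2)
print(pipeline_json)
```

The datasets referenced in inputs and outputs carry the file format and location, so the same activity shape can serve the other practice scenarios by swapping datasets and source/sink types.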

LOOKUP ACTIVITY

Purpose:
Retrieve metadata/configuration.

Real-World Usage:

Dynamic pipelines
Config-driven ETL

STORED PROCEDURE ACTIVITY

Purpose:
Execute SQL procedures.

Use Cases:

Logging
Post-processing
Data validation

EXECUTE PIPELINE ACTIVITY

Purpose:
Parent-child orchestration.

Benefits:

Modular pipeline design
Reusability

WEB ACTIVITY

Purpose:
Call REST APIs.

Use Cases:

Trigger APIs
Notifications
External integrations

DATABRICKS NOTEBOOK ACTIVITY

Purpose:
Trigger Databricks notebooks.

Use Cases:

PySpark transformations
Delta processing
Advanced ETL logic

SECTION 4 – PARAMETERIZATION AND DYNAMIC CONTENT

Topics:

Parameters
Variables
Expressions
Dynamic content

PARAMETERS

Purpose:
Reusable pipelines.

Practice:

Dynamic file paths
Dynamic table names
Environment handling

VARIABLES

Purpose:
Store runtime values.

Practice:

Counters
Status tracking
Dynamic processing

EXPRESSIONS

Critical Topic.

Functions:

concat
substring
utcNow
pipeline
activity
replace

Practice:

Dynamic file generation
Timestamp creation
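
Dynamic file generation usually combines concat, formatDateTime, and utcNow. The sketch below reproduces that logic in plain Python so the result can be checked locally. The pipeline parameters folder and fileName in the docstring are hypothetical.

```python
from datetime import datetime, timezone

def build_file_path(folder, file_name, now=None):
    """Python equivalent of the ADF expression
    @concat(pipeline().parameters.folder, '/',
            formatDateTime(utcNow(), 'yyyy/MM/dd'), '/',
            pipeline().parameters.fileName)
    """
    now = now or datetime.now(timezone.utc)
    return f"{folder}/{now.strftime('%Y/%m/%d')}/{file_name}"

# A fixed timestamp keeps the example deterministic.
fixed = datetime(2024, 3, 5, 12, 0, tzinfo=timezone.utc)
path = build_file_path("sales", "orders.csv", fixed)
print(path)  # sales/2024/03/05/orders.csv
```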

SECTION 5 – CONTROL FLOW ACTIVITIES

Topics:

If condition
Switch
ForEach
Until
Wait

FOREACH ACTIVITY

Critical Topic.

Purpose:
Loop processing.

Real-World Usage:

Process multiple files
Dynamic table loads
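
A common shape for dynamic table loads is a Lookup activity that returns a list of tables, followed by a ForEach that iterates over that list. The sketch below builds such a ForEach definition as a Python dictionary. The activity names are hypothetical and the inner Copy activity is a placeholder. Inside the loop, the current element is referenced with @item().

```python
import json

def build_foreach(lookup_name, inner_activities, batch_count=10):
    """Return a ForEach activity that iterates over a Lookup activity's rows."""
    return {
        "name": "ForEachTable",
        "type": "ForEach",
        # Run only after the Lookup has succeeded.
        "dependsOn": [{"activity": lookup_name, "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            "isSequential": False,      # allow parallel iterations
            "batchCount": batch_count,  # upper bound on concurrent iterations
            "items": {
                "value": f"@activity('{lookup_name}').output.value",
                "type": "Expression",
            },
            "activities": inner_activities,
        },
    }

# Placeholder inner activity; a real Copy needs inputs, outputs, and typeProperties.
foreach = build_foreach("LookupTableList", [{"name": "CopyOneTable", "type": "Copy"}])
print(json.dumps(foreach, indent=2))
```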

IF CONDITION

Purpose:
Conditional execution.

Use Cases:

Success/failure handling
Validation logic

UNTIL ACTIVITY

Purpose:
Loop until a condition is met.

Use Cases:

Polling APIs
Monitoring jobs
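
The polling use case can be sketched in plain Python. This is an analogy for an Until activity that wraps a status check and a Wait, not ADF code. The status values and the job are hypothetical, and max_attempts stands in for the activity's timeout.

```python
import time

def poll_until(check_status, interval_seconds=0.0, max_attempts=10):
    """Call check_status() until it returns 'Succeeded' or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        if check_status() == "Succeeded":
            return attempt
        time.sleep(interval_seconds)  # plays the role of a Wait activity
    raise TimeoutError(f"job not finished after {max_attempts} attempts")

# Hypothetical job that reports 'Running' twice before succeeding.
statuses = iter(["Running", "Running", "Succeeded"])
attempts_used = poll_until(lambda: next(statuses))
print(attempts_used)  # 3
```

Bounding the loop matters in production: an Until without a sensible timeout can keep a pipeline run alive long after the external job has died.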

SECTION 6 – DATA FLOWS

Topics:

Mapping Data Flow
Source
Sink
Derived column
Aggregate
Join
Filter
Window

DATA FLOWS

Purpose:
Graphical transformations.

Real-World Usage:

ETL transformations
Cleansing
Aggregations

DERIVED COLUMN

Purpose:
Create calculated fields.

AGGREGATE

Purpose:
Summarize data.

JOIN TRANSFORMATIONS

Topics:

Inner join
Left join
Exists (a separate transformation that checks whether matching rows exist)

WINDOW TRANSFORMATIONS

Use Cases:

Ranking
Running totals
Deduplication
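
The logic behind two of these use cases can be sketched in plain Python. Deduplication keeps the newest row per key, which is what filtering on row number 1 over a descending sort achieves. A running total accumulates a value in row order. The sample rows and column names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {"customer_id": 1, "updated_at": "2024-01-01", "amount": 10},
    {"customer_id": 1, "updated_at": "2024-01-03", "amount": 15},
    {"customer_id": 2, "updated_at": "2024-01-02", "amount": 7},
]

def latest_per_key(records, key, order_by):
    """Keep the newest record per key (row number 1 when sorted descending)."""
    ordered = sorted(records, key=itemgetter(key))
    return [
        max(group, key=itemgetter(order_by))
        for _, group in groupby(ordered, key=itemgetter(key))
    ]

def running_total(records, value):
    """Return the cumulative sum of `value` in the given row order."""
    total, out = 0, []
    for record in records:
        total += record[value]
        out.append(total)
    return out

deduped = latest_per_key(rows, "customer_id", "updated_at")
totals = running_total(rows, "amount")
print([r["updated_at"] for r in deduped])  # ['2024-01-03', '2024-01-02']
print(totals)  # [10, 25, 32]
```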

SECTION 7 – INCREMENTAL LOADS

Topics:

Watermarking
CDC
Delta loads
Upserts

WATERMARKING

Critical Topic.

Purpose:
Load only changed data.

Practice:

Timestamp tracking
Last successful load logic
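
The watermark pattern is easiest to understand end to end. The sketch below uses an in-memory SQLite database as a stand-in for the source and the control table. The table and column names are hypothetical. In ADF the same steps are typically split across Lookup activities (old and new watermark), a Copy activity (rows in between), and a Stored Procedure activity (watermark update).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_orders (id INTEGER, modified_at TEXT);
    CREATE TABLE watermark (table_name TEXT PRIMARY KEY, last_value TEXT);
    INSERT INTO source_orders VALUES (1, '2024-01-01T00:00:00'), (2, '2024-01-02T00:00:00');
    INSERT INTO watermark VALUES ('source_orders', '1900-01-01T00:00:00');
""")

def incremental_load(conn):
    """Return rows changed since the stored watermark, then advance it."""
    (old_mark,) = conn.execute(
        "SELECT last_value FROM watermark WHERE table_name = 'source_orders'"
    ).fetchone()
    (new_mark,) = conn.execute("SELECT MAX(modified_at) FROM source_orders").fetchone()
    rows = conn.execute(
        "SELECT id, modified_at FROM source_orders "
        "WHERE modified_at > ? AND modified_at <= ? ORDER BY id",
        (old_mark, new_mark),
    ).fetchall()
    # Advance the watermark only after the changed rows were read successfully.
    conn.execute(
        "UPDATE watermark SET last_value = ? WHERE table_name = 'source_orders'",
        (new_mark,),
    )
    return rows

first_run = incremental_load(conn)
conn.execute("INSERT INTO source_orders VALUES (3, '2024-01-03T00:00:00')")
second_run = incremental_load(conn)
print(len(first_run), len(second_run))  # 2 1
```

Updating the watermark last is the key design choice: if the load fails midway, the next run picks up the same range again instead of silently skipping rows.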

CDC

Purpose:
Capture inserts/updates/deletes.

Real-World Usage:

Incremental ETL
Audit pipelines

UPSERT LOGIC

Use Cases:

Delta processing
Historical tracking
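
An upsert updates rows whose key already exists and inserts the rest. The sketch below shows that logic against an in-memory SQLite table. The table and sample data are hypothetical. On a real platform the same result is usually achieved with a MERGE statement or a sink's upsert option rather than row-by-row code.

```python
import sqlite3

def upsert_customers(conn, changes):
    """Update rows whose key already exists; insert the rest."""
    for customer_id, name in changes:
        updated = conn.execute(
            "UPDATE customers SET name = ? WHERE id = ?", (name, customer_id)
        ).rowcount
        if updated == 0:  # no existing row with this key
            conn.execute("INSERT INTO customers VALUES (?, ?)", (customer_id, name))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Alice')")

upsert_customers(conn, [(1, "Alice Smith"), (2, "Bob")])
result = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(result)  # [(1, 'Alice Smith'), (2, 'Bob')]
```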

SECTION 8 – TRIGGERS

Topics:

Schedule trigger
Tumbling window trigger
Event trigger

SCHEDULE TRIGGERS

Purpose:
Run pipelines on schedule.

EVENT TRIGGERS

Purpose:
Trigger on storage events such as file arrival or deletion.

Use Cases:

Real-time ingestion
Landing zone automation

TUMBLING WINDOW TRIGGERS

Purpose:
Process fixed-size, non-overlapping time windows, with support for dependencies between windows, retries, and backfill.
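
The windowing idea can be sketched in plain Python: a tumbling window trigger fires once per fixed-size, contiguous, non-overlapping interval. The start time, end time, and window size below are hypothetical. In ADF, each run receives its bounds through trigger().outputs.windowStartTime and trigger().outputs.windowEndTime.

```python
from datetime import datetime, timedelta

def tumbling_windows(start, end, size):
    """Yield contiguous, non-overlapping (window_start, window_end) pairs."""
    current = start
    while current + size <= end:
        yield current, current + size
        current += size

windows = list(
    tumbling_windows(datetime(2024, 1, 1), datetime(2024, 1, 2), timedelta(hours=6))
)
print(len(windows))  # 4
print(windows[0])
```

Because every window has explicit bounds, a failed window can be rerun on its own, which is what makes backfill and incremental loads a natural fit for this trigger type.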

SECTION 9 – MONITORING AND DEBUGGING

Topics:

Monitoring
Debugging
Alerts
Logging
Retry handling

MONITORING

Critical Production Topic.

Learn:

Activity runs
Pipeline runs
Error tracking
Performance monitoring

RETRY POLICIES

Purpose:
Handle transient failures.

Practice:

Retry configuration
Timeout handling
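
Retry behaviour can be sketched in plain Python. The function below mirrors the shape of an activity policy with a retry count and a fixed retry interval, and treats only one error type as transient. The flaky_copy function and its failure pattern are hypothetical.

```python
import time

def run_with_retry(action, retry=2, retry_interval_seconds=0.0):
    """Run action(); on a transient failure retry up to `retry` more times."""
    last_error = None
    for attempt in range(retry + 1):
        try:
            return action(), attempt + 1
        except ConnectionError as error:  # only this error counts as transient
            last_error = error
            if attempt < retry:
                time.sleep(retry_interval_seconds)
    raise last_error

calls = {"count": 0}

def flaky_copy():
    # Hypothetical copy step that fails twice before succeeding.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient network failure")
    return "copied"

outcome = run_with_retry(flaky_copy)
print(outcome)  # ('copied', 3)
```

Retries only help with transient faults. A bad query or a missing file will fail the same way every time, so those cases belong in error handling rather than in the retry policy.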

ERROR HANDLING

Topics:

Fail activity
Try-catch pattern
Logging tables
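
ADF has no literal try-catch block. The pattern is usually built from dependency conditions: a logging activity that runs only when the main activity fails, followed by a Fail activity so the pipeline run is still marked as failed. The sketch below expresses this as a Python structure. All activity names and the stored procedure name are hypothetical, and linked service references and Copy details are omitted.

```python
import json

activities = [
    {"name": "CopySales", "type": "Copy"},  # "try": details omitted in this sketch
    {
        "name": "LogFailure",  # "catch": runs only if CopySales fails
        "type": "SqlServerStoredProcedure",
        "dependsOn": [{"activity": "CopySales", "dependencyConditions": ["Failed"]}],
        "typeProperties": {"storedProcedureName": "dbo.usp_log_pipeline_error"},
    },
    {
        "name": "FailPipeline",  # surface the failure after logging it
        "type": "Fail",
        "dependsOn": [{"activity": "LogFailure", "dependencyConditions": ["Completed"]}],
        "typeProperties": {"message": "CopySales failed", "errorCode": "500"},
    },
]

def on_failure_handlers(activity_list, failed_name):
    """Return names of activities wired to run when `failed_name` fails."""
    return [
        a["name"]
        for a in activity_list
        if any(
            d["activity"] == failed_name and "Failed" in d["dependencyConditions"]
            for d in a.get("dependsOn", [])
        )
    ]

handlers = on_failure_handlers(activities, "CopySales")
print(handlers)  # ['LogFailure']
print(json.dumps(activities[1], indent=2))
```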

SECTION 10 – SECURITY

Topics:

Managed identities
Key Vault
RBAC
Private endpoints

AZURE KEY VAULT

Critical Topic.

Purpose:
Secure secrets.

Use Cases:

Password storage
API keys
Connection strings
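
In a linked service definition, a secret is referenced from Key Vault instead of being written inline. The sketch below builds such a definition as a Python dictionary. The linked service names and the secret name are hypothetical, and the connection string placeholders are left unfilled on purpose.

```python
import json

def key_vault_secret(vault_linked_service, secret_name):
    """Build a reference to a secret held in an Azure Key Vault linked service."""
    return {
        "type": "AzureKeyVaultSecret",
        "store": {"referenceName": vault_linked_service, "type": "LinkedServiceReference"},
        "secretName": secret_name,
    }

sql_linked_service = {
    "name": "ls_azure_sql",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            # No password appears here; it is resolved from Key Vault at run time.
            "connectionString": "Server=tcp:<server>.database.windows.net,1433;Database=<db>;User ID=<user>;",
            "password": key_vault_secret("ls_key_vault", "sql-password"),
        },
    },
}

print(json.dumps(sql_linked_service, indent=2))
```

The data factory's identity still needs permission to read secrets from the vault, which is where managed identities and RBAC come into the picture.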

MANAGED IDENTITIES

Purpose:
Secure authentication.

Benefits:

No hardcoded secrets
Secure access

SECTION 11 – PERFORMANCE OPTIMIZATION

Topics:

Parallelism
Partitioning
Batch size
Staging
Compression

PARALLEL COPY

Purpose:
Improve throughput.

Practice:

Parallel file processing
Partitioned loading
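
Partitioned loading splits one large read into several smaller range reads that can run in parallel. The sketch below shows only the range arithmetic in plain Python. The bounds and partition count are hypothetical. In a Copy activity the comparable settings are a partition column with lower and upper bounds, and their exact names should be checked against the connector documentation.

```python
def partition_ranges(lower, upper, partitions):
    """Split the inclusive range [lower, upper] into near-equal contiguous ranges."""
    total = upper - lower + 1
    base, extra = divmod(total, partitions)
    ranges, start = [], lower
    for index in range(partitions):
        size = base + (1 if index < extra else 0)  # spread the remainder
        if size == 0:
            break
        ranges.append((start, start + size - 1))
        start += size
    return ranges

ranges = partition_ranges(1, 10, 3)
print(ranges)  # [(1, 4), (5, 7), (8, 10)]
```

Each range maps to one query of the form WHERE id BETWEEN start AND end. Even ranges only give even work when the key is evenly distributed, so skewed keys need a different partition column.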

STAGING

Purpose:
Improve throughput for large data movement by copying through an interim staging store.

COMPRESSION

Formats:

gzip
snappy

Purpose:
Reduce storage and transfer cost.
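
The effect of compression is easy to measure locally. The sketch below compresses a repetitive CSV payload with gzip from the Python standard library and checks that it decompresses to the original bytes. The sample data is hypothetical, and real ratios depend heavily on the data.

```python
import gzip

# Repetitive sample payload; real files compress less predictably.
csv_bytes = ("id,region,amount\n" + "1,EMEA,100.00\n" * 1000).encode("utf-8")

compressed = gzip.compress(csv_bytes)
restored = gzip.decompress(compressed)

print(len(csv_bytes), len(compressed), restored == csv_bytes)
```

As a general rule, gzip is common for text formats such as CSV and JSON, while snappy is a frequent choice for columnar formats such as Parquet because it favours speed over ratio.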

SECTION 12 – REAL-WORLD ETL ARCHITECTURE

Topics:

Batch pipelines
Incremental pipelines
Medallion architecture
Orchestration

ENTERPRISE ETL FLOW

Source Systems
↓
Landing Layer
↓
ADF Orchestration
↓
Databricks Processing
↓
Delta Lake
↓
Gold Reporting Layer
↓
Power BI / Analytics

MEDALLION ARCHITECTURE

Layers:

Bronze
Silver
Gold

BRONZE LAYER

Purpose:
Raw ingestion.

SILVER LAYER

Purpose:
Validated and cleansed data.

GOLD LAYER

Purpose:
Business-ready analytics.
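
The three layers can be sketched with plain Python data structures. This is a toy illustration of what changes at each layer, not a Databricks or Delta Lake implementation. The records and column names are hypothetical.

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested, including bad values.
bronze = [
    {"order_id": "1", "region": "emea", "amount": "100.5"},
    {"order_id": "2", "region": "EMEA", "amount": "bad"},
    {"order_id": "3", "region": "apac", "amount": "40"},
]

def to_silver(records):
    """Validate, cast, and standardize; drop rows that fail validation."""
    silver = []
    for record in records:
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # a real pipeline would route rejects to a quarantine area
        silver.append({
            "order_id": int(record["order_id"]),
            "region": record["region"].upper(),
            "amount": amount,
        })
    return silver

def to_gold(records):
    """Aggregate to a business-ready shape: total amount per region."""
    totals = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'EMEA': 100.5, 'APAC': 40.0}
```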

SECTION 13 – REAL-WORLD PROJECT STRUCTURE

Typical ADF Project Structure:

project/
│
├── pipelines/
│   ├── ingestion_pipeline
│   ├── transformation_pipeline
│   └── reporting_pipeline
│
├── datasets/
│   ├── source_datasets
│   └── sink_datasets
│
├── linked_services/
│   ├── sql_ls
│   ├── blob_ls
│   └── databricks_ls
│
├── triggers/
│   └── daily_trigger
│
├── dataflows/
│   └── cleansing_flow
│
├── config/
│   └── config.json
│
└── documentation/
    └── pipeline_design.docx

SECTION 14 – MID-LEVEL PROJECTS

PROJECT 1 – SALES INGESTION PIPELINE

Requirements:

Read CSV files
Load to ADLS
Trigger Databricks notebook
Generate logs

Concepts Used:

Copy activity
Triggers
Dynamic content

PROJECT 2 – INCREMENTAL CUSTOMER PIPELINE

Requirements:

Process changed records only
Maintain watermark table
Trigger Delta merge

Concepts Used:

Watermarking
Stored procedures
Dynamic pipelines

PROJECT 3 – API INGESTION PIPELINE

Requirements:

Call REST API
Store JSON response
Process nested data

Concepts Used:

Web activity
JSON handling
Error handling

PROJECT 4 – MULTI-FILE PROCESSING FRAMEWORK

Requirements:

Process multiple source files
Loop dynamically
Generate success/failure reports

Concepts Used:

ForEach
Lookup
Dynamic parameters

PROJECT 5 – ENTERPRISE ETL ORCHESTRATION

Requirements:

Parent-child pipelines
Databricks integration
Retry handling
Logging framework

Concepts Used:

Execute pipeline
Logging
Monitoring
Error handling

SECTION 15 – ADF INTERVIEW QUESTIONS

BASIC QUESTIONS

What is ADF?
Difference between ETL and ELT.
What are linked services?
What are datasets?
What is Integration Runtime?
Difference between Azure IR and Self-hosted IR.
What are triggers?
What is Copy Activity?
What are pipelines?
What are Data Flows?

INTERMEDIATE QUESTIONS

Explain ADF architecture.
Explain dynamic content.
Explain parameterization.
Explain watermarking.
Explain incremental loads.
Explain event triggers.
Explain retry policies.
Explain ForEach activity.
Explain Databricks integration.
Explain monitoring strategy.

ADVANCED QUESTIONS

Design enterprise ETL orchestration.
Handle millions of files efficiently.
Optimize slow copy activity.
Design incremental CDC pipelines.
Implement metadata-driven framework.
Explain production debugging approach.
Design parent-child orchestration.
Handle pipeline failures gracefully.
Explain secure secret management.
Design scalable cloud ETL architecture.

SECTION 16 – 15-DAY EXECUTION PLAN

WEEK 1 – FOUNDATION

Day 1

ADF basics
Architecture
Components overview

Day 2

Linked services
Datasets
Integration Runtime

Day 3

Pipelines
Activities
Copy activity

Day 4

Parameters
Variables
Dynamic content

Day 5

ForEach
If condition
Control flow

Day 6

Data flows
Aggregations
Joins
Derived columns

Day 7

Mini ETL project

WEEK 2 – ADVANCED ADF

Day 8

Incremental loads
Watermarking
CDC

Day 9

Triggers
Event-based pipelines

Day 10

Monitoring
Logging
Retry handling

Day 11

Security
Key Vault
Managed identities

Day 12

Performance optimization
Parallelism
Partitioning

Day 13

Databricks integration
Enterprise orchestration

Day 14

Mid-level projects

Day 15

Final mock interview
Revision

REAL-WORLD BEST PRACTICES

Always Follow:

Use parameterized pipelines
Avoid hardcoded values
Use Key Vault
Implement logging
Handle failures gracefully
Use modular design
Use metadata-driven frameworks
Implement retries
Use proper naming conventions
Monitor pipelines proactively

MOST IMPORTANT SKILLS FOR SENIOR ENGINEERS

You must become strong in:

Pipeline orchestration
Incremental processing
Cloud integration
Dynamic ETL frameworks
Monitoring and debugging
Security implementation
Databricks integration
Real-world troubleshooting
Scalability thinking
Enterprise architecture understanding

FINAL INTERVIEW EXPECTATIONS

At 4–10 years of experience, interviewers expect:

Strong orchestration understanding
Enterprise ETL design capability
Dynamic pipeline implementation knowledge
Incremental loading expertise
Production troubleshooting mindset
Secure integration understanding
Databricks + ADF integration knowledge
Real-world implementation experience

They do NOT expect UI knowledge alone.

They expect:

Engineering mindset
Scalable orchestration design
Production-level troubleshooting
Cloud integration understanding
Enterprise ETL architecture capability

END OF DOCUMENT

15-Day Azure Data Factory (ADF) for Data Engineering Master Guide
