
The Databricks Data Engineer Associate certification focuses on building scalable data pipelines with Apache Spark and Delta Lake on Databricks. It equips you to handle ETL, data transformations, and performance optimization in cloud environments, preparing you for roles such as Data Engineer, Spark Developer, and Cloud Data Engineer.
- ✅ Azure Fundamentals & Core Data Services
- ✅ Cloud, Big Data, ETL & DWH
- ✅ Data Warehouse: Synapse, Spark
- ✅ ETL: ADF, Databricks, ASA Jobs
- ✅ Data Lake, Delta Lake, Delta Live Tables (DLT), Unity Catalog
- ✅ Python, PySpark, Scala, IoT
- ✅ Logic Apps, Azure Functions, ME
- ✅ End-to-End Project Execution & Migration
Module 1: SQL Server TSQL (MS SQL) Queries
Ch 1: Data Analyst Job Roles
- Introduction to Data
- Data Analyst Job Roles
- Data Analyst Challenge
- Data and Databases Intro
Ch 2: Database Intro & Installations
- Database Types (OLTP, DWH, etc.)
- DBMS: Basics
- SQL Server 2025 Installations
- SSMS Tool Installation
- Server Connections, Authentications
Ch 3: SQL Basics V1 (Commands)
- Creating Databases (GUI)
- Creating Tables, Columns (GUI)
- SQL Basics (DDL, DML, etc.)
- Creating Databases, Tables
- Data Inserts (GUI, SQL)
- Basic SELECT Queries
Ch 4: SQL Basics V2 (Commands, Operators)
- DDL: Create, Alter, Drop, Add, Modify, etc.
- DML: Insert, Update, Delete, Select Into, etc.
- DQL: Fetch, Insert…Select, etc.
- SQL Operators: LIKE, BETWEEN, IN, etc.
- Special Operators
Ch 5: Data Types
- Integer Data Types
- Character, MAX Data Types
- Decimal & Money Data Types
- Boolean & Binary Data Types
- Date and Time Data Types
- SQL_Variant Type, Variables
Ch 6: Excel Data Imports
- Data Imports with Excel
- SQL Native Client
- Order By: Asc, Desc
- Order By with WHERE
- TOP & OFFSET
- UNION, UNION ALL
Ch 7: Schemas & Batches
- Schemas: Creation, Usage
- Schemas & Table Grouping
- Real-world Banking Database
- 2 Part, 3 Part & 4 Part Naming
- Batch Concept & “Go” Command
Ch 8: Constraints, Keys & RDBMS – Level 1
- Null, Not Null Constraints
- Unique Key Constraint
- Primary Key Constraint
- Foreign Key & References
- Default Constraint & Usage
- DB Diagrams & ER Models
Ch 9: Normal Forms & RDBMS – Level 2
- Normal Forms: 1 NF, 2 NF
- 3 NF, BCNF and 4 NF
- Adding PK to Tables
- Adding FK to Tables
- Cascading Keys
- Self Referencing Keys
- Database Diagrams
Ch 10: Joins & Queries
- Joins: Table Comparisons
- Inner Joins & Matching Data
- Outer Joins: LEFT, RIGHT
- Full Outer Joins & Aliases
- Cross Join & Table Combination
- Joining more than 2 tables
Ch 11: Views & RLS
- Views: Realtime Usage
- Storing SELECT in Views
- DML, SELECT with Views
- RLS: Row Level Security
- WITH CHECK OPTION
- Important System Views
Ch 12: Stored Procedures
- Stored Procedures: Realtime Use
- Parameters Concept with SPs
- Procedures with SELECT
- System Stored Procedures
- Metadata Access with SPs
- SP Recompilations
- Stored Procedures, Tuning
Ch 13: User Defined Functions
- Using Functions in MSSQL
- Scalar Functions in Real-world
- Inline & Multiline Functions
- Parameterized Queries
- Date & Time Functions
- String Functions & Queries
- Aggregated Functions & Usage
Ch 14: Triggers & Automations
- Need for Triggers in Real-world
- DDL & DML Triggers
- For / After Triggers
- Instead Of Triggers
- Memory Tables with Triggers
- Disabling DMLs & Triggers
Ch 15: Transactions & ACID
- Transaction Concepts in OLTP
- Auto Commit Transaction
- Explicit Transactions
- COMMIT, ROLLBACK
- Checkpoint & Logging
- Lock Hints & Query Blocking
- READPAST, LOCKHINT
Ch 16: CTEs & Tuning
- Common Table Expression
- Creating and Using CTEs
- CTEs, In-Memory Processing
- Using CTEs for DML Operations
- Using CTEs for Tuning
- CTEs: Duplicate Row Deletion
Ch 17: Indexes Basics, Tuning
- Indexes & Tuning
- Clustered Index, Primary Key
- Non Clustered Index & Unique
- Creating Indexes Manually
- Composite Keys, Query Optimizer
- Composite Indexes & Usage
Ch 18: Group By Queries
- Group By, Distinct Keywords
- GROUP BY, HAVING
- Cube( ) and Rollup( )
- Sub Totals & Grand Totals
- Grouping( ) & Usage
- Group By with UNION
- Group By with UNION ALL
Ch 19: Joins with Group By
- Joins with Group By
- 3 Table, 4 Table Joins
- Join Queries with Aliases
- Join Queries & WHERE
- Join Queries & Group By
- Joins with Sub Queries
- Query Execution Order
Ch 20: Sub Queries
- Sub Queries Concept
- Sub Queries & Aggregations
- Joins with Sub Queries
- Sub Queries with Aliases
- Sub Queries, Joins, Where
- Correlated Queries
Ch 21: Cursors & Fetch
- Cursors: Realtime Usage
- Local & Global Cursors
- Scroll & Forward Only Cursors
- Static & Dynamic Cursors
- Fetch, Absolute Cursors
Ch 22: Window Functions, CASE
- IIF Function and Usage
- CASE Statement Usage
- Window Functions (Rank)
- Row_Number( )
- Rank( ), Dense_Rank( )
- Partition By & Order By
Ch 23: Merge (Upsert) & CASE, IIF
- Merge Statement
- Upsert Operations with Merge
- Matched and Not Matched
- IIF & CASE Statements
- Merge Statement inside SPs
- Merge with OLTP & DWH
Ch 24: Key Take-Aways from Module 1
- Case Study 1: Medicare: Tasks, Solutions
- Case Study 2: ECommerce: Task, Solutions
- Chapter Wise Assignments: Solutions
- Daily Assignments: Review (Feedback)
- Weekly Mock Interviews: Feedback
Module 2: Python Concepts
Ch 1: What is Data Engineering?
- Database Types
- ETL
- DWH
- Cloud Computing
- Databricks
- Need for Python in Databricks
Ch 2: Python Introduction
- Python Introduction
- Python Versions
- Python Implementations
- Python Installations
- Python IDE & Usage
- Jupyter Notebooks
Ch 3: Python Operations
- Basic Operations in Python
- Python Scripts, Print()
- Single, Multiline Statements
- Python: Internal Architecture
- Compiler Versus Interpreter
Ch 4: Data Types & Variables
- Integer / Int Data Types
- Float, String Data Types
- Sequence Types: List, Tuple
- Range, Complex & memoryview
- Retrieving Data Type: type()
Ch 5: Python Operators
- Arithmetic, Assignment Ops
- Comparison Operators
- Operator Precedence
- If … Else Statement, Pass
- Short Hand If, OR, AND
- ELIF and Nested IF Statements
Ch 6: Python Loops, Iterations
- Python Loop & Realtime Use
- Python While Loop Statement
- Break and Continue Statement
- Iterations & Conditions
- Exit Conditions & For Loops
- iter() and Looping Options
Ch 7: Python Functions
- Python Functions & Usage
- Function Parameters
- Default & List Parameters
- Python Lambda Functions
- Recursive Functions, Usage
- Return & Print @ Lambda
Ch 8: Python Modules
- Import Python Modules
- Built In Modules & dir
- datetime module in Python
- Date Object Creation
- strftime Method & Usage
- imports & datetime.now()
Ch 9: Python User Inputs & TRY
- Try Except, Exception Handling
- Raise an exception method
- TypeError, Scripting in Python
- Python User Inputs
- Python Index Numbers
- input() & raw_input()
Ch 10: Python File Handling
- File Handling, Activities
- Loop, Write, Close Files
- Appending, Overwriting
- import os, path.exists
- open(), f.write()
- f.read(), f.close()
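A minimal sketch of the file-handling pattern above, using only the Python standard library; the file name is a placeholder used for illustration.

```python
import os

path = "notes.txt"  # hypothetical file name, for illustration only

# Write (overwrites any existing content)
with open(path, "w") as f:
    f.write("first line\n")

# Append a new line without losing existing content
with open(path, "a") as f:
    f.write("second line\n")

# Read the file back, guarding with os.path.exists as in the chapter
if os.path.exists(path):
    with open(path, "r") as f:
        print(f.read())
```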
Ch 11: Pandas DataFrames 1
- Installation of Pandas
- Python Modules & Pandas
- Pandas Codebase & Usage
- from pandas import DataFrame
- Pandas Series, arrays
Ch 12: Pandas DataFrames 2
- Indexes & Named Options
- Locate Row and Load Rows
- Row Index & Index Lists
- Load Files Into a DataFrame
- df.to_string() Function
- tail() & isnull() Functions
Ch 13: Pandas Transformations
- Pandas – Cleaning Data
- Replace, Transform Columns
- Data Discovery & Column Fill
- Identify & Remove Duplicates
- dropna(), fillna() Functions
- Data Plotting & matplotlib Library
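A short, illustrative pandas cleaning sketch covering duplicate removal, null handling, and plotting; the column names and values are made up for the example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sales data with a duplicate row and a missing value
df = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "amount":   [100.0, 250.0, 250.0, None],
})

df = df.drop_duplicates()              # identify & remove duplicate rows
df["amount"] = df["amount"].fillna(0)  # fill nulls with a default value

# Simple plot of the cleaned data
df.plot(x="order_id", y="amount", kind="bar")
plt.show()
```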
Ch 14: Key Take-Aways from Module 2
- Case Study @ ECommerce: Task, Solutions
- Chapter Wise Assignments: Solutions
- Daily Assignments: Review (Feedback)
- Weekly Mock Interviews: Feedback
Module 3: Databricks
Ch 1: Databricks Intro
- Big Data
- Open Source ETL
- What is a Data Lakehouse?
- Hadoop, MapReduce and Apache Spark
Ch 2: Databricks Architecture
- Unity Catalog Volumes
- Clusters
- Apache Spark and Databricks
- Apache Spark Ecosystem
- Compute Activities
Ch 3: Databricks Workspace
- Workspace Objects
- Databricks Notebooks
- Databricks Managed Resources
- Databricks Workspace UI
- UI Updates
Ch 4: Databricks Notebooks
- Databricks Notebooks
- Mix Languages in Notebooks
- Adding Comments and Markdown Text to Notebooks
- Organizing your Workspace Objects
- SparkSQL Notebooks
Ch 5: SparkSQL Notebooks – 1
- Spark SQL API
- Creating a Catalog, Schema
- Adding New Columns
- Changing Data Types
- Removing Columns
- Union
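A minimal sketch of the Spark SQL steps in this chapter, run from a Python cell with spark.sql(); the catalog, schema, table, and column names are placeholders.

```python
# In a Databricks notebook, `spark` is predefined; names below are placeholders.
spark.sql("CREATE CATALOG IF NOT EXISTS demo_catalog")
spark.sql("CREATE SCHEMA IF NOT EXISTS demo_catalog.sales")

spark.sql("""
    CREATE TABLE IF NOT EXISTS demo_catalog.sales.countries (
        country_id INT,
        name       STRING
    )
""")

# Adding a new column to an existing Delta table
spark.sql("ALTER TABLE demo_catalog.sales.countries ADD COLUMN region STRING")

# Combining two result sets with UNION
df = spark.sql("""
    SELECT name FROM demo_catalog.sales.countries WHERE region = 'EMEA'
    UNION
    SELECT name FROM demo_catalog.sales.countries WHERE region = 'APAC'
""")
df.show()
```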
Ch 6: SparkSQL Notebooks – 2
- Math Functions
- Sort Functions
- String Functions
- Datetime Functions
- Conditional Statements
- SQL Expressions with expr()
Ch 7: SparkSQL Notebooks – 3
- Volume for our Data Assets
- Uploading the Countries Data Files
- File Formats, Schema Inference
- How to Partition your Data
- Databricks File System Utilities
- Creating Views with SQL
- Creating Catalogs, Schemas and Volumes with SQL
Ch 8: PySpark – 1
- DataFrames
- Creation of DataFrames
- Pandas DataFrames
- DataFrame( )
- List Values, Mixed Values
- spark.read.csv()
- spark.read.format()
- Filtering DataFrames
- Grouping your DataFrame
- Pivot your DataFrame
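An illustrative PySpark sketch of the reading, filtering, grouping, and pivoting steps above; the volume path and column names are assumptions made for the example.

```python
from pyspark.sql import functions as F

# Path and column names are placeholders for illustration.
df = (spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("/Volumes/demo_catalog/sales/raw/orders.csv"))

# Filtering and grouping
high_value = df.filter(F.col("amount") > 100)
by_country = high_value.groupBy("country").agg(F.sum("amount").alias("total_amount"))

# Pivot: one column per order year, aggregated amounts as values
pivoted = df.groupBy("country").pivot("order_year").agg(F.sum("amount"))

by_country.show()
pivoted.show()
```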
Ch 9: PySpark – 2
- DataFrameReader
- DataFrameWriter Methods
- CSV Data into a DataFrame
- Reading Single Files
- Reading Multiple Files
- Schema with an SQL String
- Schema Programmatically
Ch 10: PySpark – 3
- Writing DataFrames to CSV
- Working with JSON
- Working with ORC
- Working with Parquet
- Working with Delta Lake
- Rendering your DataFrame
- Creating DataFrames from Python Data Structures
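A short sketch of writing one DataFrame to the formats listed above; the output paths and target table name are placeholders.

```python
# Writing the same DataFrame to different formats; paths are placeholders.
df = spark.createDataFrame(
    [(1, "India"), (2, "Japan")],
    ["country_id", "name"],
)

df.write.mode("overwrite").csv("/Volumes/demo_catalog/sales/out/countries_csv", header=True)
df.write.mode("overwrite").json("/Volumes/demo_catalog/sales/out/countries_json")
df.write.mode("overwrite").parquet("/Volumes/demo_catalog/sales/out/countries_parquet")

# Delta is the default table format on Databricks
df.write.mode("overwrite").format("delta").saveAsTable("demo_catalog.sales.countries_copy")
```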
Ch 11: Unity Catalog (Dev)
- Unity Catalog Managed Tables
- SQL Queries with the PySpark API
- Managed Tables with SQL
- Creating Views with SQL
- Creating Catalogs, Schemas and Volumes with SQL
- Dropping Unity Catalog Objects with SQL
- Temporary Views
- External Tables, External Volumes
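A short sketch of managed tables, views, and temporary views using Spark SQL from Python; catalog, schema, and table names are placeholders.

```python
# Managed table, view, and temporary view; all names are placeholders.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo_catalog.sales.orders (
        order_id INT, amount DOUBLE
    )
""")

spark.sql("""
    CREATE OR REPLACE VIEW demo_catalog.sales.big_orders AS
    SELECT * FROM demo_catalog.sales.orders WHERE amount > 1000
""")

# Temporary view: visible only in the current Spark session
spark.table("demo_catalog.sales.orders").createOrReplaceTempView("orders_tmp")
spark.sql("SELECT COUNT(*) FROM orders_tmp").show()

# Dropping Unity Catalog objects
spark.sql("DROP VIEW IF EXISTS demo_catalog.sales.big_orders")
```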
Ch 12: Unity Catalog (Admin)
- Metastore and the Unity Catalog Object Model
- Databricks Account Console
- Data Discovery and Lineage
- System Tables
- Databricks Account and Workspace Roles
- Unity Catalog Privileges and Securable Objects
- Workspace Access Control Lists (ACLs)
- Workspace-Catalog Binding
- Workspace Compute Policies
Ch 13: PySpark Transformations – 1
- Data Preparation
- Selecting Columns
- Column Transformations
- Renaming Columns
- Changing Data Types
- select() and selectExpr()
- withColumn()
Ch 14: PySpark Transformations – 2
- Basic Arithmetic and Math Functions
- String Functions
- Datetime Conversions
- Date and Time Functions
- Joining DataFrames
- Unioning DataFrames
Ch 15: PySpark Transformations – 3
- Filtering DataFrame Records
- Removing Duplicate Records
- Sorting and Limiting Records
- Filtering Null Values
- Grouping and Aggregating
- Pivoting and Unpivoting
- Conditional Expressions
Ch 16: Medallion Architecture
- Medallion Architecture
- Aggregated Data Loads
- Bronze, Silver and Gold
- Temp Views
- Spark Tables (Parquet)
- Work with File, Table Sources
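An illustrative Bronze → Silver → Gold flow, assuming the demo_catalog catalog and its bronze/silver/gold schemas already exist; paths and column names are placeholders.

```python
from pyspark.sql import functions as F

# Bronze: raw ingest (path and table names are placeholders)
bronze = (spark.read.format("csv")
          .option("header", "true")
          .option("inferSchema", "true")
          .load("/Volumes/demo_catalog/sales/raw/"))
bronze.write.mode("overwrite").saveAsTable("demo_catalog.bronze.orders")

# Silver: cleaned and conformed
silver = (spark.table("demo_catalog.bronze.orders")
          .dropDuplicates()
          .filter(F.col("amount").isNotNull()))
silver.write.mode("overwrite").saveAsTable("demo_catalog.silver.orders")

# Gold: business-level aggregates
gold = silver.groupBy("country").agg(F.sum("amount").alias("total_amount"))
gold.write.mode("overwrite").saveAsTable("demo_catalog.gold.orders_by_country")
```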
Ch 17: Delta Lake – 1
- Storage Layer
- Delta Table API
- Deleting Records
- Updating Records
- Merging Records
- History and Time Travel
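A minimal sketch of the Delta Table API operations listed in this chapter (delete, update, history, time travel); the table name and predicates are placeholders.

```python
from delta.tables import DeltaTable

# DeltaTable wraps an existing Delta table; the name is a placeholder.
tbl = DeltaTable.forName(spark, "demo_catalog.silver.orders")

tbl.delete("amount < 0")                        # deleting records
tbl.update(condition="country = 'IN'",
           set={"country": "'India'"})          # updating records

# History and time travel
tbl.history().show()
old_df = spark.read.option("versionAsOf", 0).table("demo_catalog.silver.orders")
old_df.show()
```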
Ch 18: Delta Lake – 2 (SCD)
- Schema Evolution
- Delta Lake Data Files
- Deleting and Updating Records
- Merge Into
- Table Utility Commands
- Exploratory Data Analysis
Ch 19: Implementation of SCD Type 2
- Incremental Loads
- Upserts Versus SCD
- New Row Inserts
- Existing Row Updates
- Old History Retention
- Delta Transaction Log
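A simplified Delta Lake MERGE sketch of the SCD Type 2 flow above: it closes the current history row when a tracked attribute changes and inserts brand-new keys. The table, column, and flag names (dim_customer, is_current, start_date, end_date) are assumptions for illustration; a complete implementation also re-inserts changed rows as new current versions.

```python
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "demo_catalog.gold.dim_customer")   # placeholder dimension table
updates = spark.table("demo_catalog.silver.customer_changes")          # placeholder change feed

(target.alias("t")
 .merge(updates.alias("s"), "t.customer_id = s.customer_id AND t.is_current = true")
 .whenMatchedUpdate(
     condition="t.address <> s.address",
     set={"is_current": "false", "end_date": "current_date()"})   # close the old history row
 .whenNotMatchedInsert(values={
     "customer_id": "s.customer_id",
     "address":     "s.address",
     "is_current":  "true",
     "start_date":  "current_date()",
     "end_date":    "null"})                                      # new row insert
 .execute())
```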
Ch 20: Widgets
- Text Widgets
- User Parameters
- Manual Executions
- Lake Bridge
- Databricks BridgeOne
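A minimal text-widget sketch; the widget name, default value, and table are placeholders, and dbutils is available only inside Databricks notebooks.

```python
# Text widget with a default value; names are placeholders.
dbutils.widgets.text("load_date", "2024-01-01", "Load Date")

load_date = dbutils.widgets.get("load_date")
df = spark.table("demo_catalog.bronze.orders").filter(f"order_date = '{load_date}'")
df.show()
```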
Ch 21: Lakeflow Jobs
- Workflows & CRON
- Job Compute, Running Tasks
- Python Script Tasks
- Parameters into Notebook Tasks
- Parameters into Python Script Tasks
- Concurrent Executions, Dependencies
- Branching Control with the If-Else Task
Ch 22: Databricks Tuning
- How Spark Optimizes your Code
- Lazy Evaluation
- Explain Plan
- Inspecting Query Performance
- Caching, Data Shuffling
- Broadcast Joins
- When to Partition
- Data Skipping
- Z Ordering
- Liquid Clustering
- Spark Configurations
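A short sketch of the plan-inspection, caching, and broadcast-join ideas above; the table names are placeholders.

```python
from pyspark.sql import functions as F

orders = spark.table("demo_catalog.silver.orders")        # large fact table (placeholder)
countries = spark.table("demo_catalog.silver.countries")  # small lookup table (placeholder)

# Broadcast join: ship the small table to every executor to avoid a shuffle
joined = orders.join(F.broadcast(countries), "country_id")

joined.explain()   # inspect the physical plan (look for BroadcastHashJoin)
joined.cache()     # cache a DataFrame that is reused several times
joined.count()     # an action materializes the cache (Spark is lazily evaluated)
```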
Ch 23: Version Control & GitHub
- Local Development
- Runtime Compatibility
- Git and GitHub Pre-requisites
- Git and GitHub Basics
- Linking GitHub and Databricks
- Databricks Git Folders
- Project Code to GitHub
- Adding Modules to the Project Code
- Databricks Job Updates, Runs
Ch 24: Spark Structured Streaming
- Streaming Simulator Notebook
- Micro-batch Size
- Schema Inference and Evolution
- Time Based Aggregations and Watermarking
- Writing Streams
- Trigger Intervals
- Delta Table Streaming Reads and Writes
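An illustrative Structured Streaming sketch with a watermark and a time-window aggregation; the source table, event_time column, checkpoint path, and target table are assumptions.

```python
from pyspark.sql import functions as F

# Streaming read from a Delta table; names and paths are placeholders.
events = spark.readStream.table("demo_catalog.bronze.events")

# Time-based aggregation with a watermark to bound late-arriving data
counts = (events
          .withWatermark("event_time", "10 minutes")
          .groupBy(F.window("event_time", "5 minutes"), "event_type")
          .count())

query = (counts.writeStream
         .outputMode("append")
         .option("checkpointLocation", "/Volumes/demo_catalog/chk/event_counts")
         .trigger(processingTime="1 minute")
         .toTable("demo_catalog.silver.event_counts"))
```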
Ch 25: Auto Loader
- Reading Streams with Auto Loader
- Reading a Data Stream
- Manually Cancel your Data Streams
- Writing to a Data Stream
- Workspace Modules
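A minimal Auto Loader (cloudFiles) ingestion sketch; the landing path, schema/checkpoint locations, and target table name are placeholders.

```python
# Auto Loader reads new files incrementally from a landing location.
stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "csv")
          .option("cloudFiles.schemaLocation", "/Volumes/demo_catalog/chk/orders_schema")
          .load("/Volumes/demo_catalog/sales/landing/"))

query = (stream.writeStream
         .option("checkpointLocation", "/Volumes/demo_catalog/chk/orders")
         .trigger(availableNow=True)   # process all available files, then stop
         .toTable("demo_catalog.bronze.orders_autoloader"))

# query.stop()  # manually cancel the stream if it is running continuously
```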
Ch 26: Lakeflow Declarative Pipelines
- Delta Live Tables
- Data Generator Notebook
- Pipeline Clusters
- Databricks CLI
- Data Quality Checks
- Streaming Dataset “Simulator”
- Streaming Live Tables
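A small declarative-pipeline sketch of the ideas above (streaming live tables plus a data quality expectation); it runs only inside a Lakeflow Declarative Pipelines (DLT) pipeline, and the source table name is an assumption.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders ingested as a streaming live table")
def bronze_orders():
    # Source table name is a placeholder
    return spark.readStream.table("demo_catalog.landing.orders")

@dlt.table(comment="Cleaned orders")
@dlt.expect_or_drop("valid_amount", "amount > 0")   # data quality check
def silver_orders():
    return dlt.read_stream("bronze_orders").withColumn("loaded_at", F.current_timestamp())
```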
Ch 27: Security: ACLs
- Overview of ACLs
- Adding a New User to our Workspace
- Workspace Access Control
- Cluster Access Control
- Groups
Ch 28: Realtime Project @ Ecommerce / Banking / Sales
- Detailed Project Requirements
- Project Solutions
- Project FAQs
- Project Flow
- Interview Questions & Answers
- Resume Guidance (1:1)
Ch 29: Key Take-Aways from Module 3
👉 Realtime Project: Requirement, CI/CD, Solution, FAQs
👉 Chapter Wise Assignments: Solutions
👉 Daily Assignments: Review (Feedback)
👉 Weekly Mock Interview: Feedback
Module 4: Databricks Data Engineer Associate
- Databricks Data Engineer Associate Exam Guidance
- Exam Samples
- Mock Exams

What is the Databricks Data Engineer Associate Training?
This training covers Databricks concepts end-to-end including Spark SQL, PySpark, Delta Lake, Lakehouse, Auto Loader, DLT, Unity Catalog, Workflows, Streaming, Medallion Architecture, and Real-Time Projects.
Who should join this course?
Aspiring Data Engineers, Cloud Engineers, BI Developers, Data Science Engineers, and freshers who want to build a strong career in Databricks and modern Data Engineering.
What modules are included in this training?
Module 1: MSSQL
Module 2: Python
Module 3: Databricks (Complete)
Module 4: Databricks Data Engineer Associate Exam Guidance
Is SQL included as part of the training?
Yes. The course covers SQL Server from basics to advanced topics, including DDL, DML, Joins, Constraints, Keys, Views, Procedures, Functions, CTEs, Tuning, Indexes, Group By, Subqueries, Transactions, and Window Functions.
Do I need Python knowledge to learn Databricks?
Python is used heavily in Databricks, but no prior knowledge is required: this course teaches Python from scratch, including data types, loops, functions, modules, file handling, exception handling, and pandas for ETL.
What Databricks basics will I learn?
You will learn Workspace, Notebooks, Clusters, Filesystems, Catalogs, Schemas, and Databricks Architecture including Spark and Lakehouse fundamentals.
Does the course include Spark SQL?
Yes. Spark SQL API, creating schemas, altering columns, unions, math functions, sort functions, string functions, date/time functions, conditional logic, expr() and complex SQL expressions.
Will I learn PySpark in detail?
Yes. Creating DataFrames, reading/writing CSV/JSON/ORC/Parquet, schema inference, grouping, filtering, joins, union, pivot/unpivot, transformations, and rendering outputs.
Is Unity Catalog included in the curriculum?
Yes. Managed tables, external tables, volumes, catalogs, schemas, views, access control, workspace binding, lineage, metastore, system tables, and securable objects.
Will I learn Data Ingestion & Auto Loader?
Yes. Auto Loader streaming ingestion, schema inference, evolution, streaming reads/writes, cancellations, and workspace modules.
Is Medallion Architecture taught?
Yes. Bronze, Silver, Gold layers, aggregated loads, temp views, parquet tables, file/table sources, and building reliable pipelines using Medallion principles.
What Delta Lake concepts does this course cover?
Delta Table API, delete/update/merge, time travel, history, schema evolution, DML operations, retention, transaction logs, and Delta Lake SCD Type 2 implementation.
Will I learn SCD Type 2 in real-time?
Yes. Incremental loads, new/existing record handling, history retention, upserts, and automation using Delta Lake and notebooks.
Does the course include Streaming & Structured Streaming?
Yes. Streaming simulations, micro-batches, schema evolution, watermarking, time-based aggregations, triggers, and Delta streaming pipelines.
Do you cover Databricks Workflows (Jobs)?
Yes. Jobs scheduling, CRON, task dependencies, branching logic, passing parameters into notebooks/py scripts, concurrent executions, and job clusters.
Is Databricks Tuning part of the training?
Yes. Explain plans, lazy evaluation, caching, data shuffling, broadcast joins, partitioning, data skipping, Z-ordering, Liquid Clustering, and Spark configs.
Will I learn GitHub Integration?
Yes. Git prerequisites, linking GitHub with Databricks, Git folders, adding modules, version control, code sync, and pipeline updates.
Does the course include Delta Live Tables (DLT)?
Yes. Pipeline clusters, Data Quality checks, declarative pipelines, streaming datasets, parameterization, and DLT streaming live tables.
Is a real-time project included?
Yes. E-Commerce/Banking/Sales projects with requirements, solutions, FAQs, architecture flow, interview questions, and resume guidance.
Is exam preparation for Databricks Data Engineer Associate included?
Yes. Exam guidance, sample questions, mock exams, and hands-on practice for the certification.
Placement Partners


SQL SCHOOL
24x7 LIVE Online Server (Lab) with Real-time Databases.
Course includes ONE Real-time Project.
Why Choose SQL School
- 100% Real-Time and Practical
- ISO 9001:2008 Certified
- Weekly Mock Interviews
- 24/7 LIVE Server Access
- Realtime Project FAQs
- Course Completion Certificate
- Placement Assistance
- Job Support