ChatGPT Image Jun 6, 2026, 01_25_36 PM

#Databricks Data Engineer

Databricks Data Engineer is the backbone of today’s data-driven world — uniting data, analytics, and AI on one collaborative platform. Built on Apache Spark, it enables seamless data integration, scalable ETL pipelines, and real-time analytics.
From finance to healthcare, Databricks Data Engineers power intelligent insights, optimize data workflows, and drive business transformation across industries.

Training Highlights

✅ Databricks Architecture
✅ PySpark & Spark SQL
✅ Delta Lake & Delta Tables
✅ Databricks Notebooks
✅ Python, PySpark, SQL
✅ Event Hub, Kafka
✅ Job Workflows & Automation
✅ CI-CD Integrations (DevOps)
✅ Real Time Projects

Modules We Learn:

✅Module 1: SQL Server TSQL (MS SQL) Queries
✅Module 2: Databricks
✅Module 3: Real Time Project (E-commerce)

Course Duration: 7 Weeks

Databricks Data Engineer

Brochure Download Request a Callback

Module 1: SQL Server TSQL (MS SQL) Queries

Ch 1: SQL Database Job Roles

Introduction to Data
Database Intro, Types
OLTP, DWH, OLAP
DBMS Concepts
Database Job Roles
Data Engineer Job Roles

Ch 2: Database Intro & Installations

SQL Server Installations
Instance Concepts
Authentication Types
Authentication Modes
Collation & File Stream
SQL Server 2025 Installations
SSMS Tool Installation
Connections, Authentications

Ch 3: SQL Basics V1 (Commands)

Creating Databases (GUI)
Creating Tables, Columns (GUI)
SQL Basics (DDL, DML, etc..)
Creating Databases, Tables
Data Inserts (GUI, SQL)
Basic SELECT Queries

Ch 4: SQL Basics V2 (Commands, Operators)

DDL: Create, Alter, Drop, Add
DML: Insert, Update, Delete
DQL: Select, Fetch
SQL Operators
Special Operators

Ch 5: Excel Data Imports

Data Imports with Excel
SQL Native Client
Order By: Asc, Desc
Order By with WHERE
TOP & OFFSET
UNION, UNION ALL

Ch 6: Schemas & Batches

Schemas: Creation, Usage
Schemas & Table Grouping
Real-world Banking Database
2 Part, 3 Part & 4 Part Naming
Batch Concept & “Go” Command

Ch 7: Constraints, Keys & RDBMS

Null, Not Null Constraints
Unique Key Constraint
Primary Key Constraint
Foreign Key & References
Default Constraint & Usage
DB Diagrams & ER Models

Ch 8: Realtime Case Study – 1

Medicare Database
Patients, Visits, Meds, etc
Keys, Constraints
Relations, Data Validations

Ch 9: Joins & Queries

Joins: Table Comparisons
Inner Joins & Matching Data
Outer Joins: LEFT, RIGHT
Full Outer Joins & Aliases
Cross Join & Table Combination
Joining more than 2 tables

Ch 10: Views & RLS

Views: Realtime Usage
Storing SELECT in Views
DML, SELECT with Views
RLS: Row Level Security
WITH CHECK OPTION
Important System Views

Ch 11: Stored Procedures

Stored Procedures: Realtime Use
Parameters Concept with SPs
Procedures with SELECT
System Stored Procedures
Metadata Access with SPs
Stored Procedures, Tuning

Ch 12: User Defined Functions

Using Functions in MSSQL
Scalar Functions in Real-world
Inline & Multiline Functions
Parameterized Queries
Date & Time Functions
String Functions & Queries
Aggregated Functions & Usage

Ch 13: Triggers & Automations

Need for Triggers in Real-world
DDL & DML Triggers
For / After Triggers
Instead Of Triggers
Memory Tables with Triggers
Disabling DMLs & Triggers

Ch 14: Transactions & ACID

Transaction Concepts in OLTP
Auto Commit Transaction
Explicit Transactions
COMMIT, ROLLBACK
Checkpoint & Logging
Lock Hints & Query Blocking
READPAST, LOCKHINT

Ch 15: Indexes Basics, Tuning

Indexes & Tuning
Clustered Index, Primary Key
Non Clustered Index & Unique
Creating Indexes Manually
Composite Keys, Query Optimizer
Composite Indexes & Usage

Ch 16: CTEs & Tuning

Common Table Expression
Creating and Using CTEs
CTEs, In-Memory Processing
Using CTEs for DML Operations
SP Recompilations
IIF(), CASE Statement

Ch 17: Group By Queries

Group By, Distinct Keywords
GROUP BY, HAVING
Cube( ) and Rollup( )
Sub Totals & Grand Totals
Grouping( ) & Usage
Group By with UNION
Group By with UNION ALL

Ch 18: Sub Queries

Sub Queries Concept
Sub Queries & Aggregations
Joins with Sub Queries
Sub Queries with Aliases
Sub Queries, Joins, Where
Correlated Queries

Ch 19: Joins with Group By

Joins with Group By
3 Table, 4 Table Joins
Join Queries with Aliases
Join Queries & WHERE
Join Queries & Group By
Joins with Sub Queries
Query Execution Order

Ch 20: Normal Forms & Self Joins

Normal Forms: 1 NF, 2 NF
3 NF, BCNF and 4 NF
Adding PK to Tables
Adding FK to Tables
Cascading Keys
Self Referencing Keys
Database Diagrams

Ch 21: Data Types & Variables

Integer Data Types
Character, MAX Data Types
Decimal & Money Data Types
Boolean & Binary Data Types
Date and Time Data Types
SQL_Variant Type
Variables in SQL
Cursor Variable & Fetch

Ch 22: Rank Functions, CTEs

Window Functions (Rank)
Row_Number( )
Rank( ), DenseRank( )
Partition By & Order By
Using CTEs with Row Number

Ch 23: Merge (Upsert) with SPs

Merge Statement
Upsert Operations with Merge
Merge with OLTP & DWH
Matched and Not Matched
Merge Statement inside SPs

Ch 24: Realtime Case Study – 2

ECommerce Database
Entities and ER Diagram
Data Validations
Query Writing
Query Tuning

Module 2: Databricks

Ch 1: Databricks Introduction

Cloud ETL, DWH
Cloud Computing
Databricks Concepts
Big Data in Cloud

Ch 2: Databricks Architecture

Unity Catalog, Volume
Spark Clusters
Apache Spark and Databricks
Apache Spark Ecosystem
Compute Operations
Hadoop, MapReduce, Apache Spark

Ch 3: Unity Catalog

Unity Catalog Concepts
Workspace Objects
Databricks Notebooks
Databricks Workspace UI
Organizing Workspace Objects
Creating Volumes
Spark Table Creations
Spark UI: Limitations

Ch 4: Spark SQL: Basics

Spark SQL Notebooks
Creating Catalog
Creating Schemas
Creating Tables
Spark Data Types
PySpark API: SQL Queries
Dropping Objects
Notebooks: Exports, Clone

Ch 5: Spark SQL: Table Types

Delta Tables
Managed Tables
External Tables
Data Partitioning
Union, Views in Spark
External Volumes

Ch 6: Spark SQL: Functions

Math, Sort Functions
String, DateTime Functions
Conditional Statements
SQL Expressions with expr()
Volume for our Data Assets
File Formats, Schema Inference
Spark SQL Aggregations

Ch 7: Spark SQL: Time Travel

Time Travel Concepts
Spark DB: Logical Architecture
Spark DB: Physical Store
Data File Store
Log File Store
Time Travel
DESCRIBE, EXTENDED
HISTORY
Version Numbers

Ch 8: Python: Introduction, Print

Python Introduction
Python Versions
Python Implementations
Python in Spark (PySpark)
Python Print()
Single, Multiline Statements

Ch 9: Python: Variables

Python Variables
Variable Declarations
Variable Values
Value Types
Multi Variable Values
Common Variable Values
Realtime use of Variables

Ch 10: Python: Operators

Need for Operators
Arithmetic Operators
Assignment Operators
Comparison Operators
Operator Precedence
Operands in Python

Ch 11: Python: Control Statements

Python Control Structures
If … Else Statement
Short Hand If
ELIF & ELSE IF Statements
OR, AND Concepts
Python Loops

Ch 12: Python: Data Types

Python Data Types
Integer / Int Data Types
Float, String Data Types
List Data Type
Dictionary Data Type
Tuple Data Type
List Items, Indexes
Tables Versus Dictionaries

Ch 13: Python: Modules & Dataframes

Python Modules
Pandas
NumPy
Dataframe Concepts
Handling Nulls
Data Cleansing Concepts
Pandas Series, arrays
Indexes, Indexed Lists

Ch 14: PySpark Concepts

Constructing Dataframes
Single List Dataframes
Multi List Dataframes
Pandas Dataframes
Contact & Union
Merge
Join Options with Dataframes

Ch 15: Medallion Architecture – 1

Medallion Architecture
Aggregated Data Loads
Broze, Silver and Gold
Temp Views
Spark Tables (Parquet)
Work with File Sources

Ch 16: Medallion Architecture – 2

Medallion Architecture
Azure SQL DB Connections
Joining Source Tables
Dataframes, Temp Views
Aggregated Data Loads
Gold Data Consumption

Ch 17: Delta Lake

Databricks DeltaLake
Schema Evolution
Azure SQL DB Connections
Dataframes, Temp Views
Delta Table API
Deleting Records
Updating Records
Merging Records
Old History Retention
Delta Transaction Log

Ch 18: PySpark: Widgets

PySpark Parameters
Text Widgets
User Parameters
Manual Executions
Automations
UI & JSON For Widgets

Ch 19: Lake Flow Jobs

Worksflows & CRON
Job Compute, Running Tasks
Python Script Tasks
Parameters into Notebook Tasks
Parameters into Python Script Tasks
Concurrent Executions, Dependencies
Branching Control with the If-Else Task

Ch 20: Pyspark: Auto Loader – 1

AutoLoader Concept
Cloudfiles Architecture
Checkpoint Configurations
Creating Directories
Reading Databricks Cloud Sources
Initial Loads

Ch 21: PySpark: Auto Loader – 2

Reading Streams with Auto Loader
Reading a Data Stream
Manually Cancel your Data Streams
Writing to a Data Stream
Schema Evaluation Modes
Adding New Columns
Workspace Modules

Ch 22: Lake Flow Declarative Pipelines

SDP: Spark Declarative Pipelines
Delta LIVE Tables
Streaming Data Loads
Bronze, Silver, Gold Data
Materialized Views
Pipeline Clusters
Databricks CLI
Data Quality Checks

Ch 23: Databricks Optimizations

Lazy Evaluation
Explain Plan
Caching
Data Shuffling
Broadcast Joins
Partitions
Data Skipping
Z Ordering
Liquid Clustering
VACUUM
OPTIMIZE

Ch 24: Security Concepts

Overview of ACLs
Adding a New User to Workspace
Workspace Access Control
Cluster Access Control
Groups & LakeBridge
Access Keys (Tokens)

Ch 25: Version Control & GitHub

Local Development
Runtime Compatibility
Git and GitHub Pre-requisites
Git and GitHub Basics
Linking to GitHub & Databricks
Databricks Git Folders
Project Code to GitHub
Adding Modules to the Project Code
Databricks Job Updates, Runs

Ch 26: Databricks Data Engineer Associate Exam

Databricks Data Engineer Associate Exam
AVRO Formats
Exam Guidance
Databricks Exam Pattern
Exam Q & A, Scenarios

Module 3: Real Time Project (E-commerce)

Realtime Project : (E-commerce Platform)

Project Objective
Build an end-to-end Azure Data Engineering solution to process, transform, and analyze ecommerce business data from multiple sources.

Technologies Used