Chapter 1: Cloud Basics, Azure SQL
- Cloud Introduction and Azure Basics
- Azure Implementation: IaaS, PaaS, SaaS
- Azure Data Engineer: Job Roles
- Azure Storage Components
- Azure ETL & Streaming Components
- Need for Azure Data Factory (ADF)
- Need for Azure Synapse Analytics
- Azure Resources and Resource Types
- Azure Account, Subscription (Free)
- Azure SQL Server [Logical Server]
- Firewall Rules and Azure Services
- Azure SQL Database Deployment
- Azure SQL Pool Deployment
- Compute: DTU Versus DWU
- Test Connections from SSMS
|
Chapter 1: Azure Fundamentals - Storage
- Azure Resources: Storage Components
- Storage Resources and Properties
- Resource Groups & Subscriptions
- Azure Storage : Files, Tables and ETL
- Azure Storage Account & Use
- Data Lake Storage Account (ADLS)
- Advanced Options: HNS Property
- Resource Location, Resource Group
- Azure Portal: Deployment Verifications
- Storage Account : Basic Properties
- Overview Page: Status, HNS State
- Azure Storage : Access Options
- Azure Storage Explorer Tool
- Explorer Tool : Configuration
- Azure Subscription : Filter Options
|
Chapter 1: Azure Intro, Azure Databricks
- Azure Cloud : SaaS, PaaS & IaaS
- Azure Cloud : Storage, ETL Resources
- Azure Databricks : Compute Resources
- Need for Azure Databricks (ADB)
- Azure Databricks : Purpose & Config
- Azure Databricks Service Creation
- Azure Databricks Components
- Azure Databricks Workspace, Usage
- Spark Cluster Configurations, Capacity
- Driver Nodes, Worker Nodes in Spark
- Cluster Types : Personal, Unrestricted
- CPU, Memory & IO Resources
- Virtual Machines (VM) for Clusters
- Databricks : Runtime & DBFS Storage
- DBFS : Files, Tables with Spark DB
|
Chapter 2: Synapse SQL Pools (DWH)
- Dedicated SQL Pools in Azure
- Data Warehouse with Synapse
- Massively Parallel Processing (MPP)
- Control Nodes and Compute Nodes
- DMS: Data Movement Service
- Start/Resume/Pause & Scaling
- SQL Pool Config @ TSQL Scripts
- Start/Resume/Pause, Scaling Options
- Table Creations @ TSQL Scripts
- Table Partitions: Left & Right
- Distributions: Round Robin, Hash
- Distributions: Replicate and Usage
- Auto Indexing & Column Store
- Planning for Big Data Loads
- Need for ADF: Azure Data Factory
|
Chapter 2: Azure Storage Operations
- BLOB: Binary Large Objects
- Storage Browser and Service Pages
- Storage Browser: Container Creation
- Storage Browser: Folder, File Uploads
- Service Page: Container Creation
- Service Page: Folder, File Uploads
- Container, Folder, File Properties
- Limitations with Storage Portal
- Azure Data Explorer Tool : Usage
- Container: Creation, Properties
- File Uploads, Edits and Access URLs
- Azure Storage Explorer Tool Usage
- Azure Account Options in Explorer
- Directory Creation, File Operations
- Limitations with Explorer Tool
|
Chapter 2: Spark Database, SQL Notebooks
- DBFS : File Uploads from On-Premise
- Creating Spark Tables; Spark DB
- Data Explorer: HIVE Metastore
- Data Explorer: Spark Database, Tables
- Notebooks: SQL, Python and Scala
- Creating SQL Notebooks in Databricks
- Creating User Defined Spark Databases
- Connecting / Using Spark Databases
- Spark SQL : Big Data Loads
- Spark SQL : Database & Table List
- Spark SQL : Data Aggregations, Jobs
- Spark SQL : Data Analytics, Reports
- Analytics: X, Y Axis, Group By
- Notebooks : Export, Import, Clone
- Notebooks : Storage & Versions
|
Chapter 3: Azure Data Factory, Pipelines
- Azure Data Factory (ADF) Concepts
- ADF Pipelines : Architecture
- Integration Runtime (IR) & Use
- Linked Services and Datasets
- Pipeline Activities: Copy Data Tool
- DIU : Data Integration Units
- DTU vs DWU vs DIU
- ADF Pipeline with Copy Data Tool
- Azure SQL DB to Synapse Data Loads
- Multi Tables Data Loads with ADF
- Bulk Insert, Data Copy Methods
- ETL Staging: Storage Account
- Staging Container Connections
- DIU Allocations & Publish
- ETL Pipeline Monitoring, Runs
|
Chapter 3: Azure Storage Security, ACLs
- Azure Data Lake Storage Security Options
- Shared Access Keys: Primary, Secondary
- SAS Key Generation: Container, Tables
- SAS Key Permissions, Validation Options
- Access Keys: Account Level Permissions
- Azure Active Directory: Users, Groups
- Azure AD Security: RBAC, IAM, ACLs
- Owner Role, Contributor, Reader Role
- Azure Data Lake Storage Security
- ACL : Access Control Lists & Security
- Azure BLOB Storage Containers & ACLs
- Folder Level and File Level Security
- ACL Permissions: Read, Write, Execute
- Access Policy: Creation, Realtime Use
- rwacdl; Azure Principals, CORS
|
Chapter 3: Python Intro, Data Loads
- Python : Introduction, Real-time Use
- Python For ETL and DWH
- Python For Azure: Data Engineer
- Python Data Frames & Purpose
- Python Dataframes - Pandas
- Python with Spark Integrations
- PySpark for DDL and ETL
- PySpark Versus SQL Notebooks
- Reading DBFS Data into Spark
- Creating Dataframes for ETL
- Temporary Views & Dataframes
- Spark Temp Views: Aggregations
- Spark Table Loads, HIVE Data
- write.format()
- Parquet Tables with Spark DB
|
Chapter 4: On-Premise Data Loads, Upsert
- Copy Data Tool : Incremental Loads
- On-Premise Data Sources with Azure
- Self Hosted Integration Runtime (IR)
- Access Keys, Remote Linked Service
- Synapse SQL Pool (DW), On-Premise
- ETL Staging with Storage Account
- Copy Method: Polybase - Tuning
- Polybase : Big Data Loads
- ETL Pipelines for Incremental Loads
- Business Keys For Table Upsert
- Pipeline Schedules with ADF
- ETL Logging with Storage Account
- Copy Method: UPSERT
- DIU, Degree of Copy Parallelism (DoCP) & Publish
- Manual Pipeline Executions in ADF
|
Chapter 4: SQL Database Migrations
- On-Premise SQL Server to Azure Migration
- SSMS Tool, SQL Database Installation
- Source Database Scripts & Validations
- BACPAC File Generation: SSMS Tool
- Table Selection & Advanced Options
- Azure Data Lake Storage, SSMS Access
- Azure Storage Container, BACPAC Files
- IAM and Account Key Authentication
- Azure SQL Server Creation From Portal
- Azure SQL Database Deployment
- DTU : Database Transaction Units, Pricing
- Azure Firewall Configuration, Security
- Azure SQL Database Imports (bacpac)
- Azure SQL Server with ADLS Containers
- Azure SQL DB Migrations, Verification
|
Chapter 4: PySpark with ADLS
- Azure Storage Account : Creation
- Azure Data Lake Storage : HNS
- Creating Containers in ADLS
- BLOB File Uploads / Generation
- Account Key : Access Key / SAS Key
- BLOB Access URL for Databricks
- WASBS URL for PySpark Notebook
- Generating PySpark Script
- PySpark Connection Variables
- Databricks : Data Import Scripts
- Config Options with ADLS, Spark
- config(), Session Context
- DataFrames with Temp Tables
- Escape Sequence with SparkSQL
- Data Explorer: HIVE & Spark DB
|
Chapter 5: File Incremental Loads in ADF
- Incremental Loads with Files (BLOB)
- ETL Schedules: Tumbling Window
- Execution Retry and Delay Options
- Binary Copy, Structural Data Loads
- Incremental Loads Verification Tests
- Incompatible Rows & Fault Tolerance
- Pipeline Compression & Tuning
- Pipeline Publish, Monitor Options
- Azure Monitor Resource : Metrics
- ADF Metrics and Pipeline Runs
- ADF: Pipeline Monitoring and Alerts
- Synapse: Storage Monitoring, Alerts
- Conditions, Signal Rules and Metrics
- Alerts & Action Groups: Emails
- Email Notifications with Azure
|
Chapter 5: Azure Tables & Replication
- Azure Tables - Schemaless Design
- Azure Tables: Creation, Data Inserts
- Tables, Entities, Properties Concepts
- Structured, Non-Relational Data Storage
- Azure Tables: GUI, Data Types
- Azure Tables: Big Data Imports
- Data Edits, Queries, Delete Operations
- OData Options (REST API), End Points
- Azure Storage: Replications, DR Options
- LRS: Locally Redundant Storage
- GRS: Geo-Redundant Storage
- ZRS: Zone-Redundant Storage
- Replication Options and Advantages
- Replication Verification, Modifications
- Storage Endpoints, Failover Partner
|
Chapter 5: PySpark Widgets & Spark
- Widgets : Notebook Parameters
- widget module : Text, Combo
- Dropdown, Multi Select Parameters
- dbutils help(), get() & remove()
- Dataframes, Spark SQL @ Variables
- Python Data Frames, Spark SQL
- Reading Parameters Values
- Parameters Versus Variables
- Using Parameters For Temp Tables
- Using Parameters for Spark Tables
- Data Storage and HIVE Metastore
- Reading Parameterized Data
- Format Strings with PySpark
- Dynamic Queries with Spark SQL
- Aggregations and f Strings
|
Chapter 6: ADF Data Flow - 1
- Data Flow Task, Data Flow Activity
- Transformations with Data Flow
- Spark Cluster For Debugging
- Cluster Node Configurations
- Spark Cluster Types & Sizing
- Transaction Optimized - Capacity
- Memory Optimized - Capacity
- Data Cleansing with ADF
- Data Orchestration with Data Flow
- SELECT Transformation & Options
- Conditional Split Transformation
- UNION, SELECT Transformation
- Spark Cluster For Pipeline Executions
- Pipeline Monitoring & Run IDs
- Adding Data Flow into Pipelines
|
Chapter 6: Azure Stream Analytics, IoT
- Azure Stream Analytics Real-time Use
- Real-time Data Processing, Events
- Ingest, Deliver & Analysis Operations
- Azure Stream Analytics Jobs Concept
- Understanding Input, Output Options
- SAQL Queries: Stream Analytics Jobs
- IoT: Internet Of Things, Real-time Data
- Need for IoT Hubs and Event Hubs
- Conditional Split Transformation
- Creating IoT Device for Data Inputs
- Creating Azure Stream Analytics Job
- Stream Analytics for Historical Data
- Azure SQL Database for ASA Jobs
- SAQL: Query Formatting, Validation
- Historical Data Upload, ASA Jobs
- Stream Analytics Job Monitoring
|
Chapter 6: Architecture, Workflows
- Driver Nodes, Worker Nodes, DBUs
- RDD : Resilient Distributed Dataset
- DAG : Directed Acyclic Graph
- Hadoop HDFS and Spot Instance
- Cluster Manager, Master Node
- RDDs, Worker, Executor & Slave
- Hadoop HDFS & Databricks Runtime
- Databricks Optimization Techniques
- Spot Instance, Photon Acceleration
- All Purpose Cluster, Job Cluster
- Databricks Jobs: Creation & Tasks
- Jobs with Parameters, Executions
- Task Dependency & Notifications
- Continuous & Manual Schedules
- Active Jobs, Recent Run Jobs, Monitor
|
Chapter 7: ADF Data Flow - 2
- ADF Pipelines For ETL Operations
- Data Flow Tasks, Activities in Synapse
- JOIN & EXISTS Transformations
- Aggregate & Group By Transformations
- Window Functions, Rank in Data Flow
- Rank / DenseRank / Row Number
- Derived Column Transformation
- Lookup, Surrogate Key, Parse
- Type Convert, Cast Transformations
- Reusing Data Flow Tasks in Synapse
- Pipeline Validations & Executions
- Inline Datasets, Schema Drift
- Data Deduplication with ADF
- DFT Optimization Techniques
- Data Flow Task - Staging, Logging
|
Chapter 7: Azure Event Hubs
- Azure Stream Analytics For API Data
- IoT Hubs, IoT Devices, Connection Strings
- Raspberry Pi App Connections with IoT Hub
- Azure Storage Account and Container
- Creating Azure Stream Analytics Job
- Configuring Input Aliases with IoT Hub
- Output Aliases with ADLS Gen 2
- SAQL Query, Job Executions; Monitoring
- Azure Event Hubs and Event Instances
- Event Hub Namespaces, Partition Counts
- Access Policies, Permissions & Defaults
- RootManageSharedAccessKey & Options
- Connection Strings & Event Service Bus
- Telco App : Executions & LIVE Data
- On-Premise App Integration, ASA Jobs
|
Chapter 7: Databricks Security, Scala
- Azure Databricks Security Operations
- Azure Active Directory (Azure AD)
- AD Users and RBAC with IAM
- Owner, Contributor & Reader Roles
- Workspace Admin Permissions
- Notebook Permissions & Share
- Workflow Security, HTTP Path
- User Tokens & ServerName
- Scala : Differences with PySpark
- Scala : Variables Declaration, Usage
- SparkSQL with Scala Notebooks
- Temp Views with Scala Notebooks
- Aggregations with Scala Notebooks
- Visual Data Analytics with Scala
- PySpark to Scala Conversions
|
Chapter 8: Azure Synapse Analytics
- Azure Synapse Analytics Resource
- Azure Synapse Analytics Workspace
- Managed Resource Group, SQL Account
- Synapse Workspace & Synapse Studio
- Operations with Synapse Workspace
- ADLS Gen 2 Storage Account, Container
- Synapse Studio: Scripts & Pipelines
- Dedicated SQL Pools : Creation, Use
- Synapse Tables, Data Loads with TSQL
- COPY INTO Statements with T-SQL
- Row Terminator and Compressions
- T-SQL Queries and Aggregations
- Aggregation Data Loads in Synapse
- Creating Synapse Pipelines with TSQL
- Stored Procedure Activity & Triggers
|
Chapter 8: Storage Architecture, Queues
- Azure Storage Account : Architecture
- ETag: Replication & Encryption Use
- BLOB Types: Block, Append & Page
- Access Tiers: Hot, Cool, Cold Types
- Archive Access Tier & Retention
- Legal Hold & Time Bound Access
- Pricing : HNS, Security, Encryption
- EndPoint URL & Read-Only Use
- Azure File Share Service (Files)
- Mounting Files From On-Premise
- SMB File Share : Hot, Optimized
- Azure Queue Service & Messages
- Message Queues : Operations
- Storage Explorer Tool with Shares
- Azure Storage Services: ETL Needs
|
Chapter 8: Scala with ADLS, Azure SQL
- Data Imports with Azure SQL DB
- Using Scala for Big Data Loads
- Spark SQL Queries @ Temp Views
- Variables, display(), read()
- Scala Transformations, display()
- JSON, AVRO and DBFS Mounts
- azure.sas.container @ ADLS
- write.jdbc() & JVM
- JDBC Connection, DataFrameWriter
- Data Extraction, SQLContext
- Spark Context and Spark Session
- SQLServerDriver with Scala
- ADLS with Scala Notebooks
- Parameters (Widgets) with Scala
- Compare Python with Scala
|
Chapter 9: Synapse Analytics with Spark
- Synapse Pipelines: Performance Advantage
- Pivot Transformation For Normalization
- Generate Pivot Column, Aggregations
- Pivot Transformation & Pivot Setting
- Pivot Key Selection, Value and Nulls
- Pivoted Columns and Column Pattern
- Column Prefix, Help Graphic, Metadata
- Denormalized Data and Aggregations
- Apache Spark Pool in Azure Synapse
- Spark Cluster Nodes: Vcores, Memory
- Notebooks : Purpose, Usage Options
- Python Notebooks For Remote Access
- Creating Databases in Apache Spark Pool
- Data Loads from Dedicated SQL Pools
- PySpark Code for Data Operations, Writes
|
Chapter 9: Monitoring & Key Vaults
- Azure Monitor, Metrics & Activity Logs
- Monitoring Azure Storage Namespaces
- Add KQL Metrics; Account, Blob and File
- Total Ingress and Egress Metrics: Charts
- Average Latency, Transaction Count
- Request Breakdowns, Signal Logic
- Azure Alerts & Conditions, Notifications
- Signal Logic Conditions and Emails
- Key Vaults Types: Standard & Premium
- Secret Page, Key Backups, Key Restores
- Azure Key Vaults - Name and Vault URI
- Inbuilt Managed Key and Azure Key Vault
- Managed Identity with ETL Process
|
Chapter 9: Delta Lake Incremental Loads, DWH
- Azure Delta Lake Implementation
- ACID Properties, Upsert Advantages
- Delta Engine Optimizations & Uses
- Pipeline Creation: JSON Files in DBFS
- Delta Tables Creation, Data Loads
- Spark Cluster Settings: Auto Optimize
- Auto Compact, Delta Table Optimize
- JSON Files, Delta Streaming Location
- Joins and Merge with Delta Tables
- Incremental Loads, Delta Tables
- Create & Use DWH with Databricks
- Upsert (Merge) with Spark Tables
- Big Data & Jupyter Notebooks
- Databricks with Data Factory (ADF)
- Pipelines with Databricks Notebooks
- End to End Implementations
|
Chapter 10: Synapse Security & Parameters
- Azure Active Directory (AAD) Users, Groups
- IAM: Identity & Access Management
- Synapse Workspace Security with RBAC
- ADF Security: RBAC, Owner, Contributor
- Azure Synapse SQL Pool Security: Logins
- Creating SQL Logins & Users : master
- SQL Users in Azure SQL DB and SQL Pool
- Grant, Control, Revoke: Security Roles
- Parameters - Creation and Use in Pipelines
- Dynamic Connections with Credentials
- User Name and Password Connectivity
- Dynamic Dataset Configurations
- Pipeline Expressions with Parameters
- Resource Classes and Usage with SQL Pool
|
Real-time Project (End to End)
- Online Retail Database Data Source
- Azure Migrations and ETL Concepts
- Azure SQL Pool (Synapse DWH) Tables
- Apache Spark Pool : Databases, Tables
- Azure Data Lake Storage (ADLS Gen 2)
- Handling Unstructured Data in ADF
- End to End Workflows, Automations
- Azure Logic Apps: Automated Workflows
- Visual Designer & Prebuilt Templates
- Serverless Integrations in Azure
- Workflow, Triggers and Actions
- Managed Connectors, Integrations
- ARM Template : Deployments
- ARM Templates : ADF, ADLS
|
- ADLS with Spark Databases
- Aggregations with Big Data Loads
- Parameterized ETL Sources
- Parameterization & Workflows
- Python Notebooks to Scala
- Azure SQL DB Connections
- ARM Templates & JSON
- Project Requirement
- Project Solution, FAQs
- Concept wise FAQs
- Resume Guidance
- Mock Interviews (1 to 1)
- DP-203 Certification Guidance
- DP-203 Sample Papers (Latest)
|
Chapter 11: Change Data Capture (CDC)
- Change Data Capture (CDC) Data Loads
- Incremental Loads with CDC Types
- SQL Server CDC : ETL Load Dates
- Pipeline Expression, Data Window
- JSON Parameters, Pipeline Scheduling
- ETL Optimization Techniques
- Serverless Pool in Azure Synapse
- Connections, Use with Serverless Pool
- Using Azure Open Datasets in Synapse
- OPENROWSET and BULK Data Loads
- Working with Parquet Files in Synapse
- Python Notebooks (PySpark) in Synapse
|
Azure Data Engineering with Power BI (For Power BI Registrations)
- Power BI with Synapse SQL Pool
- Power BI with Synapse Analytics
- Get Data: Storage Modes
- Direct Query, Performance Inspector
- Aggregated Data Analytics
- Data Gateways : Auto Refresh
- Power BI with ADLS : Record Query
- Power BI with ADLS : BLOB Data
- Power BI with Spark DB : JDBC
- Power BI with Spark DB : User Token
- Power BI with Spark DB : LIVE Data
- Power BI with Spark DB : Refresh
|
- Azure Purview : Data Governance
- Unified SaaS for Multi Cloud
- Data Mapping and Resilience
- Automated Data Discovery
- Sensitive Data Labels : SQL Server
- Interactive Data Lineage
- Trusted Data Discovery in Azure
- Confidential Data & Trust
- Data Catalog, Data Estate Insights
- Azure Key Vaults, ADLS Security
- Azure Passwords, Keys, Certificates
- Azure Key Vaults - Name, Vault URI
- Managed Key & ETL Connections
|