Microsoft Certified: Azure Data Engineer Associate (DP-203)

Complete practical, real-time training for Azure Data Engineers. This job-oriented course includes: 1. Azure Fundamentals, 2. Azure Active Directory, 3. Azure SQL Databases, 4. Azure Migrations, 5. Azure Data Factory, 6. Azure SQL Pools (Synapse), 7. Azure Synapse Analytics, 8. Azure Storage, 9. Azure Data Lake Storage, 10. Azure Stream Analytics, 11. Azure Databricks, 12. SparkSQL, 13. PySpark, and 14. End-to-End Implementation with a real-time project for your resume and job work. It also includes an end-to-end real-time project with Power BI integrations, covering the Storage Explorer tool, the Data Explorer tool, Python/R/Scala notebooks and big data analytics.

This Azure Data Engineer training course is applicable to the DP-203 Microsoft certification examination, which replaced the earlier DP-200 and DP-201 exams.
 
 

Azure Data Engineer Training Content

Mod 1: Azure Data Factory [ADF], Synapse

Mod 2: Azure Storage & Stream Analytics

Mod 3: Azure Databricks & SparkSQL

Chapter 1: Cloud Basics, Azure SQL DB

  • Cloud Introduction and Azure Basics
  • Azure Implementation: IaaS, PaaS, SaaS
  • Benefits of Azure Cloud Environment
  • Azure Data Engineer: Job Roles
  • Azure Storage Components
  • Azure ETL & Streaming Components
  • Need for Azure Data Factory (ADF)
  • Need for Azure Synapse Analytics
  • Azure Resources and Resource Types
  • Resource Groups in Azure Portal
  • Azure SQL Server [Logical Server]
  • Firewall Rules and Azure Services
  • Connections with SSMS & ADS Tools
  • Working with Azure Portal
  • Resource Group Navigations, Options

Chapter 1: Azure Storage & Containers

  • Storage Components in Microsoft Azure
  • Azure Storage Services and Types - Uses
  • High Availability, Durability & Scalability
  • Blob: Binary Large Object Storage
  • General Purpose Accounts: v1 & v2 Versions
  • Blobs, File Share, Queues and Tables
  • Data Lake Gen 2 Operations with Azure
  • Azure Storage Account Creation
  • Azure Storage Container: Usage
  • Azure Data Explorer: Operations
  • File Uploads, Edits and Access URLs
  • Azure Storage Explorer Tool Usage
  • Azure Account Options in Explorer
  • Directory Creation, File Operations
  • End User Access Options With Files
  • Data Explorer Vs Storage Explorer Tool

Chapter 1: Azure Intro, Azure Databricks

  • Azure Databricks: Purpose & Config
  • Need for Azure Databricks (ADB)
  • Azure Databricks Service Creation
  • Azure Databricks Workspace & Usage
  • Spark Cluster Configurations & Capacity
  • Driver Nodes and Worker Nodes in Spark
  • Master Node & Cluster Creation Process
  • Cluster Types and Capacity Options
  • Standard, High Concurrency Clusters
  • Databricks Runtime Service & DBUs
  • Databricks File System (DBFS) and Usage
  • Azure Databricks Workspace Operations
  • ETL and Data Storage Components
  • Spark Concepts and Spark SQL
  • Spark Context and Spark Session
  • DataFrame, Dataset and Real-time Use

Chapter 2: Synapse SQL Pools (DWH)

  • Dedicated SQL Pools in Azure
  • Enterprise Data Warehouse with Synapse
  • DWU: Data Warehouse Units, Resources
  • Massively Parallel Processing (MPP)
  • Control Nodes and Compute Nodes
  • SQL Pool Access from SSMS Tool
  • T-SQL Queries @ SQL Pools
  • Start/Resume/Pause, Scaling Options
  • Creating Tables in Azure SQL Pool
  • Compression, MAXDOP & Indexes
  • Distributions: Round Robin, Hash
  • Distributions: Replicate and Usage
  • Data Imports with COPY Table
  • Dynamic Management Views (DMVs) with PDW
  • Data Loads Monitoring, Resource Class
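
The distribution options listed in this chapter (round robin, hash and replicate) can be pictured with a small plain-Python sketch. It is not Synapse code. A dedicated SQL pool spreads each table over 60 distributions; the CRC32 hash, the sample rows and the column names below are assumptions chosen only for illustration and do not reflect the engine's real hash function.

```python
import zlib
from collections import Counter
from itertools import cycle

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def round_robin(rows):
    """Assign rows to distributions in turn, ignoring row content."""
    slots = cycle(range(DISTRIBUTIONS))
    return [(next(slots), row) for row in rows]


def hash_distribute(rows, key):
    """Assign rows by hashing the distribution column (CRC32 is illustrative only)."""
    return [
        (zlib.crc32(str(row[key]).encode()) % DISTRIBUTIONS, row)
        for row in rows
    ]


# Hypothetical sample data: 600 orders belonging to 5 customers.
rows = [{"order_id": i, "customer_id": i % 5} for i in range(600)]

rr_counts = Counter(dist for dist, _ in round_robin(rows))
hash_counts = Counter(dist for dist, _ in hash_distribute(rows, "customer_id"))

print("round robin uses", len(rr_counts), "distributions")
print("hash on customer_id uses", len(hash_counts), "distributions")
```

Round robin spreads the rows evenly but scatters each customer's orders, while hashing on a low-cardinality column keeps each customer together at the cost of using only a few distributions. A replicated table would instead keep a full copy on every compute node.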

Chapter 2: Azure Migration, BLOB Imports

  • SQL Server (On-Premise) to Azure Migration
  • Source Database Scripts & Validations
  • BACPAC File Generation From SSMS Tool
  • Azure Data Lake Storage and SSMS Access
  • Azure Storage Container, BACPAC Files
  • Azure SQL Server Creation From Portal
  • Azure SQL DB Imports, Storage SAS Keys
  • Azure SQL Database Migrations, Verification
  • BLOB Data Access from On-Premise
  • Data Imports From Excel and CSV Files
  • BLOB Data Imports using T-SQL Queries
  • SAS - Shared Access Signature Generation
  • CSV File - Uploads, Downloads, Edits, Keys
  • Master Keys, Credentials, External Sources
  • BULK INSERT Statement and Data Imports
  • T-SQL Imports: Practical Limitations

Chapter 2: SQL Notebooks & Python

  • Notebooks: Concept, Usage Options
  • Creating SQL Notebooks in Databricks
  • Using DBFS Tables in SQL Notebooks
  • Data Access and Analytics Options
  • SparkSQL Queries: SELECT, GROUP BY
  • SparkSQL Queries: Aggregates, Conditions
  • Notebook Operations: Download, Clone
  • Notebook Operations: Upload, Reuse
  • SQL Notebooks with Python Code
  • Using DBFS Sample Data Sources (CSV)
  • Dataframes: Creation and Real-time Use
  • Pandas Dataframe, Virtual Table Creation
  • Dataframe Data Access, Caching Options
  • take() and display() Functions in PySpark
  • Temporary View Creation and Access
  • SparkSQL Queries, Analytics, Chart Reports

Chapter 3: Azure Data Factory Concepts

  • Azure Data Factory (ADF) Concepts
  • Hybrid Data Integration at Scale
  • ADF Pipeline Components & Usage
  • Configure ADF Resource in Azure
  • Understanding ADF Portal and IR
  • Linked Services and Connections
  • Datasets and Tables / Files for ETL
  • ADF Pipelines: Design, Publish & Trigger
  • ADF Pipeline with Copy Data Tool
  • Creating Azure Storage Account
  • Storage Container, BLOB File Uploads
  • Data Loads with Azure BLOB Files
  • DIU Allocations and Concurrency
  • Creating Linked Services, Datasets
  • Pipeline Trigger, Author and Monitor

Chapter 3: Azure Tables, Shares

  • Azure Tables - Real-time Usage
  • Schema-less Design and Access Options
  • Structured and Relational Data Storage
  • Tables, Entities and Properties Concepts
  • Azure Tables: Creation and Data Inserts
  • Azure Tables in Portal - GUI and Data Types
  • Azure Tables: Data Imports in Explorer
  • Data Edits, Queries & Delete Operations
  • Azure Files - SMB Protocol, Creation, Usage
  • Shared Access, Fully Managed & Resiliency
  • Performance, Size Requirements for Shares
  • Azure Storage Explorer Tool for File Shares
  • Azure Queues: Message Queues, Limitations
  • Adding Messages, Queuing and De-Queuing
  • Data Access & Clear Queue from Explorer
  • End Points for Azure Message Queues

Chapter 3: Python Notebooks

  • Azure SQL Server Configurations
  • Azure SQL Database Creation
  • Azure Firewall Rules and IP Address
  • Allow Azure Services, Remote Access
  • Connection Tests with SSMS Tool
  • Python Notebooks with Azure Databricks
  • Data Imports and Table Creations (Code)
  • Parquet Files and Usage in Databricks
  • Using Dataframes for Data Operations
  • SparkSQL Queries with SELECT, LIMIT
  • Establishing Connections to Azure SQL DB
  • JDBC Connection Strings, DataFrameWriter
  • JDBC Properties, Port Settings & Options
  • Data Extraction, SQLContext & Dataframes
  • Pandas Data Frame for Big Data Analytics
  • JDBC URL Options & PySparkSQL Modules
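
As a rough illustration of the JDBC connection string and properties topics in this chapter, the sketch below builds a JDBC URL and a properties dictionary for Azure SQL Database in plain Python. The helper names, the server name `myserver`, the database `salesdb` and the credentials are hypothetical placeholders; the URL options shown are common ones, not a complete or required set.

```python
def azure_sql_jdbc_url(server, database, port=1433):
    """Build a JDBC URL for an Azure SQL logical server (1433 is the default port)."""
    return (
        f"jdbc:sqlserver://{server}.database.windows.net:{port};"
        f"database={database};"
        "encrypt=true;trustServerCertificate=false;"
        "hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
    )


def jdbc_properties(user, password):
    """Connection properties in the shape PySpark's JDBC reader and writer accept."""
    return {
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


# Hypothetical values; real notebooks should pull secrets from a secret scope.
url = azure_sql_jdbc_url("myserver", "salesdb")
props = jdbc_properties("sqladmin", "<password>")
print(url)

# On a Databricks cluster (not runnable here), these would typically be used as:
#   df = spark.read.jdbc(url=url, table="dbo.Customers", properties=props)
#   df.write.jdbc(url=url, table="dbo.CustomersCopy", mode="append", properties=props)
```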

Chapter 4: ADF Pipelines, Polybase

  • Copy Data Tool For ETL Operations
  • Azure SQL DB to Synapse Data Loads
  • Working with Multi Tables Data Loads
  • Query Options for Source Datasets
  • Transformations with Copy Data Tool
  • Rename, Rearrange & Remove Options
  • Pipeline Execution: DIU & DOCP
  • ADF Pipeline Monitoring Options
  • ADF Pipelines: Execution Settings
  • ADF Logging Options, Consistency Check
  • Compression Option, DOP and DOCP
  • ETL Staging Advantages & Performance
  • Staging with Storage Account, Container
  • ADF Pipeline Triggers and Monitoring
  • Polybase For Azure Synapse, Advantages

Chapter 4: Azure Storage Security, Admin

  • Azure Data Lake Storage Security Options
  • Shared Access Keys - Primary, Secondary Keys
  • SAS Key Generation: Container, Tables, Files
  • SAS Key Permissions, Validation Options
  • Access Keys: Account Level Permissions
  • Azure Active Directory (AAD): Users, Groups
  • Azure AD Security: RBAC with IAM, ACLs
  • Owner Role, Contributor and Reader Role
  • ACL: Access Control Lists & Security
  • Azure BLOB Storage Containers & ACLs
  • Folder Level and File Level Security
  • ACL Permissions: Read, Write & Execute
  • Access Policy: Creation and Realtime Use
  • Permissions: rwacdl; Azure Principals, CORS
  • Comparing IAM and ACLs in Data Lake Store

Chapter 4: Open Data Sources, DeltaLakes

  • Creating Python Notebooks with Databricks
  • Spark Dataframes with Azure OpenDatasets
  • Windows Azure Storage Blob [wasb] Sources
  • Creating Dataframes & Temporary Views
  • Using Print and Display Functions with ADB
  • Big Data Analysis with BLOB Data & Charts
  • Keys, Values, Aggregations, Display Type
  • Databricks Notebooks, Jobs and Stages
  • Azure DeltaLake Implementation
  • ACID Properties and Upsert Advantages
  • Delta Engine Optimizations & Uses
  • Pipeline Creation with JSON Files in DBFS
  • Delta Tables Creation, Data Loads
  • Spark Cluster Settings: Auto Optimize
  • Auto Compact and Delta Table Optimize
  • Delta Locations; Data Retrieval, Versions

Chapter 5: OnPremise Data with ADF

  • On-Premise Data Sources with Azure
  • Self Hosted Integration Runtime (IR)
  • Access Keys, Remote Linked Services
  • Synapse SQL Pool (DW) with OnPremise
  • Staged Data Copy and Performance
  • Pipeline Executions and Monitoring
  • Pipeline RunIDs and Audits / Tracing
  • Incompatible Rows Skips, Fault Tolerance
  • Incremental Loads with Files (BLOB)
  • Pipeline Executions and Schedules
  • Regular Schedules and Tumbling Window
  • Execution Retry and Delay Options
  • Binary Copy, Last Modified Date in Blob
  • Automated Loops and Trigger Schedules
  • Incremental Loads Verification Tests
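
The idea behind tumbling-window schedules and last-modified-date filtering can be sketched in plain Python. This is only a conceptual model under stated assumptions, not ADF code: the function names, file names and timestamps are hypothetical. Each window is a fixed-size, non-overlapping time slice, and a run copies only the files modified inside its own window.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Split [start, end) into consecutive, non-overlapping windows of equal size."""
    windows = []
    cursor = start
    while cursor + size <= end:
        windows.append((cursor, cursor + size))
        cursor += size
    return windows


def files_in_window(files, window):
    """Pick files whose last-modified time falls in [window start, window end)."""
    window_start, window_end = window
    return [name for name, modified in files if window_start <= modified < window_end]


# Hypothetical blob listing: (file name, last modified time).
files = [
    ("a.csv", datetime(2024, 1, 1, 0, 15)),
    ("b.csv", datetime(2024, 1, 1, 1, 0)),
    ("c.csv", datetime(2024, 1, 1, 2, 59)),
]

windows = tumbling_windows(
    datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 3), timedelta(hours=1)
)
for window in windows:
    print(window[0].isoformat(), files_in_window(files, window))
```

Because the windows never overlap and the end bound is exclusive, each file is picked up by exactly one run, which is what makes this pattern suitable for incremental loads.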

Chapter 5: Azure Monitoring, Power BI

  • Azure Monitor, Metrics & Logs
  • Monitoring Azure Storage Namespaces
  • Add KQL Metrics; Account, Blob and File
  • Total Ingress and Egress Metrics: Charts
  • Average Latency, Transaction Count
  • Request Breakdowns, Signal Logic Options
  • Azure Alerts and Conditions, Notifications
  • Signal Logic Conditions and Emails
  • Power BI Desktop Tool Installation
  • Binary Data and Record Data Access
  • Azure Data Lake Storage: Access Keys
  • Azure Data Lake Storage with Power BI
  • BLOB File Access with Power BI
  • Azure Tables Creation and File Imports
  • Azure Table Access with Power BI

Chapter 5: Databricks Security & Jobs

  • Azure Databricks Security Operations
  • Azure Active Directory (Azure AD)
  • AD Users and RBAC with IAM
  • Owner, Contributor & Reader Roles
  • Workspace Admin Permissions
  • Notebook Permissions and Share Options
  • Shared Notebooks, User Access Options
  • Notebook Operations: Clone & Export
  • Databricks Jobs: Creation Options, Usage
  • Job Limits, Workspace, Concurrency Limits
  • Notebooks with and without Parameters
  • Jobs with Default Parameters, Executions
  • Interactive, Automated Clusters for Jobs
  • Job Schedules and Manual Executions
  • Active Jobs, Recently Run Jobs, Monitoring
  • ADB Jobs with Azure OData Sources, BLOB

Chapter 6: ADF Data Flow - 1

  • Limitations with Copy Data Tool
  • Data Flow Task, Data Flow Activity
  • Transformations with Data Flow
  • Spark Cluster For Debugging
  • Cluster Node Configurations
  • Data Preview Options with DFT
  • SELECT Transformation & Options
  • JOIN Transformation and Usage
  • Conditional Split Transformation
  • Aggregate & Group By Transformations
  • Synapse Sink Options with DFT
  • DFT Optimization Techniques
  • Pipeline Debug Runs and ETL Testing
  • Spark Cluster For Pipeline Executions
  • Pipeline Monitoring & Run IDs

Chapter 6: Azure Stream Analytics, IoT

  • Azure Stream Analytics: Real-time Usage
  • Real-time Data Processing, Event Tracking
  • Ingest, Deliver and Analysis Operations
  • Azure Stream Analytics Jobs Concept
  • Understanding Input & Output Options
  • SAQL Queries for Stream Analytics Jobs
  • IoT: Internet Of Things For Real-time Data
  • Need for IoT Hubs and Event Hubs
  • Creating IoT Device for Data Inputs
  • Creating Azure Stream Analytics Resource
  • Stream Analytics Jobs for Historical Data
  • Azure SQL Database Options for ASA Jobs
  • SAQL: Query Formatting and Validation
  • Historical Data Uploads, ASA Job Execution
  • Stream Analytics Job Monitoring Options

Chapter 6: Databricks @ BLOB, Power BI

  • BLOB Data Access with Databricks
  • Accessing Storage Account, Container
  • Generate, Use SAS: Shared Access Signature
  • dbutils.fs.mount() with DBFS Store
  • fs.azure.sas.container.storageaccount
  • spark.read and DBFS Mounts
  • Scala Transformations, Create Temp View
  • Spark SQL Queries with Temp Views
  • dataframe.write.jdbc() & JVM Properties
  • spark.read.jdbc() with Azure SQL DB
  • Power BI Integration with Databricks
  • Server Host Name, Port and Http Path
  • Cluster Configurations and JDBC
  • User Access Token Generation, Usage
  • Spark Cluster Access, Power BI Analytics
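
To make the SAS configuration key and mount topics in this chapter more concrete, here is a small plain-Python sketch that assembles the Spark configuration key and the `wasbs://` source URL for a Blob container. The helper names, the container `raw`, the storage account `mystorageacct` and the mount point are hypothetical; the commented `dbutils.fs.mount` call only runs inside a Databricks notebook.

```python
def sas_conf_key(container, account):
    """Spark config key under which a container-level SAS token is supplied."""
    return f"fs.azure.sas.{container}.{account}.blob.core.windows.net"


def wasbs_url(container, account, path=""):
    """Source URL for a Blob container (wasbs is the TLS-secured wasb scheme)."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Hypothetical container and account names.
conf_key = sas_conf_key("raw", "mystorageacct")
source = wasbs_url("raw", "mystorageacct", "/sales/2024.csv")
print(conf_key)
print(source)

# Inside a Databricks notebook (not runnable here), a mount might look like:
#   dbutils.fs.mount(
#       source=wasbs_url("raw", "mystorageacct"),
#       mount_point="/mnt/raw",
#       extra_configs={conf_key: "<sas-token>"},
#   )
#   df = spark.read.csv("/mnt/raw/sales/2024.csv", header=True)
```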

Chapter 7: ADF Data Flow - 2

  • ADF Pipelines For ETL Operations
  • Data Flow Tasks and Activities in Synapse
  • Pivot Transformation For Normalization
  • Generating Pivot Column, Aggregations
  • Pivot Transformation and Pivot Settings
  • Pivot Key Selection, Value and Nulls
  • Pivoted Columns and Column Pattern
  • Column Prefix, Help Graphic & Metadata
  • Window Functions & Usage in Data Flow
  • Rank / DenseRank / Row Number
  • Over Clause and Input Options
  • Derived Column Transformations
  • Exists & Lookup Transformations
  • Reusing Data Flow Tasks in Synapse
  • Pipeline Validations & Executions

Chapter 7: IoT Hubs & Event Hubs

  • Azure Stream Analytics For API Data
  • IoT Hubs & IoT Devices, Connection Strings
  • Raspberry Pi App Connections with IoT Hub
  • Azure Storage Account and Container
  • Creating Azure Stream Analytics Job
  • Configuring Input Aliases with IoT Hub
  • Configuring Output Alias with ADLS Gen 2
  • SAQL Query and Job Executions; Monitoring
  • Azure Event Hubs and Event Instances
  • Event Hub Namespaces, Partition Counts
  • Access Policies, Permissions & Defaults
  • RootManageSharedAccessKey & Options
  • Connection Strings & Event Service Bus
  • Telco App Installation, Executions. LIVE Data
  • On-Premise App Integration with ASA Jobs

Chapter 7: Databricks Integrations

  • Azure Databricks with Data Lake Storage
  • Handling Unstructured Data in Azure
  • Data Preparation and Staging Operations
  • Azure App (Service Principal) Registration
  • Azure Key Vault Creation & Key Usage
  • Service Principal Permissions @ Data Loads
  • Tenants and Authorization Settings
  • Client Credentials, Token Provider Options
  • Spark Notebooks For Dynamic Connections
  • Parameterized Options & Blob Access
  • Data Preparation & Big Data Ingestion
  • Data Extraction and ADLS Storage
  • show(), transformations, wasbs Options
  • Azure SQL Server & Synapse Creations
  • Data Loads with Incremental Changes

Chapter 8: Azure Synapse Analytics

  • Azure Synapse Analytics Resource
  • Azure Synapse Analytics Workspace
  • Managed Resource Group, SQL Account
  • SQL Admin Account and its Purpose
  • Operations with Synapse Workspace
  • ADLS Gen 2 Storage Account, Container
  • Synapse Studio (Synapse Portal)
  • Dedicated SQL Pools & Spark Pools
  • Creating Dedicated SQL Pools
  • Synapse Tables, Data Loads with T-SQL
  • COPY INTO Statements with T-SQL
  • Clustered Column Store Indexes
  • Row Terminator and Compressions
  • T-SQL Queries and Aggregations
  • Aggregation Data Loads in Synapse

Chapter 8: Azure Stream Analytics Security

  • Azure Key Vaults & ADLS [Data Lake] Security
  • Azure Passwords, Keys and Certificates
  • Azure Key Vaults - Name and Vault URI
  • Inbuilt Managed Key and Azure Key Vault
  • Standard Type, Premium Type Azure Key Vaults
  • Secret Page, Key Backups and Key Restores
  • Adding Keys to Azure Vaults. Key Type, Size
  • Using Azure Key Vaults to secure Resources
  • Azure Storage: Replications and DR Options
  • LRS: Locally Redundant Storage
  • GRS: Geo-Redundant Storage
  • ZRS: Zone-Redundant Storage
  • Replication Options and Advantages
  • Replication Verification and Modifications
  • Azure Storage Endpoints, Failover Partner

Real-Time Project

  • ADF Integration, Real-time Project
  • Azure Databricks Integrations with ADF
  • Defining Scala Notebooks in ADB
  • Using Notebooks in Azure Data Factory
  • spark.conf.set & fs.azure.account.key
  • spark.read.format, option() and head()
  • Online Retail Database Data Source
  • Azure Migrations and ETL Concepts
  • Azure SQL Pool (Synapse DWH) Tables
  • Apache Spark Pool: Databases, Tables
  • Azure Data Lake Storage (ADLS Gen 2)
  • Azure Stream Analytics Jobs with IoT
  • Azure Databricks and DBFS, Notebooks
  • Concept wise FAQs, Resume Guidance
  • Project Requirement, Solution, FAQs
  • DP-203 Certification Guidance

Chapter 9: Synapse Analytics with Spark

  • Apache Spark Pool in Azure Synapse
  • Spark Cluster Nodes: vCores, Memory
  • Creating Spark Clusters @ Synapse Studio
  • Python Notebooks For Remote Access
  • Creating Databases in Apache Spark Pool
  • Data Loads from Dedicated SQL Pools
  • Table Creations, Aggregation Operations
  • PySpark Code for Data Operations, Writes
  • Serverless Pool in Azure Synapse
  • Connections, Usage with Serverless Pool
  • Using Azure OpenDatasets in Synapse
  • OPENROWSET and BULK Data Loads
  • Azure Storage Account: Data Analysis
  • Working with Parquet Files in Synapse
  • Python Notebooks (PySpark) in Synapse

Chapter 10: Incremental Loads @ Synapse

  • Incremental Loads with Synapse Studio
  • Multi Table Merge Operations
  • On-Premise Data Sources & Timestamps
  • Azure SQL DB Destinations, Watermarks
  • Watermark Table Usage & Audits
  • Stored Procedures for Timestamp Updates
  • Table Data Type and Dynamic MERGE
  • SQL Queries for Datasets and Fetch
  • Lookup Activity and its Usage in Synapse
  • Expressions in ADF Portal for Lookup
  • Expressions in ADF Portal for Source
  • Output Pipeline Expression, Data Window
  • Concat Function, Run IDs Expressions
  • JSON Parameters, Pipeline Scheduling
  • Pipeline Validation, Trigger and Monitoring
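
The watermark pattern named in this chapter can be sketched with Python's built-in `sqlite3` module standing in for the source, destination and watermark tables. This is a simplified model under stated assumptions, not a Synapse or ADF pipeline: the table names, columns and timestamps are hypothetical, and `INSERT OR REPLACE` plays the role that a MERGE statement would play in T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT, modified_at TEXT);
CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT, modified_at TEXT);
CREATE TABLE watermark (table_name TEXT PRIMARY KEY, last_value TEXT);
INSERT INTO watermark VALUES ('source', '1900-01-01T00:00:00');
""")


def incremental_load(conn):
    """Copy rows changed since the stored watermark, then advance the watermark."""
    (old,) = conn.execute(
        "SELECT last_value FROM watermark WHERE table_name = 'source'"
    ).fetchone()
    (new,) = conn.execute("SELECT MAX(modified_at) FROM source").fetchone()
    if new is None or new <= old:
        return 0  # nothing changed since the last run
    rows = conn.execute(
        "SELECT id, name, modified_at FROM source "
        "WHERE modified_at > ? AND modified_at <= ?",
        (old, new),
    ).fetchall()
    # Upsert into the target (stand-in for a T-SQL MERGE).
    conn.executemany("INSERT OR REPLACE INTO target VALUES (?, ?, ?)", rows)
    conn.execute(
        "UPDATE watermark SET last_value = ? WHERE table_name = 'source'", (new,)
    )
    conn.commit()
    return len(rows)


conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?)",
    [(1, "a", "2024-01-01T10:00:00"), (2, "b", "2024-01-01T11:00:00")],
)
first = incremental_load(conn)   # initial run copies both rows
second = incremental_load(conn)  # no changes, so nothing is copied

conn.execute(
    "UPDATE source SET name = 'a2', modified_at = '2024-01-02T09:00:00' WHERE id = 1"
)
conn.execute("INSERT INTO source VALUES (3, 'c', '2024-01-02T09:30:00')")
third = incremental_load(conn)   # picks up one update and one insert
print(first, second, third)
```

In a pipeline, the two SELECT statements correspond roughly to Lookup activities, the filtered copy to the source query expression, and the final UPDATE to a stored procedure that records the new watermark.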

Chapter 11: Optimizations, Power Query

  • ADF ETL with GUI: Power Query
  • Power Query Resource Creation, Use
  • Source Data Configurations & Settings
  • Rename, Remove, Pivot, Group By, Order
  • Index, Filter, Remove Error Rows
  • Using Power Query Activity, ADF Pipelines
  • Spark Cluster Configurations for Pipelines
  • Concurrency, Big Data Recommendations
  • Storage Optimization Techniques
  • ETL Optimization Techniques
  • SQL Pool (Synapse) Optimizations
  • Indexes, Partitions, Distributions, DOP
  • Pipeline Optimization Techniques
  • Partitions, DOCP, Compressions, DIU
  • Staging, Polybase and Core Counts

Chapter 12: Pipeline Monitoring, Security

  • Azure Monitor Resource and Usage
  • Pipeline Monitoring Techniques
  • ADF: Pipeline Monitoring and Alerts
  • Synapse: Pipeline Monitoring and Alerts
  • Synapse: Storage Monitoring and Alerts
  • Conditions, Signal Rules and Metrics
  • Email Notifications with Azure
  • Concurrency, Big Data Recommendations
  • Azure Active Directory (AAD) Users, Groups
  • IAM: Identity & Access Management
  • Synapse Workspace Security with RBAC
  • ADF Security with RBAC: Owner, Contributor
  • Azure Synapse SQL Pool Security: Logins
  • Users, Roles and Resource Classes (RC)
  • ADF V1 to V2 Migrations, Considerations
