Microsoft Certified: Azure Data Engineer Associate (DP-200, DP-201)

Complete Practical and Real-time Training on Azure Data Engineer. This Job Oriented Certification Course includes: Azure Fundamentals, Azure Active Directory, Azure SQL Databases, Azure Migrations, Azure Data Factory (ADF), Azure Synapse, Azure Databricks (ADB), Azure Cosmos DB, Azure Stream Analytics, Azure Data Lake Storage, Azure Data Lake Analytics, Azure Key Vaults and Azure Data Share. Also includes an End to End Real-time Project with Power BI Integrations including Storage Explorer Tool, Data Explorer Tool, Python/R/Scala Notebooks and Big Data Analytics.

This Azure Data Engineer Training course is applicable for DP 200 and DP 201 Microsoft Certification Examinations.
 
 

Azure Data Engineer Training Content

1. Azure Fundamentals: What is Cloud? Advantages of Azure Cloud? IaaS, SaaS & PaaS. Azure Data Engineer Technologies, Job Roles. DP 200, 201 Exams; Azure Account Registration and Free Trial Activation; Understanding Azure Resources and Resource Types; Creating Resource Groups in Azure Portal;
2. Azure SQL Server: Azure Resources; Resource Groups; Azure SQL Server [Logical Server] Creation; Server Name Format and Firewall Rules; Azure Services Access with Firewall; Test Connections with SSMS Tool and Azure Data Studio Tool; Creating Azure SQL Databases in Portal, T-SQL; Tables, Data Inserts;
3. Azure SQL DB Migrations: Azure SQL DB Migrations from On-Premise; Using Data Migration Assistant Tool; Migration Assessments; Deploy Schema & Migrate Data Options; On-Premise Versus Azure SQL DB Differences; Generating bacpac Files From SSMS Tool; Azure SQL DB Exports & Imports;

Mod 1: Azure Data Factory, Azure Synapse, Azure Data Share, Azure Key Vaults

Mod 2: Azure Storage, Azure Data Lake Storage, Azure Data Lake Analytics, U-SQL

Mod 3: Azure Databricks, Azure Cosmos DB, Azure Stream Analytics, NoSQL

Ch 1: Azure Data Factory, Synapse Intro

  • Azure Data Factory (ADF) Operations
  • Hybrid Data Ingestion, Orchestration
  • Data Processing & Movement in ADF
  • Data Pipelines, Flows & Wrangling
  • Data Mashup and ETL in Azure
  • Azure Synapse (Data Warehouse)
  • Enterprise Warehouse with Synapse
  • Azure Synapse (SQL Pools) Creation
  • DWUs : Data Warehouse Units
  • Big Data Storage and Analytics
  • Column Store in Azure Synapse
  • Automated Tuning and Security
  • Access, Pause/Resume with Synapse
  • SSMS & ADS Tools Connections

Ch 11: Azure Storage Concepts

  • Azure Storage Managed Service & Use
  • Azure Storage Services and Types
  • High Availability, Durability, Scalability
  • Blob: Binary Large Object Storage
  • General Purpose : Gen 1 and Gen 2
  • Blobs, File Share, Queues and Tables
  • Data Lake Gen 2 with Azure Storage
  • Blob - File System and Object Storage
  • Queues: Message Store, Secured Access
  • File Share - SMB [Server Message Block]
  • Azure Tables - Unstructured Data Store
  • Block Blob, Append Blob and Page Blobs
  • HTTP & HTTPS Access to Azure Services
  • Azure Storage Containers, End Points

Ch 21: Azure Databricks Configurations

  • Azure Databricks - Spark Based Analytics
  • ADB Workspace & Data Science Analytics
  • Workspace Options, Databricks Runtime
  • Serverless Storage, ETL, Analytics
  • Databricks File System DBFS in Real-time
  • Notebooks: SQL, Spark, Python, Scala
  • Apache Spark Eco System & Integrations
  • Azure Databricks Deployments, Workspace
  • Databricks Pricing; Databricks Units (DBUs)
  • Databricks Storage, Network Security Group
  • Databricks Clusters : Architecture
  • Standard & High Concurrency Clusters
  • Databricks Pools, Capacity : CPU, Memory
  • Autopilot Options; Worker & Driver Nodes

Ch 2: Azure Synapse Architecture

  • MPP - Massively Parallel Processing
  • Control Node and Compute Nodes
  • Azure Storage, DMS and DWUs
  • Round Robin, Replicate, Hash Tables
  • Service Level Objective, Sharding
  • Resource Classes; Gen 1 and Gen 2;
  • Table Creation, Storage, Distribution
  • CTAS: Create Table As Select. Indexes
  • Distribution Types, Time Partitions
  • Logins and Users in SQL Server
  • Users and Roles in Synapse SQL DW
  • Resource Classes; Blob Data Import
  • COPY INTO Statement in T-SQL
  • Data Monitoring Scripts with T-SQL
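
The Round Robin and Hash distribution topics above can be illustrated with a minimal Python sketch. It assumes the 60 distributions of a Synapse dedicated SQL pool; the SHA-256 hash and the names `hash_distribution`, `round_robin_distribution` and `customer_ids` are illustrative assumptions, not the hash function or identifiers Synapse uses internally.

```python
import hashlib
from collections import Counter

# A dedicated SQL pool spreads each distributed table over 60 distributions.
DISTRIBUTIONS = 60


def hash_distribution(key: str) -> int:
    """Map a distribution-column value to a distribution number.

    Illustrative hash only: the same key always lands on the same
    distribution, which is the property hash-distributed tables rely on.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


def round_robin_distribution(row_index: int) -> int:
    """Assign rows to distributions in turn, ignoring their content."""
    return row_index % DISTRIBUTIONS


# Hypothetical sample data: 6000 rows spread over 500 customer ids.
customer_ids = [f"C{n % 500:04d}" for n in range(6000)]

hash_counts = Counter(hash_distribution(c) for c in customer_ids)
rr_counts = Counter(round_robin_distribution(i) for i in range(len(customer_ids)))

print("round robin rows per distribution:", set(rr_counts.values()))
print("hash: rows for customer C0001 live in distribution", hash_distribution("C0001"))
```

Round robin gives an even spread but scatters rows for one customer across distributions, while hash keeps all rows for a key together, which is why hash distribution is usually discussed alongside joins and aggregations on the distribution column.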

Ch 12: Azure Tables & Azure BLOB

  • Azure Tables - Real-time Use, NoSQL
  • Schema-less Design and Access Options
  • Structured and Relational Data Storage
  • Tables, Entities and Properties Concepts
  • Azure Storage Account for Table Store
  • Azure Tables in Portal - GUI, Data Types
  • Azure Tables using Storage Explorer Tool
  • Query Azure Tables @ Query Builder
  • Data Imports From Excel, CSV Files
  • BLOB Data Imports @ T-SQL Queries
  • SAS: Shared Access Signature
  • CSV Uploads, Downloads, Edits, Keys
  • Master Key Credentials, External Sources
  • BULK INSERT Statement, Data Imports

Ch 22: Databricks Notebooks, Spark Jobs

  • Databricks Workspace and Spark Clusters
  • FileStore and Tables. Notebook Options
  • Data File Uploads and Tables to DBFS
  • Notebook Creation, Cells, Cmd Executions
  • Python Notebooks, ETL, Data Access
  • Data Frames Creation, Access, Analytics
  • Reports, Graphs, Plot and Custom UI
  • Spark Jobs with Azure Open Datasets
  • Notebooks For Azure BLOB Data Access
  • Remote Data @ Spark Jobs in Notebooks
  • Parquet Files and Data Frames with Spark
  • Select Queries on Temporary Data Views
  • Bar Chart and Custom Reports, Analytics
  • ADB Plots: Aggregation and Display Type

Ch 3: Azure Data Factory Architecture

  • ADF Pipeline Design, Publish, Trigger
  • ADF Architecture, Pipelines & ETL
  • DIU : Data Integration Units; Concurrency
  • Linked Service, Dataset & Activities
  • Staging Data - Advantages and Pricing
  • Polybase Indexes, Compression Options
  • Mapping Data Flow, Wrangling Data Flow
  • Pipeline Creation using Copy Data Tool
  • Azure BLOB Storage to Synapse DB
  • Linked Service and Datasets. Mapping
  • Polybase; Staging, Bulk Import Options
  • Validate, Publish Pipelines to ADF Store
  • Pipeline Execution (Triggers), Monitoring
  • Auto Resolving Integration Runtime (IR)

Ch 13: Azure Files, Queues & Security

  • Azure Files - SMB Protocol, Creation
  • Shared Access, Fully Managed, Resiliency
  • Performance, Size Requirements for Shares
  • Azure Storage Explorer Tool for File Access
  • Azure Queues and Message Queues
  • Adding Messages, Queuing and De-Queuing
  • Clear Queue and Messages from Explorer
  • Azure Storage Security - Storage Keys
  • Shared Access - Primary, Secondary Keys
  • SAS: Shared Access Signature Generation
  • Encryption and Data Security at Rest
  • CORS (Cross Origin Resource Sharing)
  • Auditing Access, Network Access Rules
  • Firewall, Advanced Threat Protection

Ch 23: Python Notebooks & Operations

  • Databricks Notebooks, Cells and Usage
  • Execution & Idle Contexts and Evictions
  • Azure Databricks Notebooks Tasks
  • Cluster Configuration Metadata Reads
  • Notebook Schedules, Cloning, URL Path
  • Notebook Exports and Imports; Re-use
  • Cluster Configurations with Notebooks
  • Python Notebooks and Magic Commands
  • CSV Files to DBFS. Access using Python
  • JDBC Hosts, Connection String, Access
  • SQL Contexts & SQL DB Connections
  • Data Imports with pyspark Assemblies
  • Pandas Data frames in Python Notebooks
  • Tables and Data Imports using Notebooks

Ch 4: Azure Data Lake with ADF

  • Creating Azure Data Lake Storage
  • Data Lake Gen 2 Hierarchical Namespace
  • Excel Upload to Container, Data Preview
  • Pipeline Parameters, Variables, OUT
  • Copy Data Tool: Timeout and Schedule
  • Secured Pipelines and Linked Services
  • Sink Options; Column Mapping, Triggers
  • Azure SQL Database Loads to Synapse
  • Azure SQL Database Tables Data Loads
  • For Each Loops with ADF Pipelines
  • Copy Data Tool, Pipeline Edits in ADF
  • Task Schedules and Tumbling Window
  • Pipeline Execution & Runs; Monitor

Ch 14: Azure Monitor, KQL, PowerShell

  • Azure Monitor Components for Storage
  • Metrics and Logs with Azure Storage
  • Monitoring the Azure Storage Namespaces
  • Adding KQL Metrics; Account, Blob and File
  • Ingress & Egress Chart; Average Latency
  • Request Breakdowns, Signal Logic Options
  • Alerts, Conditions, Notifications and Emails
  • PowerShell Commands for Azure Storage
  • PowerShell Remoting: Scripts and cmdlets
  • Background Jobs, Transactions & Eventing
  • Network Transfer & PowerShell Types
  • $ # Prefix, Resource Groups in PowerShell
  • Creating Storage Account, Context & Files

Ch 24: Scala Notebooks, SQL Notebooks

  • Scala Notebooks and Big Data Loads
  • CSV Files from Databricks File System
  • Data Source Connections with Spark
  • Driver Classes and SQL Server Drivers
  • Data Frames in Spark, Reading Data
  • Display, Transformations with Spark
  • Data Loads to Azure SQL Synapse
  • SQL Notebooks & Data Frame, Access
  • Python Magic Commands for Data Access
  • Data Frame View, Testing data.take(n)
  • SQL Context for Data Representations
  • SQL in Notebooks: SELECT, WHERE
  • ORDER BY, GROUP BY, TOP, LIMIT

Ch 5: On-Premise Data to Azure

  • On-Premise Data Sources with Azure
  • Install Self Hosted Integration Runtime
  • Access Keys & Use. Configuration Tools
  • Remote Linked Services in ADF & SH IR
  • Authentication with Integration Runtime
  • Source, Sink Linked Service Connections
  • Incompatible Rows Skip, Fault Tolerance
  • Table Mapping, Column Mapping, Errors
  • Synapse Pool Connection with On-Premise
  • Staged Data Copy & ETL Performance
  • Azure Blob for Staging. Polybase
  • Connections Management - Preview
  • Pipeline Execution, Run IDs, Errors

Ch 15: Azure Stream Analytics, Event Hubs

  • Azure Stream Analytics Pattern, Relations
  • Ingest & Analyse; Stream Analytics Jobs
  • IoT Hub, IoT Devices; Transformations
  • IoT, Stream Analytics Jobs Monitoring
  • Stream Analytics Jobs: Edits, Security
  • Streaming Units, Error Handling
  • Test Result, Output Schema in Hubs
  • IoT Hubs, IoT Events: Azure SQL DB
  • Azure Stream Analytics Integration
  • Event Hub Policies, Consumer Groups
  • Power BI Reports from Azure Storage
  • Shared Access Signature with Power BI
  • Data Visualizations with Azure Storage

Ch 25: Databricks Jobs & Power BI

  • Databricks Jobs : Creation and Usage
  • Job, Workspace & Concurrency Limits
  • Notebooks with and without Parameters
  • Jobs with Default Parameter Execution
  • Interactive and Automated Clusters
  • Job Schedules & Manual Executions
  • Active Jobs and Job Monitoring with ADB
  • Databricks with Power BI Desktop
  • Spark Connectors in Power BI
  • Access Token from Azure Databricks
  • Spark Cluster Connections, Nodes, Pools
  • Server Host Name, Ports and HTTP
  • Power BI Reports with Databricks

Ch 6: Incremental Loads with ADF - 1

  • ADF Pipelines with Stored Procedures
  • Watermark Tables and Timestamp Columns
  • Incremental Data Loads to Azure DW
  • New Rows and Old Rows Identification
  • Storing High Water Mark Data
  • Stored Procedures for Timestamp Updates
  • Azure Storage Container Incremental Loads
  • Lookup in ADF Portal & ModifiedDate
  • Expressions in ADF Portal for Lookup
  • Expressions in ADF Portal for Source
  • @activity with output Data Pipelines
  • SQL Queries for Dataset Creation
  • Concat Function, Run IDs For File Names
  • ADF Pipeline Validation and Triggers
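
The watermark pattern listed above (look up the old high-water mark, copy only newer rows, then store the new mark) can be sketched in plain Python. This is a minimal in-memory illustration, not ADF pipeline code: the function `incremental_load`, the sample `source_rows` and the `Id` column are hypothetical, and only the `ModifiedDate` column name comes from the topic list.

```python
from datetime import datetime


def incremental_load(source_rows, old_watermark):
    """Return the rows changed since old_watermark and the new watermark.

    Mirrors the usual steps: Lookup old watermark, Lookup new watermark
    (max ModifiedDate in the source), Copy rows in between, then update
    the stored watermark (done by a stored procedure in a real pipeline).
    """
    new_watermark = max(
        (row["ModifiedDate"] for row in source_rows), default=old_watermark
    )
    delta = [
        row
        for row in source_rows
        if old_watermark < row["ModifiedDate"] <= new_watermark
    ]
    return delta, new_watermark


# Hypothetical source table with a timestamp column.
source_rows = [
    {"Id": 1, "ModifiedDate": datetime(2021, 1, 1, 9, 0)},
    {"Id": 2, "ModifiedDate": datetime(2021, 1, 2, 9, 0)},
    {"Id": 3, "ModifiedDate": datetime(2021, 1, 3, 9, 0)},
]

# First run: everything after the stored watermark is new.
delta, watermark = incremental_load(source_rows, datetime(2021, 1, 1, 12, 0))
print([row["Id"] for row in delta], watermark)

# Second run with no source changes copies nothing.
delta_again, watermark_again = incremental_load(source_rows, watermark)
print(delta_again, watermark_again)
```

Bounding the copy at the new watermark, rather than copying everything newer than the old one, keeps rows that arrive during the run from being skipped or loaded twice on the next run.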

Ch 16: Azure Data Lake Storage (ADLS)

  • Azure Data Lake Storage - Data Store
  • LIVE Edits, Permissions & Sharing
  • Hadoop based on Apache YARN
  • HDFS File System, MapReduce
  • Authentication & Access Control
  • Azure Data Lake Gen 1 - Deployment
  • Encryption with Service Master Key
  • ADLS - Pricing and Instance Details
  • Data Explorer Tool in Azure Portal
  • Azure Storage Explorer Tool
  • File Preview and Header Row Promotion
  • Download / Rename / Access Properties
  • Folder Upload & Download; Quick Access
  • Cached File Access & Folder Statistics

Ch 26: Azure Cosmos DB - Architecture

  • Azure Cosmos DB: Globally Distributed
  • Multi Model Support for Big Data
  • Turnkey Global Distribution in Cosmos
  • Always-On, Elastic Scalability, Low Pricing
  • SQL API, Mongo DB, Cassandra, Gremlin
  • Table API: Real-time Applicative Uses
  • Azure Cosmos DB and Database Concept
  • Containers - Collection, Table & Graph
  • Items - Document, Rows, Node and Edge
  • Create Azure Cosmos DB Account in Portal
  • Create Azure Cosmos DB with Data Explorer
  • Creating Containers, Add JSON Documents
  • Data Store and Data Access (Querying)
  • Scaling Options for Cosmos DB & Cautions

Ch 7: Incremental Loads with ADF - 2

  • Incremental Load Pipeline Design in ADF
  • Working with Azure Storage Containers
  • Pipeline Executions, Incremental Schedules
  • Regular Schedules & Tumbling Windows
  • Binary Copy, Last Modified Date in Blob
  • Pipeline Trigger Schedules, Modifications
  • Incompatible Rows Skips, Fault Tolerance
  • Incremental Loads with Multiple Tables
  • Stored Procedures, Loops in ADF Pipelines
  • Configure ETL Sources, Pre-Copy Scripts
  • Using @{item()} with Dynamic Connections
  • Table_Schema for Column Mapping
  • Writing Expressions For Dynamic Loads
  • Staging and Performance for ADF Loads
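
The tumbling windows mentioned above are fixed-size, non-overlapping, contiguous time intervals, each of which drives one incremental pipeline run. A minimal Python sketch of how such windows divide a time range follows; the function name `tumbling_windows` and the sample dates are hypothetical, and the sketch does not call any ADF API.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Split [start, end) into back-to-back windows of length `size`.

    Only complete windows are returned; a trailing partial window is
    left for a later run, once its end time has passed.
    """
    if size <= timedelta(0):
        raise ValueError("window size must be positive")
    windows = []
    window_start = start
    while window_start + size <= end:
        windows.append((window_start, window_start + size))
        window_start += size
    return windows


# Hypothetical schedule: hourly windows over a 3.5 hour range.
windows = tumbling_windows(
    datetime(2021, 1, 1, 0, 0),
    datetime(2021, 1, 1, 3, 30),
    timedelta(hours=1),
)
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each window's start and end would typically be passed to the pipeline as parameters so that a run loads only the data belonging to its own interval.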

Ch 17: ADLS Monitoring, Alerts

  • Azure Data Lake Monitoring and Alerts
  • ADL Metrics, Storage Utilization Reports
  • Reads & Writes Metrics; Charts, Metrics
  • Data Reads, Writes, Requests - Storage
  • Report Shares, Download to Excel, Alerts
  • Scope, Conditions and Action Groups
  • Email Notifications and Scope Options
  • ADLS - Security Management and Levels
  • ADLS Resource Levels, Folder / File Level
  • IP Address; Role Based Access (RBAC)
  • Access Control Lists (ACL), IAM AD Roles
  • POSIX - Access ACLs and Default ACLs
  • ACL Permissions; Read, Write, Execute
  • Super User, RWX, Owning Users, Groups
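
The RWX permissions above follow the POSIX convention of three triads: owning user, owning group and others. As a small illustration, the Python sketch below converts the short form (such as `rwxr-x---`) into its octal digits; the helper names `triad_to_digit` and `rwx_to_octal` are hypothetical and are not part of any Azure SDK.

```python
# Numeric weight of each POSIX permission bit.
BITS = {"r": 4, "w": 2, "x": 1}


def triad_to_digit(triad: str) -> int:
    """Convert one triad such as 'r-x' to its octal digit (here 5)."""
    if len(triad) != 3:
        raise ValueError(f"expected 3 characters, got {triad!r}")
    total = 0
    for expected, actual in zip("rwx", triad):
        if actual == expected:
            total += BITS[expected]
        elif actual != "-":
            raise ValueError(f"unexpected character {actual!r} in {triad!r}")
    return total


def rwx_to_octal(perms: str) -> str:
    """Convert 'rwxr-x---' (owning user, owning group, other) to '750'."""
    if len(perms) != 9:
        raise ValueError(f"expected 9 characters, got {perms!r}")
    return "".join(str(triad_to_digit(perms[i:i + 3])) for i in (0, 3, 6))


print(rwx_to_octal("rwxr-x---"))  # owner full access, group read and execute
print(rwx_to_octal("rw-r--r--"))  # owner read and write, everyone else read
```

On a folder, Execute is the permission needed to traverse into child items, which is why it appears on directories even though no file is being run.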

Ch 27: Azure Cosmos DB Queries Level 1

  • Hierarchical JSON Documents with Cosmos
  • Embrace SQL, Extend SQL with NoSQL
  • Writing, Adding and Importing JSON
  • NoSQL Query Concepts and Executions
  • SELECT Format and Query Items
  • Request Charge, Results and IO Reads
  • Writes, Index, Lookup and Roundtrip
  • JSON Document & Key Value Pairs
  • Data Storage, Query: WHERE, SET
  • FROM, Aliases, Geo Spatial Queries
  • JSON Scripts to Access Sub Documents
  • Hierarchical Data, Parent-Child Relations
  • NoSQL: Unary and Binary Operators
  • SELECT with IN, BETWEEN, TOP, JOIN

Ch 8: Mapping Data Flow in ADF

  • Data Flow Task Creation in ADF Pipelines
  • Transformation Editor and Parameters
  • Comparing ADF Pipelines and Data Flow
  • Debugging: ADF Managed Executions
  • Apache Spark Clusters @ ADF Debugging
  • Authoring Data Flow, Graph, Configuration
  • Transformation Setting, Optimize, Inspect
  • Conditional Split Transformation in ADF
  • Pivot Transformation in Mapping Data Flow
  • Pivot Column & Aggregation Functions
  • Pivot Transformation, Pivot Settings
  • Pivot Key Value, Enabling Null Values
  • Pivoted Columns, Pattern, Optimize
  • Column Prefix, Help Graphic, Metadata

Ch 18: Azure Key Vaults & ADL Analytics

  • Azure Passwords, Keys and Certificates
  • Azure Key Vaults - Name and Vault URI
  • Inbuilt Managed Key, Azure Key Vault
  • Standard & Premium Azure Key Vaults
  • Identify Vault Name, URI: Access Points
  • Secret Page, Key Backups & Restores
  • Adding Keys to Azure Vaults, Types
  • Azure Data Lake Analytics Creation
  • Dynamic Scaling, U-SQL Implementation
  • Azure Data Lake Storage for Data Lake
  • Jobs Creation, Execution Environment
  • Distributed Runtime Environment in ADLA
  • ADLA - On-demand Job Service in Azure
  • Exabyte Scale and Data Lake in USQL

Ch 28: Azure Cosmos DB Queries Level 2

  • Data Import Tool : Installation and Usage
  • JSON Data Imports to Azure Cosmos DB
  • Azure Cosmos Endpoints, Access Keys
  • NoSQL Queries on JSON Documents
  • Writing Stored Procedures & Functions
  • ACID Properties : Atomicity, Consistency
  • Isolation, Durability with Procedures
  • SP Coding, Execution with Parameter
  • Stored Procedures for Document Uploads
  • Procedures with Variables, getResponse()
  • Procedure Advantages, Execution Option
  • User Defined Functions (UDF) in Cosmos
  • UDF Executions using Cosmos DB Scripts
  • Dynamic Calculations & Reporting Options

Ch 9: Wrangling Data Flow in ADF

  • Wrangling Data Flow in ADF : Advantages
  • Power Query Online Editor for Mashup
  • Spark Code for Cloud Scale Executions
  • Wrangling Data For Less Formal Analytics
  • Sources and Sinks with Wrangling DF
  • Github Integration with ADF Repository
  • User Defined Data Stores in GitHub
  • Transformations in Data Wrangling
  • Group By, Aggregate, Reordering
  • Pivot, Aggregations in Power Query
  • ADF Data Types & ADF Pipeline Store
  • Heterogeneous Sources in Power Query
  • ADF Publish, GitHub Store Differences

Ch 19: Data Lake Analytics, U-SQL 1

  • Azure Data Lake Analytics : Advantages
  • USQL - Big Data Processing Language
  • Azure Portal and Visual Studio Access
  • Aggregate, Analytical, Ranking Functions
  • U-SQL Catalog : Databases and Objects
  • Rowsets, Types and USQL Expressions
  • Azure U-SQL Jobs For Data Insertions
  • U-SQL Jobs for Storage, Retrieval
  • SELECT, EXTRACT & OUTPUT Clauses
  • USING, Outputters, Extractors Classes
  • Script Execution, Job Graph, Diagnosis
  • AU Allocation, Analysis for Job Execution
  • Script Reuse. CSC, User & System Errors

Ch 29: Azure Notebooks, Azure Functions

  • Azure Notebooks, Serverless Deployments
  • Azure Cosmos Notebooks and Advantages
  • Jupyter Notebook : Implementation, Usage
  • Pandas DataFrame with Python Scripts
  • Understanding Notebook and Cells
  • Python Script for Azure Cosmos DB
  • Python Script for Data Import & Report
  • Azure Functions: Creation and Apps
  • Azure Functions @ Cosmos DB Triggers
  • Azure Function App Service Plans
  • Serverless Components, Azure Insight
  • Azure Cosmos DB Monitoring, Logs
  • Azure Monitor Workbooks, Timelines

Ch 10: ADF : End to End Implementation

  • Azure Data Share: Configuration & Use
  • Azure Data Share: PaaS for ADF Shares
  • Importing BACPAC Files into Azure
  • Azure SQL DB: Data Lake Storage Gen 2
  • Data Filters, Aggregations, Joins in ADF
  • Spark Clusters for DataFlow Debugging
  • Multi Level Data Flows in ADF Pipeline
  • Data Loads to Azure Synapse from ADLS
  • Data Load Settings and Optimization
  • ADF Pipeline Debugging, Publish in ADF
  • Data Shares with Azure Synapse Tables
  • Data Ingestion, Consumption with Synapse
  • Recipients and Azure AD Users, Accounts
  • Run IDs, Monitoring, Cost Analysis, Metrics

Ch 20: Data Lake Analytics, U-SQL 2

  • U - SQL Operations with Visual Studio
  • Script.usql, Executions & Job Graphs
  • ADLA Account & Local Job Executions
  • Metadata, State History & AU Analysis
  • Working with TSV Data Sources in ADLS
  • Extract, Format, Data Loads with U-SQL
  • Adding New Columns to Files with U-SQL
  • ADLA Jobs For Create Databases, Tables
  • ADLA Managed Tables & External Tables
  • Create Tables from Query Rowset Option
  • Clone & Copy Tables using U-SQL Jobs
  • Hash Distributed Tables, Clustered Index
  • TVF - Table Valued Functions, Retrieval
  • SELECT, TOP, FETCH & ROW_NUMBER

Ch 30: Azure Cosmos DB Admin, Power BI

  • Consistency Levels in Azure Cosmos DB
  • Bounded Staleness, Session, Consistent Prefix
  • Eventual, Synchronization Options with Cosmos
  • Azure Cosmos DB : Backups & Restores. Retentions
  • IAM - Identity Access Management with Azure AD
  • Owner Role, Contributor Role and Reader Role
  • Cosmos DB Backup Operator, Account Reader Roles
  • Global Distribution Strategies and BCDR
  • Data Replication Options and High Availability
  • Multi Region Writes and Data Access Options
  • Azure Cosmos Database Cost Calculation Options
  • Availability Zones, Manual / Automated Failover
  • Costing Factors: Workloads and Multi Region Writes

Real-time Project @ Ecommerce Domain:

 

Includes On-Premise Migrations with bacpac Files, Azure Storage Components, Azure Data Ingestions using Azure Data Factory; Big Data Storage with Synapse, Cosmos Database; Big Data Analytics using Azure Databricks and End User Reporting. Resume Support and DP 200 & DP 201 Certification
