Microsoft Certified : Azure Data Engineer Associate (DP-200, DP-201)

Azure Data Engineers design and implement the management, monitoring, security, and privacy of data using the full stack of Azure data services to satisfy business needs.

 

Module I: Azure DATA Factory

Module II: AZURE DATA FACTORY

Module III: AZURE DATA FACTORY

Ch 1: AZURE BI INTRODUCTION

  • Introduction to Azure Cloud
  • Cloud Architecture and Components
  • IaaS: Infrastructure as a Service
  • PaaS: Platform as a Service
  • SaaS: Software as a Service
  • Azure Resources and ARM Concepts
  • Azure : Advantages and Pricing
  • Azure Data Factory Pricing Plans
  • BI : Business Intelligence Terms
  • Database Types: OLTP, OLAP, DWH
  • Need for MSBI and Azure BI
  • Azure BI Components: ADF,AAS,ADLS
  • Azure Data Factory Architecture
  • ADF : Data Processing, Movement
  • Data Mashup and ETL Components
  • Data Warehouse, Analysis Components
  • End to End Implementation Plan

Ch 5: Azure Database Migrations

  • Database and Data Migration Options
  • Azure Data Migrations with DMA Tool
  • On-Premise Database Assessments
  • Migration Scope, Object Scripting
  • Schema Migration and Data Migrations
  • Database Exports into BACPAC Files
  • Azure Storage Container For Imports
  • Database Imports using Azure Portal
  • Database Imports using SSMS Tool
  • On-Premise Versus Azure Architecture
  • Server Architecture Differences
  • Database Architecture Differences
  • Table Architecture Differences
  • Query Processing Differences
  • T-SQL Query Keyword Differences
  • Memory Resources, Buffer Manager
  • DB File Structures and Protocols
  • SQL Server Logical Server in Azure

Ch 9 : INCREMENTAL DATA LOADS 1

  • Incremental Data Loads with ETL
  • Incremental Loads in ADF Portal
  • Source Data & Watermark Column
  • Watermark Tables : Creation, Use
  • TimeStamp For Incremental Loads
  • Configuration Procedures for ETL
  • Using Lookup Activities in ADF
  • Lookup Transformation with ADF
  • Watermark Column Extraction
  • Activity Connections in ADF Portal
  • Expressions and Usage in ADF
  • Generating New Watermark Columns
  • Lookup Output Row Level Audits
  • Stored Procedures and Dynamic Values
  • Parameterized Values for Data Extraction
  • Using ADF Run IDs for Connections
  • Parameter Imports and Staging Options
  • Working with Lookup Connections in ADF
  • Output() Functions & Item() Functions

Ch 2: SSIS BASICS, INSTALLATIONS

  • Database Types: OLTP, DWH, OLAP
  • SQL Server, Tools Installations
  • Database Engine & SSIS Components
  • Need for ETL Components, Tools
  • SSIS Basics and Package Environment
  • Azure Data Factory (ADF) Pipelines
  • SSIS versus ADF : Basic Differences
  • ADF : Workflow & Data Source Types
  • Versions and Editions of SQL Server
  • SSMS Tool Installation, Connections
  • SSIS DB Creation and Connection
  • SSIS Catalog and Real-time Use
  • SSDT (Visual Studio) Installation
  • Azure Data Studio Tool Installation
  • SQL Server Connection @ SSDT
  • SQL Server Connection @ ADS

Ch 6: AZURE DATA WAREHOUSE

  • Azure Data Warehouse (ADW)
  • Key Benefits of Azure Synapse
  • ADWH Resource Creation in SQL Server
  • Scaling Options, Compute Engine
  • ADW Connection from SSMS Tool
  • Data Warehouse Details: Name, Server
  • cDWU & DWU : Real-time Usage, Pricing
  • DWs and Scaling Options: DW Series
  • Massively Parallel Processing (MPP)
  • Sharding Benefits with MPP in DWH
  • Control Node and Compute Node
  • Data Distribution Components in ADW
  • DMS : Data Movement Service
  • Hash Functions, Distributed Tables
  • Round-robin Distributed Tables
  • Replicated Tables in Azure DWH
  • Queries Distribution, Nodes in ADW
  • Dynamic Management Views (DMV)

Ch 10 : INCREMENTAL DATA LOADS 2

  • Incremental Loads - Multiple Tables
  • Data Prep for Watermark Store
  • ForEach Activity and Real-time use
  • Lookup Activities Use in ADF
  • Linked Service Parameters, Use
  • Copy Data Activity for ETL in ADF
  • Stored Procedure Activity for ETL
  • Data Pipelines Parameters in ADF
  • Object Name Parameters, Iterations
  • Using Self Hosted IR Engine in ADF
  • Incremental Loads with Azure Storage
  • Dynamic Sources and Destinations
  • Data Preview & Dynamic Content
  • Linked Services & Datasets Creations
  • ADF Pipeline Executions, Run IDs
  • Dynamic File Formats and Connections
  • ADF Pipeline, Trigger Execution Runs

Ch 3: SSIS & ETL Basics

  • Basic Vocabulary: ETL, DWH, Data Flow
  • SSIS Package Environment and Projects
  • SSIS Solutions and Project Templates
  • Control Flow Tasks: Architecture, Purpose
  • Data Flow Tasks - Architecture and Usage
  • SSIS Packages: Data Flow Operations
  • Pipelines & Connections in Data Flow
  • SSIS Package Design and Execution
  • Basic SSIS Package Creation for ETL
  • Using Transformations in SSIS
  • DTSX Files for SSIS Package Executions
  • Data Flow Transformations, Data Splits
  • Basic Transformations: Conditional Split
  • Data Conversion Transformation, ETL
  • Source Assistants & Destination Assistants
  • SSIS Project Builds and ISPAC Files
  • SSIS Package Deployment, Verification
  • SSIS Package Executions, Schedules
  • Limitations of SSIS for ETL & DWH

Ch 7: ADF BASICS, COPY DATA TOOL

  • Azure Data Factory Architecture
  • ADF : Data Processing Components
  • ADF : Data Movement Components
  • Data Pipelines and ADF Activities
  • Data Orchestration and Datasets
  • ADF Resource Creation in Azure
  • Pipeline Concept, Linked Services
  • SSIS Lift and Shift with ADF
  • Azure Data Factory (ADF) : Purpose
  • Creating ADF Resource in Azure, Use
  • Pipeline Creation and Usage Options
  • Copy Data Tool in ADF Portal, Use
  • Linked Service Creation in ADF
  • Dataset Creation, Connection Reuse
  • Staging Dataset with Azure Storage
  • ADF Pipeline Deployments, Triggers
  • Polybase Benefits for Data Staging

Ch 11: AZURE DATA LAKE, ADF

  • Azure Data Lake Storage Environment
  • Azure Data Lake Gen 1 and Gen 2
  • Using ADF to get data from Data lake
  • ADF Integration with Azure Data Store
  • Scale-Out and Managed Data Transfer
  • Copy Data Performance Monitoring
  • Monitor, Activity Runs and Options
  • Data Partition Recommendations in ADL
  • Size Limitations and ADF Activities
  • Azure Data Warehouse Management
  • Threat Detection, Threat Protection
  • Dynamic resource classes, ADW Queries
  • Resource Class Operations & Precedence
  • Backups and Restores with ADWH
  • Export and Import Options with ADWH
  • TDE Option and Maintenance Schedules
  • Security Management, Role Based Access
  • Azure DWH Alerts and Rule Creations

Ch 4: Azure SQL Server, Database

  • Azure SQL Server Deployments
  • Azure Logical Servers, Usage
  • DataCenter Locations, Collation
  • Azure Resource Groups and Usage
  • Azure SQL Database Deployment
  • Azure Pricing Tiers: Basic Plan
  • Standard and Premium Plans
  • Azure SQL DB Connections @ SSMS
  • Azure SQL DB Connections @ ADS
  • Azure Storage : Purpose, Tools
  • Azure Storage Explorer Tool
  • Azure Storage Container, Use
  • Azure Blob Storage and Types
  • LRS, GRS and RA - GRS Storage
  • Hot Access, Replication Options
  • File Storage (Uploads) to Azure
  • Azure Storage Security Options

Ch 8: ETL WITH AZURE SQL DB

  • Copy Data From Azure SQL Database
  • Azure SQL Data Warehouse Sinks
  • Data Sets Creation, ADF Data Pipelines
  • Activities : Move & Transform for ETL
  • Actions and View Activity Runs
  • ADF Pipeline Monitoring and Refresh
  • ADF Triggers - Manual, Automatic
  • Azure Database to Azure Data Warehouse
  • Data Integration Units [DIU] Usage
  • Azure Data Factory Runtime Engine
  • On-Premise Data Extraction and Loads
  • Self Hosted IR in Copy Data Tool
  • IR Engine Installation, Key Concepts
  • IR Connections & Authentication Methods
  • IR Tool Configuration and Monitoring
  • Azure Storage Linked Service, Reuse

Ch 12: ADF MIGRATIONS FROM SSIS

  • Execute SSIS Packages in Azure
  • Azure-SSIS Integration Runtime, SSDT
  • Azure-SSIS IR Creation and Use
  • IR Setup and Cluster Size, Nodes
  • License Options, Location, Cores
  • Node Numbers & Parallel Executions
  • SSISDB Project Deployment Model
  • Execute SSIS Package Activity
  • Executing SSIS Packages in ADF
  • SSIS Project Connections to ADF
  • Azure Storage for SSIS Packages
  • Execute SSIS packages in Azure
  • SSIS Package Migrations to ADF
  • SSIS Catalog DB with ADF
  • On-Premise SSIS Versus Azure
  • SSIS Lift and Shift with ADF
  • Branching and Chaining in ADF
  • Data Lineage and Tagging in ADF

Implement non-relational data stores

  • implement a solution that uses Cosmos DB, Data Lake Storage Gen2, or Blob storage
  • implement data distribution and partitions
  • implement a consistency model in Cosmos DB
  • provision a non-relational data store
  • provide access to data to meet security requirements
  • implement for high availability, disaster recovery, and global distribution

Implement relational data stores

  • configure elastic pools
  • configure geo-replication
  • provide access to data to meet security requirements
  • implement for high availability, disaster recovery, and global distribution
  • implement data distribution and partitions for Azure Synapse Analytics
  • implement PolyBase

Manage data security

  • implement data masking
  • encrypt data at rest and in motion

Develop batch processing solutions

  • develop batch processing solutions by using Data Factory and Azure Databricks
  • ingest data by using PolyBase
  • implement the integration runtime for Data Factory
  • implement Copy Activity within Azure Data Factory
  • create linked services and datasets
  • create pipelines and activities
  • implement Mapping Data Flows in Azure Data Factory
  • create and schedule triggers
  • implement Azure Databricks clusters, notebooks, jobs, and autoscaling
  • ingest data into Azure Databricks

Develop streaming solutions

  • configure input and output
  • select the appropriate windowing functions
  • implement event processing by using Stream Analytics

Monitor data storage

  • monitor relational and non-relational data sources
  • implement Blob storage monitoring
  • implement Data Lake Storage monitoring
  • implement SQL Database monitoring
  • implement Azure Synapse Analytics monitoring
  • implement Cosmos DB monitoring
  • configure Azure Monitor alerts
  • implement auditing by using Azure Log Analytics

Monitor data processing

  • monitor Data Factory pipelines
  • monitor Azure Databricks
  • monitor Stream Analytics
  • configure Azure Monitor alerts
  • implement auditing by using Azure Log Analytics

Optimize Azure data solutions

  • troubleshoot data partitioning bottlenecks
  • optimize Data Lake Storage
  • optimize Stream Analytics
  • optimize Azure Synapse Analytics
  • optimize SQL Database
  • manage the data lifecycle

Recommend an Azure data storage solution based on requirements

  • choose the correct data storage solution to meet the technical and business requirements
  • choose the partition distribution type

Design non-relational cloud data stores

  • design data distribution and partitions
  • design for scale (including multi-region, latency, and throughput)
  • design a solution that uses Cosmos DB, Data Lake Storage Gen2, or Blob storage
  • select the appropriate Cosmos DB API
  • design a disaster recovery strategy
  • design for high availability

Design relational cloud data stores

  • design data distribution and partitions
  • design for scale (including multi-region, latency, and throughput)
  • design a solution that uses SQL Database and Azure Synapse Analytics
  • design a disaster recovery strategy
  • design for high availability

Design batch processing solutions

  • design batch processing solutions that use Data Factory and Azure Databricks
  • identify the optimal data ingestion method for a batch processing solution
  • identify where processing should take place, such as at the source, at the destination, or in transit
  • identify transformation logic to be used in the Mapping Data Flow in Azure Data Factory

Design real-time processing solutions

  • design for real-time processing by using Stream Analytics and Azure Databricks
  • design and provision compute resources

Design security for source data access

  • plan for secure endpoints (private/public)
  • choose the appropriate authentication mechanism, such as access keys, shared access signatures (SAS), and Azure Active Directory (Azure AD)

 Design security for data policies and standards

  • design data encryption for data at rest and in transit
  • design for data auditing and data masking
  • design for data privacy and data classification
  • design a data retention policy
  • plan an archiving strategy
  • plan to purge data based on business requirements