DP-203: Microsoft Azure Data Engineering

About Course
Are you ready to take your career to the next level in Azure Data Engineering?
Look no further! This comprehensive Udemy course is your ultimate guide to preparing for the DP-203: Microsoft Certified Azure Data Engineer Associate exam.
In today’s data-driven world, the demand for skilled Azure Data Engineers is higher than ever. Organizations seek professionals who can design, implement, and manage data solutions using Microsoft Azure services. Achieving the DP-203 certification is a testament to your expertise in the field, making you a valuable asset in the industry.
What You’ll Learn:
- Master Azure Data Services: Dive deep into Azure Data Factory, Azure Databricks, Azure HDInsight, and other essential Azure data services. Understand how these services can be leveraged to build data solutions.
- Data Storage and Processing: Learn how to store and process data effectively using Azure SQL Data Warehouse, Azure Cosmos DB, and Azure Data Lake Storage.
- Data Integration: Explore data integration techniques and best practices using Azure Data Factory, Azure Logic Apps, and more.
- Data Orchestration: Gain expertise in orchestrating data workflows, data movement, and data transformation.
- Data Security and Compliance: Understand how to secure your data solutions and ensure compliance with industry standards and regulations.
- Monitoring and Optimization: Learn how to monitor the performance of your data solutions and optimize them for efficiency.
- Real-world Scenarios: Get hands-on experience with real-world scenarios and practical exercises that simulate the exam environment.
- Exam Preparation: Receive expert guidance on the DP-203 exam structure, question types, and test-taking strategies.
Regards
Vishmita Data Labs
What Will You Learn?
- Pass the Microsoft Azure DP-203 Data Engineering exam
- Learn Basic Data Engineering & Database Concepts
- Learn Various Azure Storage Products like Blob, Queue, File, and Table Storage
- Build Structured Data Solutions with the Various SQL Database Options
- Store Semi-structured Data with Cosmos DB
- Develop ETL/ELT Solutions with Azure Data Factory
- Analyze Data with a Data Warehousing System – Synapse Analytics
- Synapse SQL in Azure Synapse Analytics
- Azure Data Factory Control Flow Activities
- Apache Spark in Azure Synapse Analytics
- Azure Databricks
- Delta Lake & Data warehouse in Azure Databricks
- Azure Stream Analytics
Course Content
Basics of Data
- Structured vs Unstructured vs Semi-Structured Data
- Batch vs Streaming Data
- OLTP vs OLAP
- Data Lake vs Data Warehouse
- Section Quiz
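The first lecture’s distinction can be made concrete with a small sketch (plain Python; the record fields are illustrative, not from the course): the same kind of data as a structured CSV row versus a semi-structured JSON document.

```python
import csv
import io
import json

# Structured data: a fixed schema where every record has the same columns
# (the shape relational databases and CSV files expect).
structured = io.StringIO()
writer = csv.writer(structured)
writer.writerow(["customer_id", "name", "city"])
writer.writerow([1, "Alice", "Pune"])

# Semi-structured data: self-describing records whose fields can vary and nest
# (the shape JSON documents in Cosmos DB or a data lake take).
semi_structured = json.dumps({
    "customer_id": 2,
    "name": "Bob",
    "orders": [{"sku": "A1", "qty": 3}],  # nesting a flat table cannot hold directly
})

record = json.loads(semi_structured)
print(record["orders"][0]["qty"])  # nested fields are addressed by key: prints 3
```

Unstructured data (images, free text, video) has neither shape and is typically parked in Blob or Data Lake storage until something downstream interprets it.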
Azure Basic Services
- Introduction to Cloud Computing
- Introduction to Azure
- Create an Azure Free Account
- Azure Portal Walkthrough
- Managed vs Unmanaged Services
- Resource Groups, Management Groups, and Subscriptions
- Tagging
- Access Control
- Set a Budget
- Create Azure Resource Groups
- Create Azure Resources
- Delete Resources
Azure Storage
- Different Services for Azure Storage
- Azure Blob Storage
- Azure Queue
- Azure File Share
- Azure Disk Storage
Azure SQL Database
- Introduction to Azure SQL Database
- Azure SQL IaaS & PaaS Offerings
- Different PaaS Deployment Options
- SQL Pricing Models & Service Tiers
- Azure SQL Server in a Virtual Machine (IaaS)
- Azure SQL Database Backup and Restore
- Azure SQL Database Scaling
- Azure SQL Database Security Options
- Azure SQL Managed Instance Advanced Security
- Encrypting Data at Rest and in Motion
- Dynamic Data Masking
- High Availability vs Disaster Recovery
- RTO vs RPO
- Azure SQL Database High Availability and Disaster Recovery Options
- Azure SQL Database vs Azure Data Warehouse
Transact SQL (T-SQL)
- Introduction
- Lab – Installing Azure Data Studio
- Lab – T-SQL – CREATE TABLE Command
- Lab – T-SQL – SELECT Clause
- Lab – T-SQL – WHERE Clause
- Lab – T-SQL – ORDER BY Clause
- Lab – T-SQL – Aggregate Functions
- Lab – T-SQL – GROUP BY Clause
- Lab – Using PARTITION BY
- Lab – LEAD and LAG Functions
- Lab – WITH Clause
- Lab – T-SQL – FOREIGN KEY Constraints
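As a taste of the LEAD/LAG labs, here is a runnable sketch using Python’s built-in SQLite (version 3.25+ supports window functions with the same syntax T-SQL uses for these functions); the table and data are illustrative, not from the course labs.

```python
import sqlite3

# LAG/LEAD read the previous/next row's value within a window, ordered by a key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 100), (2, 150), (3, 120)])

rows = conn.execute("""
    SELECT month,
           amount,
           LAG(amount)  OVER (ORDER BY month) AS prev_amount,
           LEAD(amount) OVER (ORDER BY month) AS next_amount
    FROM sales
    ORDER BY month
""").fetchall()

for row in rows:
    print(row)
# The first row has no previous value, so prev_amount is NULL (None in Python),
# and the last row has no next value, so next_amount is NULL.
```

The same query runs unchanged against Azure SQL Database or a Synapse SQL pool from Azure Data Studio.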
Azure Synapse Analytics
- Introduction to Data Warehouses
- Traditional vs Modern Warehouse Architecture
- What is the Synapse Analytics Service?
- Azure Synapse Benefits
- Azure Synapse MPP Architecture
- Lab – Let’s Create an Azure Synapse Workspace
- Demo: Explore Synapse Studio V2
- Demo: Monitor Synapse Analytics Studio
- Storage and Sharding Patterns
- Data Distribution and Distribution Keys
- About the Serverless SQL Pool
- Lab – Using External Tables – CSV – Part 1
- Lab – Using External Tables – CSV – Part 2
- Lab – External Tables – Parquet File
- Lab – External Tables – Multiple Parquet Files
- Lab – OPENROWSET – JSON Files
- The Dedicated SQL Pool
- Lab – Creating a SQL Pool
- Lab – SQL Pool – External Tables – CSV
- Lab – SQL Pool – External Tables – Parquet
- Lab – External Tables – Hidden Files and Folders
- Pausing the SQL Pool
- Lab – Loading Data into a SQL Pool Using PolyBase
- Lab – Loading Data into a Table – COPY Command – CSV
- Lab – Loading Data into a Table – COPY Command – Parquet
- Lab – Loading Data – Pipelines – Storage Accounts
- Lab – Loading Data – Pipelines – Azure SQL Database
- Designing a Data Warehouse
- Fact and Dimension Tables
- Lab – Building a Fact Table
- Lab – Building a Dimension Table
- Lab – Transferring Data to Our SQL Pool
- Lab – Using Power BI for a Star Schema
- Understanding Table Types
- Lab – Creating Hash-Distributed Tables
- Lab – Creating Replicated Tables
- Lab – Surrogate Keys for Dimension Tables
- Facts as Hash-Distributed, Dimensions as Replicated
- Slowly Changing Dimensions
- Indexes in Azure Synapse
- Which Load Method to Use
- Partitioning in Synapse Analytics
- Lab – Creating a Table with Partitions
- Lab – Switching Partitions
- Lab – CASE Statement
- Analyze Data Using an Apache Spark Notebook
- Demo: Analyze Data Using the Serverless SQL Pool
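One way to picture the hash-distribution topics above: a dedicated SQL pool always spreads data across 60 distributions, and a hash-distributed table assigns each row by a deterministic hash of its distribution column. The sketch below imitates that idea in plain Python; the hash function is a stand-in, since Synapse’s internal algorithm isn’t exposed.

```python
import hashlib

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool always uses 60 distributions

def distribution_for(key: str) -> int:
    """Illustrative stand-in for the engine's hash: a deterministic hash of the
    distribution column value, mapped to one of the 60 distributions."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_DISTRIBUTIONS

# The same key always lands in the same distribution, which is why joins and
# aggregations on the distribution key avoid data movement between nodes.
assert distribution_for("customer_42") == distribution_for("customer_42")

counts = {}
for i in range(10_000):
    d = distribution_for(f"customer_{i}")
    counts[d] = counts.get(d, 0) + 1
print(len(counts))  # a high-cardinality key spreads rows over all 60 distributions
```

This is also why a low-cardinality or skewed distribution key is a poor choice: a few distributions end up holding most of the rows.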
Security layers in Azure Synapse Service
- Introduction
- Advanced Data Security
- Auditing
- Network Security
- Transparent Data Encryption
- Dynamic Data Masking
- Access Management
Azure Data Lake
- What is a Data Lake, and What Problems Does It Solve?
- Blob Storage vs Data Lake
- Hierarchical Namespace
- Demo – Create an Azure Data Lake Gen2 Account
- Demo – On-Premises to Data Lake Gen2 Using the Portal and Storage Explorer
- Demo – On-Premises to Data Lake Gen2 Using AzCopy
- Demo – Azure Blob to Data Lake Gen2 Using Data Factory
- Demo – Azure SQL Database to Data Lake Gen2 Using Data Factory
- Data Flow Around the Data Lake
- Data Lakes and Transient Clusters
- Data Processing Using HDInsight
Data Lake Security Layers
- Introduction
- Storage Access Keys
- SAS – Shared Access Signatures
- Microsoft Entra ID (formerly Azure Active Directory)
- Role-Based Access Control (RBAC)
- Access Control Lists (ACLs)
- Firewalls and Virtual Networks
- Encryption in Transit
- Encryption at Rest
- Advanced Threat Protection
Azure Cosmos DB
- Introduction to NoSQL
- SQL vs NoSQL
- Introduction to Azure Cosmos DB
- Components of Azure Cosmos DB for NoSQL
- Cosmos DB Features
- Cosmos DB – Multi-Model: the 5 APIs
- APIs in Azure Cosmos DB
- Azure Table Storage vs Cosmos DB Table API
- Provision a Cosmos DB Account
- Cosmos DB – Databases, Containers, and Items
- Cosmos DB – Throughput and Request Units
- Cosmos DB – Horizontal Scalability
- Cosmos DB – Partitions and the Partition Key
- Cosmos DB – Dedicated vs Shared Throughput
- Cosmos DB – Avoiding Hot Partitions
- Cosmos DB – Single-Partition vs Cross-Partition Queries
- Cosmos DB – Composite Keys
- Cosmos DB – Partition Key Best Practices
- Cosmos DB – Automatic Indexing
- Demo – Insert and Query Data in Your Cosmos DB
- Cosmos DB – Time to Live Feature
- Cosmos DB – Global Distribution Feature
- Cosmos DB – Multi-Master Feature
- Cosmos DB – Manual vs Automatic Failover
- Cosmos DB – The 5 Consistency Levels
- Cosmos DB – Azure CLI
- Cosmos DB – Pricing
- Cosmos DB – Monitoring Through Azure Monitor
- Cosmos DB – Monitoring Through the Cosmos DB Portal
- Cosmos DB – Security
- Cosmos DB – High Availability and Disaster Recovery Options
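The composite-key and hot-partition lectures come down to one pattern: when no single property spreads load well, build a synthetic partition key by concatenating two of them. A plain-Python sketch (the property names are illustrative, not from a specific course demo):

```python
# A synthetic (composite) partition key combines two properties so that one
# busy value of the first property does not become a single hot partition.
def synthetic_partition_key(tenant_id: str, month: str) -> str:
    return f"{tenant_id}-{month}"

docs = [
    {"id": "1", "tenant_id": "contoso", "month": "2024-01"},
    {"id": "2", "tenant_id": "contoso", "month": "2024-02"},
    {"id": "3", "tenant_id": "fabrikam", "month": "2024-01"},
]
for d in docs:
    # Stored as its own property; the container is then partitioned on it.
    d["partitionKey"] = synthetic_partition_key(d["tenant_id"], d["month"])

# One tenant's writes now spread across multiple logical partitions instead of
# all landing in a single "tenant_id" partition.
print(sorted({d["partitionKey"] for d in docs}))
```

The trade-off is that queries for a whole tenant become cross-partition, so the composite should match how the data is actually read.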
Azure Databricks
- Spark Basics
- Why Databricks Evolved
- Introduction to Azure Databricks
- Databricks Architecture
- How to Save on Databricks Demo Costs
- Azure Databricks Clusters
- Lab: Azure Databricks Workspace Creation
- Lab: Provision Clusters and Notebooks
- Lab: Create a Databricks Community Edition Account
- Lab: Magic Commands
- Lab: Databricks Utilities (dbutils)
- DBFS
- Lab: Create a DataFrame
- Lab: Read a CSV File
- Lab: Read a Text File
- Lab: Read a JSON File
- Lab: Read a Parquet File
- Lab: Write to a Parquet File
- Introduction to Spark SQL
- Lab: Running SQL on DataFrames
- Lab: Views in Spark SQL
- Hive Metastore
- Lab: Create Databases
- Lab: Managed Tables
- Lab: Unmanaged Tables
- Delta Tables
- Lab: Write to a Delta Table
Transformation in ADB
- Select Columns
- Add a New Column
- Rename a Column
- Calculated Columns
- Drop Columns
- Sort Columns
- Manual Schema
- Read a CSV File Using a Manual Schema
- Changing Data Types (Type Casting)
- Math Functions
- Date Functions
- String Functions
- Sort Functions
- UNION
- JOIN
- Broadcast Join
- Filter
- Grouping
- repartition()
- coalesce()
- Salting
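Salting, the last item above, splits a skewed key across several buckets and then re-aggregates. A plain-Python simulation of the two-step aggregation (keys and values are illustrative; in Databricks the same idea is applied to DataFrame columns so one hot key stops overloading a single partition):

```python
import random

random.seed(0)  # deterministic for the sketch

SALT_BUCKETS = 4  # how many pieces the hot key is split into

def salt_key(key: str) -> str:
    # Append a random suffix so "US" becomes "US_0" .. "US_3".
    return f"{key}_{random.randrange(SALT_BUCKETS)}"

rows = [("US", 1)] * 8 + [("NL", 1)] * 2  # heavily skewed toward "US"
salted = [(salt_key(k), v) for k, v in rows]

# Step 1: partial aggregation on the salted key spreads the hot key's work.
partial = {}
for k, v in salted:
    partial[k] = partial.get(k, 0) + v

# Step 2: strip the salt and aggregate again to recombine the pieces.
final = {}
for k, v in partial.items():
    original = k.rsplit("_", 1)[0]
    final[original] = final.get(original, 0) + v

print(final)  # same totals as without salting, but the skew was spread out
```

The extra shuffle in step 2 is the cost; salting only pays off when the skew is bad enough to stall a single task.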
Connect ADB to ADLS Gen2 using Azure credentials
- Access ADLS Using Access Keys
- Access ADLS Using a SAS Token
- Access ADLS Using a Service Principal
- Access ADLS Using Azure Key Vault
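For the service-principal path, a Databricks notebook typically sets the ABFS OAuth options on the Spark conf. The sketch below only builds the key/value pairs; the account, client, and tenant values are placeholders, and in a real notebook each pair is applied with spark.conf.set while the secret is fetched from Key Vault via dbutils.secrets.get rather than written inline.

```python
# Placeholders, not real values.
storage_account = "mystorageacct"
tenant_id = "<tenant-id>"
suffix = f"{storage_account}.dfs.core.windows.net"

# The Hadoop ABFS driver's OAuth (client-credentials) settings for ADLS Gen2.
adls_oauth_conf = {
    f"fs.azure.account.auth.type.{suffix}": "OAuth",
    f"fs.azure.account.oauth.provider.type.{suffix}":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    f"fs.azure.account.oauth2.client.id.{suffix}": "<application-client-id>",
    f"fs.azure.account.oauth2.client.secret.{suffix}": "<client-secret>",
    f"fs.azure.account.oauth2.client.endpoint.{suffix}":
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
}

# In a notebook: for k, v in adls_oauth_conf.items(): spark.conf.set(k, v)
print(len(adls_oauth_conf))
```

With these set, paths like `abfss://<container>@mystorageacct.dfs.core.windows.net/...` resolve using the service principal’s RBAC and ACL permissions.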
Azure Data Factory (ADF)
- What is Azure Data Factory?
- Costing Aspects of Azure Data Factory
- Lab: Provision an Azure Data Factory Instance
- Data Factory Components
- Data Factory – Pipelines and Activities
- Types of Activities
- Data Factory – Linked Services and Datasets
- Data Factory – Integration Runtimes
- Lab: Create SSIR
- Copy Data Activity in ADF
- Lab: Create Your First Pipeline
- Debug Your ADF Pipeline
- Lab: Copy Data from GitHub to ADLS
- Lab: Copy from a REST API to ADLS
- Introduction to Parameterization
- Parameterize Linked Services in Azure Data Factory
- Parameterize Datasets in Azure Data Factory
- Parameterize Pipelines in Azure Data Factory
- System Variables in ADF
- Connectors in ADF
- Supported File Formats in ADF
- User Properties in ADF
- Lab: Copy from an Azure SQL Database
- Get Metadata Activity
- Delete Activity
- Fail Activity
- Set Variable Activity
- Append Variable Activity
- Execute Activity
- Deactivate Activity
- Lookup Activity
- ForEach Activity
- If Condition Activity
- Monitoring in Azure Data Factory
- Mini Project
- Execute Pipeline Activity
- Notebook Activity
- Web Activity
- Triggers in ADF
- Data Flow Concepts
- Mapping Data Flows
- Wrangling Data Flows
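Tying the Lookup and ForEach activities together: ADF stores a pipeline as JSON, and the common Lookup-then-ForEach pattern looks roughly like the structure below, modeled here as a Python dict. Activity names are illustrative, and the authoritative schema is whatever the Data Factory UI exports, not this sketch.

```python
import json

pipeline = {
    "name": "CopyEachFile",
    "properties": {
        "activities": [
            {
                "name": "LookupFileList",
                "type": "Lookup",  # produces the list of items to iterate over
            },
            {
                "name": "ForEachFile",
                "type": "ForEach",
                # Runs only after the Lookup succeeds.
                "dependsOn": [
                    {"activity": "LookupFileList", "dependencyConditions": ["Succeeded"]}
                ],
                "typeProperties": {
                    # ForEach consumes the Lookup output via an expression;
                    # inside the loop, @item() refers to the current element.
                    "items": {
                        "value": "@activity('LookupFileList').output.value",
                        "type": "Expression",
                    },
                    "activities": [{"name": "CopyOneFile", "type": "Copy"}],
                },
            },
        ]
    },
}

print(json.dumps(pipeline["properties"]["activities"][1]["typeProperties"]["items"]))
```

This JSON view is also what Git-enabled factories commit per pipeline, which is why the CI/CD section below works on the repository rather than the live service.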
CI/CD via Azure DevOps
- Introduction to CI/CD
- Lab: Configure Git in Azure Data Factory
- Lab: Create a Feature Branch & an ADF Pipeline
- Lab: Azure Data Factory – Git – Pull Requests
Azure Event Hubs & Stream Analytics
- Introduction
- Lab – Streaming from Azure Event Hubs – Setup
- Lab – Streaming from Azure Event Hubs
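Stream Analytics queries typically aggregate events over windows; a tumbling window slices the stream into fixed, non-overlapping intervals so every event belongs to exactly one window. A plain-Python imitation of that bucketing (timestamps and window size are illustrative):

```python
WINDOW_SECONDS = 10  # fixed window length, like TumblingWindow(second, 10)

# (event_time_in_seconds, value) pairs standing in for Event Hubs messages.
events = [(3, 1), (7, 1), (12, 1), (19, 1), (21, 1)]

windows = {}
for ts, value in events:
    # Each event maps to exactly one window, keyed by the window's start time.
    window_start = (ts // WINDOW_SECONDS) * WINDOW_SECONDS
    windows[window_start] = windows.get(window_start, 0) + value

print(windows)  # {0: 2, 10: 2, 20: 1}
```

Hopping and sliding windows relax the non-overlap property, so one event can then count toward several windows.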
Projects
- IoT Hub: Streaming Data Project
- Retail Chain Analytics: End-to-End (E2E) Project Using CI/CD