ADF (Azure Data Factory) Introduction
In the world of big data, raw, unorganized data is often stored in relational, non-relational, and other storage systems. However, on its own, raw data doesn’t have the proper context or meaning to provide meaningful insights to analysts, data scientists, or business decision makers.
Big data requires a service that can orchestrate and operationalize processes to refine these enormous stores of raw data into actionable business insights. Azure Data Factory is a managed cloud service that’s built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.
Features of Azure Data Factory
Data Compression: During the Data Copy activity, it is possible to compress the data and write the compressed data to the target data source. This feature helps optimize bandwidth usage in data copying.
Extensive Connectivity Support for Different Data Sources: Azure Data Factory provides connectors for a broad range of data sources. This is useful when you need to read data from, or write data to, several different systems.
Custom Event Triggers: Azure Data Factory allows you to automate data processing using custom event triggers, so that a pipeline runs automatically when a specified event occurs.
Data Preview and Validation: During the Data Copy activity, tools are provided for previewing and validating data. This feature helps you ensure that data is copied correctly and written to the target data source correctly.
Customizable Data Flows: Azure Data Factory allows you to create customizable data flows. This feature allows you to add custom actions or steps for data processing.
Integrated Security: Azure Data Factory offers integrated security features such as Entra ID integration and role-based access control to control access to data flows. This helps protect your data while it is being processed.
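To make the Data Compression feature above more concrete, here is a minimal sketch in plain Python. It does not call Azure Data Factory at all; it only uses the standard `gzip` module to illustrate why writing compressed data reduces the number of bytes that have to travel to the target. The sample CSV payload is invented for illustration.

```python
import gzip

# Hypothetical sample payload: a small CSV repeated many times,
# standing in for the data a copy operation would move.
payload = b"id,name,city\n" + b"1,alice,seattle\n" * 1000

# Compress before "sending" and record both sizes.
compressed = gzip.compress(payload)
original_size = len(payload)
compressed_size = len(compressed)

# The receiving side can restore the exact original bytes.
restored = gzip.decompress(compressed)

print(f"original bytes:   {original_size}")
print(f"compressed bytes: {compressed_size}")
print(f"round trip ok:    {restored == payload}")
```

Repetitive tabular data like this compresses very well; the saving on real data depends on its content and on the codec chosen.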
Top-level concepts
An Azure subscription might have one or more Azure Data Factory instances (or data factories). Azure Data Factory is composed of the following key components:
- Pipelines
- Activities
- Datasets
- Linked services
- Data Flows
- Integration Runtimes
These components work together to provide the platform on which you can compose data-driven workflows with steps to move and transform data.
Pipeline
A data factory might have one or more pipelines. A pipeline is a logical grouping of activities that performs a unit of work. Together, the activities in a pipeline perform a task. For example, a pipeline can contain a group of activities that ingests data from an Azure blob, and then runs a Hive query on an HDInsight cluster to partition the data.
The benefit is that you can manage the activities as a set instead of managing each one individually. The activities in a pipeline can be chained together to operate sequentially, or they can operate independently in parallel.
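The chaining described above can be sketched as a pipeline definition. The snippet below is a trimmed, illustrative sketch written as a Python dictionary in the general shape of an ADF pipeline JSON document: a real definition also needs inputs, outputs, and type-specific properties for each activity. The pipeline and activity names are hypothetical, and the `execution_waves` helper is not part of ADF; it only shows which activities could run in parallel and which must wait for a dependency.

```python
import json


def build_pipeline():
    """Return a trimmed pipeline definition as a plain dict (names are hypothetical)."""
    return {
        "name": "IngestAndPartitionPipeline",
        "properties": {
            "activities": [
                # No dependencies: can start immediately.
                {"name": "CopyFromBlob", "type": "Copy", "dependsOn": []},
                # Chained: runs only after CopyFromBlob succeeds.
                {
                    "name": "PartitionWithHive",
                    "type": "HDInsightHive",
                    "dependsOn": [
                        {
                            "activity": "CopyFromBlob",
                            "dependencyConditions": ["Succeeded"],
                        }
                    ],
                },
                # Independent of the other two: can run in parallel.
                {"name": "CopyAuditLog", "type": "Copy", "dependsOn": []},
            ]
        },
    }


def execution_waves(pipeline):
    """Group activity names into waves; each wave has no unmet dependencies."""
    remaining = {
        activity["name"]: {dep["activity"] for dep in activity["dependsOn"]}
        for activity in pipeline["properties"]["activities"]
    }
    done = set()
    waves = []
    while remaining:
        ready = sorted(name for name, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("circular dependency between activities")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return waves


pipeline = build_pipeline()
waves = execution_waves(pipeline)
print(json.dumps(waves))
# prints [["CopyAuditLog", "CopyFromBlob"], ["PartitionWithHive"]]
```

The first wave holds the two independent copy activities; the Hive step lands in the second wave because it depends on the blob copy.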
Mapping data flows
Mapping data flows let you create and manage graphs of data transformation logic that you can use to transform data of any size. You can build up a reusable library of data transformation routines and execute those processes in a scaled-out manner from your ADF pipelines. Data Factory executes your logic on a Spark cluster that spins up and spins down when you need it, so you never have to manage or maintain clusters.
Activity
An activity represents a processing step in a pipeline. For example, you might use a copy activity to copy data from one data store to another data store. Similarly, you might use a Hive activity, which runs a Hive query on an Azure HDInsight cluster, to transform or analyze your data. Data Factory supports three types of activities: data movement activities, data transformation activities, and control activities.
Datasets
Datasets represent data structures within the data stores, which simply point to or reference the data you want to use in your activities as inputs or outputs.
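As a rough illustration of how a dataset only points at data, the sketch below models a dataset and the linked service it references as Python dictionaries, in the general shape of the JSON definitions ADF uses. All names, the container, the file name, and the connection string placeholder are hypothetical, and `resolve_linked_service` is an illustrative helper, not an ADF API.

```python
# Hypothetical linked service: holds the connection information for a store.
linked_service = {
    "name": "ExampleBlobStorage",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {"connectionString": "<connection string placeholder>"},
    },
}

# Hypothetical dataset: describes the shape and location of the data,
# and refers to the linked service by name instead of holding credentials.
dataset = {
    "name": "ExampleSalesCsv",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "ExampleBlobStorage",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "raw",
                "fileName": "sales.csv",
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": True,
        },
    },
}


def resolve_linked_service(dataset, linked_services):
    """Find the linked service a dataset refers to (illustrative helper)."""
    wanted = dataset["properties"]["linkedServiceName"]["referenceName"]
    for service in linked_services:
        if service["name"] == wanted:
            return service
    raise KeyError(f"no linked service named {wanted!r}")


resolved = resolve_linked_service(dataset, [linked_service])
print(resolved["properties"]["type"])
```

Keeping connection details in the linked service means several datasets can share one connection, and an activity only has to name the dataset it reads or writes.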
ADF (Azure Data Factory) Online Training Course Content:
Introduction to Microsoft Azure
- What is Data?
- Types of databases (OLTP and OLAP)
- Types of Cloud Vendors
- What is on-premises and what is cloud
- Differences between on-premises and cloud
- What is Cloud computing
- What services are offered by Microsoft Azure
- Advantages of cloud computing
- Types of Cloud Deployments
- Types of Cloud Services
- Microsoft Azure vs AWS
- How to Create Azure Account
Azure Architecture & Services
- How to Create Resources and Resource Groups in Microsoft Azure
- How to create Virtual Machine
- How to create SQL in virtual machines
Storage Accounts In Azure
- Containers/Blob Storage
- File
- Queues
- Tables
Data Lake
- Blob Storage vs Data Lake
- What is Data Lake
- Data Lake Gen1 vs Gen2
- How to store data in a data lake
- How to Query data from data lake to SQL Server.
- Introduction to Azure Data Lake U-SQL Batch Job
- Data Lake Analytics
- How to Schedule Data Lake Analytics Jobs using Data Factory
- Azure Blob to Data Lake Gen 2 using Data Factory
- What is HDInsight
SQL services In Azure
- Azure SQL
- Dedicated SQL Pool (formerly known as DW)
- SQL VMs
- SQL Managed Instances
- Azure Synapse Warehouse
- Security in Storage Account
- Access keys
- Connection strings
- IAM/RBAC
- Shared Access Signatures (SAS Tokens)
- Service Principal Identity Mechanism (Tenant ID, Client ID, Secret Key)
What Is Azure Data Warehouse
- Compare SQL Database vs Azure Data Warehouse
- How to Create Azure Data Warehouse
- How to cross query data in Data Warehouse
- PolyBase using Azure Data Warehouse
- How to Query Azure SQL Data Warehouse from On-premises SSMS and Cloud
- How to Load Data to Azure Data Warehouse from Azure SQL using Data Factory
- How to Load Data from on-premises to Cloud SQL DW using Data Factory
Azure Data Factory
- What is Azure Data Factory vs SSIS
- What are Linked Services
- What are Datasets
- What are Pipelines
- Parameters vs Variables
- Copy Data
- Monitoring pipelines using different approaches
- Different kinds of integration runtimes
- How to create pipelines from a template
- How to do Transformations using Data Flows
- How to configure different Integration Runtimes
- Azure Integration runtime
- AutoResolve integration runtime
- Self-hosted integration runtime
- Azure-SSIS integration runtime
- Triggering the Pipelines by using 3 types of Triggers.
- Move & transform
- Copy Data
Logic Apps
- What are Logic Apps
- How to Create Workflow using Logic Apps in Azure.
- How to Send Mail using a Logic app in Azure Data Factory
- REST APIs
- What are REST APIs and how to transfer data from JSON format into Azure SQL tables
Azure Event Hub
- How to create Event Hubs, use Stream Analytics to get the streaming data, and send it to Blob storage using Logic Apps
Databricks
- What is Databricks
- How to Create a Databricks Free Account
- What is a Workspace in Databricks
- Different Kinds of Clusters in Databricks
- Basics of Spark.
- What are Notebooks
- How to Mount Data Lake to Databricks using Scala
- Reading data from Blob storage and writing into Azure SQL
- Reading data from Data Lake storage and writing into Azure SQL
- Using Spark SQL
- Explore, Analyze, Clean, Transform and Load Data in Databricks using SQL
- Python vs Scala vs SQL
- Configure Databricks
- Schedule and run notebooks in Azure Data Factory with real-time notebooks
Security
- Access Keys
- Shared Access Signature
- Azure Active Directory
- How to register App in Azure Active Directory
- Role Based Access Control
- Azure Key Vaults
CI/CD Deployments
- Configuring the folder structures
- Creating Repos
- Creating Releases
- Creating Release pipelines
- Deployments from one environment to another environment.
End To End Real Time Project
- Explanation:
- Covers the end-to-end flow of a project
- How requirements are gathered
- What kinds of requirements to expect
- Documentation
- End to end implementation
- Unit test cases preparation
- DMR and FS Documents
- Mock-up sessions will be conducted
Power BI
- What is Power BI
- How to create reports in Power BI Desktop using Azure data sources
- Publish Reports into the Power BI cloud