Azure Data factory Training | Microsoft Azure Data Factory Course | Intellipaat

Hey guys, welcome to this session by Intellipaat. Today we’re going to learn what Data Factory is and why we need to use it. After learning about Data Factory, you will learn its working process. Once you are clear on the working process, you will be introduced to Data Lake. After that, you will learn how to copy data from Azure SQL to Azure Data Lake. After the copying process comes the visualization part: how to load data from Data Lake into Power BI. Finally, you will learn how to create an ETL process by creating a pipeline in Data Factory. By the end of this video you will be able to use Data Factory to automate data movement and data transformation.

Now let’s see why we need to use Data Factory. These days the amount of data generated is
really huge, and this data comes from different sources. Let’s look at an example for better understanding. We have multiple social media platforms, and we need to collect the time spent by each user on all the platforms per day. The extracted user details will be in JSON format, which means JavaScript Object Notation. If you want to perform some analytics on the time each user spends on each social media platform every day, you will be collecting the customers’ log files, which are flat files, that is, text files. Now, the data which has been extracted is raw data and cannot be used as it is. So it needs to be cleansed, mapped and then
transformed as per our requirements, and then it is loaded into the data warehouse. The storage is then connected to an external tool like Power BI to perform some analytics. To automate this whole workflow, and to monitor and manage it on a daily schedule, we use Data Factory. The complete process of extracting, transforming and loading the data into the warehouse is called extract, transform and load (ETL). The source keeps on changing, right? So this is a regular process in which data is regularly fetched from social media and loaded into the warehouse after the transformations are complete. So, for automating the movement and transformation of this data, we need Data Factory.

Let’s see what Data Factory is. Data Factory is a cloud-based integration service that orchestrates and automates the movement and transformation of data. It stores the data with the help of data storage and many other services, then it transforms the data with the help of a pipeline, which is a logical grouping of activities that together perform a task, and then it publishes the organized data to third-party applications like Power BI for analytics and visualization. I hope you are all clear on what Data Factory is.

Now let’s move on to the working process of
Data Factory. Input data is the data that we have within our data store. Let’s say we’re collecting a dataset from the IMDB website to perform visualization based on movie ratings, and this is the sample data which we want to transform. This is a very basic example which I’m showing just for understanding purposes. In practice the source keeps on changing. It is a regular process where a really huge amount of data is fetched continuously, and we need to process this raw data through a pipeline. A pipeline basically performs an operation on the data, which could be anything like data movement or data transformation.

Now let us see how it works. I want the list of movies to be reviewed on the basis of rating, whether each one is worst, average or good, and this transformation is performed in the pipeline. Data transformation is possible with the help of U-SQL, stored procedures or Hive. After this is done, you will get output data, and this output data will be in structured form because it has already been transformed and structured in the pipeline stage itself. And now here is the list of movies which are
transformed based on the rating which we have provided in the raw data. Then it is sent to linked services like Data Lake, Blob Storage or SQL, which store the information. Then it connects to an external tool like Power BI. Now we’re publishing our output data, with the list of movies reviewed, to Power BI for performing visualization.

Linked services are similar to a concept from SQL Server. In SQL Server, we know how to mention the source and destination of data. We need a connection string so that we can connect to an external source. We just need to mention the source and destination of our data, and this is how linked services work. Then comes the optional gateway. If you want to connect your on-premises system, you need to install the self-hosted integration runtime so that you can connect it to the Azure cloud. Finally, we need a BI tool. We visualize and analyze the transformed, published data with the help of Power BI or Tableau.

Input datasets comprise various sources. You can have data coming from Facebook, Twitter and so on. Collect all this data and store it at one location. Let’s say we are storing the data in a data warehouse. We need to transform this data through a pipeline, as the data coming from different sources is raw data. We need a NoSQL data store with a huge amount of storage. For this, Azure provides Data Lake Storage. I hope you are all clear on the working process of Data Factory.

Now let’s see how to copy data from Azure SQL to Azure Data Lake. We are all aware of SQL, right? But we
are new to Data Lake. A data lake is quite different from a data warehouse, since it allows data to be stored in raw format without converting and analyzing it first. Data Lake is a data store, or file system, which is highly scalable and distributed. It is in the cloud and works with multiple analytics frameworks, including external tools like Power BI and Tableau.

There are two main concepts when it comes to Data Lake: one is storage and the other is analytics. Storage is of unlimited size; it can be gigabytes, terabytes or petabytes. It stores a wide variety of data, which could be structured or unstructured, and it can store really large files. The other concept when it comes to Data Lake is analytics. Data Lake Analytics is a distributed analytics service. When we talk about analytics, we’re talking about big data as a service. We will discuss this topic in a further lecture. I hope you are all clear on Data Lake.

Now,
let’s move on to the copying process. We need to create a new data warehouse in Azure, and we need to install SQL Server Management Studio (SSMS) on our system to connect to an Azure SQL database. Then create a new server for the data warehouse, configure it and deploy it. After the successful deployment, open SSMS and connect to the server created for the data warehouse using SQL authentication.

Now let’s perform a hands-on for the operation which we have discussed. Go to the Azure portal and log in with
your credentials. Go to the dashboard and create a new resource, select Databases and then create a new SQL Data Warehouse. As we do not have any resource group yet, create a new resource group and then create the new data warehouse. Now create a new server. Configure the performance and then click Create. Once the deployment is done, go to the resource and, under the data warehouse which we have created, copy its server name and connect to this server using SQL Server Management Studio. Under the data warehouse which we have created in Azure, create a new table. Now execute these commands.

Next, create a new Data Lake Storage and then create a new data factory. Mention the source data store and create a linked service for it, and then mention the destination data store and create a linked service for it. Now let’s perform a hands-on for the operation which
we have discussed now. Go to the Azure dashboard, click on All services, click on Storage and select Data Lake Storage. Create a new Data Lake Storage and fill in the details. We have already created a resource group before, right? Select it and then click Create. Once the deployment of Data Lake is successful, create a new data factory. Go to All services, select Data Factory and create a new data factory. Fill in the details, use the existing resource group and use the latest version, i.e. version 2. I’m not enabling it for this.

After the deployment of Data Factory, click on Go to resource and then click on Author & Monitor. Now select Copy Data. Assign a task name for this and run it. Now select the source data store, which is the data warehouse, and create a new linked service for it. If you want to connect your on-premises system, you need to install the self-hosted integration runtime. As we are not using on-premises and are using a cloud data warehouse only, we don’t need to install it. We have already created our server. Test the connection. The connection is successful, so click on Next. Now select the database which
we have created, and this is the dataset which we have created. You can see the dataset here. So click on Next and then mention the destination data store, which is our Data Lake, and create a new linked service for it. Mention the name of the linked service and then choose the subscription, which is Free Trial. Now choose the Data Lake Store account name. Coming to the authentication type, choose service principal, and now I’ll show you how to generate the service principal ID.

Come back to the Azure dashboard, select Azure Active Directory and then select App registrations. Create a new web application here. Let me name it sample application and then register it. This application ID is my service principal ID, so copy it to the clipboard and paste it over here. Now coming to the service principal key: go to Certificates & secrets and, under Client secrets, add a new client secret. This is the password for you; copy it and paste it over here.

Now test the connection. This connection has failed because of access privileges in the Data Lake Store. Let’s fix it. Go to All services, Storage, Data Lake Storage and then explore the data. Go to Access and then click Add. Sample application is the name of our application, right? Select it and add all the permissions. Test the connection. Here it is. It worked. Now let’s select the destination data store and click on Next. You can skip this because you will have a
default. The file format is text format, the column delimiter is comma and the row delimiter is auto-detected, so click on Next. You can skip the staging. The deployment is complete. You can edit this pipeline, and you can monitor it too.

Now let’s see whether this copying process is done. Go to All services, Storage, Data Lake Storage and explore the data, and finally our data is opening. I hope you are all clear on the copying
process.

Now let’s see how to load data from Data Lake to Power BI. Before moving on to the topic, let’s talk about Power BI. It is a cloud-based business analytics service for analyzing and visualizing data. Now let’s discuss the processes in Power BI. It is used for connecting to your data and for shaping the data, which means transforming it, like renaming columns or tables, changing text to numbers, removing rows and so on. Then comes modeling of data, which means the relationships that define how data sources are connected with each other. Then comes data visualization, which means creating charts or graphs to provide visuals of the data, and then publishing data reports.

Let’s move on to the loading process. First of all, download Power BI Desktop. Then go to the Azure dashboard, create a new Data Lake Store, upload the dataset and open Power BI. Connect to the data stored in the Data Lake Store, edit the query to import data from the binary column, select whichever columns from the dataset you want to visualize and then create the visualization.

Now go to Data Lake Storage and upload
the dataset. If you remember, we have already created the Data Lake Storage before. Explore the data now and upload the dataset. Uploading is successfully completed. Now close this, go back to the dataset properties, copy the path and open Power BI. Click on Get Data and select Data Lake Store. Connect to it and paste the path which we have copied from the Azure portal. Now you need to sign in with your Azure credentials. Load the data. If you want to edit your data, click on Edit Queries. And here is the dataset which I have uploaded. We need to visualize this data, so select whichever columns we want to visualize. These are the various kinds of graphs and charts available. You can choose any of these. I hope you are all clear on how to load data from Data Lake to Power BI.

Now let’s see how to create a pipeline using
Data Factory. Creating a pipeline is one of the steps in building an ETL solution. We have a SQL database and we’re trying to extract some data from this database. We will be extracting it, and if something has to be processed then it will be processed, and then it is stored in Data Lake.

Now let’s see the steps for building an ETL solution. First of all, we will create a linked service for the SQL Server database, and then create a linked service for the Data Lake Store. Now create a dataset for data extraction and then create a dataset for data saving. Now create the pipeline and add the copy activity. Finally, schedule the pipeline by adding a trigger.

Now let’s get to the quiz with a few questions. Your first question is: which of the following is not a linked service in Data Factory? Is it Data Lake, Power BI, Azure SQL Data Warehouse or Blob Storage? Comment your answer below in the comment box. And your second question: which of these tools is used for visualizing the data? Your options include Azure SQL DW and Tableau. Comment your answer below in the comment box.

I hope this session was informative for you all. If you have any doubts, please feel free to comment below. Thank you.
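
As a recap of the ETL steps described in the last part of the session, here is a minimal Python sketch that writes the six pieces (two linked services, two datasets, a pipeline with a copy activity, and a trigger) as plain dictionaries and checks that they reference each other consistently. The shapes loosely follow the JSON that Data Factory version 2 uses for these objects, but this is only a sketch and not a complete definition. All names such as SqlDwLinkedService and CopySqlToDataLake are hypothetical, the angle-bracket values are placeholders, and the unresolved_references helper is made up for illustration. The type names and properties should be checked against the official reference before use. Nothing here calls Azure.

```python
import json

# Every name here (SqlDwLinkedService, SourceTable, ...) is hypothetical, and
# angle-bracket values are placeholders to be filled in, not real settings.

def linked_service(name, service_type, type_properties):
    """Connection details for one data store."""
    return {"name": name,
            "properties": {"type": service_type, "typeProperties": type_properties}}

def dataset(name, dataset_type, linked_service_name, type_properties):
    """A named view of data that lives behind a linked service."""
    return {"name": name,
            "properties": {
                "type": dataset_type,
                "linkedServiceName": {"referenceName": linked_service_name,
                                      "type": "LinkedServiceReference"},
                "typeProperties": type_properties}}

# Steps 1 and 2: linked services for the source and the destination.
source_service = linked_service(
    "SqlDwLinkedService", "AzureSqlDW", {"connectionString": "<connection string>"})
sink_service = linked_service(
    "DataLakeLinkedService", "AzureDataLakeStore",
    {"dataLakeStoreUri": "<data lake store URI>",
     "servicePrincipalId": "<application ID>",
     "servicePrincipalKey": {"type": "SecureString", "value": "<client secret>"},
     "tenant": "<tenant ID>"})

# Steps 3 and 4: one dataset to extract from, one to save into.
source_dataset = dataset(
    "SourceTable", "AzureSqlDWTable", source_service["name"],
    {"tableName": "<table name>"})
sink_dataset = dataset(
    "OutputFile", "AzureDataLakeStoreFile", sink_service["name"],
    {"folderPath": "<output folder>",
     "format": {"type": "TextFormat", "columnDelimiter": ","}})

# Step 5: a pipeline whose only activity copies source to destination.
pipeline = {
    "name": "CopySqlToDataLake",
    "properties": {"activities": [{
        "name": "CopyTable",
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset["name"], "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset["name"], "type": "DatasetReference"}],
        "typeProperties": {"source": {"type": "SqlDWSource"},
                           "sink": {"type": "AzureDataLakeStoreSink"}}}]}}

# Step 6: a trigger that runs the pipeline once a day.
trigger = {
    "name": "DailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {"recurrence": {"frequency": "Day", "interval": 1}},
        "pipelines": [{"pipelineReference": {"referenceName": pipeline["name"],
                                             "type": "PipelineReference"}}]}}

def unresolved_references(services, datasets, pipeline, trigger):
    """List referenced names that no definition provides (wiring mistakes)."""
    service_names = {s["name"] for s in services}
    dataset_names = {d["name"] for d in datasets}
    missing = []
    for d in datasets:
        ref = d["properties"]["linkedServiceName"]["referenceName"]
        if ref not in service_names:
            missing.append(ref)
    for activity in pipeline["properties"]["activities"]:
        for ref in activity["inputs"] + activity["outputs"]:
            if ref["referenceName"] not in dataset_names:
                missing.append(ref["referenceName"])
    for p in trigger["properties"]["pipelines"]:
        if p["pipelineReference"]["referenceName"] != pipeline["name"]:
            missing.append(p["pipelineReference"]["referenceName"])
    return missing

missing = unresolved_references(
    [source_service, sink_service], [source_dataset, sink_dataset], pipeline, trigger)
print(json.dumps(pipeline, indent=2))
print("unresolved references:", missing)
```

The daily recurrence matches the daily schedule mentioned at the start of the session. A real trigger definition would carry more detail, such as a start time, and the copy activity would normally be created through the Copy Data tool as shown in the hands-on.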

13 thoughts on “Azure Data factory Training | Microsoft Azure Data Factory Course | Intellipaat”

  1. Following topics are covered in this video:

    0:46 – Introduction to Data Factory

    2:33 – Flow process of Data Factory

    5:12 – Data Lake

    6:08 – Copying Data from Azure SQL to Azure Data Lake

    6:30 – Hands-on for copying data

    10:52 – Load data from Data Lake to Power BI

    13:00 – Creating Pipeline using Data Factory

    13:46 – Quiz

  2. Guys, which technology you want to learn from Intellipaat? Comment down below and let us know so we can create in depth video tutorials for you.:)

  3. Q1: Which of the following is not a Linked service in Data Factory?
    A1: B

    Q2. Which of this tool is used for visualizing the data?
    A2: B
