1
00:00:00,066 --> 00:00:02,300
Hello and welcome to this tutorial.

2
00:00:02,300 --> 00:00:05,900
Okay, we are almost ready to begin our journey of data pre-processing,

3
00:00:06,100 --> 00:00:07,800
but first we need to get the data set.

4
00:00:07,800 --> 00:00:10,133
So that's what we are going to do in this tutorial.

5
00:00:10,133 --> 00:00:11,700
So let's do this right now.

6
00:00:11,700 --> 00:00:16,533
And here it is. I'm just going to put the column titles in bold. Okay.

7
00:00:16,966 --> 00:00:18,300
That's the data set.

8
00:00:18,300 --> 00:00:20,100
So what is this data set about?

9
00:00:20,100 --> 00:00:23,233
The data set contains four columns: country, age,

10
00:00:23,233 --> 00:00:26,800
salary and purchased, and ten lines, so ten observations.

11
00:00:27,100 --> 00:00:31,600
And basically this contains information about customers of some company.

12
00:00:32,100 --> 00:00:34,700
And the first three columns are information

13
00:00:34,700 --> 00:00:37,800
about these customers: the country, the age and the salary.

14
00:00:38,266 --> 00:00:41,100
And the fourth column, purchased, tells us

15
00:00:41,100 --> 00:00:44,766
whether or not the customer bought the company's product.

16
00:00:45,400 --> 00:00:49,266
So we have to distinguish something very important here

17
00:00:49,466 --> 00:00:52,466
that we will distinguish for the rest of the course. It's

18
00:00:52,466 --> 00:00:57,000
the difference between the independent variables and the dependent variables.

19
00:00:57,500 --> 00:01:02,133
So the independent variables are the first three columns: country, age and salary.

20
00:01:02,533 --> 00:01:06,000
And the dependent variable is purchased, the fourth column.

21
00:01:06,533 --> 00:01:10,066
And in any machine learning model we are going to use

22
00:01:10,066 --> 00:01:13,966
some independent variables to predict a dependent variable.
23
00:01:14,400 --> 00:01:17,400
So that means here that with the first three columns,

24
00:01:17,400 --> 00:01:20,400
the three independent variables, we are going to predict

25
00:01:20,600 --> 00:01:25,200
whether or not the customer purchased the product. Okay.

26
00:01:25,200 --> 00:01:28,600
So that's the first distinction that we really need to understand.

27
00:01:29,000 --> 00:01:32,933
And it's very important to do this section because the data pre-processing steps

28
00:01:32,933 --> 00:01:34,533
that we're going to do in this section,

29
00:01:34,533 --> 00:01:38,933
we will have to do them for all the machine learning models we are going to make.

30
00:01:39,200 --> 00:01:42,166
So it's really essential to know how to manage this.

31
00:01:42,166 --> 00:01:44,100
But don't worry, it's going to be very simple.

32
00:01:44,100 --> 00:01:46,233
And besides, I'm going to give you at the end of this section

33
00:01:46,233 --> 00:01:49,866
a template that will allow us later to preprocess the data

34
00:01:49,866 --> 00:01:53,100
in a flash for all the machine learning models we're going to make.

35
00:01:53,633 --> 00:01:55,700
So I look forward to starting the steps with you.

36
00:01:55,700 --> 00:01:57,466
And until then, enjoy machine learning.
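
The split between independent and dependent variables described in this tutorial could be sketched roughly as follows. This is only an illustration, not the course's template: it assumes the data set is a CSV file with the four columns named in the tutorial, the three sample rows are invented (the transcript never shows the actual values), and `split_features_target` is a hypothetical helper name. It uses only Python's standard `csv` module, so every value stays a string.

```python
import csv
import io

# Invented sample rows for illustration only; the real data set has ten
# observations whose values are not given in the transcript.
SAMPLE_CSV = """country,age,salary,purchased
France,35,50000,No
Spain,28,42000,Yes
Germany,41,61000,No
"""


def split_features_target(csv_text, target="purchased"):
    """Split CSV text into independent variables (X) and the dependent variable (y).

    Hypothetical helper: every column except `target` is treated as an
    independent variable; `target` is the dependent variable.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # X: one list per observation, holding every column except the target.
    X = [[value for name, value in row.items() if name != target] for row in rows]
    # y: the target column, one value per observation.
    y = [row[target] for row in rows]
    return X, y


X, y = split_features_target(SAMPLE_CSV)
print(X)  # independent variables: country, age, salary
print(y)  # dependent variable: purchased
```

Here `X` holds the first three columns for each customer and `y` holds the purchased column, which is the value a model would later try to predict from `X`.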