1 00:00:01,010 --> 00:00:02,270 Now, let's proceed further.
2 00:00:04,290 --> 00:00:10,220 First, let's create a new variable with the name data and store our data frame in it.
3 00:00:12,030 --> 00:00:19,020 And then let's create a dummy variable. A dummy variable is a numerical variable that represents categorical
4 00:00:19,020 --> 00:00:22,080 data such as gender, race, etc.
5 00:00:24,020 --> 00:00:31,900 Dummy variables are quantitative variables that can take only two quantitative values, such as one and
6 00:00:31,910 --> 00:00:32,300 zero.
7 00:00:33,320 --> 00:00:38,450 One means presence of that attribute and zero means absence of it.
8 00:00:40,890 --> 00:00:44,650 Here we are creating the dummy variables for the column named pay_schedule.
9 00:00:46,750 --> 00:00:52,290 After creating the dummy variables, we are removing the bi-weekly label from them.
10 00:00:53,460 --> 00:01:03,250 Let's run the cell and I will show you what it is. Let's print the values first.
11 00:01:05,710 --> 00:01:06,490 Let's create it.
12 00:01:07,760 --> 00:01:09,200 Press Shift+Enter to run the cell.
13 00:01:10,670 --> 00:01:11,900 Let's see this.
14 00:01:14,930 --> 00:01:16,750 Let's see this dummy variable.
15 00:01:23,680 --> 00:01:30,740 So pay_schedule was one column, and it converted that one column into four dummy variables. Bi-weekly
16 00:01:30,760 --> 00:01:31,510 one means
17 00:01:32,610 --> 00:01:36,400 presence of bi-weekly, with monthly, semi-monthly and weekly zero.
18 00:01:37,110 --> 00:01:38,340 So here in the second row,
19 00:01:38,370 --> 00:01:41,910 we have weekly one, which means it was a weekly payment
20 00:01:43,360 --> 00:01:46,550 schedule, and all the others are zero.
21 00:01:47,930 --> 00:01:51,080 Now, from this, let's drop bi-weekly.
22 00:01:52,600 --> 00:01:59,410 So dummy is equal to dummy dot drop, labels is equal to bi-weekly, and axis
23 00:01:59,860 --> 00:02:01,630 is equal to one. Press Shift+Enter to run the cell.
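The dummy-variable step above can be sketched as follows. This is a minimal, self-contained example, not the course notebook: the column name pay_schedule and its four category labels are assumed from the narration.

```python
import pandas as pd

# Small stand-in dataset; the course data has a "pay_schedule" column
# with four categories (names assumed from the narration).
data = pd.DataFrame({
    "pay_schedule": ["bi-weekly", "weekly", "monthly", "semi-monthly", "weekly"]
})

# One 0/1 column per category: 1 marks presence of the attribute, 0 absence.
dummy = pd.get_dummies(data["pay_schedule"])
print(dummy)

# Drop one dummy column to avoid redundancy: if the remaining three
# are all zero, the row must have been bi-weekly.
dummy = dummy.drop(labels="bi-weekly", axis=1)
print(dummy.columns.tolist())
```

Dropping one category like this is the usual guard against the "dummy variable trap", where the full set of dummies is perfectly collinear.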
24 00:02:02,920 --> 00:02:06,800 Now, let's drop this pay_schedule column from the data frame.
25 00:02:08,960 --> 00:02:09,190 Let's run it.
26 00:02:11,060 --> 00:02:19,430 So as we have removed the pay_schedule column, instead of it, let's add these dummy columns to our
27 00:02:19,550 --> 00:02:20,150 dataset.
28 00:02:21,880 --> 00:02:29,170 So data is equal to pd dot concat of data and dummy, and axis is equal to one.
29 00:02:30,580 --> 00:02:31,510 Let's run this cell.
30 00:02:33,200 --> 00:02:36,010 Now, let's check the shape of the data frame.
31 00:02:38,410 --> 00:02:39,370 So here we have
32 00:02:41,020 --> 00:02:45,880 seventeen thousand nine hundred eight rows and 23 columns.
33 00:02:47,160 --> 00:02:49,920 Now let's update our features and labels.
34 00:02:51,710 --> 00:02:58,080 So e_signed is our label, so let's save the e_signed column in response.
35 00:02:58,670 --> 00:03:01,670 So from the data frame we are taking
36 00:03:02,780 --> 00:03:06,380 the column named e_signed, and we are storing it in response.
37 00:03:07,840 --> 00:03:10,630 So response contains only our labels.
38 00:03:11,810 --> 00:03:13,640 So now we have to remove
39 00:03:15,050 --> 00:03:16,550 the e_signed column from the data frame.
40 00:03:19,360 --> 00:03:29,260 So data dot drop, columns is equal to e_signed comma entry_id, as we also need to remove this entry_id
41 00:03:29,720 --> 00:03:31,280 column from the dataset.
42 00:03:33,050 --> 00:03:36,380 Let's run this cell. Press Shift+Enter to run the cell.
43 00:03:37,920 --> 00:03:44,430 Now we need to do data transformation, that is, standard scaling. We use StandardScaler to transform
44 00:03:44,430 --> 00:03:52,560 our data onto the same scale. It standardizes features by removing the mean and scaling to unit variance.
45 00:03:54,120 --> 00:03:59,700 So the standard score of a sample x is calculated with the formula
46 00:04:01,530 --> 00:04:02,940 z is equal to
47 00:04:04,310 --> 00:04:08,780 x minus u, divided by s.
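The drop, concat, and label-extraction steps described above can be sketched on a toy frame. The column names pay_schedule, e_signed, and entry_id are assumptions reconstructed from the narration, and the three rows here are made up for illustration.

```python
import pandas as pd

# Toy frame standing in for the loan dataset (column names assumed:
# "pay_schedule" is categorical, "e_signed" the label, "entry_id" an id).
data = pd.DataFrame({
    "entry_id": [1, 2, 3],
    "pay_schedule": ["weekly", "monthly", "weekly"],
    "e_signed": [1, 0, 1],
})

dummy = pd.get_dummies(data["pay_schedule"])

# Remove the original categorical column, then attach the dummy
# columns side by side (axis=1 concatenates column-wise).
data = data.drop(columns=["pay_schedule"])
data = pd.concat([data, dummy], axis=1)

# Labels go into their own Series; features keep everything else.
response = data["e_signed"]
data = data.drop(columns=["e_signed", "entry_id"])
print(data.shape)
```

Note that data.drop returns a new frame, so the result must be assigned back (or inplace=True used) for the column to actually disappear.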
48 00:04:10,190 --> 00:04:12,800 So x here is our value,
49 00:04:14,210 --> 00:04:18,080 and u is the mean of the training samples,
50 00:04:20,850 --> 00:04:23,340 and s is the standard deviation.
51 00:04:28,010 --> 00:04:34,520 So x, our value, minus u, the mean, divided by s, the standard deviation. So this is how we
52 00:04:34,520 --> 00:04:37,520 calculate the value of the standard score.
53 00:04:40,280 --> 00:04:41,130 Let's do this.
54 00:04:43,900 --> 00:04:44,890 Let's run the cell.
55 00:04:48,400 --> 00:04:51,060 Now, let's split our data into training and test sets.
56 00:04:53,420 --> 00:04:54,730 So to split our data,
57 00:04:58,080 --> 00:05:05,250 we use the scikit-learn library: from sklearn dot model_selection, import train_test_split.
58 00:05:07,900 --> 00:05:16,450 So here X is our dataset and y is the response, which is the labels. Features is equal to dataset, labels is
59 00:05:16,450 --> 00:05:22,540 equal to response, and test_size is equal to zero point two, which means a 20 percent test set and
60 00:05:22,540 --> 00:05:26,860 an 80 percent training set, and random_state is equal to zero.
61 00:05:28,150 --> 00:05:31,910 So here we are creating X_train and X_test with train_test_split.
62 00:05:32,380 --> 00:05:38,790 So these are the training features and training labels, which are X_train and y_train, and X_test contains the
63 00:05:38,860 --> 00:05:40,590 testing features and y_test contains the
64 00:05:41,710 --> 00:05:42,670 testing labels.
65 00:05:43,820 --> 00:05:44,690 Let's run this cell.
66 00:05:46,760 --> 00:05:53,600 So here we have initialized our StandardScaler and stored it in sc_X. Now we need
67 00:05:53,600 --> 00:05:54,950 to fit the standard scaler.
68 00:05:56,770 --> 00:05:58,600 So X_train is equal to
69 00:06:00,450 --> 00:06:03,570 sc_X, which is the standard scaler,
70 00:06:04,590 --> 00:06:13,020 dot fit_transform of X_train. So we are transforming our X_train and X_test with the standard scaler.
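The standard-score formula described above, z = (x - u) / s, can be checked by hand with NumPy. The small matrix here is made up for illustration; it is not the course data.

```python
import numpy as np

# Verify the standard-score formula by hand: z = (x - u) / s,
# where u is the per-column mean and s the per-column standard
# deviation of the training samples.
X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])

u = X_train.mean(axis=0)   # mean of each feature column
s = X_train.std(axis=0)    # standard deviation of each column
z = (X_train - u) / s

# After scaling, every column has mean 0 and unit variance.
print(z.mean(axis=0), z.std(axis=0))
```

This is exactly what StandardScaler does internally, with the means and deviations learned from the training set only.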
71 00:06:14,330 --> 00:06:15,260 Let's run the cell.
72 00:06:17,960 --> 00:06:19,510 Let's check the shape of the training set.
73 00:06:24,060 --> 00:06:28,710 So here we have fourteen thousand three hundred twenty-six rows and 21 columns in it.
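The split-and-scale sequence from the lesson can be sketched end to end with synthetic data. The shapes here are illustrative stand-ins (the course training set has 14,326 rows and 21 columns); the variable names dataset, response, and sc_X follow the narration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features and binary labels standing in for the course data.
rng = np.random.default_rng(0)
dataset = rng.normal(size=(100, 4))
response = rng.integers(0, 2, size=100)

# 80/20 split; random_state=0 makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    dataset, response, test_size=0.2, random_state=0
)

# Fit the scaler on the training features only, then reuse the same
# learned mean/std (z = (x - u) / s) on the test features.
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
print(X_train.shape, X_test.shape)
```

Calling fit_transform on the training set and plain transform on the test set keeps test information from leaking into the scaling statistics.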