1 00:00:01,770 --> 00:00:05,700 The next thing we are going to do is known as variable transformation. 2 00:00:07,260 --> 00:00:12,420 You may have noticed that we have four different distance variables representing the same information. 3 00:00:12,690 --> 00:00:20,130 That is, we just want to capture the effect of having employment hubs near this town. 4 00:00:20,760 --> 00:00:24,420 And for that, we have four different variables. 5 00:00:25,770 --> 00:00:27,940 Together, they really represent a single feature. 6 00:00:29,970 --> 00:00:34,140 So we'll be transforming these four different variables into one single variable. 7 00:00:34,830 --> 00:00:36,360 We can do it in multiple ways. 8 00:00:36,780 --> 00:00:43,410 One of the ways is to use the maximum of all these four, or we can take the minimum of all these four, 9 00:00:44,370 --> 00:00:46,200 or we can take the average of these four. 10 00:00:47,940 --> 00:00:50,790 Usually we have to do whatever makes more business sense. 11 00:00:51,780 --> 00:00:58,660 Here we will be taking the average of these four distances to get one average distance value, which 12 00:00:58,660 --> 00:01:03,910 we will use to represent the feature of employment activity near that particular town. 13 00:01:05,490 --> 00:01:10,120 So to create that variable, we will write df$avg_dist. 14 00:01:15,490 --> 00:01:16,470 This variable will get 15 00:01:19,060 --> 00:01:21,140 the sum of those four divided by four. 16 00:01:21,730 --> 00:01:26,460 So that is df$dist1 plus 17 00:01:26,510 --> 00:01:27,230 df$dist2, 18 00:01:32,090 --> 00:01:32,670 df$dist3, 19 00:01:36,970 --> 00:01:37,660 and df$dist4, 20 00:01:42,910 --> 00:01:43,970 divided by four. 21 00:01:46,230 --> 00:01:47,460 Let's create this variable. 22 00:01:48,180 --> 00:01:51,100 Now we'll click on the df variable to open it. 23 00:01:52,440 --> 00:01:55,490 You can see in the end 24 00:01:55,560 --> 00:01:56,820 we have another column. 
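The step dictated above can be sketched in R roughly as follows. This is a minimal illustration, not the course's actual script: the small data frame and its values are made up, and the column names dist1 to dist4 and avg_dist are assumed from the narration.

```r
# Toy stand-in for the course dataset; the values are illustrative only.
df <- data.frame(
  price = c(24.0, 21.6),
  dist1 = c(4.35, 4.99),
  dist2 = c(3.81, 4.70),
  dist3 = c(4.18, 5.12),
  dist4 = c(4.01, 5.06)
)

# Average of the four distance columns, stored as a new column at the end.
df$avg_dist <- (df$dist1 + df$dist2 + df$dist3 + df$dist4) / 4

# An equivalent form that scales better to many columns:
# df$avg_dist <- rowMeans(df[, c("dist1", "dist2", "dist3", "dist4")])

print(df)
```

The new column is appended as the last column of the data frame, which is why it shows up at the end when the dataset is opened in the viewer.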
25 00:01:57,660 --> 00:02:03,870 You can see that the 4.08 value is actually the average of those four values in 26 00:02:03,870 --> 00:02:04,090 dist1, 27 00:02:04,090 --> 00:02:05,850 dist2, dist3 and dist4 individually. 28 00:02:08,630 --> 00:02:14,690 Now that we have this variable created, we have to delete the other four base variables from our 29 00:02:14,690 --> 00:02:15,170 dataset. 30 00:02:17,540 --> 00:02:25,160 To delete the four variables, we need to get their position in the dataset, so we need to find out 31 00:02:25,160 --> 00:02:31,460 the location of the four distance variables that we want to remove. To find out the location of a variable, 32 00:02:32,840 --> 00:02:38,810 if you hover over the name of that column, you can see that this is column number six. 33 00:02:40,280 --> 00:02:42,290 This is column seven, eight, nine. 34 00:02:42,620 --> 00:02:45,830 We want to remove columns six to nine. 35 00:02:47,660 --> 00:02:53,930 While we are removing columns, we can create a new dataset just to ensure that when we delete the 36 00:02:53,930 --> 00:02:58,910 variables, we do not accidentally delete variables that we do not want to delete. 37 00:03:00,140 --> 00:03:03,740 So let us create a new dataset, which is df2. 38 00:03:08,640 --> 00:03:11,210 It will get its values from df. 39 00:03:11,490 --> 00:03:12,530 So it is df. 40 00:03:13,380 --> 00:03:14,760 We'll start with square brackets. 41 00:03:15,540 --> 00:03:17,790 The first parameter is for the rows. 42 00:03:18,030 --> 00:03:20,700 We want all the rows, so we will not mention anything. 43 00:03:20,830 --> 00:03:22,080 We will straightaway put a comma. 44 00:03:22,590 --> 00:03:24,060 The second parameter is the columns 45 00:03:24,090 --> 00:03:28,450 we want to include. We do not want to include columns six to nine, 46 00:03:28,560 --> 00:03:29,910 so we'll use a minus sign. 47 00:03:30,590 --> 00:03:33,150 So it is minus six to minus nine. 
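The indexing described above can be sketched in R as follows. The data frame here is a made-up placeholder with 11 columns in which columns 6 to 9 stand in for the four distance variables; only the `df[, -6:-9]` expression comes from the narration.

```r
# Placeholder data frame: 2 rows, 11 columns (V1 .. V11).
df <- as.data.frame(matrix(1:22, nrow = 2))

# Columns 6 to 9 play the role of the four distance variables.
names(df)[6:9] <- c("dist1", "dist2", "dist3", "dist4")

# Empty row index keeps every row; negative column indices drop those columns.
# -6:-9 expands to c(-6, -7, -8, -9), so it is the same as df[, -(6:9)].
df2 <- df[, -6:-9]

print(names(df2))
```

Writing the result to a separate object such as df2 leaves df untouched, so a wrong index can be corrected before the original data is overwritten.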
48 00:03:35,330 --> 00:03:36,030 Let's run this. 49 00:03:36,560 --> 00:03:38,000 We have created df2 now. 50 00:03:38,730 --> 00:03:39,860 Just click on df2. 51 00:03:43,880 --> 00:03:49,540 You can see that dist1, dist2, dist3 and dist4, all these four variables, are now deleted. 52 00:03:49,880 --> 00:03:55,070 Since we deleted the correct variables and df2 is the dataset which we want to use, 53 00:03:55,400 --> 00:04:01,340 we will assign df2 to the first one, that is df, so df is equal to df2. 54 00:04:07,470 --> 00:04:11,100 Now you can see that df has 16 variables instead of 20. 55 00:04:12,660 --> 00:04:18,330 We can remove df2 by writing rm, and within brackets we will write df2. 56 00:04:22,940 --> 00:04:24,420 So df2 is removed. 57 00:04:25,500 --> 00:04:31,020 And we have the df variable, which has 16 variables. 58 00:04:31,410 --> 00:04:37,410 Now, if you remember the three points that we noted, one point was to remove the bus terminal 59 00:04:37,410 --> 00:04:37,890 variable. 60 00:04:40,500 --> 00:04:44,500 To remove that variable, we will again go to the dataset 61 00:04:48,490 --> 00:04:57,100 and find out the location of the bus terminal variable. If I hover over the heading, it tells me that it is 62 00:04:57,100 --> 00:04:58,090 column number 13. 63 00:04:58,720 --> 00:05:03,880 So I will go and write df gets 64 00:05:07,260 --> 00:05:12,300 df, square brackets, comma, minus 13. 65 00:05:14,840 --> 00:05:26,900 And with this, you can see now we have 15 variables. Just to confirm, I'll click again and check whether the bus terminal variable is 66 00:05:26,900 --> 00:05:28,490 removed, and it is. 67 00:05:30,790 --> 00:05:37,100 So this is how we remove the variables that we do not want in a dataset. 68 00:05:40,300 --> 00:05:47,480 Let's go back to the three points that we noted in the summary of the EDD, or univariate analysis. 
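The clean-up sequence narrated above (reassign, remove the temporary object, drop one more column) can be sketched in R like this. The 20-column data frame is a made-up placeholder, and the column name bus_ter is an assumed name for the bus terminal variable; the column positions 6 to 9 and 13 follow the narration.

```r
# Placeholder data frame: 2 rows, 20 columns (V1 .. V20).
df <- as.data.frame(matrix(1:40, nrow = 2))
names(df)[6:9] <- c("dist1", "dist2", "dist3", "dist4")

# Drop the four distance columns into a temporary copy (16 columns remain).
df2 <- df[, -6:-9]

# Assumed name: mark column 13 of the reduced data as the bus terminal variable.
names(df2)[13] <- "bus_ter"

# The copy looks right, so overwrite df and remove the temporary object.
df <- df2
rm(df2)

# Drop the bus terminal column by position (15 columns remain).
df <- df[, -13]

print(ncol(df))
```

Dropping by position depends on the current column order, so the position has to be checked again after every deletion. Selecting by name, for example `df[, names(df) != "bus_ter"]`, is a common alternative that avoids this.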
69 00:05:52,560 --> 00:05:59,570 So we have handled the outliers by capping them in rainfall and n_hot_rooms. 70 00:06:01,370 --> 00:06:07,100 We have handled the missing values by replacing them with the mean value in n_hos_beds. 71 00:06:08,150 --> 00:06:10,520 And we have deleted the variable called bus_ter. 72 00:06:11,330 --> 00:06:15,620 And we have also transformed the four distance variables into 73 00:06:15,860 --> 00:06:23,210 one common average distance variable and removed those other four distance variables. In the coming 74 00:06:23,210 --> 00:06:23,620 videos, 75 00:06:23,660 --> 00:06:26,420 we will learn how to handle categorical variables.