1 00:00:00,880 --> 00:00:03,940 In this video, we will learn about variable transformation. 2 00:00:05,730 --> 00:00:13,980 As you can see in our dataset, we have four distance variables. These distance variables are giving 3 00:00:13,980 --> 00:00:19,010 us the distance of the house from four different employment hubs. 4 00:00:20,610 --> 00:00:27,390 Now, keeping four distance variables in our dataset is over-representing the importance of the employment 5 00:00:27,390 --> 00:00:27,840 variable. 6 00:00:29,290 --> 00:00:36,700 We need to transform these four variables into one single distance variable, which can represent the 7 00:00:36,700 --> 00:00:39,040 employment opportunity near that house. 8 00:00:40,710 --> 00:00:47,070 There can be several ways to transform these four into one: either we can choose the minimum of the 9 00:00:47,070 --> 00:00:54,810 distances, or we can choose the maximum of these four distances, or we can just take an average of these 10 00:00:54,810 --> 00:00:55,620 four distances. 11 00:00:57,300 --> 00:01:03,900 Usually we have to take into account the business scenario we are in to decide which transformation 12 00:01:03,900 --> 00:01:05,370 method we are going to use. 13 00:01:07,140 --> 00:01:13,620 Here, I'll use the mean method, that is, I'll replace all these four variables by the mean of these 14 00:01:13,620 --> 00:01:15,030 four variables. 15 00:01:16,990 --> 00:01:20,670 Depending on your business scenario, you have to take the decision for your problem. 16 00:01:22,510 --> 00:01:25,240 So I'll add a new variable. 17 00:01:26,580 --> 00:01:33,870 To add a new variable I need a new column, so I right-click on the column name and insert a new column, and I'm 18 00:01:33,870 --> 00:01:37,800 naming it average dist, avg_dist. 19 00:01:40,360 --> 00:01:48,460 Now, in this variable, I'll put the average value of the other four variables. So this is AVERAGE 20 00:01:50,200 --> 00:01:54,070 of these four variables, and Enter.
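The three options mentioned here (minimum, maximum, average) can be compared side by side for a single house. This is only a minimal sketch: the four distance values and the function name `combine_distances` are hypothetical and are not taken from the dataset used in the video.

```python
from statistics import mean

# Hypothetical distances from one house to four employment hubs
# (illustrative values, not from the dataset in the video).
distances = [2.0, 4.0, 6.0, 8.0]


def combine_distances(values, method="mean"):
    """Collapse several distance values into a single number."""
    if method == "min":
        return min(values)   # distance to the nearest hub
    if method == "max":
        return max(values)   # distance to the farthest hub
    if method == "mean":
        return mean(values)  # average distance, the option chosen in the video
    raise ValueError(f"unknown method: {method}")


for method in ("min", "max", "mean"):
    print(method, combine_distances(distances, method))
```

Which option fits depends on the business scenario: the minimum might suit a case where only the nearest hub matters, while the mean treats all four hubs equally.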
21 00:01:55,900 --> 00:02:03,700 Now, to extend this formula to other cells, you can just double-click on the bottom right corner 22 00:02:03,700 --> 00:02:09,330 of the cell, and this formula will be extended to the other cells below it. 23 00:02:12,110 --> 00:02:18,980 So this value of 4.08 is the average of these four distances: 4.35, 3.81, 24 00:02:18,980 --> 00:02:21,260 4.18 and 4.01. 25 00:02:22,720 --> 00:02:28,140 Similarly, the average distance in the next observation is the average of the next four distance variables. 26 00:02:30,950 --> 00:02:36,460 Now that we have the average distance variable, we do not need the other four distance variables, 27 00:02:37,340 --> 00:02:40,950 so we will be deleting those four distance variables. 28 00:02:41,840 --> 00:02:48,860 But before we delete these four variables, we first need to change the value in this cell from the 29 00:02:48,860 --> 00:02:51,240 formula to the actual value. 30 00:02:52,430 --> 00:02:55,070 So we will select the values in all these cells. 31 00:02:56,580 --> 00:02:58,580 Copy them by pressing Control+C. 32 00:02:59,980 --> 00:03:04,930 Then right-click on the cell and paste these as values. 33 00:03:06,010 --> 00:03:12,720 Now we have the exact same value in this cell, but instead of the formula, we have the value here. 34 00:03:13,960 --> 00:03:17,470 Now, if I delete these four columns, it will not impact 35 00:03:19,430 --> 00:03:26,900 the value in the average dist column. So I have selected those four columns and clicked on the delete option. 36 00:03:27,920 --> 00:03:37,460 So this is how we identified variables which need transformation, transformed them into a new variable, 37 00:03:37,910 --> 00:03:40,040 and deleted the old variables.
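The same steps (compute the average, keep it as a plain value, then drop the four original columns) can be sketched outside the spreadsheet. This is a rough illustration under stated assumptions, not the method used in the video: the column names `dist1` to `dist4`, `avg_dist` and `house_id` are assumed, and the third distance in the sample row (4.18) is also an assumption because it is not clearly audible in the recording.

```python
from statistics import mean

# Assumed column names for the four distance variables.
DIST_COLUMNS = ["dist1", "dist2", "dist3", "dist4"]


def replace_with_average(rows, columns=DIST_COLUMNS, new_name="avg_dist"):
    """Return new rows where `columns` are replaced by their mean."""
    transformed = []
    for row in rows:
        # Compute the average first, while the original columns still exist.
        # The result is a plain number, so dropping the inputs afterwards
        # cannot change it (the equivalent of pasting as values).
        average = mean(row[c] for c in columns)
        # Keep every column except the four original distance columns.
        new_row = {k: v for k, v in row.items() if k not in columns}
        new_row[new_name] = average
        transformed.append(new_row)
    return transformed


# One sample row; house_id is hypothetical and dist3 = 4.18 is assumed.
rows = [{"house_id": 1, "dist1": 4.35, "dist2": 3.81, "dist3": 4.18, "dist4": 4.01}]

result = replace_with_average(rows)
print(result)
```

With these inputs the average comes out at about 4.0875. Computing the value before removing the columns mirrors the order followed in the spreadsheet, where the formula has to be converted to a value before its source columns are deleted.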