1 00:00:01,960 --> 00:00:08,920 In this video we will discuss different types of subset selection techniques. As I told you 2 00:00:08,950 --> 00:00:10,840 earlier, in subset selection 3 00:00:11,440 --> 00:00:16,130 we will use a subset of the p predictor variables instead of using all of them. 4 00:00:17,770 --> 00:00:19,780 But how do we identify the subset? 5 00:00:21,260 --> 00:00:22,940 There are three main ways of doing that. 6 00:00:24,380 --> 00:00:26,720 The first is called best subset selection. 7 00:00:27,620 --> 00:00:29,780 The second is forward stepwise selection. 8 00:00:29,930 --> 00:00:32,360 And the third is backward stepwise selection. 9 00:00:33,840 --> 00:00:35,780 We will discuss each of these one by one. 10 00:00:40,820 --> 00:00:42,880 In the best subset selection method, 11 00:00:44,030 --> 00:00:49,010 we fit a separate least squares regression for each combination of the p predictors. 12 00:00:51,560 --> 00:00:54,680 For example, suppose we have three predictor variables. 13 00:00:56,620 --> 00:01:02,080 The first step is to run the model with no predictor variable, which will basically return the mean value 14 00:01:02,080 --> 00:01:03,130 of the response variable. 15 00:01:05,120 --> 00:01:07,310 Next, we run the model with one variable. 16 00:01:08,410 --> 00:01:11,860 Since we have three variables, we have to run this model three times. 17 00:01:13,970 --> 00:01:17,450 Then we will run the model with a combination of two variables. 18 00:01:20,930 --> 00:01:23,830 And lastly, we run with all the three variables. 19 00:01:26,160 --> 00:01:30,540 We will then look at all these resulting models to identify which one is the best. 20 00:01:32,700 --> 00:01:39,870 So since for p variables we will have to run 2 to the p possible combinations, and comparing all of these together 21 00:01:39,930 --> 00:01:41,940 at the end may be cumbersome, 22 00:01:42,150 --> 00:01:45,060 the process is usually divided into these three steps. 
23 00:01:46,600 --> 00:01:52,210 As I told you, first we will run the null model, which will have no predictor variables. 24 00:01:54,350 --> 00:01:57,950 Let us take an example: suppose for the house pricing data 25 00:01:58,610 --> 00:02:00,350 we take three predictor variables. 26 00:02:01,280 --> 00:02:02,400 One will be room num. 27 00:02:03,020 --> 00:02:04,190 One is air quality. 28 00:02:04,710 --> 00:02:06,620 And the third is teacher ratio. 29 00:02:08,490 --> 00:02:11,190 Now, the first step is to run the null model. 30 00:02:11,550 --> 00:02:18,210 That is, to estimate house prices without using any predictor. This model will estimate the mean value 31 00:02:18,210 --> 00:02:20,190 of the house prices. 32 00:02:22,090 --> 00:02:25,240 The next step is to run this model with one predictor variable. 33 00:02:26,180 --> 00:02:30,520 So since we had three predictor variables, we'll have to run it three times. 34 00:02:30,700 --> 00:02:36,130 So the first model will be house price predicted by only room num. 35 00:02:38,060 --> 00:02:44,570 The second model will have house price against air quality, and the third model will have house price 36 00:02:44,570 --> 00:02:45,890 against teacher ratio. 37 00:02:47,260 --> 00:02:50,540 Once we run all these three models, we will select the best model. 38 00:02:50,690 --> 00:02:54,950 That is, the model which is giving the largest R-squared amongst these. 39 00:02:56,140 --> 00:03:01,370 And we will keep that model and save it as M1, because it had one variable. 40 00:03:01,420 --> 00:03:02,390 So it is M1. 41 00:03:03,790 --> 00:03:07,900 Then we will select two predictor variables to predict the house price. 42 00:03:08,380 --> 00:03:15,910 So we will use room num and teacher ratio to predict the house price, room num and air quality to predict the 43 00:03:15,990 --> 00:03:20,830 house price, and air quality and teacher ratio to predict the house price. 44 00:03:21,040 --> 00:03:22,780 So all these three models will be run. 
45 00:03:23,530 --> 00:03:30,520 And the best out of these three models will be saved as M2 because it has two predictor variables. 46 00:03:31,720 --> 00:03:34,720 Then we'll run it with all three variables, which we will save as M3. 47 00:03:36,850 --> 00:03:43,360 Now we have M0, M1, M2 and M3. Amongst these four models, 48 00:03:43,600 --> 00:03:52,150 we need to pick the best model, which we will select by taking the adjusted R-squared value of all these 49 00:03:52,150 --> 00:03:52,570 models. 50 00:03:53,860 --> 00:03:58,890 The model with the highest value of adjusted R-squared will be selected as the best model. 51 00:04:00,880 --> 00:04:06,320 In this third step, we are using adjusted R-squared, because if we use plain R-squared, 52 00:04:07,250 --> 00:04:13,640 you probably remember, R-squared monotonically increases as we increase the number of predictors, and 53 00:04:13,880 --> 00:04:15,980 M3 has the maximum number of predictors. 54 00:04:16,100 --> 00:04:18,290 So if we use R-squared, 55 00:04:19,740 --> 00:04:22,730 we will always end up selecting the M3 model. 56 00:04:27,200 --> 00:04:31,550 So although this M3 model will always have the lowest training error, 57 00:04:32,790 --> 00:04:34,920 it may not have the lowest 58 00:04:34,940 --> 00:04:35,620 test error. 59 00:04:40,450 --> 00:04:45,610 That is why we will be using adjusted R-squared, which will be telling us 60 00:04:45,640 --> 00:04:48,410 which of these models has the lowest test error. 61 00:04:50,120 --> 00:04:54,380 So when we are working in a software package, all these steps happen in the background. 62 00:04:55,430 --> 00:04:56,990 You'll just get the result out of it. 63 00:04:57,530 --> 00:05:00,610 You do not need to perform all these steps 64 00:05:01,780 --> 00:05:07,130 individually, but it is important because the result also comes in this format. 
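The best subset procedure walked through above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the software package mentioned in the lecture; the predictor names room_num, air_quality and teacher_ratio just mirror the hypothetical housing example, and the numbers are made up.

```python
# Minimal sketch of best subset selection: M0..M3 chosen by adjusted R-squared.
# Synthetic data; the predictor names only mirror the lecture's housing example.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n = 100
names = ["room_num", "air_quality", "teacher_ratio"]
X = rng.normal(size=(n, 3))
price = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)  # third column is pure noise

def r2(cols):
    # Ordinary least squares with an intercept; returns R-squared.
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, price, rcond=None)
    resid = price - A @ coef
    return 1 - (resid @ resid) / (((price - price.mean()) ** 2).sum())

def adj_r2(r, k):
    # Adjusted R-squared penalizes model size, so M0..M3 become comparable.
    return 1 - (1 - r) * (n - 1) / (n - k - 1)

models = {0: (adj_r2(r2([]), 0), ())}                # M0: the null model
for k in (1, 2, 3):
    # Step k: fit every size-k combination, keep the largest plain R-squared.
    best_r2, cols = max((r2(list(c)), c) for c in combinations(range(3), k))
    models[k] = (adj_r2(best_r2, k), cols)           # saved as M1, M2, M3

winner = max(models, key=lambda k: models[k][0])     # final pick by adjusted R-squared
print(winner, [names[i] for i in models[winner][1]])
```

Note that plain R-squared picks the winner within each size k, and adjusted R-squared is only needed for the final comparison across sizes, exactly as in the three steps described above.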
65 00:05:07,660 --> 00:05:14,050 When we run it for our dataset, which has 16 variables, it will give us all these 16 steps and their 66 00:05:14,050 --> 00:05:15,290 results individually. 67 00:05:15,490 --> 00:05:20,410 So we'll be able to understand that result only if we know how we got the result. 68 00:05:24,350 --> 00:05:28,700 So although best subset selection is simple and conceptually appealing, 69 00:05:30,040 --> 00:05:35,380 it involves a large amount of computation and may be infeasible because of computational limits. 70 00:05:36,820 --> 00:05:39,040 Imagine if we have p equal to 20. 71 00:05:39,100 --> 00:05:41,980 That is, there are 20 predictors in the model. 72 00:05:42,490 --> 00:05:45,460 Then we need to run the regression model over a million times. 73 00:05:48,180 --> 00:05:54,180 Therefore, we need some computationally efficient alternative to this best subset selection method. 74 00:05:56,980 --> 00:05:57,900 Let us look at those. 75 00:06:00,950 --> 00:06:03,820 So this method is called forward stepwise selection. 76 00:06:04,870 --> 00:06:08,380 It is a computationally efficient alternative to best subset selection. 77 00:06:09,640 --> 00:06:17,470 Instead of going through all possible 2 to the p models, it considers a much smaller set 78 00:06:17,470 --> 00:06:18,040 of models. 79 00:06:19,600 --> 00:06:21,610 It starts with k equal to zero. 80 00:06:21,910 --> 00:06:28,240 That is, there is no predictor variable. Then it adds one variable at a time until all predictors are 81 00:06:28,330 --> 00:06:28,990 in the model. 82 00:06:31,000 --> 00:06:35,860 Let us again take the example of three predictor variables. In the first step, 83 00:06:36,520 --> 00:06:38,400 we have no predictors. 84 00:06:38,560 --> 00:06:39,520 That is the null model. 85 00:06:41,020 --> 00:06:47,350 Then we consider a case where we have one variable, so we will run the model three times since we had 86 00:06:47,350 --> 00:06:48,160 three variables. 
87 00:06:49,920 --> 00:06:56,640 Now, at the end of running these three models, we will be selecting the one which has the highest R-squared. 88 00:06:58,600 --> 00:06:59,470 In the next step, 89 00:06:59,770 --> 00:07:05,320 instead of running all the possible combinations of two variables, 90 00:07:06,440 --> 00:07:14,660 we'll keep that one selected variable and only add one variable from the remaining two variables. 91 00:07:16,290 --> 00:07:21,030 So out of three variables, we selected one variable in the first step. In the second step, 92 00:07:21,420 --> 00:07:24,480 we will select only one variable from the remaining two variables. 93 00:07:26,000 --> 00:07:30,320 This time, therefore, we run two times, because we have two variables remaining. 94 00:07:33,020 --> 00:07:36,440 And again, we will select the best model, which is called M2. 95 00:07:38,240 --> 00:07:41,290 And lastly, we'll have all the three variables, which will be called M3. 96 00:07:42,030 --> 00:07:48,360 Again, we will compare all these M0, M1, M2, M3 using the adjusted R-squared. 97 00:07:52,110 --> 00:07:55,910 If I generalize my example of three variables to p variables: 98 00:07:57,930 --> 00:08:01,510 I have run it one time when I have selected no variable. 99 00:08:01,980 --> 00:08:04,620 When I select one variable, I have to run it p times. 100 00:08:05,050 --> 00:08:07,870 When I select the second variable, I run it p minus 1 times. 101 00:08:10,270 --> 00:08:10,900 And so on. 102 00:08:12,570 --> 00:08:16,360 Till I have selected p minus 1 variables, after which I run it one time. 103 00:08:17,430 --> 00:08:21,830 So if I add all of this, this is the total number of models that I will get. 104 00:08:21,990 --> 00:08:25,200 It is 1 plus p into p plus 1 by 2, that is, 1 + p(p+1)/2. 105 00:08:27,440 --> 00:08:31,900 So you can clearly see that there is a computational advantage over best subset selection, 106 00:08:33,170 --> 00:08:35,630 where we were running it 2 to the p times. 
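The forward stepwise procedure, and the 1 + p(p+1)/2 count just derived, can be sketched as follows. This is a toy illustration with p = 3 on synthetic data, with a counter verifying the number of model fits; it is not the lecture's software package.

```python
# Minimal sketch of forward stepwise selection with p = 3, counting model fits.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)  # third predictor is pure noise

fits = 0
def r2(cols):
    # Least squares with an intercept; also counts how many models we fit.
    global fits
    fits += 1
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - (resid @ resid) / (((y - y.mean()) ** 2).sum())

r2([])                                   # M0: the null model (1 fit)
selected, remaining, path = [], list(range(p)), {0: ()}
for k in range(1, p + 1):
    # Keep the variables chosen so far; try adding each remaining one.
    _, j = max((r2(selected + [j]), j) for j in remaining)
    selected.append(j)
    remaining.remove(j)
    path[k] = tuple(selected)            # best model with k variables: Mk

print(path)   # which variable entered at each step
print(fits)   # total fits: 1 + p + (p-1) + ... + 1, i.e. 1 + p(p+1)/2
```

Because earlier choices are frozen, each step only tries the remaining variables, which is exactly why the count drops from 2 to the p down to 1 + p(p+1)/2.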
107 00:08:37,030 --> 00:08:43,430 So if I'm computing for the model which has 20 predictors, best subset selection needed to run 108 00:08:43,520 --> 00:08:44,570 over a million times. 109 00:08:45,740 --> 00:08:50,810 But the forward stepwise selection method will just run 211 times. 110 00:08:53,080 --> 00:08:54,110 But what is the cost 111 00:08:54,130 --> 00:08:56,950 that we are paying for the reduction in computation? 112 00:08:59,260 --> 00:09:05,230 The cost is the loss of the guarantee that the final model we get from the forward selection will be the 113 00:09:05,230 --> 00:09:06,380 best possible model. 114 00:09:08,130 --> 00:09:15,840 For instance, in that example of three predictors, suppose our model M1 has X1 selected. 115 00:09:17,420 --> 00:09:22,370 Now, M2 can only have X1 and X2, or X1 and X3. 116 00:09:23,610 --> 00:09:29,550 If the best possible solution was X2 and X3, it will be missed by this approach. 117 00:09:32,810 --> 00:09:37,100 So by losing the guarantee that we will get the best solution to the problem, 118 00:09:38,240 --> 00:09:42,950 we are considerably reducing the computational efforts of our software package. 119 00:09:45,500 --> 00:09:48,210 The next technique is similar to forward stepwise selection. 120 00:09:48,320 --> 00:09:50,160 It is called backward stepwise selection. 121 00:09:51,080 --> 00:09:55,850 Only, instead of starting with zero predictors, we will start with all predictors. 122 00:09:56,330 --> 00:10:00,380 And then we will remove predictors one by one till we have removed all of them. 123 00:10:02,020 --> 00:10:07,150 So the number of model runs for backward stepwise selection will also be 1 plus p into p plus 124 00:10:07,150 --> 00:10:07,770 1 by 2. 125 00:10:09,250 --> 00:10:13,450 In this also, we will lose the guarantee that it will give us the best model. 126 00:10:16,330 --> 00:10:20,250 However, there is one limitation of backward stepwise selection. 
127 00:10:21,630 --> 00:10:25,980 That is, if the number of observations is less than the number of variables. 128 00:10:26,610 --> 00:10:30,630 So suppose we have a hundred variables and the number of observations is less than a hundred. 129 00:10:33,320 --> 00:10:39,410 This method, that is backward stepwise selection, and even best subset selection, will not work. 130 00:10:40,820 --> 00:10:45,290 Only the forward selection method is a viable selection method in this situation. 131 00:10:47,540 --> 00:10:50,250 So these are the subset selection techniques commonly used. 132 00:10:51,740 --> 00:10:54,560 There are a few others also, like the hybrid approach. 133 00:10:54,830 --> 00:10:56,330 But we will not discuss them here. 134 00:10:59,060 --> 00:11:00,020 Just one last thing. 135 00:11:01,190 --> 00:11:06,140 When we were comparing models having different numbers of predictors, I told you that we should 136 00:11:06,140 --> 00:11:08,450 use adjusted R-squared in such a case. 137 00:11:09,480 --> 00:11:11,880 You can also use the test set error instead. 138 00:11:13,360 --> 00:11:16,120 That is, we can split the data into a test set and a training set. 139 00:11:17,300 --> 00:11:22,880 We can train the model using the training data and then use the test data to finalize the best model 140 00:11:23,510 --> 00:11:26,270 out of M0, M1, M2 and so on. 141 00:11:27,900 --> 00:11:30,510 This can work even better than looking at R-squared. 142 00:11:32,620 --> 00:11:33,770 And that's all for this lecture.