1 00:00:00,830 --> 00:00:06,170 In this video, we will see how to train our model using subset selection techniques.
2 00:00:07,340 --> 00:00:11,100 Running a model with subset selection is very easy;
3 00:00:11,460 --> 00:00:14,760 you just need to write a single line of code.
4 00:00:16,420 --> 00:00:21,090 The first thing we need to do is install a library called leaps.
5 00:00:22,710 --> 00:00:23,820 So just check
6 00:00:23,880 --> 00:00:29,790 on the right-hand side whether the leaps library is already installed. If it is there, you can
7 00:00:29,790 --> 00:00:30,290 tick it.
8 00:00:30,570 --> 00:00:31,230 If it is not,
9 00:00:31,440 --> 00:00:34,640 you can install it, as you know, by writing install.packages.
10 00:00:35,880 --> 00:00:41,400 I have it installed. Now, to run a model with best subset selection,
11 00:00:41,820 --> 00:00:42,840 we just need to write the following.
12 00:00:44,050 --> 00:00:50,030 We will create a variable called lm_best, since we are running the best subset selection technique,
13 00:00:51,010 --> 00:00:54,190 and this is equal to regsubsets.
14 00:00:57,790 --> 00:01:05,710 Within the brackets, we will write Price ~ ., data = df.
15 00:01:11,810 --> 00:01:15,860 If we run it like this, it will go only up to eight variables.
16 00:01:16,670 --> 00:01:18,570 It will not consider more than eight variables.
17 00:01:18,900 --> 00:01:25,130 So it will try all combinations of up to eight variables, and it will stop there.
18 00:01:25,430 --> 00:01:28,200 If you want to run it for more than eight variables,
19 00:01:28,370 --> 00:01:32,480 as we do here, since we have 15 independent variables and we want to try all 15 of them,
20 00:01:32,900 --> 00:01:38,080 we have to give an additional parameter, which is nvmax. We will set
21 00:01:38,300 --> 00:01:39,930 nvmax to 15.
22 00:01:42,090 --> 00:01:44,810 So now it will go up to 15 variables.
23 00:01:45,620 --> 00:01:46,430 Let us run this.
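Put together, the call described above is presumably `lm_best <- regsubsets(Price ~ ., data = df, nvmax = 15)` (the data frame name `df` is an assumption). To show what such a call computes, here is a rough pure-Python sketch of exhaustive best subset selection. It is not the leaps implementation, which uses an efficient branch-and-bound search instead of fitting every subset; the helpers `ols` and `best_subsets` are hypothetical, and the synthetic four-predictor data only stands in for the lecture's house-price data.

```python
import itertools
import random


def ols(X, y, cols):
    """Least-squares fit of y on an intercept plus the columns `cols` of X.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(Z[0])
    # Normal equations (Z^T Z) b = Z^T y, solved by Gaussian elimination.
    A = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    c = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda i: abs(A[i][k]))  # partial pivoting
        A[k], A[piv] = A[piv], A[k]
        c[k], c[piv] = c[piv], c[k]
        for i in range(k + 1, p):
            f = A[i][k] / A[k][k]
            for j in range(k, p):
                A[i][j] -= f * A[k][j]
            c[i] -= f * c[k]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    mean_y = sum(y) / len(y)
    rss = sum((yi - sum(bj * zj for bj, zj in zip(b, z))) ** 2 for z, yi in zip(Z, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return b, 1.0 - rss / tss


def best_subsets(X, y, nvmax=8):
    """For each size k = 1..nvmax, keep the k-predictor subset with the highest R^2.

    nvmax defaults to 8, mirroring the regsubsets default described above.
    """
    p = len(X[0])
    best = {}
    for k in range(1, min(nvmax, p) + 1):
        # Exhaustive search: every combination of k predictors is fitted.
        best[k] = max(itertools.combinations(range(p), k),
                      key=lambda cols: ols(X, y, cols)[1])
    return best


# Hypothetical synthetic data: y depends on x0 and x1 only; x2 and x3 are noise.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(40)]
y = [3 * r[0] + 2 * r[1] + rng.uniform(-0.1, 0.1) for r in X]

best = best_subsets(X, y, nvmax=4)
for k, cols in best.items():
    print(k, cols, round(ols(X, y, cols)[1], 4))
```

Each size k needs one fit per combination of k predictors, so a full exhaustive search over p predictors fits 2^p - 1 models; with the 15 predictors in the lecture that is 32,767 fits, which is why a cap such as nvmax matters for wider data.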
24 00:01:48,860 --> 00:01:52,660 You can see that lm_best is now created.
25 00:01:55,390 --> 00:02:01,200 Let us look at the summary of lm_best. We will write summary, and within the brackets
26 00:02:01,440 --> 00:02:03,010 it will be lm_best.
27 00:02:10,680 --> 00:02:13,500 If you look at it, it is a bit difficult to understand.
28 00:02:14,870 --> 00:02:22,550 But as I told you in the theory lecture, it will start with one variable, then it will go
29 00:02:22,550 --> 00:02:27,650 to two variables, and so on. When we have one variable, it will find the best model.
30 00:02:28,910 --> 00:02:30,700 It compares the models by their R-squared value.
31 00:02:31,250 --> 00:02:36,220 So whichever model with one variable has the highest R-squared,
32 00:02:36,530 --> 00:02:37,340 it will keep that one.
33 00:02:38,090 --> 00:02:39,710 Then it moves on to two variables.
34 00:02:40,160 --> 00:02:44,450 It will find the highest R-squared value among all those models, and it will keep that one.
35 00:02:44,990 --> 00:02:47,750 So this is the list of all those models.
36 00:02:48,770 --> 00:02:52,940 This is the best model with one variable.
37 00:02:53,600 --> 00:02:58,460 This is the best model with two variables. In the best model with one variable, we have poor_prop
38 00:02:58,480 --> 00:03:01,370 as the significant variable. With two variables,
39 00:03:01,400 --> 00:03:04,220 we have room_num and poor_prop.
40 00:03:05,360 --> 00:03:06,620 That's how it escalates.
41 00:03:06,800 --> 00:03:07,940 Out of all these 15
42 00:03:09,270 --> 00:03:15,800 best models, we will select the one with the highest adjusted R-squared value.
43 00:03:18,020 --> 00:03:24,080 You can find the adjusted R-squared values of all these models using this code.
44 00:03:25,630 --> 00:03:26,390 So we'll write
45 00:03:27,620 --> 00:03:31,430 summary of lm_best, as before,
46 00:03:32,780 --> 00:03:43,710 and then $adjr2, giving summary(lm_best)$adjr2, where adjr2 stands for adjusted R-squared. Let us run
47 00:03:43,710 --> 00:03:44,220 this.
48 00:03:45,380 --> 00:03:51,380 You can see we have the adjusted R-squared values for all 15 models.
49 00:03:53,570 --> 00:03:59,990 We can compare all these values to find out which is the highest adjusted R-squared value, and then we
50 00:03:59,990 --> 00:04:01,820 can use that model.
51 00:04:03,030 --> 00:04:09,840 As you can see, if you have a lot of variables, you'll get a lot of values here.
52 00:04:12,780 --> 00:04:16,760 In that case, if you want to find the maximum value, you can use the
53 00:04:16,980 --> 00:04:18,240 which.max function.
54 00:04:19,380 --> 00:04:21,650 So you can write which.max,
55 00:04:22,830 --> 00:04:26,610 and within its brackets, you can copy-paste the line of code above.
56 00:04:31,050 --> 00:04:39,240 Among this array of values, you can see that the eighth value is
57 00:04:39,240 --> 00:04:40,740 the maximum value.
58 00:04:42,940 --> 00:04:50,320 If you want to look at the coefficients of this eighth model, you can do that by writing
59 00:04:50,350 --> 00:04:59,770 coef, and within brackets you will write the name of the model, which is lm_best, comma
60 00:04:59,870 --> 00:05:00,160 eight.
61 00:05:05,460 --> 00:05:08,760 You can look at the intercept and the beta values
62 00:05:09,510 --> 00:05:13,040 of this particular model, which has the highest value of adjusted R-squared.
63 00:05:15,140 --> 00:05:20,340 You can see that this eighth model has an adjusted R-squared value of 0.7145,
64 00:05:21,590 --> 00:05:26,840 whereas this last model has all the variables in it.
65 00:05:27,440 --> 00:05:31,460 So it is the same as a normal multiple linear regression.
66 00:05:32,270 --> 00:05:35,970 It has an adjusted R-squared of 0.7122.
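The R steps above are presumably `summary(lm_best)$adjr2`, `which.max(summary(lm_best)$adjr2)` and `coef(lm_best, 8)`. The hypothetical Python sketch below mirrors those three steps on synthetic data, using the standard formula adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1) for k predictors and n rows. The function names are illustrative only and are not part of any library.

```python
import itertools
import random


def ols(X, y, cols):
    """Least-squares fit of y on an intercept plus the columns `cols` of X.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(Z[0])
    # Normal equations (Z^T Z) b = Z^T y, solved by Gaussian elimination.
    A = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    c = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda i: abs(A[i][k]))  # partial pivoting
        A[k], A[piv] = A[piv], A[k]
        c[k], c[piv] = c[piv], c[k]
        for i in range(k + 1, p):
            f = A[i][k] / A[k][k]
            for j in range(k, p):
                A[i][j] -= f * A[k][j]
            c[i] -= f * c[k]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    mean_y = sum(y) / len(y)
    rss = sum((yi - sum(bj * zj for bj, zj in zip(b, z))) ** 2 for z, yi in zip(Z, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return b, 1.0 - rss / tss


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors plus an intercept, fitted on n rows."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical synthetic data: y depends on x0 and x1 only; x2 and x3 are noise.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(40)]
y = [3 * r[0] + 2 * r[1] + rng.uniform(-0.1, 0.1) for r in X]
n, p = len(X), len(X[0])

# Best subset of each size by R^2 (exhaustive search over all combinations).
best = {k: max(itertools.combinations(range(p), k), key=lambda cols: ols(X, y, cols)[1])
        for k in range(1, p + 1)}

# Like summary(lm_best)$adjr2: one adjusted R^2 per model size.
adjr2 = {k: adjusted_r2(ols(X, y, cols)[1], n, k) for k, cols in best.items()}

# Like which.max(...): the model size (1-based) with the largest adjusted R^2.
best_k = max(adjr2, key=adjr2.get)

# Like coef(lm_best, best_k): intercept first, then one slope per chosen predictor.
coefs, _ = ols(X, y, best[best_k])
print(best_k, best[best_k], [round(b, 3) for b in coefs])
```

Adjusted R-squared penalises each extra predictor, so unlike plain R-squared it can fall when a weak variable is added; that is how it can favour a model smaller than the full one, as with the eighth model in the lecture.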
67 00:05:36,590 --> 00:05:43,640 Since its adjusted R-squared is higher, if you select this subset, it is expected to perform slightly better
68 00:05:43,820 --> 00:05:46,010 on the test set than this full model.
69 00:05:48,950 --> 00:05:52,010 Now, to run forward selection and backward selection,
70 00:05:52,250 --> 00:05:53,810 there is a very minor difference.
71 00:05:55,620 --> 00:05:57,690 We will write lm_forward,
72 00:06:03,520 --> 00:06:06,740 which is equal to the same thing as above.
73 00:06:07,810 --> 00:06:12,340 We just add another parameter, which is called method.
74 00:06:14,860 --> 00:06:19,470 So comma, method is equal to "forward".
75 00:06:22,970 --> 00:06:25,030 Let us run this.
76 00:06:27,620 --> 00:06:30,050 You can look at the summary of this model also.
77 00:06:34,560 --> 00:06:35,670 Let us run this.
78 00:06:40,610 --> 00:06:45,620 Again, it has run all 15 models. From this,
79 00:06:45,710 --> 00:06:51,680 if you want to find the model with the highest value of adjusted R-squared, you can repeat these
80 00:06:51,680 --> 00:06:58,400 steps above, and you'll find whichever model is best according to this forward stepwise selection
81 00:06:58,430 --> 00:06:58,670 method.
82 00:06:59,790 --> 00:07:01,580 I encourage you to do that on your own.
83 00:07:02,940 --> 00:07:08,310 Similarly, if you want to run the backward selection method, you just need to change this "forward" to "backward",
84 00:07:09,120 --> 00:07:11,430 and it will run the backward selection technique for you.
85 00:07:12,120 --> 00:07:14,880 And again, you can perform all these operations on it.
86 00:07:16,870 --> 00:07:23,700 So this is how you run best subset selection, forward stepwise selection, and backward stepwise selection.
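Forward and backward stepwise selection, which `method = "forward"` and `method = "backward"` request from regsubsets, are greedy searches. The hypothetical Python sketch below shows the textbook versions on the same kind of synthetic data: forward selection adds the predictor that raises R-squared the most at each step, and backward selection starts from the full model and drops the predictor whose removal lowers R-squared the least. It illustrates the idea only and is not how leaps implements these methods.

```python
import random


def ols(X, y, cols):
    """Least-squares fit of y on an intercept plus the columns `cols` of X.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(Z[0])
    # Normal equations (Z^T Z) b = Z^T y, solved by Gaussian elimination.
    A = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    c = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda i: abs(A[i][k]))  # partial pivoting
        A[k], A[piv] = A[piv], A[k]
        c[k], c[piv] = c[piv], c[k]
        for i in range(k + 1, p):
            f = A[i][k] / A[k][k]
            for j in range(k, p):
                A[i][j] -= f * A[k][j]
            c[i] -= f * c[k]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    mean_y = sum(y) / len(y)
    rss = sum((yi - sum(bj * zj for bj, zj in zip(b, z))) ** 2 for z, yi in zip(Z, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return b, 1.0 - rss / tss


def forward_stepwise(X, y, nvmax=8):
    """Greedy forward selection: returns {size: predictor tuple} along the path."""
    p = len(X[0])
    chosen, path = [], {}
    while len(chosen) < min(nvmax, p):
        remaining = [j for j in range(p) if j not in chosen]
        # Add the single predictor that raises R^2 the most.
        j_add = max(remaining, key=lambda j: ols(X, y, chosen + [j])[1])
        chosen.append(j_add)
        path[len(chosen)] = tuple(sorted(chosen))
    return path


def backward_stepwise(X, y):
    """Greedy backward elimination from the full model down to one predictor."""
    current = list(range(len(X[0])))
    path = {len(current): tuple(current)}
    while len(current) > 1:
        # Drop the single predictor whose removal lowers R^2 the least.
        j_drop = max(current, key=lambda j: ols(X, y, [c for c in current if c != j])[1])
        current.remove(j_drop)
        path[len(current)] = tuple(current)
    return path


# Hypothetical synthetic data: y depends on x0 and x1 only; x2 and x3 are noise.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(40)]
y = [3 * r[0] + 2 * r[1] + rng.uniform(-0.1, 0.1) for r in X]

print("forward: ", forward_stepwise(X, y, nvmax=4))
print("backward:", backward_stepwise(X, y))
```

Because each step adds or drops a single predictor, the models along each path are nested and only about p(p + 1)/2 fits are needed instead of 2^p - 1, at the cost of possibly missing the best subset of a given size.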
87 00:07:24,650 --> 00:07:32,050 Once you get the model, this is how we find the adjusted R-squared of all the models, and we
88 00:07:32,050 --> 00:07:37,720 select the best model out of them: the one with the highest value of adjusted R-squared.