1
00:00:00,830 --> 00:00:06,170
In this video, we will see how to train our model using subset selection techniques.

2
00:00:07,340 --> 00:00:11,100
So running a model with subset selection is very easy.

3
00:00:11,460 --> 00:00:14,760
And you just need to write a single line of code.

4
00:00:16,420 --> 00:00:21,090
The first thing we need to do is install a library called leaps.

5
00:00:22,710 --> 00:00:23,820
So just check.

6
00:00:23,880 --> 00:00:29,790
Look on the right-hand side to see whether the leaps library is already installed. If it is there, you can

7
00:00:29,790 --> 00:00:30,290
tick it.

8
00:00:30,570 --> 00:00:31,230
If it is not,

9
00:00:31,440 --> 00:00:34,640
you can install it, as you know, by writing install.packages("leaps").

10
00:00:35,880 --> 00:00:41,400
So I have it. Then, to run a model with best subset selection,

11
00:00:41,820 --> 00:00:42,840
we just need to write the following.

12
00:00:44,050 --> 00:00:50,030
So we will create a variable called lm_best, since we are running the best subset selection technique.

13
00:00:51,010 --> 00:00:54,190
And this is equal to regsubsets.

14
00:00:57,790 --> 00:01:05,710
Within the brackets, we will write price ~ ., data = df.

15
00:01:11,810 --> 00:01:15,860
If we run it like this, it will go only up to eight variables.

16
00:01:16,670 --> 00:01:18,570
It will not consider more than eight variables.

17
00:01:18,900 --> 00:01:25,130
So it will try all combinations of up to eight variables, and it will stop there.

18
00:01:25,430 --> 00:01:28,200
If you want to run it for more than eight variables,

19
00:01:28,370 --> 00:01:32,480
since we have 15 independent variables and we want to run it for all 15 variables,

20
00:01:32,900 --> 00:01:38,080
we have to give an additional parameter, which is nvmax. We will set

21
00:01:38,300 --> 00:01:39,930
nvmax to 15.

22
00:01:42,090 --> 00:01:44,810
So now it will go up to fifteen variables.

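The default cap of eight variables exists because the search space grows quickly. As a rough illustration (a Python sketch, not part of the lecture's R code; `models_searched` is a hypothetical helper), this counts how many candidate subsets an exhaustive search over 15 predictors has to cover. leaps uses a branch-and-bound algorithm, so it does not literally fit every one of these models, but this is the space it searches.

```python
from math import comb


def models_searched(p: int, nvmax: int) -> int:
    """Hypothetical helper: number of non-empty predictor subsets of size
    1..nvmax that an exhaustive search over p predictors must consider."""
    return sum(comb(p, k) for k in range(1, nvmax + 1))


print(models_searched(15, 8))   # default cap of 8 variables
print(models_searched(15, 15))  # nvmax = 15: every non-empty subset, 2**15 - 1
```

This prints 22818 and 32767, so raising nvmax from 8 to 15 adds roughly 10,000 candidate subsets.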
23
00:01:45,620 --> 00:01:46,430
Let us run this.

24
00:01:48,860 --> 00:01:52,660
So you can see that lm_best is now created.

25
00:01:55,390 --> 00:02:01,200
Let us look at the summary of lm_best. We write summary, and within the brackets

26
00:02:01,440 --> 00:02:03,010
it will be lm_best.

27
00:02:10,680 --> 00:02:13,500
If you look at it, it is a bit difficult to understand.

28
00:02:14,870 --> 00:02:22,550
But if you recall the theory lecture, I told you that it will start with one variable, then it will go

29
00:02:22,550 --> 00:02:27,650
to two variables, and so on. For each number of variables, it will find the best model.

30
00:02:28,910 --> 00:02:30,700
This is the R-squared value.

31
00:02:31,250 --> 00:02:36,220
So whichever model with one variable has the highest R-squared,

32
00:02:36,530 --> 00:02:37,340
it will keep that one.

33
00:02:38,090 --> 00:02:39,710
Then it will move from one to two variables.

34
00:02:40,160 --> 00:02:44,450
It will find the highest R-squared value among all the two-variable models, and it will keep that model.

35
00:02:44,990 --> 00:02:47,750
So this is the list of all those models.

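The procedure just described (for each model size, keep the subset with the highest R-squared) can be sketched in a few lines. This is a minimal Python/NumPy sketch, not the leaps implementation: it brute-forces every subset of each size and fits ordinary least squares with an intercept. The helper names `r_squared` and `best_subsets` and the synthetic data, in which only columns 1 and 3 matter, are hypothetical.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y, cols):
    """R^2 of an OLS fit (with intercept) on the predictor columns in `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def best_subsets(X, y, nvmax):
    """Exhaustive best subset selection: for each size k = 1..nvmax,
    keep the k-predictor subset with the highest R^2."""
    p = X.shape[1]
    return {
        k: max(combinations(range(p), k), key=lambda cols: r_squared(X, y, cols))
        for k in range(1, nvmax + 1)
    }


# Hypothetical synthetic data: 5 predictors, only columns 1 and 3 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 1] - 2 * X[:, 3] + rng.normal(scale=0.5, size=200)

for k, cols in best_subsets(X, y, nvmax=5).items():
    print(k, cols, round(r_squared(X, y, cols), 4))
```

Each row of the printout plays the role of one row in the regsubsets summary: the best one-variable model, the best two-variable model, and so on.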
36
00:02:48,770 --> 00:02:52,940
So this is the best model with one variable.

37
00:02:53,600 --> 00:02:58,460
This is the best model with two variables. In the best model with one variable, we have poor_prop,

38
00:02:58,480 --> 00:03:01,370
the significant variable. With two variables,

39
00:03:01,400 --> 00:03:04,220
we have room_num and poor_prop.

40
00:03:05,360 --> 00:03:06,620
So that's how it builds up.

41
00:03:06,800 --> 00:03:07,940
Out of all these

42
00:03:09,270 --> 00:03:15,800
15 best models, we will select the one with the highest adjusted R-squared value.

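Adjusted R-squared penalizes each extra predictor, which is why it can favor a smaller model even though plain R-squared never decreases as variables are added. A small sketch of the standard formula, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), assuming a model with k predictors plus an intercept fitted on n rows (`adjusted_r2` is a hypothetical name; in R these values come from summary(...)$adjr2):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors plus an intercept, fit on n rows.
    Each extra predictor costs a degree of freedom, so a larger model must
    raise R^2 enough to pay for it."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical numbers: the same R^2 on 50 rows, with 3 vs. 10 predictors.
print(round(adjusted_r2(0.8, n=50, k=3), 4))
print(round(adjusted_r2(0.8, n=50, k=10), 4))
```

With identical R-squared, the 3-predictor model scores higher than the 10-predictor one, which is the trade-off this selection rule relies on.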
43
00:03:18,020 --> 00:03:24,080
You can find the adjusted R-squared value of all these models using this code.

44
00:03:25,630 --> 00:03:26,390
So we will write

45
00:03:27,620 --> 00:03:31,430
summary of lm_best, as before.

46
00:03:32,780 --> 00:03:43,710
And then we will add a dollar sign and adjr2, which stands for adjusted R-squared: summary(lm_best)$adjr2. Let us run

47
00:03:43,710 --> 00:03:44,220
this.

48
00:03:45,380 --> 00:03:51,380
You can see we have adjusted R-squared values for all 15 models.

49
00:03:53,570 --> 00:03:59,990
So we can compare all these values to find out which is the highest adjusted R-squared value, and then we

50
00:03:59,990 --> 00:04:01,820
can use that model.

51
00:04:03,030 --> 00:04:09,840
If you have a lot of variables, you will get a lot of values here.

52
00:04:12,780 --> 00:04:16,760
So in that case, if you want to find the maximum value, you can use the

53
00:04:16,980 --> 00:04:18,240
which.max function.

54
00:04:19,380 --> 00:04:21,650
So you can write which.max,

55
00:04:22,830 --> 00:04:26,610
and within its brackets, you can copy and paste the line of code above.

56
00:04:31,050 --> 00:04:39,240
So among this array of values, which is the maximum? You can see that the eighth value is

57
00:04:39,240 --> 00:04:40,740
the maximum value.

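R's which.max returns the 1-based position of the first maximum, which is why the answer reads as "model 8" rather than an offset. A Python equivalent, shown only to make that indexing explicit (`which_max` is a hypothetical name, and the adjusted R-squared values below are made up, not the lecture's output):

```python
def which_max(values):
    """Return the 1-based index of the first largest element, like R's which.max()."""
    best = 0
    for i, v in enumerate(values):
        if v > values[best]:  # strict '>' keeps the first maximum on ties
            best = i
    return best + 1


# Hypothetical adjusted R-squared values for models with 1..9 variables.
adj_r2 = [0.52, 0.64, 0.69, 0.71, 0.713, 0.714, 0.714, 0.7145, 0.714]
print(which_max(adj_r2))
```

Here the largest value sits in the eighth position, so the function returns 8.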
58
00:04:42,940 --> 00:04:50,320
If you want to look at the coefficients that you will get in this eighth model, you can do that by writing

59
00:04:50,350 --> 00:04:59,770
coef, and within the brackets you will write the name of the model, which is lm_best, comma,

60
00:04:59,870 --> 00:05:00,160
8, that is, coef(lm_best, 8).

61
00:05:05,460 --> 00:05:08,760
So you can look at the intercept and the beta values

62
00:05:09,510 --> 00:05:13,040
for this particular model, which has the highest value of adjusted R-squared.

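Conceptually, the coefficients reported for a chosen subset are just an ordinary least-squares fit on the selected columns. A minimal NumPy sketch of that idea, not the leaps code: `subset_coefficients` is a hypothetical helper, and the data are synthetic and noise-free so the recovered intercept and slopes match the true values exactly.

```python
import numpy as np


def subset_coefficients(X, y, cols, names=None):
    """OLS intercept and slopes for the selected predictor columns,
    conceptually similar to what coef(lm_best, k) reports for model k."""
    names = names or [f"x{c}" for c in range(X.shape[1])]
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    coefs = {"(Intercept)": float(beta[0])}
    coefs.update({names[c]: float(b) for c, b in zip(cols, beta[1:])})
    return coefs


# Hypothetical noise-free data: y = 1 + 2*x0 - 3*x2.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 2]
print(subset_coefficients(X, y, (0, 2)))
```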
63
00:05:15,140 --> 00:05:20,340
So you can see that this eighth model has an adjusted R-squared value of 0.7145.

64
00:05:21,590 --> 00:05:26,840
Whereas this last model has all the variables in it.

65
00:05:27,440 --> 00:05:31,460
So this is similar to a normal multiple linear regression.

66
00:05:32,270 --> 00:05:35,970
This has an adjusted R-squared of 0.7122.

67
00:05:36,590 --> 00:05:43,640
Clearly, if you select this subset, it is expected to perform slightly better

68
00:05:43,820 --> 00:05:46,010
on the test set than this model.

69
00:05:48,950 --> 00:05:52,010
Now, to run forward selection and backward selection,

70
00:05:52,250 --> 00:05:53,810
there is only a very minor difference.

71
00:05:55,620 --> 00:05:57,690
So we will write lm_forward

72
00:06:03,520 --> 00:06:06,740
is equal to the same thing as above.

73
00:06:07,810 --> 00:06:12,340
We will just add another parameter, which is called method.

74
00:06:14,860 --> 00:06:19,470
So, comma, method = "forward".

75
00:06:22,970 --> 00:06:25,030
Let us run this.

76
00:06:27,620 --> 00:06:30,050
So you can look at the summary of this method also.

77
00:06:34,560 --> 00:06:35,670
Run this.

78
00:06:40,610 --> 00:06:45,620
Again, it has run all these 15 models. From this,

79
00:06:45,710 --> 00:06:51,680
if you want to find out the model with the highest value of adjusted R-squared, you can repeat these

80
00:06:51,680 --> 00:06:58,400
steps above, and you'll find whichever model is best according to this forward stepwise selection

81
00:06:58,430 --> 00:06:58,670
method.

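Forward stepwise selection differs from best subset selection in that it is greedy: it never revisits earlier choices, so it fits far fewer models but can miss the best subset of a given size. A minimal NumPy sketch of the idea, not the leaps code (`rss` and `forward_stepwise` are hypothetical names; the synthetic data have only columns 1 and 3 driving y). At each step it adds the predictor that lowers the residual sum of squares the most, which for a fixed model size is the same as raising R-squared the most.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def forward_stepwise(X, y, nvmax):
    """Greedy forward selection. Starting from the intercept-only model,
    add the single predictor that lowers RSS the most at each step.
    Returns the selected columns for each size k = 1..nvmax."""
    remaining = list(range(X.shape[1]))
    chosen, path = [], {}
    for k in range(1, nvmax + 1):
        best = min(remaining, key=lambda c: rss(X, y, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
        path[k] = tuple(sorted(chosen))
    return path


# Hypothetical synthetic data: only columns 1 and 3 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 1] - 2 * X[:, 3] + rng.normal(scale=0.5, size=200)
print(forward_stepwise(X, y, nvmax=5))
```

Because each model extends the previous one, the forward path is always nested, unlike the best-subset path.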
82
00:06:59,790 --> 00:07:01,580
I encourage you to do that on your own.

83
00:07:02,940 --> 00:07:08,310
Again, if you want to run the backward selection method, you just need to change this forward to backward,

84
00:07:09,120 --> 00:07:11,430
and it will run the backward selection technique for you.

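Backward stepwise selection runs the greedy idea in reverse: start from all predictors and repeatedly drop the one whose removal hurts the fit least. A minimal NumPy sketch, not the leaps implementation (`rss` and `backward_stepwise` are hypothetical names; the synthetic data have only columns 1 and 3 driving y). Starting from the full model requires more rows than predictors.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def backward_stepwise(X, y):
    """Greedy backward elimination. Starting from all predictors, drop the
    one whose removal increases RSS the least, until one predictor is left.
    Returns the selected columns for each model size."""
    current = list(range(X.shape[1]))
    path = {len(current): tuple(current)}
    while len(current) > 1:
        drop = min(current, key=lambda c: rss(X, y, [d for d in current if d != c]))
        current.remove(drop)
        path[len(current)] = tuple(current)
    return path


# Hypothetical synthetic data: only columns 1 and 3 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 1] - 2 * X[:, 3] + rng.normal(scale=0.5, size=200)
print(backward_stepwise(X, y))
```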
85
00:07:12,120 --> 00:07:14,880
And again, you can perform all these operations on it.

86
00:07:16,870 --> 00:07:23,700
So this is how you run best subset selection, forward stepwise selection, and backward stepwise selection.

87
00:07:24,650 --> 00:07:32,050
And once you get the models, this is how we find the adjusted R-squared of all the models, and we

88
00:07:32,050 --> 00:07:37,720
select the best model out of them: the one with the highest value of adjusted R-squared.