1 00:00:01,960 --> 00:00:08,920 In this video we will discuss different types of subset selection techniques. As I told you 2 00:00:08,950 --> 00:00:10,840 earlier, in subset selection 3 00:00:11,440 --> 00:00:16,130 we will use a subset of the p predictor variables instead of using all of them. 4 00:00:17,770 --> 00:00:19,780 But how do we identify the subset? 5 00:00:21,260 --> 00:00:22,940 There are three main ways of doing that. 6 00:00:24,380 --> 00:00:26,720 The first is called best subset selection. 7 00:00:27,620 --> 00:00:29,780 The second is forward stepwise selection. 8 00:00:29,930 --> 00:00:32,360 And the third is backward stepwise selection. 9 00:00:33,840 --> 00:00:35,780 We will discuss each of these one by one. 10 00:00:40,820 --> 00:00:42,880 In the best subset selection method, 11 00:00:44,030 --> 00:00:49,010 we fit a separate least squares regression for each combination of the p predictors. 12 00:00:51,560 --> 00:00:54,680 For example, suppose we have three predictor variables. 13 00:00:56,620 --> 00:01:02,080 The first step is to run the model with no predictor variable, which will basically return the mean value 14 00:01:02,080 --> 00:01:03,130 of the response variable. 15 00:01:05,120 --> 00:01:07,310 Next, we run the model with one variable. 16 00:01:08,410 --> 00:01:11,860 Since we have three variables, we have to run this model three times. 17 00:01:13,970 --> 00:01:17,450 Then we will run the model with a combination of two variables. 18 00:01:20,930 --> 00:01:23,830 And lastly, we run with all the three variables. 19 00:01:26,160 --> 00:01:30,540 We will then look at all these resulting models to identify which one is the best. 20 00:01:32,700 --> 00:01:39,870 So since for p variables we will have to run 2 to the p possible combinations, and comparing all of these together 21 00:01:39,930 --> 00:01:41,940 at the end may be cumbersome, 22 00:01:42,150 --> 00:01:45,060 the process is usually divided into these three steps. 
23 00:01:46,600 --> 00:01:52,210 As I told you, first we will run the null model, which will have no predictor variables. 24 00:01:54,350 --> 00:01:57,950 Let us take an example: suppose for the house pricing data 25 00:01:58,610 --> 00:02:00,350 we take three predictor variables. 26 00:02:01,280 --> 00:02:02,400 One will be room num. 27 00:02:03,020 --> 00:02:04,190 One is air quality. 28 00:02:04,710 --> 00:02:06,620 And the third is teacher ratio. 29 00:02:08,490 --> 00:02:11,190 Now, the first step is to run the null model. 30 00:02:11,550 --> 00:02:18,210 That is, to estimate house prices without using any predictor. This model will estimate the mean value 31 00:02:18,210 --> 00:02:20,190 of the house prices. 32 00:02:22,090 --> 00:02:25,240 The next step is to run this model with one predictor variable. 33 00:02:26,180 --> 00:02:30,520 So since we had three predictor variables, we'll have to run it three times. 34 00:02:30,700 --> 00:02:36,130 So the first model will be house price predicted by only room num. 35 00:02:38,060 --> 00:02:44,570 The second model will have house price against air quality, and the third model will have house price 36 00:02:44,570 --> 00:02:45,890 against teacher ratio. 37 00:02:47,260 --> 00:02:50,540 Once we run all these three models, we will select the best model. 38 00:02:50,690 --> 00:02:54,950 That is, the model which is giving the largest R-squared amongst these. 39 00:02:56,140 --> 00:03:01,370 And we will keep that model and save it as M1, because it had one variable. 40 00:03:01,420 --> 00:03:02,390 So it is M1. 41 00:03:03,790 --> 00:03:07,900 Then we will select two predictor variables to predict the house price. 42 00:03:08,380 --> 00:03:15,910 So we will use room num and teacher ratio to predict the house price, room num and air quality to predict the 43 00:03:15,990 --> 00:03:20,830 house price, and air quality and teacher ratio to predict the house price. 44 00:03:21,040 --> 00:03:22,780 So all these three models will be run. 
45 00:03:23,530 --> 00:03:30,520 And the best out of these three models will be saved as M2 because it has two predictor variables. 46 00:03:31,720 --> 00:03:34,720 Then we'll run it with all three variables, which we will save as M3. 47 00:03:36,850 --> 00:03:43,360 Now we have M0, M1, M2 and M3. Amongst these four models, 48 00:03:43,600 --> 00:03:52,150 we need to pick the best model, which we will select by taking the adjusted R-squared value of all these 49 00:03:52,150 --> 00:03:52,570 models. 50 00:03:53,860 --> 00:03:58,890 The model with the highest value of adjusted R-squared will be selected as the best model. 51 00:04:00,880 --> 00:04:06,320 In this third step, we are using adjusted R-squared, because if we use plain R-squared, 52 00:04:07,250 --> 00:04:13,640 you probably remember, R-squared monotonically increases as we increase the number of predictors, and 53 00:04:13,880 --> 00:04:15,980 M3 has the maximum number of predictors. 54 00:04:16,100 --> 00:04:18,290 So if we use R-squared, 55 00:04:19,740 --> 00:04:22,730 we will always end up selecting the M3 model. 56 00:04:27,200 --> 00:04:31,550 So although this M3 model will always have the lowest training error, 57 00:04:32,790 --> 00:04:34,920 it may not have the lowest 58 00:04:34,940 --> 00:04:35,620 test error. 59 00:04:40,450 --> 00:04:45,610 That is why we will be using adjusted R-squared, which will be telling us 60 00:04:45,640 --> 00:04:48,410 which of these models has the lowest test error. 61 00:04:50,120 --> 00:04:54,380 So when we are working in a software package, all these steps happen in the background. 62 00:04:55,430 --> 00:04:56,990 You'll just get the result out of it. 63 00:04:57,530 --> 00:05:00,610 You do not need to perform all these steps 64 00:05:01,780 --> 00:05:07,130 individually, but it is important because the result also comes in this format. 
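The best subset procedure walked through above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the software package mentioned in the lecture; the predictor names room_num, air_quality and teacher_ratio just mirror the hypothetical housing example, and the numbers are made up.

```python
# Minimal sketch of best subset selection: M0..M3 chosen by adjusted R-squared.
# Synthetic data; the predictor names only mirror the lecture's housing example.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n = 100
names = ["room_num", "air_quality", "teacher_ratio"]
X = rng.normal(size=(n, 3))
price = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)  # third column is pure noise

def r2(cols):
    # Ordinary least squares with an intercept; returns R-squared.
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, price, rcond=None)
    resid = price - A @ coef
    return 1 - (resid @ resid) / (((price - price.mean()) ** 2).sum())

def adj_r2(r, k):
    # Adjusted R-squared penalizes model size, so M0..M3 become comparable.
    return 1 - (1 - r) * (n - 1) / (n - k - 1)

models = {0: (adj_r2(r2([]), 0), ())}                # M0: the null model
for k in (1, 2, 3):
    # Step k: fit every size-k combination, keep the largest plain R-squared.
    best_r2, cols = max((r2(list(c)), c) for c in combinations(range(3), k))
    models[k] = (adj_r2(best_r2, k), cols)           # saved as M1, M2, M3

winner = max(models, key=lambda k: models[k][0])     # final pick by adjusted R-squared
print(winner, [names[i] for i in models[winner][1]])
```

Note that plain R-squared picks the winner within each size k, and adjusted R-squared is only needed for the final comparison across sizes, exactly as in the three steps described above.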
65 00:05:07,660 --> 00:05:14,050 When we run it for our dataset, which has 16 variables, it will give us all these 16 steps and their 66 00:05:14,050 --> 00:05:15,290 results individually. 67 00:05:15,490 --> 00:05:20,410 So we'll be able to understand that result only if we know how we got the result. 68 00:05:24,350 --> 00:05:28,700 So although best subset selection is simple and conceptually appealing, 69 00:05:30,040 --> 00:05:35,380 it involves a large amount of computation and may be infeasible because of computational limits. 70 00:05:36,820 --> 00:05:39,040 Imagine if we have p equal to 20. 71 00:05:39,100 --> 00:05:41,980 That is, there are 20 predictors in the model. 72 00:05:42,490 --> 00:05:45,460 Then we need to run the regression model over a million times. 73 00:05:48,180 --> 00:05:54,180 Therefore, we need some computationally efficient alternative to this best subset selection method. 74 00:05:56,980 --> 00:05:57,900 Let us look at those. 75 00:06:00,950 --> 00:06:03,820 So this method is called forward stepwise selection. 76 00:06:04,870 --> 00:06:08,380 It is a computationally efficient alternative to best subset selection. 77 00:06:09,640 --> 00:06:17,470 Instead of going through all possible 2 to the p models, it considers a much smaller set 78 00:06:17,470 --> 00:06:18,040 of models. 79 00:06:19,600 --> 00:06:21,610 It starts with k equal to zero. 80 00:06:21,910 --> 00:06:28,240 That is, there is no predictor variable. Then it adds one variable at a time until all predictors are 81 00:06:28,330 --> 00:06:28,990 in the model. 82 00:06:31,000 --> 00:06:35,860 Let us again take the example of three predictor variables. In the first step, 83 00:06:36,520 --> 00:06:38,400 we have no predictors. 84 00:06:38,560 --> 00:06:39,520 That is the null model. 85 00:06:41,020 --> 00:06:47,350 Then we consider a case where we have one variable, so we will run the model three times since we had 86 00:06:47,350 --> 00:06:48,160 three variables. 
87 00:06:49,920 --> 00:06:56,640 Now, at the end of running these three models, we will be selecting the one which has the highest R-squared. 88 00:06:58,600 --> 00:06:59,470 In the next step, 89 00:06:59,770 --> 00:07:05,320 instead of running all the possible combinations of two variables, 90 00:07:06,440 --> 00:07:14,660 we'll keep that one selected variable and only add one variable from the remaining two variables. 91 00:07:16,290 --> 00:07:21,030 So out of three variables, we selected one variable in the first step. In the second step, 92 00:07:21,420 --> 00:07:24,480 we will select only one variable from the remaining two variables. 93 00:07:26,000 --> 00:07:30,320 This time, therefore, we run two times, because we have two variables remaining. 94 00:07:33,020 --> 00:07:36,440 And again, we will select the best model, which is called M2. 95 00:07:38,240 --> 00:07:41,290 And lastly, we'll have all the three variables, which will be called M3. 96 00:07:42,030 --> 00:07:48,360 Again, we will compare all these M0, M1, M2, M3 using the adjusted R-squared. 97 00:07:52,110 --> 00:07:55,910 If I generalize my example of three variables to p variables: 98 00:07:57,930 --> 00:08:01,510 I have run it one time when I have selected no variable. 99 00:08:01,980 --> 00:08:04,620 When I select one variable, I have to run it p times. 100 00:08:05,050 --> 00:08:07,870 When I select the second variable, I run it p minus 1 times. 101 00:08:10,270 --> 00:08:10,900 And so on. 102 00:08:12,570 --> 00:08:16,360 Till I have selected p minus 1 variables, after which I run it one time. 103 00:08:17,430 --> 00:08:21,830 So if I add all of this, this is the total number of models that I will get. 104 00:08:21,990 --> 00:08:25,200 It is 1 plus p into p plus 1 by 2, that is, 1 + p(p+1)/2. 105 00:08:27,440 --> 00:08:31,900 So you can clearly see that there is a computational advantage over best subset selection, 106 00:08:33,170 --> 00:08:35,630 where we were running it 2 to the p times. 
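The forward stepwise procedure, and the 1 + p(p+1)/2 count just derived, can be sketched as follows. This is a toy illustration with p = 3 on synthetic data, with a counter verifying the number of model fits; it is not the lecture's software package.

```python
# Minimal sketch of forward stepwise selection with p = 3, counting model fits.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)  # third predictor is pure noise

fits = 0
def r2(cols):
    # Least squares with an intercept; also counts how many models we fit.
    global fits
    fits += 1
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - (resid @ resid) / (((y - y.mean()) ** 2).sum())

r2([])                                   # M0: the null model (1 fit)
selected, remaining, path = [], list(range(p)), {0: ()}
for k in range(1, p + 1):
    # Keep the variables chosen so far; try adding each remaining one.
    _, j = max((r2(selected + [j]), j) for j in remaining)
    selected.append(j)
    remaining.remove(j)
    path[k] = tuple(selected)            # best model with k variables: Mk

print(path)   # which variable entered at each step
print(fits)   # total fits: 1 + p + (p-1) + ... + 1, i.e. 1 + p(p+1)/2
```

Because earlier choices are frozen, each step only tries the remaining variables, which is exactly why the count drops from 2 to the p down to 1 + p(p+1)/2.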
107 00:08:37,030 --> 00:08:43,430 So if I'm computing for the model which has 20 predictors, best subset selection needed to run 108 00:08:43,520 --> 00:08:44,570 over a million times. 109 00:08:45,740 --> 00:08:50,810 But the forward stepwise selection method will just run 211 times. 110 00:08:53,080 --> 00:08:54,110 But what is the cost 111 00:08:54,130 --> 00:08:56,950 that we are paying for the reduction in computation? 112 00:08:59,260 --> 00:09:05,230 The cost is the loss of the guarantee that the final model we get from the forward selection will be the 113 00:09:05,230 --> 00:09:06,380 best possible model. 114 00:09:08,130 --> 00:09:15,840 For instance, in that example of three predictors, suppose our model M1 has X1 selected. 115 00:09:17,420 --> 00:09:22,370 Now, M2 can only have X1 and X2, or X1 and X3. 116 00:09:23,610 --> 00:09:29,550 If the best possible solution was X2 and X3, it will be missed by this approach. 117 00:09:32,810 --> 00:09:37,100 So by losing the guarantee that we will get the best solution to the problem, 118 00:09:38,240 --> 00:09:42,950 we are considerably reducing the computational efforts of our software package. 119 00:09:45,500 --> 00:09:48,210 The next technique is similar to forward stepwise selection. 120 00:09:48,320 --> 00:09:50,160 It is called backward stepwise selection. 121 00:09:51,080 --> 00:09:55,850 Only, instead of starting with zero predictors, we will start with all predictors. 122 00:09:56,330 --> 00:10:00,380 And then we will remove predictors one by one till we have removed all of them. 123 00:10:02,020 --> 00:10:07,150 So the number of model runs for backward stepwise selection will also be 1 plus p into p plus 124 00:10:07,150 --> 00:10:07,770 1 by 2. 125 00:10:09,250 --> 00:10:13,450 In this also, we will lose the guarantee that it will give us the best model. 126 00:10:16,330 --> 00:10:20,250 However, there is one limitation of backward stepwise selection. 
127 00:10:21,630 --> 00:10:25,980 That is, if the number of observations is less than the number of variables. 128 00:10:26,610 --> 00:10:30,630 So suppose we have a hundred variables and the number of observations is less than a hundred. 129 00:10:33,320 --> 00:10:39,410 This method, that is backward stepwise selection, and even best subset selection, will not work. 130 00:10:40,820 --> 00:10:45,290 Only the forward selection method is a viable selection method in this situation. 131 00:10:47,540 --> 00:10:50,250 So these are the subset selection techniques commonly used. 132 00:10:51,740 --> 00:10:54,560 There are a few others also, like the hybrid approach. 133 00:10:54,830 --> 00:10:56,330 But we will not discuss them here. 134 00:10:59,060 --> 00:11:00,020 Just one last thing. 135 00:11:01,190 --> 00:11:06,140 When we were comparing models having different numbers of predictors, I told you that we should 136 00:11:06,140 --> 00:11:08,450 use adjusted R-squared in such a case. 137 00:11:09,480 --> 00:11:11,880 You can also use the test set error instead. 138 00:11:13,360 --> 00:11:16,120 That is, we can split the data into a test set and a training set. 139 00:11:17,300 --> 00:11:22,880 We can train the model using the training data and then use the test data to finalize the best model 140 00:11:23,510 --> 00:11:26,270 out of M0, M1, M2 and so on. 141 00:11:27,900 --> 00:11:30,510 This can work even better than looking at R-squared. 142 00:11:32,620 --> 00:11:33,770 And that's all for this lecture.