Welcome back. In the last video we tuned a model's hyperparameters using RandomizedSearchCV. What that did was: we defined a space, or dictionary, of different hyperparameter values we'd like to try, and then RandomizedSearchCV combined those parameters at random, built classifiers with the different settings, and evaluated them all using cross-validation. Or — we might change this back to 10; we don't want 50 models running off. And then, after trying 10 different combinations, it found the best parameters to be this set of values here. Then after evaluating them we saw it didn't really do as well as our previous model up here, where we chose the parameters by hand. So finally, a workflow you might end up with when you're tuning hyperparameters is: start off by hand, then use RandomizedSearchCV across a space of hyperparameters, and then, once you've found some pretty good hyperparameters, you'll probably finish up with GridSearchCV. Let's see that in action. So: 5.3, hyperparameter tuning with GridSearchCV. Beautiful. And let's check out our grid. You might notice GridSearchCV has the word grid in it, and our grid of hyperparameters at the moment is these values here.
The key difference between RandomizedSearchCV and GridSearchCV is that RandomizedSearchCV has a parameter called n_iter, which is what we saw up here, and which we can set to limit the number of models to try — in our case we used 10. So if we look back up here at the top of this code: fitting 5 folds for each of 10 candidates, totalling 50 fits. GridSearchCV, on the other hand, is kind of like a brute-force search: it will go through every single combination that is available here. So if we have a look, we've got six values there, so it's six times five times two times three times three — there are six values here, five here, two here, three and three — in total 540 different combinations of parameters it would try. That's a lot. And then it's cross-validated, so times that by five: 2,700 models. That's a lot, right? That's going to take a lot of compute power. Training one model might take long enough, but training 2,700 of them, especially if you're working with a large dataset, is probably not going to be suitable for your laptop. You may need a bigger computer — or we can reduce the search space. So let's see that in action.
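The arithmetic above can be checked in a couple of lines. This is a sketch: the grid values below are assumptions standing in for the ones on screen (only the counts per key — 6, 5, 2, 3, 3 — match the video):

```python
from itertools import product

# Hypothetical hyperparameter grid with the same shape as the video's:
# six, five, two, three and three values per key.
grid = {"n_estimators": [10, 100, 200, 500, 1000, 1200],
        "max_depth": [None, 5, 10, 20, 30],
        "max_features": ["sqrt", "log2"],
        "min_samples_split": [2, 4, 6],
        "min_samples_leaf": [1, 2, 4]}

# GridSearchCV tries every combination: 6 * 5 * 2 * 3 * 3
n_combinations = len(list(product(*grid.values())))
print(n_combinations)      # 540 combinations
print(n_combinations * 5)  # 2700 model fits with 5-fold cross-validation
```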
What I mean by that is we'll create grid_2 using a little bit more of a confined hyperparameter space — in essence, just reducing the number of hyperparameters GridSearchCV has to go through. So let's see this in action. What we might do is just copy this. Now, how would you create grid_2 in practice? We could just copy and paste it, which we've done. I want to line these up so it all looks uniform — there we go. We could just copy and paste it in and then delete values at random — delete, delete — but we might not do that. Say we've gone through RandomizedSearchCV — let's say we'd upped n_iter to 50 different models like we had before, though in our case we've only actually done 10 — we might take the best parameters from our RandomizedSearchCV and use them to influence where we'd like GridSearchCV to search. So let's do that. Our best parameters are 200, 6 and square root. So what we might do: remove 10, keep 500, and remove these two. Yep, max_depth is None, so we might just keep that as None, so it's only got one option there. Then for max_features we might keep 'auto' and 'sqrt'. Then what else do we have? min_samples_split is 6, so we might get rid of 4 and 2. And then min_samples_leaf — we might keep that as 1 and 2.
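Putting that refinement together, grid_2 might look like the following sketch. The exact n_estimators values are an assumption reconstructed from the narration (the best RandomizedSearchCV parameters mentioned were n_estimators=200, max_features='sqrt', min_samples_split=6); note also that 'auto' was later removed as a max_features option in newer scikit-learn versions, so 'sqrt'/'log2' are the safer choices today:

```python
from itertools import product

# grid_2: a refined search space built around RandomizedSearchCV's
# best parameters (values here are illustrative assumptions).
grid_2 = {"n_estimators": [100, 200, 500],
          "max_depth": [None],
          "max_features": ["auto", "sqrt"],
          "min_samples_split": [6],
          "min_samples_leaf": [1, 2]}

# 3 * 1 * 2 * 1 * 2 = 12 combinations, or 60 fits with 5-fold CV
n_combinations = len(list(product(*grid_2.values())))
print(n_combinations)  # 12
```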
So what have we done here? Well, we've reduced our search space of hyperparameters. Remember, we had 540 different combinations before — what do we have now? Three times one times two times one times two: twelve. And then it's going to be cross-validated five times (we can adjust this using the cv parameter), so 60 fits versus 2,700. That's a lot fewer. So how would we do this in practice? How would we use GridSearchCV to find the best hyperparameters in this space? Well, as always, let's see it in action: from sklearn.model_selection import GridSearchCV — and we want to import train_test_split as well. We probably already have that, but we can put it there. Set the random seed — wonderful — and split the data. What we might do, actually, is just bring this code up here, because it's the exact same code we used before. So we'll just take this — and we're going to change one thing — come down, wonderful. So we're using the same dataset as our RandomizedSearchCV, making the same split, and instantiating a RandomForestClassifier. But this time we're going to set up GridSearchCV — grid search CV, wonderful — and then we're going to fit it. And now there's a different parameter here.
This one is going to be param_grid, I believe — and it's not going to have an n_iter, because GridSearchCV is like brute force: it tries every single combination. And we want it to try grid_2, not the first grid, because grid_2 has a few less options, as we saw before. So we'll come down here, keep cross-validation as five-fold, verbose equals 2. Okay. Now let's step through what's happening. We're importing GridSearchCV. We've created grid_2, which is a refined search space of different hyperparameters — different settings. It's much like adjusting the oven when you're working on your favourite meal: if you set the temperature to 400 degrees, that's way too high; so if it starts off at 180 degrees and does okay, you might go, why would I jump right up to 400 or 500 degrees when I can just make a little jump from there? So that's why we've created grid_2: we've based it off the best hyperparameters that our RandomizedSearchCV has found. So let's do it — let's build a random forest classifier using GridSearchCV. Here's how it's going to go. Same thing: fitting 5 folds for each of 12 candidates, totalling 60 fits. Now, what GridSearchCV is going to do, just like RandomizedSearchCV —
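A minimal, self-contained sketch of the setup described above. The dataset here is synthetic (a stand-in for the video's data), and the grid is trimmed down so the example runs quickly; the key point is param_grid in place of RandomizedSearchCV's param_distributions, and no n_iter:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

np.random.seed(42)

# Synthetic stand-in for the dataset used in the video
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Trimmed-down grid so the example finishes quickly (values are assumptions)
grid_2 = {"n_estimators": [50, 100],
          "max_depth": [None],
          "min_samples_split": [6],
          "min_samples_leaf": [1, 2]}

clf = RandomForestClassifier(n_jobs=-1)

# param_grid, not param_distributions, and no n_iter:
# GridSearchCV tries every combination in the grid.
gs_clf = GridSearchCV(estimator=clf, param_grid=grid_2, cv=5, verbose=1)
gs_clf.fit(X_train, y_train)

print(gs_clf.best_params_)
```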
— it's going to go through all of these different hyperparameters, pass them to our random forest classifier, trying each different combination, and then eventually, when it's finished, we'll be able to go gs_clf.best_params_. Wonderful. So it might take a little while with GridSearchCV — this is something to keep in mind: the more hyperparameters you have in here, the longer GridSearchCV will take to run, because it has to try every single combination of hyperparameters in here. So now, as we did before, we can evaluate our grid search classifier by making predictions with it and then using our evaluation function to figure out how it did. So let's do that: gs_y_preds = gs_clf.predict — we'll predict on the test data, X_test — and then we'll evaluate the predictions: gs_metrics = evaluate_preds, our function that we defined a couple of videos ago, passing y_test and gs_y_preds. Wonderful. And now, again, we see a slight decline in accuracy. Mm hmm. And that just goes to show how much of a process tuning hyperparameters can be, right? It's all about that trial and error.
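The evaluation step might look like this sketch. The evaluate_preds helper is reconstructed here from how it is used in the series (returning a dictionary of metrics), and a plain RandomForestClassifier stands in for the fitted GridSearchCV object — .predict() works the same way on either:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)
from sklearn.model_selection import train_test_split

def evaluate_preds(y_true, y_preds):
    # Reconstruction of the evaluate_preds helper from earlier videos:
    # return the metrics as a dict so they're easy to compare later.
    return {"accuracy": accuracy_score(y_true, y_preds),
            "precision": precision_score(y_true, y_preds),
            "recall": recall_score(y_true, y_preds),
            "f1": f1_score(y_true, y_preds)}

X, y = make_classification(n_samples=300, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Stand-in for the fitted GridSearchCV model (gs_clf in the video)
gs_clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

gs_y_preds = gs_clf.predict(X_test)          # predict on the test data
gs_metrics = evaluate_preds(y_test, gs_y_preds)
print(gs_metrics)
```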
What you could do to fix this is go back up. Your workflow would probably be: try a few settings by hand, as we did right back up here — tuning hyperparameters by hand. This is the workflow you'd probably follow: try just a few of these by hand, see if you can figure something out with the validation set like we've done, and then try some hyperparameter tuning using RandomizedSearchCV — you might, in this case, up this number to 20 or 50 or whatever arbitrary number you land on. Again, this is all experimentation, all to try and figure out which hyperparameters are better, and you might even change the grid you're searching over based on the hyperparameters you find in the scikit-learn documentation for the model you're using. And then, once you've found some good hyperparameters using RandomizedSearchCV, you might take those hyperparameters — these ones here, best_params_ — and create another grid to grid search over, like we've done here. In our case it hasn't improved our model's metrics, but this is where we'd probably start to experiment a little bit more.
Try a few different other metrics, try a few different other hyperparameters. But for completeness, what you'd probably do then, after you've tried a fair few different parameters — you've created a baseline model with the default settings, you've tried to adjust them by hand, you've tried to adjust them with RandomizedSearchCV and you've tried to adjust them with grid search — is compare your models. So let's compare our different models' metrics. Wonderful. And to do so, we're going to create a DataFrame — so pd.DataFrame. This is why we've been returning dictionaries from all of our models and our evaluate_preds function. So baseline is going to be baseline_metrics, then clf_2 is going to be clf_2_metrics — and we can delete these, we don't need them — random search is going to be our rs_metrics, and then grid search is going to be gs_metrics. Wonderful. And then, to see it in action, we're going to go compare_metrics — which is just the DataFrame we've instantiated here — or we might need to move this up; there's going to be an error there, a squiggly bracket that we didn't need — .plot.bar(figsize=(10, 8)). Let's see it. Wonderful.
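As a sketch, the comparison might look like this. The metric numbers below are made-up placeholders, not results from the video — in practice each dictionary would come from evaluate_preds() for that model:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs anywhere
import pandas as pd

# Placeholder metric dictionaries (illustrative values only);
# in the video these come from evaluate_preds() for each model.
baseline_metrics = {"accuracy": 0.84, "precision": 0.81, "recall": 0.92, "f1": 0.86}
clf_2_metrics    = {"accuracy": 0.85, "precision": 0.85, "recall": 0.88, "f1": 0.86}
rs_metrics       = {"accuracy": 0.85, "precision": 0.83, "recall": 0.90, "f1": 0.86}
gs_metrics       = {"accuracy": 0.82, "precision": 0.80, "recall": 0.88, "f1": 0.84}

# One column per model, one row per metric
compare_metrics = pd.DataFrame({"baseline": baseline_metrics,
                                "clf_2": clf_2_metrics,
                                "random search": rs_metrics,
                                "grid search": gs_metrics})

compare_metrics.plot.bar(figsize=(10, 8))
print(compare_metrics)
```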
So this is going to compare all of our different classification models' metrics in the one space. We can see our baseline model is the blue one — on this one it didn't get the best; it was more of a tie between clf_2 and random search. On precision, clf_2 is out in front. On recall, baseline is out in front. And on F1, it's a tie between baseline and clf_2. And again, you could adjust this graph to make it more visible; this is just to demonstrate different metrics. So this is the kind of communication you'd do, not only for yourself but for your teammates, or if you're reporting to someone on a project: show them the different models you've tried and how they perform differently on different classification metrics — if you're working on a classification problem, of course; you can do the same sort of process with a regression problem. And now, what you need for the project will determine which model you choose. So say you wanted a model that optimised recall — in this case you might use the baseline model. Or if you wanted a model that optimised precision, you might choose the clf_2 model. And if none of these metrics really worked for you, you might continue trying to find some better ones through random search or grid search. And now we've finally got to the end of how you can improve a model via hyperparameter tuning. And again, there is a lot to take in here.
The key point to remember is that it's an experimental process. Improving a model through hyperparameter tuning — there's no sort of written law on how you do it, but the point is to just keep trying. See if you can figure something out. See if you can get yourself familiar with the different hyperparameters that different models have, such as a RandomForestClassifier, and just remind yourself that the first model you make is more than likely not the best model you'll have. It's a baseline model, and it can be improved upon. And so we've built a model and we've tried to improve it. Let's say you've decided you're going to focus on recall, or something like that — you may want to export your model so you can share it or use it in some sort of deployment setup. So in the next section — if we look at our list here of what we're covering — we're going to look at how to save, share and load a model: saving and loading a trained model. The benefit of this is that you won't have to retrain a model like we've done in the past few sections. So go back through, check out what we've covered with hyperparameter tuning — how to improve a model's hyperparameters with GridSearchCV. Try it out with randomized search, try it out by hand, and I'll see you in the next video.