1 00:00:00,360 --> 00:00:01,450 Welcome back. 2 00:00:01,450 --> 00:00:06,550 In the last video we checked out how to tune a model's hyperparameters by hand. 3 00:00:06,660 --> 00:00:11,460 And we figured out by the end of it, once we'd written some code, once we'd split our data into train, 4 00:00:11,460 --> 00:00:13,500 validation and test sets manually, 5 00:00:13,500 --> 00:00:17,330 and created an evaluation function... Yeah. 6 00:00:17,390 --> 00:00:21,140 At the end, we figured out tuning them all by hand. 7 00:00:21,140 --> 00:00:24,230 That sounds like a lot of work, and you're right. 8 00:00:24,350 --> 00:00:27,400 It would be, if we went through and tried a whole bunch of different parameters. 9 00:00:27,410 --> 00:00:32,780 But luckily, the developers of scikit-learn have also come across this problem and have created 10 00:00:32,780 --> 00:00:38,150 RandomizedSearchCV, which stands for randomized search cross-validation. 11 00:00:38,150 --> 00:00:44,750 Let's see how we'll tune hyperparameters, a.k.a. adjust the settings on our models to make better predictions, 12 00:00:44,810 --> 00:00:46,540 or hopefully make better predictions, 13 00:00:46,670 --> 00:00:49,420 using RandomizedSearchCV. 14 00:00:50,110 --> 00:00:52,640 So let's go here and create another heading. 15 00:00:52,640 --> 00:00:54,070 This can be 5.2. 16 00:00:54,200 --> 00:01:01,610 So, hyperparameter tuning with RandomizedSearch 17 00:01:01,750 --> 00:01:07,770 CV. Beautiful. The first thing we'll do is we'll import it. 18 00:01:07,780 --> 00:01:15,580 So from sklearn.model_selection import Randomized... 19 00:01:15,590 --> 00:01:18,300 We might just press tab here: RandomizedSearchCV. 20 00:01:18,320 --> 00:01:19,480 Beautiful. 21 00:01:19,520 --> 00:01:21,320 Let's see an example of how we use it 22 00:01:21,350 --> 00:01:23,790 before we check out the docstring.
23 00:01:24,440 --> 00:01:30,050 So what we'll do is we'll create a grid of hyperparameters, and by grid I mean dictionary, and I've 24 00:01:30,050 --> 00:01:37,430 typed "grid" here: a dictionary of hyperparameters we'd like to adjust. So if we come back up here, these 25 00:01:37,430 --> 00:01:40,370 are the parameters we're going to adjust. 26 00:01:40,370 --> 00:01:45,290 So what we're going to do is create a dictionary with the hyperparameters we'd like to adjust as the 27 00:01:45,290 --> 00:01:51,290 keys, and then the values we'd like to try as the values of the dictionary. 28 00:01:51,950 --> 00:01:52,760 So let's do that. 29 00:01:52,850 --> 00:01:59,030 So n_estimators, that's a key, and we're going to give it a list. 30 00:01:59,030 --> 00:02:07,860 So 100, 200, 500, 1000 and 1200. Beautiful. 31 00:02:07,880 --> 00:02:11,750 Now, if you're wondering, Daniel, where are you getting these values? 32 00:02:11,750 --> 00:02:17,850 Well, remember, as always, with any model or estimator in scikit-learn there's extensive documentation on 33 00:02:17,850 --> 00:02:21,620 it, which tells you some of the settings that you can change on the model. 34 00:02:21,620 --> 00:02:27,710 So what I've done is I've read through here, read through the examples, done some research, and found different 35 00:02:27,710 --> 00:02:31,420 values for the hyperparameters we can try. So here, 36 00:02:31,460 --> 00:02:35,570 None, 5, 10... so it looks like I'm just throwing out values here. 37 00:02:35,570 --> 00:02:39,050 Trust me, I'm not just throwing out random values. 38 00:02:39,050 --> 00:02:45,290 They're based upon some research and some experience, and don't worry, you won't begin with that. 39 00:02:45,290 --> 00:02:46,310 No one begins with that. 40 00:02:46,310 --> 00:02:48,290 It takes a little bit of practice to get there.
41 00:02:48,290 --> 00:02:52,310 That's why we're having a look, hands on, at all these different functions that we can use to improve 42 00:02:52,310 --> 00:02:52,790 our models. 43 00:02:55,390 --> 00:02:56,240 Wonderful. 44 00:02:56,380 --> 00:02:58,160 And then we'll do two more, 45 00:02:58,210 --> 00:03:00,820 which is min_samples_split, 46 00:03:03,380 --> 00:03:11,510 2, 4, 6, and here, min_samples_leaf, 47 00:03:17,070 --> 00:03:21,350 and let's see how we'd implement RandomizedSearchCV. 48 00:03:21,960 --> 00:03:23,220 So we'll set up a random seed, 49 00:03:27,480 --> 00:03:30,420 we'll split into X and y, 50 00:03:33,450 --> 00:03:41,420 so we want heart_disease_shuffled, actually, because that's what we've used before, 51 00:03:41,520 --> 00:03:42,260 dot drop. 52 00:03:42,520 --> 00:03:54,190 Oh, actually, we need to make X here: .drop("target", axis=1), and we'll make y here, say heart_ 53 00:03:54,190 --> 00:03:57,550 disease_shuffled 54 00:04:01,770 --> 00:04:05,580 ["target"]. Beautiful. 55 00:04:05,690 --> 00:04:10,340 We're gonna go split into train and test sets. Huh? 56 00:04:10,520 --> 00:04:12,590 You might be wondering why we're only using train and test here. 57 00:04:12,590 --> 00:04:14,550 We just created a validation set before. 58 00:04:14,570 --> 00:04:15,480 We'll get to that. 59 00:04:15,590 --> 00:04:16,790 X_train, X_test, 60 00:04:17,030 --> 00:04:18,500 y_train, y_test 61 00:04:18,500 --> 00:04:19,690 equals train_test_split, 62 00:04:21,920 --> 00:04:30,590 X, y, and we'll use a normal split here, test_size=0.2. Wonderful. And then we're going 63 00:04:30,590 --> 00:04:32,270 to instantiate a 64 00:04:38,930 --> 00:04:47,980 random forest classifier: clf = RandomForestClassifier(n_jobs=...). 65 00:04:48,010 --> 00:04:52,100 Now, we could do negative one here, but at the moment negative one is broken for me.
66 00:04:52,100 --> 00:04:58,400 Now, that'll make sense in a second, but n_jobs stands for how much of your computer's processor are 67 00:04:58,400 --> 00:05:03,750 you going to dedicate towards this machine learning model, and negative one means all of it. 68 00:05:03,980 --> 00:05:09,470 So by default, n_jobs is one... actually, what is the default n_jobs? 69 00:05:09,640 --> 00:05:14,070 Let's go here: n_jobs=None. 70 00:05:14,460 --> 00:05:19,790 Okay, so we'll set it as None. Actually, I'll just set it as 1. So different values you pass to n_jobs 71 00:05:19,790 --> 00:05:24,350 will dictate how much of your computer's processor you want to dedicate towards the machine learning 72 00:05:24,350 --> 00:05:25,110 model. 73 00:05:25,250 --> 00:05:34,560 And now, once we've got a classifier instantiated, we'll set up RandomizedSearchCV. 74 00:05:35,840 --> 00:05:40,120 So we're gonna call it rs_clf. 75 00:05:40,280 --> 00:05:49,460 So what that's doing is we're just prepending rs to it, for randomized search. And now, RandomizedSearch 76 00:05:49,580 --> 00:05:54,550 CV, the first parameter it takes is estimator=clf. 77 00:05:54,560 --> 00:06:01,700 So we're passing it this random forest classifier we've instantiated. param_distributions is 78 00:06:01,700 --> 00:06:04,260 the next one; we're going to pass it our grid. 79 00:06:04,340 --> 00:06:06,990 So this is grid, up here. 80 00:06:07,470 --> 00:06:19,760 The next thing is we're going to define n_iter=10, and this is the number of models to try. 81 00:06:22,160 --> 00:06:34,520 We'll use five-fold cross-validation, cv=5, and then we'll set the verbosity, verbose=2. Well, this is 82 00:06:34,520 --> 00:06:35,960 some code we haven't seen before. 83 00:06:36,380 --> 00:06:43,940 So what is happening when we run RandomizedSearchCV? We'll look at the docstring: press Shift+Tab in here... 84 00:06:44,860 --> 00:06:46,850 ah, that's because we haven't imported it.
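The setup dictated so far — the hyperparameter grid, the X/y split, the train/test split, and instantiating RandomizedSearchCV — can be sketched as one runnable cell. This is a hedged sketch, not the course notebook: the heart-disease DataFrame isn't available here, so a synthetic dataset stands in for it; the min_samples_leaf values are assumed (they're cut off in the recording); and "sqrt" is used for max_features because the "auto" option has been removed in recent scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Hyperparameter grid: keys are hyperparameter names, values are lists of
# settings to sample from. min_samples_leaf values are assumed here.
grid = {
    "n_estimators": [100, 200, 500, 1000, 1200],
    "max_depth": [None, 5, 10],
    "max_features": ["sqrt"],  # "auto" is removed in recent scikit-learn
    "min_samples_split": [2, 4, 6],
    "min_samples_leaf": [1, 2, 4],
}

np.random.seed(42)

# Stand-in for heart_disease_shuffled.drop("target", axis=1) and ["target"]
X, y = make_classification(n_samples=300, n_features=13, random_state=42)

# Normal train/test split, no manual validation set needed
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf = RandomForestClassifier(n_jobs=1)

rs_clf = RandomizedSearchCV(
    estimator=clf,
    param_distributions=grid,
    n_iter=10,  # number of candidate combinations to try
    cv=5,       # 5-fold cross-validation
    verbose=2,  # print progress while fitting
)
```

Calling `rs_clf.fit(X_train, y_train)` from here would run the search; it's left out of this cell because fitting 10 candidates of up to 1200 trees takes a while.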
85 00:06:46,850 --> 00:06:48,000 That's right. 86 00:06:48,400 --> 00:06:50,890 We'll press Shift+Tab here now. 87 00:06:51,020 --> 00:06:52,780 So what's the docstring here? 88 00:06:52,840 --> 00:06:58,800 Randomized search on hyperparameters. RandomizedSearchCV implements a fit and a score method. 89 00:06:58,910 --> 00:07:04,080 It also implements predict, predict_proba, decision_function, transform and inverse_transform, 90 00:07:04,100 --> 00:07:11,180 if they're implemented in the estimator used. So, a randomized search on hyperparameters. That can sound 91 00:07:11,180 --> 00:07:18,100 a little bit confusing. What does it mean by randomized search? Well, that's in the name: Randomized Search 92 00:07:18,100 --> 00:07:23,170 CV. CV is for cross-validation, which is where you might recognize this parameter, cv=5. 93 00:07:23,170 --> 00:07:27,310 So that means we're using five-fold cross-validation. 94 00:07:27,310 --> 00:07:29,240 So, five-fold cross-validation. 95 00:07:29,260 --> 00:07:33,100 This is why we don't necessarily have to create a validation set here. 96 00:07:33,100 --> 00:07:41,560 So what RandomizedSearchCV will do is it will take our classifier, and it'll take our param_distributions 97 00:07:41,560 --> 00:07:52,270 grid, which is this, and it's going to search over this grid 10 different times, trying different combinations 98 00:07:52,360 --> 00:07:59,090 of these parameters at random. So, for example, on its first iteration... 99 00:07:59,130 --> 00:08:04,170 So, because it's doing n_iter=10, it's gonna do this 10 times. On its first iteration,
100 00:08:04,170 --> 00:08:12,810 it might try a model with 10 estimators, a max_depth of 9, max_features set to "auto", min_samples_ 101 00:08:12,990 --> 00:08:20,430 split set to 2, min_samples_leaf set to 1, and then on the next iteration, so iteration 2 out of 10, 102 00:08:20,760 --> 00:08:28,710 it might try 1000 estimators, 30 as the max_depth, "auto" as the max_features, 4 as the min_samples_ 103 00:08:28,710 --> 00:08:33,960 split, and then 4 as the min_samples_leaf. Then it's gonna keep going like that, up to 10. 104 00:08:33,960 --> 00:08:36,240 Now, we could set this to 100, to 1000. 105 00:08:36,240 --> 00:08:42,300 I don't think there's even a thousand combinations in here. If you went through, you could go 6 times 106 00:08:42,300 --> 00:08:49,290 5, that's 30, times 2, that's 60, times 3, that's 180, times 3, that's 540. 107 00:08:49,530 --> 00:08:53,040 My math may be a little bit off there, but you get the point, right? If you went through all of these 108 00:08:53,040 --> 00:08:58,330 and tried every single combination, that's gonna be a lot of different models. 109 00:08:58,570 --> 00:09:01,080 So let's see this in action. 110 00:09:01,170 --> 00:09:08,510 So once we've set it up, instantiated RandomizedSearchCV, we'll go here: fit the randomized 111 00:09:11,060 --> 00:09:14,940 search CV version of clf. 112 00:09:17,050 --> 00:09:23,870 So rs_clf.fit(X_train, y_train). 113 00:09:23,930 --> 00:09:29,870 Now, the reason we only have to fit it on X_train and y_train is because the CV in RandomizedSearchCV stands 114 00:09:29,870 --> 00:09:31,650 for cross-validation. 115 00:09:31,760 --> 00:09:36,770 So that is what it's going to do: it's going to automatically make our validation sets for us, which is 116 00:09:36,770 --> 00:09:38,230 a beautiful thing.
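The size of an exhaustive search is just the product of the number of values per hyperparameter, and with the per-parameter counts used in the spoken arithmetic (6, 5, 2, 3, 3 — assumed here, since the grid isn't fully legible in the transcript) that product is 540. A quick sketch:

```python
from math import prod

# Assumed value counts per hyperparameter: 6 n_estimators values,
# 5 max_depth, 2 max_features, 3 min_samples_split, 3 min_samples_leaf.
value_counts = [6, 5, 2, 3, 3]

# 6 * 5 = 30, * 2 = 60, * 3 = 180, * 3 = 540
total_combinations = prod(value_counts)
print(total_combinations)  # 540
```

This is why n_iter=10 is a shortcut: a randomized search samples 10 of those 540 combinations instead of fitting all of them.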
117 00:09:38,270 --> 00:09:43,850 So if we come back to our diagram, it's going to take our data and then it's gonna try different hyper 118 00:09:43,850 --> 00:09:49,370 parameters, cross-validating on different hyperparameter settings. 119 00:09:49,550 --> 00:09:51,600 So that's what it's going to return. 120 00:09:51,760 --> 00:09:59,750 It's gonna figure out which combination of these hyperparameters is the best, out of 10 different models. 121 00:09:59,750 --> 00:10:03,200 And again, we can increase this if we wanted to try more combinations. 122 00:10:03,470 --> 00:10:05,340 So let's see it in action. 123 00:10:08,670 --> 00:10:14,120 So, because we've set verbose to 2, it's going to output and tell us what's going on. 124 00:10:14,280 --> 00:10:22,290 So, fitting 5 folds for each of 10 candidates, totalling 50 fits. A.k.a., it's trying 10 iterations of different 125 00:10:22,290 --> 00:10:29,760 combinations of parameters in this grid, and splitting each combination five times, because cv equals 126 00:10:29,760 --> 00:10:32,460 5, totalling 50 fits on the data. 127 00:10:32,460 --> 00:10:38,100 So it's going to run this fit function 50 different times, using different hyperparameters on different 128 00:10:38,220 --> 00:10:44,740 splits of the data. So we can see here, the first model it's trying has n_estimators=1200, 129 00:10:44,980 --> 00:10:48,590 because this is one of our hyperparameter values up here. 130 00:10:48,650 --> 00:10:54,500 Remember, each split, each iteration, it's going to pick combinations of these at random. 131 00:10:54,600 --> 00:11:03,510 It's also set min_samples_split to 6 here, min_samples_leaf to 2, max_features to "sqrt" and 132 00:11:03,510 --> 00:11:05,350 max_depth to 5. 133 00:11:05,580 --> 00:11:06,660 Okay. 134 00:11:06,660 --> 00:11:11,550 And then if we were to scroll through here, we can see all of the different combinations it's tried. So 135 00:11:11,550 --> 00:11:12,910 let's pick one at random.
136 00:11:13,140 --> 00:11:14,300 This one here. 137 00:11:14,400 --> 00:11:21,120 So this combination is trying n_estimators=200, min_samples_split=6, min_samples_ 138 00:11:21,120 --> 00:11:25,980 leaf=2, max_features="sqrt" and max_depth=None. 139 00:11:26,180 --> 00:11:33,640 And then it's gonna keep going until it's finished. It's gonna give us a warning here, so let's see. Once 140 00:11:33,640 --> 00:11:34,750 it's finished, 141 00:11:34,750 --> 00:11:39,220 what we'll be able to do is call best_params_ on it. 142 00:11:39,220 --> 00:11:46,630 So if we go rs_clf.best_params_, this is going to show us the combination of hyperparameters, 143 00:11:47,200 --> 00:11:51,090 which combination of these, that got the best results. 144 00:11:55,060 --> 00:11:56,130 Wonderful. 145 00:11:56,170 --> 00:12:04,690 So we're at n_estimators=200, min_samples_split=6, min_samples_leaf=2, max_features 146 00:12:04,690 --> 00:12:08,180 ="sqrt" and max_depth=None. 147 00:12:08,230 --> 00:12:16,060 They were the best cross-validated results across 10 different models, and now, when we call predict on 148 00:12:16,630 --> 00:12:21,580 our randomized search classifier, by default it's going to use these parameters. 149 00:12:21,600 --> 00:12:27,970 So instead of finding these by hand, RandomizedSearchCV has found them for us. 150 00:12:28,240 --> 00:12:28,940 So let's do it.
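The fit-then-inspect flow described above can be sketched end to end. This is a minimal, hedged sketch rather than the course notebook: a synthetic dataset stands in for the heart-disease data, and the grid and n_iter are shrunk so the search finishes in seconds rather than minutes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Deliberately tiny stand-in grid so the example runs quickly;
# the video's grid is much larger.
grid = {"n_estimators": [10, 50], "max_depth": [None, 5]}

np.random.seed(42)
X, y = make_classification(n_samples=200, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

rs_clf = RandomizedSearchCV(
    estimator=RandomForestClassifier(n_jobs=1),
    param_distributions=grid,
    n_iter=4,   # 4 candidates x 5 folds = 20 fits, logged by verbose=2
    cv=5,
    verbose=2,
)

# Only X_train/y_train: the CV part makes the validation splits for us
rs_clf.fit(X_train, y_train)

# The best cross-validated combination found during the search
print(rs_clf.best_params_)
```

After fitting, `best_params_` is a plain dictionary keyed by the same names as the grid, so you can read off exactly which sampled combination won.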
151 00:12:28,960 --> 00:12:34,390 Let's make predictions with the best hyperparameters. 152 00:12:38,800 --> 00:12:47,300 What we might do is rs_y_preds = rs_clf.predict... We could use the validation 153 00:12:47,300 --> 00:12:54,740 set here, but in our case we're gonna use the test set. And then we'll go evaluate the predictions: 154 00:12:58,670 --> 00:13:02,740 rs_metrics = evaluate 155 00:13:05,660 --> 00:13:10,090 (y_test, rs_y_preds). 156 00:13:16,190 --> 00:13:22,800 Oh no, the function is supposed to be evaluate_preds. There we go. Wonderful. 157 00:13:22,930 --> 00:13:28,320 So if we come back up here, did we see an improvement here? 158 00:13:28,320 --> 00:13:30,840 No, we didn't. 159 00:13:30,840 --> 00:13:37,230 And so this is sort of where the experimentation comes in. With hyperparameter tuning, you won't always 160 00:13:37,230 --> 00:13:40,650 find an improvement after running something like this. 161 00:13:40,650 --> 00:13:45,660 Maybe we could run it for longer and try 50 different combinations. 162 00:13:45,660 --> 00:13:51,360 And after it's tried 50 different combinations, it might find some parameters which end up being better 163 00:13:51,360 --> 00:13:54,210 than our manually tuned result. 164 00:13:54,210 --> 00:14:00,030 But what I hope you're starting to see is that using RandomizedSearchCV, rather than running through 165 00:14:00,030 --> 00:14:08,310 all these different settings by hand, gives us a way to codify, or functionize, the tuning of hyper 166 00:14:08,310 --> 00:14:09,690 parameters. 167 00:14:09,900 --> 00:14:15,390 And so now there's one more way we can use to improve our model's hyperparameters, and it's with Grid 168 00:14:15,390 --> 00:14:16,500 SearchCV. 169 00:14:16,860 --> 00:14:19,700 So it's kind of similar to randomized search, but it's got one... 170 00:14:19,710 --> 00:14:21,140 one key difference. 171 00:14:21,240 --> 00:14:22,560 We'll have a look at that in the next video.
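The predict-and-evaluate step can be sketched like so. Two things here are hedged stand-ins: `evaluate_preds` approximates the evaluation function written earlier in the course (its exact body isn't shown in this video, so a plausible dictionary-of-metrics version is assumed), and a synthetic dataset replaces the heart-disease data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import RandomizedSearchCV, train_test_split

def evaluate_preds(y_true, y_preds):
    """Assumed stand-in for the course's earlier evaluation function:
    returns a dictionary of common classification metrics."""
    return {
        "accuracy": accuracy_score(y_true, y_preds),
        "precision": precision_score(y_true, y_preds),
        "recall": recall_score(y_true, y_preds),
        "f1": f1_score(y_true, y_preds),
    }

np.random.seed(42)
X, y = make_classification(n_samples=200, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Small grid so the example runs quickly
rs_clf = RandomizedSearchCV(
    RandomForestClassifier(n_jobs=1),
    param_distributions={"n_estimators": [10, 50], "max_depth": [None, 5]},
    n_iter=4,
    cv=5,
)
rs_clf.fit(X_train, y_train)

# predict() uses the best found hyperparameters (refit=True by default),
# and we evaluate on the held-out test set, as in the video
rs_y_preds = rs_clf.predict(X_test)
rs_metrics = evaluate_preds(y_test, rs_y_preds)
print(rs_metrics)
```

Comparing `rs_metrics` against the hand-tuned model's metrics is exactly the "did we see an improvement?" check done in the video; sometimes, as here, the answer is no, which is why trying more iterations can help.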