In the last video we saw how we can put together a number of different steps using Pipeline, and now it's also possible to use GridSearchCV, which we've seen before, or RandomizedSearchCV with a pipeline. There's a slight difference here, and it will make sense once we run the code. So let's go here. Actually, we'll leave a little comment: it's also possible to use GridSearchCV or RandomizedSearchCV with our pipeline. And remember, the purpose of using GridSearchCV or RandomizedSearchCV is to try and find a better set of hyperparameters. We've seen searching for hyperparameters with a classification model, but the process is very similar with a regression model. Once you learn it for one scikit-learn model, thanks to how well the library is designed, you can use those principles for other types of models. Now let's see how we could use GridSearchCV to find better hyperparameters, better settings, on our RandomForestRegressor, and hopefully improve this score. So let's type it in here: use GridSearchCV with our regression pipeline. First things first, we create a grid, so `pipe_grid`. This is going to be the grid of hyperparameters our GridSearchCV is going to search over.
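As a reminder, the pipeline from the last video looked something like this. The column names and exact transformers are assumptions for illustration; what matters for grid search are the step names: "preprocessor", "num", "imputer", and "model".

```python
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestRegressor

# Numeric features: fill missing values with the mean
numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="mean")),
])

# Categorical features: fill missing values, then one-hot encode
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, ["odometer"]),           # assumed numeric column
    ("cat", categorical_transformer, ["make", "colour"]), # assumed categorical columns
])

model = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("model", RandomForestRegressor()),
])
```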
Now the main difference here comes in how you set up the keys of your hyperparameter dictionary. Let's see one for example, then we can talk about it. So `preprocessor`, double underscore, `num`, double underscore, `imputer`, double underscore, `strategy`. You might be like, "Daniel, that's a lot of underscores, double underscores, what is even happening here?" Well, let's finish this out and then we can have a look. So, having a look at this, what is this key referring to? If we go back up here and look at our pipeline, the name of this step in the pipeline is `preprocessor`. Okay, so you can see those line up. Then we've got a double underscore here, so it goes to `num`: if we come up and have a look in our preprocessor, it's going to step into our `num` step. Okay, that certainly makes sense. So `num`, and then `imputer`, underscore underscore, `strategy`. We have to refer to `num`, so our imputer here is inside our numeric transformer, which is also a pipeline, and that goes to `imputer`. Okay, yeah, that's where `imputer` comes from. And then double underscore `strategy`. Okay.
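One way to sanity-check these double-underscore keys yourself is to call `get_params()` on the pipeline: each double underscore steps one level deeper into a named step. A minimal sketch, with step names matching the video and everything else assumed:

```python
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestRegressor

# Minimal stand-in for the video's pipeline (the column selection is assumed)
pipe = Pipeline(steps=[
    ("preprocessor", ColumnTransformer(transformers=[
        ("num", Pipeline(steps=[("imputer", SimpleImputer(strategy="mean"))]), [0]),
    ])),
    ("model", RandomForestRegressor()),
])

# "preprocessor" -> "num" -> "imputer" -> its "strategy" parameter
key = "preprocessor__num__imputer__strategy"
print(key in pipe.get_params())     # True
pipe.set_params(**{key: "median"})  # the same key works for set_params
print(pipe.get_params()[key])       # median
```

This is exactly how GridSearchCV reaches inside the pipeline: it calls `set_params` with each key for each candidate combination.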
So if you trace it back, it's going to say: try the imputation strategy of "mean", which is what it is now, and "median". This is the main difference between what we've seen before with GridSearchCV and using GridSearchCV with a pipeline. This is where the names come in, the strings at the front of all our steps. You might have been wondering before: what are these strings doing at the start? So if we want to access our preprocessor's `num` step, then the `imputer` attribute of `num`, and then the `strategy` parameter of the imputer, we have to do it like this. So let's step back through. We go `preprocessor`: yeah, this one. Then we go up here, preprocessor, yep, `num`. Okay, then we go up to `num`, which is our numeric transformer, and we see `imputer`, yep. And then we're going to adjust `strategy`. Okay, so those are the parameters for that. Now let's see how we would access our model. So we can go `model`, double underscore... well, shift-enter again, getting trigger happy... `model`, double underscore, `n_estimators`. We only need one double underscore here because we're only going up one level. If we access our model, which is a pipeline, we want to access the `model` step.
So this is where the `model` double underscore comes in, and we want to pass it `n_estimators`, which would be the same as going in here and setting `n_estimators` directly. But we're not going to do it there; we can do it here. We're going to pass it 100 and 1,000. So again, this is just going into the `model` step of our pipeline, which is here, and passing RandomForestRegressor's `n_estimators` these two values. Then we can do the same for other parameters we've seen before. So `model`, underscore underscore, `max_depth`, and we're going to pass it, let's go, None and 5. Then we'll go `model`, double underscore, `max_features`, and we're going to pass it "auto". Actually, yeah, we'll just keep it as one value there to save some time on our grid search. Then `model`, double underscore, `min_samples_split`, and we'll pass it 2 and 4. Okay. Wonderful. So now we've got our parameter grid set up, how would we run GridSearchCV? We've already imported GridSearchCV from model_selection up here, but for completeness what we might do is do it again. You don't need to do this if you've already imported it, but we're just doing it so you can see where it's from. So from sklearn.model_selection import GridSearchCV, and we'll create a grid search model, `gs_model = GridSearchCV(...)`. We pass it our model, which is again here in the form of a pipeline; we pass it `pipe_grid`, which is this dictionary here of different parameters it's going to search over; then we pass it cv=5 for five-fold cross-validation; and we pass it a verbose level of 2 so it prints out a little bit of progress for us and we can see what's happening. Then `gs_model.fit`, and we give it X_train, y_train. Beautiful. So this is going through five folds for each of the 16 candidates. If you wanted to work that out, you've got two values here, times two, times two, times one, times two: that's 2 × 2 = 4, × 2 = 8, × 1 = 8, × 2 = 16. So that's 16 different combinations of parameters, and we're doing five-fold cross-validation, so it's going to do 16 × 5, which is 80 fits. So once that's finished, we've got an output here of all the different combinations of parameters it's trying. And once it's finished, we'll be able to evaluate our grid search model using `gs_model.score` and pass it the test data. So this is the score we got before without grid search, just the baseline hyperparameters of RandomForestRegressor. Now, depending on how fast your computer is, you might need to wait a little while.
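Putting the whole flow together, it might look like the sketch below. This uses synthetic data in place of the video's dataset, and smaller `n_estimators` values so it runs quickly (the video uses 100 and 1,000). Also note that "auto" for `max_features` was removed in newer scikit-learn versions, so "sqrt" stands in for it here.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic numeric data with missing values, standing in for the video's dataset
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X[:, 0] * 2 + rng.normal(scale=0.1, size=200)
X[rng.random(X.shape) < 0.1] = np.nan  # sprinkle in ~10% missing values

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

numeric_transformer = Pipeline(steps=[("imputer", SimpleImputer(strategy="mean"))])
preprocessor = ColumnTransformer(transformers=[("num", numeric_transformer, [0, 1, 2])])
model = Pipeline(steps=[("preprocessor", preprocessor),
                        ("model", RandomForestRegressor())])

# Grid of pipeline hyperparameters: keys are step names joined by double
# underscores. Value ranges are scaled down here to keep the sketch fast.
pipe_grid = {
    "preprocessor__num__imputer__strategy": ["mean", "median"],
    "model__n_estimators": [10, 50],
    "model__max_depth": [None, 5],
    "model__max_features": ["sqrt"],  # the video used "auto", removed in newer sklearn
    "model__min_samples_split": [2, 4],
}

# 2 x 2 x 2 x 1 x 2 = 16 candidates, times cv=5 folds = 80 fits
gs_model = GridSearchCV(model, pipe_grid, cv=5, verbose=2)
gs_model.fit(X_train, y_train)
print(gs_model.score(X_test, y_test))  # R^2 on the test set
```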
So this is why it's always important to be careful how many parameters you pass to GridSearchCV, because it is going to exhaustively search every single combination. So what we might do is pause the video and wait until our model has finished training, and then see what its score is. Excellent. So after about a minute or so, our grid search is finished, and we can see that if we evaluate our grid searched model, it gets a score of 0.333 now, versus our original model score of 0.182. That's pretty good; that's almost double the score, just from searching for different hyperparameters using a grid search. Now, we could keep this going with more different parameters, but for the sake of time we're going to leave it there and say we've covered a lot. And if you've made it this far, congratulations. Let's have a look here at what we've covered. Shift-enter: "what we're covering" should be "what we've covered" now. So we've gone through an end-to-end scikit-learn workflow. We've seen how to get data ready. We've seen how to choose the right estimator/algorithm for our problems. We've seen how to fit models/algorithms and use them to make predictions on our data. We've seen how to evaluate models. We've seen how to improve models.
We've seen how to save and load trained models. And now we've just put it all together using a pipeline. We fit everything that we did before, which took multiple cells, into one cell of a Jupyter notebook; we've got our entire pipeline using scikit-learn's Pipeline class. Now if you've made it this far, it's worth patting yourself on the back, so congratulations. We've covered a lot of ground in the scikit-learn library, but as you might have guessed, there's a lot more that's still there. But with what you've learned so far, you'll be able to take it on. You'll be able to look at scikit-learn workflows, you'll be able to look at different machine learning problems and go: okay, I can kind of figure out that I have to get the data ready first; I can see if I can choose the right estimator/algorithm for our problem; I can go through trying to fit my model to the data and make predictions; then I can try to evaluate it and improve it, save and load it; and after I've written a lot of messy code, I can try to pull it all together and clean it all up. And don't worry, putting all this together, it's not like you're going to learn it overnight. I don't expect you to, even after going through all the videos that we've covered. It's going to take a lot of practice, right? And hopefully you can use the resources that are available to you.
So all the code that we've got here will be available in the resources section, in a notebook you can refer back to. But if you're looking for where to go next, the best place is the scikit-learn documentation. So, the scikit-learn documentation: if there's something that sticks out to you, something you want to learn a little bit more about, one of these topics that we've gone through and covered that you want to dig a little bit deeper into, or maybe you want to look at different classification models, regression models, or clustering, something that we haven't seen much of yet, go to the scikit-learn documentation, go to the user guide, and step through all of these different categories here and see if there's a topic that you want to read a bit deeper into. And again, it's going to be a little bit overwhelming to begin with, but with practice you'll be able to start to make your own mental models of how you can use the scikit-learn library to solve your own machine learning problems. Now, with that being said, I'll see you in the next section, where we're going to take everything that we've learned in the previous sections, all these tools, put them together, and start to work on some of our own projects.
So take a break, revise what we've learned, grab the resources, check out the documentation, and get yourself excited for the next section. I'll see you there.