In the last video we saw how we can put together a number of different steps using Pipeline, and now it's also possible to use GridSearchCV, which we've seen before, or RandomizedSearchCV with a pipeline. There's a slight difference here, and it will make sense once we run the code. So let's go here. Actually, we'll leave a little comment: it's also possible to use GridSearchCV or RandomizedSearchCV with our pipeline. And remember, the purpose of using GridSearchCV or RandomizedSearchCV is to try and find a better set of hyperparameters. We've seen searching for hyperparameters with a classification model, but the process is very similar with a regression model. Once you learn it for one scikit-learn model, thanks to how well the library is designed, you can use those principles for other types of models. Now let's see how we could use GridSearchCV to find better hyperparameters, better settings, on our RandomForestRegressor, and hopefully improve this score. So let's type it in here: use GridSearchCV with our regression pipeline. First things first, we create a grid, so `pipe_grid`. This is going to be the grid of hyperparameters our GridSearchCV is going to search over.
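As a reminder, the pipeline from the last video looked something like this. The column names and exact transformers are assumptions for illustration; what matters for grid search are the step names: "preprocessor", "num", "imputer", and "model".

```python
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestRegressor

# Numeric features: fill missing values with the mean
numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="mean")),
])

# Categorical features: fill missing values, then one-hot encode
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, ["odometer"]),           # assumed numeric column
    ("cat", categorical_transformer, ["make", "colour"]), # assumed categorical columns
])

model = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("model", RandomForestRegressor()),
])
```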
Now the main difference here comes in how you set up the keys of your hyperparameter dictionary. Let's see one for example, then we can talk about it. So `preprocessor`, double underscore, `num`, double underscore, `imputer`, double underscore, `strategy`. You might be like, "Daniel, that's a lot of underscores, double underscores, what is even happening here?" Well, let's finish this out and then we can have a look. So, having a look at this, what is this key referring to? If we go back up here and look at our pipeline, the name of this step in the pipeline is `preprocessor`. Okay, so you can see those line up. Then we've got a double underscore here, so it goes to `num`: if we come up and have a look in our preprocessor, it's going to step into our `num` step. Okay, that certainly makes sense. So `num`, and then `imputer`, underscore underscore, `strategy`. We have to refer to `num`, so our imputer here is inside our numeric transformer, which is also a pipeline, and that goes to `imputer`. Okay, yeah, that's where `imputer` comes from. And then double underscore `strategy`. Okay.
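One way to sanity-check these double-underscore keys yourself is to call `get_params()` on the pipeline: each double underscore steps one level deeper into a named step. A minimal sketch, with step names matching the video and everything else assumed:

```python
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestRegressor

# Minimal stand-in for the video's pipeline (the column selection is assumed)
pipe = Pipeline(steps=[
    ("preprocessor", ColumnTransformer(transformers=[
        ("num", Pipeline(steps=[("imputer", SimpleImputer(strategy="mean"))]), [0]),
    ])),
    ("model", RandomForestRegressor()),
])

# "preprocessor" -> "num" -> "imputer" -> its "strategy" parameter
key = "preprocessor__num__imputer__strategy"
print(key in pipe.get_params())     # True
pipe.set_params(**{key: "median"})  # the same key works for set_params
print(pipe.get_params()[key])       # median
```

This is exactly how GridSearchCV reaches inside the pipeline: it calls `set_params` with each key for each candidate combination.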
So if you trace it back, it's going to say: try the imputation strategy of "mean", which is what it is now, and "median". This is the main difference between what we've seen before with GridSearchCV and using GridSearchCV with a pipeline. This is where the names come in, the strings at the front of all our steps. You might have been wondering before: what are these strings doing at the start? So if we want to access our preprocessor's `num` step, then the `imputer` attribute of `num`, and then the `strategy` parameter of the imputer, we have to do it like this. So let's step back through. We go `preprocessor`: yeah, this one. Then we go up here, preprocessor, yep, `num`. Okay, then we go up to `num`, which is our numeric transformer, and we see `imputer`, yep. And then we're going to adjust `strategy`. Okay, so those are the parameters for that. Now let's see how we would access our model. So we can go `model`, double underscore... well, shift-enter again, getting trigger happy... `model`, double underscore, `n_estimators`. We only need one double underscore here because we're only going up one level. If we access our model, which is a pipeline, we want to access the `model` step.
So this is where the `model` double underscore comes in, and we want to pass it `n_estimators`, which would be the same as going in here and setting `n_estimators` directly. But we're not going to do it there; we can do it here. We're going to pass it 100 and 1,000. So again, this is just going into the `model` step of our pipeline, which is here, and passing RandomForestRegressor's `n_estimators` these two values. Then we can do the same for other parameters we've seen before. So `model`, underscore underscore, `max_depth`, and we're going to pass it, let's go, None and 5. Then we'll go `model`, double underscore, `max_features`, and we're going to pass it "auto". Actually, yeah, we'll just keep it as one value there to save some time on our grid search. Then `model`, double underscore, `min_samples_split`, and we'll pass it 2 and 4. Okay. Wonderful. So now we've got our parameter grid set up, how would we run GridSearchCV? We've already imported GridSearchCV from model_selection up here, but for completeness what we might do is do it again. You don't need to do this if you've already imported it, but we're just doing it so you can see where it's from. So from sklearn.model_selection import GridSearchCV, and we'll create a grid search model, `gs_model = GridSearchCV(...)`. We pass it our model, which is again here in the form of a pipeline; we pass it `pipe_grid`, which is this dictionary here of different parameters it's going to search over; then we pass it cv=5 for five-fold cross-validation; and we pass it a verbose level of 2 so it prints out a little bit of progress for us and we can see what's happening. Then `gs_model.fit`, and we give it X_train, y_train. Beautiful. So this is going through five folds for each of the 16 candidates. If you wanted to work that out, you've got two values here, times two, times two, times one, times two: that's 2 × 2 = 4, × 2 = 8, × 1 = 8, × 2 = 16. So that's 16 different combinations of parameters, and we're doing five-fold cross-validation, so it's going to do 16 × 5, which is 80 fits. So once that's finished, we've got an output here of all the different combinations of parameters it's trying. And once it's finished, we'll be able to evaluate our grid search model using `gs_model.score` and pass it the test data. So this is the score we got before without grid search, just the baseline hyperparameters of RandomForestRegressor. Now, depending on how fast your computer is, you might need to wait a little while.
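Putting the whole flow together, it might look like the sketch below. This uses synthetic data in place of the video's dataset, and smaller `n_estimators` values so it runs quickly (the video uses 100 and 1,000). Also note that "auto" for `max_features` was removed in newer scikit-learn versions, so "sqrt" stands in for it here.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic numeric data with missing values, standing in for the video's dataset
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X[:, 0] * 2 + rng.normal(scale=0.1, size=200)
X[rng.random(X.shape) < 0.1] = np.nan  # sprinkle in ~10% missing values

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

numeric_transformer = Pipeline(steps=[("imputer", SimpleImputer(strategy="mean"))])
preprocessor = ColumnTransformer(transformers=[("num", numeric_transformer, [0, 1, 2])])
model = Pipeline(steps=[("preprocessor", preprocessor),
                        ("model", RandomForestRegressor())])

# Grid of pipeline hyperparameters: keys are step names joined by double
# underscores. Value ranges are scaled down here to keep the sketch fast.
pipe_grid = {
    "preprocessor__num__imputer__strategy": ["mean", "median"],
    "model__n_estimators": [10, 50],
    "model__max_depth": [None, 5],
    "model__max_features": ["sqrt"],  # the video used "auto", removed in newer sklearn
    "model__min_samples_split": [2, 4],
}

# 2 x 2 x 2 x 1 x 2 = 16 candidates, times cv=5 folds = 80 fits
gs_model = GridSearchCV(model, pipe_grid, cv=5, verbose=2)
gs_model.fit(X_train, y_train)
print(gs_model.score(X_test, y_test))  # R^2 on the test set
```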
So this is why it's always important to be careful how many parameters you pass to GridSearchCV, because it is going to exhaustively search every single combination. So what we might do is pause the video and wait until our model has finished training, and then see what its score is. Excellent. So after about a minute or so, our grid search is finished, and we can see that if we evaluate our grid searched model, it gets a score of 0.333 now, versus our original model score of 0.182. That's pretty good; that's almost double the score, just from searching for different hyperparameters using a grid search. Now, we could keep this going with more different parameters, but for the sake of time we're going to leave it there and say we've covered a lot. And if you've made it this far, congratulations. Let's have a look here at what we've covered. Shift-enter: "what we're covering" should be "what we've covered" now. So we've gone through an end-to-end scikit-learn workflow. We've seen how to get data ready. We've seen how to choose the right estimator/algorithm for our problems. We've seen how to fit models/algorithms and use them to make predictions on our data. We've seen how to evaluate models. We've seen how to improve models.
We've seen how to save and load trained models. And now we've just put it all together using a pipeline. We fit everything that we did before, which took multiple cells, into one cell of a Jupyter notebook; we've got our entire pipeline using scikit-learn's Pipeline class. Now if you've made it this far, it's worth patting yourself on the back, so congratulations. We've covered a lot of ground in the scikit-learn library, but as you might have guessed, there's a lot more that's still there. But with what you've learned so far, you'll be able to take it on. You'll be able to look at scikit-learn workflows, you'll be able to look at different machine learning problems and go: okay, I can kind of figure out that I have to get the data ready first; I can see if I can choose the right estimator/algorithm for our problem; I can go through trying to fit my model to the data and make predictions; then I can try to evaluate it and improve it, save and load it; and after I've written a lot of messy code, I can try to pull it all together and clean it all up. And don't worry, putting all this together, it's not like you're going to learn it overnight. I don't expect you to, even after going through all the videos that we've covered. It's going to take a lot of practice, right? And hopefully you can use the resources that are available to you.
So all the code that we've got here will be available in the resources section, in a notebook you can refer back to. But if you're looking for where to go next, the best place is the scikit-learn documentation. So, the scikit-learn documentation: if there's something that sticks out to you, something you want to learn a little bit more about, one of these topics that we've gone through and covered that you want to dig a little bit deeper into, or maybe you want to look at different classification models, regression models, or clustering, something that we haven't seen much of yet, go to the scikit-learn documentation, go to the user guide, and step through all of these different categories here and see if there's a topic that you want to read a bit deeper into. And again, it's going to be a little bit overwhelming to begin with, but with practice you'll be able to start to make your own mental models of how you can use the scikit-learn library to solve your own machine learning problems. Now, with that being said, I'll see you in the next section, where we're going to take everything that we've learned in the previous sections, all these tools, put them together, and start to work on some of our own projects.
So take a break, revise what we've learned, grab the resources, check out the documentation, and get yourself excited for the next section. I'll see you there.