Welcome back. In the last video we tuned a model's hyperparameters using RandomizedSearchCV. What that did was: we defined a space, or dictionary, of different hyperparameter values we'd like to try, and then RandomizedSearchCV combined those parameters at random, built classifiers with the different settings, and evaluated them all using cross-validation. Or — we might change this back to 10; we don't want 50 models running off. And then, after trying 10 different combinations, it found the best parameters to be this set of values here. Then after evaluating them we saw it didn't really do as well as our previous model up here, where we chose the parameters by hand. So finally, a workflow you might end up with when you're tuning hyperparameters is: start off by hand, then use RandomizedSearchCV across a space of hyperparameters, and then, once you've found some pretty good hyperparameters, you'll probably finish up with GridSearchCV. Let's see that in action. So: 5.3, hyperparameter tuning with GridSearchCV. Beautiful. And let's check out our grid. You might notice GridSearchCV has the word grid in it, and our grid of hyperparameters at the moment is these values here.
The key difference between RandomizedSearchCV and GridSearchCV is that RandomizedSearchCV has a parameter called n_iter, which is what we saw up here, and which we can set to limit the number of models to try — in our case we used 10. So if we look back up here at the top of this code: fitting 5 folds for each of 10 candidates, totalling 50 fits. GridSearchCV, on the other hand, is kind of like a brute-force search: it will go through every single combination that is available here. So if we have a look, we've got six values there, so it's six times five times two times three times three — there are six values here, five here, two here, three and three — in total 540 different combinations of parameters it would try. That's a lot. And then it's cross-validated, so times that by five: 2,700 models. That's a lot, right? That's going to take a lot of compute power. Training one model might take long enough, but training 2,700 of them, especially if you're working with a large dataset, is probably not going to be suitable for your laptop. You may need a bigger computer — or we can reduce the search space. So let's see that in action.
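The arithmetic above can be checked in a couple of lines. This is a sketch: the grid values below are assumptions standing in for the ones on screen (only the counts per key — 6, 5, 2, 3, 3 — match the video):

```python
from itertools import product

# Hypothetical hyperparameter grid with the same shape as the video's:
# six, five, two, three and three values per key.
grid = {"n_estimators": [10, 100, 200, 500, 1000, 1200],
        "max_depth": [None, 5, 10, 20, 30],
        "max_features": ["sqrt", "log2"],
        "min_samples_split": [2, 4, 6],
        "min_samples_leaf": [1, 2, 4]}

# GridSearchCV tries every combination: 6 * 5 * 2 * 3 * 3
n_combinations = len(list(product(*grid.values())))
print(n_combinations)      # 540 combinations
print(n_combinations * 5)  # 2700 model fits with 5-fold cross-validation
```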
What I mean by that is we'll create grid_2 using a little bit more of a confined hyperparameter space — in essence, just reducing the number of hyperparameters GridSearchCV has to go through. So let's see this in action. What we might do is just copy this. Now, how would you create grid_2 in practice? We could just copy and paste it, which we've done. I want to line these up so it all looks uniform — there we go. We could just copy and paste it in and then delete values at random — delete, delete — but we might not do that. Say we've gone through RandomizedSearchCV — let's say we'd upped n_iter to 50 different models like we had before, though in our case we've only actually done 10 — we might take the best parameters from our RandomizedSearchCV and use them to influence where we'd like GridSearchCV to search. So let's do that. Our best parameters are 200, 6 and square root. So what we might do: remove 10, keep 500, and remove these two. Yep, max_depth is None, so we might just keep that as None, so it's only got one option there. Then for max_features we might keep 'auto' and 'sqrt'. Then what else do we have? min_samples_split is 6, so we might get rid of 4 and 2. And then min_samples_leaf — we might keep that as 1 and 2.
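Putting that refinement together, grid_2 might look like the following sketch. The exact n_estimators values are an assumption reconstructed from the narration (the best RandomizedSearchCV parameters mentioned were n_estimators=200, max_features='sqrt', min_samples_split=6); note also that 'auto' was later removed as a max_features option in newer scikit-learn versions, so 'sqrt'/'log2' are the safer choices today:

```python
from itertools import product

# grid_2: a refined search space built around RandomizedSearchCV's
# best parameters (values here are illustrative assumptions).
grid_2 = {"n_estimators": [100, 200, 500],
          "max_depth": [None],
          "max_features": ["auto", "sqrt"],
          "min_samples_split": [6],
          "min_samples_leaf": [1, 2]}

# 3 * 1 * 2 * 1 * 2 = 12 combinations, or 60 fits with 5-fold CV
n_combinations = len(list(product(*grid_2.values())))
print(n_combinations)  # 12
```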
So what have we done here? Well, we've reduced our search space of hyperparameters. Remember, we had 540 different combinations before — what do we have now? Three times one times two times one times two: twelve. And then it's going to be cross-validated five times (we can adjust this using the cv parameter), so 60 fits versus 2,700. That's a lot fewer. So how would we do this in practice? How would we use GridSearchCV to find the best hyperparameters in this space? Well, as always, let's see it in action: from sklearn.model_selection import GridSearchCV — and we want to import train_test_split as well. We probably already have that, but we can put it there. Set the random seed — wonderful — and split the data. What we might do, actually, is just bring this code up here, because it's the exact same code we used before. So we'll just take this — and we're going to change one thing — come down, wonderful. So we're using the same dataset as our RandomizedSearchCV, making the same split, and instantiating a RandomForestClassifier. But this time we're going to set up GridSearchCV — grid search CV, wonderful — and then we're going to fit it. And now there's a different parameter here.
This one is going to be param_grid, I believe — and it's not going to have an n_iter, because GridSearchCV is like brute force: it tries every single combination. And we want it to try grid_2, not the first grid, because grid_2 has a few less options, as we saw before. So we'll come down here, keep cross-validation as five-fold, verbose equals 2. Okay. Now let's step through what's happening. We're importing GridSearchCV. We've created grid_2, which is a refined search space of different hyperparameters — different settings. It's much like adjusting the oven when you're working on your favourite meal: if you set the temperature to 400 degrees, that's way too high; so if it starts off at 180 degrees and does okay, you might go, why would I jump right up to 400 or 500 degrees when I can just make a little jump from there? So that's why we've created grid_2: we've based it off the best hyperparameters that our RandomizedSearchCV has found. So let's do it — let's build a random forest classifier using GridSearchCV. Here's how it's going to go. Same thing: fitting 5 folds for each of 12 candidates, totalling 60 fits. Now, what GridSearchCV is going to do, just like RandomizedSearchCV —
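A minimal, self-contained sketch of the setup described above. The dataset here is synthetic (a stand-in for the video's data), and the grid is trimmed down so the example runs quickly; the key point is param_grid in place of RandomizedSearchCV's param_distributions, and no n_iter:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

np.random.seed(42)

# Synthetic stand-in for the dataset used in the video
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Trimmed-down grid so the example finishes quickly (values are assumptions)
grid_2 = {"n_estimators": [50, 100],
          "max_depth": [None],
          "min_samples_split": [6],
          "min_samples_leaf": [1, 2]}

clf = RandomForestClassifier(n_jobs=-1)

# param_grid, not param_distributions, and no n_iter:
# GridSearchCV tries every combination in the grid.
gs_clf = GridSearchCV(estimator=clf, param_grid=grid_2, cv=5, verbose=1)
gs_clf.fit(X_train, y_train)

print(gs_clf.best_params_)
```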
— it's going to go through all of these different hyperparameters, pass them to our random forest classifier, trying each different combination, and then eventually, when it's finished, we'll be able to go gs_clf.best_params_. Wonderful. So it might take a little while with GridSearchCV — this is something to keep in mind: the more hyperparameters you have in here, the longer GridSearchCV will take to run, because it has to try every single combination of hyperparameters in here. So now, as we did before, we can evaluate our grid search classifier by making predictions with it and then using our evaluation function to figure out how it did. So let's do that: gs_y_preds = gs_clf.predict — we'll predict on the test data, X_test — and then we'll evaluate the predictions: gs_metrics = evaluate_preds, our function that we defined a couple of videos ago, passing y_test and gs_y_preds. Wonderful. And now, again, we see a slight decline in accuracy. Mm hmm. And that just goes to show how much of a process tuning hyperparameters can be, right? It's all about that trial and error.
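The evaluation step might look like this sketch. The evaluate_preds helper is reconstructed here from how it is used in the series (returning a dictionary of metrics), and a plain RandomForestClassifier stands in for the fitted GridSearchCV object — .predict() works the same way on either:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)
from sklearn.model_selection import train_test_split

def evaluate_preds(y_true, y_preds):
    # Reconstruction of the evaluate_preds helper from earlier videos:
    # return the metrics as a dict so they're easy to compare later.
    return {"accuracy": accuracy_score(y_true, y_preds),
            "precision": precision_score(y_true, y_preds),
            "recall": recall_score(y_true, y_preds),
            "f1": f1_score(y_true, y_preds)}

X, y = make_classification(n_samples=300, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Stand-in for the fitted GridSearchCV model (gs_clf in the video)
gs_clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

gs_y_preds = gs_clf.predict(X_test)          # predict on the test data
gs_metrics = evaluate_preds(y_test, gs_y_preds)
print(gs_metrics)
```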
What you could do to fix this is go back up. Your workflow would probably be: try a few settings by hand, as we did right back up here — tuning hyperparameters by hand. This is the workflow you'd probably follow: try just a few of these by hand, see if you can figure something out with the validation set like we've done, and then try some hyperparameter tuning using RandomizedSearchCV — you might, in this case, up this number to 20 or 50 or whatever arbitrary number you land on. Again, this is all experimentation, all to try and figure out which hyperparameters are better, and you might even change the grid you're searching over based on the hyperparameters you find in the scikit-learn documentation for the model you're using. And then, once you've found some good hyperparameters using RandomizedSearchCV, you might take those hyperparameters — these ones here, best_params_ — and create another grid to grid search over, like we've done here. In our case it hasn't improved our model's metrics, but this is where we'd probably start to experiment a little bit more.
Try a few different other metrics, try a few different other hyperparameters. But for completeness, what you'd probably do then, after you've tried a fair few different parameters — you've created a baseline model with the default settings, you've tried to adjust them by hand, you've tried to adjust them with RandomizedSearchCV and you've tried to adjust them with grid search — is compare your models. So let's compare our different models' metrics. Wonderful. And to do so, we're going to create a DataFrame — so pd.DataFrame. This is why we've been returning dictionaries from all of our models and our evaluate_preds function. So baseline is going to be baseline_metrics, then clf_2 is going to be clf_2_metrics — and we can delete these, we don't need them — random search is going to be our rs_metrics, and then grid search is going to be gs_metrics. Wonderful. And then, to see it in action, we're going to go compare_metrics — which is just the DataFrame we've instantiated here — or we might need to move this up; there's going to be an error there, a squiggly bracket that we didn't need — .plot.bar(figsize=(10, 8)). Let's see it. Wonderful.
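As a sketch, the comparison might look like this. The metric numbers below are made-up placeholders, not results from the video — in practice each dictionary would come from evaluate_preds() for that model:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs anywhere
import pandas as pd

# Placeholder metric dictionaries (illustrative values only);
# in the video these come from evaluate_preds() for each model.
baseline_metrics = {"accuracy": 0.84, "precision": 0.81, "recall": 0.92, "f1": 0.86}
clf_2_metrics    = {"accuracy": 0.85, "precision": 0.85, "recall": 0.88, "f1": 0.86}
rs_metrics       = {"accuracy": 0.85, "precision": 0.83, "recall": 0.90, "f1": 0.86}
gs_metrics       = {"accuracy": 0.82, "precision": 0.80, "recall": 0.88, "f1": 0.84}

# One column per model, one row per metric
compare_metrics = pd.DataFrame({"baseline": baseline_metrics,
                                "clf_2": clf_2_metrics,
                                "random search": rs_metrics,
                                "grid search": gs_metrics})

compare_metrics.plot.bar(figsize=(10, 8))
print(compare_metrics)
```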
So this is going to compare all of our different classification models' metrics in the one space. We can see our baseline model is the blue one — on this one it didn't get the best; it was more of a tie between clf_2 and random search. On precision, clf_2 is out in front. On recall, baseline is out in front. And on F1, it's a tie between baseline and clf_2. And again, you could adjust this graph to make it more visible; this is just to demonstrate different metrics. So this is the kind of communication you'd do, not only for yourself but for your teammates, or if you're reporting to someone on a project: show them the different models you've tried and how they perform differently on different classification metrics — if you're working on a classification problem, of course; you can do the same sort of process with a regression problem. And now, what you need for the project will determine which model you choose. So say you wanted a model that optimised recall — in this case you might use the baseline model. Or if you wanted a model that optimised precision, you might choose the clf_2 model. And if none of these metrics really worked for you, you might continue trying to find some better ones through random search or grid search. And now we've finally got to the end of how you can improve a model via hyperparameter tuning. And again, there is a lot to take in here.
The key point to remember is that it's an experimental process. Improving a model through hyperparameter tuning — there's no sort of written law on how you do it, but the point is to just keep trying. See if you can figure something out. See if you can get yourself familiar with the different hyperparameters that different models have, such as a RandomForestClassifier, and just remind yourself that the first model you make is more than likely not the best model you'll have. It's a baseline model, and it can be improved upon. And so we've built a model and we've tried to improve it. Let's say you've decided you're going to focus on recall, or something like that — you may want to export your model so you can share it or use it in some sort of deployment setup. So in the next section — if we look at our list here of what we're covering — we're going to look at how to save, share and load a model: saving and loading a trained model. The benefit of this is that you won't have to retrain a model like we've done in the past few sections. So go back through, check out what we've covered with hyperparameter tuning — how to improve a model's hyperparameters with GridSearchCV. Try it out with randomized search, try it out by hand, and I'll see you in the next video.