1 00:00:00,360 --> 00:00:01,450 Welcome back. 2 00:00:01,450 --> 00:00:06,550 In the last video we checked out how to tune a model's hyperparameters by hand. 3 00:00:06,660 --> 00:00:11,460 And we figured out by the end of it, once we'd written some code, once we'd split our data into train, 4 00:00:11,460 --> 00:00:13,500 validation and test sets manually, 5 00:00:13,500 --> 00:00:17,330 and created an evaluation function... Yeah. 6 00:00:17,390 --> 00:00:21,140 At the end, we figured out tuning them all by hand. 7 00:00:21,140 --> 00:00:24,230 That sounds like a lot of work, and you're right. 8 00:00:24,350 --> 00:00:27,400 It would be, if we went through and tried a whole bunch of different parameters. 9 00:00:27,410 --> 00:00:32,780 But luckily, the developers of scikit-learn have also come across this problem and have created 10 00:00:32,780 --> 00:00:38,150 RandomizedSearchCV, which stands for randomized search cross-validation. 11 00:00:38,150 --> 00:00:44,750 Let's see how we'll tune hyperparameters, a.k.a. adjust the settings on our models to make better predictions, 12 00:00:44,810 --> 00:00:46,540 or hopefully make better predictions, 13 00:00:46,670 --> 00:00:49,420 using RandomizedSearchCV. 14 00:00:50,110 --> 00:00:52,640 So let's go here and create another heading. 15 00:00:52,640 --> 00:00:54,070 This can be 5.2. 16 00:00:54,200 --> 00:01:01,610 So, hyperparameter tuning with RandomizedSearch 17 00:01:01,750 --> 00:01:07,770 CV. Beautiful. The first thing we'll do is we'll import it. 18 00:01:07,780 --> 00:01:15,580 So from sklearn.model_selection import Randomized... 19 00:01:15,590 --> 00:01:18,300 We might just press tab here: RandomizedSearchCV. 20 00:01:18,320 --> 00:01:19,480 Beautiful. 21 00:01:19,520 --> 00:01:21,320 Let's see an example of how we use it 22 00:01:21,350 --> 00:01:23,790 before we check out the docstring.
23 00:01:24,440 --> 00:01:30,050 So what we'll do is we'll create a grid of hyperparameters, and by grid I mean dictionary, and I've 24 00:01:30,050 --> 00:01:37,430 typed "grid" here: a dictionary of hyperparameters we'd like to adjust. So if we come back up here, these 25 00:01:37,430 --> 00:01:40,370 are the parameters we're going to adjust. 26 00:01:40,370 --> 00:01:45,290 So what we're going to do is create a dictionary with the hyperparameters we'd like to adjust as the 27 00:01:45,290 --> 00:01:51,290 keys, and then the values we'd like to try as the values of the dictionary. 28 00:01:51,950 --> 00:01:52,760 So let's do that. 29 00:01:52,850 --> 00:01:59,030 So n_estimators, that's a key, and we're going to give it a list. 30 00:01:59,030 --> 00:02:07,860 So 100, 200, 500, 1000 and 1200. Beautiful. 31 00:02:07,880 --> 00:02:11,750 Now, if you're wondering, Daniel, where are you getting these values? 32 00:02:11,750 --> 00:02:17,850 Well, remember, as always, with any model or estimator in scikit-learn there's extensive documentation on 33 00:02:17,850 --> 00:02:21,620 it, which tells you some of the settings that you can change on the model. 34 00:02:21,620 --> 00:02:27,710 So what I've done is I've read through here, read through the examples, done some research, and found different 35 00:02:27,710 --> 00:02:31,420 values for the hyperparameters we can try. So here, 36 00:02:31,460 --> 00:02:35,570 None, 5, 10... so it looks like I'm just throwing out values here. 37 00:02:35,570 --> 00:02:39,050 Trust me, I'm not just throwing out random values. 38 00:02:39,050 --> 00:02:45,290 They're based upon some research and some experience, and don't worry, you won't begin with that. 39 00:02:45,290 --> 00:02:46,310 No one begins with that. 40 00:02:46,310 --> 00:02:48,290 It takes a little bit of practice to get there.
41 00:02:48,290 --> 00:02:52,310 That's why we're having a look, hands on, at all these different functions that we can use to improve 42 00:02:52,310 --> 00:02:52,790 our models. 43 00:02:55,390 --> 00:02:56,240 Wonderful. 44 00:02:56,380 --> 00:02:58,160 And then we'll do two more, 45 00:02:58,210 --> 00:03:00,820 which is min_samples_split, 46 00:03:03,380 --> 00:03:11,510 2, 4, 6, and here, min_samples_leaf, 47 00:03:17,070 --> 00:03:21,350 and let's see how we'd implement RandomizedSearchCV. 48 00:03:21,960 --> 00:03:23,220 So we'll set up a random seed, 49 00:03:27,480 --> 00:03:30,420 we'll split into X and y, 50 00:03:33,450 --> 00:03:41,420 so we want heart_disease_shuffled, actually, because that's what we've used before, 51 00:03:41,520 --> 00:03:42,260 dot drop. 52 00:03:42,520 --> 00:03:54,190 Oh, actually, we need to make X here: .drop("target", axis=1), and we'll make y here, say heart_ 53 00:03:54,190 --> 00:03:57,550 disease_shuffled 54 00:04:01,770 --> 00:04:05,580 ["target"]. Beautiful. 55 00:04:05,690 --> 00:04:10,340 We're gonna go split into train and test sets. Huh? 56 00:04:10,520 --> 00:04:12,590 You might be wondering why we're only using train and test here. 57 00:04:12,590 --> 00:04:14,550 We just created a validation set before. 58 00:04:14,570 --> 00:04:15,480 We'll get to that. 59 00:04:15,590 --> 00:04:16,790 X_train, X_test, 60 00:04:17,030 --> 00:04:18,500 y_train, y_test 61 00:04:18,500 --> 00:04:19,690 equals train_test_split, 62 00:04:21,920 --> 00:04:30,590 X, y, and we'll use a normal split here, test_size=0.2. Wonderful. And then we're going 63 00:04:30,590 --> 00:04:32,270 to instantiate a 64 00:04:38,930 --> 00:04:47,980 random forest classifier: clf = RandomForestClassifier(n_jobs=...). 65 00:04:48,010 --> 00:04:52,100 Now, we could do negative one here, but at the moment negative one is broken for me.
66 00:04:52,100 --> 00:04:58,400 Now, that'll make sense in a second, but n_jobs stands for how much of your computer's processor are 67 00:04:58,400 --> 00:05:03,750 you going to dedicate towards this machine learning model, and negative one means all of it. 68 00:05:03,980 --> 00:05:09,470 So by default, n_jobs is one... actually, what is the default n_jobs? 69 00:05:09,640 --> 00:05:14,070 Let's go here: n_jobs=None. 70 00:05:14,460 --> 00:05:19,790 Okay, so we'll set it as None. Actually, I'll just set it as 1. So different values you pass to n_jobs 71 00:05:19,790 --> 00:05:24,350 will dictate how much of your computer's processor you want to dedicate towards the machine learning 72 00:05:24,350 --> 00:05:25,110 model. 73 00:05:25,250 --> 00:05:34,560 And now, once we've got a classifier instantiated, we'll set up RandomizedSearchCV. 74 00:05:35,840 --> 00:05:40,120 So we're gonna call it rs_clf. 75 00:05:40,280 --> 00:05:49,460 So what that's doing is we're just prepending rs to it, for randomized search. And now, RandomizedSearch 76 00:05:49,580 --> 00:05:54,550 CV, the first parameter it takes is estimator=clf. 77 00:05:54,560 --> 00:06:01,700 So we're passing it this random forest classifier we've instantiated. param_distributions is 78 00:06:01,700 --> 00:06:04,260 the next one; we're going to pass it our grid. 79 00:06:04,340 --> 00:06:06,990 So this is grid, up here. 80 00:06:07,470 --> 00:06:19,760 The next thing is we're going to define n_iter=10, and this is the number of models to try. 81 00:06:22,160 --> 00:06:34,520 We'll use five-fold cross-validation, cv=5, and then we'll set the verbosity, verbose=2. Well, this is 82 00:06:34,520 --> 00:06:35,960 some code we haven't seen before. 83 00:06:36,380 --> 00:06:43,940 So what is happening when we run RandomizedSearchCV? We'll look at the docstring: press Shift+Tab in here... 84 00:06:44,860 --> 00:06:46,850 ah, that's because we haven't imported it.
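The setup dictated so far — the hyperparameter grid, the X/y split, the train/test split, and instantiating RandomizedSearchCV — can be sketched as one runnable cell. This is a hedged sketch, not the course notebook: the heart-disease DataFrame isn't available here, so a synthetic dataset stands in for it; the min_samples_leaf values are assumed (they're cut off in the recording); and "sqrt" is used for max_features because the "auto" option has been removed in recent scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Hyperparameter grid: keys are hyperparameter names, values are lists of
# settings to sample from. min_samples_leaf values are assumed here.
grid = {
    "n_estimators": [100, 200, 500, 1000, 1200],
    "max_depth": [None, 5, 10],
    "max_features": ["sqrt"],  # "auto" is removed in recent scikit-learn
    "min_samples_split": [2, 4, 6],
    "min_samples_leaf": [1, 2, 4],
}

np.random.seed(42)

# Stand-in for heart_disease_shuffled.drop("target", axis=1) and ["target"]
X, y = make_classification(n_samples=300, n_features=13, random_state=42)

# Normal train/test split, no manual validation set needed
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf = RandomForestClassifier(n_jobs=1)

rs_clf = RandomizedSearchCV(
    estimator=clf,
    param_distributions=grid,
    n_iter=10,  # number of candidate combinations to try
    cv=5,       # 5-fold cross-validation
    verbose=2,  # print progress while fitting
)
```

Calling `rs_clf.fit(X_train, y_train)` from here would run the search; it's left out of this cell because fitting 10 candidates of up to 1200 trees takes a while.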
85 00:06:46,850 --> 00:06:48,000 That's right. 86 00:06:48,400 --> 00:06:50,890 We'll press Shift+Tab here now. 87 00:06:51,020 --> 00:06:52,780 So what's the docstring here? 88 00:06:52,840 --> 00:06:58,800 Randomized search on hyperparameters. RandomizedSearchCV implements a fit and a score method. 89 00:06:58,910 --> 00:07:04,080 It also implements predict, predict_proba, decision_function, transform and inverse_transform, 90 00:07:04,100 --> 00:07:11,180 if they're implemented in the estimator used. So, a randomized search on hyperparameters. That can sound 91 00:07:11,180 --> 00:07:18,100 a little bit confusing. What does it mean by randomized search? Well, that's in the name: Randomized Search 92 00:07:18,100 --> 00:07:23,170 CV. CV is for cross-validation, which is where you might recognize this parameter, cv=5. 93 00:07:23,170 --> 00:07:27,310 So that means we're using five-fold cross-validation. 94 00:07:27,310 --> 00:07:29,240 So, five-fold cross-validation. 95 00:07:29,260 --> 00:07:33,100 This is why we don't necessarily have to create a validation set here. 96 00:07:33,100 --> 00:07:41,560 So what RandomizedSearchCV will do is it will take our classifier, and it'll take our param_distributions 97 00:07:41,560 --> 00:07:52,270 grid, which is this, and it's going to search over this grid 10 different times, trying different combinations 98 00:07:52,360 --> 00:07:59,090 of these parameters at random. So, for example, on its first iteration... 99 00:07:59,130 --> 00:08:04,170 So, because it's doing n_iter=10, it's gonna do this 10 times. On its first iteration,
100 00:08:04,170 --> 00:08:12,810 it might try a model with 10 estimators, a max_depth of 9, max_features set to "auto", min_samples_ 101 00:08:12,990 --> 00:08:20,430 split set to 2, min_samples_leaf set to 1, and then on the next iteration, so iteration 2 out of 10, 102 00:08:20,760 --> 00:08:28,710 it might try 1000 estimators, 30 as the max_depth, "auto" as the max_features, 4 as the min_samples_ 103 00:08:28,710 --> 00:08:33,960 split, and then 4 as the min_samples_leaf. Then it's gonna keep going like that, up to 10. 104 00:08:33,960 --> 00:08:36,240 Now, we could set this to 100, to 1000. 105 00:08:36,240 --> 00:08:42,300 I don't think there's even a thousand combinations in here. If you went through, you could go 6 times 106 00:08:42,300 --> 00:08:49,290 5, that's 30, times 2, that's 60, times 3, that's 180, times 3, that's 540. 107 00:08:49,530 --> 00:08:53,040 My math may be a little bit off there, but you get the point, right? If you went through all of these 108 00:08:53,040 --> 00:08:58,330 and tried every single combination, that's gonna be a lot of different models. 109 00:08:58,570 --> 00:09:01,080 So let's see this in action. 110 00:09:01,170 --> 00:09:08,510 So once we've set it up, instantiated RandomizedSearchCV, we'll go here: fit the randomized 111 00:09:11,060 --> 00:09:14,940 search CV version of clf. 112 00:09:17,050 --> 00:09:23,870 So rs_clf.fit(X_train, y_train). 113 00:09:23,930 --> 00:09:29,870 Now, the reason we only have to fit it on X_train and y_train is because the CV in RandomizedSearchCV stands 114 00:09:29,870 --> 00:09:31,650 for cross-validation. 115 00:09:31,760 --> 00:09:36,770 So that is what it's going to do: it's going to automatically make our validation sets for us, which is 116 00:09:36,770 --> 00:09:38,230 a beautiful thing.
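The size of an exhaustive search is just the product of the number of values per hyperparameter, and with the per-parameter counts used in the spoken arithmetic (6, 5, 2, 3, 3 — assumed here, since the grid isn't fully legible in the transcript) that product is 540. A quick sketch:

```python
from math import prod

# Assumed value counts per hyperparameter: 6 n_estimators values,
# 5 max_depth, 2 max_features, 3 min_samples_split, 3 min_samples_leaf.
value_counts = [6, 5, 2, 3, 3]

# 6 * 5 = 30, * 2 = 60, * 3 = 180, * 3 = 540
total_combinations = prod(value_counts)
print(total_combinations)  # 540
```

This is why n_iter=10 is a shortcut: a randomized search samples 10 of those 540 combinations instead of fitting all of them.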
117 00:09:38,270 --> 00:09:43,850 So if we come back to our diagram, it's going to take our data and then it's gonna try different hyper 118 00:09:43,850 --> 00:09:49,370 parameters, cross-validating on different hyperparameter settings. 119 00:09:49,550 --> 00:09:51,600 So that's what it's going to return. 120 00:09:51,760 --> 00:09:59,750 It's gonna figure out which combination of these hyperparameters is the best, out of 10 different models. 121 00:09:59,750 --> 00:10:03,200 And again, we can increase this if we wanted to try more combinations. 122 00:10:03,470 --> 00:10:05,340 So let's see it in action. 123 00:10:08,670 --> 00:10:14,120 So, because we've set verbose to 2, it's going to output and tell us what's going on. 124 00:10:14,280 --> 00:10:22,290 So, fitting 5 folds for each of 10 candidates, totalling 50 fits. A.k.a., it's trying 10 iterations of different 125 00:10:22,290 --> 00:10:29,760 combinations of parameters in this grid, and splitting each combination five times, because cv equals 126 00:10:29,760 --> 00:10:32,460 5, totalling 50 fits on the data. 127 00:10:32,460 --> 00:10:38,100 So it's going to run this fit function 50 different times, using different hyperparameters on different 128 00:10:38,220 --> 00:10:44,740 splits of the data. So we can see here, the first model it's trying has n_estimators=1200, 129 00:10:44,980 --> 00:10:48,590 because this is one of our hyperparameter values up here. 130 00:10:48,650 --> 00:10:54,500 Remember, each split, each iteration, it's going to pick combinations of these at random. 131 00:10:54,600 --> 00:11:03,510 It's also set min_samples_split to 6 here, min_samples_leaf to 2, max_features to "sqrt" and 132 00:11:03,510 --> 00:11:05,350 max_depth to 5. 133 00:11:05,580 --> 00:11:06,660 Okay. 134 00:11:06,660 --> 00:11:11,550 And then if we were to scroll through here, we can see all of the different combinations it's tried. So 135 00:11:11,550 --> 00:11:12,910 let's pick one at random.
136 00:11:13,140 --> 00:11:14,300 This one here. 137 00:11:14,400 --> 00:11:21,120 So this combination is trying n_estimators=200, min_samples_split=6, min_samples_ 138 00:11:21,120 --> 00:11:25,980 leaf=2, max_features="sqrt" and max_depth=None. 139 00:11:26,180 --> 00:11:33,640 And then it's gonna keep going until it's finished. It's gonna give us a warning here, so let's see. Once 140 00:11:33,640 --> 00:11:34,750 it's finished, 141 00:11:34,750 --> 00:11:39,220 what we'll be able to do is call best_params_ on it. 142 00:11:39,220 --> 00:11:46,630 So if we go rs_clf.best_params_, this is going to show us the combination of hyperparameters, 143 00:11:47,200 --> 00:11:51,090 which combination of these, that got the best results. 144 00:11:55,060 --> 00:11:56,130 Wonderful. 145 00:11:56,170 --> 00:12:04,690 So we're at n_estimators=200, min_samples_split=6, min_samples_leaf=2, max_features 146 00:12:04,690 --> 00:12:08,180 ="sqrt" and max_depth=None. 147 00:12:08,230 --> 00:12:16,060 They were the best cross-validated results across 10 different models, and now, when we call predict on 148 00:12:16,630 --> 00:12:21,580 our randomized search classifier, by default it's going to use these parameters. 149 00:12:21,600 --> 00:12:27,970 So instead of finding these by hand, RandomizedSearchCV has found them for us. 150 00:12:28,240 --> 00:12:28,940 So let's do it.
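The fit-then-inspect flow described above can be sketched end to end. This is a minimal, hedged sketch rather than the course notebook: a synthetic dataset stands in for the heart-disease data, and the grid and n_iter are shrunk so the search finishes in seconds rather than minutes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Deliberately tiny stand-in grid so the example runs quickly;
# the video's grid is much larger.
grid = {"n_estimators": [10, 50], "max_depth": [None, 5]}

np.random.seed(42)
X, y = make_classification(n_samples=200, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

rs_clf = RandomizedSearchCV(
    estimator=RandomForestClassifier(n_jobs=1),
    param_distributions=grid,
    n_iter=4,   # 4 candidates x 5 folds = 20 fits, logged by verbose=2
    cv=5,
    verbose=2,
)

# Only X_train/y_train: the CV part makes the validation splits for us
rs_clf.fit(X_train, y_train)

# The best cross-validated combination found during the search
print(rs_clf.best_params_)
```

After fitting, `best_params_` is a plain dictionary keyed by the same names as the grid, so you can read off exactly which sampled combination won.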
151 00:12:28,960 --> 00:12:34,390 Let's make predictions with the best hyperparameters. 152 00:12:38,800 --> 00:12:47,300 What we might do is rs_y_preds = rs_clf.predict... We could use the validation 153 00:12:47,300 --> 00:12:54,740 set here, but in our case we're gonna use the test set. And then we'll go evaluate the predictions: 154 00:12:58,670 --> 00:13:02,740 rs_metrics = evaluate 155 00:13:05,660 --> 00:13:10,090 (y_test, rs_y_preds). 156 00:13:16,190 --> 00:13:22,800 Oh no, the function is supposed to be evaluate_preds. There we go. Wonderful. 157 00:13:22,930 --> 00:13:28,320 So if we come back up here, did we see an improvement here? 158 00:13:28,320 --> 00:13:30,840 No, we didn't. 159 00:13:30,840 --> 00:13:37,230 And so this is sort of where the experimentation comes in. With hyperparameter tuning, you won't always 160 00:13:37,230 --> 00:13:40,650 find an improvement after running something like this. 161 00:13:40,650 --> 00:13:45,660 Maybe we could run it for longer and try 50 different combinations. 162 00:13:45,660 --> 00:13:51,360 And after it's tried 50 different combinations, it might find some parameters which end up being better 163 00:13:51,360 --> 00:13:54,210 than our manually tuned result. 164 00:13:54,210 --> 00:14:00,030 But what I hope you're starting to see is that using RandomizedSearchCV, rather than running through 165 00:14:00,030 --> 00:14:08,310 all these different settings by hand, gives us a way to codify, or functionize, the tuning of hyper 166 00:14:08,310 --> 00:14:09,690 parameters. 167 00:14:09,900 --> 00:14:15,390 And so now there's one more way we can use to improve our model's hyperparameters, and it's with Grid 168 00:14:15,390 --> 00:14:16,500 SearchCV. 169 00:14:16,860 --> 00:14:19,700 So it's kind of similar to randomized search, but it's got one... 170 00:14:19,710 --> 00:14:21,140 one key difference. 171 00:14:21,240 --> 00:14:22,560 We'll have a look at that in the next video.
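The predict-and-evaluate step can be sketched like so. Two things here are hedged stand-ins: `evaluate_preds` approximates the evaluation function written earlier in the course (its exact body isn't shown in this video, so a plausible dictionary-of-metrics version is assumed), and a synthetic dataset replaces the heart-disease data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import RandomizedSearchCV, train_test_split

def evaluate_preds(y_true, y_preds):
    """Assumed stand-in for the course's earlier evaluation function:
    returns a dictionary of common classification metrics."""
    return {
        "accuracy": accuracy_score(y_true, y_preds),
        "precision": precision_score(y_true, y_preds),
        "recall": recall_score(y_true, y_preds),
        "f1": f1_score(y_true, y_preds),
    }

np.random.seed(42)
X, y = make_classification(n_samples=200, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Small grid so the example runs quickly
rs_clf = RandomizedSearchCV(
    RandomForestClassifier(n_jobs=1),
    param_distributions={"n_estimators": [10, 50], "max_depth": [None, 5]},
    n_iter=4,
    cv=5,
)
rs_clf.fit(X_train, y_train)

# predict() uses the best found hyperparameters (refit=True by default),
# and we evaluate on the held-out test set, as in the video
rs_y_preds = rs_clf.predict(X_test)
rs_metrics = evaluate_preds(y_test, rs_y_preds)
print(rs_metrics)
```

Comparing `rs_metrics` against the hand-tuned model's metrics is exactly the "did we see an improvement?" check done in the video; sometimes, as here, the answer is no, which is why trying more iterations can help.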