1 00:00:00,480 --> 00:00:01,110 OK. 2 00:00:01,170 --> 00:00:07,930 In the last video we tried training a model on a subset of the data, a.k.a. allowing each estimator, 3 00:00:08,070 --> 00:00:15,060 so one hundred estimators, to see ten thousand different data points rather than all four hundred thousand 4 00:00:15,180 --> 00:00:15,870 or so. 5 00:00:15,870 --> 00:00:19,410 And we saw a fairly big decrease in computing time. 6 00:00:19,530 --> 00:00:27,190 So our model trains faster. Now what we might want to do, because we can experiment a bit faster 7 00:00:27,200 --> 00:00:35,940 now, is try to find some ideal hyperparameters that are a bit better than what our RandomForestRegressor 8 00:00:35,940 --> 00:00:37,890 defaults to. 9 00:00:37,920 --> 00:00:39,140 So let's have a look. 10 00:00:39,330 --> 00:00:45,630 And if you've had a think about it you might have come across, or thought of, the idea of doing 11 00:00:45,630 --> 00:00:52,470 hyperparameter tuning with RandomizedSearchCV, something that we've looked at in a previous project and in some 12 00:00:52,470 --> 00:00:53,920 other previous videos. 13 00:00:53,940 --> 00:00:56,100 So that's what we're going to have a look at. 14 00:00:56,160 --> 00:01:01,290 And if you're wondering what RandomizedSearchCV is, remember we can always look at the docs, but what we'll 15 00:01:01,290 --> 00:01:04,910 do is, actually, maybe we'll just create a cell. 16 00:01:04,920 --> 00:01:10,860 Let's have a look at the code first. How would we import it? from sklearn.model_selection 17 00:01:14,040 --> 00:01:20,710 import RandomizedSearchCV. And remember, this goes for whatever machine learning model you're working with.
18 00:01:20,720 --> 00:01:28,940 You can always search for the model name, so in our case "random forest regressor hyperparameter tuning", 19 00:01:30,570 --> 00:01:36,410 and this will give you some information on how to tune your RandomForestRegressor, or your random 20 00:01:36,410 --> 00:01:44,980 forest in general. So that's one way to do it. The other way, the way that we've seen, is 21 00:01:44,980 --> 00:01:48,080 RandomizedSearchCV. Actually, let's be real here. 22 00:01:48,090 --> 00:01:54,660 There are many different ways to do it. But if you remember what RandomizedSearchCV does... if not, that's 23 00:01:54,660 --> 00:01:57,510 okay. RandomizedSearchCV, 24 00:02:00,850 --> 00:02:02,480 go to the documentation. 25 00:02:02,720 --> 00:02:08,710 So, "Randomized search on hyper parameters". Basically we create a parameter distribution, which kind of looks 26 00:02:08,710 --> 00:02:15,410 like a dictionary, and RandomizedSearchCV is going to take our model, or estimator in scikit-learn terms, 27 00:02:15,830 --> 00:02:21,960 and search across this param_distributions dictionary for the best hyperparameters for our model. 28 00:02:21,980 --> 00:02:23,030 So that's what we're going to create. 29 00:02:23,660 --> 00:02:35,270 So, "Different RandomForestRegressor hyperparameters", and we'll go here rf_grid. Nice and simple 30 00:02:35,270 --> 00:02:36,170 for the dictionary. 31 00:02:36,170 --> 00:02:41,370 This is the parameter distribution that I was talking about, and we might go n_estimators. 32 00:02:42,020 --> 00:02:44,480 Remember, if you're wondering where these came from, 33 00:02:44,480 --> 00:02:49,550 don't forget you can always search up here "random forest regressor hyperparameter tuning" and you'll find 34 00:02:49,820 --> 00:02:55,460 a fair bit of information on basically almost any machine learning model that you can think of, with different 35 00:02:55,460 --> 00:02:57,170 ways of how to tune the hyperparameters.
36 00:02:57,170 --> 00:02:59,810 Because again this is not a perfect science. 37 00:02:59,810 --> 00:03:03,870 This is a whole bunch of trial and error, trying to figure out how things are going. 38 00:03:03,930 --> 00:03:11,450 And so what I'm doing here is I'm setting up a number of different values that RandomizedSearchCV is 39 00:03:11,450 --> 00:03:12,590 going to go through, 40 00:03:12,680 --> 00:03:22,340 try a certain number of calculations, or combinations, of them, and then figure out which ones achieve 41 00:03:22,370 --> 00:03:24,240 the best results, because that's what we're after, right? 42 00:03:24,260 --> 00:03:27,700 We're after a model that boosts us on the Kaggle leaderboard. 43 00:03:27,710 --> 00:03:32,300 That's what we're trying to do. With our default model, with no tuned hyperparameters, we're sitting, 44 00:03:32,540 --> 00:03:35,500 and on only 10,000 examples mind you, that's the bigger one there, 45 00:03:36,320 --> 00:03:37,870 we're sitting at around eight here. 46 00:03:38,000 --> 00:03:39,500 So that's not bad. 47 00:03:39,500 --> 00:03:41,860 So what else are we tuning here? 48 00:03:41,870 --> 00:03:45,150 We've got n_estimators, max_depth. 49 00:03:45,170 --> 00:03:51,650 These are all different parameters of the random forest. min_samples_split, min_samples_leaf, and then 50 00:03:51,650 --> 00:03:53,410 we'll go max_features. 51 00:03:53,450 --> 00:04:01,370 That's another good one: 0.5, 1, "sqrt", "auto". 52 00:04:01,700 --> 00:04:10,100 And then we also might go max_samples. So we might also set this to 10,000, because we saw that before: 53 00:04:10,580 --> 00:04:16,900 max_samples increases how fast our model can find patterns in the data, because it's only looking at 10,000 54 00:04:16,970 --> 00:04:19,550 random examples rather than the 400,000 55 00:04:19,550 --> 00:04:21,220 that are in our training set.
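As a rough sketch of the dictionary being typed in this part of the video: the `max_features` and `max_samples` values below are the ones read out above, while the ranges for `n_estimators`, `max_depth`, `min_samples_split` and `min_samples_leaf` are not stated in the transcript and are assumed placeholders. Note also that recent scikit-learn releases no longer accept `"auto"` for `max_features`, so it may need to be dropped depending on the installed version.

```python
from math import prod

# Hyperparameter grid for a RandomForestRegressor, in the spirit of the video.
# max_features and max_samples follow the transcript; the other four ranges
# are ASSUMED placeholders chosen only for illustration.
rf_grid = {
    "n_estimators": list(range(10, 100, 10)),    # assumed range
    "max_depth": [None, 3, 5, 10],               # assumed values
    "min_samples_split": list(range(2, 20, 2)),  # assumed range
    "min_samples_leaf": list(range(1, 20, 2)),   # assumed range
    "max_features": [0.5, 1, "sqrt", "auto"],    # from the transcript
    "max_samples": [10000],                      # from the transcript
}

# Count how many distinct combinations the grid contains. A randomized search
# only tries n_iter of these rather than all of them.
n_combinations = prod(len(values) for values in rf_grid.values())
print(f"{n_combinations} possible combinations")
```

With thousands of possible combinations in a grid like this, trying a handful at random is far cheaper than an exhaustive grid search.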
56 00:04:21,260 --> 00:04:28,550 So in the spirit of trying to reduce the time between experiments, we'll set that to that. "Instantiate 57 00:04:29,990 --> 00:04:33,650 RandomizedSearchCV model". 58 00:04:34,010 --> 00:04:44,010 We'll call this rs_model, really creatively. RandomizedSearchCV, and we'll pass it a RandomForestRegressor 59 00:04:44,020 --> 00:04:52,970 with n_jobs equal to negative one and also random_state equal to 42 for consistency. Remember, that's 60 00:04:52,970 --> 00:04:58,970 similar to setting any random seed. And then we're going to pass it our param_distributions, which is 61 00:04:59,060 --> 00:05:07,070 our rf_grid, which is what we've just instantiated up here, and n_iter. We usually do it for about 20 or 62 00:05:07,070 --> 00:05:07,490 so. 63 00:05:07,550 --> 00:05:12,350 Well, I like to do it for about 20 or so for the first time, but we might change that, and we'll see why. 64 00:05:12,350 --> 00:05:19,890 cv equals five, remember that's cross-validation, with verbose equals True. And the reason why we're going to 65 00:05:19,890 --> 00:05:22,410 reduce this, probably only do it... 66 00:05:22,410 --> 00:05:23,280 How much time do we have? 67 00:05:23,280 --> 00:05:24,810 Maybe only two. 68 00:05:24,840 --> 00:05:35,210 Reason being, as you might... well, what has happened here? Maybe that ran really fast. 69 00:05:35,500 --> 00:05:37,750 I'm not sure why. We'll look at that in a second, but that's all right.
70 00:05:37,750 --> 00:05:45,250 Maybe five. Five is enough, because I've already gone through this process, where I set n_iter to 71 00:05:45,280 --> 00:05:50,230 100, but it took like two and a half hours on my machine. So you could set this to whatever you want, whatever 72 00:05:50,230 --> 00:05:56,350 you have time for, but start low, because if we were to set this to 100 this cell might run for a couple 73 00:05:56,350 --> 00:06:01,360 of hours. Say if we were going home, we might set this to 100 and run it overnight or something 74 00:06:01,360 --> 00:06:01,740 like that. 75 00:06:02,410 --> 00:06:05,280 Or if we had a lunch break we might set it to run over our lunch break. 76 00:06:05,280 --> 00:06:09,730 That's what I used to do when I was working at a technology company in my city: I would run models over 77 00:06:09,730 --> 00:06:15,820 my lunch break, and while I was eating food the computer was just calculating away. In our case we're 78 00:06:15,820 --> 00:06:20,160 just gonna set this to five, and what we'll do... 79 00:06:20,260 --> 00:06:23,250 That's why it ran so fast, because we didn't even fit it. 80 00:06:23,260 --> 00:06:26,820 That's why this time is so small. rs_model.fit. 81 00:06:26,830 --> 00:06:32,830 So what this is going to do... actually, two's enough then. This confused me, but I realized 82 00:06:32,830 --> 00:06:34,810 we didn't fit anything, that's why it went so fast. 83 00:06:36,760 --> 00:06:38,770 So this is what we can do. 84 00:06:38,850 --> 00:06:44,850 We go "Fit the RandomizedSearchCV model". 85 00:06:44,850 --> 00:06:46,120 What is this going to do? 86 00:06:46,320 --> 00:06:51,810 It's going to fit two iterations, so RandomizedSearchCV is gonna pick two different combinations of parameters 87 00:06:51,810 --> 00:06:52,540 from here, 88 00:06:52,800 --> 00:06:57,060 assign them to the RandomForestRegressor, and then fit it to the training data.
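A minimal sketch of the instantiate-and-fit step described above, assuming scikit-learn is installed. The real training data is not part of this transcript, so a small synthetic dataset stands in for it, `max_samples` is scaled down from 10,000 to 100 to suit that stand-in, and the grid ranges other than `max_features` are assumed. `"auto"` is left out of `max_features` because newer scikit-learn versions reject it.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# ASSUMPTION: synthetic stand-in data; the video trains on roughly 400,000 rows.
X_train, y_train = make_regression(
    n_samples=500, n_features=8, noise=0.1, random_state=42
)

# Grid ranges are assumed placeholders, with max_samples scaled down to 100
# so that it fits inside each cross-validation training fold.
rf_grid = {
    "n_estimators": list(range(10, 100, 10)),
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": list(range(2, 20, 2)),
    "min_samples_leaf": list(range(1, 20, 2)),
    "max_features": [0.5, 1, "sqrt"],
    "max_samples": [100],
}

# Instantiate the search with the settings named in the video:
# n_jobs=-1 and random_state=42 on the regressor, n_iter=2, cv=5, verbose=True.
rs_model = RandomizedSearchCV(
    RandomForestRegressor(n_jobs=-1, random_state=42),
    param_distributions=rf_grid,
    n_iter=2,
    cv=5,
    verbose=True,
)

# Nothing is trained until fit() is called, which is why the cell that only
# instantiated the search finished almost instantly.
rs_model.fit(X_train, y_train)

# The best of the sampled combinations, as judged by cross-validation.
print(rs_model.best_params_)
```

Because no `random_state` is passed to the search itself, the two sampled combinations will differ from run to run.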
89 00:06:57,060 --> 00:07:02,110 And remember it's only on 10,000 samples for the maximum, and it's gonna tell us, well, we'll be able to 90 00:07:02,110 --> 00:07:05,070 find out in a second, which are the best parameters. 91 00:07:05,070 --> 00:07:10,890 So because we're doing two iterations and we're doing a cv value of five, there's still going to be 10 92 00:07:10,890 --> 00:07:11,670 different fits. 93 00:07:12,270 --> 00:07:14,360 So this might take a minute or so. 94 00:07:14,360 --> 00:07:20,070 So we might time travel again, and I know we said we can only time travel every so often, but we unlocked 95 00:07:20,070 --> 00:07:24,650 a superpower going through this project because we're doing so well. So, time travel until the cell 96 00:07:24,660 --> 00:07:32,160 is done... and we're back. So that took my MacBook Pro just under two minutes, right, 97 00:07:32,160 --> 00:07:38,180 to try out two different iterations with five-fold cross-validation across a bunch of different 98 00:07:38,190 --> 00:07:43,770 hyperparameters. And now, even though this took about two minutes, it's still far less than our original 99 00:07:43,770 --> 00:07:50,040 seven-minute model trial that we did a few videos ago, because we've set max_samples to equal 10,000. 100 00:07:50,880 --> 00:08:00,370 And now, because we have a trained model with RandomizedSearchCV, we can find the best model 101 00:08:00,390 --> 00:08:03,460 hyperparameters that RandomizedSearchCV found for us. 102 00:08:03,540 --> 00:08:05,430 And remember we've only set this to two. 103 00:08:05,460 --> 00:08:10,230 So if you have more time or more compute power you might set this to something like 100 and find the 104 00:08:10,230 --> 00:08:15,200 best parameters from that, but we're only doing two for this case. When we go 105 00:08:15,210 --> 00:08:21,150 here, rs_model.best_params_, it's gonna tell us. 106 00:08:21,170 --> 00:08:25,180 Okay, so these are the best parameters that it's found.
107 00:08:25,300 --> 00:08:31,520 Ninety n_estimators, min_samples_split 16, min_samples_leaf, max_samples 10,000 because there was only one 108 00:08:31,520 --> 00:08:37,160 value, max_features "auto", max_depth 10. And now remember, these might not be the best model hyperparameters 109 00:08:37,160 --> 00:08:44,170 because we've only searched 2 different combinations, but we're going to work with them for now. "Evaluate 110 00:08:45,240 --> 00:08:52,700 the RandomizedSearchCV model". show_scores, our beautiful show_scores function, we're gonna pass it the 111 00:08:52,700 --> 00:08:58,520 rs_model. Now we may or may not see an improvement here in our evaluation metrics that we've used before. 112 00:08:59,760 --> 00:09:00,100 Yeah. 113 00:09:00,130 --> 00:09:02,300 So we actually see that this has worsened. 114 00:09:02,300 --> 00:09:03,040 That's okay, right? 115 00:09:03,080 --> 00:09:10,340 Because as I said we've only tried two different combinations. And now, in a past notebook, in a past project, 116 00:09:10,370 --> 00:09:19,070 when I was setting up for this project, I have actually, as I said, tried n_iter equals 100, and I took 117 00:09:19,070 --> 00:09:23,600 a little break and I went and had some food because it took my computer just over two hours. And so we'll 118 00:09:23,600 --> 00:09:28,940 have a look at the best hyperparameters my computer found in two hours of searching, and we'll have 119 00:09:28,940 --> 00:09:31,670 a look at how they go in the next video.
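The fit counts mentioned in this video follow from simple arithmetic: each sampled combination is fitted once per cross-validation fold. The helper below is a hypothetical illustration, not part of scikit-learn. The optional extra fit reflects the search's default behaviour of refitting the best combination on the full training set.

```python
def total_fits(n_iter: int, cv: int, refit: bool = True) -> int:
    """Return how many model fits a randomized search performs.

    n_iter sampled combinations are each fitted on cv folds, and when
    refit is True the best combination is fitted once more on all data.
    """
    return n_iter * cv + (1 if refit else 0)


# The run in the video: 2 combinations with 5-fold cross-validation.
print(total_fits(n_iter=2, cv=5, refit=False))

# The longer overnight-style run with n_iter=100 and the same cv.
print(total_fits(n_iter=100, cv=5, refit=False))
```

Going from `n_iter=2` to `n_iter=100` multiplies the cross-validation work by 50, which fits with the jump from about two minutes to a couple of hours described above.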