Wonderful. So now we have hyperparameter grids for our logistic regression model as well as our random forest classifier. Let's tune each of them using RandomizedSearchCV.

So now, just to repeat what we just said, we've got a hyperparameter grid set up for each of our models. Let's tune them using RandomizedSearchCV. We'll change that cell into Markdown. Wonderful.

So first up, we're going to tune logistic regression. Now, at the moment, our logistic regression model, if we scroll back up, is getting the best results, just on a single train and test split. But let's see if those results carry through to a cross-validation split, which is what RandomizedSearchCV is going to help us with.

So, tuning logistic regression, we'll set up a random seed so that our results are reproducible, and then we're going to set up a random hyperparameter search for logistic regression. So rs_log_reg, that's short for random search logistic regression, and it uses this log_reg_grid; log_reg is short for logistic regression, and rf is short for random forest. Here we're going to create a RandomizedSearchCV, pass it a LogisticRegression, wonderful, and then we're going to set param_distributions, which is equal to our log_reg_grid. Then we're going to use a cross-validation of 5, which is similar to what we've seen here, so five-fold cross-validation. You could use ten-fold, you could use twenty-fold; what you should know is that the higher this number, the longer it's going to take, because it has to build a model for each and every single split. So 5 is generally a pretty good value to use; again, you might want to test that experimentally, depending on the size of your data. n_iter equals 20, so that means we're going to try 20 different combinations, which I think will be every single one of them, because we've only got 20 different values for C here. But that's okay. Then we're going to go verbose equals True, which just means it's going to output a few things for us.

Now, with that, we can go fit the random hyperparameter search model for logistic regression. So what this is going to do is call RandomizedSearchCV, and it's going to cross-validate a logistic regression model five times for 20 iterations. So we should have about one hundred or so different fits; five times 20 is one hundred. We're going to fit this on the training data. Wonderful... oh no, a typo: it should be param_distributions.
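Here's a minimal sketch of the cell being written above, assuming the log_reg_grid defined earlier in the video (20 values of C from np.logspace plus the liblinear solver) and the existing X_train/y_train split from the notebook; your grid values may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

np.random.seed(42)  # random seed so our results are reproducible

# Assumed hyperparameter grid from earlier in the video
log_reg_grid = {"C": np.logspace(-4, 4, 20),
                "solver": ["liblinear"]}

# Set up random hyperparameter search for LogisticRegression
rs_log_reg = RandomizedSearchCV(LogisticRegression(),
                                param_distributions=log_reg_grid,
                                cv=5,          # 5-fold cross-validation
                                n_iter=20,     # try 20 hyperparameter combinations
                                verbose=True)  # print progress while fitting

# Fit the random search: 5 folds x 20 candidates = 100 fits
rs_log_reg.fit(X_train, y_train)
```

After fitting, rs_log_reg.best_params_ shows the winning combination and rs_log_reg.score(X_test, y_test) evaluates it on the test set, which is exactly what happens next.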
Shift and Enter. There we go: fitting 5 folds for each of 20 candidates, totalling 100 fits. Well, and that's done. You see... ah, well, it's zoomed in here so you can't really see. It's going to tell us what the parameters are that it's used, and see here, we can see param_distributions: C is all these different values here, solver is liblinear. Let's zoom back in, because it's easier to see. And now we can check the best parameters by going rs_log_reg.best_params_: the solver is liblinear, that makes sense, and the C value is 0.233. Wonderful.

And so now let's evaluate rs_log_reg with .score on our test data: 88.5% accuracy. So if we go back up here, did we beat our original result of 88.5? No, we've equalled it. That logistic regression model out of the box must be pretty good, because we've only tuned two hyperparameters. We could look at more different values to tune, but we're going to leave that here for the time being; we're going to see what it's like to tune our random forest classifier. So we'll put in a little commentary here: now we've tuned LogisticRegression, let's do the same for RandomForestClassifier. Markdown. Wonderful.

And we're going to do the same thing again: set up a random seed. We could just copy the code from here and just change the variables, but we're in the habit of writing it out, you know. Random seed, we're going to set that to 42, our favourite number. So, set up random hyperparameter search for RandomForestClassifier. Wonderful. We're going to go rs_rf equals RandomizedSearchCV, wonderful. We're going to put in our RandomForestClassifier, then we're going to pass it param_distributions; we'll just use tab to complete so we're not making typos here. Daniel, come on. Like we did before, we're going to pass the rf_grid, which is the random forest grid, so our hyperparameter grid up here. So we'll come down here. What's next? cv, we're going to set that to 5, so five-fold cross-validation. Number of iterations is going to be 20 again, so we're going to try 20 different combinations of these hyperparameters. Now, if we were to add these all up, so the number of values here, times the number here, and so on for each hyperparameter, it's going to be a lot more; let's just leave it at that. See the sketch below.
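As a sketch, the random forest search cell looks much the same; the rf_grid values below are assumptions reconstructed from the best parameters read out later in the video (n_estimators of 210, min_samples_leaf of 19, max_depth of 3), so check them against your own notebook. This continues from the imports in the previous sketch.

```python
from sklearn.ensemble import RandomForestClassifier

np.random.seed(42)  # 42, our favourite number, for reproducibility

# Assumed random forest hyperparameter grid
rf_grid = {"n_estimators": np.arange(10, 1000, 50),
           "max_depth": [None, 3, 5, 10],
           "min_samples_split": np.arange(2, 20, 2),
           "min_samples_leaf": np.arange(1, 20, 2)}

# Set up random hyperparameter search for RandomForestClassifier
rs_rf = RandomizedSearchCV(RandomForestClassifier(),
                           param_distributions=rf_grid,
                           cv=5,          # 5-fold cross-validation
                           n_iter=20,     # randomly try 20 combinations
                           verbose=True)
```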
So there are a lot more possible combinations, but we're only going to try 20 of them, so randomly try 20 of them. So, verbose equals True. Here we go: fit the random hyperparameter search model for RandomForestClassifier. Wonderful. And we'll go rs_rf, which stands for random search random forest, .fit, and we'll do it on the training data, X_train, y_train. Wonderful. What's happened here? We've got something wrong again: invalid parameter min_samples_lead. A typo here, the classic min_samples_leaf typo. Did you notice that when I was going through? It's all right if you didn't, because I didn't notice it either.

Now, run this. So this should work. This might actually take a little while; I hope not. If it does, I'll pause the video and come back when it's finished.

Well, it just finished up. So that took about a minute or so on my computer; it will also depend on how fast your computer's processing speed is, how fast it goes through this sort of process, because it does have to build 100 different models. So let's check it out now that it's finished. How do we find the best parameters of our random forest random search model? You might already know this, because you just did it with the logistic regression. It's okay if you don't; this takes a little bit of practice. Find the best hyperparameters; I'll give you a second to think about it. Hint: it's an attribute, it's an attribute of this one here.

All right, let's figure it out. So we're going to call the best_params_ attribute, and this is going to show us our best parameters. So the best number of estimators was 210, the best min_samples_split was 4, the best min_samples_leaf was 19, and the best max_depth was 3. But that was only 20 different models, so 20 candidates; we could increase this to 100 or something, but that would take a little bit longer. In our case, 20 is enough; we're focusing on the principle here. We could revisit this in a second to try and improve our results, but what we're trying to do here is just minimise our time between experiments. So let's figure out how our random search random forest model went. So, evaluate the randomized search RandomForestClassifier model: rs_rf.score(X_test, y_test). How did it do?
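Continuing the sketch above, fitting, inspecting, and scoring the search looks like this; best_params_ and score are standard RandomizedSearchCV attributes, and the commented values are the ones read out in the video.

```python
# Fit the random search: 5 folds x 20 candidates = 100 fits (may take a minute)
rs_rf.fit(X_train, y_train)

# Best hyperparameter combination found during the search
rs_rf.best_params_
# -> {'n_estimators': 210, 'min_samples_split': 4,
#     'min_samples_leaf': 19, 'max_depth': 3} in the video

# Mean accuracy of the best model on the test set
rs_rf.score(X_test, y_test)
```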
86.88. Let's go back up here: how did it perform versus our original scores? Oh, did we get a little bit of an improvement? Yes! So this is our initial random forest model: 83.6% accuracy, remember, if you multiply the score by 100. So we've gone up a few percentage points to 86.8. We can check out the original ones here, model_scores: our logistic regression model, still our default one, is still better than the random forest and the tuned random forest. Now, of course, we might be able to try a different search, different hyperparameter combinations, to try and push our random forest model further. But because we're trying to minimise our time between experiments, what we might do is push on with our logistic regression model. So now you see how we're going through this process of elimination: trying to figure out which model works best for us and which doesn't, and then just slightly incrementing on the one that performs the best. That's where we're up to now. We're well and truly in the experimentation phase of our six-step framework: we're trying out different models, we're trying out different hyperparameters.

So now we've searched for better hyperparameters for our logistic regression model, back up here, using RandomizedSearchCV. What we might do now is use GridSearchCV, which exhaustively searches through hyperparameter combinations, rather than randomly looking at different combinations of hyperparameters, as we saw in the Scikit-Learn section. GridSearchCV, and you can look up the documentation for this, sklearn GridSearchCV: GridSearchCV is an exhaustive search over specified parameter values for an estimator. So that's what we might do now with our logistic regression model.

There are a few methods for hyperparameter tuning: one, you can go by hand; two, RandomizedSearchCV; three, GridSearchCV, to really narrow things down. That's the process we're following now. We came in tuning hyperparameters by hand, then we eliminated our random forest model using RandomizedSearchCV, again, we could always revisit these things, this is part of the experimentation, because our logistic regression model is performing better than our randomized search random forest model. So now we're going to use GridSearchCV with our logistic regression model.
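As a preview of the next video, a GridSearchCV version of the logistic regression search might look like the sketch below. The key differences from RandomizedSearchCV are that it takes param_grid rather than param_distributions and tries every combination, so there's no n_iter; the grid here is the same assumed log_reg_grid as before.

```python
from sklearn.model_selection import GridSearchCV

# Exhaustive search: every combination in the grid, 5 folds each
gs_log_reg = GridSearchCV(LogisticRegression(),
                          param_grid=log_reg_grid,
                          cv=5,
                          verbose=True)

gs_log_reg.fit(X_train, y_train)   # 20 combinations x 5 folds = 100 fits
gs_log_reg.best_params_            # best combination found
gs_log_reg.score(X_test, y_test)   # mean accuracy on the test set
```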
So have a little review of what we've done. See if you can try a different hyperparameter grid for each of our logistic regression and random forest models in our RandomizedSearchCV; see if you can try some different parameters, maybe increase the number of iterations, and see if you can push those results a little bit further. But for now, in the next video, we're going to focus on tuning our logistic regression model using GridSearchCV.