1 00:00:00,190 --> 00:00:05,580 In the last video we left off saying that these metrics, precision, recall and F1 score, are only 2 00:00:05,580 --> 00:00:10,220 calculated using one split, a train-test split. 3 00:00:10,410 --> 00:00:16,450 So they're only calculated using one split, but we prefer using cross-validation where possible. 4 00:00:16,500 --> 00:00:18,150 So that's what we're going to do. 5 00:00:18,180 --> 00:00:33,120 Let's make a little heading, we'll go: calculate evaluation metrics using cross-validation. 6 00:00:33,180 --> 00:00:33,560 All right. 7 00:00:33,570 --> 00:00:47,650 Here we're going to calculate precision, recall and F1 score of our model using cross-validation, and to 8 00:00:47,650 --> 00:00:51,600 do so we'll be using, 9 00:00:52,340 --> 00:00:54,850 let me put this one in code, cross_val_score. 10 00:00:54,910 --> 00:00:58,240 So we've seen this one in the scikit-learn section. 11 00:00:58,260 --> 00:01:05,410 cross_val_score, and if we look it up here, as we always want to do, so cross_val_score, what's it say? 12 00:01:06,860 --> 00:01:08,040 So it takes an estimator. 13 00:01:08,070 --> 00:01:13,110 It takes an X, it takes a y. Evaluate a score by cross-validation. The thing we want to pay attention 14 00:01:13,110 --> 00:01:14,830 to here is scoring. 15 00:01:14,910 --> 00:01:21,240 So if we go to scoring: a string, or a scorer callable object or function with a signature, which 16 00:01:21,240 --> 00:01:23,110 should return only a single value. 17 00:01:23,110 --> 00:01:23,390 OK. 18 00:01:23,400 --> 00:01:30,300 So this is the parameter we're going to use to evaluate our model using cross-validation with different 19 00:01:30,300 --> 00:01:31,020 metrics. 20 00:01:31,020 --> 00:01:36,870 So we're gonna change the scoring string. So let's say that we'll set up a new logistic regression model 21 00:01:36,870 --> 00:01:39,020 instance with our best hyperparameters. 22 00:01:39,190 --> 00:01:40,690 What were our best hyperparameters?
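To make the scoring parameter described above a bit more concrete, here is a minimal sketch of the two forms the documentation mentions: a string naming a built-in metric, and a callable that takes (estimator, X, y) and returns a single value. The synthetic data from make_classification, the default LogisticRegression settings and the helper name recall_scorer are all assumptions for illustration, not what the video uses.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption: the video has its own X and y).
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
clf = LogisticRegression(max_iter=1000)

# Form 1: scoring given as a string that names a built-in metric.
recall_from_string = cross_val_score(clf, X, y, cv=5, scoring="recall")

# Form 2: scoring given as a callable with the signature
# (estimator, X, y) that returns one number per fold.
# recall_scorer is a hypothetical helper name.
def recall_scorer(estimator, X_fold, y_fold):
    return recall_score(y_fold, estimator.predict(X_fold))

recall_from_callable = cross_val_score(clf, X, y, cv=5, scoring=recall_scorer)

print(recall_from_string)
print(recall_from_callable)
```

Both calls evaluate the same folds, so the two arrays should match. For the standard metrics the string form is usually all that is needed.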
23 00:01:40,720 --> 00:01:46,950 So let's go check the hyperparameters, because we want to set up one of the best models that we can 24 00:01:47,010 --> 00:01:50,240 and then use cross-validation to evaluate it. 25 00:01:50,380 --> 00:01:55,880 gs_log_reg.best_params_. 26 00:01:56,010 --> 00:01:56,640 Wonderful. 27 00:01:57,150 --> 00:02:06,900 So now we're going to create a classifier. Go here: create a new classifier with the best parameters that 28 00:02:06,900 --> 00:02:16,400 we've found. So clf equals LogisticRegression, C equals, we're going to just copy that here. 29 00:02:16,440 --> 00:02:17,570 Wonderful. 30 00:02:18,170 --> 00:02:20,570 Whoops, I forgot the 0. 31 00:02:20,580 --> 00:02:24,300 That's a very long number there, and then solver equals. 32 00:02:24,330 --> 00:02:28,740 Now again, these parameters might be different if you've done your own hyperparameter tuning and 33 00:02:28,740 --> 00:02:32,090 found some better ones, but that will do for us for now. 34 00:02:32,100 --> 00:02:32,550 Wonderful. 35 00:02:32,580 --> 00:02:37,710 So now we've instantiated a model with the best hyperparameters, let's use cross_val_score along with 36 00:02:37,710 --> 00:02:42,120 the scoring parameter to get some cross-validated metrics. 37 00:02:42,150 --> 00:02:45,990 So what we do, I've actually forgotten here. 38 00:02:46,230 --> 00:02:48,780 We're going to calculate, yes, there we go. 39 00:02:48,810 --> 00:02:50,710 So we want accuracy here. 40 00:02:51,030 --> 00:02:54,240 Cross-validated accuracy. 41 00:02:55,080 --> 00:03:00,420 So we might do four cells, and we go here: cross-validated precision, 42 00:03:04,110 --> 00:03:09,380 cross-validated recall, and then cross-validated 43 00:03:12,080 --> 00:03:14,390 F1 score. 44 00:03:14,400 --> 00:03:15,460 Wonderful. 45 00:03:15,510 --> 00:03:16,820 So let's do that.
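The step of reading the best hyperparameters from the grid search and typing them into a fresh model can also be done without copying the long number by hand. The sketch below is only an illustration: the data is synthetic, and the param_grid values (including the liblinear solver) are placeholder assumptions rather than the values found in the video. Only the names gs_log_reg and clf come from the transcript.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

# Hypothetical search grid (assumption): placeholder values only.
param_grid = {"C": np.logspace(-4, 4, 10), "solver": ["liblinear"]}

gs_log_reg = GridSearchCV(LogisticRegression(), param_grid=param_grid, cv=5)
gs_log_reg.fit(X, y)

# best_params_ is a plain dict, e.g. {"C": ..., "solver": ...}.
best_params = gs_log_reg.best_params_
print(best_params)

# Unpack the dict into a new, unfitted classifier instead of retyping values.
clf = LogisticRegression(**best_params)
```

Unpacking the dictionary avoids the kind of slip mentioned above, where a digit goes missing while copying a long value of C.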
46 00:03:16,830 --> 00:03:25,860 So we're gonna go cv_acc, which stands for cross-validation accuracy, equals cross_val_score, and we're 47 00:03:25,860 --> 00:03:29,440 going to pass it our classifier, which we just instantiated here. 48 00:03:29,700 --> 00:03:35,820 We're gonna pass it all of the X data, because we can now, because we're using cross_val_score, and 49 00:03:35,820 --> 00:03:46,270 all of the y data, and we're gonna set cv to five, and we're gonna set scoring to be accuracy. Beautiful. 50 00:03:46,670 --> 00:03:52,540 And then we're gonna check out what cv_acc looks like, and we can take the mean of this, because remember, 51 00:03:52,600 --> 00:03:58,810 what has it done? It's evaluated our model over five different splits. 52 00:03:58,810 --> 00:04:04,880 So if we take the mean of it, all right, that's gonna get the average accuracy across these five different 53 00:04:04,880 --> 00:04:05,900 splits. 54 00:04:05,900 --> 00:04:06,680 So let's do that. 55 00:04:06,680 --> 00:04:12,720 So we go np.mean of cv_acc. Wonderful. 56 00:04:12,790 --> 00:04:14,230 So that's our accuracy there. 57 00:04:14,290 --> 00:04:22,580 So we might say that, actually, we might just override our value, and then we'll go cv_acc. Beautiful. 58 00:04:22,650 --> 00:04:25,140 And now let's do the same with these others. 59 00:04:25,320 --> 00:04:30,530 We'll go here, we could just copy the code here again. 60 00:04:30,540 --> 00:04:33,220 You don't want to get into the habit of copying code. 61 00:04:33,300 --> 00:04:37,020 We should really functionize this, but we're just going to keep rolling with the punches with what we're 62 00:04:37,020 --> 00:04:37,640 doing now. 63 00:04:38,370 --> 00:04:43,670 And by functionize this I mean, because we're doing relatively similar calculations the whole way through, 64 00:04:43,890 --> 00:04:46,100 we could just make it a function.
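To show what "evaluated our model over five different splits" means in practice, the sketch below computes cv_acc with cross_val_score and then reproduces the same five accuracies with an explicit loop over StratifiedKFold, which is the splitter scikit-learn uses by default for a classifier when cv is an integer. The synthetic data, the default model settings and the name manual_acc are assumptions for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
clf = LogisticRegression(max_iter=1000)

# The one-liner from the video: five accuracy scores, one per split.
cv_acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy")

# The same thing by hand: fit on four folds, score on the held-out fold.
manual_acc = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    model = clone(clf).fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    manual_acc.append(accuracy_score(y[test_idx], preds))

# Average across the five splits, as done with np.mean in the video.
cv_acc_mean = np.mean(cv_acc)
print(cv_acc)
print(cv_acc_mean)
```

Because every sample lands in a test fold exactly once, the full X and y can be passed in without a separate train-test split.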
65 00:04:46,230 --> 00:04:58,250 So that is the precision. We might override this to be cv_precision equals np.mean, and then see it again. Wonderful. 66 00:04:58,540 --> 00:05:04,300 And then we could do the same thing here, except for recall. What we might do is be a little bit tricky: 67 00:05:05,250 --> 00:05:10,650 put in spaces here. When you hold Command, you can put little cursors here. 68 00:05:10,680 --> 00:05:19,910 So if we just change this all to recall, ha, happy days, look at that, I've got five cursors on the go. Recall... 69 00:05:20,270 --> 00:05:28,210 oh, that's a nice recall score, cross-validated as well. We might do the same for F1, and we 70 00:05:28,290 --> 00:05:34,590 go here, we're gonna change all of these bad boys to F1. 71 00:05:34,860 --> 00:05:35,620 There we go. 72 00:05:35,910 --> 00:05:37,290 We might put a little space, 73 00:05:37,320 --> 00:05:40,830 so our code is looking Pythonic. Wonderful. 74 00:05:40,830 --> 00:05:41,740 All right. 75 00:05:41,800 --> 00:05:47,300 And so now we've got all these cross-validated metrics that our boss is requesting. 76 00:05:47,640 --> 00:05:52,320 It's not really good to just have them appearing in a Jupyter notebook. Like, all of our other evaluation stuff is 77 00:05:52,440 --> 00:05:57,270 in a nice neat little table; there's our classification metrics, that looks really good in a presentation. 78 00:05:57,270 --> 00:06:02,260 You know, this is something that someone could look at and go, yep, I can see that value is pretty high. 79 00:06:02,340 --> 00:06:03,700 I can see what's happening here. 80 00:06:04,410 --> 00:06:05,300 Let's do the same. 81 00:06:05,310 --> 00:06:10,350 Rather than just having them all spread out, let's put them into a graph of sorts, or a visualization. 82 00:06:10,740 --> 00:06:16,100 So we go: visualize our cross-validated metrics. 83 00:06:17,140 --> 00:06:22,440 So we're going to create cv_metrics as a DataFrame.
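The four near-identical cells for accuracy, precision, recall and F1 are the kind of repetition the video suggests turning into a function. One possible sketch of that idea follows. The helper name cross_validated_metrics, its default arguments and the synthetic data are assumptions, not code from the video.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def cross_validated_metrics(estimator, X, y,
                            metrics=("accuracy", "precision", "recall", "f1"),
                            cv=5):
    """Return the mean cross-validated score for each scoring string.

    Hypothetical helper: it simply repeats the cross_val_score and
    np.mean steps once per metric name.
    """
    results = {}
    for metric in metrics:
        scores = cross_val_score(estimator, X, y, cv=cv, scoring=metric)
        results[metric] = float(np.mean(scores))
    return results

# Synthetic stand-in data and model settings (assumptions).
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
clf = LogisticRegression(max_iter=1000)

cv_results = cross_validated_metrics(clf, X, y)
print(cv_results)
```

scikit-learn also offers cross_validate, which accepts several scoring names in one call. The loop above is kept because it mirrors the cell-by-cell approach in the video.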
84 00:06:22,440 --> 00:06:27,080 I mean, yeah, we probably really should have functionized this to begin with, but that's all right. 85 00:06:27,300 --> 00:06:34,890 Sometimes you have to just go through the old-fashioned way and then realize where you've gone wrong and 86 00:06:34,890 --> 00:06:39,090 see how you could improve, and maybe that's a little extension you could try yourself and see how you 87 00:06:39,090 --> 00:06:41,530 could functionize some of what we're doing here. 88 00:06:41,610 --> 00:06:46,830 Maybe it just cross-validates some X and y using different metrics and puts it all into a nice little 89 00:06:46,830 --> 00:06:57,990 presentation all in one hit. That would be good practice. cv_recall, and then we go F1 is going 90 00:06:57,990 --> 00:07:09,490 to be our cv_f1. Wonderful. We need to set index equals zero. Beautiful. And then we can create a 91 00:07:09,490 --> 00:07:11,920 plot: cv_metrics. 92 00:07:12,010 --> 00:07:13,000 Now, I've done this before. 93 00:07:13,090 --> 00:07:13,870 Spoiler alert. 94 00:07:13,990 --> 00:07:16,100 So I know I need to transpose it. 95 00:07:16,160 --> 00:07:17,270 Bar. 96 00:07:17,270 --> 00:07:25,110 Title equals cross-validated classification metrics. 97 00:07:25,180 --> 00:07:27,400 And do we want a legend? 98 00:07:27,400 --> 00:07:28,300 No, we don't. 99 00:07:28,300 --> 00:07:29,220 False. 100 00:07:29,260 --> 00:07:30,850 Let's see what that looks like. 101 00:07:30,880 --> 00:07:31,270 Beautiful. 102 00:07:31,270 --> 00:07:33,710 We're gonna put a little semicolon here. 103 00:07:33,730 --> 00:07:34,990 There we go. 104 00:07:35,020 --> 00:07:39,910 So maybe we could probably add, like, the numbers on here so people know exactly what they are, but we 105 00:07:39,910 --> 00:07:42,370 see our model's doing pretty well on recall.
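The DataFrame and bar plot built above can be sketched roughly as follows. The four numbers are made-up placeholders, not the scores from the video, and the column labels are assumptions. The Agg backend line is only there so the sketch runs as a plain script without a display and can be dropped inside a Jupyter notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; not needed in a notebook
import pandas as pd

# Placeholder scores (assumption): substitute your own cross-validated means.
cv_acc, cv_precision, cv_recall, cv_f1 = 0.80, 0.75, 0.90, 0.82

# One row of scalars, so pandas needs an explicit index.
cv_metrics = pd.DataFrame({"Accuracy": cv_acc,
                           "Precision": cv_precision,
                           "Recall": cv_recall,
                           "F1": cv_f1},
                          index=[0])

# Transpose so each metric becomes its own bar along the x-axis.
ax = cv_metrics.T.plot.bar(title="Cross-validated classification metrics",
                           legend=False)
```

Without the transpose, pandas would treat the single row as one x-axis group with four differently coloured bars, which is why the legend would otherwise be needed to tell them apart.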
106 00:07:42,460 --> 00:07:49,540 And again, we could probably add our cross-validation results from our random forest model or our K-nearest 107 00:07:49,540 --> 00:07:55,390 neighbors classifier, or any other classification model that we may have tried during our experimentation 108 00:07:55,390 --> 00:07:55,960 phase. 109 00:07:56,010 --> 00:07:59,590 Remember, this experimentation phase, it's an iterative process, right? 110 00:07:59,590 --> 00:08:04,940 Just going back and back and back and forth through here. So we could add that to here and see which 111 00:08:04,940 --> 00:08:05,810 model performs best. 112 00:08:05,810 --> 00:08:12,260 But for now we're sticking with logistic regression, and this looks like something that we could share. 113 00:08:12,260 --> 00:08:14,510 So what do we have left now? 114 00:08:14,810 --> 00:08:18,940 Well, let's go back up to where we put down what we've ticked off. 115 00:08:18,950 --> 00:08:21,950 Have we ticked off everything, the things our boss was asking? 116 00:08:21,950 --> 00:08:23,890 The things we previously didn't know how to do, 117 00:08:23,900 --> 00:08:28,260 but now we've seen them in action. 118 00:08:28,410 --> 00:08:28,830 All right. 119 00:08:29,440 --> 00:08:37,040 So we've got cross-validation, precision, recall, yep. Classification report, tick. ROC curve, tick. Area under 120 00:08:37,060 --> 00:08:39,190 the curve. Confusion matrix. 121 00:08:39,200 --> 00:08:39,700 Yes. 122 00:08:39,860 --> 00:08:43,790 The thing we're missing out on here is feature importance. 123 00:08:43,800 --> 00:08:44,710 Mm hmm. 124 00:08:44,740 --> 00:08:48,890 All right, well, let's see how we do feature importance in the next video.