1
00:00:00,330 --> 00:00:05,160
Now we've just covered accuracy as a method of evaluating our classification models.

2
00:00:05,160 --> 00:00:07,650
And you might be thinking, why can't we just leave it at that?

3
00:00:08,190 --> 00:00:14,250
Well, as we start to go through these other metrics here, you'll start to understand why it might be important

4
00:00:14,250 --> 00:00:18,810
to get a few more different evaluation metrics on the board rather than just accuracy.

5
00:00:18,900 --> 00:00:23,550
In my case, when I was first learning how to build classification models, I would always think, yeah, just

6
00:00:23,640 --> 00:00:25,710
higher accuracy is better, right?

7
00:00:25,740 --> 00:00:31,350
And then slowly, after looking at different examples, I started to realize, okay, now I see the value of using

8
00:00:31,680 --> 00:00:36,870
other metrics like these. But let's not talk about it, let's see it in action. The next one we're going

9
00:00:36,870 --> 00:00:48,120
to cover is the area under the receiver operating, so, area under the receiver operating characteristic

10
00:00:48,750 --> 00:00:49,990
curve.

11
00:00:50,220 --> 00:00:57,660
The beautiful thing is it's also known as AUC, so area under the curve, or ROC, so receiver operating

12
00:00:57,660 --> 00:00:59,220
characteristic.

13
00:00:59,220 --> 00:01:02,350
So let's do this: ROC curve.

14
00:01:02,400 --> 00:01:09,580
So these are the two things you'll look out for: area under curve or ROC curve.

15
00:01:09,800 --> 00:01:11,010
We'll put it here.

16
00:01:11,070 --> 00:01:16,420
AUC or ROC curve. Beautiful.

17
00:01:16,420 --> 00:01:20,980
So if you hear someone talking about AUC, or AUROC, or ROC, or something like that, they're probably

18
00:01:20,980 --> 00:01:22,940
talking about this metric here.

19
00:01:22,990 --> 00:01:25,480
Now, what does a ROC curve measure?

20
00:01:25,930 --> 00:01:34,140
Well, by formal definition, a ROC curve is a comparison of a model's true positive rate, a.k.a.
TPR, versus

21
00:01:34,140 --> 00:01:35,950
a model's false positive rate.

22
00:01:35,950 --> 00:01:36,910
Let's write that down.

23
00:01:36,910 --> 00:01:42,980
ROC curves are a comparison of a model's

24
00:01:43,270 --> 00:01:53,420
true positive rate, which is also known as TPR, versus a model's false positive rate, which is also known

25
00:01:53,420 --> 00:01:54,640
as FPR.

26
00:01:54,890 --> 00:02:00,830
Now you might be wondering, okay, what is a true positive and what is a false positive?

27
00:02:00,830 --> 00:02:02,440
Well, let's have a look here.

28
00:02:02,560 --> 00:02:10,930
A true positive equals model predicts 1 when truth is 1. That makes sense.

29
00:02:10,930 --> 00:02:17,930
So in our case, if we have a look up here, if our targets are 1, 0, 0, a true positive is when our model

30
00:02:17,930 --> 00:02:18,950
predicts a 1 and

31
00:02:19,040 --> 00:02:20,990
the real label is a 1.

32
00:02:21,010 --> 00:02:21,540
Okay.

33
00:02:21,700 --> 00:02:23,460
Yeah, that makes sense.

34
00:02:23,480 --> 00:02:32,390
So if we go here, a false positive is model predicts 1 when truth is 0.

35
00:02:32,390 --> 00:02:34,160
So why is it called false positive?

36
00:02:34,160 --> 00:02:35,940
Well, it's because it's predicting 1,

37
00:02:35,960 --> 00:02:36,200
so,

38
00:02:36,200 --> 00:02:36,550
okay,

39
00:02:36,560 --> 00:02:38,960
the positive class, has heart disease,

40
00:02:38,960 --> 00:02:40,820
when the truth is actually 0.

41
00:02:40,820 --> 00:02:44,990
So it's giving us a false positive sense that a person may have heart disease,

42
00:02:45,110 --> 00:02:48,750
in our particular case, for our heart disease classification problem.

43
00:02:48,920 --> 00:02:51,260
And then if we go here, a true negative

44
00:02:53,930 --> 00:02:59,650
equals model predicts 0 when truth is 0.

45
00:02:59,690 --> 00:03:00,680
So that makes sense.

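As a rough sketch of the definitions so far, the outcomes can be counted directly from labels and predictions. The arrays below are made up for illustration and are not the course's heart disease data. The fourth outcome, a false negative (model predicts 0 when truth is 1), is included because the true positive rate needs it.

```python
import numpy as np

# Hypothetical labels and predictions, made up for illustration
# (1 = heart disease, 0 = no heart disease)
y_true = np.array([1, 0, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])

# Count each outcome by comparing predictions against the truth
tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # predicts 1, truth is 1
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # predicts 1, truth is 0
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # predicts 0, truth is 0
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # predicts 0, truth is 1

# True positive rate: share of actual positives the model caught
tpr = tp / (tp + fn)
# False positive rate: share of actual negatives wrongly flagged as positive
fpr = fp / (fp + tn)

print(f"TP={tp} FP={fp} TN={tn} FN={fn} TPR={tpr} FPR={fpr}")
```

A ROC curve repeats this TPR and FPR calculation at many different probability cut-offs rather than at a single set of hard predictions.
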
46
00:03:00,680 --> 00:03:04,790
So the model is getting the right prediction there: it's predicting someone doesn't have

47
00:03:04,790 --> 00:03:06,680
heart disease when they actually don't.

48
00:03:06,680 --> 00:03:07,880
That's great.

49
00:03:07,880 --> 00:03:12,750
And then a false negative equals model predicts...

50
00:03:13,070 --> 00:03:14,440
What do you think this would be?

51
00:03:14,540 --> 00:03:15,330
False negative.

52
00:03:15,340 --> 00:03:21,140
If we look at false positive, a false negative is when model predicts 0

53
00:03:21,310 --> 00:03:29,150
when truth is 1. So that's because it's predicting not heart disease when it actually is heart disease.

54
00:03:29,260 --> 00:03:29,500
Right.

55
00:03:29,500 --> 00:03:35,740
So now we know these, we can see that a ROC curve is a comparison of a model's true positive rate, TPR,

56
00:03:36,140 --> 00:03:38,500
versus... typos everywhere.

57
00:03:38,510 --> 00:03:39,440
Daniel, come on.

58
00:03:39,640 --> 00:03:42,850
Versus a model's false positive rate, or FPR.

59
00:03:42,880 --> 00:03:43,780
So now we know this,

60
00:03:43,780 --> 00:03:51,820
let's see it in action. We can do this using scikit-learn's metrics library. We're going to import

61
00:03:52,420 --> 00:04:01,680
roc_curve, so that's roc_curve, and then we're going to make predictions with probabilities.

62
00:04:01,690 --> 00:04:03,850
And how can we make predictions with probabilities?

63
00:04:03,850 --> 00:04:06,430
We saw this in making predictions.

64
00:04:06,550 --> 00:04:15,010
We can do that with y_probs equals clf.predict_proba, proba for probability. We're gonna make some predictions

65
00:04:15,070 --> 00:04:17,150
on the X_test data.

66
00:04:17,230 --> 00:04:26,410
Now I just want to make sure, maybe we create X_test, just to make sure it's the right

67
00:04:26,440 --> 00:04:28,660
X_test data. X_test,

68
00:04:31,260 --> 00:04:46,280
etc.
So we want to go here: X_train, X_test, y_train, y_test equals train_test_split, X, y, test_size equals

69
00:04:46,370 --> 00:04:49,750
0.2. Wonderful.

70
00:04:49,840 --> 00:04:57,340
And so now we can do that there. Because a ROC curve is a comparison of a model's true positive rate

71
00:04:57,340 --> 00:04:59,880
versus a model's false positive rate,

72
00:04:59,920 --> 00:05:02,970
we want to only keep the positive class.

73
00:05:02,980 --> 00:05:06,620
So actually, let's see what y_probs looks like.

74
00:05:07,000 --> 00:05:07,680
y_probs.

75
00:05:07,690 --> 00:05:08,880
Why hasn't this worked?

76
00:05:08,890 --> 00:05:09,790
What have we got here?

77
00:05:09,790 --> 00:05:14,520
This RandomForestClassifier is not fitted yet. Call fit. That would make sense.

78
00:05:14,560 --> 00:05:15,310
So we want to fit

79
00:05:18,130 --> 00:05:18,830
the classifier.

80
00:05:18,830 --> 00:05:24,140
Of course, you can't make predictions without fitting, without the model learning any patterns,

81
00:05:25,220 --> 00:05:26,160
so we're gonna fit it here.

82
00:05:26,180 --> 00:05:30,110
Then we'll make some predictions and we'll have a look at this, maybe only the first 10, so we're not

83
00:05:30,110 --> 00:05:31,580
taking up our page space.

84
00:05:32,370 --> 00:05:32,630
Okay.

85
00:05:32,630 --> 00:05:33,500
Beautiful.

86
00:05:33,530 --> 00:05:41,220
And so because a ROC curve is only a comparison of the model's true positive rate versus its false positive

87
00:05:41,220 --> 00:05:46,510
rate, we only want the probabilities that the model has predicted for the positive class.

88
00:05:46,530 --> 00:05:52,770
So if you imagine here our model's trying to predict 0 or 1. This is the probability that the label is

89
00:05:52,770 --> 00:05:58,100
a 0, and this is the probability that the label is a 1, and so on,

90
00:05:58,140 --> 00:06:03,580
for all of these samples until it finishes up. So there's gonna be however many are in the test

91
00:06:03,580 --> 00:06:04,250
set here.

92
00:06:05,660 --> 00:06:07,530
Sixty-one, so there's gonna be sixty-one of these.

93
00:06:07,550 --> 00:06:12,170
So essentially what we want is y_probs_positive.

94
00:06:14,900 --> 00:06:20,390
So this is the probabilities that it's the positive class, a.k.a. 0 is the negative class and 1

95
00:06:20,390 --> 00:06:22,020
is the positive class.

96
00:06:22,030 --> 00:06:31,010
y_probs, we're gonna use some slicing here, so only column 1 of every row. We can do that just

97
00:06:31,010 --> 00:06:34,160
so you know what's going on.

98
00:06:34,330 --> 00:06:37,880
y_probs_positive, and we'll only look at the first 10 again.

99
00:06:39,090 --> 00:06:40,260
So does this make sense?

100
00:06:40,260 --> 00:06:42,110
We're getting 0.43,

101
00:06:42,180 --> 00:06:48,210
yep, 0.77, yep, 0.48, and so on and so on for the entire list of

102
00:06:48,210 --> 00:06:49,020
y_probs.

103
00:06:49,080 --> 00:07:00,530
And now we can calculate FPR, TPR and thresholds. So fpr, tpr, thresholds.

104
00:07:00,540 --> 00:07:01,280
Why do I know this?

105
00:07:01,290 --> 00:07:07,530
We'll see in a second why I know that this is gonna be fpr comma tpr comma thresholds.

106
00:07:07,530 --> 00:07:08,220
roc_curve.

107
00:07:08,250 --> 00:07:09,090
I'm going to pass it

108
00:07:09,090 --> 00:07:11,280
y_test and

109
00:07:11,280 --> 00:07:14,680
y_probs_positive.

110
00:07:14,730 --> 00:07:19,950
Now, how would you figure out what roc_curve returns in this case?

111
00:07:19,950 --> 00:07:23,040
Well, what I would do is I'd press Shift+Tab, and this is how I know.

112
00:07:23,730 --> 00:07:27,640
So we've got roc_curve. It takes y_true, so that's our test labels.

113
00:07:27,690 --> 00:07:30,520
It takes y_score, so that's our

114
00:07:30,730 --> 00:07:32,590
y_probs_positive.

115
00:07:32,590 --> 00:07:36,690
And if we come down here: compute the receiver operating characteristic, ROC.

116
00:07:36,850 --> 00:07:37,270
Yeah.

117
00:07:37,420 --> 00:07:45,900
Beautiful. Now, parameters: y_true, y_score. Target scores can be either probability estimates of the positive

118
00:07:45,900 --> 00:07:49,770
class, confidence values, or non-thresholded measures of decisions.

119
00:07:49,770 --> 00:07:52,510
So that's the probability estimates of the positive class.

120
00:07:52,530 --> 00:07:55,410
That's where we got this slice from, right?

121
00:07:55,440 --> 00:08:00,840
And then if we come down here, it's gonna tell us what it returns. fpr: increasing false positive rates

122
00:08:00,840 --> 00:08:06,270
such that element i is the false positive rate of predictions with score above thresholds. tpr,

123
00:08:06,270 --> 00:08:09,410
so that's the true positive rate, and thresholds.

124
00:08:09,600 --> 00:08:12,570
So that's how we can figure out what a function returns.

125
00:08:12,570 --> 00:08:16,720
And again, that's just viewing the docstring in a Jupyter notebook.

126
00:08:16,740 --> 00:08:21,270
You can always look at the documentation for this if you wanted to do so. You might look up something

127
00:08:21,270 --> 00:08:29,140
like "how to calculate ROC curve with scikit-learn". So then we're gonna check the false positive

128
00:08:30,490 --> 00:08:39,550
rates, fpr. Beautiful. So that's giving us a big array, but looking at these on their own doesn't really make

129
00:08:39,550 --> 00:08:42,970
much sense. It's much easier to see it visually.

130
00:08:42,970 --> 00:08:48,340
And since scikit-learn doesn't really have a built-in function to plot a ROC curve, what we might have

131
00:08:48,340 --> 00:08:53,410
to do, and quite often you come across this, right, is you'll have to find a function or write your

132
00:08:53,410 --> 00:08:55,110
own that'll do it for you.

133
00:08:55,120 --> 00:08:59,950
So that's what we're gonna have a look at in the next video, where we'll plot a ROC curve and this function

134
00:08:59,950 --> 00:09:04,240
here, roc_curve, will start to make a bit more sense rather than just being an array of numbers.
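
The steps walked through in this video can be sketched roughly as follows. The heart disease data isn't reproduced in this transcript, so the sketch assumes a synthetic stand-in built with `make_classification`. The sample count, feature count and `random_state` values are assumptions for illustration, while `clf`, `y_probs`, `y_probs_positive`, `fpr`, `tpr` and `thresholds` follow the names used in the video.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Assumption: synthetic binary data standing in for the heart disease dataset
X, y = make_classification(n_samples=300, n_features=13, random_state=42)

# Split into training and test sets, holding out 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The classifier has to be fitted before it can make predictions
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# One row per test sample: column 0 is P(label == 0), column 1 is P(label == 1)
y_probs = clf.predict_proba(X_test)

# Keep only the probabilities of the positive class (column 1 of every row)
y_probs_positive = y_probs[:, 1]

# roc_curve returns false positive rates, true positive rates and thresholds
fpr, tpr, thresholds = roc_curve(y_test, y_probs_positive)

print(y_probs_positive[:10])
print(fpr)
```

The three returned arrays have the same length, one entry per threshold, which is why the raw `fpr` array is easier to read once it is plotted against `tpr`.
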