1 00:00:00,730 --> 00:00:07,660 In this lesson we're going to look more closely at the predictions our model got right and the predictions
2 00:00:07,720 --> 00:00:10,000 our model got wrong.
3 00:00:10,000 --> 00:00:17,590 Now, previously we've looked at accuracy as one metric for evaluating how well our model is doing.
4 00:00:17,590 --> 00:00:20,350 The accuracy measured how many predictions were correct
5 00:00:20,350 --> 00:00:22,300 out of all the predictions.
6 00:00:22,480 --> 00:00:24,700 Now here's a question for you.
7 00:00:24,790 --> 00:00:29,350 Can you think of a serious shortcoming of using this metric?
8 00:00:29,380 --> 00:00:37,040 Why might we not want to select our models purely based on accuracy?
9 00:00:37,040 --> 00:00:43,850 Well, let me ask you this: imagine you were hired by the British National Health Service to build a model
10 00:00:43,880 --> 00:00:46,790 that detects cancer.
11 00:00:46,790 --> 00:00:53,020 Now suppose you build a model that classifies every single patient as not having cancer.
12 00:00:53,210 --> 00:00:59,790 Each new data point is just labeled as being cancer free. How accurate would this model be?
13 00:01:01,780 --> 00:01:09,400 Well, the entire population of the UK is around 65 and a half million, and there are around two and a
14 00:01:09,400 --> 00:01:13,170 half million people living with cancer.
15 00:01:13,270 --> 00:01:22,100 So if we do the math, then this model is actually 96 percent accurate, and 96 is a very high number.
16 00:01:22,150 --> 00:01:26,050 So it seems like it's a very accurate model.
17 00:01:26,350 --> 00:01:32,500 But if we're getting 96 percent accuracy for a model that does nothing whatsoever, then there's probably
18 00:01:32,500 --> 00:01:39,520 something amiss, right? Now, as you can tell from these bar charts, the underlying reason is that in
19 00:01:39,520 --> 00:01:46,030 this example one category greatly outnumbers the other category. And you'd be surprised:
20 00:01:46,030 --> 00:01:51,070 this is actually a very, very common problem in data science and machine learning.
21 00:01:51,430 --> 00:01:58,630 So clearly we need to move beyond the accuracy metric and look at some other metrics as well.
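As a quick back-of-the-envelope check of that 96 percent figure, using the rounded population numbers quoted in the lesson:

    # Rounded figures from the lesson: ~65.5 million people in the UK,
    # ~2.5 million of them living with cancer.
    population = 65_500_000
    with_cancer = 2_500_000

    # A model that labels everyone "cancer free" is correct for every
    # person who does not have cancer, and wrong for everyone who does.
    accuracy = (population - with_cancer) / population
    print(accuracy)  # ~0.962, i.e. roughly 96 percent for a do-nothing model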
22 00:02:00,030 --> 00:02:06,150 Enter the concept of false positives and false negatives.
23 00:02:06,200 --> 00:02:11,810 My favorite way to think about false positives and false negatives is actually through one of Aesop's
24 00:02:11,810 --> 00:02:12,740 Fables.
25 00:02:12,740 --> 00:02:17,040 It's the story of The Boy Who Cried Wolf.
26 00:02:17,120 --> 00:02:23,180 Now, I was told this story a long, long time ago, when I was a child and I was still shiny and new.
27 00:02:23,660 --> 00:02:27,520 So I'll have to do my best to paraphrase.
28 00:02:27,650 --> 00:02:35,930 But once upon a time there was a shepherd boy who liked to trick his fellow villagers. Every once in
29 00:02:35,930 --> 00:02:43,670 a while he would run to the village and cry that a wolf had attacked his sheep. Alas,
30 00:02:43,700 --> 00:02:45,920 when the villagers came to investigate,
31 00:02:45,920 --> 00:02:51,360 no wolf was found. And this is what a false positive is.
32 00:02:51,380 --> 00:02:56,050 The villagers did not like false positives, and they were very, very angry with the boy.
33 00:02:56,960 --> 00:03:03,200 And having played this trick a few more times on the unsuspecting villagers, one day our shepherd boy
34 00:03:03,290 --> 00:03:05,330 actually sees a wolf.
35 00:03:06,080 --> 00:03:08,620 He runs to the villagers and shouts: Oi!
36 00:03:08,720 --> 00:03:10,730 There is a wolf eating all the sheep!
37 00:03:11,420 --> 00:03:17,820 However, the villagers think it's another false positive and that there is no wolf.
38 00:03:17,840 --> 00:03:22,240 Little do they know that this time round there actually is a wolf,
39 00:03:22,370 --> 00:03:24,290 and the boy was telling the truth.
40 00:03:24,440 --> 00:03:29,270 And what they've got in this case is actually a true positive.
41 00:03:29,660 --> 00:03:36,110 But as a consequence of not grabbing their pitchforks, the wolf ate all the sheep and the villagers
42 00:03:36,110 --> 00:03:37,160 go hungry.
43 00:03:37,380 --> 00:03:39,240 No more roast lamb for the villagers.
44 00:03:39,440 --> 00:03:40,470 Only potatoes.
45 00:03:41,610 --> 00:03:43,730 And this is where the story ends.
46 00:03:43,730 --> 00:03:50,810 But we have two more cases to think about. Actually, for the third case let's imagine a version of the
47 00:03:50,810 --> 00:03:57,290 story where the boy rocks up to the village and truthfully announces every day that there is no wolf:
48 00:03:58,190 --> 00:03:58,920 no wolf today,
49 00:03:58,930 --> 00:04:01,600 dear villagers, all is well.
50 00:04:01,700 --> 00:04:05,680 Now, since there is no wolf and the boy said that there was no wolf,
51 00:04:05,690 --> 00:04:14,620 this is called a true negative. And that leaves us with one more case. In this case,
52 00:04:14,670 --> 00:04:18,660 the boy goes to the village and announces that there is no wolf.
53 00:04:18,780 --> 00:04:25,720 But this is in fact incorrect. The case where the boy says there is no wolf but there actually is a wolf
54 00:04:25,920 --> 00:04:34,340 is called a false negative. So what are the false positives and false negatives in our spam classification
55 00:04:34,340 --> 00:04:36,080 context?
56 00:04:36,080 --> 00:04:43,280 Well, for the false positive, our spam classifier would predict that an email is spam, but it's actually
57 00:04:43,280 --> 00:04:44,840 a legitimate email.
58 00:04:44,840 --> 00:04:51,350 False positives are the reason why you have to go into your spam folder and occasionally fish out the
59 00:04:51,590 --> 00:04:52,730 non-spam emails.
60 00:04:54,270 --> 00:04:56,030 And the false negatives?
61 00:04:56,180 --> 00:05:05,530 Well, a false negative is when the spammer manages to get around the spam filter and land in your inbox.
62 00:05:05,570 --> 00:05:14,590 In other words, the email is actually spam, but it was incorrectly classified by the spam filter. Now, with
63 00:05:14,590 --> 00:05:15,720 these tools in hand,
64 00:05:15,880 --> 00:05:21,850 we can look at some other metrics for our spam classifier to determine whether it's any good.
65 00:05:21,850 --> 00:05:23,900 So let's head on back into the Jupyter notebook.
66 00:05:24,280 --> 00:05:26,560 So let me add a little markdown cell here
67 00:05:26,560 --> 00:05:33,410 that reads "False Positives and False Negatives".
68 00:05:34,630 --> 00:05:39,790 In the next couple of cells we're going to be calculating our true positives, our false positives, and
69 00:05:39,850 --> 00:05:41,650 our false negatives.
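To keep the four cases straight before we start computing them, here is the fable mapped onto the spam-filter setting:

                              Wolf / spam present    No wolf / legitimate
    Cried wolf (flagged)      true positive          false positive
    Said no wolf (passed)     false negative         true negative

False positives put legitimate email in the spam folder; false negatives let spam slip into the inbox.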
70 00:05:42,460 --> 00:05:49,090 So let's check how often we actually predicted non-spam and how many times we've predicted spam. NumPy
71 00:05:49,120 --> 00:05:59,320 actually has a very handy method called unique, and this takes two inputs: one will be our prediction
72 00:05:59,320 --> 00:06:07,470 vector, and the other one, well, is just return_counts equals True.
73 00:06:07,600 --> 00:06:14,650 And here we can see that we've predicted non-spam one thousand one hundred sixty-three times, and we've
74 00:06:14,650 --> 00:06:20,090 predicted spam five hundred and sixty times.
75 00:06:20,290 --> 00:06:25,880 The next thing I'll do is I'll create a NumPy array of the true positives.
76 00:06:26,020 --> 00:06:33,220 So I want to make a comparison between each element in the y_test NumPy array and
77 00:06:33,370 --> 00:06:36,720 each element in the prediction NumPy array.
78 00:06:36,940 --> 00:06:41,000 And I want to store this result in a variable called
79 00:06:41,110 --> 00:06:50,420 true_pos. Now, to check whether an email is spam in our y_test,
80 00:06:50,640 --> 00:06:54,000 we can say y_test double equals 1.
81 00:06:54,540 --> 00:06:59,730 So this will check, for each element in y_test,
82 00:06:59,760 --> 00:07:07,560 if it's equal to 1. And to check whether our predictions are equal to 1, we can do it like so: prediction
83 00:07:07,560 --> 00:07:15,080 double equals 1. To make a comparison between these two conditions, right,
84 00:07:15,110 --> 00:07:19,030 y_test double equals 1 and prediction double equals 1,
85 00:07:19,140 --> 00:07:24,220 what we've previously done is we've used the logical and, right?
86 00:07:24,300 --> 00:07:33,540 We've previously made comparisons with this logical and. That's a boolean operator,
87 00:07:33,900 --> 00:07:36,420 but in this case we can't use it.
88 00:07:36,420 --> 00:07:38,460 We won't get the results that we want
89 00:07:38,520 --> 00:07:46,140 if we use it like so. If we want to make an element-by-element comparison, we use a single ampersand.
90 00:07:46,170 --> 00:07:54,920 So in this case we've got the bitwise and operator, not the boolean and operator.
91 00:07:55,290 --> 00:08:04,070 And this will allow us to make an element-by-element comparison. So if I execute the cell and then come
92 00:08:04,070 --> 00:08:15,720 down here and sum up my results, then I can see that I've got five hundred and forty-eight true positives.
93 00:08:15,980 --> 00:08:24,440 In case you're wondering, true_pos looks like this: it's just a NumPy array of True and False
94 00:08:24,680 --> 00:08:26,300 values.
95 00:08:26,300 --> 00:08:33,050 So as a challenge, can you create a NumPy array that measures the false positives for each data
96 00:08:33,050 --> 00:08:42,750 point? Call this variable false_pos and then work out how many false positives there were.
97 00:08:43,470 --> 00:08:50,370 And after you've done that, do the same for the false negatives. Store those in a variable called
98 00:08:50,500 --> 00:08:52,220 false_neg.
99 00:08:52,470 --> 00:08:56,010 I'll give you a few seconds to pause the video before I show you the solution.
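While you have the video paused, here is a minimal sketch of the cells built so far, assuming prediction and y_test are the NumPy arrays created earlier in the notebook:

    import numpy as np

    # How many 0s (non-spam) and 1s (spam) our model predicted;
    # in the lesson's run the counts come out as 1163 and 560.
    values, counts = np.unique(prediction, return_counts=True)

    # Element-by-element comparison needs the bitwise & operator,
    # not the boolean `and`: True wherever both conditions hold.
    true_pos = (y_test == 1) & (prediction == 1)
    print(true_pos.sum())  # 548 true positives in the lesson's run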
100 00:08:59,990 --> 00:09:00,430 All right,
101 00:09:00,430 --> 00:09:02,440 ready?
102 00:09:02,450 --> 00:09:06,290 I want to store the false positives in a NumPy array as well.
103 00:09:06,290 --> 00:09:11,780 I'm going to call it false_pos, and that's going to be equal to, it's gonna be equal to,
104 00:09:12,110 --> 00:09:20,300 well, where our prediction was spam but actually we had a non-spam message.
105 00:09:20,300 --> 00:09:33,350 So y_test is equal to non-spam, zero, but we're comparing that with where our prediction
106 00:09:33,530 --> 00:09:38,540 is equal to one, where our prediction was spam.
107 00:09:38,570 --> 00:09:45,770 Those are false positives, and in total we have 12 of them.
108 00:09:46,100 --> 00:09:53,840 So in 12 cases our Naive Bayes model thought that an email was spam when it really was just a normal
109 00:09:53,960 --> 00:09:56,340 email.
110 00:09:56,360 --> 00:09:59,750 What about the false negatives, though? false_neg
111 00:09:59,930 --> 00:10:09,630 is equal to y_test double equals 1.
112 00:10:09,640 --> 00:10:20,020 So in this case an email was actually spam, and we're going to compare that with prediction double equals
113 00:10:20,470 --> 00:10:21,070 zero.
114 00:10:21,580 --> 00:10:27,880 So in this case our prediction was that it is a non-spam email. Here,
115 00:10:27,970 --> 00:10:37,510 a spam message landed in our inbox and we missed it with our spam filter. So false_neg dot
116 00:10:37,510 --> 00:10:47,020 sum will show us how many spam emails actually made it into our inbox. And out of all of our test emails,
117 00:10:47,350 --> 00:10:51,030 40 spam messages made it into the inbox.
118 00:10:52,640 --> 00:10:53,500 And you know what?
119 00:10:53,960 --> 00:10:59,300 We can actually see these values very, very clearly on our chart with the decision boundary.
120 00:10:59,870 --> 00:11:00,380 Let me show you.
121 00:11:00,380 --> 00:11:09,110 If we scroll up and we look here, every time we've got one of the red crosses below the decision boundary,
122 00:11:09,620 --> 00:11:21,320 in this area here where all the non-spam messages are, we've misclassified this email. In the next lesson
123 00:11:21,620 --> 00:11:28,610 we're going to be looking at three other metrics that complement our use of the accuracy metric to help
124 00:11:28,610 --> 00:11:34,060 us evaluate how good our model is. Looking forward to seeing you in the next lesson.
125 00:11:34,070 --> 00:11:34,690 Take care.
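For reference, a minimal sketch of the two solution cells walked through above, under the same assumption that prediction and y_test are the notebook's NumPy arrays:

    # False positives: we predicted spam (1) but the email is non-spam (0).
    false_pos = (y_test == 0) & (prediction == 1)
    print(false_pos.sum())  # 12 in the lesson's run

    # False negatives: the email is spam (1) but we predicted non-spam (0).
    false_neg = (y_test == 1) & (prediction == 0)
    print(false_neg.sum())  # 40 spam messages reached the inbox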