1 00:00:00,730 --> 00:00:07,660 In this lesson we're going to look more closely at the predictions our model got right and the predictions
2 00:00:07,720 --> 00:00:10,000 our model got wrong.
3 00:00:10,000 --> 00:00:17,590 Now, previously we've looked at accuracy as one metric for evaluating how well our model is doing.
4 00:00:17,590 --> 00:00:20,350 The accuracy measured how many predictions were correct
5 00:00:20,350 --> 00:00:22,300 out of all the predictions.
6 00:00:22,480 --> 00:00:24,700 Now here's a question for you.
7 00:00:24,790 --> 00:00:29,350 Can you think of a serious shortcoming of using this metric?
8 00:00:29,380 --> 00:00:37,040 Why might we not want to select our models purely based on accuracy?
9 00:00:37,040 --> 00:00:43,850 Well, let me ask you this: imagine you were hired by the British National Health Service to build a model
10 00:00:43,880 --> 00:00:46,790 that detects cancer.
11 00:00:46,790 --> 00:00:53,020 Now suppose you build a model that classifies every single patient as not having cancer.
12 00:00:53,210 --> 00:00:59,790 Each new data point is just labeled as being cancer free. How accurate would this model be?
13 00:01:01,780 --> 00:01:09,400 Well, the entire population of the UK is around 65 and a half million, and there are around two and a
14 00:01:09,400 --> 00:01:13,170 half million people living with cancer.
15 00:01:13,270 --> 00:01:22,100 So if we do the math, then this model is actually 96 percent accurate, and 96 is a very high number.
16 00:01:22,150 --> 00:01:26,050 So it seems like it's a very accurate model.
17 00:01:26,350 --> 00:01:32,500 But if we're getting 96 percent accuracy for a model that does nothing whatsoever, then there's probably
18 00:01:32,500 --> 00:01:39,520 something amiss, right? Now, as you can tell from these bar charts, the underlying reason is that in
19 00:01:39,520 --> 00:01:46,030 this example one category greatly outnumbers the other category. And you'd be surprised:
20 00:01:46,030 --> 00:01:51,070 this is actually a very, very common problem in data science and machine learning.
21 00:01:51,430 --> 00:01:58,630 So clearly we need to move beyond the accuracy metric and look at some other metrics as well.
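As a quick back-of-the-envelope check of that 96 percent figure, using the rounded population numbers quoted in the lesson:

    # Rounded figures from the lesson: ~65.5 million people in the UK,
    # ~2.5 million of them living with cancer.
    population = 65_500_000
    with_cancer = 2_500_000

    # A model that labels everyone "cancer free" is correct for every
    # person who does not have cancer, and wrong for everyone who does.
    accuracy = (population - with_cancer) / population
    print(accuracy)  # ~0.962, i.e. roughly 96 percent for a do-nothing model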
22 00:02:00,030 --> 00:02:06,150 Enter the concept of false positives and false negatives.
23 00:02:06,200 --> 00:02:11,810 My favorite way to think about false positives and false negatives is actually through one of Aesop's
24 00:02:11,810 --> 00:02:12,740 Fables.
25 00:02:12,740 --> 00:02:17,040 It's the story of The Boy Who Cried Wolf.
26 00:02:17,120 --> 00:02:23,180 Now, I was told this story a long, long time ago, when I was a child and I was still shiny and new.
27 00:02:23,660 --> 00:02:27,520 So I'll have to do my best to paraphrase.
28 00:02:27,650 --> 00:02:35,930 But once upon a time there was a shepherd boy who liked to trick his fellow villagers. Every once in
29 00:02:35,930 --> 00:02:43,670 a while he would run to the village and cry that a wolf had attacked his sheep. Alas,
30 00:02:43,700 --> 00:02:45,920 when the villagers came to investigate,
31 00:02:45,920 --> 00:02:51,360 no wolf was found. And this is what a false positive is.
32 00:02:51,380 --> 00:02:56,050 The villagers did not like false positives, and they were very, very angry with the boy.
33 00:02:56,960 --> 00:03:03,200 And having played this trick a few more times on the unsuspecting villagers, one day our shepherd boy
34 00:03:03,290 --> 00:03:05,330 actually sees a wolf.
35 00:03:06,080 --> 00:03:08,620 He runs to the villagers and shouts: Oi!
36 00:03:08,720 --> 00:03:10,730 There is a wolf eating all the sheep!
37 00:03:11,420 --> 00:03:17,820 However, the villagers think it's another false positive and that there is no wolf.
38 00:03:17,840 --> 00:03:22,240 Little do they know that this time round there actually is a wolf,
39 00:03:22,370 --> 00:03:24,290 and the boy was telling the truth.
40 00:03:24,440 --> 00:03:29,270 And what they've got in this case is actually a true positive.
41 00:03:29,660 --> 00:03:36,110 But as a consequence of not grabbing their pitchforks, the wolf ate all the sheep and the villagers
42 00:03:36,110 --> 00:03:37,160 go hungry.
43 00:03:37,380 --> 00:03:39,240 No more roast lamb for the villagers.
44 00:03:39,440 --> 00:03:40,470 Only potatoes.
45 00:03:41,610 --> 00:03:43,730 And this is where the story ends.
46 00:03:43,730 --> 00:03:50,810 But we have two more cases to think about. Actually, for the third case let's imagine a version of the
47 00:03:50,810 --> 00:03:57,290 story where the boy rocks up to the village and truthfully announces every day that there is no wolf:
48 00:03:58,190 --> 00:03:58,920 no wolf today,
49 00:03:58,930 --> 00:04:01,600 dear villagers, all is well.
50 00:04:01,700 --> 00:04:05,680 Now, since there is no wolf and the boy said that there was no wolf,
51 00:04:05,690 --> 00:04:14,620 this is called a true negative. And that leaves us with one more case. In this case,
52 00:04:14,670 --> 00:04:18,660 the boy goes to the village and announces that there is no wolf.
53 00:04:18,780 --> 00:04:25,720 But this is in fact incorrect. The case where the boy says there is no wolf but there actually is a wolf
54 00:04:25,920 --> 00:04:34,340 is called a false negative. So what are the false positives and false negatives in our spam classification
55 00:04:34,340 --> 00:04:36,080 context?
56 00:04:36,080 --> 00:04:43,280 Well, for the false positive, our spam classifier would predict that an email is spam, but it's actually
57 00:04:43,280 --> 00:04:44,840 a legitimate email.
58 00:04:44,840 --> 00:04:51,350 False positives are the reason why you have to go into your spam folder and occasionally fish out the
59 00:04:51,590 --> 00:04:52,730 non-spam emails.
60 00:04:54,270 --> 00:04:56,030 And the false negatives?
61 00:04:56,180 --> 00:05:05,530 Well, a false negative is when the spammer manages to get around the spam filter and land in your inbox.
62 00:05:05,570 --> 00:05:14,590 In other words, the email is actually spam, but it was incorrectly classified by the spam filter. Now, with
63 00:05:14,590 --> 00:05:15,720 these tools in hand,
64 00:05:15,880 --> 00:05:21,850 we can look at some other metrics for our spam classifier to determine whether it's any good.
65 00:05:21,850 --> 00:05:23,900 So let's head on back into the Jupyter notebook.
66 00:05:24,280 --> 00:05:26,560 So let me add a little markdown cell here
67 00:05:26,560 --> 00:05:33,410 that reads "False Positives and False Negatives".
68 00:05:34,630 --> 00:05:39,790 In the next couple of cells we're going to be calculating our true positives, our false positives, and
69 00:05:39,850 --> 00:05:41,650 our false negatives.
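To keep the four cases straight before we start computing them, here is the fable mapped onto the spam-filter setting:

                              Wolf / spam present    No wolf / legitimate
    Cried wolf (flagged)      true positive          false positive
    Said no wolf (passed)     false negative         true negative

False positives put legitimate email in the spam folder; false negatives let spam slip into the inbox.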
70 00:05:42,460 --> 00:05:49,090 So let's check how often we actually predicted non-spam and how many times we've predicted spam. NumPy
71 00:05:49,120 --> 00:05:59,320 actually has a very handy method called unique, and this takes two inputs: one will be our prediction
72 00:05:59,320 --> 00:06:07,470 vector, and the other one, well, is just return_counts equals True.
73 00:06:07,600 --> 00:06:14,650 And here we can see that we've predicted non-spam one thousand one hundred sixty-three times, and we've
74 00:06:14,650 --> 00:06:20,090 predicted spam five hundred and sixty times.
75 00:06:20,290 --> 00:06:25,880 The next thing I'll do is I'll create a NumPy array of the true positives.
76 00:06:26,020 --> 00:06:33,220 So I want to make a comparison between each element in the y_test NumPy array and
77 00:06:33,370 --> 00:06:36,720 each element in the prediction NumPy array.
78 00:06:36,940 --> 00:06:41,000 And I want to store this result in a variable called
79 00:06:41,110 --> 00:06:50,420 true_pos. Now, to check whether an email is spam in our y_test,
80 00:06:50,640 --> 00:06:54,000 we can say y_test double equals 1.
81 00:06:54,540 --> 00:06:59,730 So this will check, for each element in y_test,
82 00:06:59,760 --> 00:07:07,560 if it's equal to 1. And to check whether our predictions are equal to 1, we can do it like so: prediction
83 00:07:07,560 --> 00:07:15,080 double equals 1. To make a comparison between these two conditions, right,
84 00:07:15,110 --> 00:07:19,030 y_test double equals 1 and prediction double equals 1,
85 00:07:19,140 --> 00:07:24,220 what we've previously done is we've used the logical and, right?
86 00:07:24,300 --> 00:07:33,540 We've previously made comparisons with this logical and. That's a boolean operator,
87 00:07:33,900 --> 00:07:36,420 but in this case we can't use it.
88 00:07:36,420 --> 00:07:38,460 We won't get the results that we want
89 00:07:38,520 --> 00:07:46,140 if we use it like so. If we want to make an element-by-element comparison, we use a single ampersand.
90 00:07:46,170 --> 00:07:54,920 So in this case we've got the bitwise and operator, not the boolean and operator.
91 00:07:55,290 --> 00:08:04,070 And this will allow us to make an element-by-element comparison. So if I execute the cell and then come
92 00:08:04,070 --> 00:08:15,720 down here and sum up my results, then I can see that I've got five hundred and forty-eight true positives.
93 00:08:15,980 --> 00:08:24,440 In case you're wondering, true_pos looks like this: it's just a NumPy array of True and False
94 00:08:24,680 --> 00:08:26,300 values.
95 00:08:26,300 --> 00:08:33,050 So as a challenge, can you create a NumPy array that measures the false positives for each data
96 00:08:33,050 --> 00:08:42,750 point? Call this variable false_pos and then work out how many false positives there were.
97 00:08:43,470 --> 00:08:50,370 And after you've done that, do the same for the false negatives. Store those in a variable called
98 00:08:50,500 --> 00:08:52,220 false_neg.
99 00:08:52,470 --> 00:08:56,010 I'll give you a few seconds to pause the video before I show you the solution.
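While you have the video paused, here is a minimal sketch of the cells built so far, assuming prediction and y_test are the NumPy arrays created earlier in the notebook:

    import numpy as np

    # How many 0s (non-spam) and 1s (spam) our model predicted;
    # in the lesson's run the counts come out as 1163 and 560.
    values, counts = np.unique(prediction, return_counts=True)

    # Element-by-element comparison needs the bitwise & operator,
    # not the boolean `and`: True wherever both conditions hold.
    true_pos = (y_test == 1) & (prediction == 1)
    print(true_pos.sum())  # 548 true positives in the lesson's run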
100 00:08:59,990 --> 00:09:00,430 All right,
101 00:09:00,430 --> 00:09:02,440 ready?
102 00:09:02,450 --> 00:09:06,290 I want to store the false positives in a NumPy array as well.
103 00:09:06,290 --> 00:09:11,780 I'm going to call it false_pos, and that's going to be equal to, it's gonna be equal to,
104 00:09:12,110 --> 00:09:20,300 well, where our prediction was spam but actually we had a non-spam message.
105 00:09:20,300 --> 00:09:33,350 So y_test is equal to non-spam, zero, but we're comparing that with where our prediction
106 00:09:33,530 --> 00:09:38,540 is equal to one, where our prediction was spam.
107 00:09:38,570 --> 00:09:45,770 Those are false positives, and in total we have 12 of them.
108 00:09:46,100 --> 00:09:53,840 So in 12 cases our Naive Bayes model thought that an email was spam when it really was just a normal
109 00:09:53,960 --> 00:09:56,340 email.
110 00:09:56,360 --> 00:09:59,750 What about the false negatives, though? false_neg
111 00:09:59,930 --> 00:10:09,630 is equal to y_test double equals 1.
112 00:10:09,640 --> 00:10:20,020 So in this case an email was actually spam, and we're going to compare that with prediction double equals
113 00:10:20,470 --> 00:10:21,070 zero.
114 00:10:21,580 --> 00:10:27,880 So in this case our prediction was that it is a non-spam email. Here,
115 00:10:27,970 --> 00:10:37,510 a spam message landed in our inbox and we missed it with our spam filter. So false_neg dot
116 00:10:37,510 --> 00:10:47,020 sum will show us how many spam emails actually made it into our inbox. And out of all of our test emails,
117 00:10:47,350 --> 00:10:51,030 40 spam messages made it into the inbox.
118 00:10:52,640 --> 00:10:53,500 And you know what?
119 00:10:53,960 --> 00:10:59,300 We can actually see these values very, very clearly on our chart with the decision boundary.
120 00:10:59,870 --> 00:11:00,380 Let me show you.
121 00:11:00,380 --> 00:11:09,110 If we scroll up and we look here, every time we've got one of the red crosses below the decision boundary,
122 00:11:09,620 --> 00:11:21,320 in this area here where all the non-spam messages are, we've misclassified this email. In the next lesson
123 00:11:21,620 --> 00:11:28,610 we're going to be looking at three other metrics that complement our use of the accuracy metric to help
124 00:11:28,610 --> 00:11:34,060 us evaluate how good our model is. Looking forward to seeing you in the next lesson.
125 00:11:34,070 --> 00:11:34,690 Take care.
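For reference, a minimal sketch of the two solution cells walked through above, under the same assumption that prediction and y_test are the notebook's NumPy arrays:

    # False positives: we predicted spam (1) but the email is non-spam (0).
    false_pos = (y_test == 0) & (prediction == 1)
    print(false_pos.sum())  # 12 in the lesson's run

    # False negatives: the email is spam (1) but we predicted non-spam (0).
    false_neg = (y_test == 1) & (prediction == 0)
    print(false_neg.sum())  # 40 spam messages reached the inbox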