1
00:00:00,330 --> 00:00:05,160
Now we've just covered accuracy as a method of evaluating our classification models.

2
00:00:05,160 --> 00:00:07,650
And you might be thinking, why can't we just leave it at that?

3
00:00:08,190 --> 00:00:14,250
Well, as we start to go through these other metrics here, you'll start to understand why it might be important

4
00:00:14,250 --> 00:00:18,810
to get a few more different evaluation metrics on the board rather than just accuracy.

5
00:00:18,900 --> 00:00:23,550
In my case, when I was first learning how to build classification models, I would always think, yeah, just

6
00:00:23,640 --> 00:00:25,710
higher accuracy is better, right?

7
00:00:25,740 --> 00:00:31,350
And then slowly, after looking at different examples, I started to realize, okay, now I see the value of using

8
00:00:31,680 --> 00:00:36,870
other metrics like these. But let's not talk about it, let's see it in action. The next one we're going

9
00:00:36,870 --> 00:00:48,120
to cover is the area under the receiver operating... so, area under the receiver operating characteristic

10
00:00:48,750 --> 00:00:49,990
curve.

11
00:00:50,220 --> 00:00:57,660
The beautiful thing is it's also known as AUC, so area under the curve, or ROC, so receiver operating

12
00:00:57,660 --> 00:00:59,220
characteristic.

13
00:00:59,220 --> 00:01:02,350
So let's do this, ROC curve.

14
00:01:02,400 --> 00:01:09,580
So these are the two things you'll look out for: area under curve, or ROC curve.

15
00:01:09,800 --> 00:01:11,010
We put it here.

16
00:01:11,070 --> 00:01:16,420
AUC or ROC curve. Beautiful.

17
00:01:16,420 --> 00:01:20,980
So if you hear someone talking about AUC, AUROC, or ROC, or something like that, they're probably

18
00:01:20,980 --> 00:01:22,940
talking about this metric here.

19
00:01:22,990 --> 00:01:25,480
Now what does a ROC curve measure?

20
00:01:25,930 --> 00:01:34,140
Well, by formal definition, a ROC curve is a comparison of a model's true positive rate, a.k.a. TPR, versus

21
00:01:34,140 --> 00:01:35,950
a model's false positive rate.

22
00:01:35,950 --> 00:01:36,910
Let's write that down.

23
00:01:36,910 --> 00:01:42,980
Here, ROC curves are a comparison of a model's

24
00:01:43,270 --> 00:01:53,420
true positive rate, which is also known as TPR, versus a model's false positive rate, which is also known

25
00:01:53,420 --> 00:01:54,640
as FPR.

26
00:01:54,890 --> 00:02:00,830
Now you might be wondering, okay, what is a true positive and what is a false positive?

27
00:02:00,830 --> 00:02:02,440
Well let's have a look here.

28
00:02:02,560 --> 00:02:10,930
A true positive equals model predicts 1 when truth is 1. That makes sense.

29
00:02:10,930 --> 00:02:17,930
So in our case, if we have a look up here, if our targets are 1 or 0, a true positive is when our model

30
00:02:17,930 --> 00:02:18,950
predicts a 1 and

31
00:02:19,040 --> 00:02:20,990
the real label is a 1.

32
00:02:21,010 --> 00:02:21,540
Okay.

33
00:02:21,700 --> 00:02:23,460
Yeah that makes sense.

34
00:02:23,480 --> 00:02:32,390
So if we go here a false positive is model predicts 1 when truth is 0.

35
00:02:32,390 --> 00:02:34,160
So why is it called false positive?

36
00:02:34,160 --> 00:02:35,940
Well it's because it's predicting 1.

37
00:02:35,960 --> 00:02:36,200
So.

38
00:02:36,200 --> 00:02:36,550
Okay.

39
00:02:36,560 --> 00:02:38,960
The positive class, has heart disease,

40
00:02:38,960 --> 00:02:40,820
when the truth is actually 0.

41
00:02:40,820 --> 00:02:44,990
So it's giving us a false positive sense that a person may have heart disease.

42
00:02:45,110 --> 00:02:48,750
In our particular case for our heart disease classification problem.

43
00:02:48,920 --> 00:02:51,260
And then if we go here, a true negative

44
00:02:53,930 --> 00:02:59,650
equals model predicts 0 when truth is 0.

45
00:02:59,690 --> 00:03:00,680
So that makes sense.

46
00:03:00,680 --> 00:03:04,790
So the model is getting the right prediction there: it's predicting someone doesn't have

47
00:03:04,790 --> 00:03:06,680
heart disease when they actually don't.

48
00:03:06,680 --> 00:03:07,880
That's great.

49
00:03:07,880 --> 00:03:12,750
And then a false negative equals model predicts...

50
00:03:13,070 --> 00:03:14,440
What do you think this would be?

51
00:03:14,540 --> 00:03:15,330
False negative.

52
00:03:15,340 --> 00:03:21,140
If we look at false positive, a false negative is when model predicts 0

53
00:03:21,310 --> 00:03:29,150
when truth is 1. So that's because it's predicting not heart disease when it actually is heart disease.
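
The four outcomes just described can be counted by hand. This is a minimal sketch in plain Python; the labels and predictions below are made-up values for illustration, not the heart disease data from the video. The rate formulas are the standard definitions: TPR = TP / (TP + FN) and FPR = FP / (FP + TN).

```python
# Hypothetical labels for illustration only: 1 = heart disease, 0 = no heart disease.
y_true = [1, 0, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]

pairs = list(zip(y_true, y_pred))

# Count each of the four outcomes.
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicts 1 when truth is 1
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicts 1 when truth is 0
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # predicts 0 when truth is 0
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicts 0 when truth is 1

# True positive rate: share of actual positives the model caught.
tpr = tp / (tp + fn)
# False positive rate: share of actual negatives wrongly flagged as positive.
fpr = fp / (fp + tn)

print(tp, fp, tn, fn)  # 3 1 3 1
print(tpr, fpr)        # 0.75 0.25
```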

54
00:03:29,260 --> 00:03:29,500
Right.

55
00:03:29,500 --> 00:03:35,740
So now we know these, we can see that a ROC curve is a comparison of a model's true positive rate, TPR,

56
00:03:36,140 --> 00:03:38,500
versus... typos everywhere.

57
00:03:38,510 --> 00:03:39,440
Daniel, come on.

58
00:03:39,640 --> 00:03:42,850
Versus a model's false positive rate, or FPR.

59
00:03:42,880 --> 00:03:43,780
So now we know this.

60
00:03:43,780 --> 00:03:51,820
Let's see it in action. So we can do this using scikit-learn's metrics library. We're going to import

61
00:03:52,420 --> 00:04:01,680
roc_curve, and then we're going to make predictions with probabilities.

62
00:04:01,690 --> 00:04:03,850
And how can we make predictions with probabilities?

63
00:04:03,850 --> 00:04:06,430
We saw this in our making predictions.

64
00:04:06,550 --> 00:04:15,010
We can do that with y_probs equals clf.predict_proba, for probability. We're gonna make some predictions

65
00:04:15,070 --> 00:04:17,150
on the X_test data.

66
00:04:17,230 --> 00:04:26,410
Now I just want to make sure, maybe we create X_test, just to make sure it's the right

67
00:04:26,440 --> 00:04:28,660
X_test data. X_test,

68
00:04:31,260 --> 00:04:46,280
etc. So we want to go here: X_train, X_test, y_train, y_test equals train_test_split(X, y, test_size equals

69
00:04:46,370 --> 00:04:49,750
0.2). Wonderful.

70
00:04:49,840 --> 00:04:57,340
And so now we can do that there. Because a ROC curve is a comparison of a model's true positive rate

71
00:04:57,340 --> 00:04:59,880
versus a model's false positive rate.

72
00:04:59,920 --> 00:05:02,970
We want to only keep the positive classes.

73
00:05:02,980 --> 00:05:06,620
So actually, let's see what y_probs looks like.

74
00:05:07,000 --> 00:05:07,680
y_probs.

75
00:05:07,690 --> 00:05:08,880
Why hasn't this worked?

76
00:05:08,890 --> 00:05:09,790
What do we got here?

77
00:05:09,790 --> 00:05:14,520
This RandomForestClassifier is not fitted yet, call fit. That would make sense.

78
00:05:14,560 --> 00:05:15,310
So we want to fit

79
00:05:18,130 --> 00:05:18,830
the classifier.

80
00:05:18,830 --> 00:05:24,140
Of course you can't make predictions without fitting it, without the model learning any patterns,

81
00:05:25,220 --> 00:05:26,160
so we're gonna fit it here.

82
00:05:26,180 --> 00:05:30,110
Then we'll make some predictions and we'll have a look at this, maybe only the first 10, so we're not

83
00:05:30,110 --> 00:05:31,580
taking up page space.

84
00:05:32,370 --> 00:05:32,630
Okay.

85
00:05:32,630 --> 00:05:33,500
Beautiful.

86
00:05:33,530 --> 00:05:41,220
And so because a ROC curve is only a comparison of the model's true positive rate versus false positive

87
00:05:41,220 --> 00:05:46,510
rate we only want probabilities that the model has predicted for the positive class.

88
00:05:46,530 --> 00:05:52,770
So if you imagine here, our model's trying to predict 0 or 1. This is the probability that the label is

89
00:05:52,770 --> 00:05:58,100
a 0 and this is the probability that the label is 1, and so on

90
00:05:58,140 --> 00:06:03,580
for all of these samples until it finishes up. So there's gonna be however many are in the test

91
00:06:03,580 --> 00:06:04,250
set here.

92
00:06:05,660 --> 00:06:07,530
61, so there's gonna be 61 of these.

93
00:06:07,550 --> 00:06:12,170
So essentially what we want is y_probs_positive,

94
00:06:14,900 --> 00:06:20,390
so this is the probabilities that it's the positive class. A.k.a. 0 is the negative class and 1

95
00:06:20,390 --> 00:06:22,020
is the positive class.

96
00:06:22,030 --> 00:06:31,010
y_probs, and we're gonna use some slicing here to keep only column 1 of every row. We can do that just

97
00:06:31,010 --> 00:06:34,160
so you can know what's going on.

98
00:06:34,330 --> 00:06:37,880
y_probs_positive, and we'll only look at the first 10 again.

99
00:06:39,090 --> 00:06:40,260
So does this make sense?

100
00:06:40,260 --> 00:06:42,110
We're getting 0.43.

101
00:06:42,180 --> 00:06:48,210
Yep, 0.77. Yep, 0.48, and so on and so on for the entire list of

102
00:06:48,210 --> 00:06:49,020
y_probs.
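
The fit, predict_proba and slicing steps described so far can be sketched end to end. This is only an illustration: the synthetic dataset from `make_classification` and the `random_state` values are assumptions standing in for the heart disease data, which isn't included here. The variable names follow the ones used in the video.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart disease data (an assumption for this sketch).
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The classifier has to be fitted before it can make predictions.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# One row per test sample: column 0 is P(label == 0), column 1 is P(label == 1).
y_probs = clf.predict_proba(X_test)

# Keep only the positive-class probabilities (column 1 of every row).
y_probs_positive = y_probs[:, 1]

print(y_probs[:10])
print(y_probs_positive[:10])
```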

103
00:06:49,080 --> 00:07:00,530
And now we can calculate FPR, TPR and thresholds. So fpr, tpr, thresholds.

104
00:07:00,540 --> 00:07:01,280
How do I know this?

105
00:07:01,290 --> 00:07:07,530
We'll see this in a second, how I know that this is gonna be fpr comma tpr comma thresholds.

106
00:07:07,530 --> 00:07:08,220
roc_curve.

107
00:07:08,250 --> 00:07:09,090
I'm going to pass it

108
00:07:09,090 --> 00:07:11,280
y_test and

109
00:07:11,280 --> 00:07:14,680
y_probs_positive.
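
The call just described might look like the sketch below. The four labels and scores are tiny made-up values (an assumption for illustration), so the output is easy to check by hand. In the video, `y_test` and `y_probs_positive` come from the fitted classifier instead.

```python
from sklearn.metrics import roc_curve

# Hypothetical true labels and positive-class probabilities for four samples.
y_test = [0, 0, 1, 1]
y_probs_positive = [0.1, 0.4, 0.35, 0.8]

# roc_curve returns three arrays: false positive rates, true positive rates,
# and the score thresholds at which those rates were computed.
fpr, tpr, thresholds = roc_curve(y_test, y_probs_positive)

print(fpr)  # [0.  0.  0.5 0.5 1. ]
print(tpr)  # [0.  0.5 0.5 1.  1. ]
print(len(thresholds))
```

Each position in `fpr` and `tpr` corresponds to one threshold, so the three arrays always have the same length.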

110
00:07:14,730 --> 00:07:19,950
Now how would you figure out what roc_curve returns in this case?

111
00:07:19,950 --> 00:07:23,040
Well, what I would do is I'd press Shift+Tab, and this is how I know.

112
00:07:23,730 --> 00:07:27,640
So we've got roc_curve. It takes y_true, so that's our test labels.

113
00:07:27,690 --> 00:07:30,520
It takes y_score, so that's our

114
00:07:30,730 --> 00:07:32,590
y_probs_positive.

115
00:07:32,590 --> 00:07:36,690
And if we come down here: compute receiver operating characteristic, ROC.

116
00:07:36,850 --> 00:07:37,270
Yeah.

117
00:07:37,420 --> 00:07:45,900
Beautiful. Now, parameters: y_true, y_score. Target scores can be either probability estimates of the positive

118
00:07:45,900 --> 00:07:49,770
class, confidence values, or non-thresholded measures of decisions.

119
00:07:49,770 --> 00:07:52,510
So that's the probability estimates of the positive class.

120
00:07:52,530 --> 00:07:55,410
That's where we got this slice from, right?

121
00:07:55,440 --> 00:08:00,840
And then if we come down here, it's gonna tell us what it returns. fpr: increasing false positive rates

122
00:08:00,840 --> 00:08:06,270
such that element i is the false positive rate of predictions with score above thresholds. tpr,

123
00:08:06,270 --> 00:08:09,410
so that's the true positive rate, and thresholds.

124
00:08:09,600 --> 00:08:12,570
So that's how we can figure out what a function returns.

125
00:08:12,570 --> 00:08:16,720
And again, that's just viewing the docstring in a Jupyter notebook.

126
00:08:16,740 --> 00:08:21,270
You can always look at the documentation for this if you wanted to. So you might look up something

127
00:08:21,270 --> 00:08:29,140
like how to calculate ROC curve with scikit-learn. So then we're gonna check the false positive

128
00:08:30,490 --> 00:08:39,550
rates, fpr. Beautiful. So that's giving us a big array, but looking at these on their own doesn't really make

129
00:08:39,550 --> 00:08:42,970
much sense. It's much easier to see it visually.

130
00:08:42,970 --> 00:08:48,340
And since scikit-learn doesn't really have a built-in function to plot a ROC curve, what we might have

131
00:08:48,340 --> 00:08:53,410
to do, and quite often you come across this, right, is you'll have to find a function or write your

132
00:08:53,410 --> 00:08:55,110
own that'll do it for you.

133
00:08:55,120 --> 00:08:59,950
So that's what we're gonna have a look at in the next video, where we'll plot a ROC curve, and this function

134
00:08:59,950 --> 00:09:04,240
here, roc_curve, will start to make a bit more sense rather than just be an array of numbers.