1
00:00:00,190 --> 00:00:05,580
In the last video we left off saying that these metrics, precision, recall and F1 score, are only

2
00:00:05,580 --> 00:00:10,220
calculated using one train-test split.

3
00:00:10,410 --> 00:00:16,450
So they're only calculated using one split, but we prefer using cross-validation where possible.

4
00:00:16,500 --> 00:00:18,150
So that's what we're going to do.

5
00:00:18,180 --> 00:00:33,120
Let's make a little heading. We'll go "Calculate evaluation metrics using cross-validation".

6
00:00:33,180 --> 00:00:33,560
All right.

7
00:00:33,570 --> 00:00:47,650
Here we're going to calculate precision, recall and F1 score of our model using cross-validation, and to

8
00:00:47,650 --> 00:00:51,600
do so we'll be using.

9
00:00:52,340 --> 00:00:54,850
Let me put this one in code: cross_val_score.

10
00:00:54,910 --> 00:00:58,240
So we've seen this one in the scikit-learn section.

11
00:00:58,260 --> 00:01:05,410
cross_val_score, and if we look it up here, as we always want to do, so cross_val_score, what does it say?

12
00:01:06,860 --> 00:01:08,040
So it takes an estimator.

13
00:01:08,070 --> 00:01:13,110
It takes an X, it takes a y. "Evaluate a score by cross-validation." The thing we want to pay attention

14
00:01:13,110 --> 00:01:14,830
to here is scoring.

15
00:01:14,910 --> 00:01:21,240
So if we go to scoring: string or callable. A string, or a scorer callable object or function with a signature, which

16
00:01:21,240 --> 00:01:23,110
should return only a single value.

17
00:01:23,110 --> 00:01:23,390
OK.

18
00:01:23,400 --> 00:01:30,300
So this is the parameter we're going to use to evaluate our model using cross validation with different

19
00:01:30,300 --> 00:01:31,020
metrics.

20
00:01:31,020 --> 00:01:36,870
So we're gonna change the scoring string so let's say that we'll set up a new logistic regression model

21
00:01:36,870 --> 00:01:39,020
instance with our best hyperparameters.

22
00:01:39,190 --> 00:01:40,690
What were our best hyperparameters?

23
00:01:40,720 --> 00:01:46,950
So let's go check the hyperparameters, because we want to set up one of the best models that we can

24
00:01:47,010 --> 00:01:50,240
and then use cross validation to evaluate it.

25
00:01:50,380 --> 00:01:55,880
gs_log_reg.best_params_.

26
00:01:56,010 --> 00:01:56,640
Wonderful.

27
00:01:57,150 --> 00:02:06,900
So now we're going to create a classifier. We'll go here: create a new classifier with the best parameters that

28
00:02:06,900 --> 00:02:16,400
we've found, so clf equals LogisticRegression, C equals, and we're going to just copy that here.

29
00:02:16,440 --> 00:02:17,570
Wonderful.

30
00:02:18,170 --> 00:02:20,570
Whoops I forgot the 0.

31
00:02:20,580 --> 00:02:24,300
That's a very long number there and then solver equals.

32
00:02:24,330 --> 00:02:28,740
Now again, these parameters might be different if you've done your own hyperparameter tuning and

33
00:02:28,740 --> 00:02:32,090
found some better ones but that will do for us for now.

34
00:02:32,100 --> 00:02:32,550
Wonderful.

35
00:02:32,580 --> 00:02:37,710
So now we've instantiated a model with the best hyperparameters, let's use cross_val_score along with

36
00:02:37,710 --> 00:02:42,120
the scoring parameter to get some cross validated metrics.

37
00:02:42,150 --> 00:02:45,990
So what we do I've actually forgotten here.

38
00:02:46,230 --> 00:02:48,780
We're going to calculate yes there we go.

39
00:02:48,810 --> 00:02:50,710
So we want accuracy here.

40
00:02:51,030 --> 00:02:54,240
Cross validated accuracy.

41
00:02:55,080 --> 00:03:00,420
So we might do four cells and we go here cross validated precision

42
00:03:04,110 --> 00:03:09,380
cross validated recall and then cross validated

43
00:03:12,080 --> 00:03:14,390
F1 score.

44
00:03:14,400 --> 00:03:15,460
Wonderful.

45
00:03:15,510 --> 00:03:16,820
So let's do that.

46
00:03:16,830 --> 00:03:25,860
So we're gonna go cv_acc, which stands for cross-validated accuracy, equals cross_val_score, and we're

47
00:03:25,860 --> 00:03:29,440
going to pass it our classifier, which we just instantiated here.

48
00:03:29,700 --> 00:03:35,820
We're gonna pass it all of the X data, because we can now, because we're using cross_val_score, and

49
00:03:35,820 --> 00:03:46,270
all of the y data, and we're gonna set cv to five, and we're gonna set scoring to be "accuracy". Beautiful.

50
00:03:46,670 --> 00:03:52,540
And then we're gonna check out what cv_acc looks like, and we can take the mean of this, because remember,

51
00:03:52,600 --> 00:03:58,810
what has it done? It's evaluated our model over five different splits.

52
00:03:58,810 --> 00:04:04,880
So if we take the mean of it all right that's gonna get the average accuracy across these five different

53
00:04:04,880 --> 00:04:05,900
splits.

54
00:04:05,900 --> 00:04:06,680
So let's do that.

55
00:04:06,680 --> 00:04:12,720
So we go np.mean of cv_acc. Wonderful.
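
The step just described can be sketched roughly like this. This is only a minimal sketch, not the notebook from the video: the data is a synthetic stand-in made with make_classification, and the C and solver values are placeholders, so swap in the best parameters from your own search.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption): the video uses the course's own X and y.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Placeholder hyperparameters (assumption): substitute the values reported
# by your own hyperparameter search.
clf = LogisticRegression(C=0.2, solver="liblinear")

# cross_val_score returns one score per fold, so cv=5 gives five accuracies.
cv_acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy")

# Average across the five splits to get a single cross-validated accuracy.
cv_acc_mean = np.mean(cv_acc)
print(cv_acc, cv_acc_mean)
```

Changing the scoring string to "precision", "recall" or "f1" gives the other cross-validated metrics in the same way.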

56
00:04:12,790 --> 00:04:14,230
So that's our accuracy there.

57
00:04:14,290 --> 00:04:22,580
So we might save that. Actually, we might just overwrite our value, and then we'll go cv_acc. Beautiful.

58
00:04:22,650 --> 00:04:25,140
And now let's do the same with these elements.

59
00:04:25,320 --> 00:04:30,530
We'll go here we could just copy the code here again.

60
00:04:30,540 --> 00:04:33,220
You don't want to get into the habit of copying code.

61
00:04:33,300 --> 00:04:37,020
We should really functionize this, but we're just going to keep rolling with the punches with what we're

62
00:04:37,020 --> 00:04:37,640
doing now.

63
00:04:38,370 --> 00:04:43,670
And by functionize this I mean, because we're doing relatively similar calculations the whole way through,

64
00:04:43,890 --> 00:04:46,100
we could just make it a function.
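
One possible shape for such a function is sketched below. The name cross_validated_metrics, its parameters and the demo data are all hypothetical, made up for illustration; the video itself keeps copying the cell instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def cross_validated_metrics(estimator, X, y,
                            metrics=("accuracy", "precision", "recall", "f1"),
                            cv=5):
    """Return the mean cross-validated score for each scoring string."""
    return {
        metric: float(np.mean(
            cross_val_score(estimator, X, y, cv=cv, scoring=metric)))
        for metric in metrics
    }


# Demo on synthetic data (assumption): replace with your own X and y.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
demo_clf = LogisticRegression(solver="liblinear")
scores = cross_validated_metrics(demo_clf, X_demo, y_demo)
print(scores)
```

The function runs a separate cross-validation per metric, which mirrors the copy-and-paste cells one for one rather than trying to be efficient.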

65
00:04:46,230 --> 00:04:58,250
So that is the precision. We might overwrite this to be cv_precision equals np.mean, and then see it again. Wonderful.

66
00:04:58,540 --> 00:05:04,300
And then we could do the same thing here except for recall what we might do is be a little bit tricky

67
00:05:05,250 --> 00:05:10,650
put in places here: when you hold command you can put little cursors here.

68
00:05:10,680 --> 00:05:19,910
So if we just change this all to recall. Ah, happy days, look at that, I've got five cursors on the go. Recall.

69
00:05:20,270 --> 00:05:28,210
Wonderful. Oh, that's a nice recall score, cross-validated as well. We might do the same for F1, and we

70
00:05:28,290 --> 00:05:34,590
go here, we're gonna change all of these bad boys to F1.

71
00:05:34,860 --> 00:05:35,620
There we go.

72
00:05:35,910 --> 00:05:37,290
We might put a little space.

73
00:05:37,320 --> 00:05:40,830
So our code is looking Pythonic. Wonderful.

74
00:05:40,830 --> 00:05:41,740
All right.

75
00:05:41,800 --> 00:05:47,300
And so now we've got all these cross-validation metrics, cross-validated metrics, that our boss was requesting.

76
00:05:47,640 --> 00:05:52,320
It's not really good to just have them appear in a Jupyter notebook. Like, all of our other evaluation stuff is

77
00:05:52,440 --> 00:05:57,270
in a nice neat little table, as is our classification matrix, and that looks really good in a presentation.

78
00:05:57,270 --> 00:06:02,260
You know this is something that someone could look at and go yep I can see that value is pretty high.

79
00:06:02,340 --> 00:06:03,700
I can see what's happening here.

80
00:06:04,410 --> 00:06:05,300
Let's do the same.

81
00:06:05,310 --> 00:06:10,350
Rather than just having them all spread out, let's put them into a graph of sorts, or a visualization.

82
00:06:10,740 --> 00:06:16,100
So we go visualize our cross validated metrics.

83
00:06:17,140 --> 00:06:22,440
So we're going to create cv_metrics as a DataFrame.

84
00:06:22,440 --> 00:06:27,080
I mean, yeah, we probably really should have functionized this to begin with, but that's all right.

85
00:06:27,300 --> 00:06:34,890
Sometimes you have to just go through the old fashioned way and then realize why you've gone wrong and

86
00:06:34,890 --> 00:06:39,090
see how you could improve and maybe that's a little extension you could try yourself and see how you

87
00:06:39,090 --> 00:06:41,530
could functionize some of what we're doing here.

88
00:06:41,610 --> 00:06:46,830
Maybe a function that just cross-validates some X and y using different metrics and puts it all into a nice little

89
00:06:46,830 --> 00:06:57,990
presentation all in one hit. That would be good practice. cv_recall, and then we go F1 is going

90
00:06:57,990 --> 00:07:09,490
to be our cv_f1. Wonderful. We need to set index equals [0]. Beautiful. And then we can create a

91
00:07:09,490 --> 00:07:11,920
plot: cv_metrics.

92
00:07:12,010 --> 00:07:13,000
Now I've done this before.

93
00:07:13,090 --> 00:07:13,870
Spoiler alert.

94
00:07:13,990 --> 00:07:16,100
So I know I need to transpose it.

95
00:07:16,160 --> 00:07:17,270
Bar.

96
00:07:17,270 --> 00:07:25,110
Title equals cross validated classification metrics.

97
00:07:25,180 --> 00:07:27,400
And do we want a legend.

98
00:07:27,400 --> 00:07:28,300
No we don't.

99
00:07:28,300 --> 00:07:29,220
False.

100
00:07:29,260 --> 00:07:30,850
Let's see what that looks like.

101
00:07:30,880 --> 00:07:31,270
Beautiful.

102
00:07:31,270 --> 00:07:33,710
We're gonna put a little semicolon here.

103
00:07:33,730 --> 00:07:34,990
There we go.
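
Putting the plotting steps together, a rough sketch could look like the following. The four metric values are made-up placeholders for illustration only, not the numbers from the video, and the Agg backend line is an assumption added so the sketch runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import pandas as pd

# Illustrative placeholder values only (assumption), not results from the video.
cv_acc, cv_precision, cv_recall, cv_f1 = 0.80, 0.80, 0.90, 0.85

# Scalar values need an explicit index, hence index=[0].
cv_metrics = pd.DataFrame(
    {"Accuracy": cv_acc,
     "Precision": cv_precision,
     "Recall": cv_recall,
     "F1": cv_f1},
    index=[0],
)

# Transpose so each metric becomes its own bar, then plot without a legend.
ax = cv_metrics.T.plot.bar(title="Cross-validated classification metrics",
                           legend=False)
```

Without the transpose, the four metrics would be grouped at a single x position instead of getting one labelled bar each.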

104
00:07:35,020 --> 00:07:39,910
So maybe we could add, like, the numbers on here so people know exactly what they are, but we

105
00:07:39,910 --> 00:07:42,370
see our model's doing pretty well on recall.

106
00:07:42,460 --> 00:07:49,540
And again, we could probably add our cross-validation results from our random forest model or our K-nearest

107
00:07:49,540 --> 00:07:55,390
neighbors classifier or any other classification model that we may have tried during our experimentation

108
00:07:55,390 --> 00:07:55,960
phase.

109
00:07:56,010 --> 00:07:59,590
Remember this experimentation phase is just it's an iterative process right.

110
00:07:59,590 --> 00:08:04,940
Just going back and back and back and forth through here so we could add that to here and see which

111
00:08:04,940 --> 00:08:05,810
model performs best.

112
00:08:05,810 --> 00:08:12,260
But for now we're sticking with logistic regression and this looks like something that we could share.

113
00:08:12,260 --> 00:08:14,510
So what do we have left now.

114
00:08:14,810 --> 00:08:18,940
Well let's go back out to where we put down what we've what we've ticked off.

115
00:08:18,950 --> 00:08:21,950
Have we ticked off everything the things our boss was asking.

116
00:08:21,950 --> 00:08:23,890
The things we previously didn't know how to do.

117
00:08:23,900 --> 00:08:28,260
But now we've seen them in action.

118
00:08:28,410 --> 00:08:28,830
All right.

119
00:08:29,440 --> 00:08:37,040
So we've got cross-validation, precision, recall, yep. Classification report, tick. ROC curve, tick. Area under

120
00:08:37,060 --> 00:08:39,190
the curve. Confusion matrix.

121
00:08:39,200 --> 00:08:39,700
Yes.

122
00:08:39,860 --> 00:08:43,790
The thing we're missing out on here is feature importance.

123
00:08:43,800 --> 00:08:44,710
Mm hmm.

124
00:08:44,740 --> 00:08:48,890
All right well let's see how we do feature importance in the next video.