1
00:00:00,300 --> 00:00:00,766
All right.

2
00:00:00,766 --> 00:00:03,866
Welcome back. We're talking about k-fold
cross-validation, a

3
00:00:03,866 --> 00:00:07,433
very important tool
in your toolkit for assessing

4
00:00:07,733 --> 00:00:11,833
how well your model is working
with the data that you have.

5
00:00:12,300 --> 00:00:12,600
Okay.

6
00:00:12,600 --> 00:00:14,466
So here is what we normally do.

7
00:00:14,466 --> 00:00:18,233
We have a data set,
and we usually split it into a training

8
00:00:18,233 --> 00:00:20,200
set and a test set.

9
00:00:20,200 --> 00:00:23,366
And from here we're going to talk about
k-fold cross-validation.

10
00:00:23,366 --> 00:00:26,066
But first
I wanted to make a quick note:

11
00:00:26,066 --> 00:00:29,266
there are
two schools of thought.

12
00:00:29,300 --> 00:00:33,233
Well, basically one school of thought
is that when you're doing k-fold

13
00:00:33,233 --> 00:00:35,666
cross-validation,
you don't need the test set.

14
00:00:35,666 --> 00:00:40,200
It is enough to do k-fold cross-validation.
In the second school of thought,

15
00:00:40,200 --> 00:00:44,600
you still keep the test set and you do k-fold
cross-validation on the training set,

16
00:00:45,366 --> 00:00:47,433
and then you still use the test
set later on.

17
00:00:47,433 --> 00:00:49,366
So those are two different approaches.

18
00:00:49,366 --> 00:00:52,566
We are going to talk more about that
at the end of this tutorial.

19
00:00:53,000 --> 00:00:56,133
Throughout this tutorial
we're going to use the second school

20
00:00:56,133 --> 00:00:59,733
of thought
because it's more general.

21
00:00:59,733 --> 00:01:01,166
And then we'll be able

22
00:01:01,166 --> 00:01:06,066
to simplify it
to make it appropriate for the first school of

23
00:01:06,100 --> 00:01:08,033
thought at the end of this tutorial
when we discuss it.

24
00:01:08,033 --> 00:01:11,033
So for now, let's stick to breaking

25
00:01:11,033 --> 00:01:14,033
the data set into a training set
and a test set.

26
00:01:14,033 --> 00:01:17,566
Now, once you've split it, what do you do
next for k-fold cross-validation?

27
00:01:17,566 --> 00:01:18,633
So normally what you would do is

28
00:01:18,633 --> 00:01:22,266
you would just train your model here
and then test your model here.

29
00:01:23,100 --> 00:01:23,433
Right.

30
00:01:23,433 --> 00:01:27,500
And from that
you would get a result. Yes.

31
00:01:27,500 --> 00:01:28,666
The model hasn't seen this data.

32
00:01:28,666 --> 00:01:31,666
So you would be able to tell how
well it performs on this test set.

33
00:01:31,800 --> 00:01:34,333
But what if you just get lucky
on this test set?

34
00:01:34,333 --> 00:01:37,333
What if it just so happens that it does
well on the test set,

35
00:01:37,466 --> 00:01:40,366
but then on future
data, it's not going to do well at all.

36
00:01:40,366 --> 00:01:42,600
So that's what k-fold
cross-validation is for.

37
00:01:42,600 --> 00:01:46,200
It's here to combat that scenario
where you just got lucky on the test set

38
00:01:46,466 --> 00:01:49,466
to ensure with more certainty
that your model is doing well.

39
00:01:49,633 --> 00:01:51,433
So what we're going to do
is take the training set.

40
00:01:51,433 --> 00:01:54,433
We're going to split it into ten folds.

41
00:01:54,433 --> 00:01:55,833
It's actually k folds.

42
00:01:55,833 --> 00:01:59,433
But for our tutorial
for simplicity's sake,

43
00:01:59,433 --> 00:02:01,166
we're just going to assume k equals ten.

44
00:02:01,166 --> 00:02:02,933
So we're splitting into ten folds.

45
00:02:02,933 --> 00:02:06,566
A fold is just a fancy word
for saying we're going to split into ten parts.

46
00:02:06,566 --> 00:02:08,200
Each part is about this.

47
00:02:08,200 --> 00:02:10,700
They're all about the same in
size and they don't overlap.
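
As a rough sketch of what splitting into folds could look like, assuming we only track row indices and use Python's standard library. The function name make_folds, the seed, and the sample count of 95 are made up for illustration.

```python
import random

def make_folds(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]

folds = make_folds(n_samples=95, k=10)
sizes = [len(fold) for fold in folds]
all_indices = sorted(i for fold in folds for i in fold)
print(sizes)
```

Every index lands in exactly one fold, which is the "same size, no overlap" property described above.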

48
00:02:12,033 --> 00:02:15,633
Then what we're going
to do is train the model

49
00:02:15,633 --> 00:02:19,500
on nine of these folds
and keep one fold

50
00:02:19,600 --> 00:02:22,933
as an unseen fold for validation.

51
00:02:22,933 --> 00:02:25,033
So that's
going to be our training data

52
00:02:25,033 --> 00:02:26,500
and that's going to be our validation
data.

53
00:02:26,500 --> 00:02:30,133
Think of it as
basically training data and testing data.

54
00:02:30,133 --> 00:02:31,500
But we're going to use validation.

55
00:02:31,500 --> 00:02:34,500
So we don't confuse it with this
test data.

56
00:02:34,666 --> 00:02:37,733
So we're going to train it on
the data in these nine folds.

57
00:02:37,800 --> 00:02:40,900
And then validate
or find our metrics and calculate whatever

58
00:02:40,900 --> 00:02:45,200
we need to calculate of how well our model
is performing on this validation

59
00:02:45,200 --> 00:02:48,233
set, or validation fold,
because it has not seen it before.

60
00:02:48,566 --> 00:02:50,866
Great. Then we're going to do that again.

61
00:02:50,866 --> 00:02:53,400
But now
we're going to shift the validation fold.

62
00:02:53,400 --> 00:02:55,066
The validation fold becomes this fold.

63
00:02:55,066 --> 00:02:56,166
So now we're going to train the model

64
00:02:56,166 --> 00:03:00,000
on this data
set, right, on these nine folds.

65
00:03:00,000 --> 00:03:01,366
And as you can see, it's slightly different.

66
00:03:01,366 --> 00:03:05,266
So the training data has slightly changed
and the fold is completely new.

67
00:03:05,266 --> 00:03:06,666
That is, the validation fold is completely new.

68
00:03:06,666 --> 00:03:08,900
And again it's not going to be seen
during this training.

69
00:03:08,900 --> 00:03:11,766
So we're going to get a new model
as a result of this training.

70
00:03:11,766 --> 00:03:13,433
A new trained model.

71
00:03:13,433 --> 00:03:15,000
And we're going to validate on this fold.

72
00:03:15,000 --> 00:03:19,400
And note, every time we do this,
for every fold or every

73
00:03:19,433 --> 00:03:20,633
combination of folds.

74
00:03:20,633 --> 00:03:24,766
So here and here
we have to use the same hyperparameters.

75
00:03:24,766 --> 00:03:25,500
Very important.

76
00:03:25,500 --> 00:03:27,900
So we've decided on our hyperparameters.

77
00:03:27,900 --> 00:03:32,900
And now we're just training the model
again and again on slightly different

78
00:03:32,900 --> 00:03:36,933
training data and validating it
on the validation fold, which is changing,

79
00:03:37,100 --> 00:03:38,700
which is shifting, as you can see.

80
00:03:38,700 --> 00:03:40,266
So here's our sixth training run.

81
00:03:40,266 --> 00:03:42,033
So we train it on all of this data

82
00:03:42,033 --> 00:03:45,633
and then validate it on this fold
which is not seen during training.

83
00:03:46,066 --> 00:03:49,033
So we keep doing that
and we keep shifting, shifting.

84
00:03:49,033 --> 00:03:51,766
So if we have ten folds
we're going to have to train

85
00:03:51,766 --> 00:03:54,366
ten models.

86
00:03:54,366 --> 00:03:56,066
And each time we're going to use
the same hyperparameters.

87
00:03:56,066 --> 00:03:59,166
So the model
and the hyperparameters

88
00:03:59,166 --> 00:04:00,533
are the same during training.

89
00:04:00,533 --> 00:04:05,100
Of course, it'll be
a slightly different result.

90
00:04:05,100 --> 00:04:07,800
And then we'll validate it
on the validation fold.

91
00:04:07,800 --> 00:04:11,566
And as a result
we will have ten sets of metrics.
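
The loop described above could be sketched like this. It is only an illustration under assumptions: the "model" is a toy 1-D ridge regression through the origin, alpha stands in for the hyperparameters, the metric is mean squared error, and the data is synthetic. None of these choices come from the tutorial itself.

```python
import random

def fit(xs, ys, alpha):
    """Toy model (an assumption): 1-D ridge regression through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

def mse(w, xs, ys):
    """Mean squared error of the slope w on the given points."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cross_validate(xs, ys, k, alpha, seed=0):
    """Train k models, each validated on the one fold it never saw."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in range(k):
        val_idx = folds[held_out]
        # Training data is every fold except the held-out one.
        train_idx = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        # Same model and same hyperparameter (alpha) on every iteration.
        w = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx], alpha)
        scores.append(mse(w, [xs[j] for j in val_idx], [ys[j] for j in val_idx]))
    return scores

# Synthetic data, made up for illustration: y = 2x + noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

scores = cross_validate(xs, ys, k=10, alpha=1.0)
print(len(scores), [round(s, 2) for s in scores])
```

Ten folds give ten trained models and ten validation scores, one per held-out fold.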

92
00:04:11,566 --> 00:04:15,166
Remember, if we just did the training set
and the test set, just the normal two,

93
00:04:15,433 --> 00:04:18,000
then we would have one set of metrics.
Then we could have gotten lucky.

94
00:04:18,000 --> 00:04:21,600
Whereas here
we're going to have ten sets of metrics.

95
00:04:21,600 --> 00:04:25,800
It's much less likely that we got lucky
ten times.

96
00:04:26,100 --> 00:04:26,333
Right.

97
00:04:26,333 --> 00:04:29,200
So it's much more reliable
now that we're going to have

98
00:04:29,200 --> 00:04:32,200
ten sets of metrics
and we can look at them in aggregate.

99
00:04:32,366 --> 00:04:33,766
And that's exactly what we're going to do.

100
00:04:33,766 --> 00:04:35,333
So let's make some space.

101
00:04:35,333 --> 00:04:39,000
And here we're going to assess these
metrics and look at them in aggregate.

102
00:04:39,266 --> 00:04:42,266
And if these metrics
look good in aggregate

103
00:04:42,300 --> 00:04:44,466
then the modeling approach is valid.

104
00:04:44,466 --> 00:04:45,733
So the model you've selected

105
00:04:45,733 --> 00:04:49,800
and the hyperparameters
you selected are good for this data.
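
One simple way to look at the fold metrics in aggregate is the mean together with the spread. This is a sketch only: the ten accuracy values and the 0.85 threshold below are invented for illustration, and what counts as "good" depends on your problem.

```python
from statistics import mean, stdev

# Made-up validation accuracies, one per fold.
fold_scores = [0.91, 0.89, 0.93, 0.90, 0.88, 0.92, 0.90, 0.91, 0.87, 0.89]

avg = mean(fold_scores)      # typical performance across folds
spread = stdev(fold_scores)  # how much the folds disagree with each other
worst = min(fold_scores)     # the unluckiest fold

# Hypothetical rule of thumb: accept if even the worst fold clears a threshold.
looks_good = worst >= 0.85
print(f"mean={avg:.3f} std={spread:.3f} worst={worst:.2f} ok={looks_good}")
```

A high mean with a small spread suggests the result was not a one-off lucky split.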

106
00:04:50,166 --> 00:04:53,333
And then what we're going to do
is we're going to train the model again

107
00:04:53,333 --> 00:04:56,366
one more time, one last time.

108
00:04:56,533 --> 00:04:59,133
This time we're going to train
on all of the training data,

109
00:04:59,133 --> 00:05:03,333
and then we're going to test it on 
the test set as usual.

110
00:05:03,333 --> 00:05:04,866
That's our final step.
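
Putting the whole second-school workflow together, a minimal sketch might look like the following. The toy ridge model, the synthetic data, the 200/50 split, and the THRESHOLD cutoff are all assumptions made for illustration.

```python
import random
from statistics import mean

def fit(xs, ys, alpha):
    """Toy model (an assumption): 1-D ridge regression through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

def mse(w, xs, ys):
    """Mean squared error of the slope w on the given points."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cv_scores(xs, ys, k, alpha):
    """Validation MSE for each of the k folds (data assumed already in random order)."""
    scores = []
    for i in range(k):
        val = set(range(i, len(xs), k))  # fold i: every k-th row starting at i
        tr_x = [x for j, x in enumerate(xs) if j not in val]
        tr_y = [y for j, y in enumerate(ys) if j not in val]
        w = fit(tr_x, tr_y, alpha)
        val_rows = sorted(val)
        scores.append(mse(w, [xs[j] for j in val_rows], [ys[j] for j in val_rows]))
    return scores

# Synthetic data, made up for illustration: y = 2x + noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(250)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

# Step 1: set the test set aside before doing anything else.
train_x, test_x = xs[:200], xs[200:]
train_y, test_y = ys[:200], ys[200:]

# Step 2: k-fold cross-validation on the training set only.
alpha = 1.0
scores = cv_scores(train_x, train_y, k=10, alpha=alpha)

# Step 3: if the aggregate looks acceptable, retrain on ALL training data
# with the same hyperparameters, then test once on the held-out test set.
THRESHOLD = 2.0  # hypothetical cutoff for "good enough" mean validation MSE
if mean(scores) < THRESHOLD:
    final_w = fit(train_x, train_y, alpha)
    test_mse = mse(final_w, test_x, test_y)
    print(f"cv mean MSE={mean(scores):.3f}, test MSE={test_mse:.3f}")
else:
    print("aggregate metrics look poor: adjust hyperparameters or change the model")
```

The test set is touched exactly once, at the very end, so it stays unseen throughout cross-validation.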

111
00:05:04,866 --> 00:05:08,266
On the other hand,
if the aggregate metrics don't look good

112
00:05:08,566 --> 00:05:10,033
then something's wrong.

113
00:05:10,033 --> 00:05:12,900
If they don't look good,

114
00:05:12,900 --> 00:05:17,866
we need to
adjust the hyperparameters of the model.

115
00:05:17,866 --> 00:05:20,866
Or we have to change the model entirely

116
00:05:21,300 --> 00:05:24,300
and repeat this whole process
of k-fold cross-validation.

117
00:05:24,866 --> 00:05:27,300
So that's what k-fold cross-validation is.

118
00:05:27,300 --> 00:05:28,333
And that's how it works.

119
00:05:28,333 --> 00:05:31,500
As we discussed at the beginning,
there are a few schools of thought.

120
00:05:31,500 --> 00:05:34,500
This was the second school,
although they don't really have numbers.

121
00:05:34,633 --> 00:05:37,933
This is one of the schools of thought
that says we should have the training set.

122
00:05:37,933 --> 00:05:40,666
Apply k-fold cross-validation
to the training set,

123
00:05:40,666 --> 00:05:43,300
assess the metrics, and then retrain
the model on the training set

124
00:05:43,300 --> 00:05:47,366
once we're happy, and then still test it on
the test set. The other school of thought

125
00:05:47,366 --> 00:05:50,966
says, let's get rid of this testing
step, right?

126
00:05:50,966 --> 00:05:53,900
So we've already
run the model here many times,

127
00:05:54,900 --> 00:05:57,733
and we've
tested it on these validation metrics.

128
00:05:57,733 --> 00:06:00,733
So we're just going to train the model
on the training set.

129
00:06:00,733 --> 00:06:03,100
And we will not need to test it anymore.

130
00:06:03,100 --> 00:06:05,133
We've tested it. We know it works.

131
00:06:05,133 --> 00:06:07,866
Then there's a modification
of that school of thought as well

132
00:06:07,866 --> 00:06:12,000
where you don't even train it
on this training set anymore.

133
00:06:12,100 --> 00:06:18,066
So you just say, okay, well,
we've trained the model ten times here.

134
00:06:18,100 --> 00:06:20,366
We've got ten different models
as a result.

135
00:06:20,366 --> 00:06:22,533
We've tested all of them,
so we don't need to do this test.

136
00:06:22,533 --> 00:06:23,666
And we're not going to do the training.

137
00:06:23,666 --> 00:06:25,800
We're just going
to pick one of these models.

138
00:06:25,800 --> 00:06:28,900
That is a little bit
challenging in my view.

139
00:06:28,900 --> 00:06:31,700
Like how do you pick which one to go with?

140
00:06:31,700 --> 00:06:32,566
Are you going to just take

141
00:06:32,566 --> 00:06:36,500
the one with the best metrics, or the one with
the closest metrics to the average?

142
00:06:36,500 --> 00:06:39,733
So it creates a little bit of extra
work to pick the model out of these,

143
00:06:39,733 --> 00:06:42,133
because they're all going
to be slightly different models,

144
00:06:42,133 --> 00:06:45,133
because the underlying training
data was slightly different.
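
The two selection rules mentioned here, best metrics versus closest to the average, can be sketched as follows. The scores are invented accuracies (higher is better), and neither rule is presented as the standard choice.

```python
from statistics import mean

# Made-up validation accuracies for the ten fold models (index = fold number).
fold_scores = [0.91, 0.89, 0.93, 0.90, 0.88, 0.92, 0.90, 0.91, 0.87, 0.89]
avg = mean(fold_scores)

# Rule A: take the model with the best validation score.
best_idx = max(range(len(fold_scores)), key=lambda i: fold_scores[i])

# Rule B: take the model whose score is closest to the average
# (on ties, min() keeps the first such fold).
closest_idx = min(range(len(fold_scores)), key=lambda i: abs(fold_scores[i] - avg))

print(f"best fold model: {best_idx}, closest-to-average fold model: {closest_idx}")
```

The two rules can easily pick different models, which is the extra decision this variant creates.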

145
00:06:45,900 --> 00:06:48,900
So that's also an option.

146
00:06:49,000 --> 00:06:51,933
And then there's yet another

147
00:06:51,933 --> 00:06:55,233
modification of how to think about k-fold
cross-validation.

148
00:06:55,633 --> 00:06:58,566
What you could do is take

149
00:06:58,566 --> 00:07:02,600
this part and do it first,
you know, do the classic part first.

150
00:07:02,766 --> 00:07:05,766
Split your data into a training set
and a test set.

151
00:07:05,900 --> 00:07:08,700
Train your model on the training set,
test it on the test set.

152
00:07:08,700 --> 00:07:12,166
And then if you're happy
with the results of this classic approach,

153
00:07:12,400 --> 00:07:17,033
you could go an extra step
and apply k-fold cross-validation.

154
00:07:17,033 --> 00:07:18,500
So do all of this.

155
00:07:18,500 --> 00:07:20,633
But after you've done
the training and testing.

156
00:07:20,633 --> 00:07:24,800
And so then once you've done all of this,
if your aggregate metrics

157
00:07:24,800 --> 00:07:27,900
still look good, then
you can confirm and say, well, I'm happy,

158
00:07:28,533 --> 00:07:32,100
that I
didn't get lucky on the test set.

159
00:07:32,100 --> 00:07:36,300
Basically,
the fact that it works on the test set

160
00:07:36,600 --> 00:07:42,366
is not just chance.
I tested it later,

161
00:07:42,366 --> 00:07:44,700
I tested it with k-fold cross-validation,
and it still works.

162
00:07:44,700 --> 00:07:46,400
So I'm just going to keep
my original model

163
00:07:46,400 --> 00:07:48,100
that I trained in this first approach.

164
00:07:48,100 --> 00:07:49,600
So in this case

165
00:07:49,600 --> 00:07:54,566
k-fold cross-validation is acting
as an add-on to your classic method.

166
00:07:54,566 --> 00:07:57,166
So it's kind of the same thing
as we discussed,

167
00:07:57,166 --> 00:08:00,300
as the, what's it called?

168
00:08:00,900 --> 00:08:03,200
The general,
the most general k-fold cross-validation.

169
00:08:03,200 --> 00:08:05,600
When you do all this first
and then you train and then you test,

170
00:08:05,600 --> 00:08:07,800
it's kind of the same thing,
but it's just doing it backwards.

171
00:08:07,800 --> 00:08:13,266
So you can do that too, whatever
you're happy with, whatever works for you.

172
00:08:13,266 --> 00:08:16,333
As long as you know why you're doing it

173
00:08:16,333 --> 00:08:20,200
and you know
what results you're aiming for.

174
00:08:20,200 --> 00:08:23,700
You know how to assess the
indications

175
00:08:23,700 --> 00:08:27,133
that k-fold cross-validation is giving you.
The rest is details.

176
00:08:27,133 --> 00:08:30,900
There's no one hard and fast way
that you have to do it, as long as you get

177
00:08:31,200 --> 00:08:33,366
the outcome, the benefits of

178
00:08:33,366 --> 00:08:35,500
k-fold cross-validation
that you're aiming to get.

179
00:08:35,500 --> 00:08:39,833
And of course, you don't let
the validation

180
00:08:39,833 --> 00:08:41,300
data leak into the training data

181
00:08:41,300 --> 00:08:44,066
so you don't let the model see
the validation data during training

182
00:08:44,066 --> 00:08:46,866
or the test set during training,
if you're doing those. Okay.

183
00:08:46,866 --> 00:08:48,500
So that's k-fold cross-validation.

184
00:08:48,500 --> 00:08:53,466
Congratulations on adding a new powerful tool
for model assessment to your toolkit.

185
00:08:53,733 --> 00:08:55,233
And I look forward to seeing you back here
next time.

186
00:08:55,233 --> 00:08:57,266
Until then, enjoy machine learning.