1
00:00:00,266 --> 00:00:02,533
Hello and welcome to this R tutorial.

2
00:00:02,533 --> 00:00:03,066
And mostly

3
00:00:03,066 --> 00:00:07,566
welcome to the last part of this course,
Part 10: Model Selection and Boosting.

4
00:00:08,166 --> 00:00:10,033
So in this part we will do two things.

5
00:00:10,033 --> 00:00:12,900
First evaluating our model performance.

6
00:00:12,900 --> 00:00:15,900
And second,
improving our model performance.

7
00:00:15,900 --> 00:00:19,000
And then there will be this bonus section
about one of the most powerful

8
00:00:19,000 --> 00:00:22,333
algorithms in machine learning,
which has become more and more popular.

9
00:00:22,500 --> 00:00:24,666
And that is called XGBoost.

10
00:00:24,666 --> 00:00:27,866
But first, we want to be able
to improve the model performance

11
00:00:27,866 --> 00:00:30,866
of all the machine learning models
we've built in this course,

12
00:00:31,066 --> 00:00:31,966
and improving

13
00:00:31,966 --> 00:00:35,400
the model performance can be done
with a technique called model selection

14
00:00:35,633 --> 00:00:39,300
that consists of choosing
the best parameters of your machine

15
00:00:39,300 --> 00:00:40,233
learning models.

16
00:00:40,233 --> 00:00:43,233
Because you know, remember,
every time we build a machine

17
00:00:43,233 --> 00:00:46,166
learning model, well,
we had two types of parameters.

18
00:00:46,166 --> 00:00:49,900
The first type were the parameters that
the model learned, that is, the parameters

19
00:00:49,900 --> 00:00:53,866
that were adjusted to their optimal values
by training the model.

20
00:00:54,100 --> 00:00:58,333
And then the second type of parameters are
the parameters that we chose ourselves.

21
00:00:58,600 --> 00:01:02,033
For example the kernel parameter
in the kernel SVM model.

22
00:01:02,533 --> 00:01:05,533
And these parameters are called
the hyperparameters.

23
00:01:05,600 --> 00:01:07,566
So there is still room
to improve the model

24
00:01:07,566 --> 00:01:11,500
because we can still choose
some optimal values for these parameters.

25
00:01:11,833 --> 00:01:15,400
But since these parameters are not
the parameters learned by the model,

26
00:01:15,700 --> 00:01:18,733
then we need to figure out another way
to choose

27
00:01:18,733 --> 00:01:21,900
these optimal values for these parameters
for the hyperparameters.

28
00:01:22,333 --> 00:01:25,333
And that's one of the powerful things
we'll do in this Part 10.

29
00:01:25,500 --> 00:01:28,733
And that will be through a very efficient
technique called grid search.

30
00:01:29,266 --> 00:01:30,966
But before we start grid search,

31
00:01:30,966 --> 00:01:35,066
we need to optimize our way
to evaluate our models, because so far,

32
00:01:35,166 --> 00:01:39,133
what we did is split our data
set between a training set and a test set.

33
00:01:39,500 --> 00:01:42,066
And you know,
we trained our model on the training set

34
00:01:42,066 --> 00:01:44,833
and we tested its performance
on the test set.

35
00:01:44,833 --> 00:01:47,933
That's the correct way
of evaluating the model performance.

36
00:01:48,133 --> 00:01:51,700
But that's not the best one because
we actually have the variance problem.

37
00:01:51,966 --> 00:01:56,066
The variance problem can be explained
by the fact that when we get the accuracy

38
00:01:56,066 --> 00:01:59,866
on the test set, well,
if we run the model again and test again

39
00:01:59,866 --> 00:02:04,200
its performance on another test set, well,
we can get a very different accuracy.

40
00:02:04,633 --> 00:02:08,033
So judging our model performance
only on one

41
00:02:08,033 --> 00:02:11,533
accuracy on one test
set is actually not super relevant.

42
00:02:11,533 --> 00:02:15,133
That's not the most relevant way
to evaluate the model performance.

43
00:02:15,600 --> 00:02:18,833
And so there is this technique
called k fold cross-validation

44
00:02:19,133 --> 00:02:23,366
that improves this a lot because that will
fix this variance problem.

45
00:02:23,833 --> 00:02:25,200
And how will it fix it?

46
00:02:25,200 --> 00:02:29,666
It will fix it by splitting the training
set into ten folds when k equals ten.

47
00:02:29,666 --> 00:02:31,800
And most of the time k equals ten.

48
00:02:31,800 --> 00:02:36,433
And we train our model on nine folds
and we test it on the last remaining fold.

49
00:02:36,733 --> 00:02:40,300
And since with ten folds
we can make ten different combinations

50
00:02:40,300 --> 00:02:43,333
of nine folds to train the model
and one fold to test it.

51
00:02:43,633 --> 00:02:46,566
That means that we can train the model
and test the model

52
00:02:46,566 --> 00:02:49,566
on ten combinations
of training and test sets,

53
00:02:49,633 --> 00:02:52,833
and that will already give us a much
better idea of the model performance,

54
00:02:52,833 --> 00:02:56,100
because what we can do
afterwards is take an average

55
00:02:56,366 --> 00:02:59,100
of the different accuracies
of the ten evaluations,

56
00:02:59,100 --> 00:03:02,433
and also compute the standard deviation
to have a look at the variance.

57
00:03:02,733 --> 00:03:06,000
So eventually our analysis
will be much more relevant.

58
00:03:06,300 --> 00:03:09,400
And besides we'll know
in which of these four categories we'll be.

59
00:03:09,700 --> 00:03:12,700
Because if we get a good accuracy
and a small variance

60
00:03:12,733 --> 00:03:16,100
we will be on the lower left one.
If we get a good accuracy

61
00:03:16,133 --> 00:03:19,133
and a high variance,
we will be on the lower right one.

62
00:03:19,166 --> 00:03:21,600
If we get a small accuracy
and a low variance,

63
00:03:21,600 --> 00:03:23,333
we will be on the upper left one.

64
00:03:23,333 --> 00:03:25,100
And eventually, if we get a low accuracy

65
00:03:25,100 --> 00:03:27,966
and a high variance,
we will be on the upper right one.

66
00:03:27,966 --> 00:03:31,133
So this k fold cross
validation is very useful.

67
00:03:31,166 --> 00:03:34,566
And besides our performance
analysis is much more relevant.

68
00:03:35,066 --> 00:03:38,400
So let's start with this k fold
cross-validation.

69
00:03:38,400 --> 00:03:41,100
Our first technique of model selection.

70
00:03:41,100 --> 00:03:44,933
So since we already built a lot of models
we're not going to build another one.

71
00:03:45,133 --> 00:03:46,500
We are going to use one of the models

72
00:03:46,500 --> 00:03:49,900
we built and apply k
fold cross-validation on it.

73
00:03:50,366 --> 00:03:53,666
And so the model we're going to use
is this kernel SVM

74
00:03:53,766 --> 00:03:56,066
we made in Part 3: Classification.

75
00:03:56,066 --> 00:03:59,633
And that, remember, we used to predict
if the customers are going to click

76
00:03:59,633 --> 00:04:03,466
on the ads on the social network to buy
the SUV, yes or no.

77
00:04:03,833 --> 00:04:06,833
So the model is already built
and we already have everything.

78
00:04:06,900 --> 00:04:09,900
So what we're going to do
is take the whole model

79
00:04:10,033 --> 00:04:13,866
and we are going to add a new code section

80
00:04:13,866 --> 00:04:16,866
inside of it
that is going to be, of course,

81
00:04:17,100 --> 00:04:20,333
the code section that will implement k
fold cross-validation.

82
00:04:21,000 --> 00:04:26,000
So before we start doing it let's pick
the right folder as working directory.

83
00:04:26,133 --> 00:04:28,233
So we go to Machine Learning A-Z.

84
00:04:28,233 --> 00:04:30,066
We are now
in the last part of this course.

85
00:04:30,066 --> 00:04:31,800
Congratulations on reaching it.

86
00:04:31,800 --> 00:04:36,566
Part 10, Model Selection and Boosting,
and Section 48, Model Selection.

87
00:04:36,866 --> 00:04:37,300
All right.

88
00:04:37,300 --> 00:04:40,100
Make sure that you have
the Social_Network_Ads.csv file.

89
00:04:40,100 --> 00:04:42,466
And if that's the case you're ready to go.

90
00:04:42,466 --> 00:04:46,166
All right, so now where do we put the k-fold
cross-validation code section?

91
00:04:46,500 --> 00:04:50,433
Well since that consists
of evaluating the model performance.

92
00:04:50,700 --> 00:04:52,566
Well the most relevant location to put

93
00:04:52,566 --> 00:04:56,133
it is right after we build our kernel
SVM model.

94
00:04:56,366 --> 00:04:58,633
That is right after we built the model.

95
00:04:58,633 --> 00:05:01,700
And actually in this code here
we have the predictions

96
00:05:01,700 --> 00:05:04,700
of the test results
and the confusion matrix.

97
00:05:04,900 --> 00:05:07,900
That is actually a first way
of evaluating the model.

98
00:05:08,100 --> 00:05:10,666
But as I said
at the beginning of this tutorial,

99
00:05:10,666 --> 00:05:14,333
this is a correct way of evaluating
the model, but not the best one.

100
00:05:14,466 --> 00:05:15,700
And in today's tutorial,

101
00:05:15,700 --> 00:05:18,900
we are introducing a much better way
to evaluate our model.

102
00:05:19,400 --> 00:05:22,400
And so let's put it
right after this section.

103
00:05:22,566 --> 00:05:25,666
As in a more advanced performance
evaluation method.

104
00:05:26,033 --> 00:05:29,033
And so we're going to call this section
applying

105
00:05:29,433 --> 00:05:33,400
k fold cross validation.

106
00:05:34,466 --> 00:05:35,333
All right.

107
00:05:35,333 --> 00:05:38,733
So now the first thing that we have to do
is to install the caret package.

108
00:05:38,733 --> 00:05:41,833
Because this contains
a very practical tool

109
00:05:42,000 --> 00:05:45,000
to create the ten folds
of our training set.

110
00:05:45,000 --> 00:05:49,033
So let's start
with this install.packages

111
00:05:49,200 --> 00:05:52,333
and, in parentheses and in quotes, caret.

112
00:05:53,333 --> 00:05:55,700
All right. So mine is already installed.

113
00:05:55,700 --> 00:05:56,933
We can check it out.

114
00:05:56,933 --> 00:05:59,533
Check the same in your list of packages.

115
00:05:59,533 --> 00:06:01,200
Here it is: caret.

116
00:06:01,200 --> 00:06:04,200
So I will just comment that line out.

117
00:06:04,233 --> 00:06:05,800
But don't forget to install it.

118
00:06:05,800 --> 00:06:08,800
And then let's not forget
the library command

119
00:06:09,633 --> 00:06:13,466
to load
the caret package automatically.

120
00:06:14,233 --> 00:06:14,700
All right.

121
00:06:14,700 --> 00:06:17,700
And now let's start
coding k fold cross-validation.

122
00:06:18,033 --> 00:06:22,700
So first we're going to create the ten
folds that will divide our training set.

123
00:06:23,100 --> 00:06:25,033
And to do this it's very simple.

124
00:06:25,033 --> 00:06:27,966
We're going to use the createFolds
function from the caret

125
00:06:27,966 --> 00:06:30,966
package to create these ten folds
very efficiently.

126
00:06:31,333 --> 00:06:32,133
So let's do it.

127
00:06:32,133 --> 00:06:33,666
We're going to call these folds.

128
00:06:33,666 --> 00:06:37,300
Folds will actually be a list
of ten different test folds

129
00:06:37,300 --> 00:06:39,000
composing our training set.

130
00:06:39,000 --> 00:06:43,100
So let's use this createFolds function,
with a capital F.

131
00:06:43,100 --> 00:06:44,100
Here it is.

132
00:06:44,100 --> 00:06:48,133
And inside the parenthesis
we just need to specify the training set.

133
00:06:48,500 --> 00:06:50,833
So here I'm adding the training set.

134
00:06:50,833 --> 00:06:55,200
And then we take our dependent variable
column by which we want to make the split.

135
00:06:55,200 --> 00:06:57,433
You know it's exactly
like when we split the data

136
00:06:57,433 --> 00:06:59,500
set between the training set
and the test set.

137
00:06:59,500 --> 00:07:03,533
We need to specify the dependent variable
to make the split so that the training set

138
00:07:03,533 --> 00:07:07,266
and the test set are well distributed
according to the dependent variable.

139
00:07:07,466 --> 00:07:08,700
Well here that's the same.

140
00:07:08,700 --> 00:07:11,233
We are creating ten folds of the training set.

141
00:07:11,233 --> 00:07:14,166
And we are specifying
the dependent variable to make sure

142
00:07:14,166 --> 00:07:17,166
the folds are well distributed
according to the dependent variable.

143
00:07:17,500 --> 00:07:22,066
So that's why here we need to specify
our dependent variable, which is Purchased.

144
00:07:24,066 --> 00:07:24,400
All right.

145
00:07:24,400 --> 00:07:27,400
So that's the first argument
of the createFolds function.

146
00:07:27,600 --> 00:07:31,966
And of course as you might have guessed
the second argument is the number of folds

147
00:07:32,200 --> 00:07:35,066
you want to divide your training set into.

148
00:07:35,066 --> 00:07:38,366
And really a good choice
of the number of folds is ten

149
00:07:38,600 --> 00:07:42,566
because by creating ten folds,
we will eventually get ten accuracies.

150
00:07:42,866 --> 00:07:45,133
And ten accuracies is a relevant way

151
00:07:45,133 --> 00:07:49,066
to measure the accuracy
through the mean of these ten accuracies.

152
00:07:49,433 --> 00:07:52,800
So we will take ten folds,
and I recommend doing that in practice.

153
00:07:53,000 --> 00:07:56,700
So here we just add k equals ten.
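
A minimal sketch of this fold-creation step, not the exact code shown in the video. The training set here is a made-up stand-in, and the column names Age and EstimatedSalary are assumptions for illustration; only the Purchased column and k = 10 come from the tutorial.

```r
# install.packages("caret")  # uncomment if caret is not installed yet
library(caret)

set.seed(123)  # assumption: fixed seed so the folds are reproducible

# Hypothetical stand-in for the real training set
training_set <- data.frame(
  Age = rnorm(300),
  EstimatedSalary = rnorm(300),
  Purchased = factor(rbinom(300, 1, 0.35), levels = c(0, 1))
)

# First argument: the dependent variable used to balance the folds.
# Second argument: the number of folds.
folds <- createFolds(training_set$Purchased, k = 10)

length(folds)          # number of folds
sapply(folds, length)  # number of test observations in each fold
```

Each element of folds is a vector of row indices of the training set, which is what the rest of the code section relies on.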

154
00:07:57,400 --> 00:07:58,200
All right.

155
00:07:58,200 --> 00:08:01,133
Now we're going to implement k
fold cross-validation.

156
00:08:01,133 --> 00:08:04,300
Because what we just did
here is just to create the folds.

157
00:08:04,533 --> 00:08:07,366
But now we need to implement
the algorithm itself.

158
00:08:07,366 --> 00:08:11,066
And to do this
well there are several ways of doing it.

159
00:08:11,133 --> 00:08:14,700
But we're going to use
a very practical function in R

160
00:08:14,933 --> 00:08:17,100
which is called the lapply function.

161
00:08:17,100 --> 00:08:22,000
And that consists of applying a function
to the different elements of a list.
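
To see what lapply does in isolation, here is a small toy sketch. The list of numbers and the squaring function are made up for illustration and have nothing to do with the folds yet.

```r
# lapply takes a list and a function, applies the function to each
# element, and returns a list of the results
squares <- lapply(list(1, 2, 3), function(x) {
  return(x^2)
})

print(squares)  # a list holding 1, 4 and 9
```

In the tutorial the list will be folds and the function will return one accuracy per test fold.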

162
00:08:22,600 --> 00:08:28,100
So this list is going to be our folds list
that contains the ten test folds.

163
00:08:28,333 --> 00:08:31,333
And the function is the function
that is going to compute

164
00:08:31,500 --> 00:08:34,500
the accuracy
for each of these ten test folds.

165
00:08:34,766 --> 00:08:38,433
So let's start by creating a new variable
that we're going to call CV.

166
00:08:38,966 --> 00:08:42,266
And then let's use here
this lapply function.

167
00:08:42,833 --> 00:08:45,033
All right. And you're going to understand
what's going to happen.

168
00:08:45,033 --> 00:08:48,466
So in this lapply function
we need to input two arguments.

169
00:08:48,500 --> 00:08:52,466
The first argument
is the list of the elements to which

170
00:08:52,466 --> 00:08:55,466
we are going to apply the next function,
which is the next argument.

171
00:08:55,800 --> 00:08:58,833
And so, as I just said, this list is folds.

172
00:08:59,300 --> 00:09:02,033
The list of our ten test folds.

173
00:09:02,033 --> 00:09:05,033
And then the next argument
is the function.

174
00:09:05,100 --> 00:09:08,233
So a function in R can be written
this way.

175
00:09:08,700 --> 00:09:10,033
Function.

176
00:09:10,033 --> 00:09:15,666
Then in parentheses we need to input
the argument which we will call x.

177
00:09:15,666 --> 00:09:18,700
This is a local argument so far, but x

178
00:09:18,700 --> 00:09:21,700
is actually going to be each
one of the ten test folds.

179
00:09:21,733 --> 00:09:25,600
So x here and then a pair of curly brackets.

180
00:09:25,800 --> 00:09:26,700
Here we go.

181
00:09:26,700 --> 00:09:30,500
And inside these brackets
we are going to implement this function

182
00:09:30,500 --> 00:09:34,700
that will compute the accuracy of
the model on each of these ten test folds.

183
00:09:35,000 --> 00:09:38,933
So basically in this function we are going
to implement k fold cross-validation.

184
00:09:39,533 --> 00:09:42,566
So what do we need to implement
k fold cross-validation.

185
00:09:42,600 --> 00:09:45,400
Well first we need the training fold.

186
00:09:45,400 --> 00:09:49,666
The training fold is the whole training
set from which we withdraw the test fold.

187
00:09:50,033 --> 00:09:52,700
So basically training fold here

188
00:09:53,666 --> 00:09:55,166
I'm creating a new local

189
00:09:55,166 --> 00:09:58,800
variable, actually,
that I'm calling training_fold.

190
00:09:59,100 --> 00:10:02,100
And so as I just said
this is the whole training set.

191
00:10:02,333 --> 00:10:03,400
Here we go.

192
00:10:03,400 --> 00:10:08,766
But from which we withdraw the test fold,
that is minus x.

193
00:10:08,966 --> 00:10:13,233
Because, you know, x is actually
each element of this folds list here.

194
00:10:13,500 --> 00:10:16,833
So by putting minus x here
we are taking the whole training set

195
00:10:17,100 --> 00:10:18,500
but without the test fold.

196
00:10:18,500 --> 00:10:20,766
And therefore
that's actually the training fold.

197
00:10:20,766 --> 00:10:23,766
And then a comma to take all the columns.

198
00:10:24,300 --> 00:10:24,833
All right.

199
00:10:24,833 --> 00:10:26,700
So we got our training fold.

200
00:10:26,700 --> 00:10:28,866
Now let's get our test fold.

201
00:10:28,866 --> 00:10:31,866
So our test fold try to guess what
it's going to be.

202
00:10:32,200 --> 00:10:35,200
Test fold equals training set.

203
00:10:35,700 --> 00:10:38,033
And inside the square brackets.

204
00:10:38,033 --> 00:10:39,666
Well, what do we need to put here?

205
00:10:39,666 --> 00:10:43,400
Well that's actually x
because you know x represents

206
00:10:43,533 --> 00:10:46,566
all the observations
for each one of the ten test folds.

207
00:10:47,000 --> 00:10:50,000
So we got our test fold.

208
00:10:50,100 --> 00:10:51,633
And then what do we need to do now.

209
00:10:51,633 --> 00:10:54,766
Now what we need to do is train

210
00:10:54,966 --> 00:10:57,966
our kernel SVM model on the training fold.

211
00:10:58,166 --> 00:11:01,866
And then we will test its performance
on the test fold.

212
00:11:02,100 --> 00:11:04,133
So basically what do we need to do now.

213
00:11:04,133 --> 00:11:08,900
We need to add our model
which is our kernel SVM classifier.

214
00:11:09,466 --> 00:11:13,866
So what we can do now
is just take this code section here.

215
00:11:13,866 --> 00:11:16,300
Because that's where we build the model.

216
00:11:16,300 --> 00:11:18,666
And we need to include this model
in the function.

217
00:11:18,666 --> 00:11:20,166
That's why we're taking it.

218
00:11:20,166 --> 00:11:24,766
So copy and let's add it here paste.

219
00:11:25,300 --> 00:11:26,033
And here we go.

220
00:11:26,033 --> 00:11:29,366
We have our model but we're not training

221
00:11:29,633 --> 00:11:32,633
this kernel
SVM classifier on the training set.

222
00:11:33,000 --> 00:11:36,000
We're training it on the training fold

223
00:11:36,400 --> 00:11:39,833
because that's the principle
of k fold cross-validation.

224
00:11:40,033 --> 00:11:43,933
We are training a classifier
on each one of the ten training folds.

225
00:11:44,333 --> 00:11:47,333
So that's
why here we're taking the training fold.

226
00:11:47,500 --> 00:11:52,600
the one that we created here inside
this function that we're making right now.

227
00:11:53,100 --> 00:11:55,400
And then we keep the same argument.

228
00:11:55,400 --> 00:11:56,166
All right.

229
00:11:56,166 --> 00:11:57,533
Then what do we need to do.

230
00:11:57,533 --> 00:12:00,500
Well, that's exactly the
same as what we're doing

231
00:12:00,500 --> 00:12:03,600
when we make a model
that is predicting the test results.

232
00:12:03,600 --> 00:12:07,300
That's the next step
because it's from these test

233
00:12:07,300 --> 00:12:10,700
results that we will
then compute the confusion matrix

234
00:12:10,700 --> 00:12:14,233
and therefore the accuracy
which is exactly what we need

235
00:12:14,700 --> 00:12:17,666
that is, which is exactly what
will be returned by the function

236
00:12:17,666 --> 00:12:20,800
we are making right now to implement k
fold cross-validation.

237
00:12:21,200 --> 00:12:25,633
So same let's
copy this line to predict the test result.

238
00:12:26,100 --> 00:12:29,000
And let's paste it here.

239
00:12:29,000 --> 00:12:30,466
And is that all?

240
00:12:30,466 --> 00:12:34,700
Of course no because we are not testing
our classifier on the test set.

241
00:12:35,066 --> 00:12:38,266
But we are testing it on the test fold.

242
00:12:38,566 --> 00:12:38,866
Right.

243
00:12:38,866 --> 00:12:42,000
Because you know, we are training a model
on the training fold

244
00:12:42,000 --> 00:12:44,700
and testing its performance
on the test fold.

245
00:12:44,700 --> 00:12:46,200
So now that's good.

246
00:12:46,200 --> 00:12:48,600
And now let's move on to the next step

247
00:12:48,600 --> 00:12:52,133
which is to compute the confusion matrix.

248
00:12:52,300 --> 00:12:55,266
So still let's take this line here

249
00:12:56,233 --> 00:12:59,900
and let's paste it below right here paste.

250
00:13:00,300 --> 00:13:03,300
And now of course we need to change
test set

251
00:13:03,700 --> 00:13:06,600
and replace it by test fold.

252
00:13:06,600 --> 00:13:07,000
All right.

253
00:13:07,000 --> 00:13:10,633
And this will give us the confusion
matrix of this classifier

254
00:13:10,633 --> 00:13:14,066
SVM model of this kernel SVM classifier.

255
00:13:14,433 --> 00:13:19,466
And that is trained on the training fold
and tested on the test fold.

256
00:13:19,833 --> 00:13:21,300
And therefore this line of code

257
00:13:21,300 --> 00:13:24,900
will give you the confusion matrix
for the observations of the test fold.

258
00:13:25,733 --> 00:13:26,433
All right.

259
00:13:26,433 --> 00:13:30,733
And now last step
we need to compute the accuracy because we

260
00:13:30,733 --> 00:13:35,633
are doing all this to get the accuracies
for all the ten test folds here.

261
00:13:36,066 --> 00:13:37,733
So let's compute the accuracy.

262
00:13:37,733 --> 00:13:40,233
The accuracy is

263
00:13:40,233 --> 00:13:42,833
we've calculated this accuracy many times.

264
00:13:42,833 --> 00:13:45,600
We take the number of correct predictions

265
00:13:45,600 --> 00:13:51,600
which is cm one comma one
because this corresponds to the number

266
00:13:51,600 --> 00:13:57,900
of correct predictions of the first class
plus cm two comma two.

267
00:13:58,266 --> 00:14:01,200
Because this corresponds
to the number of correct predictions

268
00:14:01,200 --> 00:14:04,200
of the second class,
and since we have two classes,

269
00:14:04,333 --> 00:14:08,533
this sum corresponds
to the total number of correct predictions

270
00:14:09,033 --> 00:14:12,200
and then we divide it by the

271
00:14:12,200 --> 00:14:16,166
total number of observations
in the test fold, and therefore

272
00:14:16,533 --> 00:14:20,333
that's the number of correct predictions
which is this sum.

273
00:14:20,333 --> 00:14:20,666
Here,

274
00:14:22,200 --> 00:14:23,033
to which

275
00:14:23,033 --> 00:14:26,400
we also need to add the number
of incorrect predictions.

276
00:14:26,800 --> 00:14:29,800
And therefore, you know, we can copy this

277
00:14:29,866 --> 00:14:33,200
and take the first number
of incorrect predictions

278
00:14:33,200 --> 00:14:36,566
that corresponds to the first class
and this second

279
00:14:36,566 --> 00:14:39,800
number of incorrect predictions
that corresponds to the second class.

280
00:14:40,033 --> 00:14:44,400
And so here we are actually taking
all the elements of this confusion matrix.

281
00:14:44,766 --> 00:14:48,733
That is the number of correct predictions
plus the number of incorrect predictions.
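
The accuracy formula just described can be sketched on its own like this. The actual and predicted labels are made up for illustration; in the tutorial they come from the test fold and from the classifier's predictions.

```r
# Hypothetical true and predicted labels for one small test fold
actual    <- factor(c(0, 0, 1, 1, 0, 1, 0, 1, 0, 0), levels = c(0, 1))
predicted <- factor(c(0, 0, 1, 0, 0, 1, 1, 1, 0, 0), levels = c(0, 1))

# Confusion matrix: rows are the true classes, columns the predictions
cm <- table(actual, predicted)

# Correct predictions (diagonal) divided by all observations
accuracy <- (cm[1, 1] + cm[2, 2]) /
  (cm[1, 1] + cm[2, 2] + cm[1, 2] + cm[2, 1])

print(accuracy)  # 0.8, since 8 of the 10 made-up labels match
```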

282
00:14:49,266 --> 00:14:54,433
And so now with this line of code
we get the accuracy for one fold.

283
00:14:54,766 --> 00:14:59,200
But since we're using this
lapply function, this will do all this:

284
00:14:59,200 --> 00:15:02,200
Compute the accuracy
for each of the ten test folds.

285
00:15:02,400 --> 00:15:04,566
And therefore we will get ten accuracies.

286
00:15:04,566 --> 00:15:08,066
And we will compute their mean,
which will give us a much more relevant

287
00:15:08,066 --> 00:15:10,266
accuracy than just the single one

288
00:15:10,266 --> 00:15:13,066
we obtained earlier with our previous
method

289
00:15:13,066 --> 00:15:15,566
of evaluating the model performance.

290
00:15:15,566 --> 00:15:16,100
All right.

291
00:15:16,100 --> 00:15:20,533
So now we have everything,
but we just need to specify that

292
00:15:20,533 --> 00:15:24,466
we want to have this accuracy returned
because this is a function.

293
00:15:24,466 --> 00:15:27,466
So we need to specify what we want
this function to return.

294
00:15:27,600 --> 00:15:32,833
And to do this we just add return,
parentheses, and accuracy.

295
00:15:33,000 --> 00:15:37,733
And now everything is ready K
fold cross-validation is well implemented.
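
Putting the steps together, the whole code section could look roughly like the sketch below. This is an approximation, not the exact code from the video: the data is synthetic, the column names Age and EstimatedSalary are assumptions, and the svm arguments (C-classification with a radial kernel, from the e1071 package) are assumed to match the kernel SVM built earlier in the course.

```r
# install.packages(c("caret", "e1071"))  # uncomment if not installed yet
library(caret)
library(e1071)

set.seed(123)  # assumption: fixed seed for reproducible folds

# Hypothetical stand-in for the training set, with Purchased as column 3
n <- 300
training_set <- data.frame(Age = rnorm(n), EstimatedSalary = rnorm(n))
score <- training_set$Age + training_set$EstimatedSalary + rnorm(n, sd = 0.5)
training_set$Purchased <- factor(as.integer(score > 0.5), levels = c(0, 1))

# Applying k-fold cross-validation
folds <- createFolds(training_set$Purchased, k = 10)
cv <- lapply(folds, function(x) {
  training_fold <- training_set[-x, ]  # everything except the test fold
  test_fold <- training_set[x, ]       # the test fold itself
  classifier <- svm(formula = Purchased ~ .,
                    data = training_fold,
                    type = 'C-classification',
                    kernel = 'radial')
  y_pred <- predict(classifier, newdata = test_fold[-3])
  cm <- table(test_fold[, 3], y_pred)
  accuracy <- (cm[1, 1] + cm[2, 2]) /
    (cm[1, 1] + cm[2, 2] + cm[1, 2] + cm[2, 1])
  return(accuracy)
})
```

After running this, cv is a list of ten accuracies, one per test fold. The values will differ from those in the video because the data here is synthetic.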

296
00:15:38,400 --> 00:15:38,766
All right.

297
00:15:38,766 --> 00:15:42,566
So now we are ready
to get the ten accuracies that will result

298
00:15:42,566 --> 00:15:45,066
from this ten fold
cross-validation technique.

299
00:15:45,066 --> 00:15:49,700
So we are going to select everything
from here up to the top

300
00:15:49,833 --> 00:15:52,833
because we haven't imported
the dataset yet.

301
00:15:52,966 --> 00:15:56,900
So let's press Command or Control
plus enter to execute the whole thing.

302
00:15:57,233 --> 00:15:58,033
Here we go.

303
00:15:58,033 --> 00:16:02,533
Everything was correctly executed in
less than one second, so that's perfect.

304
00:16:02,866 --> 00:16:05,066
Let's have a look at the results.

305
00:16:05,066 --> 00:16:07,733
So first let's put that down.

306
00:16:07,733 --> 00:16:08,700
All right.

307
00:16:08,700 --> 00:16:10,200
So here we get all the results.

308
00:16:10,200 --> 00:16:12,166
First the data set was well imported.

309
00:16:12,166 --> 00:16:14,033
We split it into the training set.

310
00:16:14,033 --> 00:16:17,033
And the test set at this section here.

311
00:16:17,300 --> 00:16:21,233
And then we build our classifier
which is our kernel SVM classifier.

312
00:16:21,400 --> 00:16:24,600
And of course we get our CV list

313
00:16:24,733 --> 00:16:27,766
that we built through K-Fold
cross validation.

314
00:16:28,200 --> 00:16:30,833
That is this CV list, which is the list

315
00:16:30,833 --> 00:16:34,800
of the ten accuracies
that result from k fold cross validation.

316
00:16:35,266 --> 00:16:36,600
And so let's check it out.

317
00:16:36,600 --> 00:16:39,400
Let's have a look
at what these ten accuracies are.

318
00:16:39,400 --> 00:16:42,166
So we're going to look at it
from the console.

319
00:16:42,166 --> 00:16:45,166
So I'm typing cv here
and pressing enter.

320
00:16:45,333 --> 00:16:46,833
And here we go.

321
00:16:46,833 --> 00:16:48,333
That's the results.

322
00:16:48,333 --> 00:16:50,100
That's the ten accuracies.

323
00:16:50,100 --> 00:16:53,400
So for fold one we get 93% accuracy.

324
00:16:53,400 --> 00:16:54,366
That's very good.

325
00:16:54,366 --> 00:16:58,100
Fold two, 87%; fold three, 100%.

326
00:16:58,100 --> 00:17:00,000
So no incorrect prediction.

327
00:17:00,000 --> 00:17:01,700
Fold four, 86%.

328
00:17:01,700 --> 00:17:05,900
Fold five, 96%; 90% on fold six; 90%

329
00:17:05,900 --> 00:17:09,866
on fold seven; 93% on fold eight; 90% on fold nine;

330
00:17:10,133 --> 00:17:13,133
and eventually 83% on fold ten.

331
00:17:13,333 --> 00:17:17,000
So that clearly illustrates
what I told you about this variance

332
00:17:17,000 --> 00:17:21,233
problem that can occur when we rerun
the model several times, because indeed

333
00:17:21,433 --> 00:17:24,300
we get different accuracies and sometimes

334
00:17:24,300 --> 00:17:28,200
a large difference in accuracy
from one fold to another.

335
00:17:28,500 --> 00:17:30,000
So from here to here that's fine.

336
00:17:30,000 --> 00:17:31,200
But here, for example,

337
00:17:31,200 --> 00:17:36,233
from fold two to fold three,
we get a 13% difference in accuracy.

338
00:17:36,666 --> 00:17:39,500
So that's why it's not that relevant

339
00:17:39,500 --> 00:17:42,500
to compute the accuracy
on one single split.

340
00:17:42,533 --> 00:17:45,800
And it's much more relevant
to compute the accuracies on ten splits,

341
00:17:45,966 --> 00:17:47,600
because then we can take the mean.

342
00:17:47,600 --> 00:17:49,900
And that's exactly what we are going to do
right now.

343
00:17:49,900 --> 00:17:53,800
We are going to compute the mean of
the ten accuracies that we obtained here.

344
00:17:54,166 --> 00:17:58,200
So to get this mean it's
actually very simple.

345
00:17:58,500 --> 00:18:01,600
We're just going to use the mean function.

346
00:18:02,033 --> 00:18:06,100
So parentheses here
and inside this mean we of course input CV

347
00:18:06,266 --> 00:18:10,033
because CV is a list of our ten accuracies
that we obtain here.

348
00:18:10,400 --> 00:18:13,933
However just to make sure
we get the values of the accuracies

349
00:18:13,933 --> 00:18:18,966
of each of the ten folds,
we need to specify here as.numeric

350
00:18:19,500 --> 00:18:22,500
and in parenthesis
we include CV to make sure we take

351
00:18:22,500 --> 00:18:25,666
the mean of these values here
that are the accuracies.

352
00:18:26,233 --> 00:18:29,666
And let's put this average
of the accuracies

353
00:18:29,666 --> 00:18:33,166
into one variable
that will appear in the values here.

354
00:18:33,533 --> 00:18:36,533
And let's call
this variable simply accuracy.

355
00:18:36,966 --> 00:18:41,733
Because the mean of these accuracies
is just the ultimate relevant

356
00:18:41,733 --> 00:18:42,700
accuracy.

357
00:18:42,700 --> 00:18:47,666
So accuracy equals
mean of the accuracies in this CV list.
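
A small sketch of this last step. The ten values below are the rounded percentages read out in the video, typed in by hand as a stand-in for the real cv list, so the result is only approximate. The standard deviation line is an addition here, following the earlier remark about looking at the variance.

```r
# Stand-in for the cv list, using the rounded accuracies read out above
cv <- list(0.93, 0.87, 1.00, 0.86, 0.96, 0.90, 0.90, 0.93, 0.90, 0.83)

# as.numeric turns the list into a plain numeric vector before averaging
accuracy <- mean(as.numeric(cv))
print(accuracy)  # 0.908, that is about 91%

# Spread of the ten accuracies, to get an idea of the variance
spread <- sd(as.numeric(cv))
print(spread)
```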

358
00:18:48,133 --> 00:18:48,733
All right.

359
00:18:48,733 --> 00:18:51,733
So let's compute it and we'll get

360
00:18:52,033 --> 00:18:55,300
let's see an accuracy of 91%.

361
00:18:55,566 --> 00:18:57,833
And that's the relevant accuracy.

362
00:18:57,833 --> 00:18:59,400
We are looking for.

363
00:18:59,400 --> 00:19:02,400
So overall
we can say with more credibility

364
00:19:02,466 --> 00:19:06,833
that our model our kernel SVM
classifier performs pretty well.

365
00:19:08,066 --> 00:19:09,000
So that's pretty good.

366
00:19:09,000 --> 00:19:10,433
And now congratulations

367
00:19:10,433 --> 00:19:14,100
you have a much more advanced way
of evaluating your model performance

368
00:19:14,233 --> 00:19:17,900
in your data science toolkit.
And in the next tutorial,

369
00:19:17,900 --> 00:19:21,300
we'll see a very powerful technique
that will help us choose the optimal

370
00:19:21,300 --> 00:19:24,566
hyperparameters
of any machine learning model we built.

371
00:19:25,000 --> 00:19:27,466
So I look forward to doing that
in the next tutorial.

372
00:19:27,466 --> 00:19:30,466
And until then, enjoy machine learning.