1 00:00:00,266 --> 00:00:02,533 Hello and welcome to this R tutorial. 2 00:00:02,533 --> 00:00:03,066 And mostly 3 00:00:03,066 --> 00:00:07,566 welcome to the last part of this course, Part 10, Model Selection and Boosting. 4 00:00:08,166 --> 00:00:10,033 So in this part we will do two things. 5 00:00:10,033 --> 00:00:12,900 First, evaluating our model performance. 6 00:00:12,900 --> 00:00:15,900 And second, improving our model performance. 7 00:00:15,900 --> 00:00:19,000 And then there will be this bonus section about one of the most powerful 8 00:00:19,000 --> 00:00:22,333 algorithms in machine learning, which has become more and more popular. 9 00:00:22,500 --> 00:00:24,666 And that is called XGBoost. 10 00:00:24,666 --> 00:00:27,866 But first, we want to be able to improve the model performance 11 00:00:27,866 --> 00:00:30,866 of all the machine learning models we've built in this course, 12 00:00:31,066 --> 00:00:31,966 and improving 13 00:00:31,966 --> 00:00:35,400 the model performance can be done with a technique called model selection 14 00:00:35,633 --> 00:00:39,300 that consists of choosing the best parameters of your machine 15 00:00:39,300 --> 00:00:40,233 learning models. 16 00:00:40,233 --> 00:00:43,233 Because, you know, remember, every time we built a machine 17 00:00:43,233 --> 00:00:46,166 learning model, well, we had two types of parameters. 18 00:00:46,166 --> 00:00:49,900 The first type were the parameters that the model learned, that is, the parameters 19 00:00:49,900 --> 00:00:53,866 whose optimal values were found by running the model. 20 00:00:54,100 --> 00:00:58,333 And then the second type of parameters are the parameters that we chose ourselves. 21 00:00:58,600 --> 00:01:02,033 For example, the kernel parameter in the kernel SVM model. 22 00:01:02,533 --> 00:01:05,533 And these parameters are called the hyperparameters. 
23 00:01:05,600 --> 00:01:07,566 So there is still room to improve the model 24 00:01:07,566 --> 00:01:11,500 because we can still choose some optimal values for these parameters. 25 00:01:11,833 --> 00:01:15,400 But since these parameters are not the parameters learned by the model, 26 00:01:15,700 --> 00:01:18,733 then we need to figure out another way to choose 27 00:01:18,733 --> 00:01:21,900 the optimal values for these hyperparameters. 28 00:01:22,333 --> 00:01:25,333 And that's one of the powerful things we'll do in this Part 10. 29 00:01:25,500 --> 00:01:28,733 And that will be through a very efficient technique called grid search. 30 00:01:29,266 --> 00:01:30,966 But before we start grid search, 31 00:01:30,966 --> 00:01:35,066 we need to improve the way we evaluate our models, because so far, 32 00:01:35,166 --> 00:01:39,133 what we did is split our data set between a training set and a test set. 33 00:01:39,500 --> 00:01:42,066 And you know, we trained our model on the training set 34 00:01:42,066 --> 00:01:44,833 and we tested its performance on the test set. 35 00:01:44,833 --> 00:01:47,933 That's a correct way of evaluating the model performance. 36 00:01:48,133 --> 00:01:51,700 But that's not the best one, because we actually have the variance problem. 37 00:01:51,966 --> 00:01:56,066 The variance problem can be explained by the fact that when we get the accuracy 38 00:01:56,066 --> 00:01:59,866 on the test set, well, if we run the model again and test 39 00:01:59,866 --> 00:02:04,200 its performance again on another test set, well, we can get a very different accuracy. 40 00:02:04,633 --> 00:02:08,033 So judging our model performance only on one 41 00:02:08,033 --> 00:02:11,533 accuracy on one test set is actually not super relevant. 42 00:02:11,533 --> 00:02:15,133 That's not the most relevant way to evaluate the model performance. 
43 00:02:15,600 --> 00:02:18,833 And so there is this technique called k-fold cross-validation 44 00:02:19,133 --> 00:02:23,366 that improves this a lot, because that will fix this variance problem. 45 00:02:23,833 --> 00:02:25,200 And how will it fix it? 46 00:02:25,200 --> 00:02:29,666 It will fix it by splitting the training set into ten folds when k equals ten. 47 00:02:29,666 --> 00:02:31,800 And most of the time k equals ten. 48 00:02:31,800 --> 00:02:36,433 And we train our model on nine folds and we test it on the last remaining fold. 49 00:02:36,733 --> 00:02:40,300 And since with ten folds we can make ten different combinations 50 00:02:40,300 --> 00:02:43,333 of nine folds to train the model and one fold to test it, 51 00:02:43,633 --> 00:02:46,566 that means that we can train the model and test the model 52 00:02:46,566 --> 00:02:49,566 on ten combinations of training and test sets, 53 00:02:49,633 --> 00:02:52,833 and that will already give us a much better idea of the model performance, 54 00:02:52,833 --> 00:02:56,100 because what we can do afterwards is take an average 55 00:02:56,366 --> 00:02:59,100 of the different accuracies of the ten evaluations, 56 00:02:59,100 --> 00:03:02,433 and also compute the standard deviation to have a look at the variance. 57 00:03:02,733 --> 00:03:06,000 So eventually our analysis will be much more relevant. 58 00:03:06,300 --> 00:03:09,400 And besides, we'll know in which of these four categories we'll be. 59 00:03:09,700 --> 00:03:12,700 Because if we get a good accuracy and a small variance, 60 00:03:12,733 --> 00:03:16,100 we'll be on the lower left one. If we get a good accuracy 61 00:03:16,133 --> 00:03:19,133 and a high variance, we will be on the lower right one. 62 00:03:19,166 --> 00:03:21,600 If we get a small accuracy and a low variance, 63 00:03:21,600 --> 00:03:23,333 we will be on the upper left one. 
64 00:03:23,333 --> 00:03:25,100 And eventually, if we get a low accuracy 65 00:03:25,100 --> 00:03:27,966 and a high variance, we will be on the upper right one. 66 00:03:27,966 --> 00:03:31,133 So this k-fold cross-validation is very useful. 67 00:03:31,166 --> 00:03:34,566 And besides, our performance analysis is much more relevant. 68 00:03:35,066 --> 00:03:38,400 So let's start with this k-fold cross-validation, 69 00:03:38,400 --> 00:03:41,100 our first technique of model selection. 70 00:03:41,100 --> 00:03:44,933 So since we already built a lot of models, we're not going to build another one. 71 00:03:45,133 --> 00:03:46,500 We are going to use one of the models 72 00:03:46,500 --> 00:03:49,900 we built and apply k-fold cross-validation to it. 73 00:03:50,366 --> 00:03:53,666 And so the model we're going to use is this kernel SVM 74 00:03:53,766 --> 00:03:56,066 we made in Part 3, Classification, 75 00:03:56,066 --> 00:03:59,633 which, remember, we used to predict whether the customers are going to click 76 00:03:59,633 --> 00:04:03,466 on the ads on the social network to buy the SUV, yes or no. 77 00:04:03,833 --> 00:04:06,833 So the model is already built and we already have everything. 78 00:04:06,900 --> 00:04:09,900 So what we're going to do is take the whole model 79 00:04:10,033 --> 00:04:13,866 and we are going to add a new code 80 00:04:13,866 --> 00:04:16,866 section inside of it that is going to be, of course, 81 00:04:17,100 --> 00:04:20,333 the code section that will implement k-fold cross-validation. 82 00:04:21,000 --> 00:04:26,000 So before we start doing it, let's pick the right folder as working directory. 83 00:04:26,133 --> 00:04:28,233 So we go to Machine Learning A-Z. 84 00:04:28,233 --> 00:04:30,066 We are now in the last part of this course. 85 00:04:30,066 --> 00:04:31,800 Congratulations on reaching it. 86 00:04:31,800 --> 00:04:36,566 Part 10, Model Selection and Boosting, and Section 48, Model Selection. 
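The ten-fold scheme described earlier (nine folds to train, one fold to test, ten combinations in total) can be sketched with base R alone. This is only an illustration of the splitting idea, not the tutorial's code: the training-set size `n = 300` and the names `fold_id` and `splits` are assumptions made for the example.

```r
# Sketch of the ten-fold scheme using base R only.
# n = 300 is a hypothetical training-set size, not taken from the tutorial.
set.seed(123)
n = 300
k = 10

# Assign every row index to one of the k folds, in shuffled order.
fold_id = sample(rep(1:k, length.out = n))

# For each fold: that fold is the test part, the other nine are the training part.
splits = lapply(1:k, function(i) {
  list(train = which(fold_id != i), test = which(fold_id == i))
})

train_sizes = sapply(splits, function(s) length(s$train))
test_sizes = sapply(splits, function(s) length(s$test))
print(train_sizes)  # rows used for training in each combination
print(test_sizes)   # rows used for testing in each combination
```

Each row lands in exactly one test fold, so every observation of the training set is used once for testing and nine times for training.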
87 00:04:36,866 --> 00:04:37,300 All right. 88 00:04:37,300 --> 00:04:40,100 Make sure that you have the Social Network Ads CSV file. 89 00:04:40,100 --> 00:04:42,466 And if that's the case, you're ready to go. 90 00:04:42,466 --> 00:04:46,166 All right, so now where do we put the k-fold cross-validation code section? 91 00:04:46,500 --> 00:04:50,433 Well, since that consists of evaluating the model performance, 92 00:04:50,700 --> 00:04:52,566 the most relevant location to put 93 00:04:52,566 --> 00:04:56,133 it is right after we build our kernel SVM model. 94 00:04:56,366 --> 00:04:58,633 That is, right after we built the model. 95 00:04:58,633 --> 00:05:01,700 And actually in this code here we have the predictions 96 00:05:01,700 --> 00:05:04,700 of the test results and the confusion matrix. 97 00:05:04,900 --> 00:05:07,900 That is actually a first way of evaluating the model. 98 00:05:08,100 --> 00:05:10,666 But as I said at the beginning of this tutorial, 99 00:05:10,666 --> 00:05:14,333 this is a correct way of evaluating the model, but not the best one. 100 00:05:14,466 --> 00:05:15,700 And in today's tutorial, 101 00:05:15,700 --> 00:05:18,900 we are introducing a much better way to evaluate our model. 102 00:05:19,400 --> 00:05:22,400 And so let's put it right after this section, 103 00:05:22,566 --> 00:05:25,666 as a more advanced performance evaluation method. 104 00:05:26,033 --> 00:05:29,033 And so we're going to call this section Applying 105 00:05:29,433 --> 00:05:33,400 k-Fold Cross Validation. 106 00:05:34,466 --> 00:05:35,333 All right. 107 00:05:35,333 --> 00:05:38,733 So now the first thing that we have to do is to install the caret package, 108 00:05:38,733 --> 00:05:41,833 because this contains a very practical tool 109 00:05:42,000 --> 00:05:45,000 to create the ten folds of our training set. 
110 00:05:45,000 --> 00:05:49,033 So let's start with this install.packages 111 00:05:49,200 --> 00:05:52,333 and, in parentheses and in quotes, caret. 112 00:05:53,333 --> 00:05:55,700 All right. So mine is already installed. 113 00:05:55,700 --> 00:05:56,933 We can check it out. 114 00:05:56,933 --> 00:05:59,533 Check the same in your list of packages. 115 00:05:59,533 --> 00:06:01,200 Here it is. caret. 116 00:06:01,200 --> 00:06:04,200 So I will just put that in a comment. 117 00:06:04,233 --> 00:06:05,800 But don't forget to install it. 118 00:06:05,800 --> 00:06:08,800 And then let's not forget the library command 119 00:06:09,633 --> 00:06:13,466 to load the caret package automatically. 120 00:06:14,233 --> 00:06:14,700 All right. 121 00:06:14,700 --> 00:06:17,700 And now let's start coding k-fold cross-validation. 122 00:06:18,033 --> 00:06:22,700 So first we're going to create the ten folds that will divide our training set. 123 00:06:23,100 --> 00:06:25,033 And to do this, it's very simple. 124 00:06:25,033 --> 00:06:27,966 We're going to use the createFolds function from the caret 125 00:06:27,966 --> 00:06:30,966 package to create these ten folds very efficiently. 126 00:06:31,333 --> 00:06:32,133 So let's do it. 127 00:06:32,133 --> 00:06:33,666 We're going to call these folds. 128 00:06:33,666 --> 00:06:37,300 folds will actually be a list of ten different test folds 129 00:06:37,300 --> 00:06:39,000 composing our training set. 130 00:06:39,000 --> 00:06:43,100 So let's use this createFolds function, with a capital F. 131 00:06:43,100 --> 00:06:44,100 Here it is. 132 00:06:44,100 --> 00:06:48,133 And inside the parentheses we just need to specify the training set. 133 00:06:48,500 --> 00:06:50,833 So here I'm adding the training set. 134 00:06:50,833 --> 00:06:55,200 And then we take our dependent variable column by which we want to make the split. 
135 00:06:55,200 --> 00:06:57,433 You know, it's exactly like when we split the data 136 00:06:57,433 --> 00:06:59,500 set between the training set and the test set. 137 00:06:59,500 --> 00:07:03,533 We need to specify the dependent variable to make the split so that the training set 138 00:07:03,533 --> 00:07:07,266 and the test set are well distributed according to the dependent variable. 139 00:07:07,466 --> 00:07:08,700 Well, here that's the same. 140 00:07:08,700 --> 00:07:11,233 We're creating ten folds of the training set. 141 00:07:11,233 --> 00:07:14,166 And we are specifying the dependent variable to make sure the folds 142 00:07:14,166 --> 00:07:17,166 are well distributed according to the dependent variable. 143 00:07:17,500 --> 00:07:22,066 So that's why here we need to specify our dependent variable, which is Purchased. 144 00:07:24,066 --> 00:07:24,400 All right. 145 00:07:24,400 --> 00:07:27,400 So that's the first argument of the createFolds function. 146 00:07:27,600 --> 00:07:31,966 And of course, as you might have guessed, the second argument is the number of folds 147 00:07:32,200 --> 00:07:35,066 you want to divide your training set into. 148 00:07:35,066 --> 00:07:38,366 And really a good choice for the number of folds is ten, 149 00:07:38,600 --> 00:07:42,566 because by creating ten folds, we will eventually get ten accuracies. 150 00:07:42,866 --> 00:07:45,133 And ten accuracies is a relevant way 151 00:07:45,133 --> 00:07:49,066 to measure the accuracy through the mean of these ten accuracies. 152 00:07:49,433 --> 00:07:52,800 So we will take ten folds, and I recommend doing that in practice. 153 00:07:53,000 --> 00:07:56,700 So here we just add k equals 10. 154 00:07:57,400 --> 00:07:58,200 All right. 155 00:07:58,200 --> 00:08:01,133 Now we're going to implement k-fold cross-validation, 156 00:08:01,133 --> 00:08:04,300 because what we just did here is just to create the folds. 
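As a rough sketch of the fold-creation step just described, the call below passes the dependent variable column and `k = 10` to `createFolds`. It assumes the caret package is installed. The data frame is a made-up stand-in for the tutorial's training set: the feature names, the 300 rows and the class balance are all hypothetical; only the column name `Purchased` comes from the tutorial.

```r
# install.packages('caret')  # uncomment if caret is not installed yet
library(caret)

# Hypothetical stand-in for the tutorial's training set: two made-up
# features and a binary dependent variable named Purchased.
set.seed(123)
training_set = data.frame(
  Age = rnorm(300),
  EstimatedSalary = rnorm(300),
  Purchased = factor(rep(c(0, 1), times = c(190, 110)))
)

# Ten folds of row indices, stratified on the dependent variable.
folds = createFolds(training_set$Purchased, k = 10)

print(length(folds))          # number of folds
print(sapply(folds, length))  # number of rows in each test fold
```

Each element of `folds` is a vector of row indices, which is why it can later be used directly inside square brackets on the training set.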
157 00:08:04,533 --> 00:08:07,366 But now we need to implement the algorithm itself. 158 00:08:07,366 --> 00:08:11,066 And to do this, well, there are several ways of doing it. 159 00:08:11,133 --> 00:08:14,700 But we're going to use a very practical function in R 160 00:08:14,933 --> 00:08:17,100 which is called the lapply function. 161 00:08:17,100 --> 00:08:22,000 And that consists of applying a function to the different elements of a list. 162 00:08:22,600 --> 00:08:28,100 So this list is going to be our folds list that contains the ten test folds. 163 00:08:28,333 --> 00:08:31,333 And the function is the function that is going to compute 164 00:08:31,500 --> 00:08:34,500 the accuracy for each of these ten test folds. 165 00:08:34,766 --> 00:08:38,433 So let's start by creating a new variable that we're going to call cv. 166 00:08:38,966 --> 00:08:42,266 And then let's use here this lapply function. 167 00:08:42,833 --> 00:08:45,033 All right. And you're going to understand what's going to happen. 168 00:08:45,033 --> 00:08:48,466 So in this lapply function we need to input two arguments. 169 00:08:48,500 --> 00:08:52,466 The first argument is the list of the elements to which 170 00:08:52,466 --> 00:08:55,466 we are going to apply the function, which is the next argument. 171 00:08:55,800 --> 00:08:58,833 And so, as I just said, this list is folds, 172 00:08:59,300 --> 00:09:02,033 the list of our ten test folds. 173 00:09:02,033 --> 00:09:05,033 And then the next argument is the function. 174 00:09:05,100 --> 00:09:08,233 So a function in R can be written this way. 175 00:09:08,700 --> 00:09:10,033 function. 176 00:09:10,033 --> 00:09:15,666 Then in parentheses we need to input the argument, which we will call x. 177 00:09:15,666 --> 00:09:18,700 This is a local argument so far, but x 178 00:09:18,700 --> 00:09:21,700 is actually going to be each one of the ten test folds. 179 00:09:21,733 --> 00:09:25,600 So x here and then a pair of curly brackets. 
180 00:09:25,800 --> 00:09:26,700 Here we go. 181 00:09:26,700 --> 00:09:30,500 And inside these brackets we are going to implement this function 182 00:09:30,500 --> 00:09:34,700 that will compute the accuracy of the model on each of these ten test folds. 183 00:09:35,000 --> 00:09:38,933 So basically in this function we are going to implement k-fold cross-validation. 184 00:09:39,533 --> 00:09:42,566 So what do we need to implement k-fold cross-validation? 185 00:09:42,600 --> 00:09:45,400 Well, first we need the training fold. 186 00:09:45,400 --> 00:09:49,666 The training fold is the whole training set from which we withdraw the test fold. 187 00:09:50,033 --> 00:09:52,700 So basically, training fold here. 188 00:09:53,666 --> 00:09:55,166 I'm creating a new local 189 00:09:55,166 --> 00:09:58,800 variable, actually, that I'm calling training_fold. 190 00:09:59,100 --> 00:10:02,100 And so, as I just said, this is the whole training set. 191 00:10:02,333 --> 00:10:03,400 Here we go. 192 00:10:03,400 --> 00:10:08,766 But from which we withdraw the test fold, that is, minus x. 193 00:10:08,966 --> 00:10:13,233 Because, you know, x is actually each element of this folds list here. 194 00:10:13,500 --> 00:10:16,833 So by putting minus x here we are taking the whole training set 195 00:10:17,100 --> 00:10:18,500 but without the test fold. 196 00:10:18,500 --> 00:10:20,766 And therefore that's actually the training fold. 197 00:10:20,766 --> 00:10:23,766 And then a comma, to take all the columns. 198 00:10:24,300 --> 00:10:24,833 All right. 199 00:10:24,833 --> 00:10:26,700 So we got our training fold. 200 00:10:26,700 --> 00:10:28,866 Now let's get our test fold. 201 00:10:28,866 --> 00:10:31,866 So our test fold, try to guess what it's going to be. 202 00:10:32,200 --> 00:10:35,200 test_fold equals training_set. 203 00:10:35,700 --> 00:10:38,033 And inside the square brackets, 204 00:10:38,033 --> 00:10:39,666 well, what do we need to put here? 
205 00:10:39,666 --> 00:10:43,400 Well, that's actually x, because, you know, x represents 206 00:10:43,533 --> 00:10:46,566 all the observations for each one of the ten test folds. 207 00:10:47,000 --> 00:10:50,000 So we got our test fold. 208 00:10:50,100 --> 00:10:51,633 And then what do we need to do now? 209 00:10:51,633 --> 00:10:54,766 Now what we need to do is train 210 00:10:54,966 --> 00:10:57,966 our kernel SVM model on the training fold. 211 00:10:58,166 --> 00:11:01,866 And then we will test its performance on the test fold. 212 00:11:02,100 --> 00:11:04,133 So basically what do we need to do now? 213 00:11:04,133 --> 00:11:08,900 We need to add our model, which is our kernel SVM classifier. 214 00:11:09,466 --> 00:11:13,866 So what we can do now is just take this code section here, 215 00:11:13,866 --> 00:11:16,300 because that's where we build the model. 216 00:11:16,300 --> 00:11:18,666 And we need to include this model in the function. 217 00:11:18,666 --> 00:11:20,166 That's why we're taking it. 218 00:11:20,166 --> 00:11:24,766 So copy, and let's add it here, paste. 219 00:11:25,300 --> 00:11:26,033 And here we go. 220 00:11:26,033 --> 00:11:29,366 We have our model, but we're not training 221 00:11:29,633 --> 00:11:32,633 this kernel SVM classifier on the training set. 222 00:11:33,000 --> 00:11:36,000 We're training it on the training fold, 223 00:11:36,400 --> 00:11:39,833 because that's the principle of k-fold cross-validation. 224 00:11:40,033 --> 00:11:43,933 We are training a classifier on each one of the ten training folds. 225 00:11:44,333 --> 00:11:47,333 So that's why here we're taking the training fold 226 00:11:47,500 --> 00:11:52,600 that we created here inside this function that we're making right now. 227 00:11:53,100 --> 00:11:55,400 And then we keep the same arguments. 228 00:11:55,400 --> 00:11:56,166 All right. 229 00:11:56,166 --> 00:11:57,533 Then what do we need to do? 
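The negative and positive indexing that separates a training fold from a test fold can be tried on a toy example. This is a minimal sketch using base R only: the 20-row data frame and the hand-made `folds` list are invented for illustration and simply stand in for the real training set and for the list returned by `createFolds`. The model-training step is left out here on purpose.

```r
# Toy stand-in for the training set (20 rows); the values are made up.
training_set = data.frame(x1 = 1:20, x2 = 20:1, y = rep(c(0, 1), 10))

# Hand-made list of ten test folds (two row indices each), standing in
# for the list that createFolds would return.
folds = split(1:20, rep(1:10, each = 2))

sizes = lapply(folds, function(x) {
  training_fold = training_set[-x, ]  # every row except the test fold
  test_fold = training_set[x, ]       # only the rows of the test fold
  c(train = nrow(training_fold), test = nrow(test_fold))
})
print(sizes[[1]])
```

For every fold, the two pieces add up to the full training set, and lapply runs the same function once per element of `folds`.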
230 00:11:57,533 --> 00:12:00,500 Well, that's exactly the same as what we do 231 00:12:00,500 --> 00:12:03,600 when we make a model, that is, predicting the test results. 232 00:12:03,600 --> 00:12:07,300 That's the next step, because it's from these test 233 00:12:07,300 --> 00:12:10,700 results that we will then compute the confusion matrix 234 00:12:10,700 --> 00:12:14,233 and therefore the accuracy, which is exactly what we need, 235 00:12:14,700 --> 00:12:17,666 that is, exactly what will be returned by the function 236 00:12:17,666 --> 00:12:20,800 we are making right now to implement k-fold cross-validation. 237 00:12:21,200 --> 00:12:25,633 So, same thing, let's copy this line to predict the test results. 238 00:12:26,100 --> 00:12:29,000 And let's paste it here. 239 00:12:29,000 --> 00:12:30,466 And is that all? 240 00:12:30,466 --> 00:12:34,700 Of course not, because we are not testing our classifier on the test set. 241 00:12:35,066 --> 00:12:38,266 We are testing it on the test fold. 242 00:12:38,566 --> 00:12:38,866 Right. 243 00:12:38,866 --> 00:12:42,000 Because, you know, we are training a model on the training fold 244 00:12:42,000 --> 00:12:44,700 and testing its performance on the test fold. 245 00:12:44,700 --> 00:12:46,200 So now that's good. 246 00:12:46,200 --> 00:12:48,600 And now let's move on to the next step, 247 00:12:48,600 --> 00:12:52,133 which is to compute the confusion matrix. 248 00:12:52,300 --> 00:12:55,266 So still, let's take this line here 249 00:12:56,233 --> 00:12:59,900 and let's paste it below, right here, paste. 250 00:13:00,300 --> 00:13:03,300 And now of course we need to change test_set 251 00:13:03,700 --> 00:13:06,600 and replace it by test_fold. 252 00:13:06,600 --> 00:13:07,000 All right. 253 00:13:07,000 --> 00:13:10,633 And this will give us the confusion matrix 254 00:13:10,633 --> 00:13:14,066 of this kernel SVM classifier. 
255 00:13:14,433 --> 00:13:19,466 And that classifier is trained on the training fold and tested on the test fold. 256 00:13:19,833 --> 00:13:21,300 And therefore this line of code 257 00:13:21,300 --> 00:13:24,900 will give you the confusion matrix for the observations of the test fold. 258 00:13:25,733 --> 00:13:26,433 All right. 259 00:13:26,433 --> 00:13:30,733 And now, last step, we need to compute the accuracy, because we 260 00:13:30,733 --> 00:13:35,633 are doing all this to get the accuracies for all the ten test folds here. 261 00:13:36,066 --> 00:13:37,733 So let's compute the accuracy. 262 00:13:37,733 --> 00:13:40,233 The accuracy is, 263 00:13:40,233 --> 00:13:42,833 we've calculated this accuracy many times. 264 00:13:42,833 --> 00:13:45,600 We take the number of correct predictions, 265 00:13:45,600 --> 00:13:51,600 which is cm[1,1], because this corresponds to the number 266 00:13:51,600 --> 00:13:57,900 of correct predictions of the first class, plus cm[2,2], 267 00:13:58,266 --> 00:14:01,200 because this corresponds to the number of correct predictions 268 00:14:01,200 --> 00:14:04,200 of the second class, and since we have two classes, 269 00:14:04,333 --> 00:14:08,533 this sum corresponds to the total number of correct predictions, 270 00:14:09,033 --> 00:14:12,200 and then we divide it by the 271 00:14:12,200 --> 00:14:16,166 total number of observations in the test fold, and therefore 272 00:14:16,533 --> 00:14:20,333 that's the number of correct predictions, which is this sum 273 00:14:20,333 --> 00:14:20,666 here, 274 00:14:22,200 --> 00:14:23,033 to which 275 00:14:23,033 --> 00:14:26,400 we also need to add the number of incorrect predictions. 
276 00:14:26,800 --> 00:14:29,800 And therefore, you know, we can copy this 277 00:14:29,866 --> 00:14:33,200 and take the first number of incorrect predictions, 278 00:14:33,200 --> 00:14:36,566 which corresponds to the first class, and the second 279 00:14:36,566 --> 00:14:39,800 number of incorrect predictions, which corresponds to the second class. 280 00:14:40,033 --> 00:14:44,400 And so here we are actually taking all the elements of this confusion matrix, 281 00:14:44,766 --> 00:14:48,733 that is, the number of correct predictions plus the number of incorrect predictions. 282 00:14:49,266 --> 00:14:54,433 And so now with this line of code we get the accuracy for one fold. 283 00:14:54,766 --> 00:14:59,200 But since we're using this lapply function, it will do all this, 284 00:14:59,200 --> 00:15:02,200 compute the accuracy, for each of the ten test folds. 285 00:15:02,400 --> 00:15:04,566 And therefore we will get ten accuracies. 286 00:15:04,566 --> 00:15:08,066 And we will compute their mean, which will give us a much more relevant 287 00:15:08,066 --> 00:15:10,266 accuracy than the single one 288 00:15:10,266 --> 00:15:13,066 we obtained earlier with our previous method 289 00:15:13,066 --> 00:15:15,566 of evaluating the model performance. 290 00:15:15,566 --> 00:15:16,100 All right. 291 00:15:16,100 --> 00:15:20,533 So now we have everything, but we just need to specify that 292 00:15:20,533 --> 00:15:24,466 we want to have this accuracy returned, because this is a function. 293 00:15:24,466 --> 00:15:27,466 So we need to specify what we want this function to return. 294 00:15:27,600 --> 00:15:32,833 And to do this we just add return, parentheses, and accuracy. 295 00:15:33,000 --> 00:15:37,733 And now everything is ready. k-fold cross-validation is well implemented. 296 00:15:38,400 --> 00:15:38,766 All right. 
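The accuracy formula read out above (correct predictions on the diagonal, divided by all four cells of the confusion matrix) can be checked on a small made-up example. This is a sketch in base R: the `actual` and `predicted` vectors are invented, and `fold_accuracy` is a hypothetical helper name used only for this illustration.

```r
# Hypothetical helper: accuracy from a two-class confusion matrix,
# written the same way as the formula described in the tutorial.
fold_accuracy = function(actual, predicted) {
  cm = table(actual, predicted)
  accuracy = (cm[1,1] + cm[2,2]) / (cm[1,1] + cm[2,2] + cm[1,2] + cm[2,1])
  return(accuracy)
}

# Made-up labels for one test fold of eight observations.
# Fixing the factor levels guarantees a 2 x 2 table even if a class is missing.
actual = factor(c(0, 0, 0, 1, 1, 1, 0, 1), levels = c(0, 1))
predicted = factor(c(0, 0, 1, 1, 1, 0, 0, 1), levels = c(0, 1))

accuracy = fold_accuracy(actual, predicted)
print(accuracy)  # 6 correct predictions out of 8
```

Because the denominator adds up every cell of the matrix, it is the same as the number of observations in the test fold.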
297 00:15:38,766 --> 00:15:42,566 So now we are ready to get the ten accuracies that will result 298 00:15:42,566 --> 00:15:45,066 from this ten-fold cross-validation technique. 299 00:15:45,066 --> 00:15:49,700 So we are going to select everything from here up to the top, 300 00:15:49,833 --> 00:15:52,833 because we haven't imported the dataset yet. 301 00:15:52,966 --> 00:15:56,900 So let's press Command or Control plus Enter to execute the whole thing. 302 00:15:57,233 --> 00:15:58,033 Here we go. 303 00:15:58,033 --> 00:16:02,533 Everything was correctly executed in less than one second, so that's perfect. 304 00:16:02,866 --> 00:16:05,066 Let's have a look at the results. 305 00:16:05,066 --> 00:16:07,733 So first let's put that down. 306 00:16:07,733 --> 00:16:08,700 All right. 307 00:16:08,700 --> 00:16:10,200 So here we get all the results. 308 00:16:10,200 --> 00:16:12,166 First, the data set was well imported. 309 00:16:12,166 --> 00:16:14,033 We split it into the training set 310 00:16:14,033 --> 00:16:17,033 and the test set at this section here. 311 00:16:17,300 --> 00:16:21,233 And then we built our classifier, which is our kernel SVM classifier. 312 00:16:21,400 --> 00:16:24,600 And of course we get our cv list 313 00:16:24,733 --> 00:16:27,766 that we built through k-fold cross-validation. 314 00:16:28,200 --> 00:16:30,833 That is this cv list, which is the list 315 00:16:30,833 --> 00:16:34,800 of the ten accuracies that result from k-fold cross-validation. 316 00:16:35,266 --> 00:16:36,600 And so let's check it out. 317 00:16:36,600 --> 00:16:39,400 Let's have a look at what these ten accuracies are. 318 00:16:39,400 --> 00:16:42,166 So we're going to look at it from the console. 319 00:16:42,166 --> 00:16:45,166 So I'm typing cv here and pressing Enter. 320 00:16:45,333 --> 00:16:46,833 And here we go. 321 00:16:46,833 --> 00:16:48,333 That's the results. 322 00:16:48,333 --> 00:16:50,100 Those are the ten accuracies. 
323 00:16:50,100 --> 00:16:53,400 So for fold one we get 93% accuracy. 324 00:16:53,400 --> 00:16:54,366 That's very good. 325 00:16:54,366 --> 00:16:58,100 Fold two, 87%. Fold three, 100%. 326 00:16:58,100 --> 00:17:00,000 So no incorrect prediction. 327 00:17:00,000 --> 00:17:01,700 Fold four, 86%. 328 00:17:01,700 --> 00:17:05,900 Fold five, 96%. 90% on fold six, 90% 329 00:17:05,900 --> 00:17:09,866 on fold seven, 93% on fold eight, 90% on fold nine, 330 00:17:10,133 --> 00:17:13,133 and eventually 83% on fold ten. 331 00:17:13,333 --> 00:17:17,000 So that clearly illustrates what I told you about this variance 332 00:17:17,000 --> 00:17:21,233 problem that can occur when we rerun the model several times, because indeed 333 00:17:21,433 --> 00:17:24,300 we get different accuracies and sometimes 334 00:17:24,300 --> 00:17:28,200 a large difference in accuracy from one fold to another. 335 00:17:28,500 --> 00:17:30,000 So from here to here that's fine. 336 00:17:30,000 --> 00:17:31,200 But here, for example, 337 00:17:31,200 --> 00:17:36,233 from fold two to fold three, we get a 13% difference in accuracy. 338 00:17:36,666 --> 00:17:39,500 So that's why it's not that relevant 339 00:17:39,500 --> 00:17:42,500 to compute the accuracy on one single split. 340 00:17:42,533 --> 00:17:45,800 And it's much more relevant to compute the accuracies on ten splits, 341 00:17:45,966 --> 00:17:47,600 because then we can take the mean. 342 00:17:47,600 --> 00:17:49,900 And that's exactly what we are going to do right now. 343 00:17:49,900 --> 00:17:53,800 We are going to compute the mean of the ten accuracies that we obtained here. 344 00:17:54,166 --> 00:17:58,200 So to get this mean, it's actually very simple. 345 00:17:58,500 --> 00:18:01,600 We're just going to use the mean function. 346 00:18:02,033 --> 00:18:06,100 So parentheses here, and inside this mean we of course input cv, 347 00:18:06,266 --> 00:18:10,033 because cv is the list of our ten accuracies that we obtained here. 
348 00:18:10,400 --> 00:18:13,933 However, just to make sure we get the values of the accuracies 349 00:18:13,933 --> 00:18:18,966 of each of the ten folds, we need to specify here as.numeric 350 00:18:19,500 --> 00:18:22,500 and in parentheses we include cv to make sure we take 351 00:18:22,500 --> 00:18:25,666 the mean of these values here that are the accuracies. 352 00:18:26,233 --> 00:18:29,666 And let's put this average of the accuracies 353 00:18:29,666 --> 00:18:33,166 into one variable that will appear in the values here. 354 00:18:33,533 --> 00:18:36,533 And let's call this variable simply accuracy, 355 00:18:36,966 --> 00:18:41,733 because the mean of these accuracies is just the ultimate relevant 356 00:18:41,733 --> 00:18:42,700 accuracy. 357 00:18:42,700 --> 00:18:47,666 So accuracy equals the mean of the accuracies in this cv list. 358 00:18:48,133 --> 00:18:48,733 All right. 359 00:18:48,733 --> 00:18:51,733 So let's compute it and we'll get, 360 00:18:52,033 --> 00:18:55,300 let's see, an accuracy of 91%. 361 00:18:55,566 --> 00:18:57,833 And that's the relevant accuracy 362 00:18:57,833 --> 00:18:59,400 we are looking for. 363 00:18:59,400 --> 00:19:02,400 So overall we can say with more credibility 364 00:19:02,466 --> 00:19:06,833 that our model, our kernel SVM classifier, performs pretty well. 365 00:19:08,066 --> 00:19:09,000 So that's pretty good. 366 00:19:09,000 --> 00:19:10,433 And now congratulations, 367 00:19:10,433 --> 00:19:14,100 you have a much more advanced way of evaluating your model performance 368 00:19:14,233 --> 00:19:17,900 in your data science toolkit. In the next tutorial, 369 00:19:17,900 --> 00:19:21,300 we'll see a very powerful technique that will help us choose the optimal 370 00:19:21,300 --> 00:19:24,566 hyperparameters of any machine learning model we build. 371 00:19:25,000 --> 00:19:27,466 So I look forward to doing that in the next tutorial. 
372 00:19:27,466 --> 00:19:30,466 And until then, enjoy machine learning.
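The final averaging step can be reproduced from the fold-by-fold readout. This is a minimal sketch under two assumptions: the ten values below are the rounded percentages as best they can be read from the spoken results (folds five to nine in particular are a best-effort reading), and the variable `spread` for the standard deviation is a name added for illustration.

```r
# The ten fold accuracies as read out in the tutorial (rounded percentages;
# a best-effort reading, not the exact values held in the real cv list).
cv = list(0.93, 0.87, 1.00, 0.86, 0.96, 0.90, 0.90, 0.93, 0.90, 0.83)

# lapply returns a list, and mean() on a list gives NA with a warning,
# so the list is converted to a numeric vector first.
accuracy = mean(as.numeric(cv))

# Standard deviation of the ten accuracies, to look at the variance.
spread = sd(as.numeric(cv))

print(accuracy)
print(spread)
```

With these rounded values the mean comes out at about 0.908, which is consistent with the 91% quoted in the tutorial.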