Hello and welcome to this R tutorial. In the previous tutorial we learned a very efficient way to evaluate our model performance. In today's tutorial we are going to learn a technique for improving model performance. We have already built very powerful models, but we can still improve them. How? By finding the optimal values of the hyperparameters. Any machine learning model has two types of parameters. The first type is learned by the machine learning algorithm. The second type is chosen by us: for example, the kernel in the SVM model, the penalty parameter, or the regularization parameter in ridge regression or lasso. Basically, every machine learning model has a lot of parameters that are not learned. So far in this course, we have simply chosen one single value for each of these parameters. We never actually experimented with these hyperparameters, and we never tried several values for them.
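To make the distinction concrete, here is a minimal base-R sketch on hypothetical toy data (not the course dataset), using ridge regression as the example model. The coefficients are learned from the data, while the penalty lambda is a value we choose. The ridge_fit helper is our own illustration, not a function from any package.

```r
# Hypothetical toy data: 50 observations, two predictors
set.seed(42)
n <- 50
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
y <- 3 * X[, "x1"] - 2 * X[, "x2"] + rnorm(n)

# Closed-form ridge regression on centred data:
# beta = (X'X + lambda * I)^(-1) X'y
# beta is LEARNED from the data; lambda is a hyperparameter we CHOOSE.
ridge_fit <- function(X, y, lambda) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  yc <- y - mean(y)
  beta <- solve(crossprod(Xc) + lambda * diag(ncol(Xc)), crossprod(Xc, yc))
  drop(beta)
}

# Same data, three hand-picked values of lambda:
# the learned coefficients shrink toward zero as lambda grows
for (lambda in c(0, 10, 100)) {
  cat("lambda =", lambda, " beta =", round(ridge_fit(X, y, lambda), 3), "\n")
}
```

With lambda = 0 this reduces to ordinary least squares, so each choice of the hyperparameter gives a different set of learned coefficients from the very same data.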
Never trying other values leaves us a lot of room for improvement, because there may have been better choices for these hyperparameters than the values we picked. This technique, called grid search, answers one of the questions you have asked many times in this course: how do I know which parameter values to select when I build a machine learning model? What are the optimal values of these hyperparameters? Grid search gives an answer to that, because it finds the best values for these parameters among the candidate values it tries. So let's see how it does that, and let's implement grid search. We're going to work on the same problem as in the previous tutorial, that is, the classification problem where we classify the users of the social network and predict whether they are going to click yes or no on the ad to buy the SUV. Therefore we are working with the same dataset, Social Network Ads. It is also a good example because the kernel SVM model has several hyperparameters, such as the penalty parameter and the gamma parameter. Some of you asked me which values we should choose for these parameters, and grid search will tell us exactly that. So let's do it.
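The grid search idea itself is simple to sketch. The base-R example below, again on hypothetical data with ridge regression standing in for the SVM, tries every value in a small candidate grid, scores each one with 5-fold cross-validation on the same folds, and keeps the value with the lowest error. The helper functions and grid values are illustrative choices of ours, not the course's code; in this tutorial, caret automates this kind of loop.

```r
set.seed(1)
n <- 100
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- drop(X %*% c(2, 0, -1)) + rnorm(n, sd = 2)

# Ridge fit with an unpenalised intercept (hypothetical helper)
ridge_fit <- function(X, y, lambda) {
  mx <- colMeans(X)
  my <- mean(y)
  Xc <- sweep(X, 2, mx)
  beta <- drop(solve(crossprod(Xc) + lambda * diag(ncol(X)), crossprod(Xc, y - my)))
  list(beta = beta, intercept = my - sum(mx * beta))
}
ridge_predict <- function(fit, X) drop(X %*% fit$beta) + fit$intercept

# Cross-validated mean squared error for ONE candidate lambda
cv_mse <- function(X, y, lambda, folds) {
  errs <- sapply(sort(unique(folds)), function(f) {
    test <- folds == f
    fit <- ridge_fit(X[!test, , drop = FALSE], y[!test], lambda)
    mean((y[test] - ridge_predict(fit, X[test, , drop = FALSE]))^2)
  })
  mean(errs)
}

# Grid search: same folds for every candidate, keep the best-scoring one
folds <- sample(rep(1:5, length.out = n))
grid <- c(0, 0.1, 1, 10, 100)
scores <- sapply(grid, function(lambda) cv_mse(X, y, lambda, folds))
best_lambda <- grid[which.min(scores)]
print(data.frame(lambda = grid, cv_mse = round(scores, 3)))
cat("best lambda:", best_lambda, "\n")
```

Reusing the same folds for every candidate keeps the comparison fair: differences in score come from lambda, not from a different random split.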
We are working in the same folder as in the previous tutorial, that is, the Model Selection folder. Make sure it contains the Social Network Ads CSV file; if it does, you're ready to go. We can apply grid search anywhere after the data preprocessing phase, because we are not going to take the kernel SVM model we built earlier and then tune it to find the optimal values. Instead, we are going to use a very practical package that builds the same kernel SVM model as this one and applies grid search to it at the same time. You already know this package: it's the caret package. When it comes to machine learning in R, caret is one of the most practical packages, because with it you can build a very wide range of machine learning models. And not only that: when you build a machine learning model with caret, it tunes the hyperparameters and gives you the model with the best values it found. That's pretty powerful, and it's why caret is one of the most powerful packages in R.
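Since grid search comes after the data preprocessing phase, here is, for reference, a minimal base-R sketch of such a phase. It assumes the dataset has Age, EstimatedSalary and Purchased columns and uses a 75/25 split. The preprocess function is our own illustration, so its details (a plain random rather than stratified split, and scaling the test set with the training set's means and standard deviations) may differ from the course's template. A synthetic data frame replaces the CSV file so that the example runs on its own.

```r
# Hypothetical preprocessing helper; with the real file you would start from
# dataset <- read.csv("Social_Network_Ads.csv")   # file name assumed
preprocess <- function(dataset, train_frac = 0.75, seed = 123) {
  features <- c("Age", "EstimatedSalary")
  # Keep the two features and the response; encode the response as a factor
  dataset <- dataset[, c(features, "Purchased")]
  dataset$Purchased <- factor(dataset$Purchased, levels = c(0, 1))

  # Random (not stratified) train/test split
  set.seed(seed)
  idx <- sample(nrow(dataset), size = round(train_frac * nrow(dataset)))
  training_set <- dataset[idx, ]
  test_set <- dataset[-idx, ]

  # Feature scaling with the training set's means and standard deviations
  centers <- colMeans(training_set[features])
  scales <- apply(training_set[features], 2, sd)
  training_set[features] <- scale(training_set[features], centers, scales)
  test_set[features] <- scale(test_set[features], centers, scales)

  list(training_set = training_set, test_set = test_set)
}

# Synthetic stand-in for the CSV file (values are random, not real data)
set.seed(1)
fake <- data.frame(User.ID = 1:400,
                   Age = sample(18:60, 400, replace = TRUE),
                   EstimatedSalary = sample(15000:150000, 400, replace = TRUE),
                   Purchased = rbinom(400, 1, 0.35))
sets <- preprocess(fake)
str(sets$training_set)
```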
As I told you, we can build this new tuned model anywhere after the data preprocessing phase. So let's put it right after the k-fold cross-validation section, to keep the other interesting code sections here. That way you can compare the two models: the one we made here and the new one we are about to make with the caret package. Feel free to compare their performance results. So we're going to build this new kernel SVM model right here, in a code section I prepared called "Applying Grid Search to find the best parameters". Note that this section does not only apply grid search; it also builds a new model. The first thing we have to do is import the caret package. We already installed it in a previous tutorial, but in case you didn't follow that one, I'm putting back the install.packages command with caret in quotes inside, and then I comment it out. Here we go. Now let's import the package with the library command: library, with caret inside. Perfect. Now let's build the same kernel SVM model, but with the caret package, and you'll see how it goes.
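In code, this step is only a few lines. The sketch below is a slight variant of the commented-out install line: it installs caret only when it is missing. It also checks for kernlab, which we assume you will need, because caret's svmRadial method relies on kernlab for the actual SVM fit.

```r
# Install caret (and kernlab, used by caret's "svmRadial" method) only if missing
for (pkg in c("caret", "kernlab")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}

# Attach caret so that train() and its helper functions are available
library(caret)
```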
Okay. Since we're building a new classifier, the kernel SVM classifier, let's call our model by the usual name, classifier. So that's the variable, then equals, and this is where we use the caret package. Which function are we going to use? One of the most used functions of the caret package: the train function. So train, then parentheses. Now it's really worth opening a browser. Here it is. Let's type caret, press Enter, and take the GitHub link. Let's click on it. Inside, click on this link right here, and then click on section 6, Available Models, because this section lists all the models you can build with the caret package. There are a lot of models there; actually, all the models we built in this course are available. Right now some of you might be thinking: then why didn't we build all these models with the caret package, since caret gives us these models with the optimal parameters?
Well, that's because the packages we used to build the models in this course have great options, some of which you cannot use through the caret package. So it's good to know how to use both the packages we used before and the caret package, but for parameter tuning you should definitely use caret. All right. This list is the list of all the models you can build with the caret package, and since we are building a kernel SVM model, you will see that it is part of this list. We can find it right here, toward the bottom, where there are several support vector machine models. The one we are interested in is Support Vector Machines with Radial Basis Function Kernel. Remember, the radial basis function kernel is the Gaussian kernel, the most commonly used kernel for the kernel SVM model. So this is the model we are going to build now with the caret package. The piece of information we need from this table, which is going to be one input of the train function we are about to use, is this value here, svmRadial: it is the value we will give to one of the arguments of the train function.
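If you prefer to stay inside R rather than browsing the web page, caret can report this information itself. Here is a small sketch using caret's modelLookup() and getModelInfo() functions:

```r
library(caret)

# Tuning parameters that train() exposes for the radial-kernel SVM
# (one row per hyperparameter: sigma and C)
print(modelLookup("svmRadial"))

# All method names accepted by train(method = ...)
all_methods <- names(getModelInfo())
cat("Number of available methods:", length(all_methods), "\n")
cat("svmRadial available:", "svmRadial" %in% all_methods, "\n")
```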
The train argument that receives this model name is method: it is through this argument that the train function knows which model to build and which model to tune. So what we have to do now is copy this name, and we will paste it as the input of the method argument inside the train function. Let's go back to RStudio. Here we go. Now let's build the model. We can press F1 here to open the help page with information about the arguments. Let's see what we have to input. The first argument we need to input is form, and of course that's the formula. So let's input this first argument: form equals, and then the formula, exactly as we did when building the previous models. First comes the dependent variable, which, as a reminder, is Purchased; this is the social network ads business problem. So here we input Purchased. Here we go. Then a tilde, and then all the independent variables. Remember that we don't have to type the names of all the independent variables: we can use this shortcut here, the dot. Then a comma, and on to the next argument.
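To see what the dot shortcut expands to, you can ask R's terms() function. The tiny data frame below is hypothetical: the column names are those we assume for the course dataset (Age, EstimatedSalary and Purchased), and the values are made up.

```r
# Hypothetical miniature training set (values invented for illustration)
training_set <- data.frame(
  Age = c(19, 35, 26, 27, 19),
  EstimatedSalary = c(19000, 20000, 43000, 57000, 76000),
  Purchased = factor(c(0, 0, 0, 0, 1))
)

# "Purchased ~ ." means: Purchased explained by every other column of the data
f <- Purchased ~ .
predictors <- attr(terms(f, data = training_set), "term.labels")
print(predictors)   # the two predictor names, Age and EstimatedSalary
```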
The next argument is data, and that's of course your training set, since you're building your classifier on the training set. So here you input data equals training_set. Here we go. Now, with form, the formula, and data, the training set, you have all the information you need to train your model. But of course we also need to specify which model to build, that is, that we want the kernel SVM model. That's the job of the third argument, method, which is not among the arguments listed here; those are optional. Actually, the help page also gives you the link we just browsed, the one listing all the models available in caret. And the value we copied from that list for the method argument is svmRadial. So let's input it now: a comma, then method equals, then quotes, and inside the quotes we paste svmRadial. And that's actually all: with these three arguments, the formula, the data and the method, you build a kernel SVM model. As for the parameter tuning, you're going to see what happens.
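Put together, the call looks like the sketch below. Because the course's preprocessed training set is not reproduced here, we generate a hypothetical synthetic stand-in with two scaled features and a factor response; with the real data you would simply pass your own training_set. Purchased must be a factor for train to treat the problem as classification.

```r
library(caret)   # train(); the "svmRadial" method also needs kernlab installed

# Hypothetical synthetic stand-in for the preprocessed training set:
# two already-scaled features and a factor response (levels "0" and "1")
set.seed(123)
n <- 300
training_set <- data.frame(Age = rnorm(n), EstimatedSalary = rnorm(n))
training_set$Purchased <- factor(ifelse(
  training_set$Age + training_set$EstimatedSalary^2 + rnorm(n, sd = 0.5) > 1, 1, 0))

# The three arguments discussed above: formula, data and method
classifier <- train(form = Purchased ~ .,
                    data = training_set,
                    method = "svmRadial")

# Printing the object shows the resampling scheme, the candidates tried
# and the hyperparameter values that were finally selected
print(classifier)
```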
First, let's execute the data preprocessing phase, because we need to import the dataset and apply the data preprocessing. We're not going to execute the sections in between, because they build the kernel SVM model the other way. So we just select everything from here up to the top. Here we go, let's run it. All right: the dataset is imported, we have the dataset, the training set and the test set, and the data preprocessing phase was applied correctly. Now let's build the kernel SVM model with the caret package. Let's make sure we import the caret package. Here we go. Now let's select this line and execute it. All right, it takes a little time, around one second, which is fine. And here we get our classifier, the kernel SVM classifier. It may look like an ordinary model, but you're going to see that the tuning has already happened. Let's press Enter here, simply type classifier, select it and press Enter.
And there you get a lot of very interesting information about the kernel SVM classifier you just built with the caret package, because what you see here is the conclusion of the parameter tuning that was performed on it. What is this conclusion? It says that accuracy was used to select the optimal model using the largest value. That means accuracy was the performance metric used to compare the candidate models. Just below, you see that the final values used for the model were sigma of about 1.13 and C = 0.5. That's the result of your parameter tuning: the best values, among those tried, of the kernel SVM hyperparameters, the ones that make your kernel SVM model perform even better than the one you built with the previous method. Strictly speaking, with these default settings caret only tries a few values of C; sigma is estimated once from the data and held constant, which the output also reports. C = 0.5 might look like a default value, but the svm function we used before (from the e1071 package) actually defaults to a cost of 1, and the default value of its gamma parameter, which plays the role of sigma here, was definitely not 1.13. With sigma at about 1.13 and C = 0.5, you get the optimal accuracy.
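With its default settings, caret's svmRadial method does not search over sigma: it estimates sigma once from the data (via kernlab's sigest) and holds it constant, then tries tuneLength = 3 values of C (0.25, 0.5 and 1). The sketch below, on hypothetical synthetic data, shows this in the results table. If you want a grid search over both hyperparameters, you can pass your own grid through train's tuneGrid argument, and trainControl lets you replace the default bootstrap with, for example, 10-fold cross-validation. The grid values are arbitrary illustrative choices.

```r
library(caret)   # train(), trainControl(); "svmRadial" also needs kernlab

# Hypothetical synthetic training set (two scaled features, factor response)
set.seed(123)
n <- 300
training_set <- data.frame(Age = rnorm(n), EstimatedSalary = rnorm(n))
training_set$Purchased <- factor(ifelse(
  training_set$Age + training_set$EstimatedSalary^2 + rnorm(n, sd = 0.5) > 1, 1, 0))

# Default call: one estimated sigma, three values of C, bootstrap resampling
default_fit <- train(Purchased ~ ., data = training_set, method = "svmRadial")
print(default_fit$results[, c("sigma", "C", "Accuracy")])

# Explicit grid over BOTH hyperparameters, scored with 10-fold cross-validation
grid <- expand.grid(sigma = c(0.1, 0.5, 1, 2), C = c(0.25, 0.5, 1, 2, 4))
ctrl <- trainControl(method = "cv", number = 10)
grid_fit <- train(Purchased ~ ., data = training_set, method = "svmRadial",
                  tuneGrid = grid, trControl = ctrl)
print(grid_fit$bestTune)
```

The explicit grid costs 20 candidates times 10 folds, that is 200 fits, so widen it with care on larger datasets.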
The accuracy reached with these values is higher than the accuracy you obtained with the kernel SVM model built previously, and you can even see what it is, right here: the accuracy is 92%. Moreover, this is a relevant accuracy, because it was not measured on a single train/test split but on several, as you can see from this line: the resampling was done by bootstrap, with 25 repetitions. This follows the same principle as k-fold cross-validation: you take different samples of training sets and test sets, build your model on each training set and test its performance on the corresponding test set, and finally take the mean of the accuracies, which gives 92%. The difference is that each bootstrap training set is drawn with replacement, and the model is tested on the observations left out of it rather than on one of k fixed folds. So that's definitely a good accuracy, and it's the best you can obtain among the candidates tried, with sigma = 1.13 and C = 0.5. Now I'll show you an even faster way to get these optimal values of the hyperparameters. Let's keep this classifier line here, but copy it: you just need to take your classifier and then add a dollar sign.
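Stepping back to the resampling line for a moment, here is a minimal base-R sketch of the bootstrap idea on hypothetical data. Logistic regression (glm) stands in for the SVM so that no extra package is needed. Each of 25 repetitions trains on a sample drawn with replacement and measures accuracy on the rows that were never drawn, and the final estimate is the mean. It mirrors the principle behind caret's default resampling rather than reproducing caret's internal code.

```r
# Hypothetical data: two features, binary factor response
set.seed(7)
n <- 200
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- factor(ifelse(df$x1 - df$x2 + rnorm(n) > 0, 1, 0))

# Bootstrap estimate of accuracy: train on a sample drawn with replacement,
# test on the out-of-bag rows, repeat, then average
bootstrap_accuracy <- function(data, reps = 25) {
  accs <- numeric(reps)
  for (r in seq_len(reps)) {
    idx <- sample(nrow(data), replace = TRUE)    # bootstrap training sample
    oob <- setdiff(seq_len(nrow(data)), idx)      # rows never drawn = test set
    fit <- glm(y ~ ., data = data[idx, ], family = binomial)
    prob <- predict(fit, newdata = data[oob, ], type = "response")
    pred <- ifelse(prob > 0.5, "1", "0")          # prob is P(class "1")
    accs[r] <- mean(pred == as.character(data$y[oob]))
  }
  accs
}

accs <- bootstrap_accuracy(df)
cat("Mean accuracy over", length(accs), "bootstrap resamples:",
    round(mean(accs), 3), "\n")
```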
Then, after the dollar sign, type bestTune, which gives classifier$bestTune. Here you go. Select it, execute it, and you directly get the optimal values of the hyperparameters: sigma = 1.13 and C = 0.5. Congratulations, you now have your best kernel SVM model, with the best learned coefficients and the best hyperparameters. Now the choice is yours. You can either use this kernel SVM classifier you built with the caret package, or take the classifier you built the usual way and pass the optimal values you found here to its svm function. Keep in mind that svm names these hyperparameters differently: sigma corresponds to gamma and C to cost, so you would set gamma = 1.13 and cost = 0.5. I'll leave that to you as practice, and you can judge for yourself which method you prefer. I'll see you in the next and last section of this course, in which we are going to implement one of the most powerful models in machine learning, XGBoost. This is going to be a very exciting section. I can't wait to see you there, and until then, enjoy machine learning.
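As a sketch of the second option, the code below reads the tuned values from classifier$bestTune and refits with e1071's svm function. It relies on the fact that kernlab's sigma and e1071's gamma parameterize the same Gaussian kernel, exp(-value * ||u - v||^2), and that C corresponds to cost. The synthetic data are hypothetical, and the two fits need not give identical predictions, since the packages use different implementations.

```r
library(caret)   # train(); "svmRadial" also needs kernlab installed
library(e1071)   # svm()

# Hypothetical synthetic training set (two scaled features, factor response)
set.seed(123)
n <- 300
training_set <- data.frame(Age = rnorm(n), EstimatedSalary = rnorm(n))
training_set$Purchased <- factor(ifelse(
  training_set$Age + training_set$EstimatedSalary^2 + rnorm(n, sd = 0.5) > 1, 1, 0))

classifier <- train(form = Purchased ~ ., data = training_set, method = "svmRadial")

# One-row data frame holding the selected sigma and C
best <- classifier$bestTune
print(best)

# Refit "the usual way": sigma maps to gamma, C maps to cost
svm_fit <- svm(Purchased ~ ., data = training_set,
               type = "C-classification", kernel = "radial",
               gamma = best$sigma, cost = best$C)

# Training-set accuracy of the refitted model (optimistic; illustration only)
print(mean(predict(svm_fit, newdata = training_set) == training_set$Purchased))
```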