1 00:00:00,300 --> 00:00:03,566 Hello my friends, and welcome to the final 2 00:00:03,566 --> 00:00:06,366 practical activity of this course. 3 00:00:06,366 --> 00:00:07,333 Yes, I must 4 00:00:07,333 --> 00:00:08,533 start by saying that 5 00:00:08,533 --> 00:00:10,633 I'm at the same time excited 6 00:00:10,633 --> 00:00:13,133 but sad that this is the end of the journey. 7 00:00:13,133 --> 00:00:15,200 But no worries, we're going to end on a 8 00:00:15,200 --> 00:00:16,500 very, very 9 00:00:16,500 --> 00:00:17,733 good note. 10 00:00:17,733 --> 00:00:18,900 And that good note is 11 00:00:18,900 --> 00:00:20,700 about, of course, XGBoost. 12 00:00:20,700 --> 00:00:22,633 It is a super powerful 13 00:00:22,633 --> 00:00:25,166 machine learning model, which I absolutely 14 00:00:25,166 --> 00:00:26,700 want you to have in the 15 00:00:26,700 --> 00:00:30,633 toolkit, because you will see that it brings excellent results 16 00:00:30,633 --> 00:00:32,966 in most machine learning problems. 17 00:00:32,966 --> 00:00:37,666 And actually, what is so cool about it is that it can be used for both 18 00:00:37,666 --> 00:00:40,033 regression and classification. 19 00:00:40,033 --> 00:00:41,033 So there we go. 20 00:00:41,033 --> 00:00:41,900 Let's cross 21 00:00:41,900 --> 00:00:44,333 the finish line together in this final 22 00:00:44,333 --> 00:00:45,000 tutorial 23 00:00:45,000 --> 00:00:48,066 by implementing the XGBoost model. 24 00:00:48,533 --> 00:00:51,333 So that model is given to you in part ten. 25 00:00:51,333 --> 00:00:52,266 So just before we 26 00:00:52,266 --> 00:00:55,400 enter this part, make sure to be on that same page. 27 00:00:55,400 --> 00:00:58,200 I gave you the link to this folder right before this tutorial, 28 00:00:58,200 --> 00:01:00,900 in the article, so make sure to connect to it. 29 00:01:00,900 --> 00:01:02,066 And now here we go. 30 00:01:02,066 --> 00:01:03,866 Let's finish this journey together 
31 00:01:03,866 --> 00:01:06,733 by entering part ten, and then the final section 32 00:01:06,733 --> 00:01:07,900 of this course, section 33 00:01:07,900 --> 00:01:10,866 49, on XGBoost. 34 00:01:10,866 --> 00:01:11,333 All right. 35 00:01:11,333 --> 00:01:15,333 And as usual we're going to start with Python, which contains two files: 36 00:01:15,333 --> 00:01:16,600 first, the data, 37 00:01:16,600 --> 00:01:19,400 and second, the implementation. 38 00:01:19,400 --> 00:01:22,366 And now, you probably also noticed that 39 00:01:22,366 --> 00:01:25,200 there are many files open now on my machine. 40 00:01:25,200 --> 00:01:26,766 Right, all these files. 41 00:01:26,766 --> 00:01:27,733 And these are, you know, the 42 00:01:27,733 --> 00:01:29,933 files that we implemented 43 00:01:29,933 --> 00:01:30,966 so quickly and 44 00:01:30,966 --> 00:01:34,933 efficiently when doing that model selection demo 45 00:01:35,066 --> 00:01:38,066 at the end of part three, classification. 46 00:01:38,066 --> 00:01:40,500 Right? I gave you this model selection folder 47 00:01:40,500 --> 00:01:42,900 with all these classification models, 48 00:01:42,900 --> 00:01:45,300 which were tried on the same 49 00:01:45,300 --> 00:01:49,033 data set, which is that data set data.csv, 50 00:01:49,400 --> 00:01:51,533 and which, I remind you, consists 51 00:01:51,533 --> 00:01:53,633 of predicting if a breast 52 00:01:53,633 --> 00:01:56,500 cancer tumor is benign or malignant, 53 00:01:56,500 --> 00:02:00,133 meaning that each row of this data set corresponds to 54 00:02:00,133 --> 00:02:02,433 a patient, you know, a certain patient. 55 00:02:02,433 --> 00:02:03,900 And for each of these patients, we 56 00:02:03,900 --> 00:02:05,900 have several features, 57 00:02:05,900 --> 00:02:08,966 from the clump thickness, the uniformity of 58 00:02:08,966 --> 00:02:09,833 cell size, the 
59 00:02:09,833 --> 00:02:14,900 uniformity of cell shape, you know, all these features that are characteristics 60 00:02:14,900 --> 00:02:16,166 of a tumor. 61 00:02:16,166 --> 00:02:19,166 And with all these features, we were trying to predict if the 62 00:02:19,166 --> 00:02:22,400 tumor is benign or malignant. It is benign 63 00:02:22,400 --> 00:02:23,966 if we get the class two, 64 00:02:23,966 --> 00:02:25,033 and malignant if 65 00:02:25,033 --> 00:02:27,233 we get the class four. All right. 66 00:02:27,233 --> 00:02:29,033 And we built and trained 67 00:02:29,033 --> 00:02:29,766 all of our 68 00:02:29,766 --> 00:02:30,933 classification models, 69 00:02:30,933 --> 00:02:33,433 which are all the ones right here, to 70 00:02:33,433 --> 00:02:35,000 learn the correlations between 71 00:02:35,000 --> 00:02:37,000 all these features 72 00:02:37,000 --> 00:02:41,700 and that dependent variable telling if the tumor is benign or malignant. 73 00:02:42,166 --> 00:02:43,300 And remember 74 00:02:43,300 --> 00:02:44,833 that we had different 75 00:02:44,833 --> 00:02:48,300 accuracies for each of these models. With the logistic 76 00:02:48,300 --> 00:02:52,466 regression model, we had an accuracy of 94.7%. 77 00:02:52,733 --> 00:02:54,033 With K-nearest neighbors, 78 00:02:54,033 --> 00:02:56,800 we had an accuracy of 94.7% 79 00:02:56,800 --> 00:02:58,200 again. With the 80 00:02:58,200 --> 00:03:00,166 SVM, we had an accuracy 81 00:03:00,166 --> 00:03:02,866 of 94.1%. 82 00:03:02,866 --> 00:03:06,166 With the kernel SVM, we had a better accuracy actually, 83 00:03:06,300 --> 00:03:09,200 with 95.3 percent. 84 00:03:09,200 --> 00:03:10,400 With Naive Bayes, 85 00:03:10,400 --> 00:03:12,433 we got a lower accuracy of 86 00:03:12,433 --> 00:03:14,266 94.1% again. 87 00:03:14,266 --> 00:03:17,066 And with decision tree classification, well, 88 00:03:17,066 --> 00:03:18,133 that was the winner. 89 00:03:18,133 --> 00:03:20,533 We got an amazing 
accuracy of 90 00:03:20,533 --> 00:03:23,333 95.9%. That was 91 00:03:23,333 --> 00:03:24,766 the number one on the 92 00:03:24,766 --> 00:03:28,566 podium, followed by the kernel SVM with that accuracy. 93 00:03:28,566 --> 00:03:32,133 And then unfortunately, Random Forest did not do any 94 00:03:32,133 --> 00:03:33,600 good or, you know, 95 00:03:33,600 --> 00:03:35,166 not better than the others, 96 00:03:35,166 --> 00:03:38,100 because we got a 93.5% 97 00:03:38,100 --> 00:03:39,633 accuracy with it. 98 00:03:39,633 --> 00:03:42,033 And so what I want to do now, 99 00:03:42,033 --> 00:03:46,266 as you have probably guessed, is to build the XGBoost 100 00:03:46,266 --> 00:03:49,033 model and train it on the same data 101 00:03:49,033 --> 00:03:50,433 set to see 102 00:03:50,433 --> 00:03:51,633 if it can take 103 00:03:51,633 --> 00:03:53,333 the throne held by the 104 00:03:53,333 --> 00:03:56,133 decision tree classification model. 105 00:03:56,133 --> 00:03:57,033 In other words, 106 00:03:57,033 --> 00:03:58,800 to see if it can beat 107 00:03:58,800 --> 00:04:00,066 that accuracy 108 00:04:00,066 --> 00:04:01,800 obtained with the decision tree 109 00:04:01,800 --> 00:04:03,266 classification model. 110 00:04:03,266 --> 00:04:07,033 And well, maybe, maybe that's the good note on which 111 00:04:07,033 --> 00:04:07,433 we will 112 00:04:07,433 --> 00:04:09,400 end the journey of this course. 113 00:04:09,400 --> 00:04:10,233 Are you ready? 114 00:04:10,233 --> 00:04:11,333 So let's do this. 115 00:04:11,333 --> 00:04:16,400 We're now going to build and train our XGBoost model on the exact same 116 00:04:16,400 --> 00:04:17,366 data set 117 00:04:17,366 --> 00:04:18,866 and see if we can beat, 118 00:04:18,866 --> 00:04:19,633 basically, 119 00:04:19,633 --> 00:04:22,633 an accuracy of 95.9%. 120 00:04:22,866 --> 00:04:23,566 And not only 121 00:04:23,566 --> 00:04:24,100 will we test 
122 00:04:24,100 --> 00:04:25,200 that on a single 123 00:04:25,200 --> 00:04:27,366 test set, but also, of course, now that we 124 00:04:27,366 --> 00:04:28,300 learned k-fold 125 00:04:28,300 --> 00:04:30,600 cross-validation in the previous section, we 126 00:04:30,600 --> 00:04:33,600 will test this on ten test folds so that 127 00:04:33,633 --> 00:04:36,166 we can get a relevant measure of the accuracy 128 00:04:36,166 --> 00:04:37,300 and make sure 129 00:04:37,300 --> 00:04:38,700 that perhaps 130 00:04:38,700 --> 00:04:42,033 XGBoost will now become the number one on the podium, 131 00:04:42,033 --> 00:04:44,866 with the ultimate machine learning throne. 132 00:04:44,866 --> 00:04:46,800 So let's check this out right now. 133 00:04:46,800 --> 00:04:48,966 Let's open this implementation 134 00:04:48,966 --> 00:04:51,200 with either Google Colaboratory or 135 00:04:51,200 --> 00:04:52,566 Jupyter Notebook. 136 00:04:52,566 --> 00:04:54,100 I'm going to put it last, 137 00:04:54,100 --> 00:04:55,233 you know, just next 138 00:04:55,233 --> 00:04:58,200 to all our other classification models. 139 00:04:58,200 --> 00:05:00,900 And now the notebook just opened, 140 00:05:00,900 --> 00:05:03,600 but it is still in read-only mode, so we're going to create 141 00:05:03,600 --> 00:05:06,533 a copy right away by clicking Save 142 00:05:06,533 --> 00:05:07,333 a copy in Drive. 143 00:05:07,333 --> 00:05:08,200 You can notice that 144 00:05:08,200 --> 00:05:10,466 all these are copies of the 145 00:05:10,466 --> 00:05:11,966 original implementations, 146 00:05:11,966 --> 00:05:12,833 which are right here. 147 00:05:12,833 --> 00:05:14,066 You know, that's the 148 00:05:14,066 --> 00:05:15,966 model selection folder and then the 149 00:05:15,966 --> 00:05:19,433 classification subfolder, which I gave you at the end of part three. 150 00:05:19,600 --> 00:05:22,000 So you can run these codes again if you want, 
151 00:05:22,000 --> 00:05:23,633 but we already did that. 152 00:05:23,633 --> 00:05:27,200 Just remember that the number one was indeed decision tree classification 153 00:05:27,200 --> 00:05:30,200 with an accuracy of 95.9%. 154 00:05:30,333 --> 00:05:31,066 And now we're going to 155 00:05:31,066 --> 00:05:34,233 see if we can beat this with XGBoost. 156 00:05:35,166 --> 00:05:35,900 All right. 157 00:05:35,900 --> 00:05:38,366 So of course, no worries. 158 00:05:38,366 --> 00:05:40,233 We won't re-implement all this. 159 00:05:40,233 --> 00:05:42,766 We will quickly get to the core 160 00:05:42,766 --> 00:05:44,800 of the implementation, and mostly 161 00:05:44,800 --> 00:05:46,600 the exciting part, which is the 162 00:05:46,600 --> 00:05:49,233 results, in this same tutorial, 163 00:05:49,233 --> 00:05:51,200 because indeed all the cells of 164 00:05:51,200 --> 00:05:52,733 this implementation are 165 00:05:52,733 --> 00:05:55,066 just cells taken from our diverse toolkits. 166 00:05:55,066 --> 00:05:55,533 Right? 167 00:05:55,533 --> 00:05:58,566 These first three cells are, as you recognize 168 00:05:58,566 --> 00:06:01,566 perfectly, the cells of our data preprocessing template. 169 00:06:01,566 --> 00:06:01,966 Right. 170 00:06:01,966 --> 00:06:04,733 We first import the libraries and we import the data set with 171 00:06:04,733 --> 00:06:06,000 the exact same code. 172 00:06:06,000 --> 00:06:08,133 I just put the name of the data set here. 173 00:06:08,133 --> 00:06:10,633 And then we split the data set into the training set and 174 00:06:10,633 --> 00:06:11,700 test set. 175 00:06:11,700 --> 00:06:13,833 So this is all the data preprocessing phase. 176 00:06:13,833 --> 00:06:16,500 Then we train XGBoost on the training set. 177 00:06:16,500 --> 00:06:17,733 And of course I'm going to 178 00:06:17,733 --> 00:06:20,666 delete this cell right away because that's the cell we 179 00:06:20,666 --> 00:06:22,700 will re-implement together. 
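The three preprocessing cells mentioned above (import the libraries, import the data set, split into training and test sets) might look roughly like the sketch below. It is not the course's actual notebook: the rows are made-up toy values read from an in-memory string instead of the real CSV file, the column names are only borrowed from the features named in the narration, and `test_size=0.2` and `random_state=0` are assumptions.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up toy rows standing in for the real data set (hypothetical values).
# The last column is the class: 2 = benign, 4 = malignant.
csv_text = """Clump Thickness,Uniformity of Cell Size,Uniformity of Cell Shape,Class
5,1,1,2
5,4,4,2
3,1,1,2
8,10,10,4
1,1,1,2
10,7,7,4
2,1,2,2
7,4,6,4
9,5,8,4
4,2,1,2
"""

# Import the data set: every column but the last is a feature,
# the last column is the dependent variable.
dataset = pd.read_csv(io.StringIO(csv_text))
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Split into training set and test set (the split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)
```

With the real file, the `io.StringIO(csv_text)` argument would simply be replaced by the path of the uploaded CSV.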
180 00:06:22,700 --> 00:06:23,233 And then we 181 00:06:23,233 --> 00:06:27,033 have the other tools of our other toolkits, like the classification toolkit, 182 00:06:27,300 --> 00:06:30,400 because indeed this cell makes the confusion matrix 183 00:06:30,400 --> 00:06:33,400 and prints at the same time the accuracy. 184 00:06:33,466 --> 00:06:35,400 I actually already deleted the 185 00:06:35,400 --> 00:06:36,666 output to make sure 186 00:06:36,666 --> 00:06:38,233 we get the full surprise 187 00:06:38,233 --> 00:06:39,700 by the end of this tutorial. 188 00:06:39,700 --> 00:06:43,333 And then of course, as I told you, we are going to apply k-fold 189 00:06:43,333 --> 00:06:44,833 cross-validation right at 190 00:06:44,833 --> 00:06:45,866 the end to make 191 00:06:45,866 --> 00:06:46,366 sure 192 00:06:46,366 --> 00:06:48,900 that indeed we didn't get lucky on the 193 00:06:48,900 --> 00:06:49,466 test set, 194 00:06:49,466 --> 00:06:51,500 you know, if we indeed can beat all 195 00:06:51,500 --> 00:06:54,366 the other algorithms. So we will not only get a 196 00:06:54,366 --> 00:06:57,200 first measure of the performance thanks to a single 197 00:06:57,200 --> 00:07:00,000 test set with this cell, but we'll also get the 198 00:07:00,000 --> 00:07:01,400 ultimate measure of 199 00:07:01,400 --> 00:07:04,200 the accuracy with that cell. 200 00:07:04,200 --> 00:07:05,066 Are you ready? 201 00:07:05,066 --> 00:07:06,300 Let's start by 202 00:07:06,300 --> 00:07:07,533 building and training 203 00:07:07,533 --> 00:07:09,233 XGBoost on the training 204 00:07:09,233 --> 00:07:12,100 set, which resulted from the split of the data set 205 00:07:12,100 --> 00:07:14,133 between the training set and test set. 206 00:07:14,133 --> 00:07:16,766 And first, in order to get the assistance of Google 207 00:07:16,766 --> 00:07:20,866 Colab, well, let's apply this reflex of uploading the 208 00:07:20,866 --> 00:07:22,533 data into the notebook. 
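The two evaluation cells described above (confusion matrix plus accuracy on a single test set, then k-fold cross-validation on ten folds) could be sketched as follows. This is an illustration only: the data is synthetic, and a scikit-learn `DecisionTreeClassifier` is used purely as a stand-in for whatever classifier is being evaluated, so the numbers it prints say nothing about the results quoted in this tutorial.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the tumor data set): 300 rows, 9 features.
X, y = make_classification(n_samples=300, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Stand-in classifier; any estimator with fit/predict could be used here.
classifier = DecisionTreeClassifier(random_state=0)
classifier.fit(X_train, y_train)

# Cell 1: confusion matrix and accuracy on the single test set.
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print("Test accuracy: {:.2f} %".format(accuracy_score(y_test, y_pred) * 100))

# Cell 2: k-fold cross-validation with ten folds on the training set.
# The mean is the headline accuracy; the standard deviation shows how much
# the accuracy varies from one fold to another.
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
print("Accuracy: {:.2f} %".format(accuracies.mean() * 100))
print("Standard Deviation: {:.2f} %".format(accuracies.std() * 100))
```

The off-diagonal entries of the confusion matrix are the incorrect predictions, so their sum divided by the test set size is one minus the accuracy.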
209 00:07:22,533 --> 00:07:23,666 So, you know, I just 210 00:07:23,666 --> 00:07:24,566 clicked this 211 00:07:24,566 --> 00:07:27,033 folder button here, and then in a second we'll see the 212 00:07:27,033 --> 00:07:29,600 upload button to upload indeed our data set. 213 00:07:29,600 --> 00:07:30,833 So let's click it. 214 00:07:30,833 --> 00:07:33,233 And now let's go to our Machine Learning 215 00:07:33,233 --> 00:07:35,066 A-Z Codes and Datasets folder, 216 00:07:35,066 --> 00:07:35,566 because you will 217 00:07:35,566 --> 00:07:36,700 still find the data 218 00:07:36,700 --> 00:07:39,400 .csv file in this folder, in part ten, 219 00:07:39,400 --> 00:07:41,866 of course. So let's go into this folder. 220 00:07:41,866 --> 00:07:44,900 Then let's go into part ten, then section 49, 221 00:07:44,900 --> 00:07:45,933 XGBoost, 222 00:07:45,933 --> 00:07:46,666 Python. 223 00:07:46,666 --> 00:07:47,166 And here 224 00:07:47,166 --> 00:07:49,100 is the data set data.csv 225 00:07:49,100 --> 00:07:50,833 of many patients with 226 00:07:50,833 --> 00:07:52,600 tumors, for which we have to predict if 227 00:07:52,600 --> 00:07:54,933 the tumor is benign or malignant. 228 00:07:54,933 --> 00:07:56,400 So, open. 229 00:07:56,400 --> 00:07:57,533 Okay. 230 00:07:57,533 --> 00:08:00,666 And now the data set is indeed uploaded into the notebook. 231 00:08:00,666 --> 00:08:01,600 So there we go. 232 00:08:01,600 --> 00:08:05,100 We can implement that cell and then run the whole code. 233 00:08:05,366 --> 00:08:07,466 All right. So let's create a new code cell. 234 00:08:07,466 --> 00:08:08,366 And there we go. 235 00:08:08,366 --> 00:08:09,600 Let's build and train 236 00:08:09,600 --> 00:08:11,966 XGBoost on the training set. 237 00:08:11,966 --> 00:08:12,966 So you're going to see that it's 238 00:08:12,966 --> 00:08:14,733 going to be super easy. 239 00:08:14,733 --> 00:08:16,566 And in fact we won't even do it 
240 00:08:16,566 --> 00:08:20,233 with scikit-learn, but with a library called xgboost, 241 00:08:20,233 --> 00:08:21,000 which we 242 00:08:21,000 --> 00:08:23,033 don't even have to install thanks 243 00:08:23,033 --> 00:08:24,933 to Google Colab, because it is one of 244 00:08:24,933 --> 00:08:27,200 the many packages already 245 00:08:27,200 --> 00:08:30,200 installed on Google Colab, already pre-installed. 246 00:08:30,300 --> 00:08:32,200 So we have nothing to worry about 247 00:08:32,200 --> 00:08:35,866 and we can just start building and training this model. 248 00:08:36,233 --> 00:08:38,500 But first we're going to import the class 249 00:08:38,500 --> 00:08:41,066 with which we're going to build this, and this class belongs, of 250 00:08:41,066 --> 00:08:43,233 course, to this xgboost library. 251 00:08:43,233 --> 00:08:45,633 So there we go. We're going to start with from, then 252 00:08:45,633 --> 00:08:48,000 this xgboost library. 253 00:08:48,000 --> 00:08:49,966 Right. It's spelled this way, just 254 00:08:49,966 --> 00:08:52,633 like the name of the model, XGBoost, indeed. 255 00:08:52,633 --> 00:08:54,000 And from this library 256 00:08:54,000 --> 00:08:55,300 we're going to import, 257 00:08:55,300 --> 00:08:57,733 well, the class that can build an 258 00:08:57,733 --> 00:08:59,366 XGBoost classification 259 00:08:59,366 --> 00:09:02,366 model and which is called XGB... 260 00:09:02,400 --> 00:09:04,200 There we go, Google Colab found it: 261 00:09:04,200 --> 00:09:06,400 XGBClassifier. 262 00:09:06,400 --> 00:09:06,933 All right. 263 00:09:06,933 --> 00:09:08,133 And now the next natural 264 00:09:08,133 --> 00:09:09,833 step, as usual, is to 265 00:09:09,833 --> 00:09:11,366 create an instance of this 266 00:09:11,366 --> 00:09:13,266 class, which will be exactly the 267 00:09:13,266 --> 00:09:16,400 object containing the XGBoost model. 268 00:09:16,533 --> 00:09:19,366 So once again, we're going to call it classifier. 
269 00:09:20,700 --> 00:09:21,300 All right. 270 00:09:21,300 --> 00:09:23,400 And we'll create this classifier as 271 00:09:23,400 --> 00:09:24,700 an instance, indeed, 272 00:09:24,700 --> 00:09:28,200 of the XGBClassifier class. 273 00:09:28,200 --> 00:09:29,200 Perfect. 274 00:09:29,200 --> 00:09:32,000 And now the good news is that we won't have too much 275 00:09:32,000 --> 00:09:34,166 to worry about with this class, because 276 00:09:34,166 --> 00:09:36,633 there are not many parameters to tune. Right? 277 00:09:36,633 --> 00:09:37,666 Basically, the default 278 00:09:37,666 --> 00:09:38,466 version of the 279 00:09:38,466 --> 00:09:39,233 XGBoost 280 00:09:39,233 --> 00:09:42,166 model will most of the time perform super well. 281 00:09:42,166 --> 00:09:44,133 So all good here, and now, 282 00:09:44,133 --> 00:09:44,600 of course, 283 00:09:44,600 --> 00:09:46,933 we finish this by connecting 284 00:09:46,933 --> 00:09:48,533 this XGBoost classifier 285 00:09:48,533 --> 00:09:49,933 to our training set. 286 00:09:49,933 --> 00:09:51,833 And the way to do this is by, of course, 287 00:09:51,833 --> 00:09:52,666 calling the 288 00:09:52,666 --> 00:09:54,166 fit method from our 289 00:09:54,166 --> 00:09:55,766 classifier object, which 290 00:09:55,766 --> 00:09:58,266 will do nothing else than train this 291 00:09:58,266 --> 00:09:59,766 XGBoost classifier 292 00:09:59,766 --> 00:10:01,900 on the training set. Right? 293 00:10:01,900 --> 00:10:04,500 So something we did many times. 294 00:10:04,500 --> 00:10:05,233 So there we go. 295 00:10:05,233 --> 00:10:09,000 Let's do it one last time in this whole machine learning journey. 296 00:10:09,000 --> 00:10:10,300 But then I'm sure you will do it 297 00:10:10,300 --> 00:10:14,300 many times once again in the future, in your future machine learning career. 298 00:10:14,466 --> 00:10:17,466 So let's do this. We call our classifier, 
299 00:10:17,800 --> 00:10:19,666 from which we're going to call 300 00:10:19,666 --> 00:10:21,000 this fit 301 00:10:21,000 --> 00:10:22,066 method, which will 302 00:10:22,066 --> 00:10:22,933 train the 303 00:10:22,933 --> 00:10:23,966 classifier 304 00:10:23,966 --> 00:10:26,300 on the training set, which is composed 305 00:10:26,300 --> 00:10:28,066 of, first, the features of the 306 00:10:28,066 --> 00:10:31,066 training set, represented by X_train, and then 307 00:10:31,333 --> 00:10:32,533 the dependent variable of the 308 00:10:32,533 --> 00:10:36,133 training set, represented by y_train. 309 00:10:36,133 --> 00:10:38,033 And these are exactly, of course, 310 00:10:38,033 --> 00:10:40,066 the inputs of the fit method. 311 00:10:41,033 --> 00:10:41,700 All right. 312 00:10:41,700 --> 00:10:42,700 So in the 313 00:10:42,700 --> 00:10:44,900 flashiest of flashes we built 314 00:10:44,900 --> 00:10:45,733 and trained 315 00:10:45,733 --> 00:10:47,333 this XGBoost model 316 00:10:47,333 --> 00:10:48,466 on the training set. 317 00:10:48,466 --> 00:10:50,866 We only had to implement these three lines of code. 318 00:10:50,866 --> 00:10:53,500 And then all the rest is something we've already done. 319 00:10:53,500 --> 00:10:55,200 You know, this is the confusion matrix. 320 00:10:55,200 --> 00:10:56,333 You have this code 321 00:10:56,333 --> 00:10:58,766 in all of your classification templates. 322 00:10:58,766 --> 00:11:00,766 And finally, of course, we apply the exact 323 00:11:00,766 --> 00:11:02,766 same cell that we implemented just 324 00:11:02,766 --> 00:11:06,600 before in the previous section to apply k-fold cross-validation. 325 00:11:06,900 --> 00:11:09,900 So we are ready to run this code, but just before we do it, 326 00:11:10,033 --> 00:11:11,633 I just want to show you the way to 327 00:11:11,633 --> 00:11:14,266 build an XGBRegressor model, 328 00:11:14,266 --> 00:11:16,900 you know, an XGBoost model for regression. 
329 00:11:16,900 --> 00:11:18,600 It's actually super simple. 330 00:11:18,600 --> 00:11:20,833 The only thing you need to change here 331 00:11:20,833 --> 00:11:22,266 is just, of course, the name of 332 00:11:22,266 --> 00:11:27,166 the class, which wouldn't be XGBClassifier, but XGB... 333 00:11:27,500 --> 00:11:30,500 and as you will see, XGBRegressor. 334 00:11:30,833 --> 00:11:31,966 And then, you know, you would just 335 00:11:31,966 --> 00:11:34,333 replace classifier here by regressor. 336 00:11:34,333 --> 00:11:35,333 And then that's it. 337 00:11:35,333 --> 00:11:36,666 This way you will build a 338 00:11:36,666 --> 00:11:39,366 regression model based on XGBoost. 339 00:11:39,366 --> 00:11:39,700 All right. 340 00:11:39,700 --> 00:11:43,366 But let's go back to our XGBClassifier class. 341 00:11:43,366 --> 00:11:44,500 And there we go. 342 00:11:44,500 --> 00:11:48,066 Now we can just save this implementation and do a run all 343 00:11:48,066 --> 00:11:49,100 to find out 344 00:11:49,100 --> 00:11:52,033 if XGBoost is going to steal the throne of 345 00:11:52,033 --> 00:11:54,233 the decision tree classification model 346 00:11:54,233 --> 00:11:56,266 for this particular data set. 347 00:11:56,266 --> 00:11:57,333 So, quick reminder: 348 00:11:57,333 --> 00:12:00,966 with the decision tree classification model we got the best accuracy, 349 00:12:00,966 --> 00:12:04,433 you know, the highest one, of 95.9%. 350 00:12:04,700 --> 00:12:06,800 And now let's find out if we can beat 351 00:12:06,800 --> 00:12:09,166 this with the XGBoost model 352 00:12:09,166 --> 00:12:10,133 trained on the 353 00:12:10,133 --> 00:12:12,666 exact same data set. 354 00:12:12,666 --> 00:12:13,700 All right. So basically 355 00:12:13,700 --> 00:12:16,200 we're ready. Now we're just going to do a run 356 00:12:16,200 --> 00:12:16,733 all by 357 00:12:16,733 --> 00:12:18,966 clicking Runtime here. And now, 
358 00:12:18,966 --> 00:12:21,033 are you ready? I bet you are. 359 00:12:21,033 --> 00:12:23,666 So let's do this: three, two, 360 00:12:23,666 --> 00:12:25,300 one, go. 361 00:12:25,300 --> 00:12:27,000 All right. So all the cells are running now. 362 00:12:27,000 --> 00:12:29,700 And we get an impressive 363 00:12:29,700 --> 00:12:30,866 accuracy 364 00:12:30,866 --> 00:12:33,433 of 97.8 365 00:12:33,433 --> 00:12:34,466 percent. 366 00:12:34,466 --> 00:12:37,833 When I was telling you that we're going to end this journey on a good note, 367 00:12:37,833 --> 00:12:39,966 well, I was choosing my words 368 00:12:39,966 --> 00:12:42,466 very, very carefully indeed. 369 00:12:42,466 --> 00:12:44,333 That's just an amazing accuracy. 370 00:12:44,333 --> 00:12:44,733 You know, there are 371 00:12:44,733 --> 00:12:46,166 only three incorrect 372 00:12:46,166 --> 00:12:48,633 predictions on such a sensitive problem, 373 00:12:48,633 --> 00:12:50,100 you know, cancer prediction. 374 00:12:50,100 --> 00:12:51,900 Well, this result is just amazing 375 00:12:51,900 --> 00:12:55,133 here. Indeed, we almost get 98% accuracy, 376 00:12:55,133 --> 00:12:57,600 with only these three incorrect predictions. 377 00:12:57,600 --> 00:12:59,100 That's just amazing. 378 00:12:59,100 --> 00:13:00,266 But now we have to check 379 00:13:00,266 --> 00:13:02,200 one last thing, because, you know, maybe we 380 00:13:02,200 --> 00:13:05,100 got lucky on this single test set. 381 00:13:05,100 --> 00:13:05,600 Maybe that 382 00:13:05,600 --> 00:13:08,733 single test set was more favorable to XGBoost 383 00:13:08,733 --> 00:13:09,666 than to the other 384 00:13:09,666 --> 00:13:10,733 classification models, 385 00:13:10,733 --> 00:13:13,666 which could explain why XGBoost was number one. 386 00:13:13,666 --> 00:13:16,233 And the only way to check this is by actually 387 00:13:16,233 --> 00:13:17,400 computing other 388 00:13:17,400 --> 00:13:19,366 accuracies 
on other test sets. 389 00:13:19,366 --> 00:13:21,033 And this is exactly what k-fold 390 00:13:21,033 --> 00:13:22,566 cross-validation is about. 391 00:13:22,566 --> 00:13:23,166 And that is 392 00:13:23,166 --> 00:13:25,800 why this is the last cell of this implementation. 393 00:13:25,800 --> 00:13:30,933 And we also have the result for this, which is, as we can see, still 394 00:13:30,933 --> 00:13:31,833 an amazing 395 00:13:31,833 --> 00:13:35,200 accuracy of 96.53%. 396 00:13:35,566 --> 00:13:36,633 This is of course 397 00:13:36,633 --> 00:13:40,133 an average accuracy, obtained as the average 398 00:13:40,133 --> 00:13:43,133 of ten different accuracies measured on ten different test 399 00:13:43,133 --> 00:13:45,100 sets. And besides, we have a 400 00:13:45,100 --> 00:13:47,733 rather small standard deviation of only 401 00:13:47,733 --> 00:13:49,400 2%, which is good 402 00:13:49,400 --> 00:13:53,066 once again for this sensitive problem of cancer prediction. 403 00:13:53,466 --> 00:13:55,066 So yes, XGBoost 404 00:13:55,066 --> 00:13:56,900 is definitely number one here. 405 00:13:56,900 --> 00:13:59,133 And that's why, my friends, I'm just super happy 406 00:13:59,133 --> 00:14:02,266 that we end on this good note with this final, powerful 407 00:14:02,266 --> 00:14:03,100 tool that you get 408 00:14:03,100 --> 00:14:04,566 in your machine learning toolkit, 409 00:14:04,566 --> 00:14:08,133 because now you can start your post machine learning journey, you know, for 410 00:14:08,133 --> 00:14:10,533 your career, in full confidence. 411 00:14:10,533 --> 00:14:12,000 And about that, that will 412 00:14:12,000 --> 00:14:14,533 be my final words to you in this course. 413 00:14:14,533 --> 00:14:15,066 I wish 414 00:14:15,066 --> 00:14:16,233 you tons of great 415 00:14:16,233 --> 00:14:19,233 success in your future machine learning projects. 416 00:14:19,233 --> 00:14:20,066 I wish that you 
417 00:14:20,066 --> 00:14:22,100 are the talented data scientist 418 00:14:22,100 --> 00:14:25,933 who brings the strongest insights and the highest value analysis 419 00:14:25,933 --> 00:14:28,333 to your team and to your clients. 420 00:14:28,333 --> 00:14:30,300 Now you're totally able to do this, 421 00:14:30,300 --> 00:14:33,766 thanks to your complete and powerful machine learning toolkit. 422 00:14:33,900 --> 00:14:37,166 With it, you're totally able to smash your future 423 00:14:37,166 --> 00:14:38,666 machine learning problems. 424 00:14:38,666 --> 00:14:41,166 So once again, I wish you the best, and I look 425 00:14:41,166 --> 00:14:42,966 forward to seeing you in another 426 00:14:42,966 --> 00:14:45,600 course for a new data science journey. 427 00:14:45,600 --> 00:14:48,533 And until then, of course, enjoy machine learning.