1
00:00:00,300 --> 00:00:03,566
Hello,
my friends, and welcome to the final

2
00:00:03,566 --> 00:00:06,366
practical activity of this course.

3
00:00:06,366 --> 00:00:07,333
Yes, I must

4
00:00:07,333 --> 00:00:08,533
start by saying that

5
00:00:08,533 --> 00:00:10,633
I'm at the same time excited

6
00:00:10,633 --> 00:00:13,133
but sad that
this is the end of the journey.

7
00:00:13,133 --> 00:00:15,200
But no worries, we're going to end on a

8
00:00:15,200 --> 00:00:16,500
very, very

9
00:00:16,500 --> 00:00:17,733
good note.

10
00:00:17,733 --> 00:00:18,900
And that good note is

11
00:00:18,900 --> 00:00:20,700
about, of course, XGBoost.

12
00:00:20,700 --> 00:00:22,633
It is a super powerful

13
00:00:22,633 --> 00:00:25,166
machine learning model,
which I absolutely

14
00:00:25,166 --> 00:00:26,700
want you to have in your

15
00:00:26,700 --> 00:00:30,633
toolkit because you will see
that it brings excellent results

16
00:00:30,633 --> 00:00:32,966
in most machine learning problems.

17
00:00:32,966 --> 00:00:37,666
And actually, what's so cool about it
is that it can be used for both

18
00:00:37,666 --> 00:00:40,033
regression and classification.

19
00:00:40,033 --> 00:00:41,033
So there we go.

20
00:00:41,033 --> 00:00:41,900
Let's cross

21
00:00:41,900 --> 00:00:44,333
the finish line together in this final

22
00:00:44,333 --> 00:00:45,000
tutorial

23
00:00:45,000 --> 00:00:48,066
by implementing the XGBoost model.

24
00:00:48,533 --> 00:00:51,333
So that model is given to you in Part 10.

25
00:00:51,333 --> 00:00:52,266
So just before we

26
00:00:52,266 --> 00:00:55,400
enter this part,
make sure to be on that same page.

27
00:00:55,400 --> 00:00:58,200
I gave you the link to this folder
right before this tutorial,

28
00:00:58,200 --> 00:01:00,900
in the article.
So make sure to connect to it.

29
00:01:00,900 --> 00:01:02,066
And now here we go.

30
00:01:02,066 --> 00:01:03,866
Let's finish this journey together

31
00:01:03,866 --> 00:01:06,733
by entering Part 10
and then the final section

32
00:01:06,733 --> 00:01:07,900
of this course, Section

33
00:01:07,900 --> 00:01:10,866
49, on XGBoost.

34
00:01:10,866 --> 00:01:11,333
All right.

35
00:01:11,333 --> 00:01:15,333
And as usual, we're going to start
with Python, which contains two files:

36
00:01:15,333 --> 00:01:16,600
first, the data,

37
00:01:16,600 --> 00:01:19,400
and second, the implementation.

38
00:01:19,400 --> 00:01:22,366
And now you probably also noticed that

39
00:01:22,366 --> 00:01:25,200
there are many files open
now on my machine.

40
00:01:25,200 --> 00:01:26,766
Right, all these files.

41
00:01:26,766 --> 00:01:27,733
And these are, you know, the

42
00:01:27,733 --> 00:01:29,933
files that we implemented

43
00:01:29,933 --> 00:01:30,966
so quickly and

44
00:01:30,966 --> 00:01:34,933
efficiently
when doing that model selection demo

45
00:01:35,066 --> 00:01:38,066
at the end of Part 3, Classification.

46
00:01:38,066 --> 00:01:40,500
Right.
I gave you this model selection folder

47
00:01:40,500 --> 00:01:42,900
with all these classification models,

48
00:01:42,900 --> 00:01:45,300
which were tested on the same

49
00:01:45,300 --> 00:01:49,033
data set,
which is that data set, Data.csv,

50
00:01:49,400 --> 00:01:51,533
and which, I remind you, consists

51
00:01:51,533 --> 00:01:53,633
of predicting if a breast

52
00:01:53,633 --> 00:01:56,500
cancer tumor is benign or malignant.

53
00:01:56,500 --> 00:02:00,133
Meaning that each row of this data
set corresponds to

54
00:02:00,133 --> 00:02:02,433
a patient, you know, a certain patient.

55
00:02:02,433 --> 00:02:03,900
And for each of these patients, we

56
00:02:03,900 --> 00:02:05,900
have several features,

57
00:02:05,900 --> 00:02:08,966
from the clump
thickness, the uniformity of

58
00:02:08,966 --> 00:02:09,833
cell size, the

59
00:02:09,833 --> 00:02:14,900
uniformity of cell shape, you know, all
these features that are characteristics

60
00:02:14,900 --> 00:02:16,166
of a tumor.

61
00:02:16,166 --> 00:02:19,166
And with all these features,
we were trying to predict if the

62
00:02:19,166 --> 00:02:22,400
tumor is benign or malignant.
It is benign

63
00:02:22,400 --> 00:02:23,966
if we get class two

64
00:02:23,966 --> 00:02:25,033
and malignant if

65
00:02:25,033 --> 00:02:27,233
we get class four. All right.
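
A side note on those class labels, as a minimal sketch rather than part of the recorded notebook. It assumes the raw labels are the integers 2 and 4 described above. To my knowledge, recent versions of the xgboost library expect class labels to start at 0, so a remapping like this may be needed before training. The toy list is invented for illustration.

```python
# Hypothetical raw labels, using the 2 = benign / 4 = malignant convention
raw_labels = [2, 4, 4, 2, 2]

# Map each class to a consecutive integer starting at 0
label_map = {2: 0, 4: 1}
encoded_labels = [label_map[label] for label in raw_labels]

print(encoded_labels)  # [0, 1, 1, 0, 0]
```

With this encoding, 0 stands for benign and 1 for malignant in every later step.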

66
00:02:27,233 --> 00:02:29,033
And we built and trained

67
00:02:29,033 --> 00:02:29,766
all of our

68
00:02:29,766 --> 00:02:30,933
classification models,

69
00:02:30,933 --> 00:02:33,433
which are all the ones right here, to

70
00:02:33,433 --> 00:02:35,000
learn the correlations between

71
00:02:35,000 --> 00:02:37,000
all these features

72
00:02:37,000 --> 00:02:41,700
and that dependent variable telling
if the tumor is benign or malignant.

73
00:02:42,166 --> 00:02:43,300
And remember

74
00:02:43,300 --> 00:02:44,833
that we had different

75
00:02:44,833 --> 00:02:48,300
accuracies for each of these models.
With the logistic

76
00:02:48,300 --> 00:02:52,466
regression model,
we had an accuracy of 94.7%.

77
00:02:52,733 --> 00:02:54,033
With the K-nearest neighbors,

78
00:02:54,033 --> 00:02:56,800
we had an accuracy of 94.7%

79
00:02:56,800 --> 00:02:58,200
again. With the

80
00:02:58,200 --> 00:03:00,166
SVM, we had an accuracy

81
00:03:00,166 --> 00:03:02,866
of 94.1%.

82
00:03:02,866 --> 00:03:06,166
With the kernel SVM,
we had a better accuracy actually,

83
00:03:06,300 --> 00:03:09,200
with 95.3%.

84
00:03:09,200 --> 00:03:10,400
With Naive Bayes,

85
00:03:10,400 --> 00:03:12,433
we got a lower accuracy of

86
00:03:12,433 --> 00:03:14,266
94.1% again.

87
00:03:14,266 --> 00:03:17,066
And with decision tree
classification, well,

88
00:03:17,066 --> 00:03:18,133
that was the winner.

89
00:03:18,133 --> 00:03:20,533
We got an amazing accuracy of

90
00:03:20,533 --> 00:03:23,333
95.9%. That was

91
00:03:23,333 --> 00:03:24,766
the number one on the

92
00:03:24,766 --> 00:03:28,566
podium, followed by the kernel SVM
with that accuracy.

93
00:03:28,566 --> 00:03:32,133
And then unfortunately, Random Forest
did not do any

94
00:03:32,133 --> 00:03:33,600
good or, you know,

95
00:03:33,600 --> 00:03:35,166
not better than the others,

96
00:03:35,166 --> 00:03:38,100
because we got a 93.5%

97
00:03:38,100 --> 00:03:39,633
accuracy with it.

98
00:03:39,633 --> 00:03:42,033
And so what I want to do now,

99
00:03:42,033 --> 00:03:46,266
as you probably have
guessed, is to build the XGBoost

100
00:03:46,266 --> 00:03:49,033
model and train it on the same data

101
00:03:49,033 --> 00:03:50,433
set to see

102
00:03:50,433 --> 00:03:51,633
if it can take

103
00:03:51,633 --> 00:03:53,333
the throne held by the

104
00:03:53,333 --> 00:03:56,133
decision tree classification model.

105
00:03:56,133 --> 00:03:57,033
In other words,

106
00:03:57,033 --> 00:03:58,800
to see if it can beat

107
00:03:58,800 --> 00:04:00,066
that accuracy

108
00:04:00,066 --> 00:04:01,800
obtained with the decision tree

109
00:04:01,800 --> 00:04:03,266
classification model.

110
00:04:03,266 --> 00:04:07,033
And well,
maybe that's the good note on which

111
00:04:07,033 --> 00:04:07,433
we will

112
00:04:07,433 --> 00:04:09,400
end the journey of this course.

113
00:04:09,400 --> 00:04:10,233
Are you ready?

114
00:04:10,233 --> 00:04:11,333
So let's do this.

115
00:04:11,333 --> 00:04:16,400
We're now going to build and train
our XGBoost model on the exact same

116
00:04:16,400 --> 00:04:17,366
data set

117
00:04:17,366 --> 00:04:18,866
and see if we can beat,

118
00:04:18,866 --> 00:04:19,633
basically,

119
00:04:19,633 --> 00:04:22,633
an accuracy of 95.9%.

120
00:04:22,866 --> 00:04:23,566
And not only

121
00:04:23,566 --> 00:04:24,100
will we test

122
00:04:24,100 --> 00:04:25,200
that on a single

123
00:04:25,200 --> 00:04:27,366
test set,
but also, of course, now that we

124
00:04:27,366 --> 00:04:28,300
learned k-fold

125
00:04:28,300 --> 00:04:30,600
cross-validation
in the previous section, we

126
00:04:30,600 --> 00:04:33,600
will test this on ten test folds
so that

127
00:04:33,633 --> 00:04:36,166
we can get a relevant measure of
the accuracy

128
00:04:36,166 --> 00:04:37,300
and make sure

129
00:04:37,300 --> 00:04:38,700
that perhaps

130
00:04:38,700 --> 00:04:42,033
XGBoost will now become the number
one on the podium

131
00:04:42,033 --> 00:04:44,866
with the
ultimate machine learning throne.

132
00:04:44,866 --> 00:04:46,800
So let's check this out right now.

133
00:04:46,800 --> 00:04:48,966
Let's open this implementation

134
00:04:48,966 --> 00:04:51,200
with either Google Colaboratory or

135
00:04:51,200 --> 00:04:52,566
Jupyter Notebook.

136
00:04:52,566 --> 00:04:54,100
I'm going to put it last,

137
00:04:54,100 --> 00:04:55,233
you know, just next

138
00:04:55,233 --> 00:04:58,200
to all our other classification models.

139
00:04:58,200 --> 00:05:00,900
And now the notebook just opened,

140
00:05:00,900 --> 00:05:03,600
but it is still in read-only mode.
So we're going to create

141
00:05:03,600 --> 00:05:06,533
a copy right away by clicking Save

142
00:05:06,533 --> 00:05:07,333
a copy in Drive.

143
00:05:07,333 --> 00:05:08,200
You can notice that

144
00:05:08,200 --> 00:05:10,466
all these are copies of the

145
00:05:10,466 --> 00:05:11,966
original implementations,

146
00:05:11,966 --> 00:05:12,833
which are right here.

147
00:05:12,833 --> 00:05:14,066
You know, that's the

148
00:05:14,066 --> 00:05:15,966
model selection folder and then the

149
00:05:15,966 --> 00:05:19,433
classification subfolder,
which I gave you at the end of Part 3.

150
00:05:19,600 --> 00:05:22,000
So you can run these codes again
if you want.

151
00:05:22,000 --> 00:05:23,633
But we already did that.

152
00:05:23,633 --> 00:05:27,200
Just remember that the number one
was indeed decision tree classification

153
00:05:27,200 --> 00:05:30,200
with an accuracy of 95.9%.

154
00:05:30,333 --> 00:05:31,066
And now we're going to

155
00:05:31,066 --> 00:05:34,233
see if we can beat this with XGBoost.

156
00:05:35,166 --> 00:05:35,900
All right.

157
00:05:35,900 --> 00:05:38,366
So of course no worries.

158
00:05:38,366 --> 00:05:40,233
We won't re-implement all this.

159
00:05:40,233 --> 00:05:42,766
We will quickly get to the core

160
00:05:42,766 --> 00:05:44,800
of the implementation, and mostly

161
00:05:44,800 --> 00:05:46,600
the exciting part, which is the

162
00:05:46,600 --> 00:05:49,233
results, in this same tutorial.

163
00:05:49,233 --> 00:05:51,200
Because indeed, all the cells of

164
00:05:51,200 --> 00:05:52,733
this implementation are

165
00:05:52,733 --> 00:05:55,066
just cells
taken from our diverse toolkit.

166
00:05:55,066 --> 00:05:55,533
Right?

167
00:05:55,533 --> 00:05:58,566
These first three cells are,
as you recognize

168
00:05:58,566 --> 00:06:01,566
perfectly, the cells of our data
preprocessing template.

169
00:06:01,566 --> 00:06:01,966
Right.

170
00:06:01,966 --> 00:06:04,733
We first import the libraries
and we import the data set with

171
00:06:04,733 --> 00:06:06,000
the exact same code.

172
00:06:06,000 --> 00:06:08,133
I just put the name of the data
set here.

173
00:06:08,133 --> 00:06:10,633
And then we split the data set
into the training set and

174
00:06:10,633 --> 00:06:11,700
test set.

175
00:06:11,700 --> 00:06:13,833
So this is all the data
preprocessing phase.

176
00:06:13,833 --> 00:06:16,500
Then we train XGBoost on the training set.

177
00:06:16,500 --> 00:06:17,733
And of course I'm going to

178
00:06:17,733 --> 00:06:20,666
delete this cell right away
because that's the cell we

179
00:06:20,666 --> 00:06:22,700
will re-implement together.

180
00:06:22,700 --> 00:06:23,233
And then we

181
00:06:23,233 --> 00:06:27,033
have the other tools of our other toolkits,
like the classification toolkit,

182
00:06:27,300 --> 00:06:30,400
because indeed this cell
makes the confusion matrix

183
00:06:30,400 --> 00:06:33,400
and prints at the same time the accuracy.
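
What that cell computes can be sketched in plain Python. This is an illustration only: the notebook cell presumably relies on scikit-learn for the same job, and the helper names and toy labels below are invented. It assumes a binary problem with labels 0 and 1.

```python
def confusion_matrix_2x2(y_true, y_pred, negative=0, positive=1):
    """Return [[TN, FP], [FN, TP]] for a binary problem."""
    matrix = [[0, 0], [0, 0]]
    index = {negative: 0, positive: 1}
    for actual, predicted in zip(y_true, y_pred):
        # Rows are the actual classes, columns the predicted classes
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Share of predictions that fall on the diagonal of the matrix."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


# Invented labels, not taken from the tumor data set
y_test = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 0, 1, 0, 0, 1, 1, 1]

cm = confusion_matrix_2x2(y_test, y_pred)
print(cm)            # [[3, 1], [1, 3]]
print(accuracy(cm))  # 0.75
```

The off-diagonal entries are the incorrect predictions, so the accuracy is simply the diagonal sum divided by the test set size.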

184
00:06:33,466 --> 00:06:35,400
I actually already deleted the

185
00:06:35,400 --> 00:06:36,666
output to make sure

186
00:06:36,666 --> 00:06:38,233
we get the full surprise

187
00:06:38,233 --> 00:06:39,700
by the end of this tutorial.

188
00:06:39,700 --> 00:06:43,333
And then of course, as I told you,
we are going to apply k-fold

189
00:06:43,333 --> 00:06:44,833
cross-validation right at

190
00:06:44,833 --> 00:06:45,866
the end to make

191
00:06:45,866 --> 00:06:46,366
sure

192
00:06:46,366 --> 00:06:48,900
that indeed we didn't get lucky on the

193
00:06:48,900 --> 00:06:49,466
test set,

194
00:06:49,466 --> 00:06:51,500
you know, if we indeed can beat all

195
00:06:51,500 --> 00:06:54,366
the other algorithms.
So we will not only get a

196
00:06:54,366 --> 00:06:57,200
first measure of the performance
thanks to a single

197
00:06:57,200 --> 00:07:00,000
test set with this cell,
but then we'll also get the

198
00:07:00,000 --> 00:07:01,400
ultimate measure of

199
00:07:01,400 --> 00:07:04,200
the accuracy with that cell.

200
00:07:04,200 --> 00:07:05,066
Are you ready?

201
00:07:05,066 --> 00:07:06,300
Let's start by

202
00:07:06,300 --> 00:07:07,533
building and training

203
00:07:07,533 --> 00:07:09,233
XGBoost on the training

204
00:07:09,233 --> 00:07:12,100
set, which resulted
from the split of the data set

205
00:07:12,100 --> 00:07:14,133
between the training set and test set.

206
00:07:14,133 --> 00:07:16,766
And first, in order
to get the assistance of Google

207
00:07:16,766 --> 00:07:20,866
Colab, well, let's apply
this reflex of uploading the

208
00:07:20,866 --> 00:07:22,533
data into the notebook.

209
00:07:22,533 --> 00:07:23,666
So, you know, I just

210
00:07:23,666 --> 00:07:24,566
clicked this

211
00:07:24,566 --> 00:07:27,033
folder button here,
and then in a second we'll see the

212
00:07:27,033 --> 00:07:29,600
upload button to upload
indeed our data set.

213
00:07:29,600 --> 00:07:30,833
So let's click it.

214
00:07:30,833 --> 00:07:33,233
And now let's go to
our Machine Learning

215
00:07:33,233 --> 00:07:35,066
A-Z Codes and Datasets folder,

216
00:07:35,066 --> 00:07:35,566
because you will

217
00:07:35,566 --> 00:07:36,700
still find the

218
00:07:36,700 --> 00:07:39,400
Data.csv file in this folder, in Part

219
00:07:39,400 --> 00:07:41,866
10, of course.
So let's go into this folder.

220
00:07:41,866 --> 00:07:44,900
Then let's go into Part 10,
then Section 49,

221
00:07:44,900 --> 00:07:45,933
XGBoost,

222
00:07:45,933 --> 00:07:46,666
Python.

223
00:07:46,666 --> 00:07:47,166
And here

224
00:07:47,166 --> 00:07:49,100
is the data set, Data.csv,

225
00:07:49,100 --> 00:07:50,833
of many patients with

226
00:07:50,833 --> 00:07:52,600
tumors for which we have to predict if

227
00:07:52,600 --> 00:07:54,933
the tumor is benign or malignant.

228
00:07:54,933 --> 00:07:56,400
So open.

229
00:07:56,400 --> 00:07:57,533
Okay.

230
00:07:57,533 --> 00:08:00,666
And now the data set is indeed
uploaded into the notebook.

231
00:08:00,666 --> 00:08:01,600
So there we go.

232
00:08:01,600 --> 00:08:05,100
We can implement that cell
and then run the whole code.

233
00:08:05,366 --> 00:08:07,466
All right.
So let's create a new code cell.

234
00:08:07,466 --> 00:08:08,366
And there we go.

235
00:08:08,366 --> 00:08:09,600
Let's build and train

236
00:08:09,600 --> 00:08:11,966
XGBoost on the training set.

237
00:08:11,966 --> 00:08:12,966
So you're going to see that it's

238
00:08:12,966 --> 00:08:14,733
going to be super easy.

239
00:08:14,733 --> 00:08:16,566
And in fact we won't even do it

240
00:08:16,566 --> 00:08:20,233
with scikit-learn,
but with a library called XGBoost,

241
00:08:20,233 --> 00:08:21,000
which we

242
00:08:21,000 --> 00:08:23,033
don't even have to install thanks

243
00:08:23,033 --> 00:08:24,933
to Google Colab, because it is one of

244
00:08:24,933 --> 00:08:27,200
the many packages already

245
00:08:27,200 --> 00:08:30,200
pre-installed on Google Colab.

246
00:08:30,300 --> 00:08:32,200
So we have nothing to worry about

247
00:08:32,200 --> 00:08:35,866
and we can just start building
and training this model.

248
00:08:36,233 --> 00:08:38,500
But first we're going
to import the class

249
00:08:38,500 --> 00:08:41,066
with which we're going to build this,
and this class belongs, of

250
00:08:41,066 --> 00:08:43,233
course, to this XGBoost library.

251
00:08:43,233 --> 00:08:45,633
So there we go. We're going to start from

252
00:08:45,633 --> 00:08:48,000
this xgboost library.

253
00:08:48,000 --> 00:08:49,966
Right, it's written this way, just

254
00:08:49,966 --> 00:08:52,633
like the name of the model,
XGBoost, indeed.

255
00:08:52,633 --> 00:08:54,000
And from this library,

256
00:08:54,000 --> 00:08:55,300
we're going to import,

257
00:08:55,300 --> 00:08:57,733
well, the class that can build an

258
00:08:57,733 --> 00:08:59,366
XGBoost classification

259
00:08:59,366 --> 00:09:02,366
model and which is called XGB...

260
00:09:02,400 --> 00:09:04,200
There we go. Google Colab found it:

261
00:09:04,200 --> 00:09:06,400
XGBClassifier.

262
00:09:06,400 --> 00:09:06,933
All right.

263
00:09:06,933 --> 00:09:08,133
And now the next natural

264
00:09:08,133 --> 00:09:09,833
step, as usual, is to

265
00:09:09,833 --> 00:09:11,366
create an instance of this

266
00:09:11,366 --> 00:09:13,266
class, which will be exactly the

267
00:09:13,266 --> 00:09:16,400
object containing the XGBoost model.

268
00:09:16,533 --> 00:09:19,366
So once again,
we're going to call it classifier.

269
00:09:20,700 --> 00:09:21,300
All right.

270
00:09:21,300 --> 00:09:23,400
And we'll create this classifier as

271
00:09:23,400 --> 00:09:24,700
an instance indeed

272
00:09:24,700 --> 00:09:28,200
of the XGBClassifier class.

273
00:09:28,200 --> 00:09:29,200
Perfect.

274
00:09:29,200 --> 00:09:32,000
And now the good news
is that we won't have too much

275
00:09:32,000 --> 00:09:34,166
to worry about with this class, because

276
00:09:34,166 --> 00:09:36,633
there are not many parameters we need to tune.
Right?

277
00:09:36,633 --> 00:09:37,666
Basically, the default

278
00:09:37,666 --> 00:09:38,466
version of the

279
00:09:38,466 --> 00:09:39,233
XGBoost

280
00:09:39,233 --> 00:09:42,166
model will most of the time
perform super well.

281
00:09:42,166 --> 00:09:44,133
So all good here. And now,

282
00:09:44,133 --> 00:09:44,600
of course,

283
00:09:44,600 --> 00:09:46,933
we finish this by connecting

284
00:09:46,933 --> 00:09:48,533
this XGBoost classifier

285
00:09:48,533 --> 00:09:49,933
to our training set.

286
00:09:49,933 --> 00:09:51,833
And the way to do this is by, of course,

287
00:09:51,833 --> 00:09:52,666
calling the

288
00:09:52,666 --> 00:09:54,166
fit method from our

289
00:09:54,166 --> 00:09:55,766
classifier object, which

290
00:09:55,766 --> 00:09:58,266
will do nothing else than train this

291
00:09:58,266 --> 00:09:59,766
XGBoost classifier

292
00:09:59,766 --> 00:10:01,900
on the training set. Right?

293
00:10:01,900 --> 00:10:04,500
So something we did many times.

294
00:10:04,500 --> 00:10:05,233
So there we go.

295
00:10:05,233 --> 00:10:09,000
Let's do it one last time
in this whole machine learning journey.

296
00:10:09,000 --> 00:10:10,300
But then I'm sure you will do it

297
00:10:10,300 --> 00:10:14,300
many times once again in the future,
in your future machine learning career.

298
00:10:14,466 --> 00:10:17,466
So let's do this. We call our classifier,

299
00:10:17,800 --> 00:10:19,666
from which we're going to call

300
00:10:19,666 --> 00:10:21,000
this fit

301
00:10:21,000 --> 00:10:22,066
method, which will

302
00:10:22,066 --> 00:10:22,933
train the

303
00:10:22,933 --> 00:10:23,966
classifier

304
00:10:23,966 --> 00:10:26,300
on the training set, which is composed

305
00:10:26,300 --> 00:10:28,066
of first the features of the

306
00:10:28,066 --> 00:10:31,066
training set, represented by X_train,
and then

307
00:10:31,333 --> 00:10:32,533
the dependent variable of the

308
00:10:32,533 --> 00:10:36,133
training set, represented by y_train.

309
00:10:36,133 --> 00:10:38,033
And these are exactly, of course,

310
00:10:38,033 --> 00:10:40,066
the inputs of the fit method.
All right.

312
00:10:41,700 --> 00:10:42,700
So in the

313
00:10:42,700 --> 00:10:44,900
flashiest of flashes, we built

314
00:10:44,900 --> 00:10:45,733
and trained

315
00:10:45,733 --> 00:10:47,333
this XGBoost model

316
00:10:47,333 --> 00:10:48,466
on the training set.

317
00:10:48,466 --> 00:10:50,866
We only had to implement
these three lines of code.

318
00:10:50,866 --> 00:10:53,500
And then all the rest is
something we've already done.

319
00:10:53,500 --> 00:10:55,200
You know, this is the confusion matrix.

320
00:10:55,200 --> 00:10:56,333
You have this code

321
00:10:56,333 --> 00:10:58,766
in all of your classification templates.

322
00:10:58,766 --> 00:11:00,766
And finally, of course,
we apply the exact

323
00:11:00,766 --> 00:11:02,766
same cell that we implemented just

324
00:11:02,766 --> 00:11:06,600
before in the previous section
to apply k-fold cross-validation.

325
00:11:06,900 --> 00:11:09,900
So we are ready to run this code,
but just before we do it,

326
00:11:10,033 --> 00:11:11,633
I just want to show you the way to

327
00:11:11,633 --> 00:11:14,266
build an XGBRegressor model,

328
00:11:14,266 --> 00:11:16,900
you know, an XGBoost model
for regression.

329
00:11:16,900 --> 00:11:18,600
It's actually super simple.

330
00:11:18,600 --> 00:11:20,833
The only thing you need to change here

331
00:11:20,833 --> 00:11:22,266
is just, of course, the name of

332
00:11:22,266 --> 00:11:27,166
the class, which wouldn't be
XGBClassifier, but XGB...

333
00:11:27,500 --> 00:11:30,500
and, as you will see, XGBRegressor.

334
00:11:30,833 --> 00:11:31,966
And then, you know, you would just

335
00:11:31,966 --> 00:11:34,333
replace classifier here with regressor.

336
00:11:34,333 --> 00:11:35,333
And then that's it.

337
00:11:35,333 --> 00:11:36,666
This way you will build a

338
00:11:36,666 --> 00:11:39,366
regression model based on XGBoost.
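
The regression variant just described can be sketched the same way. The toy data below is invented (a target that is roughly twice the single feature), and the import guard is again only for machines without xgboost.

```python
try:
    import numpy as np
    from xgboost import XGBRegressor
except ImportError:
    XGBRegressor = None  # xgboost (or numpy) is not installed here

regressor = None
if XGBRegressor is not None:
    # Made-up regression data: one feature, continuous target
    X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
    y_train = np.array([2.1, 3.9, 6.2, 8.0, 9.8, 12.1])

    # Same pattern as the classifier, only the class name changes
    regressor = XGBRegressor()
    regressor.fit(X_train, y_train)
```

The evaluation cells would also need to change for regression, since a confusion matrix and accuracy only make sense for classification.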

339
00:11:39,366 --> 00:11:39,700
All right.

340
00:11:39,700 --> 00:11:43,366
But let's go back to our
XGBClassifier class.

341
00:11:43,366 --> 00:11:44,500
And there we go.

342
00:11:44,500 --> 00:11:48,066
Now we can just save this implementation
and do a Run all

343
00:11:48,066 --> 00:11:49,100
to find out

344
00:11:49,100 --> 00:11:52,033
if XGBoost is going to steal
the throne of

345
00:11:52,033 --> 00:11:54,233
the decision tree classification model

346
00:11:54,233 --> 00:11:56,266
for this particular data set.

347
00:11:56,266 --> 00:11:57,333
So quick reminder:

348
00:11:57,333 --> 00:12:00,966
with the decision tree classification
model we got the best accuracy,

349
00:12:00,966 --> 00:12:04,433
you know, the highest one, of 95.9%.

350
00:12:04,700 --> 00:12:06,800
And now let's find out if we can beat

351
00:12:06,800 --> 00:12:09,166
this with the XGBoost model

352
00:12:09,166 --> 00:12:10,133
trained on the

353
00:12:10,133 --> 00:12:12,666
exact same data set.

354
00:12:12,666 --> 00:12:13,700
All right. So basically,

355
00:12:13,700 --> 00:12:16,200
we're ready.
Now we're just going to do a Run

356
00:12:16,200 --> 00:12:16,733
all by

357
00:12:16,733 --> 00:12:18,966
clicking Runtime here. And now,

358
00:12:18,966 --> 00:12:21,033
are you ready? I bet you are.

359
00:12:21,033 --> 00:12:23,666
So let's do this. Three, two,

360
00:12:23,666 --> 00:12:25,300
one, go.

361
00:12:25,300 --> 00:12:27,000
All right.
So all the cells are running now.

362
00:12:27,000 --> 00:12:29,700
And we get an impressive

363
00:12:29,700 --> 00:12:30,866
accuracy

364
00:12:30,866 --> 00:12:33,433
of 97.8

365
00:12:33,433 --> 00:12:34,466
percent.

366
00:12:34,466 --> 00:12:37,833
When I was telling you that we're going
to end this journey on a good note,

367
00:12:37,833 --> 00:12:39,966
well, I was choosing my words

368
00:12:39,966 --> 00:12:42,466
very, very carefully indeed.

369
00:12:42,466 --> 00:12:44,333
That's just an amazing accuracy.

370
00:12:44,333 --> 00:12:44,733
You know, there are

371
00:12:44,733 --> 00:12:46,166
only three incorrect

372
00:12:46,166 --> 00:12:48,633
predictions on such a sensitive problem,

373
00:12:48,633 --> 00:12:50,100
you know, cancer prediction.

374
00:12:50,100 --> 00:12:51,900
Well, this result is just amazing

375
00:12:51,900 --> 00:12:55,133
here. Indeed, we almost get 98% accuracy

376
00:12:55,133 --> 00:12:57,600
with only these three incorrect
predictions.

377
00:12:57,600 --> 00:12:59,100
That's just amazing.

378
00:12:59,100 --> 00:13:00,266
But now we have to check

379
00:13:00,266 --> 00:13:02,200
one last thing,
because, you know, maybe we

380
00:13:02,200 --> 00:13:05,100
got lucky on this single test set.

381
00:13:05,100 --> 00:13:05,600
Maybe that

382
00:13:05,600 --> 00:13:08,733
single test
set was more favorable to XGBoost

383
00:13:08,733 --> 00:13:09,666
than to the other

384
00:13:09,666 --> 00:13:10,733
classification models,

385
00:13:10,733 --> 00:13:13,666
which could explain
why XGBoost was number one.

386
00:13:13,666 --> 00:13:16,233
And the only way to check
this is by actually

387
00:13:16,233 --> 00:13:17,400
computing other

388
00:13:17,400 --> 00:13:19,366
accuracies on other test sets.

389
00:13:19,366 --> 00:13:21,033
And this is exactly what k-fold

390
00:13:21,033 --> 00:13:22,566
cross-validation is about.
391
00:13:22,566 --> 00:13:23,166
And that is.

392
00:13:23,166 --> 00:13:25,800
Why this is the last cell of this.
Implementation.

393
00:13:25,800 --> 00:13:30,933
And we also have the result for this,
which is, as we can see, still

394
00:13:30,933 --> 00:13:31,833
an amazing

395
00:13:31,833 --> 00:13:35,200
accuracy of 96.53%.

396
00:13:35,566 --> 00:13:36,633
This is of course

397
00:13:36,633 --> 00:13:40,133
an average accuracy,
obtained as a result of the average

398
00:13:40,133 --> 00:13:43,133
of ten different accuracies
measured on ten different test

399
00:13:43,133 --> 00:13:45,100
sets. And besides, we have a

400
00:13:45,100 --> 00:13:47,733
rather small standard deviation of only

401
00:13:47,733 --> 00:13:49,400
2%, which is good

402
00:13:49,400 --> 00:13:53,066
once again for this
sensitive problem of cancer prediction.
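
The averaging step can be sketched with the standard library alone. The ten fold accuracies below are invented for illustration and are not the values obtained in the notebook; only the way the mean and standard deviation are derived from them is the point.

```python
from statistics import fmean, pstdev

# Hypothetical per-fold accuracies, invented for illustration only
fold_accuracies = [0.96, 0.98, 0.95, 0.97, 1.00,
                   0.93, 0.96, 0.98, 0.94, 0.97]

# Average of the ten accuracies
mean_accuracy = fmean(fold_accuracies)
# Population standard deviation, which matches NumPy's default .std()
std_accuracy = pstdev(fold_accuracies)

print("Accuracy: {:.2f} %".format(mean_accuracy * 100))
print("Standard Deviation: {:.2f} %".format(std_accuracy * 100))
```

A small standard deviation means the ten folds agree with each other, so the average is a trustworthy summary of the model's performance.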

403
00:13:53,466 --> 00:13:55,066
So yes, XGBoost

404
00:13:55,066 --> 00:13:56,900
is definitely number one here.

405
00:13:56,900 --> 00:13:59,133
And that's why, my friends,
I'm just super happy

406
00:13:59,133 --> 00:14:02,266
that we end on this good note
with this final, powerful

407
00:14:02,266 --> 00:14:03,100
tool that you get

408
00:14:03,100 --> 00:14:04,566
in your machine learning toolkit,

409
00:14:04,566 --> 00:14:08,133
because now you can start your post
machine learning journey, you know, for

410
00:14:08,133 --> 00:14:10,533
your career, in full confidence.

411
00:14:10,533 --> 00:14:12,000
And about that, these will

412
00:14:12,000 --> 00:14:14,533
be my final words to you in this course.

413
00:14:14,533 --> 00:14:15,066
I wish

414
00:14:15,066 --> 00:14:16,233
you tons of great

415
00:14:16,233 --> 00:14:19,233
success in your future machine
learning projects.

416
00:14:19,233 --> 00:14:20,066
I wish that you

417
00:14:20,066 --> 00:14:22,100
are the talented data scientist

418
00:14:22,100 --> 00:14:25,933
who brings the strongest insights
and the highest value analysis

419
00:14:25,933 --> 00:14:28,333
to your team and to your clients.

420
00:14:28,333 --> 00:14:30,300
Now you're totally able to do this

421
00:14:30,300 --> 00:14:33,766
thanks to your complete
and powerful machine learning toolkit.

422
00:14:33,900 --> 00:14:37,166
With these,
you're totally able to smash your future

423
00:14:37,166 --> 00:14:38,666
machine learning problems.

424
00:14:38,666 --> 00:14:41,166
So once again,
I wish you the best, and I look

425
00:14:41,166 --> 00:14:42,966
forward to seeing you in another

426
00:14:42,966 --> 00:14:45,600
course for a new data science journey.

427
00:14:45,600 --> 00:14:48,533
And until then,
of course, enjoy machine learning.