1
00:00:00,166 --> 00:00:02,100
Hello my friends, and welcome to the

2
00:00:02,100 --> 00:00:04,266
final practical activity of

3
00:00:04,266 --> 00:00:07,266
this Part 9, Dimensionality Reduction.

4
00:00:07,333 --> 00:00:08,966
We already built two

5
00:00:08,966 --> 00:00:10,166
dimensionality reduction

6
00:00:10,166 --> 00:00:12,433
models: first, principal

7
00:00:12,433 --> 00:00:16,033
component analysis,
and then second, linear discriminant

8
00:00:16,033 --> 00:00:17,133
analysis.

9
00:00:17,133 --> 00:00:18,333
We got amazing

10
00:00:18,333 --> 00:00:22,333
results with both, but slightly better
results, and actually perfect results,

11
00:00:22,333 --> 00:00:24,733
with linear discriminant analysis.

12
00:00:24,733 --> 00:00:27,300
So now we're
hoping that with our third tool of

13
00:00:27,300 --> 00:00:29,066
the dimensionality reduction

14
00:00:29,066 --> 00:00:32,133
toolkit,
we get at least the same thing as PCA

15
00:00:32,133 --> 00:00:35,133
or that same perfect result as with LDA.

16
00:00:35,400 --> 00:00:37,000
And you might guess that, since

17
00:00:37,000 --> 00:00:40,900
now we're about to add a kernel,
and as we saw with SVM

18
00:00:40,900 --> 00:00:44,000
and kernel SVM, adding a kernel
always improves the result,

19
00:00:44,166 --> 00:00:45,900
well,
you might guess that we are about to

20
00:00:45,900 --> 00:00:48,000
get amazing results as well.

21
00:00:48,000 --> 00:00:48,566
All right.

22
00:00:48,566 --> 00:00:50,866
So let's start.
Let's build that final model.

23
00:00:50,866 --> 00:00:54,033
But before this let's make sure
everyone here is on the same page.

24
00:00:54,033 --> 00:00:57,166
I gave you the link to this whole folder
right before this tutorial.

25
00:00:57,166 --> 00:00:58,666
So make sure to connect to it.

26
00:00:58,666 --> 00:00:59,766
And now here we go.

27
00:00:59,766 --> 00:01:01,100
Let's enter Part

28
00:01:01,100 --> 00:01:01,800
9 and then

29
00:01:01,800 --> 00:01:04,800
Section 45, Kernel PCA.

30
00:01:05,100 --> 00:01:07,266
And as usual
we're going to start with Python.

31
00:01:07,266 --> 00:01:10,033
And this Python folder contains two files.

32
00:01:10,033 --> 00:01:11,166
First, the kernel

33
00:01:11,166 --> 00:01:14,166
PCA implementation in the .ipynb format.

34
00:01:14,166 --> 00:01:16,233
And of course the same data set,

35
00:01:16,233 --> 00:01:19,700
Wine.csv, which is a data set of

36
00:01:19,700 --> 00:01:21,233
many wines, many different

37
00:01:21,233 --> 00:01:24,200
wines. Each row corresponds to a wine.

38
00:01:24,200 --> 00:01:25,333
And for each of these wines,

39
00:01:25,333 --> 00:01:29,433
we have these features
from the alcohol level to the proline.

40
00:01:29,700 --> 00:01:30,733
And then for each

41
00:01:30,733 --> 00:01:33,633
of these wines,
we also have the customer segment,

42
00:01:33,633 --> 00:01:36,733
which is the segment of customers to which

43
00:01:37,000 --> 00:01:40,800
the wine belongs, in the sense
that all the customers of each

44
00:01:40,800 --> 00:01:43,200
segment,
and we have three segments in total,

45
00:01:43,200 --> 00:01:46,133
have the same preference for such wines.

46
00:01:46,133 --> 00:01:46,833
Okay.

47
00:01:46,833 --> 00:01:48,400
And so now the challenge is

48
00:01:48,400 --> 00:01:50,100
to build a logistic regression

49
00:01:50,100 --> 00:01:53,166
model, combined with some dimensionality
reduction technique

50
00:01:53,166 --> 00:01:55,000
applied to this data set, so

51
00:01:55,000 --> 00:01:56,133
that we can end up with

52
00:01:56,133 --> 00:01:58,300
a less complex data set, which

53
00:01:58,300 --> 00:02:01,333
at the same time
will provide an excellent way

54
00:02:01,333 --> 00:02:04,000
for the logistic regression model
to learn

55
00:02:04,000 --> 00:02:07,233
the correlations
between all these features and

56
00:02:07,566 --> 00:02:09,366
the dependent variable.

57
00:02:09,366 --> 00:02:09,833
All right.

58
00:02:09,833 --> 00:02:11,100
And then for each new wine, we

59
00:02:11,100 --> 00:02:12,133
will deploy this

60
00:02:12,133 --> 00:02:13,166
predictive model

61
00:02:13,166 --> 00:02:16,200
to predict the customer segment
to which

62
00:02:16,200 --> 00:02:17,666
this wine belongs,

63
00:02:17,666 --> 00:02:18,666
so that the owner of

64
00:02:18,666 --> 00:02:20,700
the wine shop can recommend

65
00:02:20,700 --> 00:02:23,466
each new wine to the right customer.

66
00:02:23,466 --> 00:02:23,833
All right.

67
00:02:23,833 --> 00:02:26,200
So that's the exact same case as before.

68
00:02:26,200 --> 00:02:28,933
And now let's open this implementation

69
00:02:28,933 --> 00:02:31,400
with either Google Colaboratory or

70
00:02:31,400 --> 00:02:32,766
Jupyter Notebook.

71
00:02:32,766 --> 00:02:35,500
As you can see, I kept this PCA

72
00:02:35,500 --> 00:02:39,266
implementation and this LDA implementation
so that we can compare.

73
00:02:39,633 --> 00:02:40,400
And now,

74
00:02:40,400 --> 00:02:41,400
well, as usual, this

75
00:02:41,400 --> 00:02:43,433
implementation is in read-only mode

76
00:02:43,433 --> 00:02:45,000
because you all have access to it.

77
00:02:45,000 --> 00:02:46,866
So let's create a copy by

78
00:02:46,866 --> 00:02:48,000
clicking File here,

79
00:02:48,000 --> 00:02:50,800
then Save a copy in Drive,
because indeed in this

80
00:02:50,800 --> 00:02:51,566
copy we will

81
00:02:51,566 --> 00:02:55,633
re-implement
the cell that implements kernel PCA.

82
00:02:56,133 --> 00:02:58,933
Let's get rid of this
so that we can clearly have the three

83
00:02:58,933 --> 00:03:00,933
dimensionality reduction techniques.

84
00:03:00,933 --> 00:03:03,166
And now there we go. Let's implement

85
00:03:03,166 --> 00:03:04,166
kernel PCA.

86
00:03:04,166 --> 00:03:07,866
But this time, let's upload
the data set first so that

87
00:03:07,866 --> 00:03:08,733
we can, you know,

88
00:03:08,733 --> 00:03:11,233
get the assistance of Google Colab.

89
00:03:11,233 --> 00:03:13,400
Right now
our notebook is connecting to runtime.

90
00:03:13,400 --> 00:03:15,766
There we go. Then let's click upload.

91
00:03:15,766 --> 00:03:16,266
We'll end

92
00:03:16,266 --> 00:03:19,166
up in the Linear
Discriminant Analysis folder.

93
00:03:19,166 --> 00:03:21,800
So let's just do the whole path again.

94
00:03:21,800 --> 00:03:24,666
So this is the whole Machine Learning
A-Z folder.

95
00:03:24,666 --> 00:03:28,900
Let's go inside, and let's go to Part
9, Dimensionality Reduction, and Section

96
00:03:28,900 --> 00:03:30,700
45, Kernel PCA,

97
00:03:30,700 --> 00:03:32,500
Python, and

98
00:03:32,500 --> 00:03:34,266
Wine.csv. Open.

99
00:03:34,266 --> 00:03:36,233
Okay, and there we go.

100
00:03:36,233 --> 00:03:39,466
Now our notebook is connected. Okay.

101
00:03:39,466 --> 00:03:40,766
So now we're going to do two things.

102
00:03:40,766 --> 00:03:42,966
First, we're going to remove that

103
00:03:42,966 --> 00:03:43,633
cell,

104
00:03:43,633 --> 00:03:46,733
you know, put it in the bin
so that we can re-implement it.

105
00:03:46,733 --> 00:03:47,900
But also, let's,

106
00:03:47,900 --> 00:03:49,833
you know, remove all the outputs.

107
00:03:49,833 --> 00:03:51,133
But try not to look at them.

108
00:03:51,133 --> 00:03:51,833
You know.

109
00:03:51,833 --> 00:03:54,166
I hope
I hope you didn't look at the results.

110
00:03:54,166 --> 00:03:56,600
But anyway, I'm sure you expect

111
00:03:56,600 --> 00:03:58,266
an amazing result as well.

112
00:03:58,266 --> 00:04:00,900
So let's just remove the output here.

113
00:04:00,900 --> 00:04:03,466
That's the visualization
of the training set results.

114
00:04:03,466 --> 00:04:06,466
And this one visualization
of the test set results.

115
00:04:06,866 --> 00:04:09,400
All right, then let's take the table of

116
00:04:09,400 --> 00:04:11,833
contents: Applying Kernel PCA.

117
00:04:11,833 --> 00:04:12,833
And there we go.

118
00:04:12,833 --> 00:04:15,066
We are ready to implement this.

119
00:04:15,066 --> 00:04:16,900
So let's create a new code cell.

120
00:04:16,900 --> 00:04:22,400
And now, do we want to re-implement this,
you know, from scratch,

121
00:04:22,500 --> 00:04:24,633
or do we want to be efficient?

122
00:04:24,633 --> 00:04:25,400
And well,

123
00:04:25,400 --> 00:04:27,766
of course,
that's really my spirit as a coder.

124
00:04:27,766 --> 00:04:30,600
As a machine learning programmer,
I always want to be

125
00:04:30,600 --> 00:04:31,400
efficient.

126
00:04:31,400 --> 00:04:33,300
And by that I mean that, you know,

127
00:04:33,300 --> 00:04:34,566
the kernel PCA

128
00:04:34,566 --> 00:04:35,833
implementation is

129
00:04:35,833 --> 00:04:39,233
super close to the PCA implementation,

130
00:04:39,466 --> 00:04:42,066
because basically it will be
almost the same, except

131
00:04:42,066 --> 00:04:43,300
that we will have to add

132
00:04:43,300 --> 00:04:45,700
a kernel as one of the inputs.

133
00:04:45,700 --> 00:04:47,800
So what we're going to do,
you know, in that spirit

134
00:04:47,800 --> 00:04:51,333
of efficiency, is we will go to our PCA

135
00:04:51,333 --> 00:04:54,233
implementation. We will copy that cell,

136
00:04:54,233 --> 00:04:58,033
because you're going to see that it's
going to be almost the same.

137
00:04:58,366 --> 00:05:00,833
So let's paste it here.

138
00:05:00,833 --> 00:05:03,633
And now the only thing
that we have to change

139
00:05:03,633 --> 00:05:05,900
is first the name of the class,

140
00:05:05,900 --> 00:05:08,566
but not the module,
because the class we're about

141
00:05:08,566 --> 00:05:10,833
to import to implement kernel PCA

142
00:05:10,833 --> 00:05:13,400
still belongs to this decomposition
module

143
00:05:13,400 --> 00:05:14,966
of the scikit-learn library.

144
00:05:14,966 --> 00:05:20,400
And that class is, of course, KernelPCA,
just like that.

145
00:05:20,666 --> 00:05:22,833
So that's the class then.

146
00:05:22,833 --> 00:05:25,100
Well, let's give a different
name to the object.

147
00:05:25,100 --> 00:05:27,766
We're not going to call it pca,
but we can call it, you know,

148
00:05:27,766 --> 00:05:30,500
kpca, as you want, you know.

149
00:05:30,500 --> 00:05:32,400
Then of course here,
when we call the class to

150
00:05:32,400 --> 00:05:34,133
create an instance of

151
00:05:34,133 --> 00:05:37,200
this object,
which will be this kpca variable, well,

152
00:05:37,233 --> 00:05:37,700
of course we

153
00:05:37,700 --> 00:05:41,100
need to call the right class,
which is KernelPCA.

154
00:05:41,733 --> 00:05:44,133
And now inside this class, well, same,

155
00:05:44,133 --> 00:05:45,666
we have to choose a number of

156
00:05:45,666 --> 00:05:47,333
extracted features, which is

157
00:05:47,333 --> 00:05:50,066
still given
by this argument, n_components.

158
00:05:50,066 --> 00:05:54,300
But since now we're working with a kernel,
you know, we're doing kernel PCA.

159
00:05:54,533 --> 00:05:55,800
Well, exactly the

160
00:05:55,800 --> 00:05:57,633
same as when we transitioned from

161
00:05:57,633 --> 00:05:59,466
SVM to kernel SVM,

162
00:05:59,466 --> 00:06:01,233
well, we simply need to add a

163
00:06:01,233 --> 00:06:03,833
kernel argument here, and we'll actually

164
00:06:03,833 --> 00:06:04,366
choose

165
00:06:04,366 --> 00:06:07,300
the same kernel as with kernel SVM,
meaning the

166
00:06:07,300 --> 00:06:09,866
RBF kernel, which is the radial basis

167
00:06:09,866 --> 00:06:11,100
function kernel.

168
00:06:11,100 --> 00:06:12,433
So there we go. That's our

169
00:06:12,433 --> 00:06:14,633
second argument here: kernel,

170
00:06:16,000 --> 00:06:17,100
in quotes,

171
00:06:17,100 --> 00:06:20,700
well, 'rbf', radial basis function.

172
00:06:21,366 --> 00:06:24,466
And now let's see
let's see what there is left to change.

173
00:06:24,600 --> 00:06:25,833
So this line is good.

174
00:06:25,833 --> 00:06:27,000
The next line of code,

175
00:06:27,000 --> 00:06:29,033
well, same: in order to perform

176
00:06:29,033 --> 00:06:30,300
the kernel PCA

177
00:06:30,300 --> 00:06:32,166
dimensionality reduction technique,

178
00:06:32,166 --> 00:06:35,266
well, we only need the features, X_train,

179
00:06:35,300 --> 00:06:37,500
and not the dependent variable, y_train.

180
00:06:37,500 --> 00:06:38,200
So all good.

181
00:06:38,200 --> 00:06:42,666
You know, that's the same as PCA,
but not the same as LDA, which required

182
00:06:42,833 --> 00:06:44,500
the dependent variable, y_train.

183
00:06:44,500 --> 00:06:47,000
So all good here. However, be careful: we

184
00:06:47,000 --> 00:06:48,733
renamed our object, not

185
00:06:48,733 --> 00:06:51,100
pca but kpca.

186
00:06:51,100 --> 00:06:53,566
So same here: kpca.

187
00:06:53,566 --> 00:06:56,800
And now my friends,
this implementation is over.
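
[Editor's sketch] The cell described above could look roughly like the following. This is a minimal sketch, not the notebook's actual cell: the random arrays are hypothetical stand-ins for the scaled X_train and X_test built from Wine.csv in earlier cells, n_components=2 is inferred from the two components (PC1 and PC2) discussed later, and the *_kpca variable names are illustrative.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Hypothetical stand-in data: in the notebook, X_train and X_test would be
# the scaled feature matrices (13 features per wine) from earlier cells.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 13))
X_test = rng.normal(size=(10, 13))

# Same module as PCA (sklearn.decomposition), different class, plus a kernel.
kpca = KernelPCA(n_components=2, kernel='rbf')

# Like PCA, and unlike LDA, only the features are needed (no y_train).
X_train_kpca = kpca.fit_transform(X_train)  # fit on the training set only
X_test_kpca = kpca.transform(X_test)        # reuse the fitted projection

print(X_train_kpca.shape, X_test_kpca.shape)
```

The test set is only transformed, never fitted, so no information from it leaks into the extracted features.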

188
00:06:57,100 --> 00:06:58,033
That's what happens,

189
00:06:58,033 --> 00:06:59,633
you know, when we're being efficient:

190
00:06:59,633 --> 00:07:03,066
the implementation is sometimes
completed faster than expected.

191
00:07:03,400 --> 00:07:03,766
And that's

192
00:07:03,766 --> 00:07:06,366
because, as you can see,
kernel PCA is very

193
00:07:06,366 --> 00:07:07,933
similar to PCA,

194
00:07:07,933 --> 00:07:10,466
you know, in terms of its implementation.

195
00:07:10,466 --> 00:07:12,033
Okay. So now we're

196
00:07:12,033 --> 00:07:14,233
just ready to run all again.

197
00:07:14,233 --> 00:07:15,866
We have our data set.

198
00:07:15,866 --> 00:07:17,966
Our implementation is all good.

199
00:07:17,966 --> 00:07:19,100
So let's do this.

200
00:07:19,100 --> 00:07:21,400
Let's click Runtime here,

201
00:07:21,400 --> 00:07:22,233
and then,

202
00:07:22,233 --> 00:07:25,233
three, two, one: Run all. Go.

203
00:07:25,566 --> 00:07:26,200
All right. So now

204
00:07:26,200 --> 00:07:27,400
all the cells are running.

205
00:07:27,400 --> 00:07:29,366
Our logistic regression model is built.

206
00:07:29,366 --> 00:07:30,666
And,

207
00:07:30,666 --> 00:07:34,166
as expected,
well, we get of course an accuracy of

208
00:07:34,166 --> 00:07:37,566
100%. I've really seen some cases

209
00:07:37,566 --> 00:07:40,600
where, you know,
the non-kernel version of the model

210
00:07:40,633 --> 00:07:43,333
beats the kernel version of the model.

211
00:07:43,333 --> 00:07:45,400
It can happen, but it's very rare.

212
00:07:45,400 --> 00:07:45,933
There you go.

213
00:07:45,933 --> 00:07:51,200
Here of course
kernel PCA manages to beat the PCA model.

214
00:07:51,300 --> 00:07:52,333
Thanks to that kernel,

215
00:07:52,333 --> 00:07:55,833
well, we fixed that incorrect prediction
which we had,

216
00:07:56,000 --> 00:07:58,966
remember, in PCA, right here.

217
00:07:58,966 --> 00:08:01,166
So all good here: we get a one

218
00:08:01,166 --> 00:08:02,700
hundred percent accuracy.
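
[Editor's sketch] To try the PCA versus kernel PCA comparison yourself, something like the following may help. It is only a sketch under stated assumptions: scikit-learn's built-in wine data (load_wine) stands in for Wine.csv, and the split settings (test_size=0.2, random_state=0) are assumptions rather than values confirmed in this tutorial, so the accuracies you get may differ from the ones shown in the notebook.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: the built-in wine data stands in for Wine.csv.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # assumed split settings

# Feature scaling, fitted on the training set only.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

reducers = {
    "PCA": PCA(n_components=2),
    "KernelPCA": KernelPCA(n_components=2, kernel="rbf"),
}

accuracies = {}
for name, reducer in reducers.items():
    # Extract two features, then train logistic regression on them.
    Z_train = reducer.fit_transform(X_train)
    Z_test = reducer.transform(X_test)
    classifier = LogisticRegression(random_state=0)
    classifier.fit(Z_train, y_train)
    accuracies[name] = accuracy_score(y_test, classifier.predict(Z_test))

print(accuracies)
```

Running both reducers through the same loop keeps the comparison fair: same split, same scaling, same classifier, and only the dimensionality reduction step changes.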

219
00:08:02,700 --> 00:08:04,500
And now let's have a look at the results
to know

220
00:08:04,500 --> 00:08:08,433
how kernel PCA was able
to separate our classes

221
00:08:08,433 --> 00:08:09,066
in the test

222
00:08:09,066 --> 00:08:10,266
set, right, which

223
00:08:10,266 --> 00:08:13,266
Are new observations
on which the model wasn't trained.

224
00:08:13,500 --> 00:08:15,033
Well, there you go. Those are our two

225
00:08:15,033 --> 00:08:18,000
principal components, PC1 and PC2.

226
00:08:18,000 --> 00:08:20,966
And we are now in a new dimension once again,

227
00:08:20,966 --> 00:08:23,600
you know, because the wines' observation
points

228
00:08:23,600 --> 00:08:24,533
here are

229
00:08:24,533 --> 00:08:28,233
arranged in a different way
than with PCA.

230
00:08:28,266 --> 00:08:31,100
Right? We have a very different arrangement
of the points here.

231
00:08:31,100 --> 00:08:32,333
We can see them more

232
00:08:32,333 --> 00:08:35,333
dispersed than here, right, with our PCA.

233
00:08:35,500 --> 00:08:38,033
Well, that's because
we are in a new dimension.

234
00:08:38,033 --> 00:08:40,933
We are with different dimensions,
PC1 and PC2,

235
00:08:40,933 --> 00:08:43,200
meaning different extracted features.

236
00:08:43,200 --> 00:08:44,933
So it's totally normal that

237
00:08:44,933 --> 00:08:47,766
our observation points,
you know, the wines here, are

238
00:08:47,766 --> 00:08:49,600
arranged in a very different way.

239
00:08:49,600 --> 00:08:51,766
That's because
we are in a different dimension, in

240
00:08:51,766 --> 00:08:54,000
which, well, the logistic regression

241
00:08:54,000 --> 00:08:57,600
model was perfectly able to classify our

242
00:08:57,600 --> 00:08:58,800
observation points with

243
00:08:58,800 --> 00:09:01,800
these three prediction regions.

244
00:09:01,800 --> 00:09:03,566
And similar to LDA, the

245
00:09:03,566 --> 00:09:08,000
observation points are arranged
differently because, once again, these

246
00:09:08,000 --> 00:09:10,233
are some different dimensions.

247
00:09:10,233 --> 00:09:12,633
We are in another dimension here.

248
00:09:12,633 --> 00:09:14,833
Thanks to these extracted features.

249
00:09:14,833 --> 00:09:17,300
So you see, this dimensionality
reduction technique is

250
00:09:17,300 --> 00:09:17,566
pretty

251
00:09:17,566 --> 00:09:18,833
fascinating, because it

252
00:09:18,833 --> 00:09:21,766
basically allows us
to create a new space of

253
00:09:21,766 --> 00:09:24,300
dimensions, and in some new dimension,

254
00:09:24,300 --> 00:09:25,633
well, indeed, we can

255
00:09:25,633 --> 00:09:27,166
perfectly classify

256
00:09:27,166 --> 00:09:29,566
some observations,
as is the case for

257
00:09:29,566 --> 00:09:32,700
linear
discriminant analysis and kernel PCA.

258
00:09:33,366 --> 00:09:35,100
Now, what I recommend for you to

259
00:09:35,100 --> 00:09:38,100
do is to practice this on other
data sets.

260
00:09:38,100 --> 00:09:39,666
So I recommend, for example,

261
00:09:39,666 --> 00:09:41,333
to check out the UCI

262
00:09:41,333 --> 00:09:43,966
ML Repository platform, and go to

263
00:09:43,966 --> 00:09:45,000
the classification

264
00:09:45,000 --> 00:09:47,966
section, and try kernel PCA on other

265
00:09:47,966 --> 00:09:49,033
data sets. And you'll

266
00:09:49,033 --> 00:09:51,100
see, you will end up with similar results,

267
00:09:51,100 --> 00:09:53,866
with some prediction boundaries
like that,

268
00:09:53,866 --> 00:09:55,000
separating well

269
00:09:55,000 --> 00:09:56,833
the classes. Please share your

270
00:09:56,833 --> 00:09:59,266
results in the Q&A,
especially the ones where we

271
00:09:59,266 --> 00:09:59,933
clearly see

272
00:09:59,933 --> 00:10:00,900
an improvement

273
00:10:00,900 --> 00:10:03,900
with kernel PCA with respect to PCA.

274
00:10:03,966 --> 00:10:05,433
You know, maybe you'll find some data

275
00:10:05,433 --> 00:10:07,900
sets where PCA performs poorly, but

276
00:10:07,900 --> 00:10:08,433
then, by

277
00:10:08,433 --> 00:10:11,933
Adding a kernel with kernel PCA,
you will get much better results.

278
00:10:12,066 --> 00:10:13,200
So please share this.

279
00:10:13,200 --> 00:10:16,200
I'm actually very interested
to see what you get.

280
00:10:16,500 --> 00:10:18,066
All right, thanks in advance.

281
00:10:18,066 --> 00:10:19,733
And now congratulations.

282
00:10:19,733 --> 00:10:22,933
This new chapter on dimensionality
reduction is done.

283
00:10:23,266 --> 00:10:24,933
And now we're going to move on
to the final

284
00:10:24,933 --> 00:10:27,000
chapter of the course, Part 10,

285
00:10:27,000 --> 00:10:29,066
Model Selection and Boosting,

286
00:10:29,066 --> 00:10:31,033
where you will learn your three

287
00:10:31,033 --> 00:10:32,600
last and very important

288
00:10:32,600 --> 00:10:34,733
tools, which are: first, k-fold

289
00:10:34,733 --> 00:10:36,000
cross-validation to

290
00:10:36,000 --> 00:10:38,833
evaluate your machine learning models
the best way;

291
00:10:38,833 --> 00:10:41,500
then parameter tuning to find the

292
00:10:41,500 --> 00:10:44,166
best values of your hyperparameters;

293
00:10:44,166 --> 00:10:46,066
and finally, the cherry on the

294
00:10:46,066 --> 00:10:50,833
cake of this course: I will teach you
and give you the XGBoost

295
00:10:50,833 --> 00:10:54,466
model, which is one of the best
and most powerful machine learning

296
00:10:54,466 --> 00:10:57,000
models for regression or classification.

297
00:10:57,000 --> 00:10:59,000
That will complete your master machine

298
00:10:59,000 --> 00:11:00,800
learning toolkit the best way.

299
00:11:00,800 --> 00:11:02,833
And until then, enjoy machine learning.