1
00:00:00,200 --> 00:00:02,500
Hello and welcome to this R tutorial.

2
00:00:02,500 --> 00:00:05,833
So now we know how to implement
two feature extraction techniques.

3
00:00:06,033 --> 00:00:08,100
These are PCA and LDA.

4
00:00:08,100 --> 00:00:11,466
But these feature extraction techniques
work on linear problems.

5
00:00:11,466 --> 00:00:14,400
That is when the data
is linearly separable.

6
00:00:14,400 --> 00:00:17,666
And in this section we are going to see
one new feature extraction technique.

7
00:00:17,666 --> 00:00:19,000
But this time adapted

8
00:00:19,000 --> 00:00:22,433
for nonlinear problems
where the data is non-linearly separable.

9
00:00:23,000 --> 00:00:26,000
So this technique is called kernel PCA.

10
00:00:26,033 --> 00:00:27,233
Kernel PCA is

11
00:00:27,233 --> 00:00:29,066
a kernelized version of PCA,

12
00:00:29,066 --> 00:00:31,300
where we map the data
to a higher dimension

13
00:00:31,300 --> 00:00:32,433
using the kernel trick,

14
00:00:32,433 --> 00:00:35,433
and then from there
we extract some new principal components,

15
00:00:35,600 --> 00:00:38,966
and we are going to see how it manages
to deal with non-linear problems.

16
00:00:39,466 --> 00:00:41,800
So we're not going to work
on the same problem

17
00:00:41,800 --> 00:00:44,866
as we did in the previous sections
with the wine data set,

18
00:00:45,100 --> 00:00:48,400
but we're going to work on the same data
set as the one used in part

19
00:00:48,400 --> 00:00:50,833
three, Classification,
because now we need visuals.

20
00:00:50,833 --> 00:00:52,800
We need to clearly see what happens.

21
00:00:52,800 --> 00:00:53,066
We need

22
00:00:53,066 --> 00:00:55,100
to see how kernel PCA

23
00:00:55,100 --> 00:00:58,566
manages
to extract some new independent variables,

24
00:00:58,566 --> 00:01:00,133
the principal components,

25
00:01:00,133 --> 00:01:02,400
even when the problem is non-linear,
that is,

26
00:01:02,400 --> 00:01:04,566
when the data is not linearly separable.

27
00:01:04,566 --> 00:01:08,066
And this data set that we used in part
three, the social network

28
00:01:08,066 --> 00:01:11,700
Ads data set, well, remember,
it was clearly a nonlinear problem

29
00:01:11,700 --> 00:01:15,133
because nonlinear classifiers
showed much better performance.

30
00:01:15,433 --> 00:01:17,700
So let's take this data set
and let's apply

31
00:01:17,700 --> 00:01:19,166
kernel PCA to see how it

32
00:01:19,166 --> 00:01:21,133
will handle the non-linearity.

33
00:01:21,133 --> 00:01:24,400
So let's find this data
set in our working directory folder.

34
00:01:24,633 --> 00:01:27,200
So we'll go to our Machine
Learning A-Z folder.

35
00:01:27,200 --> 00:01:29,466
Then part nine dimensionality reduction.

36
00:01:29,466 --> 00:01:33,433
And here we are at the last section
of this part nine kernel PCA.

37
00:01:33,800 --> 00:01:36,200
So that's the folder
you want to set as a working directory.

38
00:01:36,200 --> 00:01:39,000
Make sure that you have
the Social_Network_Ads.csv file.

39
00:01:39,000 --> 00:01:40,066
And if that's the case, you're

40
00:01:40,066 --> 00:01:43,833
ready to click on this more button here
to set the folder as working directory.

41
00:01:44,400 --> 00:01:48,600
And now what we're going to do
is take this logistic regression model.

42
00:01:48,633 --> 00:01:52,600
Because you know this logistic regression
model is a linear classifier.

43
00:01:53,100 --> 00:01:56,233
Therefore it will not be appropriate
for our problem because

44
00:01:56,400 --> 00:01:58,466
our data is not linearly separable.

45
00:01:58,466 --> 00:02:02,500
So what we're going to do
is take this linear classifier here.

46
00:02:02,800 --> 00:02:07,366
But we are going to apply kernel PCA
inside of it to see how kernel

47
00:02:07,366 --> 00:02:09,866
PCA will save the situation.

48
00:02:09,866 --> 00:02:13,900
And so you will see that even if we apply
a linear model, well, thanks to kernel

49
00:02:13,900 --> 00:02:14,533
PCA, which

50
00:02:14,533 --> 00:02:15,900
will manage to extract

51
00:02:15,900 --> 00:02:20,166
new principal components adapted for this
non-linearly separable data.

52
00:02:20,333 --> 00:02:22,866
Well, you will see that
we will get amazing results.

53
00:02:22,866 --> 00:02:27,766
So right now let's copy the whole model
here from the top down to the bottom.

54
00:02:28,366 --> 00:02:31,100
Copy, and let's paste it in our kernel

55
00:02:31,100 --> 00:02:34,100
PCA file. All right.

56
00:02:34,133 --> 00:02:38,133
And now basically the only thing
that we have to do is to apply

57
00:02:38,133 --> 00:02:40,500
kernel PCA at the right place.

58
00:02:40,500 --> 00:02:43,533
But before we do that
I would just like us to visualize again

59
00:02:43,533 --> 00:02:47,266
why this linear model is not appropriate
for this data set.

60
00:02:47,600 --> 00:02:50,700
So what we're going to do
is take everything from here.

61
00:02:50,700 --> 00:02:51,500
Because, you know, this

62
00:02:51,500 --> 00:02:55,466
will visualize the training set results
by plotting the prediction regions

63
00:02:55,466 --> 00:02:56,766
and the prediction boundary.

64
00:02:56,766 --> 00:02:59,766
So we're going to take everything
from here up to the top

65
00:02:59,766 --> 00:03:03,400
to you know, import the data set, apply
the preprocessing phase,

66
00:03:03,600 --> 00:03:06,000
fit the logistic regression
to the training set

67
00:03:06,000 --> 00:03:08,333
and eventually
plot the training set results.

68
00:03:08,333 --> 00:03:09,333
So let's do it.

69
00:03:09,333 --> 00:03:11,866
Let's visualize this again very quickly.

70
00:03:11,866 --> 00:03:14,733
And that will give us motivation
to apply kernel PCA.

71
00:03:16,366 --> 00:03:17,266
And here we go.

72
00:03:17,266 --> 00:03:19,133
All executed properly.

73
00:03:19,133 --> 00:03:22,100
So as a reminder
the points are the real observations.

74
00:03:22,100 --> 00:03:24,933
That is our real customers
in the social network

75
00:03:24,933 --> 00:03:27,933
represented by their age
and their estimated salary.

76
00:03:28,233 --> 00:03:30,400
So that's our real observation points.

77
00:03:30,400 --> 00:03:33,766
And our predictions are represented
by these regions.

78
00:03:33,766 --> 00:03:36,133
The red region here
and the green region here.

79
00:03:36,133 --> 00:03:39,833
And basically this red region
is where our model predicts

80
00:03:39,966 --> 00:03:42,833
that the customer
will not click on the ad.

81
00:03:42,833 --> 00:03:43,766
And this green region

82
00:03:43,766 --> 00:03:47,866
here is the region where our model predicts
that the customers will click on the ad.

83
00:03:47,866 --> 00:03:49,766
and buy the SUV.

84
00:03:49,766 --> 00:03:54,266
And so remember, the problem
was that this straight line here is

85
00:03:54,266 --> 00:03:57,933
actually the prediction boundary generated
by the logistic regression model.

86
00:03:58,200 --> 00:04:01,700
But since the logistic regression
model is a linear classifier,

87
00:04:01,700 --> 00:04:04,700
then it has to be a straight line here
separating the data.

88
00:04:04,700 --> 00:04:07,733
And therefore remember the problem is
that it cannot make some kind

89
00:04:07,733 --> 00:04:11,166
of a curve here
to catch these green users

90
00:04:11,166 --> 00:04:14,166
that should be in the green region;
right now they're in the red region.

91
00:04:14,300 --> 00:04:18,133
And so this clearly represents the fact
that our data is not linearly separable,

92
00:04:18,300 --> 00:04:20,133
because we can clearly see that

93
00:04:20,133 --> 00:04:23,700
this prediction boundary here
that plays the role of separator

94
00:04:23,700 --> 00:04:26,400
and that is supposed to separate
the two classes,

95
00:04:26,400 --> 00:04:29,166
well, it cannot separate the two classes
properly because,

96
00:04:29,166 --> 00:04:30,033
as you can see, these

97
00:04:30,033 --> 00:04:32,633
users are not in the right region.

98
00:04:32,633 --> 00:04:36,400
And so now what we're going to do
is not make a non-linear classifier

99
00:04:36,400 --> 00:04:37,700
like we did in part three.

100
00:04:37,700 --> 00:04:41,466
You know, when we made kernel SVM, Naive
Bayes, decision trees or random forest.

101
00:04:41,766 --> 00:04:44,700
Well, what we're going to do now
instead is apply

102
00:04:44,700 --> 00:04:45,700
kernel PCA,

103
00:04:45,700 --> 00:04:49,966
so that we keep a straight line
as the separator, as the prediction

104
00:04:49,966 --> 00:04:51,700
boundary of a linear classifier.

105
00:04:51,700 --> 00:04:55,233
That is still going to be the prediction
boundary of a logistic regression model.

106
00:04:55,666 --> 00:04:58,666
But since we're going to apply
kernel PCA,

107
00:04:58,800 --> 00:05:02,700
well, this will manage to apply some trick,
where the trick is actually

108
00:05:02,700 --> 00:05:06,666
the kernel trick, to map the data
into a higher dimension and then apply

109
00:05:06,666 --> 00:05:07,600
PCA to

110
00:05:07,600 --> 00:05:12,666
extract new components that will be new
dimensions that explain the most variance.

111
00:05:12,900 --> 00:05:15,466
But thanks to this kernel trick, well,

112
00:05:15,466 --> 00:05:16,633
you'll see that we'll manage

113
00:05:16,633 --> 00:05:21,133
to get some new dimensions in which
the data will be linearly separable

114
00:05:21,333 --> 00:05:24,333
even by a linear classifier
like logistic regression.

115
00:05:24,433 --> 00:05:25,500
So let's see that right now.

116
00:05:25,500 --> 00:05:28,433
I can't wait to show you this.
I'm going to close this.

117
00:05:28,433 --> 00:05:31,433
And now let's apply kernel PCA

118
00:05:31,433 --> 00:05:32,800
at the right location.

119
00:05:32,800 --> 00:05:34,666
So you already know what this location is.

120
00:05:34,666 --> 00:05:37,466
It's actually not different than before.

121
00:05:37,466 --> 00:05:39,266
We need to apply kernel PCA

122
00:05:39,266 --> 00:05:41,266
right after the data preprocessing phase

123
00:05:41,266 --> 00:05:44,733
and just before fitting our classifier
like logistic regression

124
00:05:44,733 --> 00:05:46,000
to our training set.

125
00:05:46,000 --> 00:05:46,900
So basically,

126
00:05:46,900 --> 00:05:49,866
we need to apply kernel PCA right here.

127
00:05:49,866 --> 00:05:52,200
So new section here: Applying

128
00:05:53,300 --> 00:05:55,666
Kernel PCA.

129
00:05:55,666 --> 00:05:57,766
And here we go. Let's do it.

130
00:05:57,766 --> 00:05:58,100
All right.

131
00:05:58,100 --> 00:06:01,633
So first we need to install a new package
that is called kernlab,

132
00:06:01,800 --> 00:06:03,566
which I don't think
we've installed before.

133
00:06:03,566 --> 00:06:04,900
So let's do it right now.

134
00:06:04,900 --> 00:06:08,266
So we use the command
install.packages.

135
00:06:08,733 --> 00:06:09,500
Here we go.

136
00:06:09,500 --> 00:06:12,366
And in quotes, kernlab.

137
00:06:12,366 --> 00:06:13,166
All right.

138
00:06:13,166 --> 00:06:15,533
So I think I already have it installed.

139
00:06:15,533 --> 00:06:16,966
Let's check it out.

140
00:06:16,966 --> 00:06:20,033
So here it is: kernlab,
Kernel-Based Machine Learning Lab.

141
00:06:20,033 --> 00:06:21,900
So I will not install it.

142
00:06:21,900 --> 00:06:24,900
But if you want to do it
you just select this line and execute.

143
00:06:25,100 --> 00:06:27,666
So I will just put this line as a comment.

144
00:06:27,666 --> 00:06:28,500
Here we go.

145
00:06:28,500 --> 00:06:32,366
But then since it is not imported
I will import it using the library

146
00:06:33,066 --> 00:06:35,500
command: kernlab.

147
00:06:35,500 --> 00:06:36,200
All right.

148
00:06:36,200 --> 00:06:38,900
And that will import it.

149
00:06:38,900 --> 00:06:41,100
All right. kernlab well imported.

150
00:06:41,100 --> 00:06:44,100
And now let's start applying kernel PCA.

151
00:06:44,766 --> 00:06:46,366
So as for PCA

152
00:06:46,366 --> 00:06:49,966
and LDA, we're going to start
by creating an object which will

153
00:06:49,966 --> 00:06:50,933
be the kernel PCA

154
00:06:50,933 --> 00:06:54,133
object that we will use
to transform our original data

155
00:06:54,133 --> 00:06:57,366
set into this new data set
after using the kernel trick.

156
00:06:57,733 --> 00:07:00,733
So we'll call this object kpca.

157
00:07:00,900 --> 00:07:01,666
And then equals.

158
00:07:01,666 --> 00:07:05,100
And then that's where we use the function
that will create this kernel

159
00:07:05,100 --> 00:07:06,366
PCA object.

160
00:07:06,366 --> 00:07:09,366
So this function is also kpca.

161
00:07:09,366 --> 00:07:10,466
Then parentheses.

162
00:07:10,466 --> 00:07:13,066
And then let's input
the different arguments.

163
00:07:13,066 --> 00:07:14,100
So let's check it out.

164
00:07:14,100 --> 00:07:17,833
Let's press F1
here to have a look at the arguments.

165
00:07:18,633 --> 00:07:20,500
So the first argument is x.

166
00:07:20,500 --> 00:07:24,533
And this is actually the data matrix
or the formula describing the model.

167
00:07:24,766 --> 00:07:27,366
And here I'll give you a little trick
to describe the model

168
00:07:27,366 --> 00:07:29,566
very simply and very efficiently.

169
00:07:29,566 --> 00:07:33,233
We can simply input here a tilde and a dot.

170
00:07:33,566 --> 00:07:34,733
And that will be enough for the

171
00:07:34,733 --> 00:07:37,733
kpca function
to understand what the formula is,

172
00:07:38,033 --> 00:07:42,166
because then we will add
the second argument which is data,

173
00:07:42,300 --> 00:07:46,366
and that is actually the training set
but without the dependent variable.

174
00:07:46,400 --> 00:07:47,800
Because remember, kernel

175
00:07:47,800 --> 00:07:49,800
PCA is just a PCA technique

176
00:07:49,800 --> 00:07:53,866
where we use the kernel trick
to map the data into a higher dimension

177
00:07:53,866 --> 00:07:55,133
and then apply PCA.

178
00:07:55,133 --> 00:07:59,000
Because indeed, in this higher dimension,
the data is linearly separable.

179
00:07:59,266 --> 00:08:01,300
And therefore, since we apply PCA

180
00:08:01,300 --> 00:08:03,600
in this higher dimension, and PCA is an

181
00:08:03,600 --> 00:08:06,733
unsupervised technique, well,
here for the data argument,

182
00:08:06,733 --> 00:08:10,800
we just need to input the training set
but without the dependent variable.

183
00:08:11,033 --> 00:08:11,933
And therefore,

184
00:08:11,933 --> 00:08:13,066
as for PCA, we

185
00:08:13,066 --> 00:08:18,366
input here data equals training_set,
then brackets

186
00:08:18,366 --> 00:08:21,666
to remove the dependent variable
which is indexed by three.

187
00:08:22,033 --> 00:08:25,033
Because we only have two independent
variables.

188
00:08:25,066 --> 00:08:25,533
All right.

189
00:08:25,533 --> 00:08:27,666
And then the next argument is kernel.

190
00:08:27,666 --> 00:08:31,800
So kernel is the kernel
you want to use to apply the kernel trick.

191
00:08:32,000 --> 00:08:36,000
Remember when we studied kernel SVM
we saw that there were several kernels

192
00:08:36,133 --> 00:08:37,700
to use the kernel trick.

193
00:08:37,700 --> 00:08:41,000
And here we're going to use the most
common one which is the Gaussian kernel.

194
00:08:41,200 --> 00:08:43,566
And that is called here rbfdot.

195
00:08:43,566 --> 00:08:45,900
So that's our third argument.

196
00:08:45,900 --> 00:08:50,966
And so here we
input kernel equals 'rbfdot'.

197
00:08:52,066 --> 00:08:52,700
All right.

198
00:08:52,700 --> 00:08:54,600
And then what is the next argument.

199
00:08:54,600 --> 00:08:56,700
The next argument is kpar.

200
00:08:56,700 --> 00:08:58,566
We will actually not use this one.

201
00:08:58,566 --> 00:09:02,466
But then we have a very important argument
that is at the heart

202
00:09:02,666 --> 00:09:04,466
of dimensionality reduction.

203
00:09:04,466 --> 00:09:06,866
That is features
which is the number of features,

204
00:09:06,866 --> 00:09:09,900
the number of principal components
you want to end up with.

205
00:09:10,433 --> 00:09:13,166
So here
of course, we would like to visualize

206
00:09:13,166 --> 00:09:16,600
the training set results
and the test set results in two dimensions.

207
00:09:16,733 --> 00:09:20,500
And to have this in two dimensions,
we need to keep a number of two

208
00:09:20,700 --> 00:09:22,866
new extracted independent variables.

209
00:09:22,866 --> 00:09:24,600
So here the number of

210
00:09:24,600 --> 00:09:27,666
features will be two, as for PCA.

211
00:09:28,100 --> 00:09:32,400
So we will input here features equals 2.

212
00:09:33,066 --> 00:09:33,466
All right.

213
00:09:33,466 --> 00:09:36,166
And that's it for our kpca object.

214
00:09:36,166 --> 00:09:37,600
It is ready to be created

215
00:09:37,600 --> 00:09:41,600
and to be used to transform
our original data set into this new data

216
00:09:41,600 --> 00:09:45,233
set with the new extracted features
derived from kernel PCA.

217
00:09:45,566 --> 00:09:48,900
So let's select this line
and create the object.

218
00:09:49,200 --> 00:09:51,333
Here it is: kpca, well created.

219
00:09:51,333 --> 00:09:55,766
And now let's move on to the next step,
which is to transform our original data

220
00:09:55,766 --> 00:09:58,933
set into this new extracted data set.

221
00:09:59,533 --> 00:10:02,866
So now things are going to look like
what we did with PCA.

222
00:10:03,066 --> 00:10:04,500
But some things are going to change.

223
00:10:04,500 --> 00:10:08,800
So we will do it step by step and we will
see where we need to make some changes.

224
00:10:09,400 --> 00:10:10,933
All right. So first, as for

225
00:10:10,933 --> 00:10:11,500
PCA, we're

226
00:10:11,500 --> 00:10:15,266
going to use the predict function
to transform our original training set

227
00:10:15,466 --> 00:10:17,700
into this new extracted training set.

228
00:10:17,700 --> 00:10:20,966
So this new training set with the new
extracted features derived from

229
00:10:20,966 --> 00:10:21,900
kernel PCA,

230
00:10:21,900 --> 00:10:26,500
we call it training_set_pca.

231
00:10:27,133 --> 00:10:28,566
All right. And then equals.

232
00:10:28,566 --> 00:10:32,200
And then we use the predict function
to do the transformation.

233
00:10:32,533 --> 00:10:35,866
And inside this predict function
we first input our

234
00:10:35,866 --> 00:10:38,866
kpca object, as we did for PCA,

235
00:10:38,866 --> 00:10:42,200
and then the training set,
the original training set.

236
00:10:42,566 --> 00:10:43,200
So let's do it.

237
00:10:43,200 --> 00:10:46,033
training_set, the second one.

238
00:10:46,033 --> 00:10:47,066
All right.

239
00:10:47,066 --> 00:10:48,866
And as opposed to PCA,

240
00:10:48,866 --> 00:10:50,700
and as with LDA,

241
00:10:50,700 --> 00:10:53,033
this will return a matrix.

242
00:10:53,033 --> 00:10:54,933
And we need it as a dataframe.

243
00:10:54,933 --> 00:10:58,333
So as for LDA, we will use the

244
00:10:58,466 --> 00:11:01,700
as.data.frame function.

245
00:11:02,166 --> 00:11:05,166
So parentheses here
and we close the parentheses here

246
00:11:05,633 --> 00:11:09,400
to set this transformed training
set,

247
00:11:09,400 --> 00:11:11,533
training_set_pca, as a data frame.

248
00:11:11,533 --> 00:11:14,366
And as a reminder we're doing this
to give what

249
00:11:14,366 --> 00:11:17,533
the functions we will use
in the next sections expect.

250
00:11:18,066 --> 00:11:19,866
All right. So far so good.

251
00:11:19,866 --> 00:11:23,300
And so now let's select this line
and execute this.

252
00:11:23,700 --> 00:11:25,200
And you're going to see
what's going to happen.

253
00:11:25,200 --> 00:11:29,000
And you're going to understand why
we called this new training set

254
00:11:29,000 --> 00:11:29,833
training_set_pca,

255
00:11:29,833 --> 00:11:32,566
with a different name
than the original training set,

256
00:11:32,566 --> 00:11:33,733
training_set.

257
00:11:33,733 --> 00:11:33,966
All right.

258
00:11:33,966 --> 00:11:34,900
So let's execute.

259
00:11:34,900 --> 00:11:37,500
Here we go. Executed properly.

260
00:11:37,500 --> 00:11:40,400
And now let's have a look
at our

261
00:11:40,400 --> 00:11:42,333
training_set_pca that we just created.

262
00:11:42,333 --> 00:11:46,866
So I'm going to enlarge this so that
we can see which one is training_set_pca.

263
00:11:46,866 --> 00:11:47,866
That's the one.

264
00:11:47,866 --> 00:11:50,866
So let's have a look at this
I'm going to click on it.

265
00:11:50,966 --> 00:11:53,566
And here is our training_set_pca. So as

266
00:11:53,566 --> 00:11:54,566
we can see, it is

267
00:11:54,566 --> 00:11:57,633
composed of only two columns, V1 and V2.

268
00:11:58,200 --> 00:12:00,433
So try to guess what these two columns are.

269
00:12:00,433 --> 00:12:01,933
I'm going to tell you right now,

270
00:12:01,933 --> 00:12:05,966
these two columns are
the principal components that we obtained

271
00:12:06,000 --> 00:12:07,500
through kernel PCA.

272
00:12:07,500 --> 00:12:10,866
That is,
these are our two new extracted features

273
00:12:10,866 --> 00:12:11,400
after all

274
00:12:11,400 --> 00:12:15,233
this mapping into this higher dimension
using the kernel trick and then applying

275
00:12:15,233 --> 00:12:18,366
PCA to the data
set mapped into this higher dimension.

276
00:12:19,100 --> 00:12:21,400
But now the problem is that

277
00:12:21,400 --> 00:12:25,333
in this training set PCA,
we don't have the dependent variable

278
00:12:25,500 --> 00:12:29,033
and we need it for the next sections,
because in our code template

279
00:12:29,266 --> 00:12:33,000
we need to have the independent variables
and the dependent variable.

280
00:12:33,366 --> 00:12:34,366
So what is the next step?

281
00:12:34,366 --> 00:12:40,400
Now the next step is to add the dependent
variable into this training set PCA.

282
00:12:40,700 --> 00:12:44,733
And so the thing to understand here
is that we lost the dependent variable.

283
00:12:45,000 --> 00:12:48,000
But we kept the observations. That is,

284
00:12:48,000 --> 00:12:51,133
this one here
corresponds to the first observation

285
00:12:51,366 --> 00:12:53,800
we had in the original training set.

286
00:12:53,800 --> 00:12:54,500
This one.

287
00:12:54,500 --> 00:12:58,233
So this first observation
here has the zero label

288
00:12:58,566 --> 00:13:01,566
that is that this first customer
didn't buy the SUV.

289
00:13:01,566 --> 00:13:04,033
These were the original independent
variables.

290
00:13:04,033 --> 00:13:08,166
And then if we go to our training set PCA,
well this first customer

291
00:13:08,166 --> 00:13:11,100
is the same first customer
as in this training set.

292
00:13:11,100 --> 00:13:14,400
So it will have the zero label
in the purchase column.

293
00:13:14,700 --> 00:13:17,333
But then these are new extracted features.

294
00:13:17,333 --> 00:13:21,166
So of course we don't get the same values
as for the independent variables

295
00:13:21,166 --> 00:13:22,800
of our original training set.

296
00:13:22,800 --> 00:13:27,300
So what we can do now is simply take
the dependent variable column

297
00:13:27,600 --> 00:13:32,666
Purchased, of this original training set
and add it to our training_set_pca.

298
00:13:32,866 --> 00:13:34,800
Because these observations here

299
00:13:34,800 --> 00:13:37,933
are the same observations
of our original training set.

300
00:13:38,400 --> 00:13:41,200
And so what we need to do
now is very simple.

301
00:13:41,200 --> 00:13:45,166
We just need to take our training set PCA.

302
00:13:45,566 --> 00:13:49,066
Then we're going to add a new column
that we'll call purchased.

303
00:13:49,533 --> 00:13:52,800
So by doing this
you know I'm just creating a new column

304
00:13:53,066 --> 00:13:55,800
that I also called purchased
because this new column is going

305
00:13:55,800 --> 00:13:56,300
to be

306
00:13:56,300 --> 00:13:59,666
the Purchased dependent variable.
And then equals.

307
00:14:00,000 --> 00:14:03,400
And then what I have to do now
is to take the real purchase

308
00:14:03,400 --> 00:14:06,500
dependent variable column
from the original training set.

309
00:14:06,833 --> 00:14:08,500
And we can do that
because the training set

310
00:14:08,500 --> 00:14:09,400
PCA contains

311
00:14:09,400 --> 00:14:13,133
the same observations as the observations
of the original training set.

312
00:14:13,500 --> 00:14:16,500
So here to take the purchase column
of the original training set,

313
00:14:16,800 --> 00:14:21,133
we just need to take our original training
set, which is called training set

314
00:14:21,533 --> 00:14:22,666
and then a dollar sign.

315
00:14:22,666 --> 00:14:25,666
And then that's
where we take the purchased column.

316
00:14:26,000 --> 00:14:29,400
So by doing this
I will add this new column purchased.

317
00:14:29,766 --> 00:14:31,533
And in this new column Purchased,

318
00:14:31,533 --> 00:14:35,800
I will include the values of the Purchased
column of the original training set.
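The two steps just described (transforming the training set, then copying the dependent variable across) can be sketched as follows. The random data frame is a hypothetical stand-in for the scaled training set; the calls to predict, as.data.frame and the $ assignment follow the narration.

```r
library(kernlab)

# Hypothetical stand-in for the scaled training set
set.seed(123)
training_set = data.frame(Age = rnorm(30),
                          EstimatedSalary = rnorm(30),
                          Purchased = rbinom(30, 1, 0.5))

kpca = kpca(~ ., data = training_set[-3], kernel = 'rbfdot', features = 2)

# predict() returns a matrix; as.data.frame() turns it into a data frame
# whose two columns get the default names V1 and V2.
training_set_pca = as.data.frame(predict(kpca, training_set))

# The rows are still in the original order, so the labels of the
# original training set can be copied across directly.
training_set_pca$Purchased = training_set$Purchased

head(training_set_pca)
```

The test set is handled the same way, with test_set in place of training_set and the same kpca object, so that both sets are projected onto the same components.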

319
00:14:36,200 --> 00:14:40,100
So let's check it out
I'm going to select this line and execute.

320
00:14:40,500 --> 00:14:42,100
And now, as you can see,

321
00:14:42,100 --> 00:14:44,400
if I go back to training_set_pca,

322
00:14:44,400 --> 00:14:48,033
this contains the Purchased
column of the original training set.

323
00:14:48,500 --> 00:14:49,100
So that's good.

324
00:14:49,100 --> 00:14:50,333
That's the next step done.

325
00:14:50,333 --> 00:14:53,000
And now
we need to take care of the test set.

326
00:14:53,000 --> 00:14:54,733
And so to take care of the test set

327
00:14:54,733 --> 00:14:57,566
we're going to do exactly
the same as we did for this

328
00:14:57,566 --> 00:14:58,933
training_set_pca.

329
00:14:58,933 --> 00:15:01,633
So let's copy this. Copy.

330
00:15:01,633 --> 00:15:03,500
And let's paste it here.

331
00:15:03,500 --> 00:15:06,800
And of course what we're going to do now
is replace this

332
00:15:06,800 --> 00:15:08,133
training_set_pca

333
00:15:08,133 --> 00:15:10,900
by test_set_pca.

334
00:15:10,900 --> 00:15:12,000
And same here.

335
00:15:12,000 --> 00:15:15,833
We take the original test
set to make the transformation.

336
00:15:16,033 --> 00:15:19,333
And then we're going to add the purchase
column of the original test

337
00:15:19,333 --> 00:15:22,333
set to this new test set.

338
00:15:22,500 --> 00:15:25,500
That is the test set
extracted from kernel PCA.

339
00:15:25,500 --> 00:15:28,333
So test_set here. And that should be okay.

340
00:15:28,333 --> 00:15:30,000
So I'm going to select

341
00:15:30,000 --> 00:15:31,200
these two lines here

342
00:15:31,200 --> 00:15:34,066
and execute. Perfect.

343
00:15:34,066 --> 00:15:36,400
Our new test_set_pca is created.

344
00:15:36,400 --> 00:15:37,666
Let's have a quick check.

345
00:15:37,666 --> 00:15:39,300
So that's the test set.

346
00:15:39,300 --> 00:15:41,533
And that's our test_set_pca,

347
00:15:41,533 --> 00:15:45,300
with the two new extracted features
and the Purchased column.

348
00:15:45,300 --> 00:15:48,500
And now that means
that we correctly applied kernel PCA.

349
00:15:49,033 --> 00:15:49,733
So great.

350
00:15:49,733 --> 00:15:52,000
We are ready to move on
to the next section.

351
00:15:52,000 --> 00:15:55,000
So let's go back to our kernel PCA file.

352
00:15:55,133 --> 00:15:59,033
And let's now fit the logistic regression
to the training set.

353
00:15:59,400 --> 00:16:02,100
Now do we need to change anything
in this code section?

354
00:16:02,100 --> 00:16:05,300
Well yes of course we do
because be careful.

355
00:16:05,300 --> 00:16:09,400
We called our new extracted
training set training_set_pca.

356
00:16:09,600 --> 00:16:14,400
So here for the data argument
we need to specify training set PCA.

357
00:16:14,700 --> 00:16:16,400
So that's
the only thing we need to change here.

358
00:16:16,400 --> 00:16:19,933
So we are ready to select this section
and execute.

359
00:16:20,700 --> 00:16:22,233
All right classifier ready.

360
00:16:22,233 --> 00:16:25,900
And now let's move on to the next section
predicting the test set results.

361
00:16:26,133 --> 00:16:27,766
And of course here that's the same.

362
00:16:27,766 --> 00:16:31,600
We need to replace test
set by test set PCA.

363
00:16:32,033 --> 00:16:35,033
I need to enlarge this
a little bit. Right.

364
00:16:35,766 --> 00:16:36,500
And that's it.

365
00:16:36,500 --> 00:16:40,466
We are ready to execute this section
to predict the test set results.

366
00:16:40,900 --> 00:16:41,700
And here we go.

367
00:16:41,700 --> 00:16:43,800
We get our vector of predictions.

368
00:16:43,800 --> 00:16:46,800
y_pred, for this new test_set_pca.

369
00:16:46,833 --> 00:16:49,133
All right.
Now let's make the confusion matrix.

370
00:16:49,133 --> 00:16:53,066
We of course need to change test
set by test set PCA.

371
00:16:53,833 --> 00:16:55,566
Here we go. And now it's ready.

372
00:16:55,566 --> 00:17:00,766
Now we can execute this line of code
to get the confusion matrix.

373
00:17:01,033 --> 00:17:01,866
And here it is.

374
00:17:01,866 --> 00:17:05,733
We can have a quick look
in the console by pressing Cmd+Enter.

375
00:17:06,066 --> 00:17:09,833
And we get 57 plus 26 equals 83.

376
00:17:10,033 --> 00:17:13,033
And since we have 100 observations
in the test set,

377
00:17:13,033 --> 00:17:15,933
that gives us an 83% accuracy.
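A sketch of the whole pipeline up to this accuracy figure, under stated assumptions: make_set is a hypothetical helper that generates synthetic, non-linearly separable data in place of Social_Network_Ads.csv, so the accuracy it yields will not match the 83% (57 + 26 correct out of 100) obtained in the tutorial. The glm, predict and table calls follow the steps narrated above.

```r
library(kernlab)

# Hypothetical synthetic data: the class depends on the distance from the
# origin, so no straight line separates the two classes in the original space.
set.seed(123)
make_set = function(n) {
  x1 = rnorm(n)
  x2 = rnorm(n)
  data.frame(Age = x1, EstimatedSalary = x2,
             Purchased = as.integer(x1^2 + x2^2 > 1.5))
}
training_set = make_set(300)
test_set = make_set(100)

# Applying Kernel PCA (fitted on the training set only)
kpca = kpca(~ ., data = training_set[-3], kernel = 'rbfdot', features = 2)
training_set_pca = as.data.frame(predict(kpca, training_set))
training_set_pca$Purchased = training_set$Purchased
test_set_pca = as.data.frame(predict(kpca, test_set))
test_set_pca$Purchased = test_set$Purchased

# Fitting Logistic Regression to the transformed training set
classifier = glm(formula = Purchased ~ ., family = binomial,
                 data = training_set_pca)

# Predicting the test set results
prob_pred = predict(classifier, type = 'response', newdata = test_set_pca[-3])
y_pred = ifelse(prob_pred > 0.5, 1, 0)

# Confusion matrix; fixing the factor levels keeps it 2 x 2 even if the
# classifier happens to predict a single class.
cm = table(factor(test_set_pca[, 3], levels = 0:1),
           factor(y_pred, levels = 0:1))

# Accuracy = correct predictions (the diagonal) over all test observations
accuracy = sum(diag(cm)) / sum(cm)
print(cm)
print(accuracy)
```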

378
00:17:15,933 --> 00:17:17,166
So that's pretty good.

379
00:17:17,166 --> 00:17:20,866
And now let's get to the exciting part
visualizing the training set results.

380
00:17:21,200 --> 00:17:23,300
So very quickly what do we need to change.

381
00:17:23,300 --> 00:17:27,300
Remember we need to change the names
of the independent variables and columns

382
00:17:27,300 --> 00:17:28,800
here. That's compulsory.

383
00:17:28,800 --> 00:17:32,333
So as a reminder, the names are V1 and V2.

384
00:17:32,366 --> 00:17:34,500
That's the name
of the independent variables.

385
00:17:34,500 --> 00:17:38,033
So here we need to replace Age by V1

386
00:17:38,433 --> 00:17:41,533
and EstimatedSalary by V2.

387
00:17:42,100 --> 00:17:44,366
And that's not compulsory.

388
00:17:44,366 --> 00:17:46,466
And anyway we already have two good names.

389
00:17:46,466 --> 00:17:49,466
PC1 and PC2. So don't forget about that.

390
00:17:49,466 --> 00:17:52,433
And of course we need to change
the name of the training set

391
00:17:52,433 --> 00:17:54,166
because we called our training set

392
00:17:54,166 --> 00:17:55,800
training_set_pca.

393
00:17:55,800 --> 00:17:58,533
So here I'm adding training_set_pca.

394
00:17:58,533 --> 00:17:59,533
And that's perfect.

395
00:17:59,533 --> 00:18:03,100
That's ready to be executed
to visualize the training set results.

396
00:18:03,566 --> 00:18:05,900
So we will only visualize
the training set results.

397
00:18:05,900 --> 00:18:08,666
But let's make the same changes
for the test set

398
00:18:08,666 --> 00:18:11,133
so that you can have a look at it
yourself.

399
00:18:11,133 --> 00:18:14,366
So same, we're replacing Age by V1,

400
00:18:15,166 --> 00:18:18,000
EstimatedSalary by V2.

401
00:18:18,000 --> 00:18:22,566
And here we replace test_set
by test_set_pca.

402
00:18:23,266 --> 00:18:23,700
All right.

403
00:18:23,700 --> 00:18:26,966
And now let's have a look. I look forward
to showing you what's going to happen.

404
00:18:26,966 --> 00:18:31,133
So I'm going to select everything
from here up to the top here.

405
00:18:31,133 --> 00:18:34,000
That is the whole section
to visualize the training set results.

406
00:18:34,000 --> 00:18:37,066
And let's press Command or Control
plus Enter to execute.

407
00:18:37,700 --> 00:18:40,533
Here we go.
The computations are being run.

408
00:18:42,800 --> 00:18:43,200
All right.

409
00:18:43,200 --> 00:18:45,933
So these are the results of kernel PCA

410
00:18:45,933 --> 00:18:49,966
combined with a logistic regression model
that we applied on the non-linearly

411
00:18:49,966 --> 00:18:51,866
separable data set.

412
00:18:51,866 --> 00:18:55,133
And so we can appreciate the contrast
between the simplicity

413
00:18:55,133 --> 00:18:58,833
of the obtained results and the complexity
of what happened behind the scenes,

414
00:18:59,100 --> 00:19:01,666
because indeed
we have this very simple result here

415
00:19:01,666 --> 00:19:04,666
with these two classes
separated by the straight line.

416
00:19:04,766 --> 00:19:08,700
But what happened behind the scenes
is that our original data set

417
00:19:08,700 --> 00:19:12,766
in our original feature
space, was mapped to a higher dimension

418
00:19:12,766 --> 00:19:16,833
using the kernel trick to avoid
overly compute-intensive computations,

419
00:19:17,133 --> 00:19:18,100
and then by mapping

420
00:19:18,100 --> 00:19:21,866
our data set in the original feature space
to this higher dimension,

421
00:19:22,133 --> 00:19:24,433
well, first,
that created some new dimensions,

422
00:19:24,433 --> 00:19:27,500
and mostly
that created a new feature space

423
00:19:27,633 --> 00:19:30,633
where our data was then
linearly separable.

424
00:19:30,666 --> 00:19:34,666
But by doing that, we had more dimensions
than the original number of dimensions.

425
00:19:34,800 --> 00:19:37,033
So we still needed to apply the PCA

426
00:19:37,033 --> 00:19:41,166
dimensionality reduction technique to
end up with a lower number of dimensions.

427
00:19:41,466 --> 00:19:42,600
So then PCA was

428
00:19:42,600 --> 00:19:46,066
applied to this new feature space
where the data was linearly separable.

429
00:19:46,433 --> 00:19:47,600
And through PCA, some

430
00:19:47,600 --> 00:19:50,466
new extracted
independent variables were created

431
00:19:50,466 --> 00:19:52,866
that are nothing else
than the principal components

432
00:19:52,866 --> 00:19:53,800
of PCA.

433
00:19:53,800 --> 00:19:56,800
And eventually
we obtained this new feature space

434
00:19:56,800 --> 00:20:01,866
formed by these two new extracted
principal components resulting from PCA,

435
00:20:02,166 --> 00:20:04,966
in which
now our data is linearly separable

436
00:20:04,966 --> 00:20:07,966
and much better
separated by a linear classifier.

437
00:20:08,400 --> 00:20:10,566
All right. So that's it for kernel PCA.

438
00:20:10,566 --> 00:20:14,066
And that's also the end of this part,
Dimensionality Reduction.

439
00:20:14,366 --> 00:20:15,933
And I'll see you in the next part.

440
00:20:15,933 --> 00:20:18,933
Part 10, Model Selection and Boosting,

441
00:20:18,966 --> 00:20:22,566
the last part of this course.
We will cover a very exciting algorithm

442
00:20:22,566 --> 00:20:25,566
in machine learning
that is called XGBoost.

443
00:20:25,733 --> 00:20:28,066
So I look forward to seeing you
in this next part.

444
00:20:28,066 --> 00:20:29,900
And until then, enjoy machine learning.