1
00:00:00,200 --> 00:00:02,333
Hello my friends, and welcome to this new

2
00:00:02,333 --> 00:00:04,333
practical activity of Part

3
00:00:04,333 --> 00:00:06,500
9, Dimensionality Reduction.

4
00:00:06,500 --> 00:00:08,766
So in the previous section
we experimented

5
00:00:08,766 --> 00:00:10,066
with PCA,

6
00:00:10,066 --> 00:00:12,100
principal component analysis,

7
00:00:12,100 --> 00:00:12,700
and we indeed

8
00:00:12,700 --> 00:00:14,966
got great results with our

9
00:00:14,966 --> 00:00:15,800
Wine data

10
00:00:15,800 --> 00:00:16,800
set, which will be

11
00:00:16,800 --> 00:00:19,000
the same data set for this new section,

12
00:00:19,000 --> 00:00:20,400
because, you know, we want to compare

13
00:00:20,400 --> 00:00:22,933
several
dimensionality reduction techniques.

14
00:00:22,933 --> 00:00:25,166
So there we go. We're going to see
if we can even

15
00:00:25,166 --> 00:00:27,800
beat PCA, which only had

16
00:00:27,800 --> 00:00:29,900
one incorrect prediction.

17
00:00:29,900 --> 00:00:31,933
So we're going to work
with the same data set.

18
00:00:31,933 --> 00:00:34,433
And therefore
the implementation will be exactly

19
00:00:34,433 --> 00:00:36,466
the same except one cell,

20
00:00:36,466 --> 00:00:38,233
which will be the cell, of course, where we

21
00:00:38,233 --> 00:00:41,700
implement LDA instead of PCA. All right.

22
00:00:41,800 --> 00:00:42,633
Are you ready?

23
00:00:42,633 --> 00:00:43,833
Let's do this.

24
00:00:43,833 --> 00:00:44,866
Before we get

25
00:00:44,866 --> 00:00:48,166
into this Part 9, let's make
sure everyone here is on the same page.

26
00:00:48,166 --> 00:00:50,733
I gave you the link to this folder
containing all the codes

27
00:00:50,733 --> 00:00:53,166
and data sets right before this tutorial.

28
00:00:53,166 --> 00:00:55,933
So make sure to connect to it.
And now here we go.

29
00:00:55,933 --> 00:00:59,333
Let's enter Part 9,
Dimensionality Reduction.

30
00:00:59,766 --> 00:01:00,833
And now we're going to go into

31
00:01:00,833 --> 00:01:02,000
Section 44,

32
00:01:02,000 --> 00:01:04,100
Linear Discriminant Analysis,

33
00:01:04,100 --> 00:01:06,533
LDA, which will be a new technique

34
00:01:06,533 --> 00:01:08,300
of dimensionality reduction,

35
00:01:08,300 --> 00:01:10,200
very powerful, as we will see.

36
00:01:10,200 --> 00:01:10,866
So let's start with

37
00:01:10,866 --> 00:01:11,466
Python as

38
00:01:11,466 --> 00:01:12,900
usual. And there

39
00:01:12,900 --> 00:01:14,800
we go. This folder, just as the

40
00:01:14,800 --> 00:01:16,833
previous one, has two files.

41
00:01:16,833 --> 00:01:18,233
This is the implementation.

42
00:01:18,233 --> 00:01:21,633
And this is the same Wine data set, which,

43
00:01:21,633 --> 00:01:21,966
you know,

44
00:01:21,966 --> 00:01:25,200
belongs to a wine shop business owner
who first

45
00:01:25,200 --> 00:01:27,800
asked you,
you know, the most talented data scientist,

46
00:01:27,800 --> 00:01:28,500
to do

47
00:01:28,500 --> 00:01:32,233
some clustering to identify
different customer segments

48
00:01:32,233 --> 00:01:33,466
for each of,

49
00:01:33,466 --> 00:01:35,633
you know, the wines of this data set.

50
00:01:35,633 --> 00:01:37,433
You know, each row of this data set

51
00:01:37,433 --> 00:01:39,400
corresponds to a certain wine.

52
00:01:39,400 --> 00:01:42,000
And for each one
we have several wine features,

53
00:01:42,000 --> 00:01:43,466
or, you know, characteristics,

54
00:01:43,466 --> 00:01:45,066
all these up to here.

55
00:01:45,066 --> 00:01:46,100
And you used

56
00:01:46,100 --> 00:01:48,400
all these features to identify

57
00:01:48,400 --> 00:01:50,733
those three customer segments or,
you know,

58
00:01:50,733 --> 00:01:52,333
customer clusters.

59
00:01:52,333 --> 00:01:54,066
And after which, you know, since

60
00:01:54,066 --> 00:01:56,300
this wine shop owner was so happy and

61
00:01:56,300 --> 00:01:59,633
impressed by your job, well,
then of course the owner asked you to do

62
00:01:59,633 --> 00:02:01,500
another mission, which is the

63
00:02:01,500 --> 00:02:04,533
one we're about to do with LDA,
which consists of

64
00:02:04,533 --> 00:02:08,866
building a predictive model combined
with dimensionality reduction

65
00:02:08,866 --> 00:02:12,366
applied to this data set,
so that for each new

66
00:02:12,366 --> 00:02:15,366
wine that this owner has in the wine
shop,

67
00:02:15,500 --> 00:02:16,233
well, by

68
00:02:16,233 --> 00:02:17,000
deploying this

69
00:02:17,000 --> 00:02:21,300
new predictive model, this owner
will be able to predict which customer

70
00:02:21,333 --> 00:02:26,033
segment this new wine belongs to,
so that they can recommend this new wine to

71
00:02:26,033 --> 00:02:29,000
the right customers
and therefore optimize

72
00:02:29,000 --> 00:02:31,033
eventually the sales.

73
00:02:31,033 --> 00:02:31,366
All right.

74
00:02:31,366 --> 00:02:32,200
So that's exactly

75
00:02:32,200 --> 00:02:34,566
the same data set.
And now let's move on to

76
00:02:34,566 --> 00:02:38,300
our implementation,
linear discriminant analysis,

77
00:02:38,400 --> 00:02:39,633
which we can either

78
00:02:39,633 --> 00:02:41,700
open with Google Colaboratory, as

79
00:02:41,700 --> 00:02:44,700
I'm doing now, or Jupyter Notebook.

80
00:02:45,000 --> 00:02:47,800
And as you notice, I kept this

81
00:02:47,800 --> 00:02:49,466
previous implementation we did on

82
00:02:49,466 --> 00:02:51,733
PCA so that we can, you know, compare

83
00:02:51,733 --> 00:02:53,366
the results in the end. Right.

84
00:02:53,366 --> 00:02:55,766
This is PCA, this is LDA.

85
00:02:55,766 --> 00:02:58,100
But you know, since this is in
read-only mode,

86
00:02:58,100 --> 00:02:59,233
we're going to create now

87
00:02:59,233 --> 00:03:00,400
a copy so that we can

88
00:03:00,400 --> 00:03:01,300
re-implement that

89
00:03:01,300 --> 00:03:04,133
cell that belongs to the LDA model.

90
00:03:04,133 --> 00:03:04,800
So there we go.

91
00:03:04,800 --> 00:03:06,600
Save a copy in Drive.

92
00:03:06,600 --> 00:03:08,700
This will create a copy inside

93
00:03:08,700 --> 00:03:10,833
which we'll be able to re-implement

94
00:03:10,833 --> 00:03:12,533
the LDA model.

95
00:03:12,533 --> 00:03:13,100
All right.

96
00:03:13,100 --> 00:03:15,000
And now we can, you know, close this so

97
00:03:15,000 --> 00:03:17,966
that we can have the two implementations
next to each other,

98
00:03:17,966 --> 00:03:19,366
you know, the two copies.

99
00:03:19,366 --> 00:03:20,400
And now let's do this.

100
00:03:20,400 --> 00:03:21,333
Let's quickly

101
00:03:21,333 --> 00:03:22,933
remove, you know, the

102
00:03:22,933 --> 00:03:26,133
cell that implements LDA, this one,

103
00:03:26,400 --> 00:03:27,733
and let's

104
00:03:27,733 --> 00:03:29,433
re-implement this because, you know,

105
00:03:29,433 --> 00:03:30,933
all the rest is the same.

106
00:03:30,933 --> 00:03:31,966
I will actually

107
00:03:31,966 --> 00:03:34,366
remove all these outputs here
so that you don't

108
00:03:34,366 --> 00:03:35,800
see the final results,

109
00:03:35,800 --> 00:03:37,833
and we can keep them as a surprise.

110
00:03:37,833 --> 00:03:40,833
So let me just remove the outputs too.

111
00:03:41,166 --> 00:03:42,366
Don't look too close.

112
00:03:42,366 --> 00:03:43,133
And there we go.

113
00:03:43,133 --> 00:03:43,466
All right.

114
00:03:43,466 --> 00:03:46,200
So basically all the cells of

115
00:03:46,200 --> 00:03:47,666
this implementation are

116
00:03:47,666 --> 00:03:48,733
exactly the same

117
00:03:48,733 --> 00:03:50,700
as the previous one, PCA,

118
00:03:50,700 --> 00:03:52,766
except of course this cell

119
00:03:52,766 --> 00:03:55,333
that implements LDA right here.

120
00:03:55,333 --> 00:03:57,066
So no need to explain all this.

121
00:03:57,066 --> 00:04:00,666
Plus all these other cells result
from our diverse toolkits,

122
00:04:00,666 --> 00:04:03,600
so you're definitely 100% familiar

123
00:04:03,600 --> 00:04:04,800
with them.

124
00:04:04,800 --> 00:04:05,200
All right.

125
00:04:05,200 --> 00:04:06,066
So let's do this.

126
00:04:06,066 --> 00:04:09,066
Let's, you know, apply LDA.

127
00:04:09,133 --> 00:04:11,600
So we're going to create a new code
cell. And there

128
00:04:11,600 --> 00:04:12,333
we go.

129
00:04:12,333 --> 00:04:15,500
Let's implement
linear discriminant analysis.

130
00:04:15,633 --> 00:04:18,266
So now you have two options. The first

131
00:04:18,266 --> 00:04:20,466
and the best option is to press

132
00:04:20,466 --> 00:04:23,033
pause on the video
and try to implement this

133
00:04:23,033 --> 00:04:25,400
yourself by, of course, browsing

134
00:04:25,400 --> 00:04:27,133
the scikit-learn API

135
00:04:27,133 --> 00:04:28,466
to find that

136
00:04:28,466 --> 00:04:31,200
LDA class that can implement that LDA

137
00:04:31,200 --> 00:04:32,933
dimensionality reduction technique.

138
00:04:32,933 --> 00:04:35,200
And you will definitely end up with the

139
00:04:35,200 --> 00:04:38,200
same solution
I will implement in a few seconds.

140
00:04:38,233 --> 00:04:39,566
And the second option

141
00:04:39,566 --> 00:04:41,566
is of course to, well, not

142
00:04:41,566 --> 00:04:45,600
press pause on the video and implement
with me the solution in,

143
00:04:45,600 --> 00:04:51,500
let's say, three seconds: three, two,
one, and go. All right, let's do this.

144
00:04:51,500 --> 00:04:54,166
Let's implement LDA together.

145
00:04:54,166 --> 00:04:55,700
So as I've just said,

146
00:04:55,700 --> 00:04:57,333
we're going to implement LDA

147
00:04:57,333 --> 00:04:58,966
thanks to the scikit-learn library.

148
00:04:58,966 --> 00:05:02,666
Therefore
we're going to start from sklearn,

149
00:05:02,966 --> 00:05:06,566
from which
we're going to get access to, this time,

150
00:05:06,566 --> 00:05:08,400
not, you know,

151
00:05:08,400 --> 00:05:11,000
the, well, let me go to PCA here,

152
00:05:11,000 --> 00:05:15,733
not the decomposition module of scikit-learn,
but a new one, which

153
00:05:15,766 --> 00:05:17,033
is very

154
00:05:17,033 --> 00:05:19,200
easy to remember
because this is actually

155
00:05:19,200 --> 00:05:20,366
discriminant

156
00:05:22,133 --> 00:05:24,400
underscore analysis.

157
00:05:24,400 --> 00:05:26,033
Okay. That's another module

158
00:05:26,033 --> 00:05:28,000
of scikit-learn that contains,

159
00:05:28,000 --> 00:05:29,166
of course, the class

160
00:05:29,166 --> 00:05:32,033
that can implement LDA, and that class,

161
00:05:32,033 --> 00:05:34,966
well, you know, after this import here
we have to

162
00:05:34,966 --> 00:05:36,400
add the name of this class.

163
00:05:36,400 --> 00:05:37,000
And the name of

164
00:05:37,000 --> 00:05:39,166
this class is capital L

165
00:05:39,166 --> 00:05:42,100
and then, very simply, Linear

166
00:05:42,100 --> 00:05:45,100
Discriminant Analysis.

167
00:05:45,966 --> 00:05:48,300
All right. Very good.
The reason why Google

168
00:05:48,300 --> 00:05:48,566
Colab,

169
00:05:48,566 --> 00:05:48,900
by the way,

170
00:05:48,900 --> 00:05:52,700
is not helping me with the suggestions
is because the notebook is not running.

171
00:05:52,700 --> 00:05:56,200
And remember, to run the notebook or,
you know, to connect it,

172
00:05:56,400 --> 00:05:58,600
well, you need to either run any of

173
00:05:58,600 --> 00:06:01,200
the first cells or upload the data set.

174
00:06:01,200 --> 00:06:02,866
So let's do it right now
so that, you know,

175
00:06:02,866 --> 00:06:04,733
Google Colab can assist me.

176
00:06:04,733 --> 00:06:06,933
I really love it when it does it.

177
00:06:06,933 --> 00:06:09,600
So right now
I just clicked on this folder button.

178
00:06:09,600 --> 00:06:12,333
And then let's click the upload button.

179
00:06:12,333 --> 00:06:14,500
And we will end up in the,
you know, previous folder

180
00:06:14,500 --> 00:06:16,033
for principal component analysis.

181
00:06:16,033 --> 00:06:17,800
But let me show you the path again.

182
00:06:17,800 --> 00:06:19,266
I put my Machine Learning

183
00:06:19,266 --> 00:06:21,066
A-Z folder on my desktop.

184
00:06:21,066 --> 00:06:25,066
So inside we're going to go now
to Part 9 and then Section

185
00:06:25,066 --> 00:06:29,900
44, Linear Discriminant Analysis, then Python,
and then Wine.

186
00:06:29,900 --> 00:06:30,233
All right.

187
00:06:30,233 --> 00:06:32,466
So this is exactly
the same dataset as before.

188
00:06:32,466 --> 00:06:34,900
But I just wanted to show you the path.

189
00:06:34,900 --> 00:06:37,900
All right.
And there we go. We have the Wine data set.

190
00:06:38,200 --> 00:06:39,133
And so now I'm going to show

191
00:06:39,133 --> 00:06:40,700
you, if I retype this,

192
00:06:40,700 --> 00:06:43,066
Linear Discriminant,

193
00:06:43,066 --> 00:06:44,433
see, now it is helping me.

194
00:06:44,433 --> 00:06:46,266
So it's maybe better to have, you know,

195
00:06:46,266 --> 00:06:49,333
this reflex to upload the data
set right at the

196
00:06:49,333 --> 00:06:52,200
beginning. Okay. So Linear
Discriminant Analysis.

197
00:06:52,200 --> 00:06:52,866
But since

198
00:06:52,866 --> 00:06:55,600
this class name is actually pretty long
and pretty

199
00:06:55,600 --> 00:07:01,500
impractical, well, let's just,
you know, add a simple shortcut like LDA.
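The import described so far can be sketched as a single line. This is a minimal sketch that assumes scikit-learn is installed; the alias LDA is the shortcut chosen in the video, and the print call is only there to show that the alias still points at the full class.

```python
# Import the LDA class from scikit-learn's discriminant_analysis module
# (not from decomposition, where PCA lives) and give it a short alias.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# The alias refers to the same class under a shorter name.
print(LDA.__name__)
```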

200
00:07:01,500 --> 00:07:04,500
We can do this. That's fine. And now

201
00:07:04,533 --> 00:07:05,600
let's press

202
00:07:05,600 --> 00:07:07,800
Enter to move on to the next step,

203
00:07:07,800 --> 00:07:09,800
which is, of course, naturally

204
00:07:09,800 --> 00:07:10,733
to create

205
00:07:10,733 --> 00:07:12,266
an object of this

206
00:07:12,266 --> 00:07:14,900
Linear Discriminant Analysis class.

207
00:07:14,900 --> 00:07:15,233
All right.

208
00:07:15,233 --> 00:07:17,700
So of course we're going to call it lda.

209
00:07:17,700 --> 00:07:20,100
Right.
And now we're going to call this class.

210
00:07:20,100 --> 00:07:22,333
And since we gave it the shortcut LDA,

211
00:07:22,333 --> 00:07:23,966
well, we can simply call

212
00:07:23,966 --> 00:07:26,366
LDA this way.

213
00:07:26,366 --> 00:07:29,300
And now, well, exactly the same as before,

214
00:07:29,300 --> 00:07:31,500
this LDA class needs to take

215
00:07:31,500 --> 00:07:34,366
as input only one argument,
which is exactly

216
00:07:34,366 --> 00:07:36,433
the same as before, and also

217
00:07:36,433 --> 00:07:38,066
it has the exact same name.

218
00:07:38,066 --> 00:07:41,066
It is n_components,

219
00:07:41,166 --> 00:07:42,833
which corresponds, of course, to the

220
00:07:42,833 --> 00:07:45,733
final number of extracted features
you want to end up

221
00:07:45,733 --> 00:07:47,533
with after applying this

222
00:07:47,533 --> 00:07:49,566
dimensionality reduction technique.
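Creating the object might look like the sketch below. It assumes the same LDA alias as the video and the lowercase variable name lda for the object; the value 2 is the number of extracted features chosen in this tutorial.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Create the feature extractor and ask for two extracted features
# (two linear discriminants), so the results can be plotted in 2D.
lda = LDA(n_components=2)

# Nothing is fitted yet; the object only stores its settings.
print(lda.get_params()["n_components"])
```

One detail worth knowing: scikit-learn's LDA accepts at most min(number of classes - 1, number of features) components, so with three customer segments two is also the largest value that can be requested.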

223
00:07:49,566 --> 00:07:50,566
And of course,

224
00:07:50,566 --> 00:07:51,100
as I

225
00:07:51,100 --> 00:07:53,300
recommended in the previous section,
we're going to start

226
00:07:53,300 --> 00:07:54,466
with two,

227
00:07:54,466 --> 00:07:56,400
so that we can see if, even with only

228
00:07:56,400 --> 00:07:57,900
two extracted features,

229
00:07:57,900 --> 00:07:59,800
well, we can get great results.

230
00:07:59,800 --> 00:08:01,500
And if that's the case, well, not only

231
00:08:01,500 --> 00:08:05,833
will we get great results, but also, cherry on
the cake, we'll be able to visualize the

232
00:08:05,833 --> 00:08:07,000
results on a nice

233
00:08:07,000 --> 00:08:07,766
2D plot

234
00:08:07,766 --> 00:08:08,533
in the end, you know,

235
00:08:08,533 --> 00:08:10,933
thanks to these two code sections.

236
00:08:10,933 --> 00:08:11,700
All right.

237
00:08:11,700 --> 00:08:13,666
But right now we need to finish this.

238
00:08:13,666 --> 00:08:14,700
So there we go.

239
00:08:14,700 --> 00:08:17,100
We're going to extract only two features

240
00:08:17,100 --> 00:08:19,866
in the end. And to
do this we need now, of

241
00:08:19,866 --> 00:08:21,500
course, to connect our

242
00:08:21,500 --> 00:08:23,900
LDA object to our data set,

243
00:08:23,900 --> 00:08:27,100
but once again separately, the training set
and the test set.

244
00:08:27,500 --> 00:08:28,800
And to connect it, well,

245
00:08:28,800 --> 00:08:29,833
of course we need to apply

246
00:08:29,833 --> 00:08:32,266
the fit_transform method

247
00:08:32,266 --> 00:08:32,733
to the

248
00:08:32,733 --> 00:08:36,833
training set and then only the transform
method on the test set.

249
00:08:36,833 --> 00:08:39,233
That's for the exact same reason
as before.

250
00:08:39,233 --> 00:08:39,700
It is to

251
00:08:39,700 --> 00:08:42,600
avoid information
leakage from the test set.

252
00:08:42,600 --> 00:08:45,300
All right. So let's do this.
That's our next step here.

253
00:08:45,300 --> 00:08:46,133
So we're going to take

254
00:08:46,133 --> 00:08:46,766
first X_train,

255
00:08:46,766 --> 00:08:49,466
right, which we're going to update

256
00:08:49,466 --> 00:08:51,200
to become the new X_train

257
00:08:51,200 --> 00:08:54,600
after we apply
this LDA feature extraction technique.

258
00:08:54,900 --> 00:08:56,400
And to do this, well, we need to take,

259
00:08:56,400 --> 00:09:00,266
of course, our lda object, from which

260
00:09:00,266 --> 00:09:03,300
we're going to call the fit_transform

261
00:09:04,266 --> 00:09:05,233
method,

262
00:09:05,233 --> 00:09:06,633
which will take as input,

263
00:09:06,633 --> 00:09:08,333
well, here, be careful.

264
00:09:08,333 --> 00:09:10,600
It's not going to be exactly
the same input

265
00:09:10,600 --> 00:09:12,500
as before, because, you know, with

266
00:09:12,500 --> 00:09:13,933
PCA, the fit_transform

267
00:09:13,933 --> 00:09:16,800
method
took only X_train as input,

268
00:09:16,800 --> 00:09:21,300
because it only needs the features
to apply this PCA

269
00:09:21,333 --> 00:09:23,200
dimensionality reduction technique.

270
00:09:23,200 --> 00:09:25,500
But LDA is actually different.

271
00:09:25,500 --> 00:09:28,300
In order to apply the technique,
it needs not

272
00:09:28,300 --> 00:09:30,300
only the features but also the

273
00:09:30,300 --> 00:09:32,000
dependent variable. Right?

274
00:09:32,000 --> 00:09:33,766
The dependent variable is a required

275
00:09:33,766 --> 00:09:35,233
element inside the

276
00:09:35,233 --> 00:09:36,633
equation of LDA.

277
00:09:36,633 --> 00:09:38,666
And therefore here in the fit_transform

278
00:09:38,666 --> 00:09:40,466
method, we need to

279
00:09:40,466 --> 00:09:44,333
input not only X_train,
the old version of X_train

280
00:09:44,333 --> 00:09:46,500
before we apply LDA,

281
00:09:46,500 --> 00:09:48,933
but also y_train.

282
00:09:48,933 --> 00:09:50,933
All right, so be very careful with this.

283
00:09:50,933 --> 00:09:54,033
Whether you choose to apply LDA
or PCA: for PCA

284
00:09:54,033 --> 00:09:56,066
you only have to input X_train, and for

285
00:09:56,066 --> 00:09:58,533
LDA you have to input both the features, X_train,

286
00:09:58,533 --> 00:10:01,033
and the dependent variable, y_train.
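The difference between the two calls can be sketched side by side. This example is an illustration under assumptions: it uses scikit-learn's built-in load_wine data as a stand-in for the course's Wine file, and the variable names X_pca and X_lda are hypothetical.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Stand-in data (assumption): 13 wine features and 3 class labels.
X, y = load_wine(return_X_y=True)

# PCA is unsupervised: fit_transform only needs the features.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA is supervised: fit_transform needs the features AND the labels.
X_lda = LDA(n_components=2).fit_transform(X, y)

# Both results have one row per wine and two extracted features.
print(X_pca.shape, X_lda.shape)
```

The labels matter because LDA looks for the directions that best separate the classes, while PCA only looks at the variance of the features.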

287
00:10:01,033 --> 00:10:02,633
All right. Final step.

288
00:10:02,633 --> 00:10:04,633
Well, now that we have an

289
00:10:04,633 --> 00:10:08,100
LDA feature extractor object fitted

290
00:10:08,133 --> 00:10:12,000
to the training set,
well, we can apply it to the test set

291
00:10:12,000 --> 00:10:15,200
by only calling the transform
method. Right.

292
00:10:15,200 --> 00:10:16,633
It doesn't make sense to

293
00:10:16,633 --> 00:10:18,366
fit it again to the test set,

294
00:10:18,366 --> 00:10:21,966
because the test set
is supposed to be new data on which we

295
00:10:21,966 --> 00:10:24,166
deploy our model, like in production.

296
00:10:24,166 --> 00:10:27,433
Therefore, we must only apply
the transform method here.

297
00:10:27,666 --> 00:10:29,700
And therefore I'm updating our

298
00:10:29,700 --> 00:10:31,333
X_test variable the

299
00:10:31,333 --> 00:10:32,166
following way,

300
00:10:32,166 --> 00:10:34,600
by first calling our lda object,

301
00:10:34,600 --> 00:10:39,966
from which we're only going
to call the transform method.

302
00:10:40,400 --> 00:10:42,066
And now, according to you, does

303
00:10:42,066 --> 00:10:44,033
it need to take only X_test

304
00:10:44,033 --> 00:10:44,700
as input,

305
00:10:44,700 --> 00:10:46,933
or X_test and y_test?

306
00:10:46,933 --> 00:10:48,466
Well, obviously

307
00:10:48,466 --> 00:10:49,500
it only needs

308
00:10:49,500 --> 00:10:52,566
to take X_test,
because we're not supposed to have y_test.

309
00:10:52,566 --> 00:10:54,733
You know, X_test is like new data

310
00:10:54,733 --> 00:10:56,766
on which we're going to deploy our model.

311
00:10:56,766 --> 00:10:57,833
Then we'll get our

312
00:10:57,833 --> 00:11:00,633
predictions in y_pred,
and we'll compare

313
00:11:00,633 --> 00:11:02,000
y_pred to y_test.

314
00:11:02,000 --> 00:11:05,400
But we're not supposed to have y_test,
because y_test holds the real results

315
00:11:05,533 --> 00:11:07,400
that contain the hidden truth,

316
00:11:07,400 --> 00:11:08,733
you know, the ground truth.

317
00:11:08,733 --> 00:11:10,033
So of course here

318
00:11:10,033 --> 00:11:12,400
we only need to pass X_test.

319
00:11:12,400 --> 00:11:14,566
And the reason why we could enter y_train

320
00:11:14,566 --> 00:11:18,566
here is because indeed, we are supposed
to have the ground truth of the

321
00:11:18,566 --> 00:11:19,366
training set.

322
00:11:19,366 --> 00:11:20,100
Otherwise we

323
00:11:20,100 --> 00:11:20,566
wouldn't be

324
00:11:20,566 --> 00:11:23,433
able to train our machine learning model.

325
00:11:23,433 --> 00:11:26,066
All right. So X_test, and there we go.
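Put together, the LDA cell could be sketched as follows. This is not the course notebook itself: load_wine stands in for the Wine file, and the split ratio, random_state, and feature scaling step are assumptions about the surrounding preprocessing cells.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Stand-in data and preprocessing (assumptions, not the course notebook).
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # fit the scaler on the training set only
X_test = sc.transform(X_test)

# The LDA cell: fit on the training set with its labels,
# then only transform the test set (no y_test, no refitting).
lda = LDA(n_components=2)
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)

print(X_train.shape, X_test.shape)
```

Calling only transform on the test set keeps the test labels and test statistics out of the fitted extractor, which is the information leakage concern mentioned above.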

326
00:11:26,066 --> 00:11:27,500
Not only is the implementation

327
00:11:27,500 --> 00:11:30,166
of LDA over, but also the whole

328
00:11:30,166 --> 00:11:33,000
implementation is over as well.

329
00:11:33,000 --> 00:11:35,166
So now we're going to do this: Run

330
00:11:35,166 --> 00:11:36,700
all, now that we, you know,

331
00:11:36,700 --> 00:11:39,300
uploaded that data set into the notebook.

332
00:11:39,300 --> 00:11:41,000
So we are 100% ready.

333
00:11:41,000 --> 00:11:45,766
And let's just recall what we want
to improve compared to previously.

334
00:11:46,033 --> 00:11:49,666
Well, you know, in the principal component
analysis implementation,

335
00:11:50,000 --> 00:11:50,900
we had,

336
00:11:50,900 --> 00:11:55,200
when obtaining the confusion
matrix, only one incorrect prediction,

337
00:11:55,200 --> 00:11:55,800
resulting

338
00:11:55,800 --> 00:11:59,300
in an accuracy of 97%, and in the

339
00:11:59,666 --> 00:12:01,133
test set results, which are the

340
00:12:01,133 --> 00:12:02,633
most interesting ones,

341
00:12:02,633 --> 00:12:04,866
well, we had indeed an almost

342
00:12:04,866 --> 00:12:07,633
perfect separation of the three classes.

343
00:12:07,633 --> 00:12:12,733
And now we're going to see if,
with our new features extracted from LDA,

344
00:12:12,900 --> 00:12:15,933
well, we can get a perfect separation
of the classes

345
00:12:15,933 --> 00:12:18,566
and therefore 100%

346
00:12:18,566 --> 00:12:19,866
accuracy.

347
00:12:19,866 --> 00:12:20,766
Are you ready?

348
00:12:20,766 --> 00:12:21,933
Let's do this.

349
00:12:21,933 --> 00:12:26,033
Three, two, one, go: Run all.

350
00:12:26,033 --> 00:12:27,800
Now all the cells are running,
and there we go.

351
00:12:27,800 --> 00:12:29,300
Oh, there we go.

352
00:12:29,300 --> 00:12:32,800
We just had a 100% accuracy.

353
00:12:32,966 --> 00:12:34,966
So in other words, the logistic

354
00:12:34,966 --> 00:12:35,933
regression model was

355
00:12:35,933 --> 00:12:41,100
totally able to classify
perfectly our three classes by separating

356
00:12:41,100 --> 00:12:42,166
them from one another.
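The evaluation step can be sketched end to end as shown here. It is an illustration under assumptions: load_wine replaces the course's Wine file, and the split settings and LogisticRegression parameters are guesses at the notebook's defaults, so the numbers printed may differ from those in the video.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

# Stand-in data and preprocessing (assumptions).
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Reduce to two linear discriminants.
lda = LDA(n_components=2)
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)

# Train logistic regression on the two extracted features.
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)

# Compare predictions with the ground truth of the test set.
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print(cm)
print(acc)
```

Off-diagonal entries of the confusion matrix count the incorrect predictions, so a matrix with zeros everywhere off the diagonal corresponds to the perfect separation described in the video.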

357
00:12:42,166 --> 00:12:44,400
And that's exactly
what we're going to see

358
00:12:44,400 --> 00:12:47,300
in, you know, the test set results.

359
00:12:47,300 --> 00:12:48,266
Because indeed,

360
00:12:48,266 --> 00:12:50,600
well, we almost had an incorrect one here,

361
00:12:50,600 --> 00:12:52,800
but as we see, the real

362
00:12:52,800 --> 00:12:53,733
ones, you know,

363
00:12:53,733 --> 00:12:56,033
which are all the points here, red,

364
00:12:56,033 --> 00:12:59,133
green, and blue, fall into the right

365
00:12:59,266 --> 00:13:00,366
prediction regions:

366
00:13:00,366 --> 00:13:02,366
that red prediction region where

367
00:13:02,366 --> 00:13:05,466
our model predicts that the wine belongs
to customer segment number one,

368
00:13:05,966 --> 00:13:09,600
then this one where our model predicts
that the wines belong to customer

369
00:13:09,600 --> 00:13:10,800
segment number two,

370
00:13:10,800 --> 00:13:13,600
and finally, this prediction region
where our model

371
00:13:13,600 --> 00:13:16,266
predicts
that the wines should be recommended to

372
00:13:16,266 --> 00:13:18,933
customer segment number three.

373
00:13:18,933 --> 00:13:20,200
And thanks to these

374
00:13:20,200 --> 00:13:22,133
new extracted features,

375
00:13:22,133 --> 00:13:23,400
you know, LD1 and LD

376
00:13:23,400 --> 00:13:26,433
2, well, this time
we have a perfect class

377
00:13:26,433 --> 00:13:27,766
separator. In other words,

378
00:13:27,766 --> 00:13:30,000
we have a perfect classifier.

379
00:13:30,000 --> 00:13:31,166
And if you're wondering

380
00:13:31,166 --> 00:13:32,866
how LDA managed to

381
00:13:32,866 --> 00:13:34,533
separate perfectly the classes,

382
00:13:34,533 --> 00:13:35,666
whereas, you know, in

383
00:13:35,666 --> 00:13:38,466
PCA we could see that
it was very difficult to

384
00:13:38,466 --> 00:13:41,233
separate,
you know, the wines of the test set,

385
00:13:41,233 --> 00:13:42,566
you know, especially this one,

386
00:13:42,566 --> 00:13:45,533
this one that falls in the middle of
the red wines,

387
00:13:45,533 --> 00:13:48,366
well, that's because the extracted
features are different.

388
00:13:48,366 --> 00:13:51,366
You know,
they're not the same as PC1 and PC2.

389
00:13:51,566 --> 00:13:54,733
They are, you know,
in some other dimensions in which,

390
00:13:54,733 --> 00:13:55,233
well,

391
00:13:55,233 --> 00:13:56,333
this time it is

392
00:13:56,333 --> 00:13:59,333
possible to separate
perfectly the classes.

393
00:13:59,333 --> 00:14:00,900
And that's why this time it works.

394
00:14:00,900 --> 00:14:03,900
We are, in other words, in another
dimension.

395
00:14:04,100 --> 00:14:05,100
Okay, so

396
00:14:05,100 --> 00:14:08,700
I guess now we don't have much
of a challenge because it's impossible

397
00:14:08,700 --> 00:14:10,666
to beat this. This is just perfect.

398
00:14:10,666 --> 00:14:13,433
I remind you that
I did not make this data set.

399
00:14:13,433 --> 00:14:14,266
You know, it's a data set

400
00:14:14,266 --> 00:14:15,066
taken from

401
00:14:15,066 --> 00:14:17,100
the UCI ML Repository.

402
00:14:17,100 --> 00:14:19,700
So, you know, it's
very close to a real world data set.

403
00:14:19,700 --> 00:14:22,200
But there you go. That shows the power
of this

404
00:14:22,200 --> 00:14:25,700
dimensionality reduction technique, linear
discriminant analysis.

405
00:14:26,166 --> 00:14:29,400
So now we're going to move on
to the next practical activity.

406
00:14:29,400 --> 00:14:32,800
the next section, this time on kernel PCA.

407
00:14:33,033 --> 00:14:33,900
And we can just

408
00:14:33,900 --> 00:14:37,866
hope that we'll get, you know,
at least as good results as PCA

409
00:14:37,900 --> 00:14:41,400
or even as good results as LDA. In other

410
00:14:41,400 --> 00:14:42,966
words, let's hope that we get

411
00:14:42,966 --> 00:14:45,966
at most one incorrect prediction.

412
00:14:46,100 --> 00:14:47,366
So I look forward to seeing you

413
00:14:47,366 --> 00:14:50,200
in this next section
to implement kernel PCA.

414
00:14:50,200 --> 00:14:52,200
And until then, enjoy machine learning.