1
00:00:00,266 --> 00:00:01,200
Hello and welcome to

2
00:00:01,200 --> 00:00:04,666
this R tutorial, and welcome to part
nine, Dimensionality Reduction.

3
00:00:05,000 --> 00:00:08,366
So we are starting this part
with our first technique

4
00:00:08,366 --> 00:00:12,900
of dimensionality reduction
which is PCA, principal component analysis.

5
00:00:13,233 --> 00:00:15,166
And so you know
in dimensionality reduction

6
00:00:15,166 --> 00:00:18,500
there are two techniques:
feature selection and feature extraction.

7
00:00:18,766 --> 00:00:22,266
We did feature selection in part two
when we implemented the backward

8
00:00:22,266 --> 00:00:25,866
elimination model
to select the most relevant features

9
00:00:26,066 --> 00:00:27,233
of our matrix of features.

10
00:00:27,233 --> 00:00:30,700
That is, the features that best
explained the dependent variable.

11
00:00:30,933 --> 00:00:33,766
And now we are starting this new technique
of dimensionality

12
00:00:33,766 --> 00:00:36,900
reduction,
which is feature extraction. And PCA,

13
00:00:36,900 --> 00:00:40,766
principal component analysis, is
one feature extraction technique.

14
00:00:41,100 --> 00:00:45,566
So as a reminder let's say your matrix
of features has m independent variables.

15
00:00:45,833 --> 00:00:47,800
Well what PCA will do is that it

16
00:00:47,800 --> 00:00:51,600
will extract a smaller number
of your independent variables.

17
00:00:51,800 --> 00:00:55,433
But these are going to be new independent
variables like new dimensions.

18
00:00:55,800 --> 00:00:58,766
And these new independent variables
extracted are going to be

19
00:00:58,766 --> 00:01:01,466
some new independent variables
that explain most of

20
00:01:01,466 --> 00:01:03,466
the variance of your data set.

21
00:01:03,466 --> 00:01:06,433
And that is regardless
of your dependent variable.

22
00:01:06,433 --> 00:01:10,500
And that makes PCA
an unsupervised model in the sense

23
00:01:10,500 --> 00:01:13,700
that we don't consider
the dependent variable in the model.
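As a minimal sketch of that idea, assuming base R's prcomp() and a synthetic feature matrix (not the course data set; names such as features are made up here), PCA can be run on the features alone and the share of variance per component inspected:
```r
# Synthetic stand-in for a matrix of features (hypothetical data).
set.seed(42)
n <- 100
x1 <- rnorm(n)
features <- data.frame(
  f1 = x1,
  f2 = 2 * x1 + rnorm(n, sd = 0.1),  # strongly correlated with f1
  f3 = rnorm(n),
  f4 = rnorm(n)
)
# prcomp() only receives the features; no dependent variable is passed in.
pca <- prcomp(features, center = TRUE, scale. = TRUE)
# Proportion of the total variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))
# Keep the first two components as the new, extracted features.
reduced <- pca$x[, 1:2]
print(dim(reduced))
```
The components come out ordered by explained variance, so keeping the first two keeps the two directions that carry the most variance.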

24
00:01:14,200 --> 00:01:15,300
So that's PCA.

25
00:01:15,300 --> 00:01:16,866
And remember in part

26
00:01:16,866 --> 00:01:19,866
two and part three we worked with one
or two independent variables.

27
00:01:19,966 --> 00:01:22,266
Well that was for two specific purposes.

28
00:01:22,266 --> 00:01:26,433
The first purpose is that we needed
a graphic visualization of our results.

29
00:01:26,733 --> 00:01:30,966
And since each independent variable
corresponded to one dimension in the plot,

30
00:01:31,133 --> 00:01:35,233
well, we could visualize our results
with at most two independent variables.

31
00:01:35,566 --> 00:01:39,233
And the second reason is that
thanks to this PCA dimensionality

32
00:01:39,233 --> 00:01:40,400
reduction technique,

33
00:01:40,400 --> 00:01:43,733
well, even if we have a lot of independent
variables at the beginning,

34
00:01:44,033 --> 00:01:47,300
well, we can end up with far fewer
independent variables.

35
00:01:47,633 --> 00:01:51,266
But these are going to be relevant
independent variables, because

36
00:01:51,466 --> 00:01:55,800
these independent variables will explain
most of the variance of your data set.

37
00:01:56,200 --> 00:01:59,633
And therefore, since we can reduce
this number of independent variables,

38
00:01:59,933 --> 00:02:03,000
well, we can end up with 2
or 3 independent variables

39
00:02:03,000 --> 00:02:06,300
and therefore visualize the results
as we did in part three.

40
00:02:06,600 --> 00:02:09,866
And this is exactly what we're going to do
in this tutorial, in the following

41
00:02:09,866 --> 00:02:12,500
tutorials of this section,
and in the following sections,

42
00:02:12,500 --> 00:02:14,033
when we cover other dimensionality

43
00:02:14,033 --> 00:02:17,666
reduction techniques
like LDA and also kernel PCA.

44
00:02:18,066 --> 00:02:20,533
Well, we will have many features
at the beginning

45
00:02:20,533 --> 00:02:23,400
and therefore it will be impossible
to visualize the results.

46
00:02:23,400 --> 00:02:28,033
But then when we apply PCA or LDA,
we will reduce the number of features

47
00:02:28,033 --> 00:02:31,400
down to two and therefore we will be able
to visualize the results.

48
00:02:31,800 --> 00:02:35,200
So let's start right now and let's start
by setting the right folder

49
00:02:35,200 --> 00:02:38,133
as working directory.
So as usual we go to our Machine Learning

50
00:02:38,133 --> 00:02:39,266
A-Z folder.

51
00:02:39,266 --> 00:02:41,900
Then part nine, Dimensionality Reduction.

52
00:02:41,900 --> 00:02:44,600
And here we are
in the first section of this part nine,

53
00:02:44,600 --> 00:02:46,433
Principal Component Analysis.

54
00:02:46,433 --> 00:02:48,800
That's our first technique.
Let's click on it.

55
00:02:48,800 --> 00:02:51,633
And that's the folder
we want to set as working directory.

56
00:02:51,633 --> 00:02:53,800
Make sure that you have
the Wine.csv file.

57
00:02:53,800 --> 00:02:56,600
And if that's the case you're ready
to click on this More button here

58
00:02:56,600 --> 00:02:59,400
to set this folder as Working directory.

59
00:02:59,400 --> 00:03:00,300
Perfect.

60
00:03:00,300 --> 00:03:03,933
And now we're going to open another file
that we made in part

61
00:03:03,933 --> 00:03:08,133
three, Classification,
which is our logistic regression file.

62
00:03:08,400 --> 00:03:12,566
Because what we're going to do
is take this logistic regression code.

63
00:03:12,866 --> 00:03:14,866
Then we are going to change
the name of the data set,

64
00:03:14,866 --> 00:03:19,766
because we will be working on a new data
set, which will be the Wine.csv file.

65
00:03:20,100 --> 00:03:23,100
And we will apply PCA on this data set.

66
00:03:23,333 --> 00:03:26,366
And of course I will explain quickly
the business problem behind it.

67
00:03:26,700 --> 00:03:30,966
So I'm going to take everything from here
down to the bottom.

68
00:03:31,300 --> 00:03:33,166
Here we go. Copy.

69
00:03:33,166 --> 00:03:37,466
And I'm going to paste that in my PCA file
this way.

70
00:03:38,100 --> 00:03:41,566
So let's go up and let's change
the name of the data set.

71
00:03:41,566 --> 00:03:44,400
This is no longer Social_Network_Ads.csv.

72
00:03:44,400 --> 00:03:47,433
This is now Wine.csv.

73
00:03:47,866 --> 00:03:48,900
That's perfect.

74
00:03:48,900 --> 00:03:52,200
So now what we're going to do
is first import this data set

75
00:03:52,466 --> 00:03:54,233
and then apply data preprocessing.

76
00:03:54,233 --> 00:03:55,500
Maybe we'll need to change

77
00:03:55,500 --> 00:03:59,433
some things like the indexes here
but this will be very quick.

78
00:03:59,600 --> 00:04:02,033
So first let's import the data set.

79
00:04:02,033 --> 00:04:05,033
So I'm going to select this line
and execute.

80
00:04:05,066 --> 00:04:07,366
Here we go. The data set is well imported.

81
00:04:07,366 --> 00:04:08,233
Here it is.

82
00:04:08,233 --> 00:04:11,300
And now let's explain
the business problem behind it.

83
00:04:11,300 --> 00:04:13,566
So first of all
this is a very famous data set

84
00:04:13,566 --> 00:04:15,766
well known in the machine
learning literature.

85
00:04:15,766 --> 00:04:20,333
And that you can find on the UCI machine
Learning repository, as you can see here.

86
00:04:20,333 --> 00:04:22,366
And you can find this page at this link.

87
00:04:23,366 --> 00:04:24,100
So basically

88
00:04:24,100 --> 00:04:27,633
first what are the independent variables
and what is the dependent variable.

89
00:04:28,066 --> 00:04:31,800
Well the independent variables
are all the variables from this one,

90
00:04:31,800 --> 00:04:34,800
Alcohol, up to this one, Proline.

91
00:04:34,833 --> 00:04:39,066
And this last variable, customer
segment, is the dependent variable.

92
00:04:39,366 --> 00:04:43,800
So in the original data set this dependent
variable is not called customer segment.

93
00:04:43,800 --> 00:04:46,166
This is actually the origin of the wine.

94
00:04:46,166 --> 00:04:49,800
But let's imagine that we as data
scientists are working

95
00:04:49,800 --> 00:04:51,266
for a wine business owner.

96
00:04:51,266 --> 00:04:54,866
And this wine business owner gathered
all this information in this data set.

97
00:04:55,200 --> 00:04:59,300
And so first what this business owner did
is gather all the information

98
00:04:59,300 --> 00:05:01,000
in these independent variables here,

99
00:05:01,000 --> 00:05:04,533
which is chemical information
about several wines.

100
00:05:04,966 --> 00:05:08,200
And this business owner
applied some clustering technique

101
00:05:08,400 --> 00:05:10,633
to find some segments of customers

102
00:05:10,633 --> 00:05:14,333
that like a specific wine,
depending on the information about the wine.

103
00:05:14,666 --> 00:05:17,500
And by applying these clustering
techniques,

104
00:05:17,500 --> 00:05:20,800
this business owner identified
three segments of customers.

105
00:05:20,800 --> 00:05:22,033
That's the first one here.

106
00:05:22,033 --> 00:05:25,066
Then we have the second one
and eventually the third one.

107
00:05:26,133 --> 00:05:27,033
So based on this

108
00:05:27,033 --> 00:05:30,066
information,
and thanks to its clustering techniques,

109
00:05:30,233 --> 00:05:34,066
well, this wine business owner managed
to find some segments of customers.

110
00:05:34,333 --> 00:05:38,100
Each segment having a specific preference
for a specific wine.

111
00:05:38,466 --> 00:05:41,500
So basically this business owner
found three types of wines,

112
00:05:41,800 --> 00:05:43,166
each type of wine corresponding

113
00:05:43,166 --> 00:05:47,000
to one segment of customers
and therefore three segments of customers.

114
00:05:47,400 --> 00:05:49,966
And why does it create added value
for his business?

115
00:05:49,966 --> 00:05:51,033
Well, that's because now

116
00:05:51,033 --> 00:05:54,900
what this business owner can do is
take all this information about the wines,

117
00:05:55,100 --> 00:05:58,100
as well as the information
about the customer segments,

118
00:05:58,300 --> 00:06:01,400
and make a classification model
like logistic regression,

119
00:06:01,533 --> 00:06:04,500
in which the independent variables
are all these variables

120
00:06:04,500 --> 00:06:07,500
and the dependent variable
is the customer segment.

121
00:06:07,500 --> 00:06:11,866
And therefore for each new wine,
it can predict to which customer segment

122
00:06:12,266 --> 00:06:14,033
it should recommend this wine.

123
00:06:14,033 --> 00:06:16,633
So that adds a lot of value
for this business owner.

124
00:06:16,633 --> 00:06:19,500
But then if this business owner
wants to have a clear visual

125
00:06:19,500 --> 00:06:22,500
look at the prediction regions
and the prediction boundary

126
00:06:22,600 --> 00:06:25,566
of the classification model
that we're going to build, to be able to

127
00:06:25,566 --> 00:06:29,300
see if the predictions are in the right
spot of the customer segments.

128
00:06:29,300 --> 00:06:33,500
Well, it cannot be done with all these
independent variables because of course,

129
00:06:33,500 --> 00:06:37,333
we cannot represent this many independent
variables in one plot.

130
00:06:37,333 --> 00:06:38,400
That's impossible.

131
00:06:38,400 --> 00:06:42,066
So what we need to do is apply
some dimensionality reduction techniques

132
00:06:42,266 --> 00:06:43,233
to extract

133
00:06:43,233 --> 00:06:46,333
two independent variables
that explain most of the variance,

134
00:06:46,500 --> 00:06:49,366
and then we'll be able to see
the prediction regions

135
00:06:49,366 --> 00:06:50,766
and the prediction boundary.

136
00:06:50,766 --> 00:06:54,600
And therefore we will clearly be able
to see where the customer segments are

137
00:06:54,733 --> 00:06:56,000
and where these predictions

138
00:06:56,000 --> 00:06:59,733
of the customer segments are according
to the extracted features

139
00:06:59,733 --> 00:07:02,733
of all the information
of our independent variables.

140
00:07:02,800 --> 00:07:06,633
And remember, these extracted features
are called the principal components.

141
00:07:07,233 --> 00:07:07,600
All right.

142
00:07:07,600 --> 00:07:11,733
So now that we understand the challenge
and the business problem, let's apply PCA

143
00:07:11,900 --> 00:07:15,100
to see how we can reduce
the dimensionality of this data set.

144
00:07:15,100 --> 00:07:19,033
Because indeed it contains 13 dimensions
because it contains 13

145
00:07:19,033 --> 00:07:20,400
independent variables.

146
00:07:20,400 --> 00:07:24,066
And we'll see how we can use PCA
to reduce this number

147
00:07:24,066 --> 00:07:27,200
of independent variables
down to two independent variables.

148
00:07:27,200 --> 00:07:28,266
But be careful.

149
00:07:28,266 --> 00:07:31,800
It's important to understand
that the new two independent variables

150
00:07:31,800 --> 00:07:35,166
that we'll have in the end
are going to be new ones, as opposed to

151
00:07:35,166 --> 00:07:39,066
feature selection, where, you know,
we end up with two independent variables

152
00:07:39,066 --> 00:07:42,866
that are among these original 13
independent variables.

153
00:07:43,100 --> 00:07:45,733
Here with PCA,
we'll get new extracted ones.

154
00:07:45,733 --> 00:07:47,466
And that's the important distinction

155
00:07:47,466 --> 00:07:50,466
to make between feature selection
and feature extraction.
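To make the distinction concrete, here is a small sketch on made-up data (the columns a, b, c are hypothetical, and base R's prcomp() stands in for whichever PCA function is used later):
```r
set.seed(1)
X <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
# Feature selection: keep a subset of the original columns, unchanged.
selected <- X[, c("a", "c")]
# Feature extraction (PCA): build new columns as weighted sums of all columns.
pca <- prcomp(X, center = TRUE, scale. = TRUE)
extracted <- pca$x[, 1:2]
# The rotation matrix holds the weight of each original feature in PC1 and PC2.
print(pca$rotation[, 1:2])
# Rebuilding PC1 by hand shows that it combines a, b and c.
pc1_manual <- as.vector(scale(X) %*% pca$rotation[, 1])
print(all.equal(pc1_manual, as.vector(extracted[, 1])))
```
The selected columns are still a and c, while PC1 and PC2 are new variables that did not exist in the original data.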

156
00:07:51,100 --> 00:07:51,400
All right.

157
00:07:51,400 --> 00:07:55,800
So before we apply PCA as usual
we need to preprocess the data.

158
00:07:56,133 --> 00:08:00,133
And this is actually going to be
very quick because our template is ready.

159
00:08:00,133 --> 00:08:02,533
We will just need to change
just a few things.

160
00:08:02,533 --> 00:08:05,333
So first, dataset =
dataset[3:5].

161
00:08:05,333 --> 00:08:09,666
That's just to select the independent
variables that matter for our problem.

162
00:08:09,900 --> 00:08:11,366
But here everything matters.

163
00:08:11,366 --> 00:08:14,366
We just want to reduce the dimensionality
of this data set.

164
00:08:14,500 --> 00:08:17,300
So we will keep all our independent
variables here.

165
00:08:17,300 --> 00:08:19,300
And therefore
we don't need this line here.

166
00:08:19,300 --> 00:08:22,600
So I will just remove it okay.

167
00:08:22,600 --> 00:08:24,766
So the first section,
importing the data set, is ready

168
00:08:24,766 --> 00:08:25,666
and well executed.

169
00:08:25,666 --> 00:08:27,666
Now let's move on to the next section.

170
00:08:27,666 --> 00:08:31,000
So the next section is about splitting
the data set into the training set

171
00:08:31,000 --> 00:08:32,400
and the test set.

172
00:08:32,400 --> 00:08:33,900
And here be careful.

173
00:08:33,900 --> 00:08:37,600
We just need to change this
name of the dependent variable.

174
00:08:37,900 --> 00:08:41,633
Because in logistic regression
we were dealing with the Social_Network_Ads.csv

175
00:08:41,700 --> 00:08:42,900
file,

176
00:08:42,900 --> 00:08:44,700
and the dependent variable was Purchased.

177
00:08:44,700 --> 00:08:47,800
But now for our new business problem
the dependent variable

178
00:08:47,800 --> 00:08:49,033
is not called Purchased.

179
00:08:49,033 --> 00:08:51,200
It is called Customer_Segment.

180
00:08:51,200 --> 00:08:56,000
So we just need to replace Purchased here
with Customer_Segment.

181
00:08:56,100 --> 00:08:56,666
Here we go.

182
00:08:57,633 --> 00:08:58,000
All right.

183
00:08:58,000 --> 00:09:00,966
Do we keep a split ratio of 75%?

184
00:09:00,966 --> 00:09:04,333
Well, let's take 80% instead.

185
00:09:04,333 --> 00:09:05,400
But that's up to you.

186
00:09:05,400 --> 00:09:08,933
It's just that
80% is a good split ratio to take.

187
00:09:08,933 --> 00:09:10,266
So we will go with that.

188
00:09:10,266 --> 00:09:13,566
And then here for training set and test
set we don't need to change anything.

189
00:09:13,566 --> 00:09:17,900
So we are ready to split our data set
into the training set and the test set.

190
00:09:17,900 --> 00:09:21,666
So let's do it
I'm going to select all this section here

191
00:09:21,900 --> 00:09:24,966
and press Command + Enter to execute.

192
00:09:25,366 --> 00:09:26,433
Here we go.

193
00:09:26,433 --> 00:09:30,300
The training set is now created
as well as the test set.
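For reference, an 80/20 split can be sketched in base R as follows. This uses sample() on a synthetic data frame rather than the splitting code of the tutorial's template, and the column name Customer_Segment is assumed:
```r
set.seed(123)
# Synthetic stand-in: 13 numeric features plus a 3-level label in column 14.
dataset <- as.data.frame(matrix(rnorm(100 * 13), nrow = 100))
dataset$Customer_Segment <- sample(1:3, 100, replace = TRUE)
# Draw 80% of the row indices at random for the training set.
train_idx <- sample(seq_len(nrow(dataset)), size = round(0.8 * nrow(dataset)))
training_set <- dataset[train_idx, ]
test_set <- dataset[-train_idx, ]
print(c(train = nrow(training_set), test = nrow(test_set)))
```
Unlike a stratified split, this plain random draw does not guarantee the same class proportions in both sets.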

194
00:09:31,233 --> 00:09:31,966
Great.

195
00:09:31,966 --> 00:09:34,300
So we're ready to move on to the next section.

196
00:09:34,300 --> 00:09:36,600
The next section is about feature scaling.

197
00:09:36,600 --> 00:09:40,000
And for PCA
it is way better to apply feature scaling.

198
00:09:40,000 --> 00:09:43,466
You can actually apply it
by playing with the parameters

199
00:09:43,666 --> 00:09:46,666
of the PCA function
that we're going to use afterwards.

200
00:09:46,833 --> 00:09:49,800
But let's take this feature
scaling part of our code

201
00:09:49,800 --> 00:09:52,800
template
to put our features on the same scale.

202
00:09:52,833 --> 00:09:55,866
So here
we just need to change the indexes.

203
00:09:56,000 --> 00:10:00,033
We actually need to specify the indexes
of the features we want to scale.

204
00:10:00,266 --> 00:10:04,100
So basically the features we want to scale
are all the features

205
00:10:04,100 --> 00:10:06,200
from alcohol to proline.

206
00:10:06,200 --> 00:10:09,633
And so what we can do is specify
that we want to scale all the variables

207
00:10:09,633 --> 00:10:13,700
except the last one, customer segment,
which has index 14.

208
00:10:13,966 --> 00:10:18,033
So therefore here instead of
putting the indexes of the features

209
00:10:18,233 --> 00:10:21,233
we can just put -14.

210
00:10:21,333 --> 00:10:22,733
We can remove that.

211
00:10:22,733 --> 00:10:26,666
Let's copy this because
we will do the same for the others.

212
00:10:26,666 --> 00:10:29,666
So let's replace that here by -14.

213
00:10:30,100 --> 00:10:32,466
And -14 here as well.

214
00:10:32,466 --> 00:10:35,266
And eventually -14.

215
00:10:35,266 --> 00:10:35,566
All right.

216
00:10:35,566 --> 00:10:37,866
So now the feature scaling part is ready.

217
00:10:37,866 --> 00:10:40,533
So we are ready to select the section

218
00:10:40,533 --> 00:10:44,000
and press Command or Control
+ Enter to execute.

219
00:10:44,200 --> 00:10:46,800
And now all our variables are scaled.

220
00:10:46,800 --> 00:10:50,566
As you can see,
all our features are on the same scale.

221
00:10:50,933 --> 00:10:55,200
And of course the customer segment
kept its labels one, two and three.
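The scaling step described above can be sketched like this, on synthetic training and test sets with the label assumed to be in column 14 under the name Customer_Segment:
```r
set.seed(7)
training_set <- as.data.frame(matrix(rnorm(80 * 13, mean = 50, sd = 10), nrow = 80))
training_set$Customer_Segment <- sample(1:3, 80, replace = TRUE)
test_set <- as.data.frame(matrix(rnorm(20 * 13, mean = 50, sd = 10), nrow = 20))
test_set$Customer_Segment <- sample(1:3, 20, replace = TRUE)
# In R, a negative index drops that column: [-14] means all columns but the 14th.
training_set[-14] <- scale(training_set[-14])
test_set[-14] <- scale(test_set[-14])
# Scaled features have mean ~0 and standard deviation 1; the label is untouched.
print(round(colMeans(training_set[-14]), 10))
print(sort(unique(training_set$Customer_Segment)))
```
Using -14 in all four places avoids listing the 13 feature indexes explicitly.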

222
00:10:55,800 --> 00:10:57,900
And same for the test set.
Let's make sure of that.

223
00:10:57,900 --> 00:10:59,366
All right perfect.

224
00:10:59,366 --> 00:11:01,166
So feature scaling done.

225
00:11:01,166 --> 00:11:04,633
And actually the pre-processing
phase is completed.

226
00:11:04,800 --> 00:11:07,033
So we did that quite efficiently.

227
00:11:07,033 --> 00:11:09,600
But that's good because now
we're getting to the exciting part.

228
00:11:09,600 --> 00:11:12,100
Applying PCA to our data.

229
00:11:12,100 --> 00:11:14,900
So actually we will do that right here.

230
00:11:14,900 --> 00:11:17,900
You apply PCA
right after the data preprocessing phase.

231
00:11:17,966 --> 00:11:21,766
And just before you fit your logistic
regression model to the training set,

232
00:11:21,766 --> 00:11:22,733
because of course,

233
00:11:22,733 --> 00:11:27,266
you want to train your model on your new
data set with the new extracted features,

234
00:11:27,533 --> 00:11:29,533
that is,
with the two new extracted features

235
00:11:29,533 --> 00:11:31,300
that will explain the most variance.

236
00:11:31,300 --> 00:11:35,300
And after you trained your classifier,
you're ready to predict the test results.

237
00:11:35,300 --> 00:11:36,700
Make the confusion matrix.

238
00:11:36,700 --> 00:11:39,166
Then you can also visualize
the training set results.

239
00:11:39,166 --> 00:11:42,833
Remember, this section is applied
on a data set that contains two features,

240
00:11:43,133 --> 00:11:46,233
and so we will see what we get
by extracting these two new features.
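As a rough preview of where PCA sits in that workflow, the sketch below assumes base R's prcomp() and predict() on synthetic data. The function used in the next tutorial may differ, and make_set is a hypothetical helper:
```r
set.seed(99)
# Hypothetical helper that builds 13 random features plus a label column.
make_set <- function(n) {
  d <- as.data.frame(matrix(rnorm(n * 13), nrow = n))
  d$Customer_Segment <- sample(1:3, n, replace = TRUE)
  d
}
training_set <- make_set(80)
test_set <- make_set(20)
# Learn the principal components from the training features only.
pca <- prcomp(training_set[-14], center = TRUE, scale. = TRUE)
# Project both sets onto the first two components and re-attach the label.
training_pca <- data.frame(predict(pca, training_set[-14])[, 1:2],
                           Customer_Segment = training_set$Customer_Segment)
test_pca <- data.frame(predict(pca, test_set[-14])[, 1:2],
                       Customer_Segment = test_set$Customer_Segment)
print(names(training_pca))
```
The classifier would then be trained on training_pca and evaluated on test_pca, each holding two features plus the label.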

241
00:11:46,933 --> 00:11:50,400
All right so to finish this tutorial
I'm just going to introduce

242
00:11:50,400 --> 00:11:53,800
this new section here
that I'm going to call Applying

243
00:11:55,266 --> 00:11:56,866
PCA. All right.

244
00:11:56,866 --> 00:11:59,600
And in the next tutorial
we are going to apply PCA.

245
00:11:59,600 --> 00:12:04,100
And then eventually we will build
our model on our new reduced data set.

246
00:12:04,433 --> 00:12:06,600
So I look forward to doing that
in the next tutorial.

247
00:12:06,600 --> 00:12:09,600
And until then enjoy machine learning.