1 00:00:00,200 --> 00:00:02,500 Hello and welcome to this R tutorial. 2 00:00:02,500 --> 00:00:05,833 So now we know how to implement two feature extraction techniques. 3 00:00:06,033 --> 00:00:08,100 These are PCA and LDA. 4 00:00:08,100 --> 00:00:11,466 But these feature extraction techniques work on linear problems, 5 00:00:11,466 --> 00:00:14,400 that is, when the data is linearly separable. 6 00:00:14,400 --> 00:00:17,666 And in this section we are going to see one new feature extraction technique, 7 00:00:17,666 --> 00:00:19,000 but this time adapted 8 00:00:19,000 --> 00:00:22,433 for nonlinear problems, where the data is not linearly separable. 9 00:00:23,000 --> 00:00:26,000 So this technique is called kernel PCA. 10 00:00:26,033 --> 00:00:27,233 Kernel PCA is 11 00:00:27,233 --> 00:00:29,066 a kernelized version of PCA 12 00:00:29,066 --> 00:00:31,300 where we map the data to a higher dimension 13 00:00:31,300 --> 00:00:32,433 using the kernel trick, 14 00:00:32,433 --> 00:00:35,433 and then from there we extract some new principal components, 15 00:00:35,600 --> 00:00:38,966 and we are going to see how it manages to deal with nonlinear problems. 16 00:00:39,466 --> 00:00:41,800 So we're not going to work on the same problem 17 00:00:41,800 --> 00:00:44,866 as we did in the previous sections with the wine data set, 18 00:00:45,100 --> 00:00:48,400 but we're going to work on the same data set as the one used in Part 19 00:00:48,400 --> 00:00:50,833 3, Classification, because now we need visuals. 20 00:00:50,833 --> 00:00:52,800 We need to clearly see what happens. 21 00:00:52,800 --> 00:00:53,066 We need 22 00:00:53,066 --> 00:00:55,100 to see how kernel PCA 23 00:00:55,100 --> 00:00:58,566 manages to extract some new independent variables, 24 00:00:58,566 --> 00:01:00,133 the principal components. 
25 00:01:00,133 --> 00:01:02,400 Even when the problem is nonlinear, that is, 26 00:01:02,400 --> 00:01:04,566 when the data is not linearly separable. 27 00:01:04,566 --> 00:01:08,066 And this data set that we used in Part 3, the Social Network 28 00:01:08,066 --> 00:01:11,700 Ads data set, well, remember, it was clearly a nonlinear problem 29 00:01:11,700 --> 00:01:15,133 because nonlinear classifiers showed much better performance. 30 00:01:15,433 --> 00:01:17,700 So let's take this data set and let's apply 31 00:01:17,700 --> 00:01:19,166 kernel PCA to see how it 32 00:01:19,166 --> 00:01:21,133 will handle the nonlinearity. 33 00:01:21,133 --> 00:01:24,400 So let's find this data set in our working directory folder. 34 00:01:24,633 --> 00:01:27,200 So we'll go to our Machine Learning A-Z folder, 35 00:01:27,200 --> 00:01:29,466 then Part 9, Dimensionality Reduction. 36 00:01:29,466 --> 00:01:33,433 And here we are at the last section of this Part 9, Kernel PCA. 37 00:01:33,800 --> 00:01:36,200 So that's the folder you want to set as working directory. 38 00:01:36,200 --> 00:01:39,000 Make sure that you have the Social_Network_Ads.csv file. 39 00:01:39,000 --> 00:01:40,066 And if that's the case, you're 40 00:01:40,066 --> 00:01:43,833 ready to click on this More button here to set the folder as working directory. 41 00:01:44,400 --> 00:01:48,600 And now what we're going to do is take this logistic regression model, 42 00:01:48,633 --> 00:01:52,600 because, you know, this logistic regression model is a linear classifier. 43 00:01:53,100 --> 00:01:56,233 Therefore it will not be appropriate for our problem, because 44 00:01:56,400 --> 00:01:58,466 our data is not linearly separable. 45 00:01:58,466 --> 00:02:02,500 So what we're going to do is take this linear classifier here, 46 00:02:02,800 --> 00:02:07,366 but we are going to apply kernel PCA inside of it to see how kernel 47 00:02:07,366 --> 00:02:09,866 PCA will save the situation. 
48 00:02:09,866 --> 00:02:13,900 And so you will see that even if we apply a linear model, well, thanks to kernel 49 00:02:13,900 --> 00:02:14,533 PCA, that 50 00:02:14,533 --> 00:02:15,900 will manage to extract 51 00:02:15,900 --> 00:02:20,166 new principal components adapted for this non-linearly separable data, 52 00:02:20,333 --> 00:02:22,866 well, you will see that we will get amazing results. 53 00:02:22,866 --> 00:02:27,766 So right now let's copy the whole model here, from the top down to the bottom. 54 00:02:28,366 --> 00:02:31,100 Copy, and let's paste it in our kernel 55 00:02:31,100 --> 00:02:34,100 PCA file. All right. 56 00:02:34,133 --> 00:02:38,133 And now basically the only thing that we have to do is to apply 57 00:02:38,133 --> 00:02:40,500 kernel PCA at the right place. 58 00:02:40,500 --> 00:02:43,533 But before we do that, I would just like us to visualize again 59 00:02:43,533 --> 00:02:47,266 why this linear model is not appropriate for this data set. 60 00:02:47,600 --> 00:02:50,700 So what we're going to do is take everything from here, 61 00:02:50,700 --> 00:02:51,500 because, you know, this 62 00:02:51,500 --> 00:02:55,466 will visualize the training set results by plotting the prediction regions 63 00:02:55,466 --> 00:02:56,766 and the prediction boundary. 64 00:02:56,766 --> 00:02:59,766 So we're going to take everything from here up to the top 65 00:02:59,766 --> 00:03:03,400 to, you know, import the data set, apply the preprocessing phase, 66 00:03:03,600 --> 00:03:06,000 fit the logistic regression to the training set, 67 00:03:06,000 --> 00:03:08,333 and eventually plot the training set results. 68 00:03:08,333 --> 00:03:09,333 So let's do it. 69 00:03:09,333 --> 00:03:11,866 Let's visualize this again very quickly. 70 00:03:11,866 --> 00:03:14,733 And that will give us motivation to apply kernel PCA. 71 00:03:16,366 --> 00:03:17,266 And here we go. 72 00:03:17,266 --> 00:03:19,133 All executed properly. 
73 00:03:19,133 --> 00:03:22,100 So as a reminder, the points are the real observations, 74 00:03:22,100 --> 00:03:24,933 that is, our real customers in the social network, 75 00:03:24,933 --> 00:03:27,933 represented by their age and their estimated salary. 76 00:03:28,233 --> 00:03:30,400 So those are our real observation points. 77 00:03:30,400 --> 00:03:33,766 And our predictions are represented by these regions, 78 00:03:33,766 --> 00:03:36,133 the red region here and the green region here. 79 00:03:36,133 --> 00:03:39,833 And basically this red region is where our model predicts 80 00:03:39,966 --> 00:03:42,833 that the customer will not click on the ad, 81 00:03:42,833 --> 00:03:43,766 and this green region 82 00:03:43,766 --> 00:03:47,866 here is the region where our model predicts that the customers will click on the ad 83 00:03:47,866 --> 00:03:49,766 and buy the SUV. 84 00:03:49,766 --> 00:03:54,266 And so remember, the problem was that this straight line here is 85 00:03:54,266 --> 00:03:57,933 actually the prediction boundary generated by the logistic regression model. 86 00:03:58,200 --> 00:04:01,700 But since the logistic regression model is a linear classifier, 87 00:04:01,700 --> 00:04:04,700 then it has to be a straight line here separating the data. 88 00:04:04,700 --> 00:04:07,733 And therefore, remember, the problem is that it cannot make some kind 89 00:04:07,733 --> 00:04:11,166 of a curve here to catch these green users 90 00:04:11,166 --> 00:04:14,166 that should be in the green region; right now they're in the red region. 91 00:04:14,300 --> 00:04:18,133 And so this clearly represents the fact that our data is not linearly separable, 92 00:04:18,300 --> 00:04:20,133 because we can clearly see that 93 00:04:20,133 --> 00:04:23,700 this prediction boundary here, that plays the role of separator 94 00:04:23,700 --> 00:04:26,400 and that is supposed to separate the two classes. 
95 00:04:26,400 --> 00:04:29,166 Well, it cannot separate the two classes properly because, 96 00:04:29,166 --> 00:04:30,033 as you can see, these 97 00:04:30,033 --> 00:04:32,633 users are not in the right region. 98 00:04:32,633 --> 00:04:36,400 And so now what we're going to do is not make a nonlinear classifier 99 00:04:36,400 --> 00:04:37,700 like we did in Part 3, 100 00:04:37,700 --> 00:04:41,466 you know, when we made kernel SVM, Naive Bayes, decision trees or random forest. 101 00:04:41,766 --> 00:04:44,700 Well, what we're going to do now instead is apply 102 00:04:44,700 --> 00:04:45,700 kernel PCA 103 00:04:45,700 --> 00:04:49,966 so that we keep a straight line as the separator, as the prediction 104 00:04:49,966 --> 00:04:51,700 boundary of a linear classifier. 105 00:04:51,700 --> 00:04:55,233 That is still going to be the prediction boundary of a logistic regression model. 106 00:04:55,666 --> 00:04:58,666 But since we're going to apply kernel PCA, 107 00:04:58,800 --> 00:05:02,700 well, this will manage to apply some trick, where the trick is actually 108 00:05:02,700 --> 00:05:06,666 the kernel trick, to map the data into a higher dimension and then apply 109 00:05:06,666 --> 00:05:07,600 PCA to 110 00:05:07,600 --> 00:05:12,666 extract new components that will be new dimensions that explain the most variance. 111 00:05:12,900 --> 00:05:15,466 But thanks to this kernel trick, well, 112 00:05:15,466 --> 00:05:16,633 you'll see that we'll manage 113 00:05:16,633 --> 00:05:21,133 to get some new dimensions in which the data will be linearly separable 114 00:05:21,333 --> 00:05:24,333 even by a linear classifier like logistic regression. 115 00:05:24,433 --> 00:05:25,500 So let's see that right now. 116 00:05:25,500 --> 00:05:28,433 I can't wait to show you this. I'm going to close this. 117 00:05:28,433 --> 00:05:31,433 And now let's apply kernel PCA 118 00:05:31,433 --> 00:05:32,800 at the right location. 
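The idea just described (map the data implicitly with a kernel, then run PCA in that space) can be sketched by hand in base R. This is only an illustrative sketch, not what the kernlab package does internally line by line: the synthetic matrix `X`, the value `sigma = 0.1`, and all variable names are assumptions made for the example.

```r
# Hand-rolled kernel PCA on synthetic data (illustrative assumptions throughout)
set.seed(42)
n <- 100
X <- scale(cbind(rnorm(n), rnorm(n)))  # two standardized stand-in features

# Gaussian (RBF) kernel matrix: K[i, j] = exp(-sigma * ||x_i - x_j||^2)
sigma <- 0.1
sq_dist <- as.matrix(dist(X))^2
K <- exp(-sigma * sq_dist)

# Center the kernel matrix, i.e. center the data in the implicit feature space
one_n <- matrix(1 / n, n, n)
K_centered <- K - one_n %*% K - K %*% one_n + one_n %*% K %*% one_n

# Eigen-decompose and keep the two leading directions
eig <- eigen(K_centered, symmetric = TRUE)
top <- 1:2
alphas <- eig$vectors[, top] %*% diag(1 / sqrt(eig$values[top]))

# Projections of the training points onto the two kernel principal components
components <- K_centered %*% alphas
head(components)
```

The kernel matrix is all that is ever computed; the higher-dimensional coordinates themselves are never built, which is the point of the kernel trick.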
119 00:05:32,800 --> 00:05:34,666 So you already know what this location is. 120 00:05:34,666 --> 00:05:37,466 It's actually not different than before. 121 00:05:37,466 --> 00:05:39,266 We need to apply kernel PCA 122 00:05:39,266 --> 00:05:41,266 right after the data preprocessing phase 123 00:05:41,266 --> 00:05:44,733 and just before fitting our classifier, like logistic regression, 124 00:05:44,733 --> 00:05:46,000 to our training set. 125 00:05:46,000 --> 00:05:46,900 So basically 126 00:05:46,900 --> 00:05:49,866 we need to apply kernel PCA right here. 127 00:05:49,866 --> 00:05:52,200 So new section here: Applying 128 00:05:53,300 --> 00:05:55,666 Kernel PCA. 129 00:05:55,666 --> 00:05:57,766 And here we go. Let's do it. 130 00:05:57,766 --> 00:05:58,100 All right. 131 00:05:58,100 --> 00:06:01,633 So first we need to install a new package that is called kernlab, 132 00:06:01,800 --> 00:06:03,566 which I don't think we've installed before. 133 00:06:03,566 --> 00:06:04,900 So let's do it right now. 134 00:06:04,900 --> 00:06:08,266 So we use the command install.packages. 135 00:06:08,733 --> 00:06:09,500 Here we go. 136 00:06:09,500 --> 00:06:12,366 And in quotes, kernlab. 137 00:06:12,366 --> 00:06:13,166 All right. 138 00:06:13,166 --> 00:06:15,533 So I think I already have it installed. 139 00:06:15,533 --> 00:06:16,966 Let's check it out. 140 00:06:16,966 --> 00:06:20,033 So here it is, kernlab, Kernel-Based Machine Learning Lab. 141 00:06:20,033 --> 00:06:21,900 So I will not install it. 142 00:06:21,900 --> 00:06:24,900 But if you want to do it, you just select this line and execute. 143 00:06:25,100 --> 00:06:27,666 So I will just put this line as a comment. 144 00:06:27,666 --> 00:06:28,500 Here we go. 145 00:06:28,500 --> 00:06:32,366 But then, since it is not imported, I will import it using the library 146 00:06:33,066 --> 00:06:35,500 command: kernlab. 147 00:06:35,500 --> 00:06:36,200 All right. 
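As typed on screen, the install and import steps would look roughly like this, with the install line left commented out for anyone who already has the package:

```r
# install.packages("kernlab")  # run once if kernlab is not installed yet
library(kernlab)               # attach the package for this session
```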
148 00:06:36,200 --> 00:06:38,900 And that will import it. 149 00:06:38,900 --> 00:06:41,100 All right. kernlab is imported. 150 00:06:41,100 --> 00:06:44,100 And now let's start applying kernel PCA. 151 00:06:44,766 --> 00:06:46,366 So as for PCA 152 00:06:46,366 --> 00:06:49,966 and LDA, we're going to start by creating an object, which will 153 00:06:49,966 --> 00:06:50,933 be the kernel PCA 154 00:06:50,933 --> 00:06:54,133 object that we will use to transform our original data 155 00:06:54,133 --> 00:06:57,366 set into this new data set after using the kernel trick. 156 00:06:57,733 --> 00:07:00,733 So we'll call this object kpca. 157 00:07:00,900 --> 00:07:01,666 And then equals. 158 00:07:01,666 --> 00:07:05,100 And then that's where we use the function that will create this kernel 159 00:07:05,100 --> 00:07:06,366 PCA object. 160 00:07:06,366 --> 00:07:09,366 So this function is also kpca. 161 00:07:09,366 --> 00:07:10,466 Then parentheses. 162 00:07:10,466 --> 00:07:13,066 And then let's input the different arguments. 163 00:07:13,066 --> 00:07:14,100 So let's check it out. 164 00:07:14,100 --> 00:07:17,833 Let's press F1 here to have a look at the arguments. 165 00:07:18,633 --> 00:07:20,500 So the first argument is x. 166 00:07:20,500 --> 00:07:24,533 And this is actually the data matrix or the formula describing the model. 167 00:07:24,766 --> 00:07:27,366 And here I'll give you a little trick to describe the model 168 00:07:27,366 --> 00:07:29,566 very simply and very efficiently. 169 00:07:29,566 --> 00:07:33,233 We can simply input here a tilde and a dot. 170 00:07:33,566 --> 00:07:34,733 And that will be enough for the 171 00:07:34,733 --> 00:07:37,733 kpca function to understand what the formula is, 172 00:07:38,033 --> 00:07:42,166 because then we will add the second argument, which is data, 173 00:07:42,300 --> 00:07:46,366 and that is actually the training set but without the dependent variable. 
174 00:07:46,400 --> 00:07:47,800 Because remember, kernel 175 00:07:47,800 --> 00:07:49,800 PCA is just a PCA technique 176 00:07:49,800 --> 00:07:53,866 where we use the kernel trick to map the data into a higher dimension 177 00:07:53,866 --> 00:07:55,133 and then apply PCA. 178 00:07:55,133 --> 00:07:59,000 Because indeed, in this higher dimension, the data is linearly separable. 179 00:07:59,266 --> 00:08:01,300 And therefore, since we apply PCA 180 00:08:01,300 --> 00:08:03,600 in this higher dimension, and PCA is an 181 00:08:03,600 --> 00:08:06,733 unsupervised technique, well, here for the data argument, 182 00:08:06,733 --> 00:08:10,800 we just need to input the training set but without the dependent variable. 183 00:08:11,033 --> 00:08:11,933 And therefore, 184 00:08:11,933 --> 00:08:13,066 as for PCA, we 185 00:08:13,066 --> 00:08:18,366 input here data equals training_set, then brackets 186 00:08:18,366 --> 00:08:21,666 to remove the dependent variable, which is indexed by 3 187 00:08:22,033 --> 00:08:25,033 because we only have two independent variables. 188 00:08:25,066 --> 00:08:25,533 All right. 189 00:08:25,533 --> 00:08:27,666 And then the next argument is kernel. 190 00:08:27,666 --> 00:08:31,800 So kernel is the kernel you want to use to apply the kernel trick. 191 00:08:32,000 --> 00:08:36,000 Remember, when we studied kernel SVM, we saw that there were several kernels 192 00:08:36,133 --> 00:08:37,700 to use the kernel trick. 193 00:08:37,700 --> 00:08:41,000 And here we're going to use the most common one, which is the Gaussian kernel. 194 00:08:41,200 --> 00:08:43,566 And that is called here rbfdot. 195 00:08:43,566 --> 00:08:45,900 So that's our third argument. 196 00:08:45,900 --> 00:08:50,966 And so here we input kernel equals rbfdot. 197 00:08:52,066 --> 00:08:52,700 All right. 198 00:08:52,700 --> 00:08:54,600 And then what is the next argument? 199 00:08:54,600 --> 00:08:56,700 The next argument is kpar. 
200 00:08:56,700 --> 00:08:58,566 We will actually not use this one. 201 00:08:58,566 --> 00:09:02,466 But then we have a very important argument that is at the heart 202 00:09:02,666 --> 00:09:04,466 of dimensionality reduction. 203 00:09:04,466 --> 00:09:06,866 That is features, which is the number of features, 204 00:09:06,866 --> 00:09:09,900 the number of principal components you want to end up with. 205 00:09:10,433 --> 00:09:13,166 So here, of course, we would like to visualize 206 00:09:13,166 --> 00:09:16,600 the training set results and the test set results in two dimensions. 207 00:09:16,733 --> 00:09:20,500 And to have this in two dimensions, we need to keep a number of two 208 00:09:20,700 --> 00:09:22,866 new extracted independent variables. 209 00:09:22,866 --> 00:09:24,600 So here the number of 210 00:09:24,600 --> 00:09:27,666 features will be two, as for PCA. 211 00:09:28,100 --> 00:09:32,400 So we will input here features equals 2. 212 00:09:33,066 --> 00:09:33,466 All right. 213 00:09:33,466 --> 00:09:36,166 And that's it for our kpca object. 214 00:09:36,166 --> 00:09:37,600 It is ready to be created 215 00:09:37,600 --> 00:09:41,600 and to be used to transform our original data set into this new data 216 00:09:41,600 --> 00:09:45,233 set with the new extracted features derived from kernel PCA. 217 00:09:45,566 --> 00:09:48,900 So let's select this line and create the object. 218 00:09:49,200 --> 00:09:51,333 Here it is, kpca, well created. 219 00:09:51,333 --> 00:09:55,766 And now let's move on to the next step, which is to transform our original data 220 00:09:55,766 --> 00:09:58,933 set into this new extracted data set. 221 00:09:59,533 --> 00:10:02,866 So now things are going to look like what we did with PCA, 222 00:10:03,066 --> 00:10:04,500 but some things are going to change. 223 00:10:04,500 --> 00:10:08,800 So we will do it step by step and we will see where we need to make some changes. 224 00:10:09,400 --> 00:10:10,933 All right. 
So first, as for 225 00:10:10,933 --> 00:10:11,500 PCA, we're 226 00:10:11,500 --> 00:10:15,266 going to use the predict function to transform our original training set 227 00:10:15,466 --> 00:10:17,700 into this new extracted training set. 228 00:10:17,700 --> 00:10:20,966 So this new training set with the new extracted features derived from 229 00:10:20,966 --> 00:10:21,900 kernel PCA, 230 00:10:21,900 --> 00:10:26,500 we call it training_set_pca. 231 00:10:27,133 --> 00:10:28,566 All right. And then equals. 232 00:10:28,566 --> 00:10:32,200 And then we use the predict function to do the transformation. 233 00:10:32,533 --> 00:10:35,866 And inside this predict function we first input our 234 00:10:35,866 --> 00:10:38,866 kpca object, as we did for PCA, 235 00:10:38,866 --> 00:10:42,200 and then the training set, the original training set. 236 00:10:42,566 --> 00:10:43,200 So let's do it. 237 00:10:43,200 --> 00:10:46,033 training_set, the second one. 238 00:10:46,033 --> 00:10:47,066 All right. 239 00:10:47,066 --> 00:10:48,866 And as opposed to PCA, 240 00:10:48,866 --> 00:10:50,700 and as with LDA, 241 00:10:50,700 --> 00:10:53,033 this will return a matrix, 242 00:10:53,033 --> 00:10:54,933 and we need it as a data frame. 243 00:10:54,933 --> 00:10:58,333 So as for LDA, we will use the as.data.frame 244 00:10:58,466 --> 00:11:01,700 function. 245 00:11:02,166 --> 00:11:05,166 So parentheses here, and we close the parentheses here 246 00:11:05,633 --> 00:11:09,400 to set this transformed training set, the 247 00:11:09,400 --> 00:11:11,533 training_set_pca, as a data frame. 248 00:11:11,533 --> 00:11:14,366 And as a reminder, we're doing this to give what 249 00:11:14,366 --> 00:11:17,533 the functions we will use in the next sections expect. 250 00:11:18,066 --> 00:11:19,866 All right. So far so good. 251 00:11:19,866 --> 00:11:23,300 And so now let's select this line and execute this.
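Put together, the object creation and the transformation dictated so far might look like the sketch below. The small synthetic `training_set` is an assumption standing in for the course's preprocessed data (two scaled predictors followed by the dependent variable in column 3), and the column names Age, EstimatedSalary and Purchased are assumed as well.

```r
library(kernlab)

# Synthetic stand-in for the preprocessed training set (assumption)
set.seed(123)
training_set <- data.frame(
  Age = rnorm(60),
  EstimatedSalary = rnorm(60),
  Purchased = rbinom(60, 1, 0.4)
)

# Kernel PCA object: formula "~ ." on the predictors only, Gaussian kernel,
# two extracted features
kpca <- kpca(~., data = training_set[-3], kernel = "rbfdot", features = 2)

# predict() returns a matrix, so wrap it in as.data.frame()
training_set_pca <- as.data.frame(predict(kpca, training_set))
head(training_set_pca)
```

Because kpar is left out, kernlab falls back to its default kernel parameters; tuning sigma is a separate question that the video does not go into.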
252 00:11:23,700 --> 00:11:25,200 And you're going to see what's going to happen. 253 00:11:25,200 --> 00:11:29,000 And you're going to understand why we called this new training set 254 00:11:29,000 --> 00:11:29,833 training_set_pca, 255 00:11:29,833 --> 00:11:32,566 with a different name than the original training set, 256 00:11:32,566 --> 00:11:33,733 training_set. 257 00:11:33,733 --> 00:11:33,966 All right. 258 00:11:33,966 --> 00:11:34,900 So let's execute. 259 00:11:34,900 --> 00:11:37,500 Here we go. Executed properly. 260 00:11:37,500 --> 00:11:40,400 And now let's have a look at our training_set_pca 261 00:11:40,400 --> 00:11:42,333 that we just created. 262 00:11:42,333 --> 00:11:46,866 So I'm going to enlarge this so that we can see which one is training_set_pca. 263 00:11:46,866 --> 00:11:47,866 That's the one. 264 00:11:47,866 --> 00:11:50,866 So let's have a look at this. I'm going to click on it. 265 00:11:50,966 --> 00:11:53,566 And here is our training_set_pca. So as 266 00:11:53,566 --> 00:11:54,566 we can see, it is 267 00:11:54,566 --> 00:11:57,633 composed of only two columns, V1 and V2. 268 00:11:58,200 --> 00:12:00,433 So try to guess what these two columns are. 269 00:12:00,433 --> 00:12:01,933 I'm going to tell you right now: 270 00:12:01,933 --> 00:12:05,966 these two columns are the principal components that we obtained 271 00:12:06,000 --> 00:12:07,500 through kernel PCA. 272 00:12:07,500 --> 00:12:10,866 That is, these are our two new extracted features 273 00:12:10,866 --> 00:12:11,400 after all 274 00:12:11,400 --> 00:12:15,233 this mapping into this higher dimension using the kernel trick and then applying 275 00:12:15,233 --> 00:12:18,366 PCA to the data set mapped into this higher dimension. 
276 00:12:19,100 --> 00:12:21,400 But now the problem is that 277 00:12:21,400 --> 00:12:25,333 in this training_set_pca, we don't have the dependent variable, 278 00:12:25,500 --> 00:12:29,033 and we need it for the next sections, because in our code template 279 00:12:29,266 --> 00:12:33,000 we need to have the independent variables and the dependent variable. 280 00:12:33,366 --> 00:12:34,366 So what is the next step? 281 00:12:34,366 --> 00:12:40,400 Now the next step is to add the dependent variable into this training_set_pca. 282 00:12:40,700 --> 00:12:44,733 And so the thing to understand here is that we lost the dependent variable, 283 00:12:45,000 --> 00:12:48,000 but we kept the observations. That is, 284 00:12:48,000 --> 00:12:51,133 this one here corresponds to the first observation 285 00:12:51,366 --> 00:12:53,800 we had in the original training set, 286 00:12:53,800 --> 00:12:54,500 this one. 287 00:12:54,500 --> 00:12:58,233 So this first observation here has the 0 label, 288 00:12:58,566 --> 00:13:01,566 that is, this first customer didn't buy the SUV. 289 00:13:01,566 --> 00:13:04,033 These were the original independent variables. 290 00:13:04,033 --> 00:13:08,166 And then if we go to our training_set_pca, well, this first customer 291 00:13:08,166 --> 00:13:11,100 is the same first customer as in this training set. 292 00:13:11,100 --> 00:13:14,400 So it will have the 0 label in the Purchased column. 293 00:13:14,700 --> 00:13:17,333 But then these are new extracted features. 294 00:13:17,333 --> 00:13:21,166 So of course we don't get the same values as for the independent variables 295 00:13:21,166 --> 00:13:22,800 of our original training set. 296 00:13:22,800 --> 00:13:27,300 So what we can do now is simply take the dependent variable column 297 00:13:27,600 --> 00:13:32,666 Purchased of this original training set and add it to our training_set_pca. 
298 00:13:32,866 --> 00:13:34,800 Because these observations here 299 00:13:34,800 --> 00:13:37,933 are the same observations as those of our original training set. 300 00:13:38,400 --> 00:13:41,200 And so what we need to do now is very simple. 301 00:13:41,200 --> 00:13:45,166 We just need to take our training_set_pca. 302 00:13:45,566 --> 00:13:49,066 Then we're going to add a new column that we'll call Purchased. 303 00:13:49,533 --> 00:13:52,800 So by doing this, you know, I'm just creating a new column 304 00:13:53,066 --> 00:13:55,800 that I also called Purchased, because this new column is going 305 00:13:55,800 --> 00:13:56,300 to be 306 00:13:56,300 --> 00:13:59,666 the Purchased dependent variable. And then equals. 307 00:14:00,000 --> 00:14:03,400 And then what I have to do now is to take the real Purchased 308 00:14:03,400 --> 00:14:06,500 dependent variable column from the original training set. 309 00:14:06,833 --> 00:14:08,500 And we can do that because the training_set_pca 310 00:14:08,500 --> 00:14:09,400 contains 311 00:14:09,400 --> 00:14:13,133 the same observations as the observations of the original training set. 312 00:14:13,500 --> 00:14:16,500 So here, to take the Purchased column of the original training set, 313 00:14:16,800 --> 00:14:21,133 we just need to take our original training set, which is called training_set, 314 00:14:21,533 --> 00:14:22,666 and then the dollar sign. 315 00:14:22,666 --> 00:14:25,666 And then that's where we take the Purchased column. 316 00:14:26,000 --> 00:14:29,400 So by doing this I will add this new column Purchased. 317 00:14:29,766 --> 00:14:31,533 And then in this new column Purchased, 318 00:14:31,533 --> 00:14:35,800 I will include the values of the Purchased column of the original training set. 319 00:14:36,200 --> 00:14:40,100 So let's check it out. I'm going to select this line and execute. 320 00:14:40,500 --> 00:14:42,100 And now, as you can see, 321 00:14:42,100 --> 00:14:44,400 if I go back to training_set_pca: 
322 00:14:44,400 --> 00:14:48,033 This contains the Purchased column of the original training set. 323 00:14:48,500 --> 00:14:49,100 So that's good. 324 00:14:49,100 --> 00:14:50,333 That's the next step done. 325 00:14:50,333 --> 00:14:53,000 And now we need to take care of the test set. 326 00:14:53,000 --> 00:14:54,733 And so to take care of the test set, 327 00:14:54,733 --> 00:14:57,566 we're going to do exactly the same as we did for this 328 00:14:57,566 --> 00:14:58,933 training_set_pca. 329 00:14:58,933 --> 00:15:01,633 So let's copy this. Copy. 330 00:15:01,633 --> 00:15:03,500 And let's paste it here. 331 00:15:03,500 --> 00:15:06,800 And of course what we're going to do now is replace this 332 00:15:06,800 --> 00:15:08,133 training_set_pca 333 00:15:08,133 --> 00:15:10,900 by test_set_pca. 334 00:15:10,900 --> 00:15:12,000 And same here. 335 00:15:12,000 --> 00:15:15,833 We take the original test set to make the transformation. 336 00:15:16,033 --> 00:15:19,333 And then we're going to add the Purchased column of the original test 337 00:15:19,333 --> 00:15:22,333 set to this new test set, 338 00:15:22,500 --> 00:15:25,500 that is, the test set extracted from kernel PCA. 339 00:15:25,500 --> 00:15:28,333 So test_set here. And that should be okay. 340 00:15:28,333 --> 00:15:30,000 So I'm going to select 341 00:15:30,000 --> 00:15:31,200 these two lines here 342 00:15:31,200 --> 00:15:34,066 and execute. Perfect. 343 00:15:34,066 --> 00:15:36,400 Our new test_set_pca is created. 344 00:15:36,400 --> 00:15:37,666 Let's have a quick check. 345 00:15:37,666 --> 00:15:39,300 So that's the test set. 346 00:15:39,300 --> 00:15:41,533 And that's our test_set_pca, 347 00:15:41,533 --> 00:15:45,300 with the two new extracted features and the Purchased column. 348 00:15:45,300 --> 00:15:48,500 And now that means that we correctly applied kernel PCA. 349 00:15:49,033 --> 00:15:49,733 So great. 
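The two steps just walked through (re-attaching the dependent variable, then repeating everything for the test set) can be sketched as follows. The helper `make_set` and the synthetic data are hypothetical stand-ins for the course's preprocessed sets; only the last four assignments mirror what is typed in the video.

```r
library(kernlab)

# Hypothetical helper producing synthetic, already-scaled data (assumption)
set.seed(123)
make_set <- function(n) {
  data.frame(Age = rnorm(n), EstimatedSalary = rnorm(n),
             Purchased = rbinom(n, 1, 0.4))
}
training_set <- make_set(60)
test_set <- make_set(20)

# Fit kernel PCA on the training predictors only
kpca <- kpca(~., data = training_set[-3], kernel = "rbfdot", features = 2)

# Transform the training set, then copy the labels back: row order is unchanged
training_set_pca <- as.data.frame(predict(kpca, training_set))
training_set_pca$Purchased <- training_set$Purchased

# Same two lines for the test set, reusing the kpca object fitted on training data
test_set_pca <- as.data.frame(predict(kpca, test_set))
test_set_pca$Purchased <- test_set$Purchased

head(test_set_pca)
```

Reusing the object fitted on the training set for the test set keeps both sets in the same extracted feature space.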
350 00:15:49,733 --> 00:15:52,000 We are ready to move on to the next section. 351 00:15:52,000 --> 00:15:55,000 So let's go back to our kernel PCA file. 352 00:15:55,133 --> 00:15:59,033 And let's now fit the logistic regression to the training set. 353 00:15:59,400 --> 00:16:02,100 Now, do we need to change anything in this code section? 354 00:16:02,100 --> 00:16:05,300 Well, yes, of course we do, because, be careful, 355 00:16:05,300 --> 00:16:09,400 we called our new extracted training set training_set_pca. 356 00:16:09,600 --> 00:16:14,400 So here for the data argument we need to specify training_set_pca. 357 00:16:14,700 --> 00:16:16,400 So that's the only thing we need to change here. 358 00:16:16,400 --> 00:16:19,933 So we are ready to select this section and execute. 359 00:16:20,700 --> 00:16:22,233 All right, classifier ready. 360 00:16:22,233 --> 00:16:25,900 And now let's move on to the next section, predicting the test set results. 361 00:16:26,133 --> 00:16:27,766 And of course here that's the same. 362 00:16:27,766 --> 00:16:31,600 We need to replace test_set by test_set_pca. 363 00:16:32,033 --> 00:16:35,033 I need to enlarge this a little bit. Right. 364 00:16:35,766 --> 00:16:36,500 And that's it. 365 00:16:36,500 --> 00:16:40,466 We are ready to execute this section to predict the test set results. 366 00:16:40,900 --> 00:16:41,700 And here we go. 367 00:16:41,700 --> 00:16:43,800 We get our vector of predictions, 368 00:16:43,800 --> 00:16:46,800 y_pred, for this new test_set_pca. 369 00:16:46,833 --> 00:16:49,133 All right. Now let's make the confusion matrix. 370 00:16:49,133 --> 00:16:53,066 We of course need to change test_set by test_set_pca. 371 00:16:53,833 --> 00:16:55,566 Here we go. And now it's ready. 372 00:16:55,566 --> 00:17:00,766 Now we can execute this line of code to get the confusion matrix. 373 00:17:01,033 --> 00:17:01,866 And here it is. 
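The three template sections edited here (fit, predict, confusion matrix) might look like the sketch below once the data set names are swapped. The course template itself is not shown in this transcript, so the exact calls are an assumption, and the synthetic V1/V2 data simply stands in for the kernel PCA output so that the block runs on its own.

```r
# Synthetic stand-in for the kernel PCA output: V1, V2, Purchased (assumption)
set.seed(1)
make_pca_set <- function(n) {
  V1 <- rnorm(n)
  V2 <- rnorm(n)
  data.frame(V1 = V1, V2 = V2, Purchased = rbinom(n, 1, plogis(2 * V1)))
}
training_set_pca <- make_pca_set(300)
test_set_pca <- make_pca_set(100)

# Fit logistic regression on the extracted features
classifier <- glm(Purchased ~ ., family = binomial, data = training_set_pca)

# Predict on the test set, dropping the label column, with a 0.5 threshold
prob_pred <- predict(classifier, type = "response", newdata = test_set_pca[-3])
y_pred <- ifelse(prob_pred > 0.5, 1, 0)

# Confusion matrix: rows are true labels, columns are predicted labels
cm <- table(test_set_pca[, 3], y_pred)
print(cm)

# Accuracy is the share of correct predictions
accuracy <- mean(y_pred == test_set_pca$Purchased)
print(accuracy)
```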
374 00:17:01,866 --> 00:17:05,733 We can have a quick look in the console by pressing Cmd+Enter. 375 00:17:06,066 --> 00:17:09,833 And we get 57 plus 26 equals 83. 376 00:17:10,033 --> 00:17:13,033 And since we have 100 observations in the test set, 377 00:17:13,033 --> 00:17:15,933 that gives us an 83% accuracy. 378 00:17:15,933 --> 00:17:17,166 So that's pretty good. 379 00:17:17,166 --> 00:17:20,866 And now let's get to the exciting part, visualizing the training set results. 380 00:17:21,200 --> 00:17:23,300 So very quickly, what do we need to change? 381 00:17:23,300 --> 00:17:27,300 Remember, we need to change the names of the independent variable columns 382 00:17:27,300 --> 00:17:28,800 here. That's compulsory. 383 00:17:28,800 --> 00:17:32,333 So as a reminder, the names are V1 and V2. 384 00:17:32,366 --> 00:17:34,500 Those are the names of the independent variables. 385 00:17:34,500 --> 00:17:38,033 So here we need to replace Age by V1 386 00:17:38,433 --> 00:17:41,533 and EstimatedSalary by V2. 387 00:17:42,100 --> 00:17:44,366 And that's not compulsory. 388 00:17:44,366 --> 00:17:46,466 And anyway we already have two good names, 389 00:17:46,466 --> 00:17:49,466 PC1 and PC2. So don't forget about that. 390 00:17:49,466 --> 00:17:52,433 And of course we need to change the name of the training set, 391 00:17:52,433 --> 00:17:54,166 because we called our training set 392 00:17:54,166 --> 00:17:55,800 training_set_pca. 393 00:17:55,800 --> 00:17:58,533 So here I'm adding training_set_pca. 394 00:17:58,533 --> 00:17:59,533 And that's perfect. 395 00:17:59,533 --> 00:18:03,100 That's ready to be executed to visualize the training set results. 396 00:18:03,566 --> 00:18:05,900 So we will only visualize the training set results. 397 00:18:05,900 --> 00:18:08,666 But let's make the same changes for the test set 398 00:18:08,666 --> 00:18:11,133 so that you can have a look at it yourself. 
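The renaming is compulsory because predict() matches the grid's column names against the predictor names the classifier was trained on. Here is a minimal base-R sketch of such a region plot; it is not the course's own plotting template, and the synthetic data, grid step and colors are all assumptions.

```r
# Synthetic stand-in for training_set_pca with columns V1, V2, Purchased (assumption)
set.seed(1)
V1 <- rnorm(200)
V2 <- rnorm(200)
training_set_pca <- data.frame(V1, V2,
                               Purchased = rbinom(200, 1, plogis(2 * V1 - V2)))
classifier <- glm(Purchased ~ ., family = binomial, data = training_set_pca)

# Grid covering the plane of the two extracted features
set <- training_set_pca
X1 <- seq(min(set$V1) - 1, max(set$V1) + 1, by = 0.05)
X2 <- seq(min(set$V2) - 1, max(set$V2) + 1, by = 0.05)
grid_set <- expand.grid(X1, X2)
colnames(grid_set) <- c("V1", "V2")  # must match the names used in training

# Predicted class for every grid point
prob_set <- predict(classifier, type = "response", newdata = grid_set)
y_grid <- ifelse(prob_set > 0.5, 1, 0)

# Prediction regions first, then the real observations on top
plot(set[, -3], main = "Logistic Regression (Training set)",
     xlab = "PC1", ylab = "PC2", xlim = range(X1), ylim = range(X2))
points(grid_set, pch = ".", col = ifelse(y_grid == 1, "springgreen3", "tomato"))
points(set[, -3], pch = 21, bg = ifelse(set[, 3] == 1, "green4", "red3"))
```

The axis labels are free text, which is why changing them is optional while the column names are not.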
399 00:18:11,133 --> 00:18:14,366 So same, we're replacing Age by V1, 400 00:18:15,166 --> 00:18:18,000 EstimatedSalary by V2. 401 00:18:18,000 --> 00:18:22,566 And here we replace test_set by test_set_pca. 402 00:18:23,266 --> 00:18:23,700 All right. 403 00:18:23,700 --> 00:18:26,966 And now let's have a look. I look forward to showing you what's going to happen. 404 00:18:26,966 --> 00:18:31,133 So I'm going to select everything from here up to the top here, 405 00:18:31,133 --> 00:18:34,000 that is, the whole section to visualize the training set results. 406 00:18:34,000 --> 00:18:37,066 And let's press Command or Control plus Enter to execute. 407 00:18:37,700 --> 00:18:40,533 Here we go. The computations are being run. 408 00:18:42,800 --> 00:18:43,200 All right. 409 00:18:43,200 --> 00:18:45,933 So these are the results of kernel PCA 410 00:18:45,933 --> 00:18:49,966 combined with a logistic regression model that we applied on the non-linearly 411 00:18:49,966 --> 00:18:51,866 separable data set. 412 00:18:51,866 --> 00:18:55,133 And so we can appreciate the contrast between the simplicity 413 00:18:55,133 --> 00:18:58,833 of the obtained results and the complexity of what happened behind the scenes, 414 00:18:59,100 --> 00:19:01,666 because indeed we have this very simple result here 415 00:19:01,666 --> 00:19:04,666 with these two classes separated by the straight line. 416 00:19:04,766 --> 00:19:08,700 But what happened behind the scenes is that our original data set, 417 00:19:08,700 --> 00:19:12,766 in our original feature space, was mapped to a higher dimension 418 00:19:12,766 --> 00:19:16,833 using the kernel trick to avoid highly compute-intensive computations, 419 00:19:17,133 --> 00:19:18,100 and then by mapping 420 00:19:18,100 --> 00:19:21,866 our data set in the original feature space to this higher dimension. 
421 00:19:22,133 --> 00:19:24,433 Well, first, that created some new dimensions, 422 00:19:24,433 --> 00:19:27,500 and mostly that created a new feature space 423 00:19:27,633 --> 00:19:30,633 where our data was then linearly separable. 424 00:19:30,666 --> 00:19:34,666 But by doing that, we had more dimensions than the original number of dimensions. 425 00:19:34,800 --> 00:19:37,033 So we still needed to apply the PCA 426 00:19:37,033 --> 00:19:41,166 dimensionality reduction technique to end up with a lower number of dimensions. 427 00:19:41,466 --> 00:19:42,600 So then PCA was 428 00:19:42,600 --> 00:19:46,066 applied to this new feature space where the data was linearly separable. 429 00:19:46,433 --> 00:19:47,600 And through PCA some 430 00:19:47,600 --> 00:19:50,466 new extracted independent variables were created, 431 00:19:50,466 --> 00:19:52,866 that are nothing else than the principal components 432 00:19:52,866 --> 00:19:53,800 of PCA. 433 00:19:53,800 --> 00:19:56,800 And eventually we obtained this new feature space 434 00:19:56,800 --> 00:20:01,866 formed by these two new extracted principal components resulting from PCA, 435 00:20:02,166 --> 00:20:04,966 in which now our data is linearly separable 436 00:20:04,966 --> 00:20:07,966 and much better separated by a linear classifier. 437 00:20:08,400 --> 00:20:10,566 All right. So that's it for kernel PCA. 438 00:20:10,566 --> 00:20:14,066 And that's also the end of this part, Dimensionality Reduction. 439 00:20:14,366 --> 00:20:15,933 And I'll see you in the next part, 440 00:20:15,933 --> 00:20:18,933 Part 10, Model Selection and Boosting, 441 00:20:18,966 --> 00:20:22,566 the last part of this course, where we will cover a very exciting algorithm 442 00:20:22,566 --> 00:20:25,566 in machine learning that is called XGBoost. 443 00:20:25,733 --> 00:20:28,066 So I look forward to seeing you in this next part. 444 00:20:28,066 --> 00:20:29,900 And until then, enjoy machine learning.