1 00:00:00,200 --> 00:00:02,333 Hello my friends, and welcome to this new 2 00:00:02,333 --> 00:00:04,333 practical activity of Part 3 00:00:04,333 --> 00:00:06,500 Nine, Dimensionality Reduction. 4 00:00:06,500 --> 00:00:08,766 So in the previous section we experimented 5 00:00:08,766 --> 00:00:10,066 with PCA, 6 00:00:10,066 --> 00:00:12,100 principal component analysis, 7 00:00:12,100 --> 00:00:12,700 and we indeed 8 00:00:12,700 --> 00:00:14,966 got great results with our 9 00:00:14,966 --> 00:00:15,800 Wine data 10 00:00:15,800 --> 00:00:16,800 set, which will be 11 00:00:16,800 --> 00:00:19,000 the same data set for this new section, 12 00:00:19,000 --> 00:00:20,400 because, you know, we want to compare 13 00:00:20,400 --> 00:00:22,933 several dimensionality reduction techniques. 14 00:00:22,933 --> 00:00:25,166 So there we go. We're going to see if we can even 15 00:00:25,166 --> 00:00:27,800 beat PCA, which only had 16 00:00:27,800 --> 00:00:29,900 one incorrect prediction. 17 00:00:29,900 --> 00:00:31,933 So we're going to work with the same data set, 18 00:00:31,933 --> 00:00:34,433 and therefore the implementation will be exactly 19 00:00:34,433 --> 00:00:36,466 the same except for one cell, 20 00:00:36,466 --> 00:00:38,233 which will of course be the cell where we 21 00:00:38,233 --> 00:00:41,700 implement LDA instead of PCA. All right. 22 00:00:41,800 --> 00:00:42,633 Are you ready? 23 00:00:42,633 --> 00:00:43,833 Let's do this. 24 00:00:43,833 --> 00:00:44,866 Before we get 25 00:00:44,866 --> 00:00:48,166 into Part Nine, let's make sure everyone here is on the same page. 26 00:00:48,166 --> 00:00:50,733 I gave you the link to this folder containing all the codes 27 00:00:50,733 --> 00:00:53,166 and data sets right before this tutorial. 28 00:00:53,166 --> 00:00:55,933 So make sure to connect to it. And now here we go. 29 00:00:55,933 --> 00:00:59,333 Let's enter Part Nine, Dimensionality Reduction.
30 00:00:59,766 --> 00:01:00,833 And now we're going to go into 31 00:01:00,833 --> 00:01:02,000 Section 44, 32 00:01:02,000 --> 00:01:04,100 Linear Discriminant Analysis, 33 00:01:04,100 --> 00:01:06,533 LDA, which will be a new technique 34 00:01:06,533 --> 00:01:08,300 of dimensionality reduction, 35 00:01:08,300 --> 00:01:10,200 very powerful as we will see. 36 00:01:10,200 --> 00:01:10,866 So let's start with 37 00:01:10,866 --> 00:01:11,466 Python as 38 00:01:11,466 --> 00:01:12,900 usual. And there 39 00:01:12,900 --> 00:01:14,800 we go. This folder, just like the 40 00:01:14,800 --> 00:01:16,833 previous one, has two files. 41 00:01:16,833 --> 00:01:18,233 This is the implementation, 42 00:01:18,233 --> 00:01:21,633 and this is the same Wine data set, which, 43 00:01:21,633 --> 00:01:21,966 you know, 44 00:01:21,966 --> 00:01:25,200 belongs to a wine shop business owner who first 45 00:01:25,200 --> 00:01:27,800 asked you, you know, the data scientist, 46 00:01:27,800 --> 00:01:28,500 to do 47 00:01:28,500 --> 00:01:32,233 some clustering to identify different customer segments 48 00:01:32,233 --> 00:01:33,466 for each of, 49 00:01:33,466 --> 00:01:35,633 you know, the wines of this data set. 50 00:01:35,633 --> 00:01:37,433 You know, each row of this data set 51 00:01:37,433 --> 00:01:39,400 corresponds to a certain wine, 52 00:01:39,400 --> 00:01:42,000 and for each wine we have several wine features 53 00:01:42,000 --> 00:01:43,466 or, you know, characteristics, 54 00:01:43,466 --> 00:01:45,066 all these up to here. 55 00:01:45,066 --> 00:01:46,100 And you used 56 00:01:46,100 --> 00:01:48,400 all these features to identify 57 00:01:48,400 --> 00:01:50,733 those three customer segments or, you know, 58 00:01:50,733 --> 00:01:52,333 customer clusters. 59 00:01:52,333 --> 00:01:54,066 After which, you know, since 60 00:01:54,066 --> 00:01:56,300 this wine shop owner was so happy and
61 00:01:56,300 --> 00:01:59,633 impressed by your job, well, then of course the owner asked you to do 62 00:01:59,633 --> 00:02:01,500 another mission, which is the 63 00:02:01,500 --> 00:02:04,533 one we're about to do with LDA, which consists of 64 00:02:04,533 --> 00:02:08,866 building a predictive model, combined with dimensionality reduction 65 00:02:08,866 --> 00:02:12,366 applied to this data set, so that for each new 66 00:02:12,366 --> 00:02:15,366 wine that this owner has in the wine shop, 67 00:02:15,500 --> 00:02:16,233 well, by 68 00:02:16,233 --> 00:02:17,000 deploying this 69 00:02:17,000 --> 00:02:21,300 new predictive model, this owner will be able to predict which customer 70 00:02:21,333 --> 00:02:26,033 segment this new wine belongs to, so that the owner can recommend this new wine to 71 00:02:26,033 --> 00:02:29,000 the right customers and therefore eventually optimize 72 00:02:29,000 --> 00:02:31,033 the sales. 73 00:02:31,033 --> 00:02:31,366 All right. 74 00:02:31,366 --> 00:02:32,200 So that's exactly 75 00:02:32,200 --> 00:02:34,566 the same data set. And now let's move on to 76 00:02:34,566 --> 00:02:38,300 our implementation, linear discriminant analysis, 77 00:02:38,400 --> 00:02:39,633 which we can either 78 00:02:39,633 --> 00:02:41,700 open with Google Colaboratory, as 79 00:02:41,700 --> 00:02:44,700 I'm doing now, or Jupyter Notebook. 80 00:02:45,000 --> 00:02:47,800 And as you notice, I kept this 81 00:02:47,800 --> 00:02:49,466 previous implementation we did on 82 00:02:49,466 --> 00:02:51,733 PCA so that we can, you know, compare 83 00:02:51,733 --> 00:02:53,366 the results in the end. Right. 84 00:02:53,366 --> 00:02:55,766 This is PCA, this is LDA. 85 00:02:55,766 --> 00:02:58,100 But you know, since this is in read-only mode, 86 00:02:58,100 --> 00:02:59,233 we're going to create now 87 00:02:59,233 --> 00:03:00,400 a copy so that we can 88 00:03:00,400 --> 00:03:01,300 re-implement the
89 00:03:01,300 --> 00:03:04,133 cell that belongs to the LDA model. 90 00:03:04,133 --> 00:03:04,800 So there we go. 91 00:03:04,800 --> 00:03:06,600 Save a copy in Drive. 92 00:03:06,600 --> 00:03:08,700 This will create a copy inside 93 00:03:08,700 --> 00:03:10,833 which we'll be able to re-implement 94 00:03:10,833 --> 00:03:12,533 the LDA model. 95 00:03:12,533 --> 00:03:13,100 All right. 96 00:03:13,100 --> 00:03:15,000 And now we can, you know, close this so 97 00:03:15,000 --> 00:03:17,966 that we can have the two implementations next to each other, 98 00:03:17,966 --> 00:03:19,366 you know, the two copies. 99 00:03:19,366 --> 00:03:20,400 And now let's do this. 100 00:03:20,400 --> 00:03:21,333 Let's quickly 101 00:03:21,333 --> 00:03:22,933 remove, you know, the 102 00:03:22,933 --> 00:03:26,133 cell that implements LDA, this one, 103 00:03:26,400 --> 00:03:27,733 and let's 104 00:03:27,733 --> 00:03:29,433 re-implement this, because, you know, 105 00:03:29,433 --> 00:03:30,933 all the rest is the same. 106 00:03:30,933 --> 00:03:31,966 I will actually 107 00:03:31,966 --> 00:03:34,366 remove all these outputs here so that you don't 108 00:03:34,366 --> 00:03:35,800 see the final results, 109 00:03:35,800 --> 00:03:37,833 and we can keep them as a surprise. 110 00:03:37,833 --> 00:03:40,833 So let me just remove the outputs too. 111 00:03:41,166 --> 00:03:42,366 Don't look too closely. 112 00:03:42,366 --> 00:03:43,133 And there we go. 113 00:03:43,133 --> 00:03:43,466 All right. 114 00:03:43,466 --> 00:03:46,200 So basically all the cells of 115 00:03:46,200 --> 00:03:47,666 this implementation are 116 00:03:47,666 --> 00:03:48,733 exactly the same 117 00:03:48,733 --> 00:03:50,700 as the previous one, PCA, 118 00:03:50,700 --> 00:03:52,766 except of course this cell 119 00:03:52,766 --> 00:03:55,333 that implements LDA right here. 120 00:03:55,333 --> 00:03:57,066 So no need to explain all this.
121 00:03:57,066 --> 00:04:00,666 Plus, all these other cells come from our various toolkits, 122 00:04:00,666 --> 00:04:03,600 so you're definitely 100% familiar 123 00:04:03,600 --> 00:04:04,800 with them. 124 00:04:04,800 --> 00:04:05,200 All right. 125 00:04:05,200 --> 00:04:06,066 So let's do this. 126 00:04:06,066 --> 00:04:09,066 Let's, you know, apply LDA. 127 00:04:09,133 --> 00:04:11,600 So we're going to create a new code cell. And there 128 00:04:11,600 --> 00:04:12,333 we go. 129 00:04:12,333 --> 00:04:15,500 Let's implement linear discriminant analysis. 130 00:04:15,633 --> 00:04:18,266 So now you have two options. The first 131 00:04:18,266 --> 00:04:20,466 and the best option is to press 132 00:04:20,466 --> 00:04:23,033 pause on the video and try to implement this 133 00:04:23,033 --> 00:04:25,400 yourself by, of course, browsing 134 00:04:25,400 --> 00:04:27,133 the scikit-learn API 135 00:04:27,133 --> 00:04:28,466 to find the 136 00:04:28,466 --> 00:04:31,200 LDA class that can implement the LDA 137 00:04:31,200 --> 00:04:32,933 dimensionality reduction technique. 138 00:04:32,933 --> 00:04:35,200 And you will definitely end up with the 139 00:04:35,200 --> 00:04:38,200 same solution I will implement in a few seconds. 140 00:04:38,233 --> 00:04:39,566 And the second option 141 00:04:39,566 --> 00:04:41,566 is of course to, well, not 142 00:04:41,566 --> 00:04:45,600 press pause on the video and implement the solution with me in, 143 00:04:45,600 --> 00:04:51,500 let's say, three seconds: three, two, one, and go. All right, let's do this. 144 00:04:51,500 --> 00:04:54,166 Let's implement LDA together. 145 00:04:54,166 --> 00:04:55,700 So as I've just said, 146 00:04:55,700 --> 00:04:57,333 we're going to implement LDA 147 00:04:57,333 --> 00:04:58,966 thanks to the scikit-learn library.
148 00:04:58,966 --> 00:05:02,666 Therefore we're going to start from sklearn, 149 00:05:02,966 --> 00:05:06,566 from which we're going to get access to, this time, 150 00:05:06,566 --> 00:05:08,400 not, you know, 151 00:05:08,400 --> 00:05:11,000 the, well, let me go to PCA here, 152 00:05:11,000 --> 00:05:15,733 not the decomposition module of scikit-learn, but a new one, which 153 00:05:15,766 --> 00:05:17,033 is very 154 00:05:17,033 --> 00:05:19,200 easy to remember because this is actually 155 00:05:19,200 --> 00:05:20,366 discriminant 156 00:05:22,133 --> 00:05:24,400 underscore analysis. 157 00:05:24,400 --> 00:05:26,033 Okay. That's another module 158 00:05:26,033 --> 00:05:28,000 of scikit-learn that contains, 159 00:05:28,000 --> 00:05:29,166 of course, the class 160 00:05:29,166 --> 00:05:32,033 that can implement LDA. And that class, 161 00:05:32,033 --> 00:05:34,966 well, you know, after this import here we have to 162 00:05:34,966 --> 00:05:36,400 add the name of this class. 163 00:05:36,400 --> 00:05:37,000 And the name of 164 00:05:37,000 --> 00:05:39,166 this class is capital L, 165 00:05:39,166 --> 00:05:42,100 and then very simply Linear 166 00:05:42,100 --> 00:05:45,100 Discriminant Analysis. 167 00:05:45,966 --> 00:05:48,300 All right. Very good. The reason why Google 168 00:05:48,300 --> 00:05:48,566 Colab, 169 00:05:48,566 --> 00:05:48,900 by the way, 170 00:05:48,900 --> 00:05:52,700 is not helping me with the suggestions is because the notebook is not running. 171 00:05:52,700 --> 00:05:56,200 And remember, to run the notebook or, you know, to connect it, 172 00:05:56,400 --> 00:05:58,600 well, you need to either run any of 173 00:05:58,600 --> 00:06:01,200 the first cells or upload the data set. 174 00:06:01,200 --> 00:06:02,866 So let's do it right now so that, you know, 175 00:06:02,866 --> 00:06:04,733 Google Colab can assist me. 176 00:06:04,733 --> 00:06:06,933 I really love it when it does.
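A minimal sketch of the import described above, assuming a recent scikit-learn install. It only illustrates that PCA and LDA live in different scikit-learn modules while exposing the same transformer methods; it is not copied from the course notebook.

```python
# PCA comes from the decomposition module; LDA comes from discriminant_analysis.
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Both classes expose fit_transform and transform, which is why the
# rest of the notebook can stay the same when one is swapped for the other.
for cls in (PCA, LinearDiscriminantAnalysis):
    has_api = hasattr(cls, "fit_transform") and hasattr(cls, "transform")
    print(cls.__name__, "has fit_transform/transform:", has_api)
```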
177 00:06:06,933 --> 00:06:09,600 So right now I just clicked on this folder button. 178 00:06:09,600 --> 00:06:12,333 And then let's click the upload button. 179 00:06:12,333 --> 00:06:14,500 And we will end up in the, you know, previous folder 180 00:06:14,500 --> 00:06:16,033 for principal component analysis. 181 00:06:16,033 --> 00:06:17,800 But let me show you the path again. 182 00:06:17,800 --> 00:06:19,266 I put my machine learning 183 00:06:19,266 --> 00:06:21,066 folder on my desktop. 184 00:06:21,066 --> 00:06:25,066 So inside we're going to go now to Part Nine and then Section 185 00:06:25,066 --> 00:06:29,900 44, Linear Discriminant Analysis, and Python, and then Wine. 186 00:06:29,900 --> 00:06:30,233 All right. 187 00:06:30,233 --> 00:06:32,466 So this is exactly the same dataset as before, 188 00:06:32,466 --> 00:06:34,900 but I just wanted to show you the path. 189 00:06:34,900 --> 00:06:37,900 All right. And there we go. We have Wine. 190 00:06:38,200 --> 00:06:39,133 And so now I'm going to show 191 00:06:39,133 --> 00:06:40,700 you, if I retype this, 192 00:06:40,700 --> 00:06:43,066 Linear Discriminant, 193 00:06:43,066 --> 00:06:44,433 see, now it is helping me. 194 00:06:44,433 --> 00:06:46,266 So it's maybe better to have, you know, 195 00:06:46,266 --> 00:06:49,333 this reflex to upload the data set right at the 196 00:06:49,333 --> 00:06:52,200 beginning. Okay. So Linear Discriminant Analysis. 197 00:06:52,200 --> 00:06:52,866 But since 198 00:06:52,866 --> 00:06:55,600 this class name is actually pretty long and pretty 199 00:06:55,600 --> 00:07:01,500 impractical, well, let's just, you know, add a simple shortcut like LDA. 200 00:07:01,500 --> 00:07:04,500 We can do this. That's fine. And now 201 00:07:04,533 --> 00:07:05,600 let's press 202 00:07:05,600 --> 00:07:07,800 Enter to move on to the next step, 203 00:07:07,800 --> 00:07:09,800 which is of course, naturally, 204 00:07:09,800 --> 00:07:10,733 to create
205 00:07:10,733 --> 00:07:12,266 an object of this 206 00:07:12,266 --> 00:07:14,900 Linear Discriminant Analysis class. 207 00:07:14,900 --> 00:07:15,233 All right. 208 00:07:15,233 --> 00:07:17,700 So of course we're going to call it lda. 209 00:07:17,700 --> 00:07:20,100 Right. And now we're going to call this class. 210 00:07:20,100 --> 00:07:22,333 And since we gave it the shortcut LDA, 211 00:07:22,333 --> 00:07:23,966 well, we can simply call 212 00:07:23,966 --> 00:07:26,366 LDA this way. 213 00:07:26,366 --> 00:07:29,300 And now, well, exactly the same as before, 214 00:07:29,300 --> 00:07:31,500 this LDA class needs to take 215 00:07:31,500 --> 00:07:34,366 as input only one argument, which is exactly 216 00:07:34,366 --> 00:07:36,433 the same as before, and also 217 00:07:36,433 --> 00:07:38,066 it has the exact same name. 218 00:07:38,066 --> 00:07:41,066 It is n_components, 219 00:07:41,166 --> 00:07:42,833 which corresponds, of course, to the 220 00:07:42,833 --> 00:07:45,733 final number of extracted features you want to end up 221 00:07:45,733 --> 00:07:47,533 with after applying this 222 00:07:47,533 --> 00:07:49,566 dimensionality reduction technique. 223 00:07:49,566 --> 00:07:50,566 And of course, 224 00:07:50,566 --> 00:07:51,100 as I 225 00:07:51,100 --> 00:07:53,300 recommended in the previous section, we're going to start 226 00:07:53,300 --> 00:07:54,466 with two, 227 00:07:54,466 --> 00:07:56,400 so that we can see if, even with only 228 00:07:56,400 --> 00:07:57,900 two extracted features, 229 00:07:57,900 --> 00:07:59,800 well, we can get great results. 230 00:07:59,800 --> 00:08:01,500 And if that's the case, well, not only 231 00:08:01,500 --> 00:08:05,833 will we get great results, but also, cherry on the cake, we'll be able to visualize the 232 00:08:05,833 --> 00:08:07,000 results on a nice 233 00:08:07,000 --> 00:08:07,766 2D plot 234 00:08:07,766 --> 00:08:08,533 in the end, you know,
235 00:08:08,533 --> 00:08:10,933 thanks to these two code sections. 236 00:08:10,933 --> 00:08:11,700 All right. 237 00:08:11,700 --> 00:08:13,666 But right now we need to finish this. 238 00:08:13,666 --> 00:08:14,700 So there we go. 239 00:08:14,700 --> 00:08:17,100 We're going to extract only two features 240 00:08:17,100 --> 00:08:19,866 in the end. And to do this we need now, of 241 00:08:19,866 --> 00:08:21,500 course, to connect our 242 00:08:21,500 --> 00:08:23,900 LDA object to our data set, 243 00:08:23,900 --> 00:08:27,100 but once again separately, the training set and the test set. 244 00:08:27,500 --> 00:08:28,800 And to connect it, well, 245 00:08:28,800 --> 00:08:29,833 of course we need to apply 246 00:08:29,833 --> 00:08:32,266 the fit_transform method 247 00:08:32,266 --> 00:08:32,733 to the 248 00:08:32,733 --> 00:08:36,833 training set and then only the transform method on the test set. 249 00:08:36,833 --> 00:08:39,233 That's for the exact same reason as before: 250 00:08:39,233 --> 00:08:39,700 it is to 251 00:08:39,700 --> 00:08:42,600 avoid information leakage from the test set. 252 00:08:42,600 --> 00:08:45,300 All right. So let's do this. That's our next step here. 253 00:08:45,300 --> 00:08:46,133 So we're going to take 254 00:08:46,133 --> 00:08:46,766 first X_train, 255 00:08:46,766 --> 00:08:49,466 right, which we're going to update 256 00:08:49,466 --> 00:08:51,200 to become the new X_train 257 00:08:51,200 --> 00:08:54,600 after we apply this LDA feature extraction technique. 258 00:08:54,900 --> 00:08:56,400 And to do this, well, we need to take, 259 00:08:56,400 --> 00:09:00,266 of course, our LDA object, from which 260 00:09:00,266 --> 00:09:03,300 we're going to call the fit_transform 261 00:09:04,266 --> 00:09:05,233 method, 262 00:09:05,233 --> 00:09:06,633 which will take as input, 263 00:09:06,633 --> 00:09:08,333 well, here, be careful. 264 00:09:08,333 --> 00:09:10,600 It's not going to be exactly the same input
265 00:09:10,600 --> 00:09:12,500 as before. Because, you know, with 266 00:09:12,500 --> 00:09:13,933 PCA the fit_transform 267 00:09:13,933 --> 00:09:16,800 method took only X_train as input, 268 00:09:16,800 --> 00:09:21,300 because it only needs the features to apply this PCA 269 00:09:21,333 --> 00:09:23,200 dimensionality reduction technique. 270 00:09:23,200 --> 00:09:25,500 But LDA is actually different. 271 00:09:25,500 --> 00:09:28,300 In order to apply the technique, it needs not 272 00:09:28,300 --> 00:09:30,300 only the features but also the 273 00:09:30,300 --> 00:09:32,000 dependent variable. Right? 274 00:09:32,000 --> 00:09:33,766 The dependent variable is a required 275 00:09:33,766 --> 00:09:35,233 element inside the 276 00:09:35,233 --> 00:09:36,633 equation of LDA. 277 00:09:36,633 --> 00:09:38,666 And therefore here in the fit_transform 278 00:09:38,666 --> 00:09:40,466 method, we need to 279 00:09:40,466 --> 00:09:44,333 input not only X_train, the old version of X_train 280 00:09:44,333 --> 00:09:46,500 before we apply LDA, 281 00:09:46,500 --> 00:09:48,933 but also y_train. 282 00:09:48,933 --> 00:09:50,933 All right, so be very careful with this, 283 00:09:50,933 --> 00:09:54,033 whether you choose to apply LDA or PCA. For PCA 284 00:09:54,033 --> 00:09:56,066 you only have to input X_train, and for 285 00:09:56,066 --> 00:09:58,533 LDA you have to input both the features X_train 286 00:09:58,533 --> 00:10:01,033 and the dependent variable y_train. 287 00:10:01,033 --> 00:10:02,633 All right. Final step. 288 00:10:02,633 --> 00:10:04,633 Well, now that we have an 289 00:10:04,633 --> 00:10:08,100 LDA feature extractor object fitted 290 00:10:08,133 --> 00:10:12,000 to the training set, well, we can apply it to the test set 291 00:10:12,000 --> 00:10:15,200 by only calling the transform method. Right. 292 00:10:15,200 --> 00:10:16,633 It doesn't make sense to 293 00:10:16,633 --> 00:10:18,366 fit it again to the test set,
294 00:10:18,366 --> 00:10:21,966 because the test set is supposed to be new data on which we 295 00:10:21,966 --> 00:10:24,166 deploy our model, like in production. 296 00:10:24,166 --> 00:10:27,433 Therefore, we must only apply the transform method here. 297 00:10:27,666 --> 00:10:29,700 And therefore I'm updating our 298 00:10:29,700 --> 00:10:31,333 X_test variable the 299 00:10:31,333 --> 00:10:32,166 following way: 300 00:10:32,166 --> 00:10:34,600 by first calling our LDA object, 301 00:10:34,600 --> 00:10:39,966 from which we're only going to call the transform method. 302 00:10:40,400 --> 00:10:42,066 And now, according to you, does 303 00:10:42,066 --> 00:10:44,033 it need to take only X_test 304 00:10:44,033 --> 00:10:44,700 as input, 305 00:10:44,700 --> 00:10:46,933 or X_test and y_test? 306 00:10:46,933 --> 00:10:48,466 Well, obviously 307 00:10:48,466 --> 00:10:49,500 it only needs 308 00:10:49,500 --> 00:10:52,566 to take X_test, because we're not supposed to have y_test. 309 00:10:52,566 --> 00:10:54,733 You know, X_test is like new data 310 00:10:54,733 --> 00:10:56,766 on which we're going to deploy our model. 311 00:10:56,766 --> 00:10:57,833 Then we'll get our 312 00:10:57,833 --> 00:11:00,633 predictions, y_pred, and we'll compare 313 00:11:00,633 --> 00:11:02,000 y_pred to y_test. 314 00:11:02,000 --> 00:11:05,400 But we're not supposed to have y_test, because y_test holds the real results 315 00:11:05,533 --> 00:11:07,400 that contain the hidden truth, 316 00:11:07,400 --> 00:11:08,733 you know, the ground truth. 317 00:11:08,733 --> 00:11:10,033 So of course here 318 00:11:10,033 --> 00:11:12,400 we only need to pass X_test. 319 00:11:12,400 --> 00:11:14,566 And the reason why we could enter y_train 320 00:11:14,566 --> 00:11:18,566 here is because indeed, we are supposed to have the ground truth of the 321 00:11:18,566 --> 00:11:19,366 training set. 322 00:11:19,366 --> 00:11:20,100 Otherwise we
323 00:11:20,100 --> 00:11:20,566 wouldn't be 324 00:11:20,566 --> 00:11:23,433 able to train our machine learning model. 325 00:11:23,433 --> 00:11:26,066 All right. So X_test, and there we go. 326 00:11:26,066 --> 00:11:27,500 Not only is the implementation 327 00:11:27,500 --> 00:11:30,166 of LDA over, but the whole 328 00:11:30,166 --> 00:11:33,000 implementation is over as well. 329 00:11:33,000 --> 00:11:35,166 So now we're going to do this: Run 330 00:11:35,166 --> 00:11:36,700 all, now that we, you know, 331 00:11:36,700 --> 00:11:39,300 uploaded the data set into the notebook. 332 00:11:39,300 --> 00:11:41,000 So we are 100% ready. 333 00:11:41,000 --> 00:11:45,766 And let's just recall what we want to improve compared to previously. 334 00:11:46,033 --> 00:11:49,666 Well, you know, in the principal component analysis implementation 335 00:11:50,000 --> 00:11:50,900 we had, 336 00:11:50,900 --> 00:11:55,200 when obtaining the confusion matrix, only one incorrect prediction, 337 00:11:55,200 --> 00:11:55,800 resulting 338 00:11:55,800 --> 00:11:59,300 in an accuracy of 97%, and in the 339 00:11:59,666 --> 00:12:01,133 test set results, which are the 340 00:12:01,133 --> 00:12:02,633 most interesting ones, 341 00:12:02,633 --> 00:12:04,866 well, we had indeed an almost 342 00:12:04,866 --> 00:12:07,633 perfect separation of the three classes. 343 00:12:07,633 --> 00:12:12,733 And now we're going to see if, with our new features extracted from LDA, 344 00:12:12,900 --> 00:12:15,933 well, we can get a perfect separation of the classes 345 00:12:15,933 --> 00:12:18,566 and therefore 100% 346 00:12:18,566 --> 00:12:19,866 accuracy. 347 00:12:19,866 --> 00:12:20,766 Are you ready? 348 00:12:20,766 --> 00:12:21,933 Let's do this. 349 00:12:21,933 --> 00:12:26,033 Three, two, one, go. Run all. 350 00:12:26,033 --> 00:12:27,800 Now all the cells are running and there we go. 351 00:12:27,800 --> 00:12:29,300 Oh, there we go.
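The LDA cell dictated above can be sketched roughly as follows. This is a hedged reconstruction, not the course notebook itself: scikit-learn's bundled `load_wine` data stands in for the uploaded wine file, and the 80/20 split with `random_state=0` and the `StandardScaler` step are assumptions about the cells the video says are unchanged from the PCA section.

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: load_wine (UCI wine data) replaces the course's uploaded file.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0  # assumed split settings
)

# Assumed preprocessing cell: scale using training-set statistics only.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# The LDA cell itself: keep two extracted features.
lda = LDA(n_components=2)
# Unlike PCA, LDA is supervised, so fit_transform needs the labels too.
X_train = lda.fit_transform(X_train, y_train)
# The test set is only transformed, never fitted, and needs no labels.
X_test = lda.transform(X_test)

print(X_train.shape, X_test.shape)
```

One design note: in scikit-learn, LDA's `n_components` cannot exceed `min(n_classes - 1, n_features)`, so with three customer segments two is also the maximum here, not just a convenient choice for plotting.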
352 00:12:29,300 --> 00:12:32,800 We just had 100% accuracy. 353 00:12:32,966 --> 00:12:34,966 So in other words, our logistic 354 00:12:34,966 --> 00:12:35,933 regression model was 355 00:12:35,933 --> 00:12:41,100 totally able to classify perfectly our three classes by separating 356 00:12:41,100 --> 00:12:42,166 them completely. 357 00:12:42,166 --> 00:12:44,400 And that's exactly what we're going to see 358 00:12:44,400 --> 00:12:47,300 in, you know, the test set results. 359 00:12:47,300 --> 00:12:48,266 Because indeed, 360 00:12:48,266 --> 00:12:50,600 well, we almost had an incorrect one here, 361 00:12:50,600 --> 00:12:52,800 but as we see, the real 362 00:12:52,800 --> 00:12:53,733 ones, you know, 363 00:12:53,733 --> 00:12:56,033 which are all the points here, red, 364 00:12:56,033 --> 00:12:59,133 green and blue, fall into the right 365 00:12:59,266 --> 00:13:00,366 prediction regions: 366 00:13:00,366 --> 00:13:02,366 the red prediction region where 367 00:13:02,366 --> 00:13:05,466 our model predicts that the wines belong to customer segment number one, 368 00:13:05,966 --> 00:13:09,600 then this one where our model predicts that the wines belong to customer 369 00:13:09,600 --> 00:13:10,800 segment number two, 370 00:13:10,800 --> 00:13:13,600 and finally, this prediction region where our model 371 00:13:13,600 --> 00:13:16,266 predicts that the wines should be recommended to 372 00:13:16,266 --> 00:13:18,933 customer segment number three. 373 00:13:18,933 --> 00:13:20,200 And thanks to these 374 00:13:20,200 --> 00:13:22,133 new extracted features, 375 00:13:22,133 --> 00:13:23,400 you know, LD1 and 376 00:13:23,400 --> 00:13:26,433 LD2, well, this time we have a perfect class 377 00:13:26,433 --> 00:13:27,766 separator. In other words, 378 00:13:27,766 --> 00:13:30,000 we have a perfect classifier. 379 00:13:30,000 --> 00:13:31,166 And if you're wondering 380 00:13:31,166 --> 00:13:32,866 how LDA managed to
381 00:13:32,866 --> 00:13:34,533 separate the classes perfectly, 382 00:13:34,533 --> 00:13:35,666 whereas, you know, in 383 00:13:35,666 --> 00:13:38,466 PCA we could see that it was very difficult to 384 00:13:38,466 --> 00:13:41,233 separate, you know, the wines of the test set, 385 00:13:41,233 --> 00:13:42,566 you know, especially this one, 386 00:13:42,566 --> 00:13:45,533 this one that falls in the middle of the red wines, 387 00:13:45,533 --> 00:13:48,366 well, that's because the extracted features are different. 388 00:13:48,366 --> 00:13:51,366 You know, they're not the same as PC1 and PC2. 389 00:13:51,566 --> 00:13:54,733 They are, you know, in some other dimensions in which, 390 00:13:54,733 --> 00:13:55,233 well, 391 00:13:55,233 --> 00:13:56,333 this time it is 392 00:13:56,333 --> 00:13:59,333 possible to separate the classes perfectly. 393 00:13:59,333 --> 00:14:00,900 And that's why this time it works. 394 00:14:00,900 --> 00:14:03,900 We are, in other words, in another dimension. 395 00:14:04,100 --> 00:14:05,100 Okay, so 396 00:14:05,100 --> 00:14:08,700 I guess now we don't have much of a challenge because it's impossible 397 00:14:08,700 --> 00:14:10,666 to beat this. This is just perfect. 398 00:14:10,666 --> 00:14:13,433 I remind you that I did not make this data set. 399 00:14:13,433 --> 00:14:14,266 You know, it's a dataset 400 00:14:14,266 --> 00:14:15,066 taken from 401 00:14:15,066 --> 00:14:17,100 the UCI ML repository. 402 00:14:17,100 --> 00:14:19,700 So, you know, it's very close to a real-world data set. 403 00:14:19,700 --> 00:14:22,200 But there you go. That shows the power of this 404 00:14:22,200 --> 00:14:25,700 dimensionality reduction technique, linear discriminant analysis. 405 00:14:26,166 --> 00:14:29,400 So now we're going to move on to the next practical activity, 406 00:14:29,400 --> 00:14:32,800 the next section, on this time kernel PCA. 407 00:14:33,033 --> 00:14:33,900 And we can just
408 00:14:33,900 --> 00:14:37,866 hope that we'll get, you know, at least as good results as PCA 409 00:14:37,900 --> 00:14:41,400 or even as good results as LDA. In other 410 00:14:41,400 --> 00:14:42,966 words, let's hope that we get 411 00:14:42,966 --> 00:14:45,966 at most one incorrect prediction. 412 00:14:46,100 --> 00:14:47,366 So I look forward to seeing you 413 00:14:47,366 --> 00:14:50,200 in this next section to implement kernel PCA. 414 00:14:50,200 --> 00:14:52,200 And until then, enjoy machine learning.
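To reproduce the PCA versus LDA comparison discussed in this section, a rough side-by-side sketch could look like this. It is a sketch under stated assumptions rather than the course notebook: `load_wine` stands in for the uploaded wine file, the split and scaling settings are assumed, and `evaluate` is a hypothetical helper introduced only for this illustration. The printed numbers depend on the split and library version, so none are claimed here.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def evaluate(reducer, X_train, X_test, y_train, y_test):
    """Hypothetical helper: reduce features, fit logistic regression, score on the test set."""
    # Passing y_train is required by LDA; PCA accepts it and ignores it.
    Z_train = reducer.fit_transform(X_train, y_train)
    Z_test = reducer.transform(X_test)
    clf = LogisticRegression(random_state=0).fit(Z_train, y_train)
    y_pred = clf.predict(Z_test)
    cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
    return cm, accuracy_score(y_test, y_pred)


# Assumption: same stand-in data, split, and scaling for both techniques.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

results = {}
for name, reducer in [("PCA", PCA(n_components=2)), ("LDA", LDA(n_components=2))]:
    cm, acc = evaluate(reducer, X_train, X_test, y_train, y_test)
    results[name] = (cm, acc)
    print(name, "accuracy:", round(acc, 3))
    print(cm)
```

Off-diagonal entries in each printed matrix are the incorrect predictions, which is the quantity the video compares between the two techniques.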