1 00:00:00,300 --> 00:00:01,033 Hello, my friends. 2 00:00:01,033 --> 00:00:03,800 Are you ready to build that convolutional neural network? 3 00:00:03,800 --> 00:00:06,933 We have a long yet very exciting journey ahead of us. 4 00:00:07,000 --> 00:00:09,566 Let's do this. Let's kick this off. 5 00:00:09,566 --> 00:00:09,900 All right, 6 00:00:09,900 --> 00:00:11,866 so we're going to start with the very essential step, 7 00:00:11,866 --> 00:00:15,933 importing the libraries, which will only consist of importing TensorFlow 8 00:00:15,933 --> 00:00:19,266 and the preprocessing module of the Keras library. 9 00:00:19,400 --> 00:00:23,166 So let's do this first, quickly and efficiently, in a new code cell. 10 00:00:23,166 --> 00:00:26,166 So actually you know how to import TensorFlow. 11 00:00:26,200 --> 00:00:29,133 We start with the import command. 12 00:00:29,133 --> 00:00:32,133 Then we specify the name of the library, tensorflow. 13 00:00:32,400 --> 00:00:35,333 And we add the shortcut tf, 14 00:00:35,333 --> 00:00:36,900 just like before with the ANN. 15 00:00:36,900 --> 00:00:40,966 And then I would like to import something else which will allow us to do 16 00:00:41,100 --> 00:00:45,200 the image preprocessing in part one, and which is the image 17 00:00:45,233 --> 00:00:48,600 submodule of the preprocessing module of the Keras library. 18 00:00:48,966 --> 00:00:53,300 And therefore here we're going to start from, well, the Keras library, 19 00:00:53,633 --> 00:00:58,166 from which we're going to get access to the preprocessing module, 20 00:00:58,166 --> 00:01:02,500 from which we're going to import the image submodule. 21 00:01:02,933 --> 00:01:06,266 And the reason why we want to import this is because we want to import 22 00:01:06,266 --> 00:01:10,800 a specific class, which is ImageDataGenerator. 23 00:01:11,133 --> 00:01:13,333 And I will explain very quickly what this is about. 
24 00:01:13,333 --> 00:01:17,866 But this is absolutely compulsory in part one, data preprocessing, 25 00:01:17,866 --> 00:01:20,500 you know, when preprocessing your images. 26 00:01:20,500 --> 00:01:21,500 So let's import it. 27 00:01:21,500 --> 00:01:27,466 Here we just need to add import and then ImageDataGenerator. 28 00:01:28,500 --> 00:01:30,000 Okay, good. 29 00:01:30,000 --> 00:01:33,333 I will explain very soon what this is about and how we will use it. 30 00:01:33,566 --> 00:01:34,300 All right. 31 00:01:34,300 --> 00:01:37,033 And then, you know, this other thing that I like doing, 32 00:01:37,033 --> 00:01:40,533 just to show you that indeed we are working with TensorFlow 2.0: 33 00:01:40,800 --> 00:01:44,800 I just want to print the version of TensorFlow we're using right now. 34 00:01:44,900 --> 00:01:47,900 And remember, to do this we need to call TensorFlow first, 35 00:01:48,033 --> 00:01:52,166 and then after a dot, a double underscore and then version 36 00:01:52,266 --> 00:01:53,800 and a double underscore again. 37 00:01:53,800 --> 00:01:56,900 And this will, you know, print in the output 38 00:01:56,900 --> 00:01:59,000 the version of TensorFlow we're using. 39 00:01:59,000 --> 00:02:01,866 This is just to make sure we're working with TensorFlow 2.0. 40 00:02:01,866 --> 00:02:03,500 However, you know, depending on 41 00:02:03,500 --> 00:02:06,233 when you run this code, you know, after I record this tutorial, 42 00:02:06,233 --> 00:02:09,800 you might have a different version, but you will definitely get a TensorFlow 43 00:02:09,833 --> 00:02:11,800 two version. Okay. 44 00:02:11,800 --> 00:02:12,166 All right. 45 00:02:12,166 --> 00:02:14,233 So here I just execute the first cell, 46 00:02:14,233 --> 00:02:18,066 importing TensorFlow and the image preprocessing module from Keras. 
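The first cell described above might look like the following sketch. In a notebook the two imports and the print would normally sit at the top level of the cell; here they are wrapped in a hypothetical function, `show_tensorflow_version`, only so the snippet can be loaded on a machine without TensorFlow. The `tensorflow.keras` import path is an assumption (the video goes through the Keras library), and recent TensorFlow releases mark `ImageDataGenerator` as deprecated, so treat this as an illustration of the steps rather than the exact cell from the video.

```python
def major_version(version: str) -> int:
    """Return the major component of a version string such as '2.15.0'."""
    return int(version.split(".")[0])


def show_tensorflow_version() -> int:
    """Import the libraries used in this tutorial and print the TensorFlow version."""
    # Imported lazily so this sketch loads even where TensorFlow is not installed.
    import tensorflow as tf
    from tensorflow.keras.preprocessing.image import ImageDataGenerator  # noqa: F401

    print(tf.__version__)  # version string of the installed TensorFlow
    return major_version(tf.__version__)
```

Calling `show_tensorflow_version()` in an environment with TensorFlow 2 installed should print a version string starting with `2.`.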
47 00:02:18,233 --> 00:02:22,966 And now let's run this to indeed reassure ourselves 48 00:02:22,966 --> 00:02:26,766 that we are working with TensorFlow 2.0, which is 49 00:02:26,766 --> 00:02:29,600 so much better than TensorFlow 1. 50 00:02:29,600 --> 00:02:30,733 All right. Good. 51 00:02:30,733 --> 00:02:33,600 So now we can move on to part one, data preprocessing, 52 00:02:33,600 --> 00:02:35,366 which will be done in two steps: 53 00:02:35,366 --> 00:02:37,433 first, preprocessing the training set, 54 00:02:37,433 --> 00:02:40,300 and second, preprocessing the test set. 55 00:02:40,300 --> 00:02:41,866 So let's start with the training set. 56 00:02:41,866 --> 00:02:43,833 And let's create a new code cell. 57 00:02:43,833 --> 00:02:46,966 And now let me explain how we're going to do this. 58 00:02:47,633 --> 00:02:48,000 All right. 59 00:02:48,000 --> 00:02:51,466 So how are we going to preprocess our images? 60 00:02:51,833 --> 00:02:54,600 Well, we're actually going to do multiple things. 61 00:02:54,600 --> 00:02:58,966 The first thing we'll do is we will apply some transformations 62 00:02:58,966 --> 00:03:01,766 on all the images of the training set. 63 00:03:01,766 --> 00:03:03,700 The images of the training set only. 64 00:03:03,700 --> 00:03:06,700 We won't apply these same transformations on the test set. 65 00:03:06,733 --> 00:03:09,933 The reason why we want to apply some transformations 66 00:03:09,933 --> 00:03:13,533 on the images of the training set is for only one purpose. 67 00:03:13,733 --> 00:03:16,166 It is to avoid overfitting. 68 00:03:16,166 --> 00:03:18,900 Indeed, if we don't apply these transformations, 69 00:03:18,900 --> 00:03:22,466 well, when training our CNN on the training set, 70 00:03:22,666 --> 00:03:27,033 we will get a huge difference between the accuracy on the training set 71 00:03:27,033 --> 00:03:30,200 and the one on the test set, you know, on the evaluation set. 
72 00:03:30,433 --> 00:03:33,866 Actually, we will get very high accuracies on the training set, 73 00:03:33,866 --> 00:03:38,333 you know, close to 98%, and much lower accuracies on the test set. 74 00:03:38,600 --> 00:03:40,233 And that is called overfitting. 75 00:03:40,233 --> 00:03:43,233 And that's something we absolutely need to avoid 76 00:03:43,300 --> 00:03:44,700 anyway, you know, whether you're 77 00:03:44,700 --> 00:03:49,133 working on a classic data set or working on computer vision. And for computer 78 00:03:49,133 --> 00:03:52,266 vision, well, the way to avoid overfitting 79 00:03:52,400 --> 00:03:55,400 is, as I said, to apply transformations. 80 00:03:55,700 --> 00:03:56,600 So that was the why. 81 00:03:56,600 --> 00:04:01,000 And now let me explain the what, you know, what these transformations are. 82 00:04:01,000 --> 00:04:04,100 And then I will proceed in the end to the how: how are we going to implement that. 83 00:04:04,500 --> 00:04:07,500 So the what, what are these transformations? 84 00:04:07,633 --> 00:04:10,733 Well, some simple geometrical transformations, 85 00:04:10,733 --> 00:04:14,766 or some zooms, or some rotations on your images. 86 00:04:14,966 --> 00:04:18,466 So basically we're going to apply some geometrical transformations 87 00:04:18,466 --> 00:04:21,766 like translations to shift some of the pixels. 88 00:04:22,200 --> 00:04:24,333 Then we're going to rotate the images a bit. 89 00:04:24,333 --> 00:04:26,366 We're going to do some horizontal flips. 90 00:04:26,366 --> 00:04:28,800 We're going to do some zoom in and zoom out. 91 00:04:28,800 --> 00:04:32,933 Well, you know, we're going to apply a series of transformations so as to modify 92 00:04:32,933 --> 00:04:36,233 the images and get them, as we say, augmented. 
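To make two of these transformations concrete, here is a toy sketch in plain Python that applies a horizontal flip and a one-pixel shift to a tiny 3x3 "image" stored as nested lists. The pixel values and the helper names `horizontal_flip` and `shift_right` are made up for illustration; a real pipeline would let a library apply such transformations randomly to full-size images.

```python
# A toy 3x3 grayscale "image" as nested lists (values are illustrative only).
image = [
    [0, 50, 100],
    [10, 60, 110],
    [20, 70, 120],
]


def horizontal_flip(img):
    """Mirror the image left to right by reversing every row."""
    return [row[::-1] for row in img]


def shift_right(img, pixels, fill=0):
    """Translate the image to the right, padding the vacated columns with `fill`."""
    if pixels <= 0:
        return [row[:] for row in img]
    return [
        [fill] * min(pixels, len(row)) + row[: max(len(row) - pixels, 0)]
        for row in img
    ]


flipped = horizontal_flip(image)   # each row is reversed
shifted = shift_right(image, 1)    # each row moves one pixel to the right
```

Each transformed copy shows the same content in a slightly different arrangement, which is the sense in which the training set gets more varied.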
93 00:04:36,466 --> 00:04:38,300 In fact, the technical term 94 00:04:38,300 --> 00:04:41,300 for what we're going to do now, you know, with all these transformations, 95 00:04:41,300 --> 00:04:46,333 is image augmentation, which consists basically of transforming 96 00:04:46,333 --> 00:04:51,666 the images of your training set so that your CNN model doesn't overlearn, 97 00:04:51,666 --> 00:04:54,966 you know, is not overtrained on the existing images, 98 00:04:54,966 --> 00:04:58,800 because by applying these transformations, we will get new images, 99 00:04:59,000 --> 00:05:02,100 which is the reason why we call this image augmentation. 100 00:05:02,100 --> 00:05:06,166 We basically augment the variety, you know, the diversity of the training 101 00:05:06,166 --> 00:05:07,533 set images. 102 00:05:07,533 --> 00:05:09,233 All right. So that is the what. 103 00:05:09,233 --> 00:05:12,566 And now we're going to proceed to the how, and to proceed to the how 104 00:05:12,566 --> 00:05:16,400 I'm going to take you to the Keras API, because you have to see it. 105 00:05:16,600 --> 00:05:18,266 You know, just like what we did with scikit- 106 00:05:18,266 --> 00:05:21,366 learn, I'm going to show you and guide you through the Keras API 107 00:05:21,500 --> 00:05:24,500 to find the exact tool we're going to use for this. 108 00:05:24,733 --> 00:05:27,433 So let's open a new tab here. 109 00:05:27,433 --> 00:05:28,066 There we go. 110 00:05:28,066 --> 00:05:32,233 And in the search bar let's enter just Keras, Keras like that. 111 00:05:32,400 --> 00:05:33,366 Let's press enter. 112 00:05:33,366 --> 00:05:35,233 And let's just get the first link. 113 00:05:35,233 --> 00:05:36,600 There is only one Keras. 114 00:05:36,600 --> 00:05:39,600 And that's of course the deep learning library in Python 115 00:05:40,100 --> 00:05:41,533 developed by François Chollet, 116 00:05:41,533 --> 00:05:44,866 by the way, a very talented French data scientist. 
117 00:05:45,300 --> 00:05:45,600 All right. 118 00:05:45,600 --> 00:05:48,600 So let's go now to API docs. 119 00:05:48,900 --> 00:05:52,200 And now, my friends, welcome to the Keras API. 120 00:05:52,233 --> 00:05:55,233 This is probably my favorite deep learning library. 121 00:05:55,233 --> 00:05:57,100 It's absolutely fantastic. 122 00:05:57,100 --> 00:05:58,233 And now where we want to go is 123 00:05:58,233 --> 00:06:02,300 of course to data preprocessing, which includes three things. 124 00:06:02,300 --> 00:06:03,633 Actually, you have to know them: 125 00:06:03,633 --> 00:06:07,333 image data preprocessing, which is what we're about to use right now, but then 126 00:06:07,333 --> 00:06:11,533 also time series data preprocessing and also text data preprocessing. 127 00:06:11,533 --> 00:06:16,633 You can also do some deep NLP, you know, NLP with deep learning, with Keras. 128 00:06:17,100 --> 00:06:20,733 But now of course we're looking for something in image data preprocessing. 129 00:06:20,933 --> 00:06:23,966 And let me show you exactly what that something is. 130 00:06:24,200 --> 00:06:25,300 We just need to scroll down. 131 00:06:25,300 --> 00:06:27,100 Well, actually you already know what this is 132 00:06:27,100 --> 00:06:29,966 because we already imported the class, but there it is. 133 00:06:29,966 --> 00:06:35,700 I'm talking, of course, about the ImageDataGenerator class, which will indeed 134 00:06:35,800 --> 00:06:40,866 generate batches of tensor image data with real-time data augmentation, 135 00:06:40,866 --> 00:06:45,200 which is exactly what I've just explained. And I haven't mentioned the batches yet. 136 00:06:45,200 --> 00:06:46,166 Well, that's because, you know, 137 00:06:46,166 --> 00:06:50,133 we will create different batches of actually 32 images. 
138 00:06:50,500 --> 00:06:53,833 And these images will either be the original ones or, you know, 139 00:06:53,833 --> 00:06:57,933 the augmented ones, the transformed ones after we apply the transformations. 140 00:06:58,533 --> 00:07:01,066 And speaking of applying these transformations, well, 141 00:07:01,066 --> 00:07:04,066 we're going to do that exactly with this ImageDataGenerator 142 00:07:04,066 --> 00:07:07,666 class, for which you will find all the arguments here. 143 00:07:07,666 --> 00:07:11,233 And, you know, most of them correspond to different transformations. 144 00:07:11,466 --> 00:07:14,533 I can already tell you that we will use the zoom range, 145 00:07:14,533 --> 00:07:18,566 which consists of zooming in or zooming out on the images, but also we'll 146 00:07:18,566 --> 00:07:23,033 use the horizontal flip, which consists of flipping the images horizontally. 147 00:07:23,333 --> 00:07:26,500 And then we will also use this one, the shear range, 148 00:07:26,633 --> 00:07:28,233 which is some kind of transvection. 149 00:07:28,233 --> 00:07:31,733 You can check it online, but no need to understand this in all the details. 150 00:07:31,733 --> 00:07:34,566 Just know that it's a geometrical transformation, 151 00:07:34,566 --> 00:07:38,233 and if you want to go further, that it's actually some kind of transvection. 152 00:07:38,500 --> 00:07:39,466 But there we go. 153 00:07:39,466 --> 00:07:41,466 These are the three transformations 154 00:07:41,466 --> 00:07:45,800 we'll use: the shear range, the zoom range and the horizontal flip. 155 00:07:46,066 --> 00:07:50,133 And now I'm sure some of you are asking, why do we use these transformations? 156 00:07:50,400 --> 00:07:52,033 Well, I'll be honest with you. 157 00:07:52,033 --> 00:07:56,566 The reason I'm using them is because I simply took, you know, the code 158 00:07:56,566 --> 00:08:01,433 snippet example from Keras, which is right below, exactly here. 
159 00:08:01,733 --> 00:08:06,666 This is the code snippet example using the ImageDataGenerator class. 160 00:08:06,666 --> 00:08:09,666 And as you can see, we use a shearing transformation, 161 00:08:09,666 --> 00:08:13,000 a zoom transformation, and a horizontal flip transformation. 162 00:08:13,000 --> 00:08:14,566 And we're just going to do the same. 163 00:08:14,566 --> 00:08:17,500 But of course feel free to try some other transformations. 164 00:08:17,500 --> 00:08:20,533 Who knows, maybe you'll get better accuracy in the end. 165 00:08:20,600 --> 00:08:22,366 Okay. But let's just trust this. 166 00:08:22,366 --> 00:08:26,166 And actually I trust this because of course I tried it on our future CNN, 167 00:08:26,166 --> 00:08:27,200 which we're about to build. 168 00:08:27,200 --> 00:08:30,833 And you're going to see that the results in the end will be absolutely amazing. 169 00:08:31,033 --> 00:08:32,633 Okay. So let's just take this. 170 00:08:32,633 --> 00:08:35,933 Let's just take this code snippet to, you know, 171 00:08:35,933 --> 00:08:39,566 actually get the tool that will apply these transformations. 172 00:08:39,833 --> 00:08:42,833 Then of course we'll have to connect the tool to our training set. 173 00:08:43,166 --> 00:08:45,266 So back into our implementation. 174 00:08:45,266 --> 00:08:48,000 Well, let's paste that right here. 175 00:08:48,000 --> 00:08:51,933 And this, as you can see, creates an object which we call 176 00:08:51,966 --> 00:08:55,133 train_datagen, of the ImageDataGenerator class. 177 00:08:55,133 --> 00:08:59,466 So train_datagen is an instance of that ImageDataGenerator class, 178 00:08:59,466 --> 00:09:02,700 and it represents of course the tool that will apply 179 00:09:02,700 --> 00:09:06,566 all the transformations on the images of the training set. 180 00:09:06,866 --> 00:09:08,533 And there is one I haven't mentioned. 
181 00:09:08,533 --> 00:09:12,233 You know, I mentioned and explained these three, which are the transformations. 182 00:09:12,566 --> 00:09:14,333 But we also notice this one: 183 00:09:14,333 --> 00:09:17,800 rescale equals one divided by 255. 184 00:09:18,133 --> 00:09:20,100 Can you guess what this is about? 185 00:09:20,100 --> 00:09:23,900 You know, we already saw this many times on our classic data sets. 186 00:09:24,100 --> 00:09:26,933 Well, this is of course about feature scaling. 187 00:09:26,933 --> 00:09:31,966 This will apply feature scaling to each and every single one of your pixels 188 00:09:32,166 --> 00:09:35,366 by dividing their value by 255. 189 00:09:35,366 --> 00:09:40,733 Because remember that each pixel takes a value between 0 and 255. 190 00:09:40,733 --> 00:09:45,900 So by dividing all of them by 255, we indeed get all the pixel values 191 00:09:46,066 --> 00:09:49,366 between 0 and 1, which is just like a normalization. 192 00:09:49,600 --> 00:09:53,166 And once again, feature scaling is absolutely compulsory 193 00:09:53,166 --> 00:09:56,400 for neural networks, you know, in training neural networks. 194 00:09:56,700 --> 00:09:59,133 All right. So basically this is feature scaling. 195 00:09:59,133 --> 00:10:02,466 And these are the transformations that will perform 196 00:10:02,666 --> 00:10:05,900 image augmentation on the images of the training set. 197 00:10:05,900 --> 00:10:10,100 And this, I remind you, is in order to prevent overfitting. In the end 198 00:10:10,100 --> 00:10:13,566 you can actually try, you know, the future training without these, 199 00:10:13,566 --> 00:10:16,266 and you will see what I mean by overfitting. 200 00:10:16,266 --> 00:10:18,500 All right. Good. So that's not it. 201 00:10:18,500 --> 00:10:19,100 That's not it. 
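The train_datagen object described above could be sketched as follows. The 0.2 values for the shear and zoom ranges are not stated in the video; they are assumed from the Keras documentation example the speaker copies. The dictionary name `TRAIN_AUGMENTATION` and the helpers `make_train_datagen` and `rescale_pixel` are hypothetical names introduced here, and the Keras import is placed inside the function only so the configuration can be read without TensorFlow installed.

```python
# Arguments passed to ImageDataGenerator for the training set.
# The 0.2 values are an assumption taken from the Keras documentation example.
TRAIN_AUGMENTATION = {
    "rescale": 1.0 / 255,     # feature scaling: pixel values 0..255 become 0..1
    "shear_range": 0.2,       # random shearing
    "zoom_range": 0.2,        # random zoom in / zoom out
    "horizontal_flip": True,  # random left-right flips
}


def make_train_datagen():
    """Build the augmentation tool for the training images."""
    # Imported lazily so the configuration can be inspected without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    return ImageDataGenerator(**TRAIN_AUGMENTATION)


def rescale_pixel(value):
    """Apply the same scaling to a single pixel value, for illustration."""
    return value * TRAIN_AUGMENTATION["rescale"]
```

In a notebook, `train_datagen = make_train_datagen()` would give the object the video calls train_datagen. The `rescale_pixel` helper only illustrates the arithmetic: a pixel value of 0 stays at 0 and a value of 255 lands at about 1.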
202 00:10:19,100 --> 00:10:22,366 You know, for the training set preprocessing, we need to, of course, now 203 00:10:22,366 --> 00:10:26,933 connect that train_datagen object to our training set, 204 00:10:26,933 --> 00:10:28,566 you know, to our training set images. 205 00:10:28,566 --> 00:10:30,633 So far this is just the object. 206 00:10:30,633 --> 00:10:34,166 And so the way we're going to do this is we will 207 00:10:34,166 --> 00:10:37,166 of course go back to our Keras API. 208 00:10:37,366 --> 00:10:41,600 Because indeed the way to do this is just to take this next code here 209 00:10:41,833 --> 00:10:45,500 that will actually import the training set 210 00:10:45,733 --> 00:10:48,733 by accessing it from, you know, our directory, 211 00:10:49,000 --> 00:10:52,566 and at the same time creating these batches and resizing 212 00:10:52,700 --> 00:10:56,166 the images, you know, in case we need to resize them in order to 213 00:10:56,333 --> 00:11:00,000 reduce the computations of the machine, you know, to make it less compute- 214 00:11:00,000 --> 00:11:03,200 intensive, which is what we'll do, because we will see that 215 00:11:03,200 --> 00:11:06,233 with a lower size we'll still get amazing results in the end. 216 00:11:06,633 --> 00:11:07,600 So let's get this. 217 00:11:07,600 --> 00:11:10,600 And once again I will explain all this code. 218 00:11:10,600 --> 00:11:13,700 And mostly we will have to change it the right way 219 00:11:13,700 --> 00:11:16,700 so that we can indeed adapt it to our situation. 220 00:11:17,133 --> 00:11:19,133 So let's take it step by step. 221 00:11:19,133 --> 00:11:22,333 This is actually the name you want to give to 222 00:11:22,366 --> 00:11:25,433 your training set, which you are importing into the notebook. 223 00:11:25,666 --> 00:11:27,566 And let's just give the usual names. 224 00:11:27,566 --> 00:11:32,100 We're going to call that training underscore set, just like before. 
225 00:11:32,500 --> 00:11:35,833 Then we take indeed our train_datagen object, 226 00:11:35,833 --> 00:11:38,833 that instance of the ImageDataGenerator class. 227 00:11:38,866 --> 00:11:43,733 And from this object we're going to call a method of this class. 228 00:11:43,733 --> 00:11:46,633 Right. Because every class contains methods. 229 00:11:46,633 --> 00:11:50,366 And one of them is this flow_from_directory, which will just simply, 230 00:11:50,533 --> 00:11:55,733 you know, connect this image augmentation tool to the images of your training set. 231 00:11:56,333 --> 00:11:56,800 All right. 232 00:11:56,800 --> 00:11:59,433 Then let's have a look at the different parameters. 233 00:11:59,433 --> 00:12:05,300 So the first one here is actually the path leading to your training set. 234 00:12:05,633 --> 00:12:09,233 And so of course we have to change this because we have a different path 235 00:12:09,233 --> 00:12:10,600 to our data set. 236 00:12:10,600 --> 00:12:12,100 So this is the whole folder 237 00:12:12,100 --> 00:12:15,100 which I've shared with you at the beginning of this section. 238 00:12:15,133 --> 00:12:17,033 And this is also the root folder. 239 00:12:17,033 --> 00:12:18,433 You know, this is the base of the folder, 240 00:12:18,433 --> 00:12:20,300 you know, the beginning of the path. 241 00:12:20,300 --> 00:12:23,100 And so now in order to access the training set, 242 00:12:23,100 --> 00:12:26,233 well, we first need to specify that we want to go into this dataset 243 00:12:26,233 --> 00:12:29,233 folder and then into this training set folder. 244 00:12:29,233 --> 00:12:32,533 And that's exactly the path leading to the training set. 245 00:12:32,833 --> 00:12:36,233 And therefore here, you know, in this parameter of the flow_from_directory 246 00:12:36,233 --> 00:12:37,666 function. 
247 00:12:37,666 --> 00:12:41,266 Well, we simply need to replace data here by dataset 248 00:12:41,466 --> 00:12:44,466 and then train here by training set. 249 00:12:44,700 --> 00:12:45,000 All right. 250 00:12:45,000 --> 00:12:48,700 This is a simple path leading to the training set folder, starting 251 00:12:48,700 --> 00:12:52,433 from the root of our directory folder. Okay, good. 252 00:12:52,600 --> 00:12:54,900 Now the next argument, target size. 253 00:12:54,900 --> 00:12:58,166 That's indeed the final size of your images 254 00:12:58,166 --> 00:13:02,233 when they, you know, will be fed into the convolutional neural network. 255 00:13:02,633 --> 00:13:06,266 And actually I tried with 150 by 150, 256 00:13:06,500 --> 00:13:09,233 and that actually made the training very, very long. 257 00:13:09,233 --> 00:13:14,500 So I actually wanted to reduce that to, you know, 64 by 258 00:13:15,466 --> 00:13:17,666 64. And that's totally fine. 259 00:13:17,666 --> 00:13:19,566 This will make the training much faster. 260 00:13:19,566 --> 00:13:21,800 And still we will have amazing results. 261 00:13:21,800 --> 00:13:23,366 You'll see that at the end. 262 00:13:23,366 --> 00:13:26,666 Then the batch size is, you know, the size of the batches, meaning 263 00:13:26,666 --> 00:13:29,200 how many images we want to have in each batch. 264 00:13:29,200 --> 00:13:31,500 And 32 is a classic default value. 265 00:13:31,500 --> 00:13:33,833 And we're going to keep that. That will be totally fine. 266 00:13:33,833 --> 00:13:34,600 And finally 267 00:13:34,600 --> 00:13:38,966 we have to specify the class mode, which is either binary or categorical. 268 00:13:39,300 --> 00:13:42,866 And of course since now we have a binary outcome, you know, cat 269 00:13:42,866 --> 00:13:46,800 or dog, well, we have to choose of course class mode equals binary. 270 00:13:47,233 --> 00:13:48,533 Okay, perfect. 271 00:13:48,533 --> 00:13:52,133 And that closes the preprocessing of the training set. 
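Putting the four arguments just discussed together, the call that connects the augmentation tool to the training images might look like this sketch. The folder spelling `dataset/training_set` is an assumption based on the folder names spoken in the video, and `TRAIN_FLOW_ARGS` and `load_training_set` are hypothetical names introduced here. The function expects an `ImageDataGenerator` instance such as the train_datagen object.

```python
# Arguments for flow_from_directory on the training set.
TRAIN_FLOW_ARGS = {
    "directory": "dataset/training_set",  # assumed spelling of the folders named in the video
    "target_size": (64, 64),              # images are resized to 64 by 64
    "batch_size": 32,                     # 32 images per batch
    "class_mode": "binary",               # two classes: cat or dog
}


def load_training_set(train_datagen):
    """Connect an ImageDataGenerator instance to the training images on disk."""
    return train_datagen.flow_from_directory(**TRAIN_FLOW_ARGS)
```

In a notebook this would be used as `training_set = load_training_set(train_datagen)`. Note that `flow_from_directory` infers the labels from the folder layout, so the training folder is expected to contain one subfolder per class.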
272 00:13:52,133 --> 00:13:55,266 We are done with this first step of data preprocessing. 273 00:13:55,266 --> 00:14:00,000 And so now we're going to move on to the next step, preprocessing the test set. 274 00:14:00,366 --> 00:14:04,200 And of course in the spirit of always being as efficient as we can, 275 00:14:04,233 --> 00:14:07,200 well, we're going to go back to our Keras API. 276 00:14:07,200 --> 00:14:10,433 And we're just going to take, this time, this line of code 277 00:14:10,433 --> 00:14:14,266 to, you know, get that same ImageDataGenerator 278 00:14:14,266 --> 00:14:18,133 object to, you know, apply the transformations to the test images. 279 00:14:18,366 --> 00:14:19,833 But be careful. 280 00:14:19,833 --> 00:14:22,200 We're not going to apply the same transformations here, 281 00:14:22,200 --> 00:14:24,900 such as the shearing, the zoom and the horizontal flip, 282 00:14:24,900 --> 00:14:27,666 because of course we don't want to touch the test images, 283 00:14:27,666 --> 00:14:31,466 because they're like new images, like when deploying our model in production. 284 00:14:31,633 --> 00:14:35,433 And therefore, of course, we have to keep them intact like the original ones. 285 00:14:35,733 --> 00:14:41,200 However, what we have to do to them is indeed to rescale their pixels. 286 00:14:41,200 --> 00:14:43,133 And that's the same as before, you know, 287 00:14:43,133 --> 00:14:47,366 remember when we were applying feature scaling to our training set and test set? 288 00:14:47,566 --> 00:14:50,833 Well, we used the fit_transform method on the training set, 289 00:14:50,833 --> 00:14:53,833 but only the transform method on the test set. 290 00:14:54,000 --> 00:14:57,633 And that was of course to avoid information leakage from the test set. 291 00:14:57,966 --> 00:14:59,800 And well, here that's exactly the same. 
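As a reminder of the fit-then-transform idea mentioned above, here is a small plain-Python sketch. It assumes a standardization-style scaler (subtract the mean, divide by the standard deviation), which may not be the exact scaler used earlier in the course; the numbers and the helper names `fit` and `transform` are made up for illustration.

```python
from statistics import mean, pstdev


def fit(values):
    """Learn the scaling parameters (mean, standard deviation) from the training data only."""
    return mean(values), pstdev(values)


def transform(values, params):
    """Scale values with parameters that were learned elsewhere."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


train = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative training feature
test = [2.0, 6.0]                  # illustrative test feature

params = fit(train)                      # parameters come from the training set alone
train_scaled = transform(train, params)  # "fit_transform" on the training set
test_scaled = transform(test, params)    # "transform" only: the test set is never fitted
```

Because the test values are scaled with the training parameters, nothing about the test set influences the scaling, which is the leakage the speaker refers to.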
292 00:14:59,800 --> 00:15:00,566 We have to keep 293 00:15:00,566 --> 00:15:04,466 the images of the test set intact by not applying any transformation. 294 00:15:04,666 --> 00:15:08,566 However, we have to feature scale them, because once again, the future predict 295 00:15:08,566 --> 00:15:12,566 method of the CNN will have to be applied to the same scale 296 00:15:12,833 --> 00:15:15,466 as the one that was applied on the training set. 297 00:15:15,466 --> 00:15:17,300 So you see, this is exactly the same as before. 298 00:15:17,300 --> 00:15:17,766 It's just that 299 00:15:17,766 --> 00:15:22,200 we are using some different classes, but which are, after all, the same tools. 300 00:15:22,700 --> 00:15:23,000 All right. 301 00:15:23,000 --> 00:15:24,333 So let's get this and let's 302 00:15:24,333 --> 00:15:27,900 put that back into our implementation in a new code cell. 303 00:15:28,200 --> 00:15:29,600 So we're going to paste that. 304 00:15:29,600 --> 00:15:31,866 We're going to keep the same name for the object. 305 00:15:31,866 --> 00:15:33,066 That's totally fine. 306 00:15:33,066 --> 00:15:37,733 And then, well, same, we're going to go back to this and we're going to get exactly 307 00:15:37,733 --> 00:15:41,200 this, which will actually import the test 308 00:15:41,200 --> 00:15:44,200 set images into our notebook. 309 00:15:44,266 --> 00:15:44,600 All right. 310 00:15:44,600 --> 00:15:46,366 So let's paste that. 311 00:15:46,366 --> 00:15:48,366 And now let's do the required changes. 312 00:15:48,366 --> 00:15:51,966 Actually, please press pause on the video and do the changes yourself. 313 00:15:51,966 --> 00:15:54,000 I'm sure you're going to do this successfully 314 00:15:54,000 --> 00:15:56,466 because this is exactly the same as before. 315 00:15:56,466 --> 00:15:59,133 All right. So now let's do it together. 
316 00:15:59,133 --> 00:16:01,833 The first thing I would like to do is just change its name, 317 00:16:01,833 --> 00:16:05,333 which is exactly the name of the variable that will contain the test set. 318 00:16:05,500 --> 00:16:09,800 And just to be consistent with before, well, let's just call it test_set. 319 00:16:11,000 --> 00:16:11,833 All right. 320 00:16:11,833 --> 00:16:13,800 So test_set, then this is correct. 321 00:16:13,800 --> 00:16:17,000 We call our test_datagen here, which will only apply 322 00:16:17,033 --> 00:16:20,033 feature scaling to the pixels of the test images. 323 00:16:20,133 --> 00:16:21,900 Then we call that same function, 324 00:16:21,900 --> 00:16:25,500 flow_from_directory, to access the test set from our directory. 325 00:16:25,800 --> 00:16:29,100 And here once again we need to replace data here by dataset 326 00:16:29,400 --> 00:16:32,700 and then validation by, you know, remember, 327 00:16:33,666 --> 00:16:36,500 now we want to get the path that leads to the test set, 328 00:16:36,500 --> 00:16:39,633 and therefore that's first dataset and then test set. 329 00:16:39,866 --> 00:16:40,500 All right. 330 00:16:40,500 --> 00:16:45,633 So here we just need to replace validation by test set. 331 00:16:46,100 --> 00:16:46,966 Good. 332 00:16:46,966 --> 00:16:49,800 Then of course we need to have the same target size, 333 00:16:49,800 --> 00:16:53,200 because basically the predict method has to be called on the exact 334 00:16:53,200 --> 00:16:56,733 same format as the one that was used for the images of the training. 335 00:16:56,733 --> 00:17:00,533 So here we need to get the same size as in the training set, 336 00:17:00,533 --> 00:17:06,566 therefore 64 by 64, and the same batch size. 337 00:17:06,566 --> 00:17:10,433 Basically our model will be evaluated on batches of 32 images. 338 00:17:10,700 --> 00:17:13,500 And of course the same class mode, binary. 339 00:17:13,500 --> 00:17:14,166 Good. 
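The test set preprocessing walked through above can be sketched the same way: a generator that only rescales, followed by the same flow_from_directory arguments with the test folder. The folder spelling `dataset/test_set` is an assumption based on the names spoken in the video, and `TEST_FLOW_ARGS`, `make_test_datagen` and `load_test_set` are hypothetical names introduced here. The Keras import sits inside the function only so the sketch loads without TensorFlow installed.

```python
# Arguments for flow_from_directory on the test set: same format as for training.
TEST_FLOW_ARGS = {
    "directory": "dataset/test_set",  # assumed spelling of the folders named in the video
    "target_size": (64, 64),          # must match the training images
    "batch_size": 32,
    "class_mode": "binary",
}


def make_test_datagen():
    """Build a generator that only rescales pixels: no shear, zoom or flip."""
    # Imported lazily so this sketch loads even where TensorFlow is not installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    return ImageDataGenerator(rescale=1.0 / 255)


def load_test_set(test_datagen):
    """Connect the rescale-only generator to the test images on disk."""
    return test_datagen.flow_from_directory(**TEST_FLOW_ARGS)
```

In a notebook this would be used as `test_set = load_test_set(make_test_datagen())`, giving batches of untouched, rescaled test images in the same 64 by 64 format as the training batches.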
340 00:17:14,166 --> 00:17:15,066 Well, there you go. 341 00:17:15,066 --> 00:17:17,433 We're done with data preprocessing. 342 00:17:17,433 --> 00:17:19,800 It was very different. It was actually brand new. 343 00:17:19,800 --> 00:17:23,933 But we recognize some of the same process steps as what we did before. 344 00:17:24,666 --> 00:17:27,266 And so now I'm very excited because we can move on 345 00:17:27,266 --> 00:17:30,900 to the exciting part, which is about building the CNN. 346 00:17:31,200 --> 00:17:35,033 Yes, we're ready for part two now, which we're going to tackle in several steps. 347 00:17:35,300 --> 00:17:37,633 And so make sure to get good energy for this. 348 00:17:37,633 --> 00:17:40,866 And as soon as that is the case, join me in the next tutorial 349 00:17:40,900 --> 00:17:43,900 to smash, this time, part two, building the CNN. 350 00:17:44,266 --> 00:17:46,200 And until then, enjoy machine learning.