Now, we have a fairly good idea about our X and Y variables. Our X variable is a 2D array of 28 by 28 pixel intensities, where each individual pixel intensity lies between 0 and 255.

And since we are going to use gradient descent to train our model, we need to normalize these pixel intensities. By normalizing, I mean we have to rescale the pixel intensities to lie between 0 and 1. A very simple way to do this is by dividing all the pixel intensities by 255. So 0 will remain 0, and 255, which stands for a completely white pixel, becomes 1, and so on.

So to normalize, we can just divide our X train dataset by 255. And similarly, we have to normalize our test dataset as well. So for the test set also, we are dividing all the pixel intensities by 255.

This normalization is different from the normalization we generally do for machine learning algorithms. Since here we know that all of these values are on an absolute scale of 0 to 255, we can directly divide by 255. But for general machine learning datasets, we don't know the absolute scale, so we generally subtract the mean from the values and divide by their standard deviation. But that process is not needed here, since we know that the pixel intensities lie between 0 and 255.
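The normalization step described above can be sketched as follows. This is an illustrative snippet: the toy array stands in for the real 60,000 x 28 x 28 MNIST pixel data, which has the same dtype and value range.

```python
import numpy as np

# Toy stand-in for the MNIST pixel arrays: real 28x28 images have the
# same dtype and value range (unsigned 8-bit integers in [0, 255]).
X_train = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Dividing by 255.0 rescales every intensity into [0, 1]:
# 0 stays 0.0, and 255 (a completely white pixel) becomes 1.0.
X_train_n = X_train / 255.0

print(X_train_n.min(), X_train_n.max())  # 0.0 1.0
```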
So here we can directly divide by 255. And one thing you can notice is that we are not dividing by 255; we are dividing by 255.0. That is because we want the final output in the form of floating-point numbers between 0 and 1. If we divide by just the integer value 255, then, since the intensities are integer values, there might be some cases with some Python versions where we get the output as integers, whereas we want the whole greyscale between 0 and 1. With a recent Python version you don't have to do this, but to make sure that the code is compatible with all other Python versions, it's better to divide by a floating-point number so that the final output is in the form of floating-point numbers between 0 and 1.

So just run this. We are calling our normalized datasets X_train_n and X_test_n.

As I told you earlier, our train dataset has 60,000 observations, and our test dataset has another 10,000 observations. We will further divide our train dataset into training and validation sets. We will use the first 5,000 observations as our validation set, and the next 55,000 as our training dataset.
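The point about 255 versus 255.0 can be checked directly. A minimal sketch with an illustrative array:

```python
import numpy as np

pixels = np.array([0, 64, 255], dtype=np.uint8)

# With NumPy on Python 3, / is true division, so the result is floating
# point either way; writing 255.0 just makes the intent explicit and keeps
# the code safe on Python 2, where integer / integer truncated to integer.
scaled = pixels / 255.0

print(scaled.dtype)  # float64
```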
So to do that, we can just do it using these simple slicing operations. We are saving observations 0 to 5,000 into X valid, and from 5,000 to 60,000 into X train. Similarly, we have to do this for our Y dataset also: we are saving the first 5,000 observations into y valid, and the next 55,000 observations into y train. And our X test will remain the same, so we are just saving our normalized data into X test.

So just run this. Now we have three datasets: first the validation set of 5,000, then the training set of 55,000, and then the test dataset of another 10,000 observations.

We will be using the train dataset to train our model. We will be using the validation set to tune the performance of our model. And then, after tuning all the hyperparameters, we will be using the test dataset to evaluate the performance of our model.

To view the values of this dataset, you can just call the dataset. You can see now the values are between 0 and 1. Just look at the first few values. Here you can see there are some values which are between 0 and 1, and now it is normalized.
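The split described above comes down to simple NumPy slicing. A sketch with a scaled-down array, 60 rows standing in for the 60,000 real ones (the variable names are illustrative):

```python
import numpy as np

# Scaled-down stand-ins: 60 "images" instead of 60,000,
# 5 validation rows instead of 5,000.
X_train_n = np.random.rand(60, 28, 28)
y_train_full = np.random.randint(0, 10, size=60)

# First slice -> validation set, remainder -> training set.
X_valid, X_train = X_train_n[:5], X_train_n[5:]
y_valid, y_train = y_train_full[:5], y_train_full[5:]

print(X_valid.shape, X_train.shape)  # (5, 28, 28) (55, 28, 28)
```

With the real data, the same slicing with `[:5000]` and `[5000:]` produces the 5,000-row validation set and 55,000-row training set.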
In the next lecture, we'll look at the different methods that are available to create a neural network using Keras. Thank you.