Now we have a fairly good idea about our x and y variables. Our x variable is in the form of a 2D array of 28 by 28 pixel intensities, where each individual pixel intensity lies between 0 and 255.

Since we are going to use gradient descent to train our model, we need to normalize these pixel intensities. By normalizing I mean we have to rescale the pixel intensities so that they lie between 0 and 1. A very simple way to do this is to divide all the pixel intensities by 255. So 0 will remain 0, and 255, which stands for a completely white pixel, becomes 1, and so on.

So to normalize, we can just divide x_train by 255.0. Similarly, we have to normalize our test dataset as well, so for the test set we also divide all the pixel intensities by 255.0.

This normalization is different from the normalization we generally do for machine learning algorithms. Here we know that all the values are on an absolute scale of 0 to 255, so we can directly divide by 255. For general machine learning datasets we don't know the absolute scale, so we usually subtract the mean from the values and divide by their standard deviation. That process is not needed here, since we know the pixel intensities lie between 0 and 255 and we can divide by 255 directly.

One thing you can notice is that we are not dividing by 255; we are dividing by 255.0. That is because we want the final output in the form of floating-point numbers between 0 and 1. If we divide by the integer value 255, then, since the intensities are integers, there may be cases with some Python versions where we get the output as integers. Since we want the whole grayscale range between 0 and 1, we use 255.0. With a recent Python version you don't have to do this, but to make sure the code is compatible with other Python versions as well, it is better to divide by a floating-point number so that the final output is a floating-point number between 0 and 1.

After doing this, we are calling our normalized datasets x_train_n and x_test_n.
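Here is a minimal sketch of the normalization step just described, assuming the MNIST data was loaded with keras.datasets.mnist; the variable names (x_train, x_test, x_train_n, x_test_n) are my assumptions about the lecture's notebook:

```python
import tensorflow as tf

# Load MNIST: 60,000 training images and 10,000 test images of 28x28 pixels,
# stored as unsigned 8-bit integers in the range 0..255.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Divide by the float 255.0 so the result is a float array in [0, 1],
# regardless of the integer dtype of the original pixel values.
x_train_n = x_train / 255.0
x_test_n = x_test / 255.0

print(x_train_n.min(), x_train_n.max())  # 0.0 1.0
```

Dividing by the float 255.0 guards against older Python 2 division semantics, where dividing an integer array by an integer could perform integer division and collapse everything to 0.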
As I told you earlier, our training dataset has 60,000 observations and our test dataset has another 10,000 observations. We will further divide our training dataset into training and validation sets: we will use the first 5,000 observations as our validation set and the next 55,000 as our training set.

To do that, we can use simple slicing operations. We save observations 0 to 5,000 into x_valid and observations 5,000 to 60,000 into x_train. Similarly, we have to do this for our y dataset as well: we save the first 5,000 observations into y_valid and the next 55,000 observations into y_train. Our x test set will remain the same, so we just save our normalized data into x_test. Just run this.

Now we have three datasets: first, the validation set of 5,000 observations; then the training set of 55,000; and then another 10,000 observations in our test dataset.

We will be using the training dataset to train our model, the validation set to tune the performance of our model, and then, after tuning all the hyperparameters, we will be using the test dataset to evaluate the performance of our model.

To view the values of this dataset, you can just call the data. You can see that the values are now between 0 and 1. Just look at the first row: here you can see there are some values between 0 and 1, so our data is now normalized. In the next lecture we will look at the different methods that are available to create a neural network using Keras. Thank you.
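A minimal sketch of the split just described, continuing from the normalization sketch above (again, the exact variable names are my assumptions):

```python
# Split the normalized 60,000-image training set into a 5,000-image
# validation set and a 55,000-image training set; labels are split the same way.
x_valid, x_train = x_train_n[:5000], x_train_n[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]

# The test set is only normalized, not split further.
x_test = x_test_n

print(x_valid.shape, x_train.shape, x_test.shape)
# (5000, 28, 28) (55000, 28, 28) (10000, 28, 28)
```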