Now we have a fairly good idea about our x and y variables. Our x variable is in the form of a 2D array of 28 by 28 pixel intensities, where each individual pixel intensity lies between 0 and 255.

Since we are going to use gradient descent to train our model, we need to normalize these pixel intensities. By normalizing I mean we have to rescale the pixel intensities so that they lie between 0 and 1. A very simple way to do this is to divide all the pixel intensities by 255. So 0 will remain 0, and 255, which stands for a completely white pixel, becomes 1, and so on.

So to normalize, we can just divide x_train by 255.0. Similarly, we have to normalize our test dataset as well, so for the test set we also divide all the pixel intensities by 255.0.

This normalization is different from the normalization we generally do for machine learning algorithms. Here we know that all the values are on an absolute scale of 0 to 255, so we can directly divide by 255. For general machine learning datasets we don't know the absolute scale, so we usually subtract the mean from the values and divide by their standard deviation. That process is not needed here, since we know the pixel intensities lie between 0 and 255 and we can divide by 255 directly.

One thing you can notice is that we are not dividing by 255; we are dividing by 255.0. That is because we want the final output in the form of floating-point numbers between 0 and 1. If we divide by the integer value 255, then, since the intensities are integers, there may be cases with some Python versions where we get the output as integers. Since we want the whole grayscale range between 0 and 1, we use 255.0. With a recent Python version you don't have to do this, but to make sure the code is compatible with other Python versions as well, it is better to divide by a floating-point number so that the final output is a floating-point number between 0 and 1.

After doing this, we are calling our normalized datasets x_train_n and x_test_n.
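Here is a minimal sketch of the normalization step just described, assuming the MNIST data was loaded with keras.datasets.mnist; the variable names (x_train, x_test, x_train_n, x_test_n) are my assumptions about the lecture's notebook:

```python
import tensorflow as tf

# Load MNIST: 60,000 training images and 10,000 test images of 28x28 pixels,
# stored as unsigned 8-bit integers in the range 0..255.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Divide by the float 255.0 so the result is a float array in [0, 1],
# regardless of the integer dtype of the original pixel values.
x_train_n = x_train / 255.0
x_test_n = x_test / 255.0

print(x_train_n.min(), x_train_n.max())  # 0.0 1.0
```

Dividing by the float 255.0 guards against older Python 2 division semantics, where dividing an integer array by an integer could perform integer division and collapse everything to 0.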
As I told you earlier, our training dataset has 60,000 observations and our test dataset has another 10,000 observations. We will further divide our training dataset into training and validation sets: we will use the first 5,000 observations as our validation set and the next 55,000 as our training set.

To do that, we can use simple slicing operations. We save observations 0 to 5,000 into x_valid and observations 5,000 to 60,000 into x_train. Similarly, we have to do this for our y dataset as well: we save the first 5,000 observations into y_valid and the next 55,000 observations into y_train. Our x test set will remain the same, so we just save our normalized data into x_test. Just run this.

Now we have three datasets: first, the validation set of 5,000 observations; then the training set of 55,000; and then another 10,000 observations in our test dataset.

We will be using the training dataset to train our model, the validation set to tune the performance of our model, and then, after tuning all the hyperparameters, we will be using the test dataset to evaluate the performance of our model.

To view the values of this dataset, you can just call the data. You can see that the values are now between 0 and 1. Just look at the first row: here you can see there are some values between 0 and 1, so our data is now normalized. In the next lecture we will look at the different methods that are available to create a neural network using Keras. Thank you.
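A minimal sketch of the split just described, continuing from the normalization sketch above (again, the exact variable names are my assumptions):

```python
# Split the normalized 60,000-image training set into a 5,000-image
# validation set and a 55,000-image training set; labels are split the same way.
x_valid, x_train = x_train_n[:5000], x_train_n[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]

# The test set is only normalized, not split further.
x_test = x_test_n

print(x_valid.shape, x_train.shape, x_test.shape)
# (5000, 28, 28) (55000, 28, 28) (10000, 28, 28)
```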