Now let's create the structure of our first feedforward artificial neural network model.

Before starting, let's set the random seed to 42 using these statements. A random seed is used to replicate the same result every time. You can use any number instead of 42, but if you use that same number in the future when you are running the same code, you will get the same output, as we have discussed in the theory. There are multiple occasions where a neural network generates random numbers, such as when assigning the initial weights; using a random seed will help you reproduce the same result, with the same initial weights, every time the code is run.

So for our problem we have observations in the form of 28 by 28 pixels. The observations are in the form of a 2D array, and as an output we want 10 categories. These categories are exclusive: that means a single image can be either a t-shirt or a top or a boot, etc.

This is what we are planning to do. We are first taking our 2D observations and flattening them, so instead of a 2D array of 28 by 28 pixels we have 784 pixels in our input layer. Then we are going to create two hidden layers; the activation function which we are going to use for the hidden layers will be ReLU. As discussed in the theory lecture, we prefer ReLU in the hidden layers of classification models, and in the output layer, since these 10 categories are exclusive and this is a classification model, we will be using softmax activation. We have already discussed these activation types in our theory lecture; that's why we are not going to discuss them again here.

Now let's start creating this neural network using the Sequential API of Keras. First we will need to create a model object. So our object variable name is model, and we are building it using the function keras.models.Sequential. To this Sequential object we can add different layers. We will start with our input layer, move on to hidden layer 1, then to hidden layer 2, and then to the output layer.

So first, for the input layer, we can write model.add and then keras.layers, and since we want to convert this 2D array of 28 by 28 pixels into 784 pixels in a single array, we are using Flatten. And then we need to provide the input shape of our X variables.
Since our X variable is a 2D array of 28 by 28 pixels, we are using input_shape equal to a list of two values, that is [28, 28].

Then our second layer is the first hidden layer. So in the next step we are adding another layer, that is model.add, and then keras.layers. And since this is a dense layer, we will write Dense. Here we need to mention the number of neurons we want in this layer. In hidden layer 1 we need 300 neurons; that is why we are writing 300. And then we want the ReLU activation function; that's why we are writing activation equal to 'relu'.

In the next step we want another hidden layer. So we are following the same process: we are writing model.add, and in brackets we are writing keras.layers.Dense, and in this layer we want 100 neurons. That's why we are writing 100, and then activation equal to 'relu', since this is also a dense hidden layer and we want its activation function to be ReLU.

In the output layer we want 10 different categories. That's why we have to add 10 neurons to this layer, and since the classes are exclusive, we have to use softmax activation. So we'll write model.add, then keras.layers.Dense, then the number of neurons, which is 10, and activation equal to 'softmax'. The complete model-building code is shown in the sketch after this passage.

I hope you remember what ReLU and softmax are: ReLU outputs zero for all negative inputs and is equal to the input for all positive inputs, while softmax makes the sum of all the class probabilities equal to one.

In case you want an additional hidden layer, you can always add one between any of these layers. In the later part of the course we will see how to choose the number of neurons in each layer.

Let's run this. After creating this model structure, you can look at it using the summary method. So you write your object name, that is model, and then .summary(). The summary method displays all the model layers, including each layer's name, its output shape, and the number of parameters. So these are the layer names. Second is the output shape: this is the number of outputs, and this is the batch size of the input. Since we are passing all our data, this is None; None means no limit on the input batch size.
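Here is a minimal sketch of the full structure described above, assuming TensorFlow 2.x with its bundled Keras (the two seed statements are one common way to fix the randomness; the exact statements in your own notebook may differ):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Fix the random seeds so that the randomly assigned initial
# weights are the same on every run.
np.random.seed(42)
tf.random.set_seed(42)

# A Sequential model stacks layers one after another.
model = keras.models.Sequential()

# Input layer: flatten each 28 x 28 image into a single 784-pixel array.
model.add(keras.layers.Flatten(input_shape=[28, 28]))

# Hidden layer 1: 300 neurons with ReLU activation.
model.add(keras.layers.Dense(300, activation="relu"))

# Hidden layer 2: 100 neurons with ReLU activation.
model.add(keras.layers.Dense(100, activation="relu"))

# Output layer: 10 neurons, one per exclusive category, with softmax
# so that the predicted class probabilities sum to one.
model.add(keras.layers.Dense(10, activation="softmax"))

# Show layer names, output shapes and trainable parameter counts.
model.summary()
```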
Next comes the number of trainable parameters. Since our input has 784 variables and we are passing each of these variables into 300 different neurons, we have an individual weight for each of these linkages. So the total number of weights is 784 times 300, plus there are another 300 bias variables, one associated with each of these neurons. So 784 times 300 plus 300 will give you this number, 235,500. Our neural network is trying to optimize this many parameters for this layer.

Similarly, these are the trainable parameters for the next layer. Again, this will be 300 times 100: there are 300 times 100 linkages between these two layers, and each of those linkages has an associated weight, and each of the 100 neurons in this layer has its own bias value. So 300 times 100 plus 100, that is 30,100 trainable parameters, are associated with this layer. Similarly, 1,010 trainable parameters (100 times 10 plus 10) are associated with the output layer.

So at the bottom you get the total number of trainable parameters in this neural network. Our neural network will try to optimize this many parameters to get the best result.

Now if you want to look at our neural network as a diagram, you can do that using pydot. So you have to import pydot, and if pydot is not installed on your system, you can install it using pip install pydot or conda install pydot at your command prompt. Then, if you just write keras.utils.plot_model and give your object name, and run this, you will get the structure of your neural network. So here we have the input layer; then we have to flatten the 2D array into a 1D array, which is why we have a Flatten layer; then we have two dense hidden layers; and we have an output layer which is giving us the class probabilities. So after creating the structure of your neural network, you can also visualize that structure using this command, as in the sketch below.
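A minimal sketch of that visualization call (note that plot_model also needs the Graphviz binaries installed in addition to pydot; show_shapes is an optional flag I'm adding here to display each layer's output shape):

```python
from tensorflow import keras

# Draws Flatten -> Dense(300) -> Dense(100) -> Dense(10) as a diagram.
# Needs pydot (pip install pydot) and the Graphviz binaries on the PATH.
keras.utils.plot_model(model, show_shapes=True)
```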
As I said earlier, our model is trying to optimize the weights and biases that are represented by these numbers to get the output, and if you remember from the theory lecture, the weights are assigned randomly at initialization. To get the information about those weights and biases, there is a get_weights method that you can use.

So I can write my object name, that is model, and then the layer index. For the second layer I can write layers and then 1, since the second object sits at index 1, and then I can call the get_weights method. I am storing this information in two new variables, weights and biases.

So if I just output the weights, you can see these are the randomly generated weights; there are 784 by 300 such weights in this layer. If you view the shape, you can see that there are 784 rows and 300 columns in the weights. These weights are all randomly assigned at initialization. Similarly, you can also look at the bias values; biases are initialized as zero. And if you check the shape of the biases, this should be 300. You can see that there are 300 biases. A sketch of this inspection follows at the end of this passage.

In the next video we will compile and train our model. Thank you.
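For reference, here is a minimal sketch of the weight inspection described above, assuming the model built earlier:

```python
# model.layers[0] is the Flatten layer, so the first Dense (hidden)
# layer sits at index 1.
weights, biases = model.layers[1].get_weights()

print(weights.shape)  # (784, 300): one weight per input-to-neuron linkage
print(biases.shape)   # (300,): one bias per neuron in the layer

print(weights)  # small random values from the default initializer
print(biases)   # all zeros before training
```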