Now let's create the structure of our first feedforward artificial neural network model.

Before starting, let's set the random seed to 42 using these statements. A random seed is used to replicate the same result every time. You can use any number instead of 42, but if you use that same number in the future when you are running the same code, you will get the same output, as we have discussed in the theory. There are multiple occasions where a neural network generates random numbers, such as when assigning the initial weights; using a random seed will help you reproduce the same result, with the same initial weights, every time the code is run.

So for our problem we have observations in the form of 28 by 28 pixels. The observations are in the form of a 2D array, and as an output we want 10 categories. These categories are exclusive: that means a single image can be either a t-shirt or a top or a boot, etc.

This is what we are planning to do. We are first taking our 2D observations and flattening them, so instead of a 2D array of 28 by 28 pixels we have 784 pixels in our input layer. Then we are going to create two hidden layers; the activation function which we are going to use for the hidden layers will be ReLU. As discussed in the theory lecture, we prefer ReLU in the hidden layers of classification models, and in the output layer, since these 10 categories are exclusive and this is a classification model, we will be using softmax activation. We have already discussed these activation types in our theory lecture; that's why we are not going to discuss them again here.

Now let's start creating this neural network using the Sequential API of Keras. First we will need to create a model object. So our object variable name is model, and we are building it using the function keras.models.Sequential. To this Sequential object we can add different layers. We will start with our input layer, move on to hidden layer 1, then to hidden layer 2, and then to the output layer.

So first, for the input layer, we can write model.add and then keras.layers, and since we want to convert this 2D array of 28 by 28 pixels into 784 pixels in a single array, we are using Flatten. And then we need to provide the input shape of our X variables.
Since our X variable is a 2D array of 28 by 28 pixels, we are using input_shape equal to a list of two values, that is [28, 28].

Then our second layer is the first hidden layer. So in the next step we are adding another layer, that is model.add, and then keras.layers. And since this is a dense layer, we will write Dense. Here we need to mention the number of neurons we want in this layer. In hidden layer 1 we need 300 neurons; that is why we are writing 300. And then we want the ReLU activation function; that's why we are writing activation equal to 'relu'.

In the next step we want another hidden layer. So we are following the same process: we are writing model.add, and in brackets we are writing keras.layers.Dense, and in this layer we want 100 neurons. That's why we are writing 100, and then activation equal to 'relu', since this is also a dense hidden layer and we want its activation function to be ReLU.

In the output layer we want 10 different categories. That's why we have to add 10 neurons to this layer, and since the classes are exclusive, we have to use softmax activation. So we'll write model.add, then keras.layers.Dense, then the number of neurons, which is 10, and activation equal to 'softmax'. The complete model-building code is shown in the sketch after this passage.

I hope you remember what ReLU and softmax are: ReLU outputs zero for all negative inputs and is equal to the input for all positive inputs, while softmax makes the sum of all the class probabilities equal to one.

In case you want an additional hidden layer, you can always add one between any of these layers. In the later part of the course we will see how to choose the number of neurons in each layer.

Let's run this. After creating this model structure, you can look at it using the summary method. So you write your object name, that is model, and then .summary(). The summary method displays all the model layers, including each layer's name, its output shape, and the number of parameters. So these are the layer names. Second is the output shape: this is the number of outputs, and this is the batch size of the input. Since we are passing all our data, this is None; None means no limit on the input batch size.
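Here is a minimal sketch of the full structure described above, assuming TensorFlow 2.x with its bundled Keras (the two seed statements are one common way to fix the randomness; the exact statements in your own notebook may differ):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Fix the random seeds so that the randomly assigned initial
# weights are the same on every run.
np.random.seed(42)
tf.random.set_seed(42)

# A Sequential model stacks layers one after another.
model = keras.models.Sequential()

# Input layer: flatten each 28 x 28 image into a single 784-pixel array.
model.add(keras.layers.Flatten(input_shape=[28, 28]))

# Hidden layer 1: 300 neurons with ReLU activation.
model.add(keras.layers.Dense(300, activation="relu"))

# Hidden layer 2: 100 neurons with ReLU activation.
model.add(keras.layers.Dense(100, activation="relu"))

# Output layer: 10 neurons, one per exclusive category, with softmax
# so that the predicted class probabilities sum to one.
model.add(keras.layers.Dense(10, activation="softmax"))

# Show layer names, output shapes and trainable parameter counts.
model.summary()
```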
Next comes the number of trainable parameters. Since our input has 784 variables and we are passing each of these variables into 300 different neurons, we have an individual weight for each of these linkages. So the total number of weights is 784 times 300, plus there are another 300 bias variables, one associated with each of these neurons. So 784 times 300 plus 300 will give you this number, 235,500. Our neural network is trying to optimize this many parameters for this layer.

Similarly, these are the trainable parameters for the next layer. Again, this will be 300 times 100: there are 300 times 100 linkages between these two layers, and each of those linkages has an associated weight, and each of the 100 neurons in this layer has its own bias value. So 300 times 100 plus 100, that is 30,100 trainable parameters, are associated with this layer. Similarly, 1,010 trainable parameters (100 times 10 plus 10) are associated with the output layer.

So at the bottom you get the total number of trainable parameters in this neural network. Our neural network will try to optimize this many parameters to get the best result.

Now if you want to look at our neural network as a diagram, you can do that using pydot. So you have to import pydot, and if pydot is not installed on your system, you can install it using pip install pydot or conda install pydot at your command prompt. Then, if you just write keras.utils.plot_model and give your object name, and run this, you will get the structure of your neural network. So here we have the input layer; then we have to flatten the 2D array into a 1D array, which is why we have a Flatten layer; then we have two dense hidden layers; and we have an output layer which is giving us the class probabilities. So after creating the structure of your neural network, you can also visualize that structure using this command, as in the sketch below.
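A minimal sketch of that visualization call (note that plot_model also needs the Graphviz binaries installed in addition to pydot; show_shapes is an optional flag I'm adding here to display each layer's output shape):

```python
from tensorflow import keras

# Draws Flatten -> Dense(300) -> Dense(100) -> Dense(10) as a diagram.
# Needs pydot (pip install pydot) and the Graphviz binaries on the PATH.
keras.utils.plot_model(model, show_shapes=True)
```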
As I said earlier, our model is trying to optimize the weights and biases that are represented by these numbers to get the output, and if you remember from the theory lecture, the weights are assigned randomly at initialization. To get the information about those weights and biases, there is a get_weights method that you can use.

So I can write my object name, that is model, and then the layer index. For the second layer I can write layers and then 1, since the second object sits at index 1, and then I can call the get_weights method. I am storing this information in two new variables, weights and biases.

So if I just output the weights, you can see these are the randomly generated weights; there are 784 by 300 such weights in this layer. If you view the shape, you can see that there are 784 rows and 300 columns in the weights. These weights are all randomly assigned at initialization. Similarly, you can also look at the bias values; biases are initialized as zero. And if you check the shape of the biases, this should be 300. You can see that there are 300 biases. A sketch of this inspection follows at the end of this passage.

In the next video we will compile and train our model. Thank you.
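For reference, here is a minimal sketch of the weight inspection described above, assuming the model built earlier:

```python
# model.layers[0] is the Flatten layer, so the first Dense (hidden)
# layer sits at index 1.
weights, biases = model.layers[1].get_weights()

print(weights.shape)  # (784, 300): one weight per input-to-neuron linkage
print(biases.shape)   # (300,): one bias per neuron in the layer

print(weights)  # small random values from the default initializer
print(biases)   # all zeros before training
```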