In this lesson we're going to be setting up our TensorFlow graph and our neural network. That means creating our tensors as well as setting up the layers in our neural network.

The first thing I'll do is create a markdown cell here that's going to read "Setup the TensorFlow Graph". Now, creating a tensor in TensorFlow is actually pretty straightforward. TensorFlow requires us to create some placeholder tensors ahead of time, so that it can figure out how everything is set up and how data should flow within its graph. Here's the API documentation for the TensorFlow placeholder. To create a tensor we essentially supply two things: a data type and a shape. Scrolling down, you can see an example of how to use this code. A tensor is created and stored inside this variable here; it's going to contain floating point numbers and it's going to be two-dimensional, of shape 1024 by 1024.

So let's do something similar. Let's set up our X and our y, our features and our labels, as placeholders. That's the first step. Capital X shall be equal to tf.placeholder. We're going to be working with floating point numbers, so we'll use tf.float32, and for the shape we'll say shape is equal to, in our case, None comma 784. Why is that? It's because we've got 784 features. Now what about this other dimension: why did I put None here? What I'm effectively doing is leaving this first dimension blank, and the reason is that this dimension will hold how many examples, how many samples, will be contained in the tensor. That will actually be determined a little later on, because when it comes to training our model we're going to be splitting our dataset up into batches, and by leaving this dimension blank at the point in time when we're creating the placeholder we can change the size of the batches as we see fit. We can use 1,000 samples per batch, or 2,000, or 5,000, or 10,000; it doesn't matter.
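As written so far, and assuming the TensorFlow 1.x API (under TensorFlow 2.x the same calls live in tf.compat.v1 with eager execution disabled), the features placeholder is a one-liner; the later sketches in this lesson build on this import:

    import tensorflow as tf  # TF 1.x style API assumed throughout this lesson

    # Features placeholder: None leaves the batch dimension open,
    # 784 is the number of features per (flattened 28x28) image
    X = tf.placeholder(tf.float32, shape=[None, 784])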
What I'm going to do next is change this 784 to a constant, so I'll go up to my constants and add a few more up there to make this a bit more explicit. We've discussed before that the image width is 28 pixels in this dataset, and we've also discussed how the image height is 28 pixels. Finally, all the images are grayscale, so we know that the number of channels is equal to one. What I've done in the CSV files that I've provided is flatten all the images: instead of providing you with an array shaped 28 by 28 by 1, I've flattened the structure to make the total number of inputs equal to the width times the height times the channels. Now that we've got this constant up here, we can come back down and replace that 784 with the total number of inputs. Brilliant.

So that's the placeholder for our features. What about the placeholder for our labels? We're going to create this one with Y equal to tf.placeholder, and for the data type we're also going to use float32. Now for the shape: it's going to be None, because this dimension will be determined by the size of our batch, and then 10, because this is the total number of classes that we have, the total number of categories that we're looking to predict. And I'm actually going to use a constant here as well, NR_CLASSES; you can see that I've already created this constant at the top, right here. Let me hit Shift+Enter on the cell.
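Pulling those pieces together, a minimal sketch of the constants and the two placeholders might look like this; the constant names below are my reading of what's spoken (TOTAL_INPUTS, NR_CLASSES and so on) rather than a guaranteed match for the notebook:

    # Dataset constants: 28x28 grayscale images, flattened in the provided CSV files
    IMG_WIDTH = 28
    IMG_HEIGHT = 28
    CHANNELS = 1
    TOTAL_INPUTS = IMG_WIDTH * IMG_HEIGHT * CHANNELS  # 784
    NR_CLASSES = 10

    # Placeholders: the None dimension is the (not yet decided) batch size
    X = tf.placeholder(tf.float32, shape=[None, TOTAL_INPUTS])
    Y = tf.placeholder(tf.float32, shape=[None, NR_CLASSES])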
Now we can move on to setting up our neural network. I'll add a little subheading here that's going to read "Neural Network Architecture". The very first thing that we're going to determine are the so-called hyperparameters. By hyperparameters I mean parameters that don't come out of training the model; a parameter that comes out of training the model would be something like the weights, right? The hyperparameters are things that we determine ahead of time: how long to train our model for (the number of epochs used for training), the learning rate used in our optimizer, the number of layers in our neural network, the number of nodes per layer. All of these things are hyperparameters.

So let's add a few here. The first one I'm going to add is the number of epochs, and I'll set that to maybe five to start with. The second thing I'm going to add is the learning rate, learning_rate, and I'll set that equal to a very small number: 0.0001. By the way, if you wanted to write this in scientific notation you can as well: 1e-4 is the very same number, so I could write it like that too. The next thing I'm going to specify are the number of neurons and the number of layers. I tell you what, let's have two hidden layers. The first hidden layer will have 512 neurons, so n_hidden1 is going to be equal to 512, and the second hidden layer, n_hidden2, is going to be equal to 64. We're not going to make it too complicated for now. So: two hidden layers, the first with 512 neurons, the second with 64 neurons, and the output layer, remember, is only going to have 10. Let me hit Shift+Enter on the cell, delete this one here, and add a few more cells.

So what's next? We've initialized our hyperparameters, we've given them some starting values, and we've created our placeholder tensors for our features and our labels.
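The hyperparameter cell, as just described, might be sketched like this; the names follow what's spoken in the lesson:

    # Hyperparameters: chosen ahead of time, not learned during training
    nr_epochs = 5
    learning_rate = 1e-4  # same value as 0.0001, in scientific notation
    n_hidden1 = 512       # neurons in the first hidden layer
    n_hidden2 = 64        # neurons in the second hidden layer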
The next thing we're going to do is give some starting values to our weights and our biases for our neural network. All the connection weights in the neural network need some sort of initial value, and what we're going to do is give every single weight in the network a small random value as a starting point. TensorFlow actually has a really nice way of generating these random values for us, and the way it does it is by picking the values from a distribution. So I'm going to store all of them in a variable called initial_w1, for the initial weights of the first hidden layer, and I'm going to set that equal to tf.truncated_normal. It's going to be a normal distribution, but a truncated one, meaning that no extreme values will be generated on either tail of the distribution, to the extreme left or the extreme right.

Now, this function has to know how many values it needs to generate. What's the shape of this first layer? The shape is determined by two things: the number of inputs and the number of neurons in the layer. We said that the number of inputs was 784 and the number of neurons in this layer was 512, and we don't actually have to hard-code these numbers here; we can just write TOTAL_INPUTS and n_hidden1 in their place. So now this function knows how many weights to generate in total, but it doesn't yet know how to pick them out: should they be far apart from one another or rather close together? For that it needs to know a little bit about the standard deviation, and we can add that with the stddev argument, set to a small value like 0.1. And because everything is randomized and you might want to replicate my results, we can set the seed value equal to 42; that way you'll be drawing the same random weights every single time. So those are our weights. Let me hit Shift+Enter on the cell.

Now, you might think you could just take a look at the values that were generated here, and you'd be disappointed to find that you can't actually do that just yet. The reason is that TensorFlow has a two-stage approach: the first stage is all setup, and it's only in the second stage that the calculations are actually done. As long as we don't tell TensorFlow to evaluate any of these tensors and run all the calculations, we don't get to see the values that are generated. This is one of those things to wrap your head around with TensorFlow, and we're going to be talking about it a whole lot more in a bit.

So, back to the setup. We've created our little random values here as the initial values for the weights, but what we should actually do with these values is create something called a Variable, a TensorFlow Variable, and that Variable will hold onto all the weights in the first hidden layer. I'll call it w1, and I'll set it equal to tf.Variable, with a capital V, open parentheses, initial_value equal to our initial_w1. In this line of code we're actually creating the weights. The weights, remember, are more than just the initial values: they have to persist and be updated as all the calculations in TensorFlow are run. This is why we create them as a Variable.
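A sketch of that weight setup, assuming the TF 1.x calls named above (tf.truncated_normal and tf.Variable):

    # Small random starting values for the first hidden layer's weights
    initial_w1 = tf.truncated_normal(shape=[TOTAL_INPUTS, n_hidden1],
                                     stddev=0.1, seed=42)
    # The Variable persists and gets updated during training;
    # the truncated normal above only supplies the starting values
    w1 = tf.Variable(initial_value=initial_w1)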
Now let's tackle the biases. Remember how we said that changing the weights is like stretching or compressing the activation function in a neuron? The bias works together with the weight, and the bias is what shifts the activation function from left to right, by adding or subtracting a number. So what we're going to do next is initialize the biases, and the way we'll do this is to get all the initial biases for that first hidden layer with initial_b1, which we'll set equal to, not tf.truncated_normal, but tf.constant. All of our biases are going to start out with the same value, and that value is simply zero. The only other argument we have to supply to this function is how many initial values we need, so once again we supply a shape, and that's going to be equal to 512, or n_hidden1. To create our biases we then do something very similar to what we did with the weights: we create a TensorFlow Variable, and this Variable just has to know what the initial value of the biases is, so initial_value is equal to initial_b1.

All of this is still part of the setup just for the first layer, and the biases and the weights for this layer will be updated during the training process. This is where the network does its learning. These are the values that feed into the activation functions of the neurons and represent the strength of the connections between the different units.
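The bias setup for the first hidden layer, sketched with the same TF 1.x API:

    # Every bias in the first hidden layer starts out at zero
    initial_b1 = tf.constant(value=0.0, shape=[n_hidden1])
    b1 = tf.Variable(initial_value=initial_b1)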
Now what we need to do is think back to that slide where we had the formula. We need to determine how the weights and the biases work together to form the inputs into this hidden layer; we discussed this in the previous lesson, on this slide right here. We know that the first step is to multiply whatever comes into this green layer by the weights. I'm going to store the inputs that come into our hidden layer in a variable called layer1_in, and I'm going to set that equal to the result of this multiplication. The way we can multiply our tensors together is with this function right here from TensorFlow; it gets used so often that there's even a nice short alias for it. We supply two things, an a and a b, and the function multiplies matrix a by matrix b, producing a times b. No surprises there. So let's call this function in our notebook: tf.matmul. Now we have to figure out what it is that we're multiplying. Given that this is our first hidden layer, we're going to multiply the placeholder for our input features, the tensor that we created, by our weights, so w1. X in this case represents our raw inputs; we're still on the very first layer, so these are the raw inputs of our feature vector. But remember, what actually reaches the neurons is the inputs multiplied by the weights plus the bias, so plus b1. The result of this calculation is what feeds into the activation function, and that's the next step.

Now let's complete the whole puzzle by working out the output from our hidden layer. I'm going to call this layer1_out, and it will be equal to the output of the activation function from all the neurons. In the previous module we used the ReLU activation function with Keras, so let's stick with ReLU this time around as well. Looking at the documentation, it really only requires one input, namely the features. What are the features? Well, they're the inputs to the layer, right? So tf.nn.relu of layer1_in will calculate the output from our first hidden layer.

We've done a lot of work just now. We've initialized the weights for the first hidden layer, we've initialized the biases for the first hidden layer, and we've figured out what the features were that were going into the first hidden layer, namely the weighted sum of the inputs plus the bias. The output of this layer is the result of the calculation that happens inside the activation function. Now I've got a challenge for you. I'd like you to complete the code to set up the second hidden layer. Remember, this layer has 64 neurons, and it will depend on the output of the first hidden layer. I'd also like you to set up the output layer, and here the trick is that the activation function will be the softmax function. This challenge is a little bit more tricky, and you have to get a couple of things right to solve it, but pause the video and give this a go.
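For reference while you try the challenge, here's the first hidden layer as we've built it so far, sketched in the same style:

    # Inputs into the first hidden layer: weighted inputs plus the bias
    layer1_in = tf.matmul(X, w1) + b1
    # Output of the first hidden layer: ReLU applied element-wise
    layer1_out = tf.nn.relu(layer1_in)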
Ready? Here's the solution. The first thing we need to do for that second hidden layer is to initialize the weights and the biases, so I'll copy this line and paste it in the cell below. What I want to do is change my variable names, because these are not going to be the same weights or biases as for the first hidden layer, and I'll have to do this everywhere, right? So I've got my initial weights here, I've got my initial weights here, I've got a separate variable w2, the values for my initial biases here and the Variable for my biases here.

Now, the first trick is actually setting up the weights and the biases for the second layer correctly, because the thing that's different between the first layer and the second layer is the shape. The number of inputs that the second hidden layer gets is not equal to 784; it's actually equal to the number of neurons in the first hidden layer. So instead of 784 here we'll have n_hidden1, and the total number of neurons in the second hidden layer is not n_hidden1 but n_hidden2. If this is the shape that you provided for those initial weight values, well done. Now what about the biases? In this case we've got 64 neurons, each with its own bias, so the shape here should be n_hidden2.

The inputs for the second layer will then use these values, right? So layer2_in won't be equal to the multiplication of X and w1; no, because that second hidden layer follows the first hidden layer, it'll be the output of that first hidden layer, layer1_out, and then it won't be w1 but w2, and of course it'll also be the biases for that second hidden layer. So in this case it's the output of the first hidden layer multiplied by the weights of the second hidden layer, plus the biases of the second hidden layer; that is what forms the inputs into the second hidden layer. Now what about the output? Well, in this case it'll be quite similar: we'll have layer2_out, we're going to stick with the same ReLU activation function, and it'll just take layer2_in. So now we've got the output for that second hidden layer.
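A sketch of that second hidden layer, following the same pattern as the first; carrying the stddev and seed over from the copied cell is an assumption on my part:

    # Second hidden layer: its input size is the first layer's neuron count
    initial_w2 = tf.truncated_normal(shape=[n_hidden1, n_hidden2],
                                     stddev=0.1, seed=42)
    w2 = tf.Variable(initial_value=initial_w2)

    initial_b2 = tf.constant(value=0.0, shape=[n_hidden2])
    b2 = tf.Variable(initial_value=initial_b2)

    # Feeds on the previous layer's output rather than on X
    layer2_in = tf.matmul(layer1_out, w2) + b2
    layer2_out = tf.nn.relu(layer2_in)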
What about that final layer, though, the output layer? Well, in this case we have to update everything for the output layer. Starting at the top, we've got to change the weights, so I'll go with w3 here, and w3 here and here. The shape in this case will be 64 by 10: we've got 64 neurons in that second hidden layer, so n_hidden2, and we've got 10 neurons in that output layer, or the number of classes. As with the weights, the biases also get a shape determined by the number of outputs, the number of categories that we want to predict, and I'll update the names of course as well: initial_b3 and, for the Variable, b3.

OK, so now we've got the weights and the biases for that output layer. What's coming into the output layer? Well, it's going to be layer3_in, equal to a multiplication: in this case it's the output of layer number two, because layer number two feeds into layer number three, multiplied by the weights that connect those two layers, and of course we add the biases that are specific to layer number three, the output layer. Finally, our output as a whole is equal to whatever comes out of the activation function in our output layer, but in this case it's not going to be ReLU; it's going to be softmax, so that we get a nice probability associated with each of the outputs. If you weren't sure how to find the softmax function in TensorFlow, then if you google for it you actually get directed to this page right here, the API reference for TensorFlow, and there you see that the softmax function also really only requires a single input, which here has the name "logits". Effectively, what this function needs in order to calculate the probabilities is, of course, the weighted inputs to the output layer.

So I hope that didn't throw you off. This was definitely one of the harder in-video challenges I've asked you to complete, because it really builds on understanding the code that we wrote a minute ago plus understanding the concepts that we've discussed over the last two modules. But don't worry if you didn't quite get it right; there are still plenty of opportunities throughout this module to solidify your understanding and see how this works. It'll become a lot clearer once we actually run our code, train our model, and you get to see these things in action.
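And the output layer from the solution, sketched under the same assumptions, with softmax turning the logits into class probabilities:

    # Output layer: one neuron per class
    initial_w3 = tf.truncated_normal(shape=[n_hidden2, NR_CLASSES],
                                     stddev=0.1, seed=42)
    w3 = tf.Variable(initial_value=initial_w3)

    initial_b3 = tf.constant(value=0.0, shape=[NR_CLASSES])
    b3 = tf.Variable(initial_value=initial_b3)

    # The weighted inputs (logits) go through softmax to become probabilities
    layer3_in = tf.matmul(layer2_out, w3) + b3
    output = tf.nn.softmax(layer3_in)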
In the next lesson we're going to continue with our setup. We're going to hammer out three important points: namely, the loss function that we want to use, the optimization that we want TensorFlow to do, and the metrics, like the accuracy, that we want TensorFlow to calculate along the way. Only after we've done all that can we actually train our model. I'll see you in the next lesson. Take care.