In this lesson we're going to be setting up our TensorFlow graph and our neural network. That means creating our tensors as well as setting up the layers in our neural network.

The first thing I'll do is create a markdown cell here that's going to read "Setup the TensorFlow Graph". Now, creating a tensor in TensorFlow is actually pretty straightforward. TensorFlow requires us to create some placeholder tensors ahead of time, so that it can figure out how everything is set up and how data should flow within its graph. Here's the API documentation for the TensorFlow placeholder. To create a tensor we essentially supply two things: a data type and a shape. Scrolling down, you can see an example of how to use this code. A tensor is created and stored inside this variable here; it's going to contain floating point numbers and it's going to be two-dimensional, of shape 1024 by 1024.

So let's do something similar. Let's set up our X and our y, our features and our labels, as placeholders. That's the first step. Capital X shall be equal to tf.placeholder. We're going to be working with floating point numbers, so we'll use tf.float32, and for the shape we'll say shape is equal to, in our case, None comma 784. Why is that? It's because we've got 784 features. Now what about this other dimension: why did I put None here? What I'm effectively doing is leaving this first dimension blank, and the reason is that this dimension will hold how many examples, how many samples, will be contained in the tensor. That will actually be determined a little later on, because when it comes to training our model we're going to be splitting our dataset up into batches, and by leaving this dimension blank at the point in time when we're creating the placeholder we can change the size of the batches as we see fit. We can use 1,000 samples per batch, or 2,000, or 5,000, or 10,000; it doesn't matter.
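As written so far, and assuming the TensorFlow 1.x API (under TensorFlow 2.x the same calls live in tf.compat.v1 with eager execution disabled), the features placeholder is a one-liner; the later sketches in this lesson build on this import:

    import tensorflow as tf  # TF 1.x style API assumed throughout this lesson

    # Features placeholder: None leaves the batch dimension open,
    # 784 is the number of features per (flattened 28x28) image
    X = tf.placeholder(tf.float32, shape=[None, 784])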
What I'm going to do next is change this 784 to a constant, so I'll go up to my constants and add a few more up there to make this a bit more explicit. We've discussed before that the image width is 28 pixels in this dataset, and we've also discussed how the image height is 28 pixels. Finally, all the images are grayscale, so we know that the number of channels is equal to one. What I've done in the CSV files that I've provided is flatten all the images: instead of providing you with an array shaped 28 by 28 by 1, I've flattened the structure to make the total number of inputs equal to the width times the height times the channels. Now that we've got this constant up here, we can come back down and replace that 784 with the total number of inputs. Brilliant.

So that's the placeholder for our features. What about the placeholder for our labels? We're going to create this one with Y equal to tf.placeholder, and for the data type we're also going to use float32. Now for the shape: it's going to be None, because this dimension will be determined by the size of our batch, and then 10, because this is the total number of classes that we have, the total number of categories that we're looking to predict. And I'm actually going to use a constant here as well, NR_CLASSES; you can see that I've already created this constant at the top, right here. Let me hit Shift+Enter on the cell.
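Pulling those pieces together, a minimal sketch of the constants and the two placeholders might look like this; the constant names below are my reading of what's spoken (TOTAL_INPUTS, NR_CLASSES and so on) rather than a guaranteed match for the notebook:

    # Dataset constants: 28x28 grayscale images, flattened in the provided CSV files
    IMG_WIDTH = 28
    IMG_HEIGHT = 28
    CHANNELS = 1
    TOTAL_INPUTS = IMG_WIDTH * IMG_HEIGHT * CHANNELS  # 784
    NR_CLASSES = 10

    # Placeholders: the None dimension is the (not yet decided) batch size
    X = tf.placeholder(tf.float32, shape=[None, TOTAL_INPUTS])
    Y = tf.placeholder(tf.float32, shape=[None, NR_CLASSES])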
Now we can move on to setting up our neural network. I'll add a little subheading here that's going to read "Neural Network Architecture". The very first thing that we're going to determine are the so-called hyperparameters. By hyperparameters I mean parameters that don't come out of training the model; a parameter that comes out of training the model would be something like the weights, right? The hyperparameters are things that we determine ahead of time: how long to train our model for (the number of epochs used for training), the learning rate used in our optimizer, the number of layers in our neural network, the number of nodes per layer. All of these things are hyperparameters.

So let's add a few here. The first one I'm going to add is the number of epochs, and I'll set that to maybe five to start with. The second thing I'm going to add is the learning rate, learning_rate, and I'll set that equal to a very small number: 0.0001. By the way, if you wanted to write this in scientific notation you can as well: 1e-4 is the very same number, so I could write it like that too. The next thing I'm going to specify are the number of neurons and the number of layers. I tell you what, let's have two hidden layers. The first hidden layer will have 512 neurons, so n_hidden1 is going to be equal to 512, and the second hidden layer, n_hidden2, is going to be equal to 64. We're not going to make it too complicated for now. So: two hidden layers, the first with 512 neurons, the second with 64 neurons, and the output layer, remember, is only going to have 10. Let me hit Shift+Enter on the cell, delete this one here, and add a few more cells.

So what's next? We've initialized our hyperparameters, we've given them some starting values, and we've created our placeholder tensors for our features and our labels.
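The hyperparameter cell, as just described, might be sketched like this; the names follow what's spoken in the lesson:

    # Hyperparameters: chosen ahead of time, not learned during training
    nr_epochs = 5
    learning_rate = 1e-4  # same value as 0.0001, in scientific notation
    n_hidden1 = 512       # neurons in the first hidden layer
    n_hidden2 = 64        # neurons in the second hidden layer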
The next thing we're going to do is give some starting values to our weights and our biases for our neural network. All the connection weights in the neural network need some sort of initial value, and what we're going to do is give every single weight in the network a small random value as a starting point. TensorFlow actually has a really nice way of generating these random values for us, and the way it does it is by picking the values from a distribution. So I'm going to store all of them in a variable called initial_w1, for the initial weights of the first hidden layer, and I'm going to set that equal to tf.truncated_normal. It's going to be a normal distribution, but a truncated one, meaning that no extreme values will be generated on either tail of the distribution, to the extreme left or the extreme right.

Now, this function has to know how many values it needs to generate. What's the shape of this first layer? The shape is determined by two things: the number of inputs and the number of neurons in the layer. We said that the number of inputs was 784 and the number of neurons in this layer was 512, and we don't actually have to hard-code these numbers here; we can just write TOTAL_INPUTS and n_hidden1 in their place. So now this function knows how many weights to generate in total, but it doesn't yet know how to pick them out: should they be far apart from one another or rather close together? For that it needs to know a little bit about the standard deviation, and we can add that with the stddev argument, set to a small value like 0.1. And because everything is randomized and you might want to replicate my results, we can set the seed value equal to 42; that way you'll be drawing the same random weights every single time. So those are our weights. Let me hit Shift+Enter on the cell.

Now, you might think you could just take a look at the values that were generated here, and you'd be disappointed to find that you can't actually do that just yet. The reason is that TensorFlow has a two-stage approach: the first stage is all setup, and it's only in the second stage that the calculations are actually done. As long as we don't tell TensorFlow to evaluate any of these tensors and run all the calculations, we don't get to see the values that are generated. This is one of those things to wrap your head around with TensorFlow, and we're going to be talking about it a whole lot more in a bit.

So, back to the setup. We've created our little random values here as the initial values for the weights, but what we should actually do with these values is create something called a Variable, a TensorFlow Variable, and that Variable will hold onto all the weights in the first hidden layer. I'll call it w1, and I'll set it equal to tf.Variable, with a capital V, open parentheses, initial_value equal to our initial_w1. In this line of code we're actually creating the weights. The weights, remember, are more than just the initial values: they have to persist and be updated as all the calculations in TensorFlow are run. This is why we create them as a Variable.
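A sketch of that weight setup, assuming the TF 1.x calls named above (tf.truncated_normal and tf.Variable):

    # Small random starting values for the first hidden layer's weights
    initial_w1 = tf.truncated_normal(shape=[TOTAL_INPUTS, n_hidden1],
                                     stddev=0.1, seed=42)
    # The Variable persists and gets updated during training;
    # the truncated normal above only supplies the starting values
    w1 = tf.Variable(initial_value=initial_w1)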
Now let's tackle the biases. Remember how we said that changing the weights is like stretching or compressing the activation function in a neuron? The bias works together with the weight, and the bias is what shifts the activation function from left to right, by adding or subtracting a number. So what we're going to do next is initialize the biases, and the way we'll do this is to get all the initial biases for that first hidden layer with initial_b1, which we'll set equal to, not tf.truncated_normal, but tf.constant. All of our biases are going to start out with the same value, and that value is simply zero. The only other argument we have to supply to this function is how many initial values we need, so once again we supply a shape, and that's going to be equal to 512, or n_hidden1. To create our biases we then do something very similar to what we did with the weights: we create a TensorFlow Variable, and this Variable just has to know what the initial value of the biases is, so initial_value is equal to initial_b1.

All of this is still part of the setup just for the first layer, and the biases and the weights for this layer will be updated during the training process. This is where the network does its learning. These are the values that feed into the activation functions of the neurons and represent the strength of the connections between the different units.
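The bias setup for the first hidden layer, sketched with the same TF 1.x API:

    # Every bias in the first hidden layer starts out at zero
    initial_b1 = tf.constant(value=0.0, shape=[n_hidden1])
    b1 = tf.Variable(initial_value=initial_b1)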
Now what we need to do is think back to that slide where we had the formula. We need to determine how the weights and the biases work together to form the inputs into this hidden layer; we discussed this in the previous lesson, on this slide right here. We know that the first step is to multiply whatever comes into this green layer by the weights. I'm going to store the inputs that come into our hidden layer in a variable called layer1_in, and I'm going to set that equal to the result of this multiplication. The way we can multiply our tensors together is with this function right here from TensorFlow; it gets used so often that there's even a nice short alias for it. We supply two things, an a and a b, and the function multiplies matrix a by matrix b, producing a times b. No surprises there. So let's call this function in our notebook: tf.matmul. Now we have to figure out what it is that we're multiplying. Given that this is our first hidden layer, we're going to multiply the placeholder for our input features, the tensor that we created, by our weights, so w1. X in this case represents our raw inputs; we're still on the very first layer, so these are the raw inputs of our feature vector. But remember, what actually reaches the neurons is the inputs multiplied by the weights plus the bias, so plus b1. The result of this calculation is what feeds into the activation function, and that's the next step.

Now let's complete the whole puzzle by working out the output from our hidden layer. I'm going to call this layer1_out, and it will be equal to the output of the activation function from all the neurons. In the previous module we used the ReLU activation function with Keras, so let's stick with ReLU this time around as well. Looking at the documentation, it really only requires one input, namely the features. What are the features? Well, they're the inputs to the layer, right? So tf.nn.relu of layer1_in will calculate the output from our first hidden layer.

We've done a lot of work just now. We've initialized the weights for the first hidden layer, we've initialized the biases for the first hidden layer, and we've figured out what the features were that were going into the first hidden layer, namely the weighted sum of the inputs plus the bias. The output of this layer is the result of the calculation that happens inside the activation function. Now I've got a challenge for you. I'd like you to complete the code to set up the second hidden layer. Remember, this layer has 64 neurons, and it will depend on the output of the first hidden layer. I'd also like you to set up the output layer, and here the trick is that the activation function will be the softmax function. This challenge is a little bit more tricky, and you have to get a couple of things right to solve it, but pause the video and give this a go.
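For reference while you try the challenge, here's the first hidden layer as we've built it so far, sketched in the same style:

    # Inputs into the first hidden layer: weighted inputs plus the bias
    layer1_in = tf.matmul(X, w1) + b1
    # Output of the first hidden layer: ReLU applied element-wise
    layer1_out = tf.nn.relu(layer1_in)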
Ready? Here's the solution. The first thing we need to do for that second hidden layer is to initialize the weights and the biases, so I'll copy this line and paste it in the cell below. What I want to do is change my variable names, because these are not going to be the same weights or biases as for the first hidden layer, and I'll have to do this everywhere, right? So I've got my initial weights here, I've got my initial weights here, I've got a separate variable w2, the values for my initial biases here and the Variable for my biases here.

Now, the first trick is actually setting up the weights and the biases for the second layer correctly, because the thing that's different between the first layer and the second layer is the shape. The number of inputs that the second hidden layer gets is not equal to 784; it's actually equal to the number of neurons in the first hidden layer. So instead of 784 here we'll have n_hidden1, and the total number of neurons in the second hidden layer is not n_hidden1 but n_hidden2. If this is the shape that you provided for those initial weight values, well done. Now what about the biases? In this case we've got 64 neurons, each with its own bias, so the shape here should be n_hidden2.

The inputs for the second layer will then use these values, right? So layer2_in won't be equal to the multiplication of X and w1; no, because that second hidden layer follows the first hidden layer, it'll be the output of that first hidden layer, layer1_out, and then it won't be w1 but w2, and of course it'll also be the biases for that second hidden layer. So in this case it's the output of the first hidden layer multiplied by the weights of the second hidden layer, plus the biases of the second hidden layer; that is what forms the inputs into the second hidden layer. Now what about the output? Well, in this case it'll be quite similar: we'll have layer2_out, we're going to stick with the same ReLU activation function, and it'll just take layer2_in. So now we've got the output for that second hidden layer.
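A sketch of that second hidden layer, following the same pattern as the first; carrying the stddev and seed over from the copied cell is an assumption on my part:

    # Second hidden layer: its input size is the first layer's neuron count
    initial_w2 = tf.truncated_normal(shape=[n_hidden1, n_hidden2],
                                     stddev=0.1, seed=42)
    w2 = tf.Variable(initial_value=initial_w2)

    initial_b2 = tf.constant(value=0.0, shape=[n_hidden2])
    b2 = tf.Variable(initial_value=initial_b2)

    # Feeds on the previous layer's output rather than on X
    layer2_in = tf.matmul(layer1_out, w2) + b2
    layer2_out = tf.nn.relu(layer2_in)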
What about that final layer, though, the output layer? Well, in this case we have to update everything for the output layer. Starting at the top, we've got to change the weights, so I'll go with w3 here, and w3 here and here. The shape in this case will be 64 by 10: we've got 64 neurons in that second hidden layer, so n_hidden2, and we've got 10 neurons in that output layer, or the number of classes. As with the weights, the biases also get a shape determined by the number of outputs, the number of categories that we want to predict, and I'll update the names of course as well: initial_b3 and, for the Variable, b3.

OK, so now we've got the weights and the biases for that output layer. What's coming into the output layer? Well, it's going to be layer3_in, equal to a multiplication: in this case it's the output of layer number two, because layer number two feeds into layer number three, multiplied by the weights that connect those two layers, and of course we add the biases that are specific to layer number three, the output layer. Finally, our output as a whole is equal to whatever comes out of the activation function in our output layer, but in this case it's not going to be ReLU; it's going to be softmax, so that we get a nice probability associated with each of the outputs. If you weren't sure how to find the softmax function in TensorFlow, then if you google for it you actually get directed to this page right here, the API reference for TensorFlow, and there you see that the softmax function also really only requires a single input, which here has the name "logits". Effectively, what this function needs in order to calculate the probabilities is, of course, the weighted inputs to the output layer.

So I hope that didn't throw you off. This was definitely one of the harder in-video challenges I've asked you to complete, because it really builds on understanding the code that we wrote a minute ago plus understanding the concepts that we've discussed over the last two modules. But don't worry if you didn't quite get it right; there are still plenty of opportunities throughout this module to solidify your understanding and see how this works. It'll become a lot clearer once we actually run our code, train our model, and you get to see these things in action.
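And the output layer from the solution, sketched under the same assumptions, with softmax turning the logits into class probabilities:

    # Output layer: one neuron per class
    initial_w3 = tf.truncated_normal(shape=[n_hidden2, NR_CLASSES],
                                     stddev=0.1, seed=42)
    w3 = tf.Variable(initial_value=initial_w3)

    initial_b3 = tf.constant(value=0.0, shape=[NR_CLASSES])
    b3 = tf.Variable(initial_value=initial_b3)

    # The weighted inputs (logits) go through softmax to become probabilities
    layer3_in = tf.matmul(layer2_out, w3) + b3
    output = tf.nn.softmax(layer3_in)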
In the next lesson we're going to continue with our setup. We're going to hammer out three important points: namely, the loss function that we want to use, the optimization that we want TensorFlow to do, and the metrics, like the accuracy, that we want TensorFlow to calculate along the way. Only after we've done all that can we actually train our model. I'll see you in the next lesson. Take care.