All right. So in this lesson what we're gonna do is vary the hyperparameters and experiment with a dropout layer. We're also going to look at a new summary in TensorBoard, namely the histogram summary. All of this will give us a chance to run our model for a different number of epochs, so we can see how our accuracy changes as we continue training our model. We can also vary the learning rate and we can change the architecture. So let's get on it.

The very first thing I'm going to do is go up to the function where I'm creating all my layers. In this function I'm going to add two more summaries, because I'm very much interested in what happens to my weights and my biases as my model continues training. In order to see what happens with our weights and our biases, we're going to use the histogram summary: tf.summary.histogram, open parentheses, then 'weights' in single quotes as the name for the summary, and then the values are going to be my lowercase w. I want to be tracking my variables for all my layers in this histogram summary, and then I'm also going to track my biases. So once again, this is gonna be a histogram summary; I'm going to call this one 'biases' and the value is going to be lowercase b.

The next thing I want to do is scroll down a little bit. Since this here is actually where we're setting up our model, I'm going to create another variable here called model_name, and this one is gonna be an f-string. In this model name I'm going to be very descriptive: I'm gonna put down the architecture of this model. So n_hidden1 in curly braces, a hyphen, then n_hidden2 in curly braces; then LR for the learning rate, followed by learning_rate in curly braces (that's this one right here); and at the end maybe a capital E followed by nr_epochs in curly braces. That way we'd have a very specific model name that includes the number of neurons, the learning rate and the number of epochs, capturing some of these hyperparameters in the model name.

What I'll do next is, where I'm setting up the directories for TensorBoard, I'm actually gonna change this from 'Model 1' to curly braces with model_name inside. So now when we're setting up our directories, we get the model name in the directory. This means the model name is gonna show up here, and it's going to show up on the runs on our graph.
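As a rough sketch, here is what that layer function with the two new histogram summaries and the descriptive run name might look like. The helper name setup_layer, the variable names (w, b, n_hidden1, n_hidden2, learning_rate, nr_epochs), the initializer details and the log folder are my assumptions based on this walkthrough, not the notebook's exact code:

```python
import tensorflow as tf  # TF 1.x style API, as used in this lesson

def setup_layer(input_tensor, weight_dim, bias_dim, name):
    """Fully connected layer that also logs histograms of its weights and biases."""
    with tf.name_scope(name):
        w = tf.Variable(tf.truncated_normal(shape=weight_dim, stddev=0.1), name='W')
        b = tf.Variable(tf.constant(0.0, shape=bias_dim), name='B')
        layer_out = tf.nn.relu(tf.matmul(input_tensor, w) + b)

        # Track how the weights and biases evolve over the course of training
        tf.summary.histogram('weights', w)
        tf.summary.histogram('biases', b)

        return layer_out

# A descriptive run name that captures the architecture and hyperparameters
n_hidden1, n_hidden2 = 512, 64
learning_rate = 1e-4
nr_epochs = 50
model_name = f'{n_hidden1}-{n_hidden2}_LR{learning_rate}_E{nr_epochs}'

# The run name becomes part of the TensorBoard log directories,
# so each run shows up under its own name in TensorBoard
train_writer = tf.summary.FileWriter(f'MNIST/logs/{model_name}/train')
validation_writer = tf.summary.FileWriter(f'MNIST/logs/{model_name}/validation')
```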
Now, we've done quite a few runs already, so what I'm gonna do is delete all the folders inside my MNIST digit logs directory. That way, when I re-run my cells, I have a blank slate. The next thing I'm going to do is change the number of epochs from five to 50. So we're gonna do a whole lot more training, and we're going to be able to see how our loss and our accuracy evolve over time as we continue training our model. So let's get on it: I'll select this cell here and then say Run All Below. And now we play the waiting game. As you can see, training for 50 epochs takes a lot more time than just training for five. If you're running all of this in the Google Colab notebook, then this is gonna run a lot quicker, but you'll have the disadvantage of not being able to follow along on TensorBoard so easily.

All right, so let me refresh TensorBoard. I get a little error message here, but that's not a problem; the reason is that we're still in the process of training, so I think it's a bit confused. Since we've already talked about our graph and we've talked about the images, I'm going to look at my scalars. Here I have my two metrics: my accuracy metric and my cost (this is my loss, remember), grouped in the same section. The section is called performance, because performance is the name scope for both of these summaries.

I'm going to enlarge my chart here. In orange I've got the accuracy on the training dataset, and in blue I've got the accuracy on the validation dataset. What I can see is that most of the learning takes place in the first ten epochs, and then it takes longer and longer and longer to get that extra little bit of accuracy. So going from 86 to 87 percent accuracy takes a lot longer than going from, say, 60 to 70 percent accuracy. In other words, this is the classic shape of a chart when you've got diminishing returns.

Now let's scroll down to our losses. In the last module we talked about how it was a problem when our validation loss stopped going down: when the validation loss starts to stabilize and even starts going up, then we're overfitting our model. In this case we can see that it's still going down at epoch number 50, but not by much.
Now, what about these other tabs that showed up, distributions and histograms? The reason we get those two new tabs is because we've added the histogram summary. These two lines of code are now going to show us a histogram and a distribution for our weights and our biases over time. Check it out.

Let's start out looking at the histograms. What I'm going to do is take just our training run here, and then I'm going to make these charts a little bigger, and bigger, and bigger. I'll expand the output layer here, and the hidden layer before it, and make that one bigger too. Now let me scroll to the very top and take a look at what this is actually showing us, because it's very pretty, but what does it mean?

Well, what it's showing us is a distribution: a distribution of our biases for the first hidden layer over time. Now, if you remember, all our biases started out at zero, and by the end of the first training run our biases got a little update. Then 183 of them were at about 0.00505, and about 63 of them were around -0.00399. So in the very back here we've got the oldest histogram, at the end of epoch number zero: our biases updated, and some of them got smaller and some of them got larger. Now, for hidden layer number one there were 512 different biases, so what we get in this picture is an idea of how the distribution of these 512 different biases changed over time.

On the y axis you can see the steps. We've recorded a summary in every single epoch in our for loop, so here we're moving forward in time until we get to the most recent epoch; this was the one right at the end of the training. Here we see that some bias values got quite large and some values for the biases got quite small. The reason why these little steps correspond to one epoch is because this is when we decided to add our summary and tell our file writer to write to our disk. Had we structured our for loop a little differently, then we might have written a summary every two epochs or every five epochs. So the way to think about what you're seeing on this axis is the step size: how often during the training you're writing to your file.
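To make that concrete, here is a rough sketch of why each slice corresponds to one epoch: the merged summary is evaluated and written to disk once per pass of the training loop. The names merged_summary, train_writer, sess, and the placeholders and batch variables in the feed dictionary are assumptions, not the notebook's exact code:

```python
# All registered summaries (scalars, images, histograms) merged into a single op
merged_summary = tf.summary.merge_all()

for epoch in range(nr_epochs):
    # ... run the optimizer over all mini-batches for this epoch ...

    # Evaluate the summaries and write them to disk ONCE per epoch,
    # so each histogram slice in TensorBoard corresponds to one step = one epoch
    summary = sess.run(merged_summary, feed_dict={X: x_batch, Y: y_batch})
    train_writer.add_summary(summary, epoch)

    # Writing every 5 epochs instead would simply make the step axis coarser:
    # if epoch % 5 == 0:
    #     train_writer.add_summary(summary, epoch)
```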
So in our first layer, our biases changed quite a bit during the course of the training. What about our weights? Now, this is quite interesting, right? Our weights started out as a truncated normal distribution, so they started out as a kind of bell curve without the tails on either end. And what we see is that by the end of the training, all the weights still seem to more or less fall into this truncated distribution; they still seem to have the same distribution as when they started. So to me, that suggests that all the learning that's been taking place in that first hidden layer actually happened in the biases and not in the weights.

What about the second hidden layer? This is quite interesting. The biases changed dramatically: they all started out at zero, we had 64 of them to begin with, and by the end they had this kind of distribution, with some over here quite negative and some over here quite positive. Once again, the weights in the second hidden layer started out with a truncated normal distribution, and by the end there hasn't been that much of a change.

What about the output layer, though? Here we only had 10 biases, and these, yeah, they moved around a little bit. Since there's only 10 values here, this visualization looks very, very jagged. It tends to be a little bit more realistic when you've got a lot of values, but for 10 you kind of get this mountain-landscape look. But regardless, what we do see is that there was a change from the beginning of the training towards the end of the training, and the same can actually be said for the weights: as you can see by the change in the shape here, there was some change to the weights in the output layer.

Now, TensorBoard actually gives us so many different ways of looking at the same kind of data. For example, we can change the mode here: you can change the histogram mode from offset to overlay. When we do that, we rotate the whole thing by 45 degrees and we're looking at it from the side. So this is another way of looking at how things change over time, and mousing over it we can see the different steps, the different histograms, as each curve gets highlighted. All the slices are no longer spread out over time; instead, they're all plotted on the same y axis. So if you prefer looking at the data in this way, then overlay is your friend.
But let's take a look at this other tab here as well. This is quite interesting, because what it shows us is the same data as before. I'm going to look at only the training data this time and take a little look at what it is that we're looking at here. This one is called distributions, and what we're actually looking at are percentiles. In the middle of this distribution we've got the median value, on the top we've got the maximum value, very lightly colored, and at the bottom we've got the minimum value, also very lightly colored. On the x axis here we have the steps: at step number zero we were here, and at step number 50 we were over here. The information on the distributions tab is essentially some high-level statistics; each line in the chart represents a percentile in the distribution of the data.

So what I'm seeing on this distribution of the weights in the first hidden layer is that nothing much happened in the middle of my distribution, right? All the weights that were between 0 and 0.1, nothing really changed for them between starting out and finishing training. But I can see that there was some change at the extremes: the smallest weights, which started out at -0.2, did change; they got more negative, and the largest weights, at positive 0.2, got larger. This was something that was quite difficult to see on the histogram, but if you look very closely you can see that the tails of the distribution got a bit longer. Right? Initially the truncated distribution was very steep and ended here, but then it fanned out a bit by step 49. In other words, the changes in that first hidden layer were happening on the edges.

So, in summary, the histograms tab and the distributions tab give you a chance to see what's happening with your weights and what's happening with your biases over time. And when I say time, I mean over the course of the training. Is your network learning? Where is it learning? As such, TensorBoard really acts as a flashlight to help you better understand the goings-on of your neural network.
So what I want to do now is make a change to my hyperparameters and then see how the histograms, the distributions and my scalars change in response. The hyperparameter that I'm going to change is my learning rate: I'm going to make it 10 times larger than what it is currently. I'm going to change it from 1e-4, which is 0.0001, to 1e-3, which is 0.001. So without further ado, let's change this parameter here, and then I'm going to click here and say Run All Below.

So let's run this thing again for 50 epochs and then take a look back in TensorBoard. Let me refresh the page to kick-start this thing. We've got a higher learning rate now, and what we can see is that we're approaching a much higher accuracy much faster. So this is quite interesting, right? Using a higher learning rate, we've been able to train our model a lot quicker, in fewer epochs, than with the lower learning rate. That's a massive improvement in performance just from changing this one hyperparameter. Check out the performance in blue here on the validation dataset with 1e-4 versus 1e-3. If we look at the loss, we had a massive decrease by epoch number five, and then it slowly started to stabilize, decreasing at a much, much lower rate. So the question is really: will it start overfitting by the end? Maybe we'd have to change our model and add some regularization or a dropout layer to prevent it from overfitting.

Okay, so I'm done training. My accuracy on my training dataset is very, very high, but that's not surprising; the key is the accuracy on the validation dataset. Looking at the performance, this seems very promising: towards the end of training we've got an incredibly high accuracy, around 98 percent, on the validation dataset. And looking at our loss, it doesn't seem to be going up. It does seem to have stabilized; it doesn't seem to be decreasing that much anymore as we're continuing to train, but it doesn't seem to have started overfitting yet.
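For reference, a sketch of the one-line change behind this second run. The constant itself is what the lesson describes; the Adam optimizer and the loss variable name are assumptions on my part, not necessarily what the notebook uses:

```python
# First run:
# learning_rate = 1e-4   # 0.0001
# Second run, ten times larger:
learning_rate = 1e-3     # 0.001

# Re-running the cells rebuilds the optimizer with the new rate
# (Adam and the name `loss` are assumptions for this sketch)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_step = optimizer.minimize(loss)
```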
In particular, what I want to pull up side by side are my histograms for that first run, with that really small learning rate, and my histograms for my second run, with the learning rate that was ten times larger. So here are my biases, and here are my weights, and what I can see is that the distribution changed massively between the two runs. If you remember, on our first run, by the end of the training our distribution had a much wider tail; but on the second run, the change in the weights, even in the very first layer, was much more dramatic. Where previously the weights weren't really changing all that much with that lower learning rate, now the whole distribution is shifting from epoch to epoch. And the same is true for the output layer.

So previously, when the learning rate was low, most of the adjustments seemed to have been made in the output layer, and then fewer and fewer adjustments happened in the second layer and then the first hidden layer. This time around there were also big changes in the output layer: the distribution of these weights started out like this, and by the end we had something more like this. More of the learning seems to have filtered through to the second hidden layer and the very first hidden layer when the learning rate was higher.

But that's just the weight side, of course. In both cases the biases changed dramatically. So even when the learning rate was low, the biases changed quite a lot in the output layer, the first hidden layer and the second hidden layer, and we see the same pattern emerge with the higher learning rate: the biases also changed quite a lot, which is a good indication that our model is learning.

Having looked at the histograms side by side for the two runs allowed us to see the differences between the beginning and the end for the first run and the second run. But we can also look at the change in the percentiles for the distributions. The smallest changes occurred in that first hidden layer: there we had bigger changes in the biases and smaller changes in the weights. With the higher learning rate, we still had quite big changes in the biases and smaller changes in the weights, but comparing the distributions and the percentiles between the second run and the first run, we can see that in the second run we've got much more of a spread. The same is true for the second hidden layer. Notice, for instance, the scale here on the y axis: for the first run we're looking at values between 0.3 and -0.3.
And on that second run we're looking at values between -0.6 and 0.6, almost double the range.

In the previous module I showed you how to use a dropout layer in Keras. Now I want to show you how to include a dropout layer using just TensorFlow. This is more for illustration than necessity; I don't think this model is overfitting, but I would like to cover this nonetheless, because you're going to need it in your future projects. I'll copy-paste this cell here, and I'm going to comment out this line of code here. So this was our model without dropout, and here we're going to include a dropout layer, and we're going to include that dropout layer after our first hidden layer.

I'm going to change my model name to mark that it includes dropout. Then what I'm going to do is add a dropout layer here; I'll call this layer_drop. The way I'm going to add this dropout layer is using tf.nn and then .dropout. This will create the dropout layer for me. All it needs is an x and something called a keep probability. The keep probability is the probability that an element is kept, so if I wanted 20 percent of the neurons in the previous layer to drop out, then I would set my keep probability at 80 percent.

So let's try it out: tf.nn.dropout, open parentheses, and then it needs the input. The input will be the previous layer, so layer_1. Then we need to specify the keep probability, and let's set this to 0.8. And finally, let's give this whole thing a name: name is equal to 'dropout_layer'. But since we added this dropout layer, we have to change the input for layer number two. In this case the input is no longer layer_1 but this dropout layer in between. So it's not really a real layer, but it is what forms the input to that second hidden layer.
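Pieced together from this walkthrough, the modified model setup might look roughly like this. The tf.nn.dropout call with keep_prob=0.8 and the name 'dropout_layer' follow what's described above; the setup_layer helper, the input placeholder X, the layer dimensions and the 'DO' tag in the run name are assumptions carried over from my earlier sketches:

```python
# Run name now flags that this model includes dropout (the 'DO' tag is an assumption)
model_name = f'{n_hidden1}-DO-{n_hidden2}_LR{learning_rate}_E{nr_epochs}'

layer_1 = setup_layer(X, weight_dim=[784, n_hidden1],
                      bias_dim=[n_hidden1], name='layer_1')

# Dropout after the first hidden layer: keep_prob is the probability an element
# is KEPT, so dropping roughly 20% of layer_1's activations means keep_prob = 0.8
layer_drop = tf.nn.dropout(layer_1, keep_prob=0.8, name='dropout_layer')

# The second hidden layer now takes the dropout layer (not layer_1) as its input
layer_2 = setup_layer(layer_drop, weight_dim=[n_hidden1, n_hidden2],
                      bias_dim=[n_hidden2], name='layer_2')

# ... output layer unchanged, still fed by layer_2 ...
```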
All right, so my training has finished and I've refreshed TensorBoard. In green we have the validation run with dropout, and in blue we have the validation run without dropout. What we see is kind of as expected: when you include dropout, the learning takes place a little bit more slowly. That's why the performance of the model with dropout is a bit below the previous one. But it does catch up, and towards the end both are pretty much identical. In terms of the loss, we didn't really have a problem with overfitting previously, and we most certainly don't have one now.

So that concludes this lesson. We've explored TensorBoard extensively. We've looked at a lot of different summaries, from scalars to images to our graph to the histograms and the distributions, and we were able to see what was happening as we were training our model. We were able to see our loss decrease and change over time. We were able to see our accuracy increase over time. We were able to see how much of a difference the learning rate can make, and we were also able to see how our biases and our weights changed over time as we were training our model. The changes were most significant in the output layer, which had the fewest neurons, and then the changes got smaller and smaller as they propagated down the network to that first hidden layer, which had 512 neurons. Initially it seemed that the biases were doing most of the adjustment, but when we cranked up the learning rate, we could see that there were also bigger changes happening with the weights.

In the next lesson we're gonna be evaluating our model on our testing dataset, and we're also going to make a prediction on a single image. For all that and more, I'll see you in the next lesson.