In this lesson we will train our model. We will fit the model to our data and start to update the weights in our neural network. So let's add a markdown cell in our Jupyter notebook and commemorate this. This markdown cell will read "Fit the model".

But how do we actually go about doing this? Well, the first place to check is the Keras API documentation. The relevant section is called "Model (functional API)". If we scroll on here we see the compile method, but we also see this fit method, and the fit method can take a number of different arguments. So let's call this fit method and supply our training data for the x and our labels for the y. This is where our really tiny datasets will come in handy, because we're going to use these when we're iterating and trying things out. So let's write model_1.fit, parentheses, x_train_xs (XS for extra small) and y_train_xs.

Now, at the very top of the cell, before I hit Shift+Enter, I'm going to add some micro-benchmarking code: two percent signs and the word time (%%time) will tell us how long the Jupyter notebook takes to execute this cell. Now let me hit Shift+Enter on the cell, and what I see is that it took about two seconds to execute. But how did that go? Did we do the job? Well, at this point I have no idea.

So this is where we're going to use TensorBoard. TensorBoard will help us figure out what's going on behind the scenes. Now the question is: how do we get the information from our model into TensorBoard? Well, let's look back at the documentation, and here we see this parameter called callbacks, which by default is set to None. But we've seen this word callbacks somewhere before, right? I recall that in our import statements, TensorBoard was actually a callback. So this means that we can actually use TensorBoard as a callback. Now is the time that we get to use the get_tensorboard function that we created earlier. Let's add that callbacks argument to our fit method now. So I'll add a comma and write callbacks equal to, then square brackets, get_tensorboard, open parentheses, and I'll supply my model name. I'll call this one 'Model 1'. Why do I have the square brackets here?
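For reference, the cell looks roughly like this at this point. This is a sketch: the names x_train_xs and y_train_xs, and the get_tensorboard helper, come from earlier lessons, and get_tensorboard is assumed to return a Keras TensorBoard callback that logs into a run-specific folder.

```python
%%time
# Fit the model on the extra-small training set and log the run to TensorBoard.
# get_tensorboard() is the helper written earlier in the course; it is assumed
# to return a keras.callbacks.TensorBoard instance pointed at a fresh log folder.
model_1.fit(x_train_xs, y_train_xs,
            callbacks=[get_tensorboard('Model 1')])
```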
Well, because callbacks is expected to contain a list, and we can see this if we go to the documentation and scroll down to where it says callbacks. Here the Keras documentation very clearly states that a list of callbacks is expected. So let's head back to our Jupyter notebook and hit Shift+Enter on the cell, and our output now reads "successfully created directory", which we can see right here. My training time this time was around 1.2 seconds, but I don't see TensorBoard anywhere. So how do we get hold of it?

Well, I actually have to open a terminal, or the Windows command prompt, again at this point. So I open up another window here (New Window), and then I write tensorboard and, as an argument, I supply my logging directory: --logdir equals, well, this folder right here. This is the logging directory that we've created. Since I'm on a Mac, all I have to do to get the path to this folder is drag and drop it into my terminal, and there I see my path: Users, then my projects folder, then tensorboard_cifar_logs. Let me hit enter on this right now, and I'll show you how to get this path very quickly on Windows in a minute.

Once my command executes successfully, what I get as a result is this URL here. I can copy this URL, go back into my browser, paste it in, and this is where TensorBoard lives. So here's how you can get the path very quickly on Windows: go to your tensorboard_cifar_logs folder, open it, and up here you see the address bar. Click on it and you see the path for this folder, which you can copy. Then, when you open your Anaconda prompt, all you need to do is type tensorboard, space, hyphen hyphen logdir, equals, and then right-click to paste. Right-click to paste is the trick.
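So the command looks roughly like this. The folder name tensorboard_cifar_logs comes from the earlier setup in this course; substitute the path to your own logging directory:

```
tensorboard --logdir=/Users/<your-name>/<your-projects-folder>/tensorboard_cifar_logs
```

TensorBoard then prints a URL (typically something like http://localhost:6006) that you can open in your browser.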
This is not exactly what we expect. So let's dig into the documentation and see why that might be. When we look at our fit method, we see two additional parameters: batch_size equals None, and epochs equals 1. So what does that mean? Let's tackle the epoch first. What's an epoch? Well, an epoch is when the entire dataset has been passed through the neural network a single time. Considering that the default value for epochs is 1, it also makes sense to see only a single data point on our TensorBoard.

Now, passing the entire dataset through the model a single time almost sounds like it should be enough, right? Well, unfortunately the answer is no. As we saw in the gradient descent module, the optimization process really is an iterative process. Meaning, yes, the weights were updated that one time that we ran the fit method, but a single pass is actually not enough. We have to pass the entire dataset through the network again and again and again for the neural network to differentiate amongst our images.

But wait, what if we have lots and lots of training data? What if we have a huge dataset? Well, here the practicalities of training a neural network set in. If your computer is powerful enough to handle your entire dataset, or your dataset is small enough, then yes, you can probably process everything at once. But usually what you have to do is split up your dataset and process one piece of it at a time. A single piece of your dataset is called a batch. And if you're splitting up your dataset like this, then you're training your model on one batch at a time. So that means it will take multiple iterations to actually go through the entire dataset.

The formula for the number of iterations that it takes to chew through the entire training data is actually pretty straightforward: all you need to do is divide the number of training samples by the number of samples in the batch. That's how you can work out how many iterations it takes to feed the entire dataset through the model a single time. In other words, if you have 100 data points in total and your batch size is equal to 50, then you would need two iterations to go through the entire dataset. It would take you two iterations to complete a single epoch.
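As a quick sanity check, you can work this out in a couple of lines of Python. This is just a sketch of the arithmetic; note that in practice the last batch may be smaller, so you round up:

```python
import math

n_samples = 100   # total number of training data points
batch_size = 50   # samples per batch

# Number of iterations (batches) needed to complete one epoch.
iterations_per_epoch = math.ceil(n_samples / batch_size)
print(iterations_per_epoch)  # -> 2
```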
So the number of iterations is just the number of batches needed to complete one epoch. With this vocabulary out of the way, let's head back into Jupyter and change both the batch size and the number of epochs that we're using to train our model. So what I'll do is add another cell here and create a variable called samples_per_batch, and I'll set that equal to 1000. Now, we've got some micro-benchmarking code in the cell below, where we're fitting our model, and what I encourage you to do is try out different batch sizes and see how long it takes the cell to execute.

Now let's set the number of epochs equal to 20. Let's train our model for 20 epochs. I'll create another variable in this cell that reads nr_epochs, and I'll set that equal to 20. Then, inside our fit method, I'm going to add another comma here before callbacks, and here I'll add batch_size, which I'll set equal to samples_per_batch. My autocomplete isn't working because I haven't actually hit Shift+Enter on the cell. Now I can also add my epochs, whose parameter name is, well, epochs, and that I can set equal to nr_epochs. Now this single line is getting very, very long, so what I'll do is put my callbacks on the line below. Now let me hit Shift+Enter on the cell, and we can see our model being trained throughout the 20 epochs.

Now, as part of the output, you can see the total amount of training data, you can see the loss that was calculated, and you can see the in-sample accuracy per batch. One thing that you'll notice is that in this case our model isn't learning anything; it's just guessing randomly. It's getting about 12 percent right. Meaning, if it has 10 things to classify and it's getting about 10 percent accuracy, it's just randomly guessing. Now, we would have hoped to see some learning happen by epoch number 20. The fact that we haven't means there's a bit of a problem. We could have this problem for a number of reasons, but one possible reason is that maybe we have a bad starting point. Maybe our optimizer is stuck and it cannot minimize this loss any further. So one thing we can try is to go up to where we're defining our model and recompile it. So: just hitting Shift+Enter on that cell, coming back down here, and rerunning our fit method.
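Here is roughly what the two cells look like now. Again a sketch, using the variable names from this lesson:

```python
# In a separate cell: the training configuration.
samples_per_batch = 1000  # try different values and compare the %%time output
nr_epochs = 20            # pass the full dataset through the network 20 times
```

```python
%%time
model_1.fit(x_train_xs, y_train_xs,
            batch_size=samples_per_batch,
            epochs=nr_epochs,
            callbacks=[get_tensorboard('Model 1')])
```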
What do we see now? In this case we see that some things are happening. As we scroll down, our model does indeed appear to be learning. It started out at about 10 percent in epoch number two, and by epoch number 20 it had improved this to about 16 percent.

Now here's the big advantage of TensorBoard: it makes this a lot more clear. If we refresh our page, it's going to redraw all our graphs, and what we see here now, if we enlarge this slightly, is our three runs, right? Our very first run was just one epoch. In our second run, our model did not learn anything; the training accuracy stayed constant. In our third run, we can see how the training accuracy evolved over time, and what we see is this improvement from around 10 percent to around 16 percent.

The way that TensorBoard works is that it reads files from our disk, these so-called event files. This is where it pulls the data from to draw these charts. Now, I can actually see I've got four folders here but only three charts. That's because last night I created this empty folder here, which does not contain one of these event files, so I'm just going to delete this folder. And of course, if I delete one of these event files and refresh my page, then the data will also disappear from TensorBoard. So let's take that blue second run here, at 11:39. I'll delete the folder and then refresh my page. And here we go: it disappeared from TensorBoard.

Now, back to some more urgent questions. Seeing our model learn over the course of 20 epochs seems promising, right? Perhaps all we need to do is run our model for longer. So let's come up to where we're training our model and change the number of epochs from twenty to one hundred and fifty. Now, if I don't want to see all this output, I can actually mute it. Looking at the Keras documentation, there's a parameter called verbose. By default it's set to 1, but we don't have to leave it set to 1. If we set verbose equal to zero, then we mute a lot of this output. We're getting all of this in TensorBoard anyhow, so let's try this. Let's see what happens when I run my model with one hundred and fifty epochs. In this case it's going to run a lot longer.
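The training cell now looks something like this sketch. Setting verbose=0 silences the per-epoch console output, while the TensorBoard callback still writes everything to disk:

```python
%%time
# verbose=0 mutes the epoch-by-epoch progress output in the notebook;
# the metrics still reach TensorBoard through the callback's event files.
model_1.fit(x_train_xs, y_train_xs,
            batch_size=samples_per_batch,
            epochs=nr_epochs,   # nr_epochs is now set to 150
            verbose=0,
            callbacks=[get_tensorboard('Model 1')])
```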
I've successfully created my directory, my model is running, and if I go into TensorBoard and refresh it, then what you should see is your new graph appearing. The red line here is clearly above the blue line. Now, one thing I don't really like is how it's going off the chart here. But more importantly, have a look at the starting point of this red line. My latest run actually starts at around 16 percent, and this is no coincidence: the previous run finished at around 16 percent. So if I rerun this cell over and over again, the model actually picks up training where it left off. So if we want to compare the different runs starting from scratch, starting from zero, instead of adding to them, what we have to do is come back up to the cell where we compile the model, rerun it, and then come back down here and fit our model.

In this case, when I go back to TensorBoard and wait a little bit, I can actually see that it picks up the new file that was created with the latest run, right here at 11:52. And it actually starts plotting while my code is running. So every once in a while this chart will update with the latest run, grabbing the latest file from the disk even as the training is in progress.

So what we see now is that we're starting at a similar starting point as the previous runs where we started from scratch. Then the optimizer is working, working, working, and improving the accuracy on the training data. At the same time, when we scroll down here, the loss on the training data is decreasing and decreasing and decreasing. So this seems promising, right? After about 150 epochs we're at 40 percent accuracy. But that's on the training data, right? What about our validation data? What about images that this model has not seen yet?

Looking at the Keras documentation, we can see that there is another parameter that we can supply, called validation_data. The fit method expects to receive the validation data as a tuple. So let's add that to our fit method and rerun the cell. I'll add a comma and then type validation_data equals, then parentheses to create that tuple: x_val comma y_val. Now what I'll do is quickly come back up here, rerun the compile cell, and rerun our fit. Back in TensorBoard, I'm going to refresh the page, and now I can see my latest run in pink.
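Putting it all together, the fit call now looks roughly like this. x_val and y_val are the validation arrays prepared in an earlier lesson:

```python
%%time
model_1.fit(x_train_xs, y_train_xs,
            batch_size=samples_per_batch,
            epochs=nr_epochs,
            verbose=0,
            # Keras evaluates loss and accuracy on this tuple at the end
            # of every epoch, without training on it.
            validation_data=(x_val, y_val),
            callbacks=[get_tensorboard('Model 1')])
```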
I quite like the fact that TensorBoard scaled this graph a little better, and I can see my computer doing work, that little pink dot moving up as the model is being trained. You see that after fifty-five steps, fifty-five epochs, we've got about 32 percent accuracy on the training data. Back in my Jupyter notebook, I can see that this whole run took about one minute and 12 seconds. Mind you, my computer does not have a dedicated GPU, so this is definitely taking a little longer on my four-or-five-year-old machine than it would on something a little more powerful.

So let's see how this is going in TensorBoard. The training accuracy once again seems like it's off the chart, right? So let me try and refresh this page again and see if we can redraw the plot so that it shows us where it actually ends. If we're lucky, it does just that. And we are: brilliant.

Now, one thing that you might ask at this point is: hold on a second, why is it that all these lines are different? We're using the same data and the same number of epochs. Why am I always getting a different chart? Well, the thing is, there's some randomness involved in the optimizer. One way the randomness comes in, for instance, is the starting point from which the optimizer starts to optimize. And what we saw the very first time that we fitted our model was that our model didn't learn anything at all. So at this point you can see how that might just have been a fluke, right? That might have been just bad luck. By fitting the model a couple of times, we can actually see how each training run is going. This latest one, ending with 55 percent accuracy after one hundred and fifty epochs, seems to have gone particularly well, right?

But remember, this is the training accuracy. Our model is learning to classify our training dataset. Let's scroll down here, and what we should see, having added the validation dataset to our fit method, is two new sections down here in TensorBoard. The first one is the validation accuracy, and because our previous runs did not include the validation data, they are not appearing on this chart. So let's see how we're doing.
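As an aside, if you ever want two runs to start from the same random starting point, you can pin the random seeds before building and compiling the model. We don't do this in the lesson; this is just a sketch of the idea using the standard seeding functions:

```python
import numpy as np
import tensorflow as tf

# Fixing the seeds makes the initial weights, and hence the optimizer's
# starting point, reproducible between runs (TensorFlow 2 API shown;
# TensorFlow 1 used tf.set_random_seed instead).
np.random.seed(42)
tf.random.set_seed(42)
```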
After about 150 epochs we're getting about 30-odd percent accuracy on the validation dataset, and I can tell you that the validation accuracy is generally going to be lower than your training accuracy. But let's take a look at this last section, the validation loss. Now, this is very interesting. What we see is that our validation loss starts high and then comes down as we're training: 2.2 at step 18, 2.1 at epoch number 48. The validation loss is increasing and decreasing, getting a little more jagged, but trending down. And by epoch 100, what we're actually seeing is that our validation loss starts to creep up again. Our validation loss starts to increase. This is actually a problem, and this is why evaluating your model on the validation dataset is so important. The problem that we're seeing here has to do with overfitting, and in the next lesson we're going to see how we can tackle this.

Now, even though we haven't written a lot of code in this lesson, we've actually covered quite a few different concepts. The first thing that we saw was how we can fit our model, and that was using the fit method. But one thing that we learned about the fit method was that it starts where it left off. So if we've trained the model for 20 epochs and we call the fit method again, it will remember what the values of the weights were at the end of the previous run. The next thing that we learned was that not every run is the same. There is some randomness baked in, and the optimizer will start at a different starting point every time you run the fit method. The third thing that we learned was the process of passing the dataset through the network, and that this process is iterative: we have to pass the entire dataset through the network more than once in order for the weights to become meaningful. We also learned the technique of splitting up our data into batches. And finally, we learned a little bit more about TensorBoard. We learned how the TensorBoard callback creates these event files on our disk, and how TensorBoard then reads from these files to create our charts.

Now, I know this was a very dense lesson with a lot of information to take in. But don't worry, we'll be reinforcing a lot of these concepts in the upcoming lessons. I'll see you there. Take care.