1
00:00:00,530 --> 00:00:07,970
How exciting is this whole we've gone a long way from battling to get our images and stances our labels

2
00:00:07,970 --> 00:00:10,600
into tenses preparing our data.

3
00:00:10,610 --> 00:00:19,330
Choosing a model on tensor flow hub creating model callbacks we're finally at the stage where we're

4
00:00:19,330 --> 00:00:25,820
ready to train our first deep learning model so let's create a heading training a model.

5
00:00:25,900 --> 00:00:32,380
Now we're gonna put on a subset of data because if you remember back when we split our data up we're

6
00:00:32,380 --> 00:00:39,690
only going to start training on 1000 images because why do we want to do that.

7
00:00:39,760 --> 00:00:40,690
Let's write here.

8
00:00:40,780 --> 00:00:51,080
Our first model is only going to train on 1000 images to make sure everything is working.

9
00:00:51,490 --> 00:00:52,630
So that's what we're trying to do.

10
00:00:52,630 --> 00:00:55,200
We're trying to minimize our time between experiments.

11
00:00:55,210 --> 00:01:00,720
So we begin by starting on a subset of the data to make sure all of this code.

12
00:01:00,730 --> 00:01:02,620
I mean it might break.

13
00:01:02,620 --> 00:01:02,950
Right.

14
00:01:03,640 --> 00:01:10,150
So we want to make sure that it's working before we spend a long time training on 10000 images because

15
00:01:10,150 --> 00:01:16,420
training on a thousand images is going to go a lot faster than 10000 images and you could imagine the

16
00:01:16,420 --> 00:01:23,020
same goes if you increase it to 100000 images or in the case of image net fourteen point two million

17
00:01:24,240 --> 00:01:32,910
so there's one more variable that we have to define before we can get into training a model.

18
00:01:33,170 --> 00:01:35,760
And that is number of epochs.

19
00:01:35,900 --> 00:01:40,120
And so the number of epochs is how many passes of the data.

20
00:01:40,160 --> 00:01:47,270
We'd like our model to do and a pass is you can imagine is equivalent to our model trying to find patterns

21
00:01:47,360 --> 00:01:54,290
in each dog image and see which patterns relate to each label that's going to hear Naam epochs

22
00:01:57,550 --> 00:02:01,660
equals one hundred and we're going to do something cool here we're gonna create a little slider like

23
00:02:01,660 --> 00:02:14,420
we have before param and then we're going to create type slider and then the men will be 10 the max

24
00:02:14,420 --> 00:02:19,100
can be hundred and then the step can be 10 looking at how cool is that.

25
00:02:19,160 --> 00:02:24,770
So if we wanted to do 10 epochs we could do 10 but we're gonna stick it to 100 because we've got our

26
00:02:24,860 --> 00:02:30,910
early stopping callback and we're gonna see all of this in action in a second but essentially what a

27
00:02:30,920 --> 00:02:36,680
training pass is going to do if we go back to our Kino one epoch would be giving our model a single

28
00:02:36,680 --> 00:02:43,390
chance to look at all the training data and then validating it from their 100 epochs so we're giving

29
00:02:43,390 --> 00:02:50,200
our model up to 100 chances to go through the training data set and figure out the patterns so it's

30
00:02:50,200 --> 00:02:56,110
going to what a pass is it's going to go through these layers we're going to pass our training images

31
00:02:56,110 --> 00:03:01,930
to our model it's gonna go through these layers and then our model as it learns patterns is going to

32
00:03:01,930 --> 00:03:08,560
make guesses on which label belongs to which image and the worse off the guess is the higher the loss

33
00:03:08,560 --> 00:03:09,570
will be.

34
00:03:09,790 --> 00:03:11,710
So that's where Adam comes in.

35
00:03:11,710 --> 00:03:16,780
Remember that guy down the bottom of the hill at the International Hill descent championships is telling

36
00:03:16,780 --> 00:03:24,160
our model how it can improve its guesses with each epoch and then we're gonna see how our model is doing

37
00:03:24,250 --> 00:03:29,950
on an accuracy level because we've got that spectator judge at the bottom of the hill watching how well

38
00:03:29,980 --> 00:03:31,990
our model is performing.

39
00:03:31,990 --> 00:03:35,920
So again if all this doesn't make sense it's going to make more sense once we actually start to run

40
00:03:35,920 --> 00:03:42,550
a model so let's do one final check the final check is one last time to make sure we're using a GP you

41
00:03:43,540 --> 00:03:49,660
because if we're not using a GP you this training our model on images is going to take a very very long

42
00:03:49,660 --> 00:03:56,560
time you saw right at the beginning a GP you can speed up our code by up to 30 times and sometimes more

43
00:03:57,010 --> 00:04:07,030
let's write that code check to make sure we're still running on a GPO so print

44
00:04:10,050 --> 00:04:10,800
GP you

45
00:04:14,050 --> 00:04:27,900
available and then we're gonna go Yes If TAF got config not least physical devices and then we want

46
00:04:27,900 --> 00:04:37,170
to check GPA else not available so this is just the same code we wrote above but we need a GP to be

47
00:04:37,170 --> 00:04:39,900
out of run this in a considerable amount of time

48
00:04:42,600 --> 00:04:47,700
yes we could also have checked here remember that's a little shortcut trick but I love saying this print

49
00:04:47,700 --> 00:04:55,660
out here now boom we've got a GP running we've got a number of epochs set up let's create a simple function

50
00:04:55,720 --> 00:05:05,020
which trains a model so let's write that here let's create a function which trains a model and the function

51
00:05:05,020 --> 00:05:16,840
is going to create a model using create model and then we want to go set up a tensor board callback

52
00:05:19,030 --> 00:05:25,960
using our tensor board function so create C this is where creating functions has really come into play

53
00:05:25,960 --> 00:05:26,760
for us here.

54
00:05:27,250 --> 00:05:33,340
Otherwise we'd have to be writing code ad hoc but writing functions to begin with takes a little longer

55
00:05:33,670 --> 00:05:42,320
but in the long run saves us time so call the fit function on our model.

56
00:05:42,320 --> 00:05:57,950
Passing it the training data validation data number of epochs to train for which is now a box we just

57
00:05:57,950 --> 00:06:03,890
defined that without sick little slider and the callbacks which are our helper functions.

58
00:06:03,890 --> 00:06:09,060
We'd like to use and then finally we want it to return the model.

59
00:06:09,350 --> 00:06:15,120
We're gonna change this in a markdown that is looking phenomenal.

60
00:06:15,120 --> 00:06:16,910
Okay let's go here.

61
00:06:17,160 --> 00:06:25,050
Build a function to train and return a trained model.

62
00:06:25,050 --> 00:06:25,830
So here we are.

63
00:06:25,950 --> 00:06:28,730
Def train well I can't believe we're finally up to this.

64
00:06:28,740 --> 00:06:30,510
This is amazing.

65
00:06:30,630 --> 00:06:38,790
We're training our first deep learning and no network so trains a given model and returns the trained

66
00:06:38,790 --> 00:06:39,300
version

67
00:06:43,630 --> 00:06:45,250
nice and simple.

68
00:06:45,310 --> 00:06:50,950
Remember even though we're getting excited doesn't mean that we're losing our communicative habits so

69
00:06:50,950 --> 00:06:52,000
we'll create a model.

70
00:06:53,070 --> 00:06:57,100
So we're going to model equals say this is where our create model function comes in.

71
00:06:57,220 --> 00:06:59,810
Oh yeah that's so satisfying to write.

72
00:07:00,130 --> 00:07:08,860
And then we go create new tensor board session every time we train a model because remember with our

73
00:07:08,860 --> 00:07:19,000
tensor board with our tensor board callback we are actually we just call this tensor board.

74
00:07:19,000 --> 00:07:20,180
We don't need that callback.

75
00:07:20,950 --> 00:07:23,870
We set up our function.

76
00:07:24,160 --> 00:07:29,740
So if we come back up here every time it's called it's going to create a new folder in logs with the

77
00:07:29,740 --> 00:07:30,670
current day time.

78
00:07:30,670 --> 00:07:35,800
So this is important because every time we run our train model function so every time we try to train

79
00:07:35,800 --> 00:07:43,110
a new model a new experiment we're creating a logs folder which we can then track our model's performance.

80
00:07:43,180 --> 00:07:47,950
So that's a real helpful thing later on when we're trying to evaluate which experiment did better than

81
00:07:47,950 --> 00:07:48,330
another.

82
00:07:49,210 --> 00:07:58,460
And then we go fit the model to the data passing it the callbacks we created.

83
00:07:58,480 --> 00:08:03,250
Now again there's probably a better way to do this function but we're just trying to make it so that

84
00:08:03,280 --> 00:08:07,780
it works and then we can go back after we've sort of got through the phase of fitting a model for the

85
00:08:07,780 --> 00:08:10,120
first time and refine it.

86
00:08:10,260 --> 00:08:17,320
That's got a model to fit and we're gonna pass it X equals the train data which is a data batch.

87
00:08:17,320 --> 00:08:24,340
If you remember which contains the images and labels so epochs is going to be num epochs a.k.a..

88
00:08:24,340 --> 00:08:30,190
How many times is our model allowed to look at the training data before it stops or how many chances

89
00:08:30,190 --> 00:08:36,250
did our model have to pass over the entire training dataset to find patterns.

90
00:08:36,250 --> 00:08:45,890
And then the validation data is going to be Val data which is a data bunch as well or data back.

91
00:08:45,890 --> 00:08:46,650
Sorry.

92
00:08:46,850 --> 00:08:54,470
And the validation frequency is how often we want to test the patterns that our model has found on our

93
00:08:54,470 --> 00:09:00,260
validation set and we set it to 1 because we want it to test its patterns that it's found in the training

94
00:09:00,260 --> 00:09:02,620
set on the validation data every epoch.

95
00:09:02,630 --> 00:09:09,620
So once an epoch and then our callbacks are tense a board which two e's just created above with our

96
00:09:09,620 --> 00:09:14,920
function and early stopping who.

97
00:09:15,610 --> 00:09:19,530
And then we want to return model put a little

98
00:09:22,070 --> 00:09:25,030
return the fitted model.

99
00:09:25,190 --> 00:09:27,260
How cool is that.

100
00:09:28,250 --> 00:09:40,900
And now we can fit our model to the data by as simple as going model equals train model.

101
00:09:41,060 --> 00:09:46,160
You see the benefits in creating functions so we could run this function after doing a bunch of different

102
00:09:46,160 --> 00:09:50,000
experiments later on and so are you ready now.

103
00:09:50,000 --> 00:09:51,520
A little tidbit here.

104
00:09:51,710 --> 00:09:57,050
When training a model for the first time especially on image data or any other kind of large scale data

105
00:09:57,770 --> 00:10:02,270
the first epoch will usually take the longest compared to the rest.

106
00:10:02,360 --> 00:10:07,820
Now this is because the functions that we've written is getting the data and it's been initialized so

107
00:10:07,850 --> 00:10:16,140
a.k.a. loading the data into the memory of our GP you now using more data will generally take longer

108
00:10:16,200 --> 00:10:24,270
which is why we've started with a thousand images and so after the first epoch right we've set it up

109
00:10:24,270 --> 00:10:25,710
so it's capped at 100.

110
00:10:25,710 --> 00:10:30,780
The first one might take a couple of minutes after the first epoch is subsequent epoch should only take

111
00:10:30,780 --> 00:10:33,030
a couple of seconds so you're ready.

112
00:10:33,300 --> 00:10:34,200
We're gonna do this together.

113
00:10:34,200 --> 00:10:40,280
Fingers crossed all of our functions because this actually depends on a fair few lines of code up here.

114
00:10:40,410 --> 00:10:47,110
Let's say it works 3 2 1 Oh look at that.

115
00:10:47,110 --> 00:10:49,780
Building a model with my bone at V2

116
00:10:53,790 --> 00:10:56,860
I've got my fingers crossed.

117
00:10:57,060 --> 00:10:58,030
I hope you do too.

118
00:11:00,570 --> 00:11:02,530
Oh yes.

119
00:11:02,950 --> 00:11:04,850
It's starting up a posse.

120
00:11:04,860 --> 00:11:05,280
There we go.

121
00:11:05,280 --> 00:11:14,310
A pork one of 100 because a box is set to 100 so it's going to try and for twenty five steps which means

122
00:11:14,970 --> 00:11:16,020
our batch size.

123
00:11:16,020 --> 00:11:21,120
So if we go we have eight hundred images in our training set because remember we started with a thousand.

124
00:11:21,330 --> 00:11:28,350
So if we divide that by thirty two we get twenty five so that's why training is twenty five steps because

125
00:11:28,380 --> 00:11:32,850
there's batch size 32 and validation.

126
00:11:32,850 --> 00:11:40,220
If we go 200 divided by 32 it's gonna rounded up so it's rounded it up to seven.

127
00:11:40,380 --> 00:11:46,810
That's why we have validation so 200 images over batch sizes 32 and check this out.

128
00:11:46,810 --> 00:11:51,690
There we go 88 about five minutes.

129
00:11:51,710 --> 00:11:54,050
And so here's what I'm talking about with the loss.

130
00:11:54,050 --> 00:11:59,160
So now all these things that we created right up here so the loss.

131
00:11:59,180 --> 00:12:06,080
So our goal is remember to minimize the loss because we're at the top of the hill we're afraid of heights.

132
00:12:06,220 --> 00:12:09,350
We want to get to the bottom of the hill you want to minimize the loss function.

133
00:12:09,400 --> 00:12:11,620
The loss is the height of the hill.

134
00:12:11,830 --> 00:12:20,690
We come down here so hopefully this number goes down but we want to maximize accuracy so that's how

135
00:12:20,690 --> 00:12:26,120
we'll know if our model is training or not if it's learning patterns this loss should go down in this

136
00:12:26,130 --> 00:12:34,010
accuracy should go up so it's gonna take a few minutes to get through the first epoch I'm going to wait

137
00:12:34,070 --> 00:12:40,640
until it's gone through the first epoch and then so I'll speed this video up and then I'll see you once

138
00:12:40,640 --> 00:12:46,480
it's past the first epoch because you're gonna see subsequent epochs after this so epoch 2 out of 100

139
00:12:46,490 --> 00:12:49,700
is gonna be pretty quick because we use the GP you.

140
00:12:49,820 --> 00:12:55,190
The only reason the first one takes a while is because it has to load all of those images into the GP

141
00:12:55,190 --> 00:12:56,720
Q memory.

142
00:12:56,720 --> 00:13:03,210
I'll see you pretty instantaneously but for me it'll be about three minutes Alrighty I'm back.

143
00:13:03,210 --> 00:13:09,120
So we've got an ACA here which stands for estimated time of arrival is about 12 seconds left in this

144
00:13:09,120 --> 00:13:09,890
first epoch.

145
00:13:09,900 --> 00:13:16,920
As long as everything goes to plan and so you can see the loss has reduced slightly and the accuracy

146
00:13:16,920 --> 00:13:17,880
has increased.

147
00:13:18,600 --> 00:13:20,300
So this is a good thing.

148
00:13:20,340 --> 00:13:26,640
Now what we should see at the end of this first epoch if we set up correctly we should say this is lost

149
00:13:26,640 --> 00:13:29,370
on the training set and accuracy on the training set.

150
00:13:29,550 --> 00:13:36,480
We should see the loss on the validation data and the accuracy on the validation set.

151
00:13:36,480 --> 00:13:40,310
Pop up at any second now so we're just waiting.

152
00:13:40,320 --> 00:13:45,840
So probably what it's doing now is it's been a little bit longer than 12 seconds but it's loading the

153
00:13:45,840 --> 00:13:54,070
validation data into memory so it can calculate and evaluate how it's performing on that dataset.

154
00:13:54,080 --> 00:13:58,110
I can't believe we're doing this we like training a model together in real time.

155
00:13:58,130 --> 00:13:59,390
This is phenomenal.

156
00:13:59,420 --> 00:14:00,930
I've never done this before.

157
00:14:04,580 --> 00:14:09,210
That is as long as your is working here we go.

158
00:14:09,240 --> 00:14:10,160
Look at that.

159
00:14:10,170 --> 00:14:12,000
So now we have this one I'm talking about.

160
00:14:12,000 --> 00:14:16,690
We have this metric is on the training data and this metric is on the validation.

161
00:14:16,710 --> 00:14:19,620
So see how quickly our accuracy is improving.

162
00:14:19,680 --> 00:14:22,260
Now I told you epochs are going to speed up.

163
00:14:22,260 --> 00:14:23,010
Look at that.

164
00:14:23,190 --> 00:14:27,010
The first one took about five minutes but these are taking five seconds.

165
00:14:27,090 --> 00:14:31,160
We're already at one point zero accuracy that's 100 percent accuracy on the training data.

166
00:14:31,160 --> 00:14:32,370
That is insane.

167
00:14:32,370 --> 00:14:38,460
So what this is telling us actually is it's a good thing our model is over fitting because it's performing

168
00:14:38,460 --> 00:14:44,670
way better on the training data than it is on the validation data but this is great.

169
00:14:44,670 --> 00:14:51,030
Our model is working it's finding patterns it's using what mobile Net V2 has learned an image net and

170
00:14:51,120 --> 00:14:57,750
it's applying it to our dog's dataset dog vision is coming to life.

171
00:14:58,020 --> 00:14:59,720
So stoked.

172
00:14:59,910 --> 00:15:05,260
Now I hope yours is running like this as well because we're training a model together in real time.

173
00:15:05,280 --> 00:15:12,870
This is beautiful so what we should see is once the validation accuracy stops improving for a number

174
00:15:12,870 --> 00:15:13,920
of epochs.

175
00:15:14,040 --> 00:15:19,610
So three because of our early stopping callback it's going to stop training.

176
00:15:19,800 --> 00:15:24,440
So we actually I don't think we'll reach the 100 epochs maybe not even close.

177
00:15:24,570 --> 00:15:33,320
Cassie here we're getting one point zero accuracy with a tiny loss on our training data so here we go

178
00:15:34,410 --> 00:15:38,590
it stopped.

179
00:15:38,830 --> 00:15:40,510
My heart is racing.

180
00:15:40,540 --> 00:15:41,680
That was phenomenal.

181
00:15:41,680 --> 00:15:42,850
So see what I meant.

182
00:15:42,850 --> 00:15:46,450
The first epoch takes a while because it's loading data into memory.

183
00:15:46,450 --> 00:15:52,510
Once it's loaded that initial data and the memory the GPO is just going to go all right time to step

184
00:15:52,510 --> 00:15:57,680
on the gas and we'll turn on the afterburners or gas no brakes maybe.

185
00:15:57,760 --> 00:16:04,540
So we reached 18 out of 100 epochs and we can see that our model on our training data set is performing

186
00:16:04,630 --> 00:16:06,280
at 100 per cent accuracy.

187
00:16:06,280 --> 00:16:07,760
That's pretty crazy 1.0.

188
00:16:07,810 --> 00:16:09,730
You're going to times this number by 100.

189
00:16:09,730 --> 00:16:11,380
The loss is fairly low.

190
00:16:11,380 --> 00:16:18,610
I mean a loss of zero is is perfect but we can tell that our model is over fitting because it's performing

191
00:16:18,610 --> 00:16:23,550
way better on the training data than it is on the validation data.

192
00:16:23,560 --> 00:16:30,430
So if we come back it's like our model has memorized the course materials rather than the problem solving

193
00:16:30,430 --> 00:16:37,120
principles of those course materials and it's struggling to adapt to a data set it hasn't seen before

194
00:16:37,210 --> 00:16:40,860
a.k.a. a practice exam or our validation set.

195
00:16:40,870 --> 00:16:45,770
So right now our model has a poor ability to generalize.

196
00:16:46,080 --> 00:16:48,320
So that was I think that's enough for one video.

197
00:16:48,330 --> 00:16:55,320
That's a lot of excitement what I want you to do a little bit of homework is before we go and check

198
00:16:55,320 --> 00:17:01,590
the tensor board box in the next video because we've used our tensor board callback before.

199
00:17:02,040 --> 00:17:07,120
I want you to check out this is a question is this is a really important one.

200
00:17:07,230 --> 00:17:12,730
After we've tried to model I want you to go question.

201
00:17:12,820 --> 00:17:19,690
It looks like our model is over feeding because it's performing

202
00:17:22,300 --> 00:17:35,290
far better on the training data set than the validation dataset What are some ways to prevent model

203
00:17:35,380 --> 00:17:46,470
over feeding in deep learning neural networks so we'll turn that into markdown.

204
00:17:46,540 --> 00:17:49,660
That's a question I want you to look up before we get into the next video.

205
00:17:50,830 --> 00:17:56,440
What are some ways to prevent model over feeding and deep known networks so check that out even if you're

206
00:17:56,440 --> 00:17:57,190
not sure of them.

207
00:17:57,190 --> 00:18:01,020
I just want you to start getting curious about what's going on.

208
00:18:01,020 --> 00:18:06,550
I want you to stop picking up on the clues as to when your model is over feeding such as performing

209
00:18:06,550 --> 00:18:10,290
far better on the training data set than the validation data set.

210
00:18:10,810 --> 00:18:20,330
And a little note here note over fitting to begin with is a good thing.

211
00:18:20,620 --> 00:18:29,890
It means our model is learning so that I'm phenomenally excited that we have just trained our first

212
00:18:29,890 --> 00:18:36,170
deep learning neural network together using transfer Learning dog vision is coming to life.

213
00:18:36,190 --> 00:18:41,110
So we finish training a model at least a first subset of a model.

214
00:18:41,110 --> 00:18:42,190
This is not on the full data.

215
00:18:42,190 --> 00:18:43,660
This is only on 1000 images.

216
00:18:43,660 --> 00:18:50,500
So our next step is to if we come into our keynote what are we up to so we fit the model to the data

217
00:18:50,560 --> 00:18:52,150
we haven't made a prediction yet.

218
00:18:52,660 --> 00:18:59,320
So what we might do is evaluate the model using our tensor board callback so you can see that and then

219
00:18:59,320 --> 00:19:06,640
we'll figure out how to make a prediction with our trained model sound like a plan so check out this

220
00:19:06,640 --> 00:19:09,110
question and I'll see you in the next video.