In this lesson we're finally going to train our model, and this is going to involve starting a session, batching our data, and then finally running the training loop. It's going to be a pretty jam-packed lesson. Let's dive right in.

The first thing I'll do is add a markdown cell here that reads "Run Session". The way TensorFlow works is that it has these session objects, and a session object encapsulates the environment in which all the operations and all the calculations take place and are executed. We've done all this setup previously: we've worked out the placeholders, we've said what calculations should take place, what the operation is going to be for minimizing our loss function, and so on. All of these calculations only take place inside a session. The way we create one of these is with tf.Session(), with parentheses at the end. I'm going to store the session that I'm creating here in a variable called sess.

The next thing I have to do is initialize all the variables. I'll just add a comment here that says "init". All our variables are going to be initialized with tf.global_variables_initializer(). And next I want to feed this initializer to the session, so: sess.run(init). These three lines of code get us all set up.

Let me come up here where it says "Set up TensorFlow graph", go to Cell, and select "Run All Below". Now that we've created our session, initialized our variables, and run the initializer, we can actually peek inside some of these tensors. For example, if we want to see what weights we've got for our first hidden layer, we can come down here, take our w1 tensor and call eval, that is evaluate, on it, and then we have to provide the session as an argument. If I hit Shift+Enter I can actually see our starting weights. The same is true for our biases: b1.eval with our session will show us that the biases for our first hidden layer are all initialized with the value 0. We've got a lot of biases there, but for our output layer, b3, we've only got 10.
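As a minimal sketch, here's what those pieces look like in code, assuming the TensorFlow 1.x API and the w1 and b1 variables defined during the graph setup in the earlier lessons:

```python
import tensorflow as tf  # assumes the 1.x API used throughout this course

# Run Session
sess = tf.Session()

# init
init = tf.global_variables_initializer()
sess.run(init)

# Peek inside the (now initialized) tensors; in a notebook cell
# each eval displays its value
w1.eval(session=sess)  # random starting weights of the first hidden layer
b1.eval(session=sess)  # biases of the first hidden layer, all zeros for now
```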
So the way TensorFlow works is that with this setup, what we're doing is building out this graph of calculations. It's almost like laying pipes: when we actually run a session, the data starts flowing through the pipes we've laid out, and then we can evaluate our calculations and look at what's inside our variables. We can get some outputs, and the calculations are actually executed.

Now that we've covered that, we're almost ready to write the loop for training our model. But before we do that, we're going to write some code that splits our training data up into smaller components. The reason we're doing that is because we want to train our model on batches of 1,000 samples at a time. Chances are that for your own projects you're going to be working with quite large datasets, so having the skill of dividing up your data is going to be essential.

Let me add a quick markdown cell here and add a variable that's going to read size_of_batch, and set that equal to 1000. Then I'll create three more variables. The number of examples I'm going to store in num_examples, and that's going to be equal to the number of examples in our training dataset: y_train.shape, square brackets, zero. The next thing I'll do is figure out the number of iterations that we need for training, and here I'll say nr_iterations is equal to the number of examples divided by the size of the batch. Now, if I want to make sure that this is indeed an integer, I can cast it, or convert it, to an integer using int(), with num_examples divided by size_of_batch inside the parentheses. The last thing I'll do is create an index variable, so I'll say index_in_epoch and set that equal to zero. The index here will help us keep track of where one batch ends and the next batch of samples should start, because what we want is for the first batch to go from 0 to 999, the second batch to go from 1,000 to 1,999, and so on, until we've chewed through all 50,000 examples in our training dataset.
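Here's a sketch of these bookkeeping variables, assuming the training labels live in the y_train array from the earlier lessons:

```python
size_of_batch = 1000

# Total number of training examples (50,000 in this dataset)
num_examples = y_train.shape[0]

# Batches needed for one full pass over the data; int() guarantees
# a whole number, which range() will require later on
nr_iterations = int(num_examples / size_of_batch)

# Keeps track of where the next batch should start
index_in_epoch = 0
```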
The next thing we're going to do is define a function, which I'll call next_batch, that's going to encapsulate the logic for going from one batch to the next. This function of ours is going to take three parameters. The first we'll call batch_size: how big are the batches going to be? The second is the dataset, and the third shall be the labels. So this function needs our x's and our y's.

Next I'm going to use the global keyword to get hold of variables that live outside of this function. I'm going to reference num_examples, which is outside, right here, using this global keyword, and I'm also going to reference index_in_epoch, our index variable, inside this function. The reason I'm doing this is because we need to figure out the starting and the ending point. The starting point is going to be equal to our index variable, so it's going to be zero in the beginning, but then our index will update: we'll take our index and add the batch size to it, so it starts out at 0 and the next time round it's going to be equal to 1,000. So we'll say index_in_epoch is equal to index_in_epoch plus batch_size. Now, oftentimes when you're updating a variable like this there's a shorthand notation, so you can write += and then batch_size.

So now that we've got the start figured out and we're moving our index along by the batch size, we can think about the end. The end of the batch is going to be at index_in_epoch. What this allows us to do is return the x values and the y values that fall between these two points, between the starting point and the ending point, and we can do this with data, square brackets, start colon end. So what are we getting the very first time round that we call this function? Well, we're going to get all the values between 0 and 999. Why? Because we'll supply 1,000 as the size of the batch, so the starting value will be equal to zero, because that's where index_in_epoch starts out; then index_in_epoch becomes itself plus 1,000, and the ending value will be equal to 1,000. With this notation you get all the values between zero and the ending value, not inclusive. So y_train, square brackets, zero colon three, for example, will give us all the rows between 0 and 3, that is rows 0, 1, and 2; that last one is at index 2, not 3. And that means the next time this function gets called, the starting value will be equal to 1,000, the index then updates to 2,000, the ending value becomes 2,000, and we get everything between 1,000 and 1,999.
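As a sketch, here's the function as we have it so far; the end-of-dataset handling described just below isn't in yet:

```python
def next_batch(batch_size, data, labels):
    # Reach outside the function for our bookkeeping variables
    global num_examples
    global index_in_epoch

    start = index_in_epoch
    # Shorthand for: index_in_epoch = index_in_epoch + batch_size
    index_in_epoch += batch_size
    end = index_in_epoch

    # Slicing excludes the end index: the first call returns rows
    # 0..999, the second call rows 1000..1999, and so on
    return data[start:end], labels[start:end]
```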
Since we're going to return two things, we return our features, which I've called data, and we return our labels: labels, square brackets, start colon end. The only thing left to do is include some logic between these two lines to capture what happens when we get to the end of the dataset. So what should happen if the index is greater than the number of examples? In that case we have to do a couple of things: we should start the next epoch, so start should be reset to equal zero, and the index should be reset to the batch size. Brilliant.

So now we can move on to our next step, which is actually training the model. I'll add a markdown cell here that's going to read "Training Loop". So what are we going to do here? What we'll do in our training loop is go from epoch number 0 through epoch number 4, five epochs in total, so we'll start at zero and go for however many epochs we've specified as one of our hyperparameters up here. So we'll say: for epoch in range, number of epochs. We're iterating through each epoch, and then we'll iterate through our data itself, so we'll say: for i in range, nr_iterations. The number of iterations was equal to the number of examples divided by the size of the batch; if you'd like to know what that number is, we can actually print it out here. It's 50, so there will be 50 iterations to take us through all 50,000 examples, 1,000 examples at a time. That's the number of iterations.

And what are we doing in each iteration? Well, the first thing is we'll need a batch of 1,000 samples for our features and 1,000 samples for our labels, so we'll say batch_x, comma, batch_y is equal to next_batch. The function that we created a minute ago returns two things, some x values and some y values, but it requires three inputs: the batch size, our x values, and our y values. So in this case, batch_size will be equal to size_of_batch, data shall be equal to x_train, and labels shall be equal to y_train. And that's how we get to the really fun part, and by that I mean working with TensorFlow.
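Here's a sketch of the completed function with that reset branch in place, plus the opening of the training loop. The names nr_epochs and x_train are assumptions about how the epoch hyperparameter and the training features were named in the earlier setup:

```python
def next_batch(batch_size, data, labels):
    global num_examples
    global index_in_epoch

    start = index_in_epoch
    index_in_epoch += batch_size

    # End of the dataset reached: wrap around and start the next epoch
    if index_in_epoch > num_examples:
        start = 0
        index_in_epoch = batch_size

    end = index_in_epoch
    return data[start:end], labels[start:end]


# Training Loop
for epoch in range(nr_epochs):      # 5 epochs in this lesson
    for i in range(nr_iterations):  # 50 batches per epoch
        batch_x, batch_y = next_batch(batch_size=size_of_batch,
                                      data=x_train, labels=y_train)
        # ... the session calls come next ...
```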
So what we have to do here is create something called a feed dictionary. A feed dictionary is nothing other than a Python dictionary. We'll say feed_dictionary is equal to curly braces, and inside those curly braces we'll provide two key-value pairs. For the first, our capital X is going to be the key and batch_x is going to be the value. The second, no surprise there: capital Y as the key and batch_y as the value. The reason we've created this is so that we can feed it to our session.

Our session is going to run all our calculations for us. We'll take our session, sess, remember we created it up here, and our session has this run method, and the run method will do the calculations. The key calculation in our case is the training step: the training step is the operation from our optimizer that minimizes our loss. What is it going to minimize the loss on? Well, it's going to minimize it on the data that we're providing through our feed dictionary. Our feed dictionary holds our x's and our y's, our features and our labels, and when we provide it alongside the loss minimization to the session, it will run the calculations and update our weights. These two lines of code are really the second piece of the TensorFlow puzzle. The first piece is setting up all the placeholders, initializing all the variables, and setting up all the calculations ahead of time. The second piece is actually running the calculations: running that training step, running our optimizer, and running that calculation on our data, which in our case is a batch of 1,000 samples.

But even though we're training our model, let's not forget our accuracy metric. We defined this calculation up here, and now's our chance to use it. Once again this is where our session comes into play: using the run method on our session we can get this accuracy as an output. So let's say batch_accuracy is equal to sess.run, parentheses. And here we need to specify what the session should fetch for us, what output we want to get out of this session, and TensorFlow has a really funny word for this: it's called fetches. It's almost like a Keras callback. So what should fetches be equal to? It should be equal to our accuracy calculation. And what's the accuracy going to be calculated on? The accuracy is going to be calculated on our batch, so feed_dict is going to be equal to our feed dictionary once again.
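Sketched out, the body of the inner loop looks like this; X, Y, train_step, and accuracy are assumed to be the placeholder and operation names from the graph-setup lessons:

```python
feed_dictionary = {X: batch_x, Y: batch_y}

# Run the optimizer's training step on this batch;
# this is what actually updates the weights and biases
sess.run(train_step, feed_dict=feed_dictionary)

# Fetch the accuracy, evaluated on the same batch
batch_accuracy = sess.run(fetches=accuracy, feed_dict=feed_dictionary)
```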
Now, since we're fetching this, let's print it out, but let's not print it inside this for loop, which will run 50 times. Let's print it once every epoch instead, so once this inner for loop is done we'll hit a print statement. We'll say print, and I'll put an f-string here. The first thing I'll print is the epoch that's running: the word "Epoch" and then, in curly braces, lowercase epoch, which is just the index of our outer loop. Then I'll tab over, so I'll have an escape character, backslash t, put a pipe symbol there, and say "Training Accuracy =" followed by curly braces around batch_accuracy. And when we're all done, I'll just say print, done training.

This is the moment we've all been waiting for, right? So let's Shift+Enter on this cell. In epoch 0 I've got a training accuracy on my batch of about 64 percent. By epoch 3 I'm up to 86 percent, and by epoch number 4, namely the fifth epoch, I'm up to 87.2 percent on my training dataset. Fantastic!

So now that we've been successful in training our neural network, we can add some bells and whistles to our code, and we can also investigate in a bit more detail how it works and what it does, and modify it further to help us better evaluate what's actually going on. For all of that we're going to be using TensorBoard. Once we have TensorBoard set up, we can start looking at our performance a bit more closely, especially the performance on the evaluation dataset. So for all of that and more, I'll see you in the next lesson. You know the drill, see you in a bit.
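For reference, here's the whole training loop from this lesson assembled into one sketch, under the same naming assumptions as above:

```python
for epoch in range(nr_epochs):
    for i in range(nr_iterations):
        batch_x, batch_y = next_batch(batch_size=size_of_batch,
                                      data=x_train, labels=y_train)
        feed_dictionary = {X: batch_x, Y: batch_y}
        sess.run(train_step, feed_dict=feed_dictionary)
        batch_accuracy = sess.run(fetches=accuracy,
                                  feed_dict=feed_dictionary)

    # One status line per epoch, not one per iteration
    print(f'Epoch {epoch}\t| Training Accuracy = {batch_accuracy}')

print('Done training!')
```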