1 00:00:01,050 --> 00:00:06,680 In this video, we are going to learn how to save and restore our model. 2 00:00:08,160 --> 00:00:10,980 We have created several models in the past few lectures. 3 00:00:12,490 --> 00:00:19,320 Now, the details of a model can be saved in a separate file on our system. 4 00:00:19,980 --> 00:00:27,420 And whenever we want to retrieve or restore our model using that saved information, we can do it by 5 00:00:27,420 --> 00:00:31,380 reading that file. A model's information 6 00:00:31,530 --> 00:00:33,660 basically consists of three things. 7 00:00:34,320 --> 00:00:35,730 One is the model architecture. 8 00:00:36,450 --> 00:00:37,900 Second is the model configuration. 9 00:00:38,460 --> 00:00:41,430 And thirdly, the weights of the trained model. 10 00:00:44,340 --> 00:00:50,940 So when we save a model, all this information will be stored in a file. 11 00:00:51,720 --> 00:00:56,850 The file has a format called an HDF5 file or an H5 file. 12 00:00:57,930 --> 00:00:59,760 Both of these names stand for the same thing. 13 00:01:00,750 --> 00:01:06,800 When you save a file, you can save it in the .hdf5 format or the .h5 format 14 00:01:07,000 --> 00:01:10,890 also. When you save a model like this, 15 00:01:11,610 --> 00:01:18,330 this model will have the entire information of the architecture, configuration and weights of this model. 16 00:01:20,550 --> 00:01:28,230 So in the last video, when we fitted our model using the functional API, we ran this line of code. 17 00:01:30,900 --> 00:01:34,080 And this model_func is now a trained model. 18 00:01:35,190 --> 00:01:43,130 If you want to store all the information that is in this model, you can run this save_model 19 00:01:43,230 --> 00:01:44,420 _hdf5 function, 20 00:01:44,590 --> 00:01:45,040 put a comma, 21 00:01:46,050 --> 00:01:47,250 and give a name for the file. 22 00:01:47,910 --> 00:01:53,160 This will be the name of the file with which it will be stored in your working directory. 
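The save step described above can be sketched like this in R; it is a minimal sketch assuming the keras R package and the trained model variable name model_func from the lecture:

```r
library(keras)

# model_func is assumed to be the trained functional-API model from the
# previous lecture. save_model_hdf5() writes the architecture,
# configuration and weights into one HDF5 file in the working directory.
save_model_hdf5(model_func, "my_model.hdf5")
```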
23 00:01:55,050 --> 00:02:01,470 So if you go to the Files tab here, you will see this directory here. 24 00:02:01,980 --> 00:02:04,840 This is the working directory for your session. 25 00:02:05,760 --> 00:02:12,390 If you want to change the working directory, you just click here, select wherever you want to go, 26 00:02:13,620 --> 00:02:15,360 open that particular location. 27 00:02:16,650 --> 00:02:21,420 And then you go into the More options and click on Set As Working Directory. 28 00:02:22,770 --> 00:02:27,420 This will set that particular address as your working directory for your session. 29 00:02:30,460 --> 00:02:33,510 So I have set this location as my working directory. 30 00:02:36,630 --> 00:02:37,920 When I ran this line of code, 31 00:02:40,170 --> 00:02:42,610 it created a new file titled 32 00:02:42,960 --> 00:02:43,930 my_model dot 33 00:02:44,100 --> 00:02:44,610 hdf5. 34 00:02:45,420 --> 00:02:50,190 And this has the information of my entire neural network. 35 00:02:53,510 --> 00:03:00,710 Now, whenever I want to create a new model containing the same architecture and the same weights, I 36 00:03:00,710 --> 00:03:06,970 can load all the information from this file using the load underscore model underscore hdf5 37 00:03:06,980 --> 00:03:07,430 function. 38 00:03:08,600 --> 00:03:16,010 So my new model will have exactly the same information that was saved from model_func. 39 00:03:16,210 --> 00:03:20,810 Also, if you look at the summary of this model_func, 40 00:03:25,560 --> 00:03:31,740 it has an input layer, two hidden layers, one concatenation layer and one output layer. 41 00:03:33,120 --> 00:03:40,410 Similarly, the new model also has an input layer, two hidden layers, 42 00:03:40,650 --> 00:03:47,100 one concatenation layer and one output layer. If you check the performance of both of these models 43 00:03:47,370 --> 00:03:50,010 on new data, it will also be exactly the same. 
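The restore step can be sketched as follows; new_model is a hypothetical name for the restored model, and my_model.hdf5 is the file saved earlier:

```r
library(keras)

# load_model_hdf5() rebuilds the model from the saved file, restoring
# the same architecture, configuration and weights.
new_model <- load_model_hdf5("my_model.hdf5")

# The restored model's summary should match the original model's summary.
summary(new_model)
```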
44 00:03:54,090 --> 00:04:01,680 Another thing we will cover here is the use of callbacks. We saw, when we were fitting our model, 45 00:04:02,190 --> 00:04:09,900 we gave a parameter called epochs. Epochs specifies the number of times our model goes through the entire 46 00:04:09,900 --> 00:04:10,610 training dataset. 47 00:04:12,420 --> 00:04:21,810 Now, at the end of each epoch, we saw that our model has a set of weight values, corresponding to which 48 00:04:21,930 --> 00:04:26,040 there is a particular value of accuracy and loss. 49 00:04:28,470 --> 00:04:34,920 Now, sometimes when we have a lot of epochs, our model overfits on the training data. 50 00:04:36,750 --> 00:04:44,630 So we may want to store the information at the end of all the epochs and load the model from one of the in-between 51 00:04:44,720 --> 00:04:45,350 epochs. 52 00:04:45,660 --> 00:04:52,980 So, for example, we can run the model for 30 epochs and possibly load the weights of the model from 53 00:04:52,980 --> 00:04:55,470 the 30th epoch or the 28th epoch. 54 00:04:57,720 --> 00:05:00,390 This can be achieved using callbacks. 55 00:05:02,970 --> 00:05:09,990 Let me show you how. In these first few lines, I am creating a directory where the files will be saved. 56 00:05:10,650 --> 00:05:17,820 So basically, I am creating a new folder called checkpoint in this working directory. 57 00:05:18,300 --> 00:05:27,570 So when I run this command, it creates a new variable called checkpoint_dir containing the name 58 00:05:27,630 --> 00:05:28,310 checkpoint. 59 00:05:29,220 --> 00:05:37,710 And when I run the dir.create command, it will create a new folder in my working directory with the 60 00:05:37,710 --> 00:05:38,640 name checkpoint. 61 00:05:39,960 --> 00:05:43,800 Now I am creating a variable called file 62 00:05:43,830 --> 00:05:52,200 path. This file path variable contains the information of where the files will be stored and what 63 00:05:52,200 --> 00:05:53,520 the name of the file will be. 
64 00:05:54,920 --> 00:05:58,110 So the file.path function has two parameters. 65 00:05:58,380 --> 00:06:02,430 The first parameter is the location of the folder 66 00:06:02,460 --> 00:06:06,920 we want to store the file in, and the second is the name of the file. 67 00:06:08,680 --> 00:06:19,230 Now, since we will be running the model for multiple epochs, such as 30 or 50 epochs, each file should 68 00:06:19,230 --> 00:06:20,100 have a different name. 69 00:06:20,420 --> 00:06:27,870 Otherwise, the same file will be overwritten multiple times. To have a different name for each file, 70 00:06:28,770 --> 00:06:30,390 we can use something like this. 71 00:06:33,300 --> 00:06:39,420 So the name of the file that I am suggesting here will be epoch, hyphen, 72 00:06:40,080 --> 00:06:47,520 and this part, which is in the curly brackets, is basically suggesting that it is a variable containing 73 00:06:47,520 --> 00:06:53,250 the number of the epoch, and the number should be in two-digit format. 74 00:06:54,000 --> 00:06:59,530 That is, the first epoch will be written as epoch-01, 75 00:07:00,200 --> 00:07:03,780 the second will be written as epoch-02, and so on. 76 00:07:05,440 --> 00:07:12,840 Now, with the path variable ready, we create a new variable called cp_callback, which 77 00:07:12,840 --> 00:07:16,290 uses this function, callback_model_checkpoint. 78 00:07:17,220 --> 00:07:19,830 And it has one important parameter. 79 00:07:20,460 --> 00:07:26,070 This is mandatory: the filepath. The filepath should have both things. 80 00:07:26,400 --> 00:07:28,740 One is the directory and the other is the file name. 81 00:07:29,500 --> 00:07:33,470 That information we have stored in the file path variable already. 82 00:07:34,830 --> 00:07:37,020 So we run these lines of code. 83 00:07:37,020 --> 00:07:44,040 The file path variable is created and the cp_callback variable is also created. 
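The directory and file-name setup can be sketched in base R like this; the names checkpoint_dir and file_path follow the lecture, and the checkpoint callback itself (shown commented) would come from the keras package:

```r
# Folder that will hold one checkpoint file per epoch.
checkpoint_dir <- "checkpoint"
dir.create(checkpoint_dir, showWarnings = FALSE)

# File-name pattern: Keras substitutes {epoch:02d} with the two-digit
# epoch number at save time, giving epoch-01.hdf5, epoch-02.hdf5, ...
file_path <- file.path(checkpoint_dir, "epoch-{epoch:02d}.hdf5")

# With library(keras) loaded, the checkpoint callback would then be:
# cp_callback <- callback_model_checkpoint(filepath = file_path)
```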
84 00:07:45,300 --> 00:07:54,360 Now, before we create a new model on the same data: because the model information is stored in the model 85 00:07:54,360 --> 00:08:01,560 func variable, and all the information is also stored in the Keras session that is running in the background, 86 00:08:02,430 --> 00:08:04,110 we can clear that information. 87 00:08:04,680 --> 00:08:07,290 It will have two helpful impacts. 88 00:08:07,500 --> 00:08:13,260 One is, it will clear the memory that is being used in our system, so our system will perform better. 89 00:08:14,310 --> 00:08:19,950 Secondly, when we train our new model, it will start afresh and it will not start with the weights 90 00:08:19,980 --> 00:08:22,050 that were previously trained. 91 00:08:24,000 --> 00:08:31,600 So to clear the history, we remove the previously trained variable using the rm function, and we 92 00:08:31,610 --> 00:08:36,480 clear the session using the k underscore clear underscore session function. 93 00:08:36,930 --> 00:08:42,250 This will clear all the background information that is stored in the Keras library. 94 00:08:46,590 --> 00:08:48,180 Now we'll be training a new model. 95 00:08:48,690 --> 00:08:51,990 This model I'm calling model underscore callback. 96 00:08:54,570 --> 00:09:00,290 First, we define the architecture using the keras underscore model function, then we configure our 97 00:09:00,300 --> 00:09:02,160 model using the compile function. 98 00:09:05,150 --> 00:09:07,350 Then we train our model using the fit function. 99 00:09:08,240 --> 00:09:13,090 But when we are training our model, we have to include one new parameter. 100 00:09:13,460 --> 00:09:15,440 This parameter is called callbacks. 101 00:09:16,610 --> 00:09:23,840 And in this, we have to specify the callback variable, which contains the information of when the 102 00:09:23,840 --> 00:09:26,570 file is to be created and where it is to be created. 
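Clearing the old state and fitting with the checkpoint callback might look like the following sketch; train_x, train_y, test_x and test_y are hypothetical data names, and model_callback's definition and compile step are assumed to match the earlier lectures:

```r
library(keras)

# Free memory and reset the Keras backend so the new model starts afresh.
rm(model_func)
k_clear_session()

# model_callback: architecture defined with keras_model() and compiled
# as in earlier lectures (not repeated here). Passing cp_callback to
# fit() makes Keras write a checkpoint file at the end of every epoch.
history <- model_callback %>% fit(
  x = train_x, y = train_y,                 # hypothetical data names
  epochs = 30,
  validation_data = list(test_x, test_y),
  callbacks = list(cp_callback)
)
```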
103 00:09:28,400 --> 00:09:35,930 So this checkpoint is for each epoch, and at each epoch a file will be saved at this file 104 00:09:36,670 --> 00:09:39,860 path. When I run the fit function, 105 00:09:50,010 --> 00:09:53,910 at the end of each epoch, a new file is saved. 106 00:09:58,020 --> 00:10:06,990 Let us go to the Files tab, open the checkpoint folder, and you can see that we have here 30 files, corresponding 107 00:10:06,990 --> 00:10:07,970 to each epoch. 108 00:10:09,090 --> 00:10:13,590 Each of these files contains the information of the neural network 109 00:10:13,860 --> 00:10:23,340 at the end of that particular epoch. So at the end of the first epoch, our model was giving a loss of 110 00:10:23,340 --> 00:10:30,400 23.45, with a mean absolute error of 3.419. The weights 111 00:10:30,690 --> 00:10:34,710 at this particular point are stored in the epoch-01 file. 112 00:10:39,130 --> 00:10:45,400 You can also check the list of files stored in the directory using the list.files function; it 113 00:10:45,400 --> 00:10:48,310 will list all the files that you have in this directory. 114 00:10:50,500 --> 00:10:54,460 And if you want to load the weights of a particular model, 115 00:10:54,790 --> 00:11:02,620 so if I want to load the weights of the 10th epoch into a new model that I'm calling tenth_model, 116 00:11:03,760 --> 00:11:08,590 you can use the load_model_hdf5 function, as we saw earlier. 117 00:11:10,380 --> 00:11:13,510 And we can specify the file path where it is located. 118 00:11:14,320 --> 00:11:19,830 So the epoch-10 file is located in the checkpoint directory. 119 00:11:21,670 --> 00:11:26,810 And when we run this, the tenth_model variable is created. 120 00:11:27,040 --> 00:11:31,260 And it is a model containing the information of the 10th epoch. 121 00:11:31,600 --> 00:11:37,600 So whatever the weights were at this particular point are now assigned to the tenth_model. 
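Listing the checkpoints and loading one specific epoch can be sketched like this; tenth_model is the lecture's name, and the file name epoch-10.hdf5 assumes the naming pattern set up earlier:

```r
library(keras)

# List the checkpoint files written so far, one per epoch.
list.files("checkpoint")

# Load the model exactly as it was at the end of the 10th epoch.
tenth_model <- load_model_hdf5(file.path("checkpoint", "epoch-10.hdf5"))
summary(tenth_model)
```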
122 00:11:39,100 --> 00:11:44,470 You can see the summary of this tenth_model. The architecture will be automatically assigned 123 00:11:44,470 --> 00:11:51,100 as the same architecture that we defined for the model on which we were fitting the training data. So 124 00:11:51,370 --> 00:11:57,210 whatever architecture model_callback had will be the same architecture assigned to the new model. 125 00:12:00,540 --> 00:12:07,680 There is one important feature of this callback process that we can use: instead of saving all the 126 00:12:07,680 --> 00:12:13,020 30 files, we can save only one file, which has the best model. 127 00:12:14,010 --> 00:12:16,350 And how do we judge which is the best model? 128 00:12:16,890 --> 00:12:25,650 We can say that whichever model has the least validation loss, save the information of only that model, 129 00:12:26,310 --> 00:12:28,650 and do not save the information of any other model. 130 00:12:30,480 --> 00:12:38,640 So when we are defining our callback variable using the callback_model_checkpoint function, 131 00:12:40,080 --> 00:12:42,900 we can add two more parameters. 132 00:12:44,670 --> 00:12:48,180 Filepath is the mandatory first parameter. 133 00:12:48,540 --> 00:12:51,450 The first one we add is which value is to be monitored. 134 00:12:52,440 --> 00:12:56,900 Since this is a regression problem, we'll be looking at validation loss. 135 00:12:57,570 --> 00:13:02,130 If it had been a classification problem, we would be monitoring the accuracy. 136 00:13:03,360 --> 00:13:07,600 And the second parameter is save_best_only equal to TRUE. 137 00:13:09,120 --> 00:13:14,770 This means that only the model with the best validation loss, 138 00:13:15,090 --> 00:13:22,950 that is, the least validation loss, will be saved in this file with the title best 139 00:13:23,050 --> 00:13:27,630 underscore model dot hdf5. To create this variable, first 
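The best-only checkpoint described above can be sketched as follows; the file name best_model.hdf5 follows the lecture:

```r
library(keras)

# Keep only the single best checkpoint: monitor validation loss and
# overwrite best_model.hdf5 only when val_loss improves.
cp_callback <- callback_model_checkpoint(
  filepath = "best_model.hdf5",
  monitor = "val_loss",
  save_best_only = TRUE
)
```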
140 00:13:30,060 --> 00:13:31,590 delete the previous models 141 00:13:35,160 --> 00:13:38,460 and start building our new model. 142 00:13:39,560 --> 00:13:44,130 We define the architecture, configure it and train it. 143 00:13:48,750 --> 00:13:51,340 So it will again run for all the 30 epochs. 144 00:13:53,210 --> 00:14:01,530 But whichever epoch has the least validation loss, which is shown by this green graph here, 145 00:14:02,210 --> 00:14:03,530 only that one will be saved. 146 00:14:05,780 --> 00:14:13,470 So if we go back to our working directory, here is our best underscore model dot hdf5. 147 00:14:14,450 --> 00:14:20,480 This contains the information of the model with the least validation loss. 148 00:14:21,920 --> 00:14:26,930 You may also notice here that instead of validation data, I'm just using the test data. 149 00:14:27,440 --> 00:14:35,600 Since this dataset has only 506 observations, out of which we have 102 observations as test data, instead 150 00:14:35,600 --> 00:14:37,410 of creating a separate validation set, 151 00:14:37,520 --> 00:14:41,420 I'm just using the test dataset as the validation dataset here. 152 00:14:43,550 --> 00:14:49,640 If you have larger data, I would recommend that you keep a small amount of data separately as a validation 153 00:14:49,660 --> 00:14:55,430 dataset, which you can use here to monitor the validation loss. 154 00:14:58,280 --> 00:15:01,610 So whichever is the best model is stored in this file. 155 00:15:01,940 --> 00:15:08,260 You can load all the information of this model into your new best model variable. 156 00:15:09,590 --> 00:15:10,820 And here you have this model. 157 00:15:11,270 --> 00:15:14,150 You can use this model for predicting in the future. 158 00:15:18,190 --> 00:15:24,460 The last thing I'm going to discuss in this lecture is the feature of early stopping in callbacks. 159 00:15:25,510 --> 00:15:26,800 It is a very important feature. 
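Restoring the best model for prediction can be sketched like this; best_model and test_x are hypothetical names standing in for the course's objects:

```r
library(keras)

# Restore the single best checkpoint and use it for prediction.
best_model <- load_model_hdf5("best_model.hdf5")
predictions <- predict(best_model, test_x)   # test_x: hypothetical new data
```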
160 00:15:28,210 --> 00:15:36,490 The basic concept is: instead of running the model for all the epochs, we can specify a large value 161 00:15:36,490 --> 00:15:41,380 for the epochs, run the model, and stop it 162 00:15:41,590 --> 00:15:44,740 when we do not see much improvement in the accuracy. 163 00:15:46,900 --> 00:15:52,930 So basically, instead of running it for only 30, we tell it to run for 400 epochs, 164 00:15:53,860 --> 00:16:00,940 but stop at the point of time when you stop seeing any improvement in a particular metric that we have 165 00:16:00,940 --> 00:16:01,470 specified. 166 00:16:02,350 --> 00:16:08,830 For example, here I'm creating the callbacks list variable. In that variable, 167 00:16:08,920 --> 00:16:09,970 I'm adding two parts. 168 00:16:10,360 --> 00:16:16,540 One part is the same model checkpoint part, which contains the name of the file in which the information 169 00:16:16,540 --> 00:16:17,130 will be stored, 170 00:16:19,180 --> 00:16:26,110 what parameter is to be monitored, and whether to save only the best model or save all the models. 171 00:16:27,730 --> 00:16:30,740 The other part is callback underscore early underscore stopping. 172 00:16:32,650 --> 00:16:39,370 In this part, we tell what parameter is to be monitored and how much patience we have. 173 00:16:39,640 --> 00:16:45,730 That is, for how many epochs will we wait to see whether there is an improvement or not. 174 00:16:46,840 --> 00:16:54,760 If there is no improvement in the validation loss parameter for three consecutive epochs, it will stop 175 00:16:54,850 --> 00:16:56,050 our model training 176 00:16:56,120 --> 00:17:02,080 there only. Basically, earlier when we ran our model, 177 00:17:02,500 --> 00:17:09,370 you can see here that the validation loss is not improving much; only the training loss is decreasing. 178 00:17:09,520 --> 00:17:12,970 But even that improvement is not significant. 
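The two-part callbacks list described above can be sketched as follows; the variable name callbacks_list and file name best_model.hdf5 follow the lecture:

```r
library(keras)

# Two callbacks together: save the best model, and stop training once
# val_loss has shown no improvement for 3 consecutive epochs.
callbacks_list <- list(
  callback_model_checkpoint(
    filepath = "best_model.hdf5",
    monitor = "val_loss",
    save_best_only = TRUE
  ),
  callback_early_stopping(
    monitor = "val_loss",
    patience = 3
  )
)
```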
179 00:17:14,350 --> 00:17:19,270 This is suggesting that we are just overfitting our model. To prevent this, 180 00:17:19,360 --> 00:17:23,890 we'll be using early stopping while training our model. 181 00:17:25,900 --> 00:17:27,370 So we run this code. 182 00:17:29,260 --> 00:17:31,610 This has created our callbacks_list variable. 183 00:17:34,320 --> 00:17:35,770 We delete the previous model, 184 00:17:36,070 --> 00:17:37,460 we clear the Keras session, 185 00:17:38,320 --> 00:17:44,160 then we redefine the architecture of our new model and we configure it. 186 00:17:45,370 --> 00:17:53,620 And now, when we are training our model, we specify a large number of epochs, because we know that the 187 00:17:53,620 --> 00:18:01,210 training will automatically stop when there is no significant improvement in the validation loss. 188 00:18:01,330 --> 00:18:05,860 Here also, again, in the validation dataset 189 00:18:05,890 --> 00:18:07,230 I have given the test dataset only, 190 00:18:07,240 --> 00:18:10,350 and callbacks is equal to callbacks_list. 191 00:18:12,110 --> 00:18:12,880 We run this. 192 00:18:16,840 --> 00:18:26,160 It starts training, but it stops at the fifth epoch only, because after the first epoch, there was some 193 00:18:26,160 --> 00:18:31,320 improvement in the validation loss: from 20 it became 19.5. 194 00:18:32,760 --> 00:18:38,970 But after that, in the next three consecutive epochs, the validation loss did not decrease. 195 00:18:40,500 --> 00:18:43,410 So we stopped training our model 196 00:18:43,690 --> 00:18:49,320 at the fifth epoch only, even though we specified that we would run this model 197 00:18:49,320 --> 00:18:50,270 for four hundred epochs. 198 00:18:50,940 --> 00:18:57,330 We were saved the processing pain of over-training our model because we used early stopping. 199 00:18:58,110 --> 00:19:01,320 This model at the fifth epoch is stored. 
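The final training call might look like this sketch; model_es and the data names are hypothetical, and the model is assumed to be defined and compiled as before:

```r
library(keras)

# Ask for far more epochs than needed; early stopping ends training
# as soon as val_loss stalls, so far fewer epochs actually run.
history <- model_es %>% fit(
  x = train_x, y = train_y,                 # hypothetical data names
  epochs = 400,
  validation_data = list(test_x, test_y),   # test set doubles as validation
  callbacks = callbacks_list
)
```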
200 00:19:01,350 --> 00:19:09,920 And this best model is stored in the .h5 file; you can load that model using the load_model_hdf5 function. 201 00:19:12,930 --> 00:19:16,000 Well, that's all in this lecture. 202 00:19:16,110 --> 00:19:21,720 We saw how to save all the information of a model into a separate file. 203 00:19:23,490 --> 00:19:28,950 The benefit of this is you can share this file with your colleagues, with your students or anyone in 204 00:19:28,950 --> 00:19:29,520 your office. 205 00:19:30,300 --> 00:19:37,380 And whenever someone loads a model from that file, that model will have the exact same architecture 206 00:19:37,500 --> 00:19:39,380 and the weights that your model had. 207 00:19:42,090 --> 00:19:44,700 Then we saw the importance of using callbacks. 208 00:19:45,300 --> 00:19:48,900 Callbacks helped us save the model at each epoch. 209 00:19:49,620 --> 00:19:51,090 Then we saw how to use 210 00:19:52,350 --> 00:19:58,500 the save_best_only parameter to save only one file 211 00:19:58,710 --> 00:20:01,680 instead of saving separate files for each epoch. 212 00:20:03,420 --> 00:20:10,710 And lastly, we saw the use of the early stopping functionality, so that we can prevent overfitting and 213 00:20:10,860 --> 00:20:13,320 excessive processing time to train our model. 214 00:20:15,210 --> 00:20:15,650 Thanks.