1 00:00:01,050 --> 00:00:06,680 In this video, we are going to learn how to save and restore our model. 2 00:00:08,160 --> 00:00:10,980 We have created several models in the past few lectures. 3 00:00:12,490 --> 00:00:19,320 Now, the details of a model can be saved in a separate file on our system. 4 00:00:19,980 --> 00:00:27,420 And whenever we want to retrieve or restore our model using that saved information, we can do it by 5 00:00:27,420 --> 00:00:31,380 reading that file. A model's information 6 00:00:31,530 --> 00:00:33,660 basically consists of three things. 7 00:00:34,320 --> 00:00:35,730 One is the model architecture. 8 00:00:36,450 --> 00:00:37,900 Second is the model configuration. 9 00:00:38,460 --> 00:00:41,430 And thirdly, the weights of the trained model. 10 00:00:44,340 --> 00:00:50,940 So when we save a model, all this information will be stored in a file. 11 00:00:51,720 --> 00:00:56,850 The file has a format called an HDF5 file or an H5 file. 12 00:00:57,930 --> 00:00:59,760 Both of these names stand for the same thing. 13 00:01:00,750 --> 00:01:06,800 When you save a file, you can save it in the .hdf5 format or the .h5 format 14 00:01:07,000 --> 00:01:10,890 also. When you save a model like this, 15 00:01:11,610 --> 00:01:18,330 this model will have the entire information of the architecture, configuration and weights of this model. 16 00:01:20,550 --> 00:01:28,230 So in the last video, when we fitted our model using the functional API, we ran this line of code. 17 00:01:30,900 --> 00:01:34,080 And this model_func is now a trained model. 18 00:01:35,190 --> 00:01:43,130 If you want to store all the information that is in this model, you can run this save_model 19 00:01:43,230 --> 00:01:44,420 _hdf5 function, 20 00:01:44,590 --> 00:01:45,040 put a comma, 21 00:01:46,050 --> 00:01:47,250 and give a name for the file. 22 00:01:47,910 --> 00:01:53,160 This will be the name of the file with which it will be stored in your working directory. 
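The save step described above can be sketched like this in R; it is a minimal sketch assuming the keras R package and the trained model variable name model_func from the lecture:

```r
library(keras)

# model_func is assumed to be the trained functional-API model from the
# previous lecture. save_model_hdf5() writes the architecture,
# configuration and weights into one HDF5 file in the working directory.
save_model_hdf5(model_func, "my_model.hdf5")
```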
23 00:01:55,050 --> 00:02:01,470 So if you go to the Files tab here, you will see this directory here. 24 00:02:01,980 --> 00:02:04,840 This is the working directory for your session. 25 00:02:05,760 --> 00:02:12,390 If you want to change the working directory, you just click here, select wherever you want to go, 26 00:02:13,620 --> 00:02:15,360 open that particular location. 27 00:02:16,650 --> 00:02:21,420 And then you go into the More options and click on Set As Working Directory. 28 00:02:22,770 --> 00:02:27,420 This will set that particular address as your working directory for your session. 29 00:02:30,460 --> 00:02:33,510 So I have set this location as my working directory. 30 00:02:36,630 --> 00:02:37,920 When I ran this line of code, 31 00:02:40,170 --> 00:02:42,610 it created a new file titled 32 00:02:42,960 --> 00:02:43,930 my_model dot 33 00:02:44,100 --> 00:02:44,610 hdf5. 34 00:02:45,420 --> 00:02:50,190 And this has the information of my entire neural network. 35 00:02:53,510 --> 00:03:00,710 Now, whenever I want to create a new model containing the same architecture and the same weights, I 36 00:03:00,710 --> 00:03:06,970 can load all the information from this file using the load underscore model underscore hdf5 37 00:03:06,980 --> 00:03:07,430 function. 38 00:03:08,600 --> 00:03:16,010 So my new model will have exactly the same information that was saved from model_func. 39 00:03:16,210 --> 00:03:20,810 Also, if you look at the summary of this model_func, 40 00:03:25,560 --> 00:03:31,740 it has an input layer, two hidden layers, one concatenation layer and one output layer. 41 00:03:33,120 --> 00:03:40,410 Similarly, the new model also has an input layer, two hidden layers, 42 00:03:40,650 --> 00:03:47,100 one concatenation layer and one output layer. If you check the performance of both of these models 43 00:03:47,370 --> 00:03:50,010 on new data, it will also be exactly the same. 
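The restore step can be sketched as follows; new_model is a hypothetical name for the restored model, and my_model.hdf5 is the file saved earlier:

```r
library(keras)

# load_model_hdf5() rebuilds the model from the saved file, restoring
# the same architecture, configuration and weights.
new_model <- load_model_hdf5("my_model.hdf5")

# The restored model's summary should match the original model's summary.
summary(new_model)
```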
44 00:03:54,090 --> 00:04:01,680 Another thing we will cover here is the use of callbacks. We saw, when we were fitting our model, 45 00:04:02,190 --> 00:04:09,900 we gave a parameter called epochs. Epochs specifies the number of times our model goes through the entire 46 00:04:09,900 --> 00:04:10,610 training dataset. 47 00:04:12,420 --> 00:04:21,810 Now, at the end of each epoch, we saw that our model has a set of weight values, corresponding to which 48 00:04:21,930 --> 00:04:26,040 there is a particular value of accuracy and loss. 49 00:04:28,470 --> 00:04:34,920 Now, sometimes when we have a lot of epochs, our model overfits on the training data. 50 00:04:36,750 --> 00:04:44,630 So we may want to store the information at the end of all the epochs and load the model from one of the in-between 51 00:04:44,720 --> 00:04:45,350 epochs. 52 00:04:45,660 --> 00:04:52,980 So, for example, we can run the model for 30 epochs and possibly load the weights of the model from 53 00:04:52,980 --> 00:04:55,470 the 30th epoch or the 28th epoch. 54 00:04:57,720 --> 00:05:00,390 This can be achieved using callbacks. 55 00:05:02,970 --> 00:05:09,990 Let me show you how. In these first few lines, I am creating a directory where the files will be saved. 56 00:05:10,650 --> 00:05:17,820 So basically, I am creating a new folder called checkpoint in this working directory. 57 00:05:18,300 --> 00:05:27,570 So when I run this command, it creates a new variable called checkpoint_dir containing the name 58 00:05:27,630 --> 00:05:28,310 checkpoint. 59 00:05:29,220 --> 00:05:37,710 And when I run the dir.create command, it will create a new folder in my working directory with the 60 00:05:37,710 --> 00:05:38,640 name checkpoint. 61 00:05:39,960 --> 00:05:43,800 Now I am creating a variable called file 62 00:05:43,830 --> 00:05:52,200 path. This file path variable contains the information of where the files will be stored and what 63 00:05:52,200 --> 00:05:53,520 the name of the file will be. 
64 00:05:54,920 --> 00:05:58,110 So the file.path function has two parameters. 65 00:05:58,380 --> 00:06:02,430 The first parameter is the location of the folder 66 00:06:02,460 --> 00:06:06,920 we want to store the file in, and the second is the name of the file. 67 00:06:08,680 --> 00:06:19,230 Now, since we will be running the model for multiple epochs, such as 30 or 50 epochs, each file should 68 00:06:19,230 --> 00:06:20,100 have a different name. 69 00:06:20,420 --> 00:06:27,870 Otherwise, the same file will be overwritten multiple times. To have a different name for each file, 70 00:06:28,770 --> 00:06:30,390 we can use something like this. 71 00:06:33,300 --> 00:06:39,420 So the name of the file that I am suggesting here will be epoch, hyphen, 72 00:06:40,080 --> 00:06:47,520 and this part, which is in the curly brackets, is basically suggesting that it is a variable containing 73 00:06:47,520 --> 00:06:53,250 the number of the epoch, and the number should be in two-digit format. 74 00:06:54,000 --> 00:06:59,530 That is, the first epoch will be written as epoch-01, 75 00:07:00,200 --> 00:07:03,780 the second will be written as epoch-02, and so on. 76 00:07:05,440 --> 00:07:12,840 Now, with the path variable ready, we create a new variable called cp_callback, which 77 00:07:12,840 --> 00:07:16,290 uses this function, callback_model_checkpoint. 78 00:07:17,220 --> 00:07:19,830 And it has one important parameter. 79 00:07:20,460 --> 00:07:26,070 This is mandatory: the filepath. The filepath should have both things. 80 00:07:26,400 --> 00:07:28,740 One is the directory and the other is the file name. 81 00:07:29,500 --> 00:07:33,470 That information we have stored in the file path variable already. 82 00:07:34,830 --> 00:07:37,020 So we run these lines of code. 83 00:07:37,020 --> 00:07:44,040 The file path variable is created and the cp_callback variable is also created. 
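The directory and file-name setup can be sketched in base R like this; the names checkpoint_dir and file_path follow the lecture, and the checkpoint callback itself (shown commented) would come from the keras package:

```r
# Folder that will hold one checkpoint file per epoch.
checkpoint_dir <- "checkpoint"
dir.create(checkpoint_dir, showWarnings = FALSE)

# File-name pattern: Keras substitutes {epoch:02d} with the two-digit
# epoch number at save time, giving epoch-01.hdf5, epoch-02.hdf5, ...
file_path <- file.path(checkpoint_dir, "epoch-{epoch:02d}.hdf5")

# With library(keras) loaded, the checkpoint callback would then be:
# cp_callback <- callback_model_checkpoint(filepath = file_path)
```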
84 00:07:45,300 --> 00:07:54,360 Now, before we create a new model on the same data: because the model information is stored in the model 85 00:07:54,360 --> 00:08:01,560 func variable, and all the information is also stored in the Keras session that is running in the background, 86 00:08:02,430 --> 00:08:04,110 we can clear that information. 87 00:08:04,680 --> 00:08:07,290 It will have two helpful impacts. 88 00:08:07,500 --> 00:08:13,260 One is, it will clear the memory that is being used in our system, so our system will perform better. 89 00:08:14,310 --> 00:08:19,950 Secondly, when we train our new model, it will start afresh and it will not start with the weights 90 00:08:19,980 --> 00:08:22,050 that were previously trained. 91 00:08:24,000 --> 00:08:31,600 So to clear the history, we remove the previously trained variable using the rm function, and we 92 00:08:31,610 --> 00:08:36,480 clear the session using the k underscore clear underscore session function. 93 00:08:36,930 --> 00:08:42,250 This will clear all the background information that is stored in the Keras library. 94 00:08:46,590 --> 00:08:48,180 Now we'll be training a new model. 95 00:08:48,690 --> 00:08:51,990 This model I'm calling model underscore callback. 96 00:08:54,570 --> 00:09:00,290 First, we define the architecture using the keras underscore model function, then we configure our 97 00:09:00,300 --> 00:09:02,160 model using the compile function. 98 00:09:05,150 --> 00:09:07,350 Then we train our model using the fit function. 99 00:09:08,240 --> 00:09:13,090 But when we are training our model, we have to include one new parameter. 100 00:09:13,460 --> 00:09:15,440 This parameter is called callbacks. 101 00:09:16,610 --> 00:09:23,840 And in this, we have to specify the callback variable, which contains the information of when the 102 00:09:23,840 --> 00:09:26,570 file is to be created and where it is to be created. 
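Clearing the old state and fitting with the checkpoint callback might look like the following sketch; train_x, train_y, test_x and test_y are hypothetical data names, and model_callback's definition and compile step are assumed to match the earlier lectures:

```r
library(keras)

# Free memory and reset the Keras backend so the new model starts afresh.
rm(model_func)
k_clear_session()

# model_callback: architecture defined with keras_model() and compiled
# as in earlier lectures (not repeated here). Passing cp_callback to
# fit() makes Keras write a checkpoint file at the end of every epoch.
history <- model_callback %>% fit(
  x = train_x, y = train_y,                 # hypothetical data names
  epochs = 30,
  validation_data = list(test_x, test_y),
  callbacks = list(cp_callback)
)
```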
103 00:09:28,400 --> 00:09:35,930 So this checkpoint is for each epoch, and at each epoch a file will be saved at this file 104 00:09:36,670 --> 00:09:39,860 path. When I run the fit function, 105 00:09:50,010 --> 00:09:53,910 at the end of each epoch, a new file is saved. 106 00:09:58,020 --> 00:10:06,990 Let us go to the Files tab, open the checkpoint folder, and you can see that we have here 30 files, corresponding 107 00:10:06,990 --> 00:10:07,970 to each epoch. 108 00:10:09,090 --> 00:10:13,590 Each of these files contains the information of the neural network 109 00:10:13,860 --> 00:10:23,340 at the end of that particular epoch. So at the end of the first epoch, our model was giving a loss of 110 00:10:23,340 --> 00:10:30,400 23.45, with a mean absolute error of 3.419. The weights 111 00:10:30,690 --> 00:10:34,710 at this particular point are stored in the epoch-01 file. 112 00:10:39,130 --> 00:10:45,400 You can also check the list of files stored in the directory using the list.files function; it 113 00:10:45,400 --> 00:10:48,310 will list all the files that you have in this directory. 114 00:10:50,500 --> 00:10:54,460 And if you want to load the weights of a particular model, 115 00:10:54,790 --> 00:11:02,620 so if I want to load the weights of the 10th epoch into a new model that I'm calling tenth_model, 116 00:11:03,760 --> 00:11:08,590 you can use the load_model_hdf5 function, as we saw earlier. 117 00:11:10,380 --> 00:11:13,510 And we can specify the file path where it is located. 118 00:11:14,320 --> 00:11:19,830 So the epoch-10 file is located in the checkpoint directory. 119 00:11:21,670 --> 00:11:26,810 And when we run this, the tenth_model variable is created. 120 00:11:27,040 --> 00:11:31,260 And it is a model containing the information of the 10th epoch. 121 00:11:31,600 --> 00:11:37,600 So whatever the weights were at this particular point are now assigned to the tenth_model. 
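Listing the checkpoints and loading one specific epoch can be sketched like this; tenth_model is the lecture's name, and the file name epoch-10.hdf5 assumes the naming pattern set up earlier:

```r
library(keras)

# List the checkpoint files written so far, one per epoch.
list.files("checkpoint")

# Load the model exactly as it was at the end of the 10th epoch.
tenth_model <- load_model_hdf5(file.path("checkpoint", "epoch-10.hdf5"))
summary(tenth_model)
```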
122 00:11:39,100 --> 00:11:44,470 You can see the summary of this tenth_model. The architecture will be automatically assigned 123 00:11:44,470 --> 00:11:51,100 as the same architecture that we defined for the model on which we were fitting the training data. So 124 00:11:51,370 --> 00:11:57,210 whatever architecture model_callback had will be the same architecture assigned to the new model. 125 00:12:00,540 --> 00:12:07,680 There is one important feature of this callback process that we can use: instead of saving all the 126 00:12:07,680 --> 00:12:13,020 30 files, we can save only one file, which has the best model. 127 00:12:14,010 --> 00:12:16,350 And how do we judge which is the best model? 128 00:12:16,890 --> 00:12:25,650 We can say that whichever model has the least validation loss, save the information of only that model, 129 00:12:26,310 --> 00:12:28,650 and do not save the information of any other model. 130 00:12:30,480 --> 00:12:38,640 So when we are defining our callback variable using the callback_model_checkpoint function, 131 00:12:40,080 --> 00:12:42,900 we can add two more parameters. 132 00:12:44,670 --> 00:12:48,180 Filepath is the mandatory first parameter. 133 00:12:48,540 --> 00:12:51,450 The first one we add is which value is to be monitored. 134 00:12:52,440 --> 00:12:56,900 Since this is a regression problem, we'll be looking at validation loss. 135 00:12:57,570 --> 00:13:02,130 If it had been a classification problem, we would be monitoring the accuracy. 136 00:13:03,360 --> 00:13:07,600 And the second parameter is save_best_only equal to TRUE. 137 00:13:09,120 --> 00:13:14,770 This means that only the model with the best validation loss, 138 00:13:15,090 --> 00:13:22,950 that is, the least validation loss, will be saved in this file with the title best 139 00:13:23,050 --> 00:13:27,630 underscore model dot hdf5. To create this variable, first 
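The best-only checkpoint described above can be sketched as follows; the file name best_model.hdf5 follows the lecture:

```r
library(keras)

# Keep only the single best checkpoint: monitor validation loss and
# overwrite best_model.hdf5 only when val_loss improves.
cp_callback <- callback_model_checkpoint(
  filepath = "best_model.hdf5",
  monitor = "val_loss",
  save_best_only = TRUE
)
```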
140 00:13:30,060 --> 00:13:31,590 delete the previous models 141 00:13:35,160 --> 00:13:38,460 and start building our new model. 142 00:13:39,560 --> 00:13:44,130 We define the architecture, configure it and train it. 143 00:13:48,750 --> 00:13:51,340 So it will again run for all the 30 epochs. 144 00:13:53,210 --> 00:14:01,530 But whichever epoch has the least validation loss, which is shown by this green graph here, 145 00:14:02,210 --> 00:14:03,530 only that one will be saved. 146 00:14:05,780 --> 00:14:13,470 So if we go back to our working directory, here is our best underscore model dot hdf5. 147 00:14:14,450 --> 00:14:20,480 This contains the information of the model with the least validation loss. 148 00:14:21,920 --> 00:14:26,930 You may also notice here that instead of validation data, I'm just using the test data. 149 00:14:27,440 --> 00:14:35,600 Since this dataset has only 506 observations, out of which we have 102 observations as test data, instead 150 00:14:35,600 --> 00:14:37,410 of creating a separate validation set, 151 00:14:37,520 --> 00:14:41,420 I'm just using the test dataset as the validation dataset here. 152 00:14:43,550 --> 00:14:49,640 If you have larger data, I would recommend that you keep a small amount of data separately as a validation 153 00:14:49,660 --> 00:14:55,430 dataset, which you can use here to monitor the validation loss. 154 00:14:58,280 --> 00:15:01,610 So whichever is the best model is stored in this file. 155 00:15:01,940 --> 00:15:08,260 You can load all the information of this model into your new best model variable. 156 00:15:09,590 --> 00:15:10,820 And here you have this model. 157 00:15:11,270 --> 00:15:14,150 You can use this model for predicting in the future. 158 00:15:18,190 --> 00:15:24,460 The last thing I'm going to discuss in this lecture is the feature of early stopping in callbacks. 159 00:15:25,510 --> 00:15:26,800 It is a very important feature. 
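Restoring the best model for prediction can be sketched like this; best_model and test_x are hypothetical names standing in for the course's objects:

```r
library(keras)

# Restore the single best checkpoint and use it for prediction.
best_model <- load_model_hdf5("best_model.hdf5")
predictions <- predict(best_model, test_x)   # test_x: hypothetical new data
```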
160 00:15:28,210 --> 00:15:36,490 The basic concept is: instead of running the model for all the epochs, we can specify a large value 161 00:15:36,490 --> 00:15:41,380 for the epochs, run the model, and stop it 162 00:15:41,590 --> 00:15:44,740 when we do not see much improvement in the accuracy. 163 00:15:46,900 --> 00:15:52,930 So basically, instead of running it for only 30, we tell it to run for 400 epochs, 164 00:15:53,860 --> 00:16:00,940 but stop at the point of time when you stop seeing any improvement in a particular metric that we have 165 00:16:00,940 --> 00:16:01,470 specified. 166 00:16:02,350 --> 00:16:08,830 For example, here I'm creating the callbacks list variable. In that variable, 167 00:16:08,920 --> 00:16:09,970 I'm adding two parts. 168 00:16:10,360 --> 00:16:16,540 One part is the same model checkpoint part, which contains the name of the file in which the information 169 00:16:16,540 --> 00:16:17,130 will be stored, 170 00:16:19,180 --> 00:16:26,110 what parameter is to be monitored, and whether to save only the best model or save all the models. 171 00:16:27,730 --> 00:16:30,740 The other part is callback underscore early underscore stopping. 172 00:16:32,650 --> 00:16:39,370 In this part, we tell what parameter is to be monitored and how much patience we have. 173 00:16:39,640 --> 00:16:45,730 That is, for how many epochs will we wait to see whether there is an improvement or not. 174 00:16:46,840 --> 00:16:54,760 If there is no improvement in the validation loss parameter for three consecutive epochs, it will stop 175 00:16:54,850 --> 00:16:56,050 our model training 176 00:16:56,120 --> 00:17:02,080 there only. Basically, earlier when we ran our model, 177 00:17:02,500 --> 00:17:09,370 you can see here that the validation loss is not improving much; only the training loss is decreasing. 178 00:17:09,520 --> 00:17:12,970 But even that improvement is not significant. 
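The two-part callbacks list described above can be sketched as follows; the variable name callbacks_list and file name best_model.hdf5 follow the lecture:

```r
library(keras)

# Two callbacks together: save the best model, and stop training once
# val_loss has shown no improvement for 3 consecutive epochs.
callbacks_list <- list(
  callback_model_checkpoint(
    filepath = "best_model.hdf5",
    monitor = "val_loss",
    save_best_only = TRUE
  ),
  callback_early_stopping(
    monitor = "val_loss",
    patience = 3
  )
)
```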
179 00:17:14,350 --> 00:17:19,270 This is suggesting that we are just overfitting our model. To prevent this, 180 00:17:19,360 --> 00:17:23,890 we'll be using early stopping while training our model. 181 00:17:25,900 --> 00:17:27,370 So we run this code. 182 00:17:29,260 --> 00:17:31,610 This has created our callbacks_list variable. 183 00:17:34,320 --> 00:17:35,770 We delete the previous model, 184 00:17:36,070 --> 00:17:37,460 we clear the Keras session, 185 00:17:38,320 --> 00:17:44,160 then we redefine the architecture of our new model and we configure it. 186 00:17:45,370 --> 00:17:53,620 And now, when we are training our model, we specify a large number of epochs, because we know that the 187 00:17:53,620 --> 00:18:01,210 training will automatically stop when there is no significant improvement in the validation loss. 188 00:18:01,330 --> 00:18:05,860 Here also, again, in the validation dataset 189 00:18:05,890 --> 00:18:07,230 I have given the test dataset only, 190 00:18:07,240 --> 00:18:10,350 and callbacks is equal to callbacks_list. 191 00:18:12,110 --> 00:18:12,880 We run this. 192 00:18:16,840 --> 00:18:26,160 It starts training, but it stops at the fifth epoch only, because after the first epoch, there was some 193 00:18:26,160 --> 00:18:31,320 improvement in the validation loss: from 20 it became 19.5. 194 00:18:32,760 --> 00:18:38,970 But after that, in the next three consecutive epochs, the validation loss did not decrease. 195 00:18:40,500 --> 00:18:43,410 So we stopped training our model 196 00:18:43,690 --> 00:18:49,320 at the fifth epoch only, even though we specified that we would run this model 197 00:18:49,320 --> 00:18:50,270 for four hundred epochs. 198 00:18:50,940 --> 00:18:57,330 We were saved the processing pain of over-training our model because we used early stopping. 199 00:18:58,110 --> 00:19:01,320 This model at the fifth epoch is stored. 
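The final training call might look like this sketch; model_es and the data names are hypothetical, and the model is assumed to be defined and compiled as before:

```r
library(keras)

# Ask for far more epochs than needed; early stopping ends training
# as soon as val_loss stalls, so far fewer epochs actually run.
history <- model_es %>% fit(
  x = train_x, y = train_y,                 # hypothetical data names
  epochs = 400,
  validation_data = list(test_x, test_y),   # test set doubles as validation
  callbacks = callbacks_list
)
```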
200 00:19:01,350 --> 00:19:09,920 And this best model is stored in the .h5 file; you can load that model using the load_model_hdf5 function. 201 00:19:12,930 --> 00:19:16,000 Well, that's all in this lecture. 202 00:19:16,110 --> 00:19:21,720 We saw how to save all the information of a model into a separate file. 203 00:19:23,490 --> 00:19:28,950 The benefit of this is you can share this file with your colleagues, with your students or anyone in 204 00:19:28,950 --> 00:19:29,520 your office. 205 00:19:30,300 --> 00:19:37,380 And whenever someone loads a model from that file, that model will have the exact same architecture 206 00:19:37,500 --> 00:19:39,380 and the weights that your model had. 207 00:19:42,090 --> 00:19:44,700 Then we saw the importance of using callbacks. 208 00:19:45,300 --> 00:19:48,900 Callbacks helped us save the model at each epoch. 209 00:19:49,620 --> 00:19:51,090 Then we saw how to use 210 00:19:52,350 --> 00:19:58,500 the save_best_only parameter to save only one file 211 00:19:58,710 --> 00:20:01,680 instead of saving separate files for each epoch. 212 00:20:03,420 --> 00:20:10,710 And lastly, we saw the use of the early stopping functionality, so that we can prevent overfitting and 213 00:20:10,860 --> 00:20:13,320 excessive processing time to train our model. 214 00:20:15,210 --> 00:20:15,650 Thanks.