How exciting is this? We've come a long way: from battling to get our images and their labels into tensors, to preparing our data, choosing a model on TensorFlow Hub, and creating model callbacks. We're finally at the stage where we're ready to train our first deep learning model, so let's create a heading: training a model.

Now, we're going to train on a subset of the data, because if you remember back when we split our data up, we're only going to start training on 1,000 images. Why do we want to do that? Let's write it here: our first model is only going to train on 1,000 images, to make sure everything is working.

That's what we're trying to do: minimize our time between experiments. So we begin by training on a subset of the data to make sure all of this code works — I mean, it might break, right? We want to make sure it's working before we spend a long time training on 10,000 images, because training on 1,000 images is going to go a lot faster than on 10,000. And you can imagine the same goes if you increase it to 100,000 images, or, in the case of ImageNet, 14.2 million.

There's one more variable we have to define before we can get into training a model, and that's the number of epochs. The number of epochs is how many passes over the data we'd like our model to do — and you can imagine a pass as being our model trying to find patterns in each dog image and seeing which patterns relate to each label. So that's going to be NUM_EPOCHS = 100, and we're going to do something cool here: we'll create a little slider like we have before, with #@param, type slider, a min of 10, a max of 100 and a step of 10. How cool is that?
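In the notebook, that cell looks something like this (a minimal sketch of a Colab form slider — the variable name is just what we're calling it here):

```python
# Number of passes over the training data (adjustable via the Colab form slider)
NUM_EPOCHS = 100 #@param {type:"slider", min:10, max:100, step:10}
```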
So if we wanted to do 10 epochs we could set it to 10, but we're going to stick with 100 because we've got our early stopping callback — we're going to see all of this in action in a second. Essentially, if we go back to our keynote, one epoch means giving our model a single chance to look at all of the training data and then validating itself from there. With 100 epochs, we're giving our model up to 100 chances to go through the training dataset and figure out the patterns.

So what does a pass actually do? We pass our training images to our model, they go through its layers, and as the model learns patterns it makes guesses about which label belongs to which image — and the worse the guesses are, the higher the loss will be. That's where Adam comes in. Remember that guy down the bottom of the hill at the International Hill Descent Championships? He's telling our model how it can improve its guesses with each epoch. And then we're going to see how our model is doing on an accuracy level, because we've got that spectator judge at the bottom of the hill watching how well our model is performing. Again, if all this doesn't make sense yet, it will make more sense once we actually start to run a model.

So let's do one final check — one last time to make sure we're using a GPU — because if we're not using a GPU, training our model on images is going to take a very, very long time. You saw right at the beginning that a GPU can speed up our code by up to 30 times, and sometimes more. Let's write that code: check to make sure we're still running on a GPU, so print "GPU available" if tf.config.list_physical_devices("GPU") finds one, else "not available". This is just the same code we wrote above, but we need a GPU to be able to run this in a reasonable amount of time. Yes, we could also have used that little shortcut trick from before, but I love seeing this printout.
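As a one-liner, that check looks roughly like this (a sketch — the printed wording is just whatever you'd like it to say):

```python
import tensorflow as tf

# Confirm a GPU is visible to TensorFlow before kicking off training
print("GPU", "available (YES!)" if tf.config.list_physical_devices("GPU") else "not available")
```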
Boom — we've got a GPU running and we've got a number of epochs set up, so let's create a simple function which trains a model. Let's write that here: create a function which trains a model. The function is going to create a model using create_model, and then set up a TensorBoard callback using our TensorBoard function. You see, this is where writing functions has really come into play for us.

Otherwise we'd be writing code ad hoc. Writing functions to begin with takes a little longer, but in the long run it saves us time. Then we'll call the fit function on our model, passing it the training data, the validation data, the number of epochs to train for — which is NUM_EPOCHS, we just defined that with that sick little slider — and the callbacks, which are the helper functions we'd like to use. And then finally we want it to return the model. We'll change this cell into markdown — that is looking phenomenal.

Okay, let's go: build a function to train and return a trained model. So here we are: def train_model — wow, I can't believe we're finally up to this. This is amazing. We're training our first deep learning neural network. The docstring: "Trains a given model and returns the trained version." Nice and simple. Remember, even though we're getting excited, that doesn't mean we're losing our communicative habits.

So first we'll create a model: model = create_model() — this is where our create_model function comes in. Oh yeah, that's so satisfying to write. Then we'll create a new TensorBoard session every time we train a model — remember, with our TensorBoard callback... actually, we just call this one tensorboard, we don't need "callback" in the name. We set up that function earlier: if we come back up here, every time it's called it creates a new folder in logs with the current date and time. This is important because every time we run our train_model function — every time we train a new model, a new experiment — we create a logs folder which we can then use to track our model's performance. That's really helpful later on when we're trying to evaluate which experiment did better than another.

And then: fit the model to the data, passing it the callbacks we created. Now, again, there's probably a better way to write this function, but we're just trying to make it work; we can come back and refine it after we've got through the phase of fitting a model for the first time. So we've got model.fit, and we're going to pass it x=train_data, which is a data batch — if you remember, it contains the images and the labels. Then epochs is going to be NUM_EPOCHS, a.k.a.
how many times our model is allowed to look at the training data before it stops — or, how many chances our model has to pass over the entire training dataset to find patterns. Then the validation data is going to be val_data, which is a data batch as well (I almost said data bunch — sorry). The validation frequency is how often we want to test the patterns our model has found on our validation set, and we set it to 1 because we want it to test the patterns it's found in the training set on the validation data every epoch — so, once per epoch. And then our callbacks are tensorboard, which we just created above with our function, and early_stopping. Whew. And then we want to return model, with a little comment: return the fitted model. How cool is that?

And now we can fit our model to the data as simply as going model = train_model(). You see the benefit of creating functions: we can run this same function again after doing a bunch of different experiments later on.

So, are you ready? A little tidbit here: when training a model for the first time, especially on image data or any other kind of large-scale data, the first epoch will usually take the longest compared to the rest. That's because the functions we've written are fetching the data and it's being initialized — a.k.a. loaded into the memory of our GPU. Using more data will generally take longer, which is why we've started with 1,000 images. And even though we've capped training at 100 epochs, the first one might take a couple of minutes; after that, each subsequent epoch should only take a couple of seconds.

So, are you ready? We're going to do this together — fingers crossed all of our functions work, because this actually depends on a fair few lines of code up here.
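Putting all of that together, the finished function and the call to it look something like this (a sketch — create_model, create_tensorboard_callback, early_stopping, train_data and val_data are the pieces we built in earlier videos, so treat the exact names as assumptions if yours differ):

```python
# Build a function to train and return a trained model
def train_model():
  """
  Trains a given model and returns the trained version.
  """
  # Create a model (using the create_model function from earlier)
  model = create_model()

  # Create a new TensorBoard session every time we train a model
  tensorboard = create_tensorboard_callback()

  # Fit the model to the data, passing it the callbacks we created
  model.fit(x=train_data,
            epochs=NUM_EPOCHS,
            validation_data=val_data,
            validation_freq=1,
            callbacks=[tensorboard, early_stopping])
  # Return the fitted model
  return model

# Fit the model to the data
model = train_model()
```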
124 00:11:21,330 --> 00:11:28,350 So if we divide that by thirty two we get twenty five so that's why training is twenty five steps because 125 00:11:28,380 --> 00:11:32,850 there's batch size 32 and validation. 126 00:11:32,850 --> 00:11:40,220 If we go 200 divided by 32 it's gonna rounded up so it's rounded it up to seven. 127 00:11:40,380 --> 00:11:46,810 That's why we have validation so 200 images over batch sizes 32 and check this out. 128 00:11:46,810 --> 00:11:51,690 There we go 88 about five minutes. 129 00:11:51,710 --> 00:11:54,050 And so here's what I'm talking about with the loss. 130 00:11:54,050 --> 00:11:59,160 So now all these things that we created right up here so the loss. 131 00:11:59,180 --> 00:12:06,080 So our goal is remember to minimize the loss because we're at the top of the hill we're afraid of heights. 132 00:12:06,220 --> 00:12:09,350 We want to get to the bottom of the hill you want to minimize the loss function. 133 00:12:09,400 --> 00:12:11,620 The loss is the height of the hill. 134 00:12:11,830 --> 00:12:20,690 We come down here so hopefully this number goes down but we want to maximize accuracy so that's how 135 00:12:20,690 --> 00:12:26,120 we'll know if our model is training or not if it's learning patterns this loss should go down in this 136 00:12:26,130 --> 00:12:34,010 accuracy should go up so it's gonna take a few minutes to get through the first epoch I'm going to wait 137 00:12:34,070 --> 00:12:40,640 until it's gone through the first epoch and then so I'll speed this video up and then I'll see you once 138 00:12:40,640 --> 00:12:46,480 it's past the first epoch because you're gonna see subsequent epochs after this so epoch 2 out of 100 139 00:12:46,490 --> 00:12:49,700 is gonna be pretty quick because we use the GP you. 140 00:12:49,820 --> 00:12:55,190 The only reason the first one takes a while is because it has to load all of those images into the GP 141 00:12:55,190 --> 00:12:56,720 Q memory. 142 00:12:56,720 --> 00:13:03,210 I'll see you pretty instantaneously but for me it'll be about three minutes Alrighty I'm back. 143 00:13:03,210 --> 00:13:09,120 So we've got an ACA here which stands for estimated time of arrival is about 12 seconds left in this 144 00:13:09,120 --> 00:13:09,890 first epoch. 145 00:13:09,900 --> 00:13:16,920 As long as everything goes to plan and so you can see the loss has reduced slightly and the accuracy 146 00:13:16,920 --> 00:13:17,880 has increased. 147 00:13:18,600 --> 00:13:20,300 So this is a good thing. 148 00:13:20,340 --> 00:13:26,640 Now what we should see at the end of this first epoch if we set up correctly we should say this is lost 149 00:13:26,640 --> 00:13:29,370 on the training set and accuracy on the training set. 150 00:13:29,550 --> 00:13:36,480 We should see the loss on the validation data and the accuracy on the validation set. 151 00:13:36,480 --> 00:13:40,310 Pop up at any second now so we're just waiting. 152 00:13:40,320 --> 00:13:45,840 So probably what it's doing now is it's been a little bit longer than 12 seconds but it's loading the 153 00:13:45,840 --> 00:13:54,070 validation data into memory so it can calculate and evaluate how it's performing on that dataset. 154 00:13:54,080 --> 00:13:58,110 I can't believe we're doing this we like training a model together in real time. 155 00:13:58,130 --> 00:13:59,390 This is phenomenal. 156 00:13:59,420 --> 00:14:00,930 I've never done this before. 157 00:14:04,580 --> 00:14:09,210 That is as long as your is working here we go. 
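While that first epoch finishes, here's the steps-per-epoch arithmetic from a moment ago as a quick sanity check (a sketch assuming the 1,000-image subset, the 80/20 split and the batch size of 32 from earlier):

```python
import math

BATCH_SIZE = 32          # batch size we used when creating the data batches
num_train_images = 800   # 80% of the 1,000-image subset
num_val_images = 200     # 20% of the 1,000-image subset

train_steps = num_train_images // BATCH_SIZE        # 800 / 32 = 25 steps per epoch
val_steps = math.ceil(num_val_images / BATCH_SIZE)  # 200 / 32 rounds up to 7 steps

print(train_steps, val_steps)  # 25 7
```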
And look at that — there they are. This is what I was talking about: this metric is on the training data and this one is on the validation data. See how quickly our accuracy is improving? I told you the epochs were going to speed up — the first one took about five minutes, but these are taking about five seconds. We're already at 1.0 accuracy — that's 100 percent accuracy on the training data. That is insane.

What this is actually telling us — and it's a good thing — is that our model is overfitting, because it's performing way better on the training data than it is on the validation data. But this is great: our model is working, it's finding patterns, it's taking what MobileNetV2 learned on ImageNet and applying it to our dog dataset. Dog Vision is coming to life. So stoked. I hope yours is running like this as well, because we're training a model together in real time — this is beautiful.

Now, what we should see is that once the validation accuracy stops improving for a number of epochs — three, because of our early stopping callback — it's going to stop training. So I don't think we'll actually reach the 100 epochs, maybe not even close, because see, here we're getting 1.0 accuracy with a tiny loss on our training data. And here we go... it stopped. My heart is racing — that was phenomenal.

So you can see what I meant: the first epoch takes a while because it's loading data into memory, but once that initial data is loaded, the GPU just goes "all right, time to step on the gas" — turn on the afterburners, or gas, no brakes, maybe. We reached 18 out of 100 epochs, and we can see that our model is performing at 100 percent accuracy on the training dataset. That's pretty crazy: 1.0 — you can multiply that number by 100. And the loss is fairly low — I mean, a loss of zero would be perfect — but we can tell that our model is overfitting, because it's performing way better on the training data than it is on the validation data.

If we come back to the analogy: it's like our model has memorized the course materials rather than the problem-solving principles behind those course materials, and it's struggling to adapt to a dataset it hasn't seen before — a.k.a. a practice exam, or our validation set.
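As a refresher on why it stopped at epoch 18: the early stopping callback we created a couple of videos ago looks something like this (a sketch — monitoring validation accuracy with a patience of three epochs, as described above):

```python
import tensorflow as tf

# Stop training if validation accuracy hasn't improved for 3 epochs in a row
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_accuracy",
                                                  patience=3)
```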
195 00:16:40,870 --> 00:16:45,770 So right now our model has a poor ability to generalize. 196 00:16:46,080 --> 00:16:48,320 So that was I think that's enough for one video. 197 00:16:48,330 --> 00:16:55,320 That's a lot of excitement what I want you to do a little bit of homework is before we go and check 198 00:16:55,320 --> 00:17:01,590 the tensor board box in the next video because we've used our tensor board callback before. 199 00:17:02,040 --> 00:17:07,120 I want you to check out this is a question is this is a really important one. 200 00:17:07,230 --> 00:17:12,730 After we've tried to model I want you to go question. 201 00:17:12,820 --> 00:17:19,690 It looks like our model is over feeding because it's performing 202 00:17:22,300 --> 00:17:35,290 far better on the training data set than the validation dataset What are some ways to prevent model 203 00:17:35,380 --> 00:17:46,470 over feeding in deep learning neural networks so we'll turn that into markdown. 204 00:17:46,540 --> 00:17:49,660 That's a question I want you to look up before we get into the next video. 205 00:17:50,830 --> 00:17:56,440 What are some ways to prevent model over feeding and deep known networks so check that out even if you're 206 00:17:56,440 --> 00:17:57,190 not sure of them. 207 00:17:57,190 --> 00:18:01,020 I just want you to start getting curious about what's going on. 208 00:18:01,020 --> 00:18:06,550 I want you to stop picking up on the clues as to when your model is over feeding such as performing 209 00:18:06,550 --> 00:18:10,290 far better on the training data set than the validation data set. 210 00:18:10,810 --> 00:18:20,330 And a little note here note over fitting to begin with is a good thing. 211 00:18:20,620 --> 00:18:29,890 It means our model is learning so that I'm phenomenally excited that we have just trained our first 212 00:18:29,890 --> 00:18:36,170 deep learning neural network together using transfer Learning dog vision is coming to life. 213 00:18:36,190 --> 00:18:41,110 So we finish training a model at least a first subset of a model. 214 00:18:41,110 --> 00:18:42,190 This is not on the full data. 215 00:18:42,190 --> 00:18:43,660 This is only on 1000 images. 216 00:18:43,660 --> 00:18:50,500 So our next step is to if we come into our keynote what are we up to so we fit the model to the data 217 00:18:50,560 --> 00:18:52,150 we haven't made a prediction yet. 218 00:18:52,660 --> 00:18:59,320 So what we might do is evaluate the model using our tensor board callback so you can see that and then 219 00:18:59,320 --> 00:19:06,640 we'll figure out how to make a prediction with our trained model sound like a plan so check out this 220 00:19:06,640 --> 00:19:09,110 question and I'll see you in the next video.