In this lesson we will train our model. We will fit the model to our data and start to update the weights in our neural network. So let's add a markdown cell in our Jupyter notebook and commemorate this. This markdown cell will read "Fit the model".

But how do we actually go about doing this? Well, the first place to check is the Keras API documentation. The relevant section is called "Model (functional API)". If we scroll on here we see the compile method, but we also see this fit method, and the fit method can take a number of different arguments. So let's call this fit method and supply our training data for the x and our labels for the y. This is where our really tiny datasets will come in handy, because we're going to use these when we're iterating and trying things out. So let's write model_1.fit, parentheses, x_train_xs (XS for extra small) and y_train_xs.

Now, at the very top of the cell, before I hit Shift+Enter, I'm going to add some micro-benchmarking code: two percent signs and the word time (%%time) will tell us how long the Jupyter notebook takes to execute this cell. Now let me hit Shift+Enter on the cell, and what I see is that it took about two seconds to execute. But how did that go? Did we do the job? Well, at this point I have no idea.

So this is where we're going to use TensorBoard. TensorBoard will help us figure out what's going on behind the scenes. Now the question is: how do we get the information from our model into TensorBoard? Well, let's look back at the documentation, and here we see this parameter called callbacks, which by default is set to None. But we've seen this word callbacks somewhere before, right? I recall that in our import statements, TensorBoard was actually a callback. So this means that we can actually use TensorBoard as a callback. Now is the time that we get to use the get_tensorboard function that we created earlier. Let's add that callbacks argument to our fit method now. So I'll add a comma and write callbacks equal to, then square brackets, get_tensorboard, open parentheses, and I'll supply my model name. I'll call this one 'Model 1'. Why do I have the square brackets here?
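For reference, the cell looks roughly like this at this point. This is a sketch: the names x_train_xs and y_train_xs, and the get_tensorboard helper, come from earlier lessons, and get_tensorboard is assumed to return a Keras TensorBoard callback that logs into a run-specific folder.

```python
%%time
# Fit the model on the extra-small training set and log the run to TensorBoard.
# get_tensorboard() is the helper written earlier in the course; it is assumed
# to return a keras.callbacks.TensorBoard instance pointed at a fresh log folder.
model_1.fit(x_train_xs, y_train_xs,
            callbacks=[get_tensorboard('Model 1')])
```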
Well, because callbacks is expected to contain a list, and we can see this if we go to the documentation and scroll down to where it says callbacks. Here the Keras documentation very clearly states that a list of callbacks is expected. So let's head back to our Jupyter notebook and hit Shift+Enter on the cell, and our output now reads "successfully created directory", which we can see right here. My training time this time was around 1.2 seconds, but I don't see TensorBoard anywhere. So how do we get hold of it?

Well, I actually have to open a terminal, or the Windows command prompt, again at this point. So I open up another window here (New Window), and then I write tensorboard and, as an argument, I supply my logging directory: --logdir equals, well, this folder right here. This is the logging directory that we've created. Since I'm on a Mac, all I have to do to get the path to this folder is drag and drop it into my terminal, and there I see my path: Users, then my projects folder, then tensorboard_cifar_logs. Let me hit enter on this right now, and I'll show you how to get this path very quickly on Windows in a minute.

Once my command executes successfully, what I get as a result is this URL here. I can copy this URL, go back into my browser, paste it in, and this is where TensorBoard lives. So here's how you can get the path very quickly on Windows: go to your tensorboard_cifar_logs folder, open it, and up here you see the address bar. Click on it and you see the path for this folder, which you can copy. Then, when you open your Anaconda prompt, all you need to do is type tensorboard, space, hyphen hyphen logdir, equals, and then right-click to paste. Right-click to paste is the trick.
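So the command looks roughly like this. The folder name tensorboard_cifar_logs comes from the earlier setup in this course; substitute the path to your own logging directory:

```
tensorboard --logdir=/Users/<your-name>/<your-projects-folder>/tensorboard_cifar_logs
```

TensorBoard then prints a URL (typically something like http://localhost:6006) that you can open in your browser.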
This is not exactly what we expect. So let's dig into the documentation and see why that might be. When we look at our fit method, we see two additional parameters: batch_size equals None, and epochs equals 1. So what does that mean? Let's tackle the epoch first. What's an epoch? Well, an epoch is when the entire dataset has been passed through the neural network a single time. Considering that the default value for epochs is 1, it also makes sense to see only a single data point on our TensorBoard.

Now, passing the entire dataset through the model a single time almost sounds like it should be enough, right? Well, unfortunately the answer is no. As we saw in the gradient descent module, the optimization process really is an iterative process. Meaning, yes, the weights were updated that one time that we ran the fit method, but a single pass is actually not enough. We have to pass the entire dataset through the network again and again and again for the neural network to differentiate amongst our images.

But wait, what if we have lots and lots of training data? What if we have a huge dataset? Well, here the practicalities of training a neural network set in. If your computer is powerful enough to handle your entire dataset, or your dataset is small enough, then yes, you can probably process everything at once. But usually what you have to do is split up your dataset and process one piece of it at a time. A single piece of your dataset is called a batch. And if you're splitting up your dataset like this, then you're training your model on one batch at a time. So that means it will take multiple iterations to actually go through the entire dataset.

The formula for the number of iterations that it takes to chew through the entire training data is actually pretty straightforward: all you need to do is divide the number of training samples by the number of samples in the batch. That's how you can work out how many iterations it takes to feed the entire dataset through the model a single time. In other words, if you have 100 data points in total and your batch size is equal to 50, then you would need two iterations to go through the entire dataset. It would take you two iterations to complete a single epoch.
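As a quick sanity check, you can work this out in a couple of lines of Python. This is just a sketch of the arithmetic; note that in practice the last batch may be smaller, so you round up:

```python
import math

n_samples = 100   # total number of training data points
batch_size = 50   # samples per batch

# Number of iterations (batches) needed to complete one epoch.
iterations_per_epoch = math.ceil(n_samples / batch_size)
print(iterations_per_epoch)  # -> 2
```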
So the number of iterations is just the number of batches needed to complete one epoch. With this vocabulary out of the way, let's head back into Jupyter and change both the batch size and the number of epochs that we're using to train our model. So what I'll do is add another cell here and create a variable called samples_per_batch, and I'll set that equal to 1000. Now, we've got some micro-benchmarking code in the cell below, where we're fitting our model, and what I encourage you to do is try out different batch sizes and see how long it takes the cell to execute.

Now let's set the number of epochs equal to 20. Let's train our model for 20 epochs. I'll create another variable in this cell that reads nr_epochs, and I'll set that equal to 20. Then, inside our fit method, I'm going to add another comma here before callbacks, and here I'll add batch_size, which I'll set equal to samples_per_batch. My autocomplete isn't working because I haven't actually hit Shift+Enter on the cell. Now I can also add my epochs, whose parameter name is, well, epochs, and that I can set equal to nr_epochs. Now this single line is getting very, very long, so what I'll do is put my callbacks on the line below. Now let me hit Shift+Enter on the cell, and we can see our model being trained throughout the 20 epochs.

Now, as part of the output, you can see the total amount of training data, you can see the loss that was calculated, and you can see the in-sample accuracy per batch. One thing that you'll notice is that in this case our model isn't learning anything; it's just guessing randomly. It's getting about 12 percent right. Meaning, if it has 10 things to classify and it's getting about 10 percent accuracy, it's just randomly guessing. Now, we would have hoped to see some learning happen by epoch number 20. The fact that we haven't means there's a bit of a problem. We could have this problem for a number of reasons, but one possible reason is that maybe we have a bad starting point. Maybe our optimizer is stuck and it cannot minimize this loss any further. So one thing we can try is to go up to where we're defining our model and recompile it. So: just hitting Shift+Enter on that cell, coming back down here, and rerunning our fit method.
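Here is roughly what the two cells look like now. Again a sketch, using the variable names from this lesson:

```python
# In a separate cell: the training configuration.
samples_per_batch = 1000  # try different values and compare the %%time output
nr_epochs = 20            # pass the full dataset through the network 20 times
```

```python
%%time
model_1.fit(x_train_xs, y_train_xs,
            batch_size=samples_per_batch,
            epochs=nr_epochs,
            callbacks=[get_tensorboard('Model 1')])
```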
What do we see now? In this case we see that some things are happening. As we scroll down, our model does indeed appear to be learning. It started out at about 10 percent in epoch number two, and by epoch number 20 it had improved this to about 16 percent.

Now here's the big advantage of TensorBoard: it makes this a lot more clear. If we refresh our page, it's going to redraw all our graphs, and what we see here now, if we enlarge this slightly, is our three runs, right? Our very first run was just one epoch. In our second run, our model did not learn anything; the training accuracy stayed constant. In our third run, we can see how the training accuracy evolved over time, and what we see is this improvement from around 10 percent to around 16 percent.

The way that TensorBoard works is that it reads files from our disk, these so-called event files. This is where it pulls the data from to draw these charts. Now, I can actually see I've got four folders here but only three charts. That's because last night I created this empty folder here, which does not contain one of these event files, so I'm just going to delete this folder. And of course, if I delete one of these event files and refresh my page, then the data will also disappear from TensorBoard. So let's take that blue second run here, at 11:39. I'll delete the folder and then refresh my page. And here we go: it disappeared from TensorBoard.

Now, back to some more urgent questions. Seeing our model learn over the course of 20 epochs seems promising, right? Perhaps all we need to do is run our model for longer. So let's come up to where we're training our model and change the number of epochs from twenty to one hundred and fifty. Now, if I don't want to see all this output, I can actually mute it. Looking at the Keras documentation, there's a parameter called verbose. By default it's set to 1, but we don't have to leave it set to 1. If we set verbose equal to zero, then we mute a lot of this output. We're getting all of this in TensorBoard anyhow, so let's try this. Let's see what happens when I run my model with one hundred and fifty epochs. In this case it's going to run a lot longer.
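The training cell now looks something like this sketch. Setting verbose=0 silences the per-epoch console output, while the TensorBoard callback still writes everything to disk:

```python
%%time
# verbose=0 mutes the epoch-by-epoch progress output in the notebook;
# the metrics still reach TensorBoard through the callback's event files.
model_1.fit(x_train_xs, y_train_xs,
            batch_size=samples_per_batch,
            epochs=nr_epochs,   # nr_epochs is now set to 150
            verbose=0,
            callbacks=[get_tensorboard('Model 1')])
```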
I've successfully created my directory, my model is running, and if I go into TensorBoard and refresh it, then what you should see is your new graph appearing. The red line here is clearly above the blue line. Now, one thing I don't really like is how it's going off the chart here. But more importantly, have a look at the starting point of this red line. My latest run actually starts at around 16 percent, and this is no coincidence: the previous run finished at around 16 percent. So if I rerun this cell over and over again, the model actually picks up training where it left off. So if we want to compare the different runs starting from scratch, starting from zero, instead of adding to them, what we have to do is come back up to the cell where we compile the model, rerun it, and then come back down here and fit our model.

In this case, when I go back to TensorBoard and wait a little bit, I can actually see that it picks up the new file that was created with the latest run, right here at 11:52. And it actually starts plotting while my code is running. So every once in a while this chart will update with the latest run, grabbing the latest file from the disk even as the training is in progress.

So what we see now is that we're starting at a similar starting point as the previous runs where we started from scratch. Then the optimizer is working, working, working, and improving the accuracy on the training data. At the same time, when we scroll down here, the loss on the training data is decreasing and decreasing and decreasing. So this seems promising, right? After about 150 epochs we're at 40 percent accuracy. But that's on the training data, right? What about our validation data? What about images that this model has not seen yet?

Looking at the Keras documentation, we can see that there is another parameter that we can supply, called validation_data. The fit method expects to receive the validation data as a tuple. So let's add that to our fit method and rerun the cell. I'll add a comma and then type validation_data equals, then parentheses to create that tuple: x_val comma y_val. Now what I'll do is quickly come back up here, rerun the compile cell, and rerun our fit. Back in TensorBoard, I'm going to refresh the page, and now I can see my latest run in pink.
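Putting it all together, the fit call now looks roughly like this. x_val and y_val are the validation arrays prepared in an earlier lesson:

```python
%%time
model_1.fit(x_train_xs, y_train_xs,
            batch_size=samples_per_batch,
            epochs=nr_epochs,
            verbose=0,
            # Keras evaluates loss and accuracy on this tuple at the end
            # of every epoch, without training on it.
            validation_data=(x_val, y_val),
            callbacks=[get_tensorboard('Model 1')])
```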
I quite like the fact that TensorBoard scaled this graph a little better, and I can see my computer doing work, that little pink dot moving up as the model is being trained. You see that after fifty-five steps, fifty-five epochs, we've got about 32 percent accuracy on the training data. Back in my Jupyter notebook, I can see that this whole run took about one minute and 12 seconds. Mind you, my computer does not have a dedicated GPU, so this is definitely taking a little longer on my four-or-five-year-old machine than it would on something a little more powerful.

So let's see how this is going in TensorBoard. The training accuracy once again seems like it's off the chart, right? So let me try and refresh this page again and see if we can redraw the plot so that it shows us where it actually ends. If we're lucky, it does just that. And we are: brilliant.

Now, one thing that you might ask at this point is: hold on a second, why is it that all these lines are different? We're using the same data and the same number of epochs. Why am I always getting a different chart? Well, the thing is, there's some randomness involved in the optimizer. One way the randomness comes in, for instance, is the starting point from which the optimizer starts to optimize. And what we saw the very first time that we fitted our model was that our model didn't learn anything at all. So at this point you can see how that might just have been a fluke, right? That might have been just bad luck. By fitting the model a couple of times, we can actually see how each training run is going. This latest one, ending with 55 percent accuracy after one hundred and fifty epochs, seems to have gone particularly well, right?

But remember, this is the training accuracy. Our model is learning to classify our training dataset. Let's scroll down here, and what we should see, having added the validation dataset to our fit method, is two new sections down here in TensorBoard. The first one is the validation accuracy, and because our previous runs did not include the validation data, they are not appearing on this chart. So let's see how we're doing.
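As an aside, if you ever want two runs to start from the same random starting point, you can pin the random seeds before building and compiling the model. We don't do this in the lesson; this is just a sketch of the idea using the standard seeding functions:

```python
import numpy as np
import tensorflow as tf

# Fixing the seeds makes the initial weights, and hence the optimizer's
# starting point, reproducible between runs (TensorFlow 2 API shown;
# TensorFlow 1 used tf.set_random_seed instead).
np.random.seed(42)
tf.random.set_seed(42)
```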
After about 150 epochs we're getting about 30-odd percent accuracy on the validation dataset, and I can tell you that the validation accuracy is generally going to be lower than your training accuracy. But let's take a look at this last section, the validation loss. Now, this is very interesting. What we see is that our validation loss starts high and then comes down as we're training: 2.2 at step 18, 2.1 at epoch number 48. The validation loss is increasing and decreasing, getting a little more jagged, but trending down. And by epoch 100, what we're actually seeing is that our validation loss starts to creep up again. Our validation loss starts to increase. This is actually a problem, and this is why evaluating your model on the validation dataset is so important. The problem that we're seeing here has to do with overfitting, and in the next lesson we're going to see how we can tackle this.

Now, even though we haven't written a lot of code in this lesson, we've actually covered quite a few different concepts. The first thing that we saw was how we can fit our model, and that was using the fit method. But one thing that we learned about the fit method was that it starts where it left off. So if we've trained the model for 20 epochs and we call the fit method again, it will remember what the values of the weights were at the end of the previous run. The next thing that we learned was that not every run is the same. There is some randomness baked in, and the optimizer will start at a different starting point every time you run the fit method. The third thing that we learned was the process of passing the dataset through the network, and that this process is iterative: we have to pass the entire dataset through the network more than once in order for the weights to become meaningful. We also learned the technique of splitting up our data into batches. And finally, we learned a little bit more about TensorBoard. We learned how the TensorBoard callback creates these event files on our disk, and how TensorBoard then reads from these files to create our charts.

Now, I know this was a very dense lesson with a lot of information to take in. But don't worry, we'll be reinforcing a lot of these concepts in the upcoming lessons. I'll see you there. Take care.