All right. So in this lesson what we're gonna do is vary the hyperparameters and experiment with a dropout layer. We're also going to look at a new summary in TensorBoard, namely the histogram summary. All of this will give us a chance to run our model for a different number of epochs, so we can see how our accuracy changes as we continue training our model. We can also vary the learning rate and we can change the architecture. So let's get on it.

The very first thing I'm going to do is go up to the function where I'm creating all my layers. In this function I'm going to add two more summaries, because I'm very much interested in what happens to my weights and my biases as my model continues training. In order to see what happens with our weights and our biases, we're going to use the histogram summary: tf.summary.histogram, open parentheses, then 'weights' in single quotes as the name for the summary, and then the values are going to be my lowercase w. I want to be tracking my variables for all my layers in this histogram summary, and then I'm also going to track my biases. So once again, this is gonna be a histogram summary; I'm going to call this one 'biases' and the value is going to be lowercase b.

The next thing I want to do is scroll down a little bit. Since this here is actually where we're setting up our model, I'm going to create another variable here called model_name, and this one is gonna be an f-string. In this model name I'm going to be very descriptive: I'm gonna put down the architecture of this model. So n_hidden1 in curly braces, a hyphen, then n_hidden2 in curly braces; then LR for the learning rate, followed by learning_rate in curly braces (that's this one right here); and at the end maybe a capital E followed by nr_epochs in curly braces. That way we'd have a very specific model name that includes the number of neurons, the learning rate and the number of epochs, capturing some of these hyperparameters in the model name.

What I'll do next is, where I'm setting up the directories for TensorBoard, I'm actually gonna change this from 'Model 1' to curly braces with model_name inside. So now when we're setting up our directories, we get the model name in the directory. This means the model name is gonna show up here, and it's going to show up on the runs on our graph.
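As a rough sketch, here is what that layer function with the two new histogram summaries and the descriptive run name might look like. The helper name setup_layer, the variable names (w, b, n_hidden1, n_hidden2, learning_rate, nr_epochs), the initializer details and the log folder are my assumptions based on this walkthrough, not the notebook's exact code:

```python
import tensorflow as tf  # TF 1.x style API, as used in this lesson

def setup_layer(input_tensor, weight_dim, bias_dim, name):
    """Fully connected layer that also logs histograms of its weights and biases."""
    with tf.name_scope(name):
        w = tf.Variable(tf.truncated_normal(shape=weight_dim, stddev=0.1), name='W')
        b = tf.Variable(tf.constant(0.0, shape=bias_dim), name='B')
        layer_out = tf.nn.relu(tf.matmul(input_tensor, w) + b)

        # Track how the weights and biases evolve over the course of training
        tf.summary.histogram('weights', w)
        tf.summary.histogram('biases', b)

        return layer_out

# A descriptive run name that captures the architecture and hyperparameters
n_hidden1, n_hidden2 = 512, 64
learning_rate = 1e-4
nr_epochs = 50
model_name = f'{n_hidden1}-{n_hidden2}_LR{learning_rate}_E{nr_epochs}'

# The run name becomes part of the TensorBoard log directories,
# so each run shows up under its own name in TensorBoard
train_writer = tf.summary.FileWriter(f'MNIST/logs/{model_name}/train')
validation_writer = tf.summary.FileWriter(f'MNIST/logs/{model_name}/validation')
```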
Now, we've done quite a few runs already, so what I'm gonna do is delete all the folders inside my MNIST digit logs directory. That way, when I re-run my cells, I have a blank slate. The next thing I'm going to do is change the number of epochs from five to 50. So we're gonna do a whole lot more training, and we're going to be able to see how our loss and our accuracy evolve over time as we continue training our model. So let's get on it: I'll select this cell here and then say Run All Below. And now we play the waiting game. As you can see, training for 50 epochs takes a lot more time than just training for five. If you're running all of this in the Google Colab notebook, then this is gonna run a lot quicker, but you'll have the disadvantage of not being able to follow along on TensorBoard so easily.

All right, so let me refresh TensorBoard. I get a little error message here, but that's not a problem; the reason is that we're still in the process of training, so I think it's a bit confused. Since we've already talked about our graph and we've talked about the images, I'm going to look at my scalars. Here I have my two metrics: my accuracy metric and my cost (this is my loss, remember), grouped in the same section. The section is called performance, because performance is the name scope for both of these summaries.

I'm going to enlarge my chart here. In orange I've got the accuracy on the training dataset, and in blue I've got the accuracy on the validation dataset. What I can see is that most of the learning takes place in the first ten epochs, and then it takes longer and longer and longer to get that extra little bit of accuracy. So going from 86 to 87 percent accuracy takes a lot longer than going from, say, 60 to 70 percent accuracy. In other words, this is the classic shape of a chart when you've got diminishing returns.

Now let's scroll down to our losses. In the last module we talked about how it was a problem when our validation loss stopped going down: when the validation loss starts to stabilize and even starts going up, then we're overfitting our model. In this case we can see that it's still going down at epoch number 50, but not by much.
Now, what about these other tabs that showed up, distributions and histograms? The reason we get those two new tabs is because we've added the histogram summary. These two lines of code are now going to show us a histogram and a distribution for our weights and our biases over time. Check it out.

Let's start out looking at the histograms. What I'm going to do is take just our training run here, and then I'm going to make these charts a little bigger, and bigger, and bigger. I'll expand the output layer here, and the hidden layer before it, and make that one bigger too. Now let me scroll to the very top and take a look at what this is actually showing us, because it's very pretty, but what does it mean?

Well, what it's showing us is a distribution: a distribution of our biases for the first hidden layer over time. Now, if you remember, all our biases started out at zero, and by the end of the first training run our biases got a little update. Then 183 of them were at about 0.00505, and about 63 of them were around -0.00399. So in the very back here we've got the oldest histogram, at the end of epoch number zero: our biases updated, and some of them got smaller and some of them got larger. Now, for hidden layer number one there were 512 different biases, so what we get in this picture is an idea of how the distribution of these 512 different biases changed over time.

On the y axis you can see the steps. We've recorded a summary in every single epoch in our for loop, so here we're moving forward in time until we get to the most recent epoch; this was the one right at the end of the training. Here we see that some bias values got quite large and some values for the biases got quite small. The reason why these little steps correspond to one epoch is because this is when we decided to add our summary and tell our file writer to write to our disk. Had we structured our for loop a little differently, then we might have written a summary every two epochs or every five epochs. So the way to think about what you're seeing on this axis is the step size: how often during the training you're writing to your file.
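To make that concrete, here is a rough sketch of why each slice corresponds to one epoch: the merged summary is evaluated and written to disk once per pass of the training loop. The names merged_summary, train_writer, sess, and the placeholders and batch variables in the feed dictionary are assumptions, not the notebook's exact code:

```python
# All registered summaries (scalars, images, histograms) merged into a single op
merged_summary = tf.summary.merge_all()

for epoch in range(nr_epochs):
    # ... run the optimizer over all mini-batches for this epoch ...

    # Evaluate the summaries and write them to disk ONCE per epoch,
    # so each histogram slice in TensorBoard corresponds to one step = one epoch
    summary = sess.run(merged_summary, feed_dict={X: x_batch, Y: y_batch})
    train_writer.add_summary(summary, epoch)

    # Writing every 5 epochs instead would simply make the step axis coarser:
    # if epoch % 5 == 0:
    #     train_writer.add_summary(summary, epoch)
```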
So in our first layer, our biases changed quite a bit during the course of the training. What about our weights? Now, this is quite interesting, right? Our weights started out as a truncated normal distribution, so they started out as a kind of bell curve without the tails on either end. And what we see is that by the end of the training, all the weights still seem to more or less fall into this truncated distribution; they still seem to have the same distribution as when they started. So to me, that suggests that all the learning that's been taking place in that first hidden layer actually happened in the biases and not in the weights.

What about the second hidden layer? This is quite interesting. The biases changed dramatically: they all started out at zero, we had 64 of them to begin with, and by the end they had this kind of distribution, with some over here quite negative and some over here quite positive. Once again, the weights in the second hidden layer started out with a truncated normal distribution, and by the end there hasn't been that much of a change.

What about the output layer, though? Here we only had 10 biases, and these, yeah, they moved around a little bit. Since there's only 10 values here, this visualization looks very, very jagged. It tends to be a little bit more realistic when you've got a lot of values, but for 10 you kind of get this mountain-landscape look. But regardless, what we do see is that there was a change from the beginning of the training towards the end of the training, and the same can actually be said for the weights: as you can see by the change in the shape here, there was some change to the weights in the output layer.

Now, TensorBoard actually gives us so many different ways of looking at the same kind of data. For example, we can change the mode here: you can change the histogram mode from offset to overlay. When we do that, we rotate the whole thing by 45 degrees and we're looking at it from the side. So this is another way of looking at how things change over time, and mousing over it we can see the different steps, the different histograms, as each curve gets highlighted. All the slices are no longer spread out over time; instead, they're all plotted on the same y axis. So if you prefer looking at the data in this way, then overlay is your friend.
But let's take a look at this other tab here as well. This is quite interesting, because what it shows us is the same data as before. I'm going to look at only the training data this time and take a little look at what it is that we're looking at here. This one is called distributions, and what we're actually looking at are percentiles. In the middle of this distribution we've got the median value, on the top we've got the maximum value, very lightly colored, and at the bottom we've got the minimum value, also very lightly colored. On the x axis here we have the steps: at step number zero we were here, and at step number 50 we were over here. The information on the distributions tab is essentially some high-level statistics; each line in the chart represents a percentile in the distribution of the data.

So what I'm seeing on this distribution of the weights in the first hidden layer is that nothing much happened in the middle of my distribution, right? All the weights that were between 0 and 0.1, nothing really changed for them between starting out and finishing training. But I can see that there was some change at the extremes: the smallest weights, which started out at -0.2, did change; they got more negative, and the largest weights, at positive 0.2, got larger. This was something that was quite difficult to see on the histogram, but if you look very closely you can see that the tails of the distribution got a bit longer. Right? Initially the truncated distribution was very steep and ended here, but then it fanned out a bit by step 49. In other words, the changes in that first hidden layer were happening on the edges.

So, in summary, the histograms tab and the distributions tab give you a chance to see what's happening with your weights and what's happening with your biases over time. And when I say time, I mean over the course of the training. Is your network learning? Where is it learning? As such, TensorBoard really acts as a flashlight to help you better understand the goings-on of your neural network.
So what I want to do now is make a change to my hyperparameters and then see how the histograms, the distributions and my scalars change in response. The hyperparameter that I'm going to change is my learning rate: I'm going to make it 10 times larger than what it is currently. I'm going to change it from 1e-4, which is 0.0001, to 1e-3, which is 0.001. So without further ado, let's change this parameter here, and then I'm going to click here and say Run All Below.

So let's run this thing again for 50 epochs and then take a look back in TensorBoard. Let me refresh the page to kick-start this thing. We've got a higher learning rate now, and what we can see is that we're approaching a much higher accuracy much faster. So this is quite interesting, right? Using a higher learning rate, we've been able to train our model a lot quicker, in fewer epochs, than with the lower learning rate. That's a massive improvement in performance just from changing this one hyperparameter. Check out the performance in blue here on the validation dataset with 1e-4 versus 1e-3. If we look at the loss, we had a massive decrease by epoch number five, and then it slowly started to stabilize, decreasing at a much, much lower rate. So the question is really: will it start overfitting by the end? Maybe we'd have to change our model and add some regularization or a dropout layer to prevent it from overfitting.

Okay, so I'm done training. My accuracy on my training dataset is very, very high, but that's not surprising; the key is the accuracy on the validation dataset. Looking at the performance, this seems very promising: towards the end of training we've got an incredibly high accuracy, around 98 percent, on the validation dataset. And looking at our loss, it doesn't seem to be going up. It does seem to have stabilized; it doesn't seem to be decreasing that much anymore as we're continuing to train, but it doesn't seem to have started overfitting yet.
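For reference, a sketch of the one-line change behind this second run. The constant itself is what the lesson describes; the Adam optimizer and the loss variable name are assumptions on my part, not necessarily what the notebook uses:

```python
# First run:
# learning_rate = 1e-4   # 0.0001
# Second run, ten times larger:
learning_rate = 1e-3     # 0.001

# Re-running the cells rebuilds the optimizer with the new rate
# (Adam and the name `loss` are assumptions for this sketch)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_step = optimizer.minimize(loss)
```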
In particular, what I want to pull up side by side are my histograms for that first run, with that really small learning rate, and my histograms for my second run, with the learning rate that was ten times larger. So here are my biases, and here are my weights, and what I can see is that the distribution changed massively between the two runs. If you remember, on our first run, by the end of the training our distribution had a much wider tail; but on the second run, the change in the weights, even in the very first layer, was much more dramatic. Where previously the weights weren't really changing all that much with that lower learning rate, now the whole distribution is shifting from epoch to epoch. And the same is true for the output layer.

So previously, when the learning rate was low, most of the adjustments seemed to have been made in the output layer, and then fewer and fewer adjustments happened in the second layer and then the first hidden layer. This time around there were also big changes in the output layer: the distribution of these weights started out like this, and by the end we had something more like this. More of the learning seems to have filtered through to the second hidden layer and the very first hidden layer when the learning rate was higher.

But that's just the weight side, of course. In both cases the biases changed dramatically. So even when the learning rate was low, the biases changed quite a lot in the output layer, the first hidden layer and the second hidden layer, and we see the same pattern emerge with the higher learning rate: the biases also changed quite a lot, which is a good indication that our model is learning.

Having looked at the histograms side by side for the two runs allowed us to see the differences between the beginning and the end for the first run and the second run. But we can also look at the change in the percentiles for the distributions. The smallest changes occurred in that first hidden layer: there we had bigger changes in the biases and smaller changes in the weights. With the higher learning rate, we still had quite big changes in the biases and smaller changes in the weights, but comparing the distributions and the percentiles between the second run and the first run, we can see that in the second run we've got much more of a spread. The same is true for the second hidden layer. Notice, for instance, the scale here on the y axis: for the first run we're looking at values between 0.3 and -0.3.
And on that second run we're looking at values between -0.6 and 0.6, almost double the range.

In the previous module I showed you how to use a dropout layer in Keras. Now I want to show you how to include a dropout layer using just TensorFlow. This is more for illustration than necessity; I don't think this model is overfitting, but I would like to cover this nonetheless, because you're going to need it in your future projects. I'll copy-paste this cell here, and I'm going to comment out this line of code here. So this was our model without dropout, and here we're going to include a dropout layer, and we're going to include that dropout layer after our first hidden layer.

I'm going to change my model name to mark that it includes dropout. Then what I'm going to do is add a dropout layer here; I'll call this layer_drop. The way I'm going to add this dropout layer is using tf.nn and then .dropout. This will create the dropout layer for me. All it needs is an x and something called a keep probability. The keep probability is the probability that an element is kept, so if I wanted 20 percent of the neurons in the previous layer to drop out, then I would set my keep probability at 80 percent.

So let's try it out: tf.nn.dropout, open parentheses, and then it needs the input. The input will be the previous layer, so layer_1. Then we need to specify the keep probability, and let's set this to 0.8. And finally, let's give this whole thing a name: name is equal to 'dropout_layer'. But since we added this dropout layer, we have to change the input for layer number two. In this case the input is no longer layer_1 but this dropout layer in between. So it's not really a real layer, but it is what forms the input to that second hidden layer.
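Pieced together from this walkthrough, the modified model setup might look roughly like this. The tf.nn.dropout call with keep_prob=0.8 and the name 'dropout_layer' follow what's described above; the setup_layer helper, the input placeholder X, the layer dimensions and the 'DO' tag in the run name are assumptions carried over from my earlier sketches:

```python
# Run name now flags that this model includes dropout (the 'DO' tag is an assumption)
model_name = f'{n_hidden1}-DO-{n_hidden2}_LR{learning_rate}_E{nr_epochs}'

layer_1 = setup_layer(X, weight_dim=[784, n_hidden1],
                      bias_dim=[n_hidden1], name='layer_1')

# Dropout after the first hidden layer: keep_prob is the probability an element
# is KEPT, so dropping roughly 20% of layer_1's activations means keep_prob = 0.8
layer_drop = tf.nn.dropout(layer_1, keep_prob=0.8, name='dropout_layer')

# The second hidden layer now takes the dropout layer (not layer_1) as its input
layer_2 = setup_layer(layer_drop, weight_dim=[n_hidden1, n_hidden2],
                      bias_dim=[n_hidden2], name='layer_2')

# ... output layer unchanged, still fed by layer_2 ...
```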
All right, so my training has finished and I've refreshed TensorBoard. In green we have the validation run with dropout, and in blue we have the validation run without dropout. What we see is kind of as expected: when you include dropout, the learning takes place a little bit more slowly. That's why the performance of the model with dropout is a bit below the previous one. But it does catch up, and towards the end both are pretty much identical. In terms of the loss, we didn't really have a problem with overfitting previously, and we most certainly don't have one now.

So that concludes this lesson. We've explored TensorBoard extensively. We've looked at a lot of different summaries, from scalars to images to our graph to the histograms and the distributions, and we were able to see what was happening as we were training our model. We were able to see our loss decrease and change over time. We were able to see our accuracy increase over time. We were able to see how much of a difference the learning rate can make, and we were also able to see how our biases and our weights changed over time as we were training our model. The changes were most significant in the output layer, which had the fewest neurons, and then the changes got smaller and smaller as they propagated down the network to that first hidden layer, which had 512 neurons. Initially it seemed that the biases were doing most of the adjustment, but when we cranked up the learning rate, we could see that there were also bigger changes happening with the weights.

In the next lesson we're gonna be evaluating our model on our testing dataset, and we're also going to make a prediction on a single image. For all that and more, I'll see you in the next lesson.