In this lesson we're finally going to train our model, and this is going to involve starting a session, batching our data, and then finally running the training loop. It's going to be a pretty jam-packed lesson. Let's dive right in.

The first thing I'll do is add a markdown cell here that reads "Run Session". The way TensorFlow works is that it has these session objects, and a session object encapsulates the environment in which all the operations and all the calculations take place and are executed. We've done all this setup previously: we've worked out the placeholders, we've said what calculations should take place, what the operation is going to be for minimizing our loss function, and so on. All of these calculations only take place inside a session. The way we create one of these is with tf.Session(), with parentheses at the end. I'm going to store the session that I'm creating here in a variable called sess.

The next thing I have to do is initialize all the variables. I'll just add a comment here that says "init". All our variables are going to be initialized with tf.global_variables_initializer(). And next I want to feed this initializer to the session, so: sess.run(init). These three lines of code get us all set up.

Let me come up here where it says "Set up TensorFlow graph", go to Cell, and select "Run All Below". Now that we've created our session, initialized our variables, and run the initializer, we can actually peek inside some of these tensors. For example, if we want to see what weights we've got for our first hidden layer, we can come down here, take our w1 tensor and call eval, that is evaluate, on it, and then we have to provide the session as an argument. If I hit Shift+Enter I can actually see our starting weights. The same is true for our biases: b1.eval with our session will show us that the biases for our first hidden layer are all initialized with the value 0. We've got a lot of biases there, but for our output layer, b3, we've only got 10.
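As a minimal sketch, here's what those pieces look like in code, assuming the TensorFlow 1.x API and the w1 and b1 variables defined during the graph setup in the earlier lessons:

```python
import tensorflow as tf  # assumes the 1.x API used throughout this course

# Run Session
sess = tf.Session()

# init
init = tf.global_variables_initializer()
sess.run(init)

# Peek inside the (now initialized) tensors; in a notebook cell
# each eval displays its value
w1.eval(session=sess)  # random starting weights of the first hidden layer
b1.eval(session=sess)  # biases of the first hidden layer, all zeros for now
```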
So the way TensorFlow works is that with this setup, what we're doing is building out this graph of calculations. It's almost like laying pipes: when we actually run a session, the data starts flowing through the pipes we've laid out, and then we can evaluate our calculations and look at what's inside our variables. We can get some outputs, and the calculations are actually executed.

Now that we've covered that, we're almost ready to write the loop for training our model. But before we do that, we're going to write some code that splits our training data up into smaller components. The reason we're doing that is because we want to train our model on batches of 1,000 samples at a time. Chances are that for your own projects you're going to be working with quite large datasets, so having the skill of dividing up your data is going to be essential.

Let me add a quick markdown cell here and add a variable that's going to read size_of_batch, and set that equal to 1000. Then I'll create three more variables. The number of examples I'm going to store in num_examples, and that's going to be equal to the number of examples in our training dataset: y_train.shape, square brackets, zero. The next thing I'll do is figure out the number of iterations that we need for training, and here I'll say nr_iterations is equal to the number of examples divided by the size of the batch. Now, if I want to make sure that this is indeed an integer, I can cast it, or convert it, to an integer using int(), with num_examples divided by size_of_batch inside the parentheses. The last thing I'll do is create an index variable, so I'll say index_in_epoch and set that equal to zero. The index here will help us keep track of where one batch ends and the next batch of samples should start, because what we want is for the first batch to go from 0 to 999, the second batch to go from 1,000 to 1,999, and so on, until we've chewed through all 50,000 examples in our training dataset.
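Here's a sketch of these bookkeeping variables, assuming the training labels live in the y_train array from the earlier lessons:

```python
size_of_batch = 1000

# Total number of training examples (50,000 in this dataset)
num_examples = y_train.shape[0]

# Batches needed for one full pass over the data; int() guarantees
# a whole number, which range() will require later on
nr_iterations = int(num_examples / size_of_batch)

# Keeps track of where the next batch should start
index_in_epoch = 0
```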
The next thing we're going to do is define a function, which I'll call next_batch, that's going to encapsulate the logic for going from one batch to the next. This function of ours is going to take three parameters. The first we'll call batch_size: how big are the batches going to be? The second is the dataset, and the third shall be the labels. So this function needs our x's and our y's.

Next I'm going to use the global keyword to get hold of variables that live outside of this function. I'm going to reference num_examples, which is outside, right here, using this global keyword, and I'm also going to reference index_in_epoch, our index variable, inside this function. The reason I'm doing this is because we need to figure out the starting and the ending point. The starting point is going to be equal to our index variable, so it's going to be zero in the beginning, but then our index will update: we'll take our index and add the batch size to it, so it starts out at 0 and the next time round it's going to be equal to 1,000. So we'll say index_in_epoch is equal to index_in_epoch plus batch_size. Now, oftentimes when you're updating a variable like this there's a shorthand notation, so you can write += and then batch_size.

So now that we've got the start figured out and we're moving our index along by the batch size, we can think about the end. The end of the batch is going to be at index_in_epoch. What this allows us to do is return the x values and the y values that fall between these two points, between the starting point and the ending point, and we can do this with data, square brackets, start colon end. So what are we getting the very first time round that we call this function? Well, we're going to get all the values between 0 and 999. Why? Because we'll supply 1,000 as the size of the batch, so the starting value will be equal to zero, because that's where index_in_epoch starts out; then index_in_epoch becomes itself plus 1,000, and the ending value will be equal to 1,000. With this notation you get all the values between zero and the ending value, not inclusive. So y_train, square brackets, zero colon three, for example, will give us all the rows between 0 and 3, that is rows 0, 1, and 2; that last one is at index 2, not 3. And that means the next time this function gets called, the starting value will be equal to 1,000, the index then updates to 2,000, the ending value becomes 2,000, and we get everything between 1,000 and 1,999.
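As a sketch, here's the function as we have it so far; the end-of-dataset handling described just below isn't in yet:

```python
def next_batch(batch_size, data, labels):
    # Reach outside the function for our bookkeeping variables
    global num_examples
    global index_in_epoch

    start = index_in_epoch
    # Shorthand for: index_in_epoch = index_in_epoch + batch_size
    index_in_epoch += batch_size
    end = index_in_epoch

    # Slicing excludes the end index: the first call returns rows
    # 0..999, the second call rows 1000..1999, and so on
    return data[start:end], labels[start:end]
```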
Since we're going to return two things, we return our features, which I've called data, and we return our labels: labels, square brackets, start colon end. The only thing left to do is include some logic between these two lines to capture what happens when we get to the end of the dataset. So what should happen if the index is greater than the number of examples? In that case we have to do a couple of things: we should start the next epoch, so start should be reset to equal zero, and the index should be reset to the batch size. Brilliant.

So now we can move on to our next step, which is actually training the model. I'll add a markdown cell here that's going to read "Training Loop". So what are we going to do here? What we'll do in our training loop is go from epoch number 0 through epoch number 4, five epochs in total, so we'll start at zero and go for however many epochs we've specified as one of our hyperparameters up here. So we'll say: for epoch in range, number of epochs. We're iterating through each epoch, and then we'll iterate through our data itself, so we'll say: for i in range, nr_iterations. The number of iterations was equal to the number of examples divided by the size of the batch; if you'd like to know what that number is, we can actually print it out here. It's 50, so there will be 50 iterations to take us through all 50,000 examples, 1,000 examples at a time. That's the number of iterations.

And what are we doing in each iteration? Well, the first thing is we'll need a batch of 1,000 samples for our features and 1,000 samples for our labels, so we'll say batch_x, comma, batch_y is equal to next_batch. The function that we created a minute ago returns two things, some x values and some y values, but it requires three inputs: the batch size, our x values, and our y values. So in this case, batch_size will be equal to size_of_batch, data shall be equal to x_train, and labels shall be equal to y_train. And that's how we get to the really fun part, and by that I mean working with TensorFlow.
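Here's a sketch of the completed function with that reset branch in place, plus the opening of the training loop. The names nr_epochs and x_train are assumptions about how the epoch hyperparameter and the training features were named in the earlier setup:

```python
def next_batch(batch_size, data, labels):
    global num_examples
    global index_in_epoch

    start = index_in_epoch
    index_in_epoch += batch_size

    # End of the dataset reached: wrap around and start the next epoch
    if index_in_epoch > num_examples:
        start = 0
        index_in_epoch = batch_size

    end = index_in_epoch
    return data[start:end], labels[start:end]


# Training Loop
for epoch in range(nr_epochs):      # 5 epochs in this lesson
    for i in range(nr_iterations):  # 50 batches per epoch
        batch_x, batch_y = next_batch(batch_size=size_of_batch,
                                      data=x_train, labels=y_train)
        # ... the session calls come next ...
```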
So what we have to do here is create something called a feed dictionary. A feed dictionary is nothing other than a Python dictionary. We'll say feed_dictionary is equal to curly braces, and inside those curly braces we'll provide two key-value pairs. For the first, our capital X is going to be the key and batch_x is going to be the value. The second, no surprise there: capital Y as the key and batch_y as the value. The reason we've created this is so that we can feed it to our session.

Our session is going to run all our calculations for us. We'll take our session, sess, remember we created it up here, and our session has this run method, and the run method will do the calculations. The key calculation in our case is the training step: the training step is the operation from our optimizer that minimizes our loss. What is it going to minimize the loss on? Well, it's going to minimize it on the data that we're providing through our feed dictionary. Our feed dictionary holds our x's and our y's, our features and our labels, and when we provide it alongside the loss minimization to the session, it will run the calculations and update our weights. These two lines of code are really the second piece of the TensorFlow puzzle. The first piece is setting up all the placeholders, initializing all the variables, and setting up all the calculations ahead of time. The second piece is actually running the calculations: running that training step, running our optimizer, and running that calculation on our data, which in our case is a batch of 1,000 samples.

But even though we're training our model, let's not forget our accuracy metric. We defined this calculation up here, and now's our chance to use it. Once again this is where our session comes into play: using the run method on our session we can get this accuracy as an output. So let's say batch_accuracy is equal to sess.run, parentheses. And here we need to specify what the session should fetch for us, what output we want to get out of this session, and TensorFlow has a really funny word for this: it's called fetches. It's almost like a Keras callback. So what should fetches be equal to? It should be equal to our accuracy calculation. And what's the accuracy going to be calculated on? The accuracy is going to be calculated on our batch, so feed_dict is going to be equal to our feed dictionary once again.
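Sketched out, the body of the inner loop looks like this; X, Y, train_step, and accuracy are assumed to be the placeholder and operation names from the graph-setup lessons:

```python
feed_dictionary = {X: batch_x, Y: batch_y}

# Run the optimizer's training step on this batch;
# this is what actually updates the weights and biases
sess.run(train_step, feed_dict=feed_dictionary)

# Fetch the accuracy, evaluated on the same batch
batch_accuracy = sess.run(fetches=accuracy, feed_dict=feed_dictionary)
```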
Now, since we're fetching this, let's print it out, but let's not print it inside this for loop, which will run 50 times. Let's print it once every epoch instead, so once this inner for loop is done we'll hit a print statement. We'll say print, and I'll put an f-string here. The first thing I'll print is the epoch that's running: the word "Epoch" and then, in curly braces, lowercase epoch, which is just the index of our outer loop. Then I'll tab over, so I'll have an escape character, backslash t, put a pipe symbol there, and say "Training Accuracy =" followed by curly braces around batch_accuracy. And when we're all done, I'll just say print, done training.

This is the moment we've all been waiting for, right? So let's Shift+Enter on this cell. In epoch 0 I've got a training accuracy on my batch of about 64 percent. By epoch 3 I'm up to 86 percent, and by epoch number 4, namely the fifth epoch, I'm up to 87.2 percent on my training dataset. Fantastic!

So now that we've been successful in training our neural network, we can add some bells and whistles to our code, and we can also investigate in a bit more detail how it works and what it does, and modify it further to help us better evaluate what's actually going on. For all of that we're going to be using TensorBoard. Once we have TensorBoard set up, we can start looking at our performance a bit more closely, especially the performance on the evaluation dataset. So for all of that and more, I'll see you in the next lesson. You know the drill, see you in a bit.
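For reference, here's the whole training loop from this lesson assembled into one sketch, under the same naming assumptions as above:

```python
for epoch in range(nr_epochs):
    for i in range(nr_iterations):
        batch_x, batch_y = next_batch(batch_size=size_of_batch,
                                      data=x_train, labels=y_train)
        feed_dictionary = {X: batch_x, Y: batch_y}
        sess.run(train_step, feed_dict=feed_dictionary)
        batch_accuracy = sess.run(fetches=accuracy,
                                  feed_dict=feed_dictionary)

    # One status line per epoch, not one per iteration
    print(f'Epoch {epoch}\t| Training Accuracy = {batch_accuracy}')

print('Done training!')
```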