All right, so in this lesson the rubber is going to meet the road: in this lesson we're going to build our model out in Keras.

Now, the thing is, any time you're doing anything in TensorFlow (and in this case Keras is using TensorFlow as the backend), it involves a three-step process to build and construct your model.

The first step is defining your model. You have to set out the structure of the model. That's step one. Step two is compiling the model. That means telling TensorFlow in advance how you'd want to measure the loss and what kind of calculations you'd want to run to adjust the weights. This is still part of the setup: we need to tell TensorFlow how it should optimize our model. Once it comes to training, TensorFlow needs to know about the calculations it needs to run ahead of time. Speaking of training the model, that's actually step three. Training the model is what you can do after you've done the setup, and the word you'll see to describe this in Keras is "fit". During this step TensorFlow will actually crunch through the data and do all the calculations.
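Before we get to our own model, here's a minimal sketch of that define-compile-fit workflow, just to make the three steps concrete. The layer sizes, the optimizer and loss choices, and the dummy data are all placeholders for illustration, not the values we'll use in this lesson:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Step 1: define the structure (placeholder sizes, not this lesson's values)
model = Sequential([
    Dense(units=32, input_dim=100, activation='relu'),
    Dense(units=10, activation='softmax'),
])

# Step 2: compile, i.e. tell TensorFlow up front how to measure the loss
# and which optimizer should adjust the weights (choices here are examples)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# Step 3: fit, i.e. actually crunch through the data
# (dummy inputs and one-hot labels, purely for illustration)
x_dummy = np.random.rand(200, 100)
y_dummy = np.eye(10)[np.random.randint(0, 10, size=200)]
model.fit(x_dummy, y_dummy, epochs=2)
```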
So let's talk through step 1, defining the model. Because we're going to use Keras, I want to talk you through the code before we actually write it in the Jupyter notebook. Let's look at a simple example for image classification. Suppose we're aiming for the structure that I'm showing here on the slide. This very, very simple artificial neural network that we're going to build actually has a name: it's called a multilayer perceptron. And since step one is defining our model, we have to set up the architecture for this perceptron, and that means creating the layers.

The very first layer that we'll create in our code is actually not the input layer. The very first layer that we need to create is the first hidden layer. Keras only needs a little bit of help from us to work out what the input layer is. So what is this help that we need to provide Keras? Well, we have to tell Keras how many inputs there are to this first layer. If we're working with an image, then the number of inputs is going to be determined by the resolution of that image and the color space of that image.

So if I've got this image right here, and it's 32 pixels by 32 pixels, and it's a color image, then the total number of inputs is going to be 32 × 32 × 3. Three, because we've got a red, a green, and a blue value for every single pixel. So 32 × 32 × 3 comes out to 3,072. That's the information we're going to provide in our code when we create our first hidden layer.

The function call from Keras to create this layer is called Dense. Dense will create all the neurons for us in this layer, and the arguments that we have to feed to the Dense function are the number of outputs that we want, in this case the number of units, or the number of neurons, in the layer itself. And because we're calling this for the very first time, and we're creating our very first hidden layer, we also have to specify the input dimensions. This is where that 3,072 comes in. The final thing that we should specify is the activation function, in other words, how the neurons are going to behave. Again, we have a choice here, and I'm going to talk about this in detail in a minute.

So now that we've set up the first hidden layer, what would our code look like for a second hidden layer? This hidden layer has five outputs, because there are five nodes, so the number five has to be specified as the units in the Dense function. Again, we specify an activation function, but in this case Keras is smart enough to know how many inputs the second hidden layer is going to have. It only needed to know the number of inputs explicitly for the very, very first hidden layer. For that second layer, it knows to look to the first hidden layer that we've already created to work out the number of inputs.

Let me ask you this: what's the code roughly going to look like for that final output layer? Unsurprisingly, we're also going to create this layer using the Dense function from Keras, and for the number of units we've got four, because we've got four neurons on the slide. The only difference here is that for the output layer we've got a different activation function: instead of ReLU, we're using softmax.
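Putting those three Dense calls together, a sketch of the slide's network could look like the following. The 8 units in the first hidden layer are an assumption (no count is stated for that layer); the 3,072 inputs, the 5-unit second hidden layer, and the 4-unit softmax output come straight from the example:

```python
from keras.models import Sequential
from keras.layers import Dense

# Sketch of the slide's multilayer perceptron.
# The first hidden layer's size (8) is assumed, not given on the slide.
slide_model = Sequential([
    # first hidden layer: must be told about the 32 x 32 x 3 = 3072 inputs
    Dense(units=8, input_dim=32 * 32 * 3, activation='relu'),
    # second hidden layer: 5 neurons, input size inferred from the layer above
    Dense(units=5, activation='relu'),
    # output layer: 4 neurons, softmax instead of ReLU
    Dense(units=4, activation='softmax'),
])
```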
Now, I promised to talk a little bit more about activation functions, so let's do that now. This is a really good opportunity to dive into this topic in a little bit more detail. One of the things that we talked about with neurons is that they may activate and pass a signal down to all the neurons that are connected downstream. The very first models of these neurons imagined a sort of binary output for the neuron: they imagined that the neuron would either activate or not activate, and this can be modeled with an activation function that takes the form of a step function. In other words, if the activation function looks like this, then you get either a 0 or a 1 from the neuron.

Now, these were the early models, from the 1940s. Since then we've found that, well, maybe a neuron can activate strongly, or weakly, or somewhere in the middle, so we really want a range of different signals from a neuron. And if this activation function were shaped a little bit differently, then we could accomplish exactly that. Enter a different activation function: in this case we've got a sigmoid activation function, which will give us a signal between 0 and 1. Now our neuron can send a strong or a weak signal, depending on where on the curve it lies. Looking at this chart, if it's getting an input value of 6, then it's going to send a strong signal of (nearly) 1. But if it's getting an input value of -6, then it's not going to activate; it's not going to send anything. And if the inputs are somewhere in between, it's going to activate by, say, 25 percent or 75 percent. This is the idea behind an activation function that's a curve, like the sigmoid function.

But a few minutes earlier we saw two other functions: we saw the ReLU function and the softmax function as the activation functions for our neurons. So what are those, and what do they look like? Well, ReLU stands for rectified linear unit, which is quite a mouthful, which is why everybody abbreviates it to ReLU. The function itself actually has a very simple shape: it's a flat line at zero for all the negative values, and it's a 45-degree line for all the positive values.

Now, we've seen three different activation functions at this point, so which one should you choose? Well, there are actually many more. If you go to the Keras documentation, we're presented with a menu. You've got the available activations: softmax, ELU, SELU, softplus, softsign, ReLU (which we've talked about), tanh, sigmoid, hard sigmoid, and so on.
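To make those shapes concrete, here's a small sketch of the step, sigmoid, and ReLU functions in plain NumPy; the sample inputs are chosen to reproduce the values mentioned above:

```python
import numpy as np

def step(x):
    # 1940s-style binary activation: the neuron fires (1) or it doesn't (0)
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    # smooth curve: signals anywhere between 0 and 1
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # rectified linear unit: flat at 0 for negatives, 45-degree line above
    return np.maximum(0.0, x)

x = np.array([-6.0, -1.1, 0.0, 1.1, 6.0])
print(step(x))     # [0. 0. 1. 1. 1.]
print(sigmoid(x))  # ~[0.0025 0.25 0.5 0.75 0.9975]
print(relu(x))     # [0.  0.  0.  1.1 6. ]
```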
So we've got quite a few different activations to choose from for our neurons. Now, at first this seems really intimidating, but there's an easy solution. What you'll often find is that a particular field, a particular type of problem, actually favors a particular activation function. So all you have to do is look at what kind of activation functions are used for the problem that you're trying to solve, and stick with that one. That's going to be a really good starting point.

In our case, we're going to go for the ReLU function. This is going to be our activation function of choice for all the neurons in our hidden layers. The big exception was the output layer: this is where we saw that word softmax. We had ReLU activation functions in our first hidden layer and our second hidden layer, but we had the softmax activation function in our output layer. Why is that? Well, softmax is a mathematical function that will transform our outputs into probabilities, and this means that we get a really nice interpretation for our output. In other words, the model can tell us something like: there is a 20 percent chance of this image containing a cat. And the reason that we can interpret the output of a model like this is thanks to the softmax activation function in the output layer. Softmax will give us a distribution over all our output numbers: all these numbers are going to be between 0 and 1, and they're all going to sum to 1. Softmax is basically going to rescale our output. This is why you often see softmax in the output layer for all of these multi-class classification problems.
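As a quick illustration of that rescaling, here's a hedged sketch of softmax in NumPy; the raw output values are made up purely for the example:

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability, then rescale so the
    # outputs all land between 0 and 1 and sum to exactly 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

raw_outputs = np.array([2.0, 1.0, 0.1, -1.0])  # made-up raw layer outputs
probs = softmax(raw_outputs)
print(probs)        # ~[0.64 0.23 0.10 0.03], one probability per class
print(probs.sum())  # 1.0
```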
But enough about the theory. Let's head back into our Jupyter notebook and configure our multilayer perceptron. I'll add a markdown cell here that's going to read "Define the Neural Network using Keras". Next, we're going to go to the very top, to our import statements, and import the Keras functionality that we need. That includes from keras.models import Sequential; also, from keras.layers, we're going to import, you guessed it, Dense, and we're also going to import Activation. Let's hit Shift-Enter on the cell, scroll all the way back down, and add our code to create our perceptron.

I'm going to call this one model_1, because we're going to have more than one model that we're going to try out. Next, I'm going to use Sequential, parentheses, and then square brackets, and on this line, between the two square brackets, I'm going to call Dense, open parentheses, and I'll say units is equal to... and here I get to choose how many output units my first hidden layer will have. I'm going to go with 128. I'll put a comma after that, and then I'm going to specify my input dimensions: input_dim, and set that equal to... we said 3,072, right? But we don't have to use this magic number here. We've already added a constant, so let's use that one instead: TOTAL_INPUTS. We'll put that right here, and then we can specify something else, namely the activation. Here we're going to go with, in single quotes, relu, all lowercase. That's our very first hidden layer in our neural network.

Now that we've got our first hidden layer done, let's add a couple more layers to this neural network. To do that, we just have to call that Dense function a couple more times. So I'll add a comma here on this line, go down to the next line, and add our second hidden layer here. This one is going to have 64 neurons in it, and as the activation function we're also going to go with relu, the rectified linear unit. But we don't have to specify the input dimensions, because Keras is smart enough to work this out from the previous layer. That's our second hidden layer done.

Let's add a third hidden layer; we're really, really going deep, Inception style. On this one we'll have 16 units, or 16 neurons. We'll add the activation function again, specify relu, and add a comma at the end. Go to the next line, and here we're going to specify our output layer. In this case we're going to have ten different outputs, right? We've got ten different categories that we're differentiating in the model, so we'll have 10 neurons, or 10 units. And for the activation we won't have relu; because this is our output layer, we're going to be using softmax. Softmax is ideal for giving us a probability interpretation in a classification problem, so this is what we'll use. When creating the layers, I've added the units parameter name for the first two, and I've left it out for the last two.
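Assembled into a single cell, the code we just dictated looks roughly like this. TOTAL_INPUTS is the 3,072-input constant defined earlier in the notebook (the exact casing of that name is assumed here, and it's redefined below only so the sketch runs on its own):

```python
from keras.models import Sequential
from keras.layers import Dense, Activation  # Activation imported as in the
                                            # lesson, though unused in this cell

TOTAL_INPUTS = 32 * 32 * 3  # 3,072; mirrors the constant defined earlier

model_1 = Sequential([
    # first hidden layer: 128 neurons, told the input size explicitly
    Dense(units=128, input_dim=TOTAL_INPUTS, activation='relu'),
    # second hidden layer: 64 neurons, input size inferred by Keras
    Dense(units=64, activation='relu'),
    # third hidden layer: 16 neurons, the `units` name left out this time
    Dense(16, activation='relu'),
    # output layer: 10 classes, softmax for a probability interpretation
    Dense(10, activation='softmax'),
])
```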
This is how you'll often see it written in the documentation and on Stack Overflow. Now let's hit Shift-Enter on the cell, and that's it: we've created our Keras model. If we check the type of model_1, we see that this object is a Sequential model. Fantastic.

So we've figured out quite a lot of stuff. We've written some very terse code where a lot is happening behind the scenes, and we've demystified a lot of the vocabulary that we're seeing in this code here. Having laid out the structure of our model, in this case three hidden layers and an output layer, it's time to compile our model. It's time to specify what calculations are going to take place during the training step, how the weights are adjusted, and how the losses are measured. All that and more in the next lesson. I'll see you there.