1 00:00:00,266 --> 00:00:01,166 Hello my friends. 2 00:00:01,166 --> 00:00:04,200 Welcome to part two of this implementation 3 00:00:04,200 --> 00:00:08,266 where we're going to build together the artificial neural network. 4 00:00:08,333 --> 00:00:10,166 I'm super excited to start. 5 00:00:10,166 --> 00:00:12,333 We will do it in four steps. 6 00:00:12,333 --> 00:00:16,233 In the first step, we will initialize the ANN as a sequence of layers. 7 00:00:16,733 --> 00:00:18,900 Then we will add the input layer 8 00:00:18,900 --> 00:00:20,533 and the first hidden layer, 9 00:00:20,533 --> 00:00:23,866 composed of a certain number of neurons which we'll choose together. 10 00:00:24,233 --> 00:00:26,133 Then we will add the second hidden layer, 11 00:00:26,133 --> 00:00:27,300 you know, in order to build 12 00:00:27,300 --> 00:00:31,800 indeed a deep learning model as opposed to a shallow learning model. 13 00:00:32,066 --> 00:00:35,200 And then finally we will add the output layer, 14 00:00:35,333 --> 00:00:37,600 which will contain what we want to predict. 15 00:00:37,600 --> 00:00:38,100 All right. 16 00:00:38,100 --> 00:00:38,966 Let's do this. 17 00:00:38,966 --> 00:00:42,466 We're going to tackle these four steps in this one same tutorial. 18 00:00:42,500 --> 00:00:45,500 So let's do this, starting with this one: initializing 19 00:00:45,500 --> 00:00:48,500 the ANN as a sequence of layers. 20 00:00:48,600 --> 00:00:50,333 So let's create a new code cell. 21 00:00:50,333 --> 00:00:53,333 And now let me explain how we need to proceed. 22 00:00:53,500 --> 00:00:57,300 So the first thing we have to do is obviously to create a variable 23 00:00:57,300 --> 00:01:01,266 that will be nothing else than the artificial neural network itself. 24 00:01:01,266 --> 00:01:02,233 And guess what?
25 00:01:02,233 --> 00:01:06,900 This artificial neural network variable will be created as an object 26 00:01:07,100 --> 00:01:10,900 of a certain class, and that certain class is the Sequential class, 27 00:01:11,100 --> 00:01:15,133 which allows us exactly to build an artificial neural network, 28 00:01:15,133 --> 00:01:19,566 but as a sequence of layers, as opposed to a computational graph. 29 00:01:19,566 --> 00:01:20,000 Right. 30 00:01:20,000 --> 00:01:23,733 You saw in the intuition lectures that an artificial neural network 31 00:01:23,733 --> 00:01:27,000 is actually a sequence of layers, you know, starting from the input layer. 32 00:01:27,000 --> 00:01:28,033 And then successively 33 00:01:28,033 --> 00:01:32,066 we have the fully connected layers up to the final output layer. 34 00:01:32,333 --> 00:01:34,766 That's what I mean by a sequence of layers. 35 00:01:34,766 --> 00:01:38,600 And then the other type of neural network is indeed a computational graph, 36 00:01:38,600 --> 00:01:42,066 which is, you know, neurons connected in any way, not in successive layers. 37 00:01:42,166 --> 00:01:44,733 And an example of this is Boltzmann machines. 38 00:01:44,733 --> 00:01:45,200 Right? 39 00:01:45,200 --> 00:01:46,633 Restricted Boltzmann machines 40 00:01:46,633 --> 00:01:50,833 or deep Boltzmann machines are great examples of computational graphs. 41 00:01:51,066 --> 00:01:53,233 Of course, they're not covered in this course 42 00:01:53,233 --> 00:01:55,500 because this is really advanced deep learning, 43 00:01:55,500 --> 00:01:58,300 but they are covered in our Deep Learning 44 00:01:58,300 --> 00:02:00,500 A-Z course. If you are really interested 45 00:02:00,500 --> 00:02:03,500 in deep learning and want to go deeper into this branch, 46 00:02:03,500 --> 00:02:06,200 well, I'll be happy to welcome you into the Deep Learning 47 00:02:06,200 --> 00:02:06,900 A-Z course.
48 00:02:06,900 --> 00:02:09,033 But now let's just get the right 49 00:02:09,033 --> 00:02:12,666 introduction to deep learning with only fully connected neural networks. 50 00:02:12,800 --> 00:02:17,266 And so that's why we will create our new variable, which we're going to call 51 00:02:17,266 --> 00:02:18,933 ann, and which will be nothing else 52 00:02:18,933 --> 00:02:21,933 than the artificial neural network we're going to build. 53 00:02:22,100 --> 00:02:24,900 And we will create that variable as an object 54 00:02:24,900 --> 00:02:28,600 of the Sequential class. Okay. 55 00:02:28,733 --> 00:02:32,500 But of course the Sequential class is not taken from nowhere. 56 00:02:32,500 --> 00:02:37,733 It is actually taken from the models module of the Keras library, 57 00:02:37,733 --> 00:02:41,366 which since TensorFlow 2.0 belongs to TensorFlow. 58 00:02:41,400 --> 00:02:44,400 Right, before, we had TensorFlow and Keras separated. 59 00:02:44,666 --> 00:02:48,366 But now, since the new version of TensorFlow, 2.0, 60 00:02:48,400 --> 00:02:51,400 well, Keras was integrated into TensorFlow, 61 00:02:51,600 --> 00:02:54,300 and therefore the way to call the Sequential class 62 00:02:54,300 --> 00:02:57,933 here is to go first with TensorFlow, which has the shortcut tf, 63 00:02:58,133 --> 00:03:01,500 from which we're going to call the Keras library, 64 00:03:01,900 --> 00:03:05,933 and from which we're going to call the models module. 65 00:03:06,300 --> 00:03:09,333 Perfect. And from which we call indeed that Sequential class. 66 00:03:09,800 --> 00:03:14,666 And so all this, you know, creates this ann variable, 67 00:03:14,666 --> 00:03:17,933 which represents our artificial neural network created 68 00:03:17,933 --> 00:03:21,500 as an instance of that Sequential class, which initializes 69 00:03:21,500 --> 00:03:24,633 our artificial neural network as a sequence of layers. 70 00:03:24,633 --> 00:03:27,200 And that's our first step.
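The first step described above can be sketched roughly as follows. This is only a minimal sketch, assuming TensorFlow 2.x is installed and imported under the usual alias tf; the variable name ann is taken from the spoken "A-N-N".

```python
import tensorflow as tf

# Step 1: initialize the ANN as a sequence of layers.
# Sequential lives in the models module of Keras, which ships inside TensorFlow 2.x.
ann = tf.keras.models.Sequential()

# At this point the network is an empty container: no layers have been added yet.
print(len(ann.layers))
```

Nothing is trained or even sized here; the object is just the container that the add method will fill in the next steps.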
Congratulations. 71 00:03:27,200 --> 00:03:31,500 Now you really made your step into how to build an artificial neural network. 72 00:03:31,500 --> 00:03:33,066 So let's move on to the next step, 73 00:03:33,066 --> 00:03:36,933 which is to add the input layer and the first hidden layer. 74 00:03:37,200 --> 00:03:41,566 And that's where we're going to start using the famous Dense class 75 00:03:41,566 --> 00:03:45,533 in TensorFlow and even in PyTorch, which is another great library 76 00:03:45,533 --> 00:03:47,100 to build neural networks. 77 00:03:47,100 --> 00:03:52,166 The way to add a fully connected layer into an artificial neural network 78 00:03:52,166 --> 00:03:55,133 at whatever phase you are, you know, whatever the state 79 00:03:55,133 --> 00:03:59,066 of your artificial neural network is, well, is to use the Dense class. 80 00:03:59,066 --> 00:04:03,933 And the way we use it is very simply by taking our artificial neural network 81 00:04:03,966 --> 00:04:04,633 object, 82 00:04:04,633 --> 00:04:06,933 you know, that instance of the Sequential class, 83 00:04:06,933 --> 00:04:10,033 from which we're going to call one of the methods 84 00:04:10,033 --> 00:04:12,933 of the Sequential class, and that method is add. 85 00:04:12,933 --> 00:04:17,900 You know, we certainly hope that there is an add method inside a Sequential class. 86 00:04:17,900 --> 00:04:18,266 Right? 87 00:04:18,266 --> 00:04:22,800 So that's the method we need right now to add anything we want, whether it is a 88 00:04:22,800 --> 00:04:27,366 hidden layer or a dropout layer, you know, which allows us to prevent overfitting. 89 00:04:27,600 --> 00:04:30,366 Or, you know, we will see with convolutional neural networks that we can 90 00:04:30,366 --> 00:04:34,133 also add, well, a Conv2D layer, which is a convolutional layer. 91 00:04:34,300 --> 00:04:35,966 Well, we can add anything, 92 00:04:35,966 --> 00:04:39,966 but right now what we want to add is a simple fully connected layer.
93 00:04:40,233 --> 00:04:45,133 And the way to add this is to enter in these parentheses, because add is a method, 94 00:04:45,433 --> 00:04:49,200 well, exactly that fully connected layer, 95 00:04:49,366 --> 00:04:52,000 which will be a new object. 96 00:04:52,000 --> 00:04:55,200 You know, it will be a new instance of a new class. 97 00:04:55,200 --> 00:04:58,200 And that new class is of course the Dense class. 98 00:04:58,200 --> 00:04:59,566 So the fully connected layer 99 00:04:59,566 --> 00:05:03,300 we're about to build will be created as an object of the Dense class. 100 00:05:03,533 --> 00:05:07,900 And therefore now the only thing we have to do is to call that Dense class. 101 00:05:07,900 --> 00:05:11,166 And this will create that fully connected layer object. 102 00:05:11,166 --> 00:05:14,566 And at the same time it will automatically add the input layer. 103 00:05:15,133 --> 00:05:16,700 All right. So let's call that Dense class. 104 00:05:16,700 --> 00:05:19,866 And once again the Dense class is not taken from nowhere. 105 00:05:19,866 --> 00:05:22,800 It belongs to a certain path of libraries. 106 00:05:22,800 --> 00:05:26,700 And of course the root of that path is our TensorFlow library, 107 00:05:26,966 --> 00:05:31,500 from which we're going to call once again the Keras library. 108 00:05:31,866 --> 00:05:33,000 There we go. 109 00:05:33,000 --> 00:05:36,666 From which this time we're not going to call the models module, 110 00:05:36,833 --> 00:05:40,800 but actually this one, you know, it's at the top of the list: layers. 111 00:05:40,800 --> 00:05:42,900 That's exactly what we need to add here. 112 00:05:42,900 --> 00:05:45,300 This is the module that contains the different tools, 113 00:05:45,300 --> 00:05:47,800 and by tools I mean classes, to add, 114 00:05:47,800 --> 00:05:51,000 well, any layer you want in your artificial neural network. 115 00:05:51,233 --> 00:05:52,800 So layers here.
116 00:05:52,800 --> 00:05:54,300 And speaking of these classes, 117 00:05:54,300 --> 00:05:59,300 well, it's from this layers module that we're going to call our Dense class, 118 00:05:59,666 --> 00:06:02,600 which as any class can take several arguments. 119 00:06:02,600 --> 00:06:05,300 And here we have to indeed enter these arguments. 120 00:06:05,300 --> 00:06:09,533 The most important one is this one, units, which corresponds exactly 121 00:06:09,533 --> 00:06:12,700 to the number of neurons, you know, to the number of hidden neurons 122 00:06:12,866 --> 00:06:15,600 you want to have in this first hidden layer. 123 00:06:15,600 --> 00:06:16,800 You know, not in the input layer. 124 00:06:16,800 --> 00:06:19,666 We will automatically have our different features. 125 00:06:19,666 --> 00:06:21,066 You know, in the input layer, 126 00:06:21,066 --> 00:06:25,466 the input neurons will simply be all these features, starting from credit score. 127 00:06:25,466 --> 00:06:29,366 That will be one neuron, then another input neuron, then another one. 128 00:06:29,366 --> 00:06:34,066 You know, up to this one, all these will be the input neurons in the input layer. 129 00:06:34,266 --> 00:06:35,233 But then when 130 00:06:35,233 --> 00:06:38,766 we create that first hidden layer, we will have some hidden neurons inside. 131 00:06:39,000 --> 00:06:40,533 And in this Dense function 132 00:06:40,533 --> 00:06:45,566 now, well, we can specify of course how many hidden neurons we want to have. 133 00:06:46,066 --> 00:06:51,200 And now comes the most frequently asked question in deep learning. 134 00:06:51,366 --> 00:06:55,366 It's a very famous question: how do we know how many neurons we want? 135 00:06:55,533 --> 00:06:59,333 Is there a rule of thumb or should we just experiment? 136 00:06:59,666 --> 00:07:02,733 Well, unfortunately, there is no rule of thumb. 137 00:07:03,200 --> 00:07:06,100 It is just based on experimentation.
138 00:07:06,100 --> 00:07:08,433 Or, you know, we call it the work of a novice. 139 00:07:08,433 --> 00:07:11,933 You have to experiment with different hyperparameters. 140 00:07:11,933 --> 00:07:13,900 You know, we call them hyperparameters in the sense 141 00:07:13,900 --> 00:07:17,866 that these are parameters that won't be trained during the training process. 142 00:07:18,133 --> 00:07:20,400 So unfortunately, there is no rule of thumb. 143 00:07:20,400 --> 00:07:24,333 And therefore we just have to pick one number here which wouldn't sound 144 00:07:24,333 --> 00:07:27,966 irrelevant or extravagant, and that number will be six. 145 00:07:28,166 --> 00:07:30,566 I actually tried several numbers and I got more 146 00:07:30,566 --> 00:07:33,433 or less the same accuracy in the end, so it's all good. 147 00:07:33,433 --> 00:07:36,666 You can try different ones if you want, but six is totally fine. 148 00:07:36,666 --> 00:07:41,033 So here inside this Dense class, we will enter for our first parameter, 149 00:07:41,033 --> 00:07:42,033 which is units, 150 00:07:42,033 --> 00:07:45,033 well, units equals six. 151 00:07:45,300 --> 00:07:46,433 Perfect. All right. 152 00:07:46,433 --> 00:07:48,033 And now the next parameter 153 00:07:48,033 --> 00:07:51,233 that is important to note. Among this huge list of parameters, 154 00:07:51,233 --> 00:07:52,133 you can see many of them, 155 00:07:52,133 --> 00:07:55,333 but no worries, we will keep the default value for all of them 156 00:07:55,333 --> 00:07:59,733 except this one, which corresponds of course to the activation function. 157 00:07:59,933 --> 00:08:02,600 And you saw in the intuition lectures with Kirill 158 00:08:02,600 --> 00:08:06,966 that the activation function in the hidden layers of a fully connected 159 00:08:06,966 --> 00:08:10,766 neural network must be the rectifier activation function. 160 00:08:10,966 --> 00:08:13,700 And therefore that's exactly what we must specify here.
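For reference, the rectifier mentioned here is simply max(0, x). The plain-Python sketch below only illustrates the function itself; the helper name relu and the sample inputs are ours, not something shown in the video, and Keras uses its own built-in implementation when the activation is specified on a layer.

```python
def relu(x: float) -> float:
    """Rectifier activation: passes positive inputs through, clips negatives to 0."""
    return max(0.0, x)


# A few sample inputs (hypothetical values, just to show the shape of the function).
for value in (-2.0, 0.0, 3.5):
    print(value, "->", relu(value))
```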
161 00:08:13,700 --> 00:08:16,700 We of course don't want to have no activation function. 162 00:08:16,933 --> 00:08:20,666 So here we have to specify that we want the rectifier activation function. 163 00:08:20,666 --> 00:08:23,666 And the way to specify this is to enter here 164 00:08:23,733 --> 00:08:26,666 in the activation parameter, 165 00:08:26,666 --> 00:08:31,866 well, the code name for the rectifier activation function, which is, in quotes, 166 00:08:32,200 --> 00:08:33,733 well, 'relu'. 167 00:08:33,733 --> 00:08:36,733 That's the code name for the rectifier activation function. 168 00:08:36,933 --> 00:08:40,100 And that is all you have to enter here in order 169 00:08:40,100 --> 00:08:44,133 to make a fully working first fully connected hidden layer. 170 00:08:44,266 --> 00:08:46,766 Congratulations. Now you know how to build, 171 00:08:46,766 --> 00:08:50,100 actually, you know, a shallow neural network, and you will know 172 00:08:50,100 --> 00:08:53,400 in a second how to build a deep neural network, 173 00:08:53,600 --> 00:08:59,100 because the way to actually add a second hidden layer here couldn't be more simple. 174 00:08:59,333 --> 00:09:03,266 The only thing that you have to do is just copy this line of code, 175 00:09:03,500 --> 00:09:06,300 and then in a new line of code here 176 00:09:06,300 --> 00:09:09,400 for the second hidden layer, you just need to paste it. 177 00:09:09,833 --> 00:09:13,933 That's what I mean by this add method can add any new layer 178 00:09:14,133 --> 00:09:19,166 at whatever stage of the construction process of your ANN you're in. 179 00:09:19,366 --> 00:09:22,133 Right? You can use this add method to add anything. 180 00:09:22,133 --> 00:09:23,500 And the way to add a second 181 00:09:23,500 --> 00:09:27,100 hidden layer is just the same as adding the first hidden layer.
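The two hidden layers described so far could look roughly like this. It is a sketch under the assumption that TensorFlow 2.x is installed and imported as tf; the network is re-created here only so that the snippet runs on its own.

```python
import tensorflow as tf

ann = tf.keras.models.Sequential()

# Input layer + first hidden layer: 6 hidden neurons, rectifier activation.
# The input layer is not declared separately here; its size is inferred
# from the data the first time the network is called or trained.
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

# Second hidden layer: the same line, pasted once more.
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

print([layer.units for layer in ann.layers])
```

Changing units in either call is all it takes to experiment with a different number of hidden neurons.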
182 00:09:27,333 --> 00:09:28,100 Unless of course, 183 00:09:28,100 --> 00:09:29,033 you know, you want to change 184 00:09:29,033 --> 00:09:32,733 the number of hidden neurons, but you know, six hidden neurons in the first hidden 185 00:09:32,733 --> 00:09:36,400 layer and six other ones in the second hidden layer is just fine. 186 00:09:36,400 --> 00:09:40,100 But once again, feel free to change the hyperparameter values here. 187 00:09:40,300 --> 00:09:42,700 Maybe you will get a better accuracy in the end. 188 00:09:42,700 --> 00:09:43,566 And if that's the case, 189 00:09:43,566 --> 00:09:46,800 well, please share it in the comments or by private message. 190 00:09:47,433 --> 00:09:48,233 Okay. 191 00:09:48,233 --> 00:09:52,100 However, now to add the output layer you have to do something special. 192 00:09:52,100 --> 00:09:54,666 You know, something different than what we did here. 193 00:09:54,666 --> 00:09:55,933 So let's do this together. 194 00:09:55,933 --> 00:09:58,300 Let's create a new code cell. 195 00:09:58,300 --> 00:10:03,733 And well, let's actually paste what we just copied before once again here. 196 00:10:04,033 --> 00:10:07,100 But this time we'll have to change two things, which correspond 197 00:10:07,100 --> 00:10:10,133 actually to the values of these two parameters. 198 00:10:10,500 --> 00:10:13,200 But first let me explain why all the rest is the same. 199 00:10:13,200 --> 00:10:16,933 Well, that's of course because, you know, we are adding a new layer and this add 200 00:10:16,933 --> 00:10:20,800 method can add any layer you want, including of course the output layer. 201 00:10:20,800 --> 00:10:25,133 So here we're still using the add method to add this final output layer. 202 00:10:25,333 --> 00:10:27,766 And then of course we still want our output layer 203 00:10:27,766 --> 00:10:30,666 to be fully connected to that second hidden layer. 204 00:10:30,666 --> 00:10:33,666 And therefore we're using again here the Dense class.
205 00:10:33,800 --> 00:10:34,933 So all good here. 206 00:10:34,933 --> 00:10:38,433 But then these two parameters have to be changed. 207 00:10:38,633 --> 00:10:40,800 And if you followed the intuition lecture, 208 00:10:40,800 --> 00:10:43,800 you should know what these two changes must be. 209 00:10:43,800 --> 00:10:45,366 All right. So let's start with this one. 210 00:10:45,366 --> 00:10:48,600 According to you, what do we need to replace here? 211 00:10:48,600 --> 00:10:49,933 Well, that's of course this value. 212 00:10:49,933 --> 00:10:53,700 But according to you, six has to be replaced by which value? 213 00:10:54,100 --> 00:10:58,266 Well, to get the answer we need to have a look at our dependent variable again, 214 00:10:58,266 --> 00:10:59,566 which is this one. 215 00:10:59,566 --> 00:11:04,200 Because remember, the output layer has the dimension of the output, 216 00:11:04,233 --> 00:11:06,133 you know, the output you want to predict. 217 00:11:06,133 --> 00:11:09,766 And here, since we actually want to predict a binary variable 218 00:11:09,900 --> 00:11:15,333 which can take the value 1 or 0, well, the dimension is actually one, right? 219 00:11:15,333 --> 00:11:19,133 Because we only need one neuron to get that final prediction, 0 or 1. 220 00:11:19,366 --> 00:11:24,600 However, if we were doing classification with a non-binary dependent variable, 221 00:11:24,600 --> 00:11:28,466 like a dependent variable that has three classes, let's say A, B, C, 222 00:11:28,666 --> 00:11:31,900 well, we would actually need three dimensions, you know, three output 223 00:11:31,900 --> 00:11:35,400 neurons, to once again one-hot encode that dependent variable. 224 00:11:35,400 --> 00:11:36,166 Because of course, 225 00:11:36,166 --> 00:11:40,500 once again, there is no order relationship between the classes A, B and C. 226 00:11:40,500 --> 00:11:44,033 So A, for example, would have to be encoded by 100. 227 00:11:44,266 --> 00:11:47,200 Then B would have to be encoded by 010.
228 00:11:47,200 --> 00:11:50,333 And C would have to be encoded by 001. 229 00:11:50,333 --> 00:11:53,633 And therefore you need three neurons taking these values zero and one 230 00:11:53,633 --> 00:11:56,633 to encode your three classes A, B and C. 231 00:11:56,966 --> 00:11:59,866 But here, since we actually have a binary variable, 232 00:11:59,866 --> 00:12:04,033 a binary outcome, well, you only need one neuron to encode 233 00:12:04,033 --> 00:12:09,600 these outcomes into 1 or 0, and therefore the value of that units 234 00:12:09,600 --> 00:12:14,400 parameter here that we have to replace right now is actually one. 235 00:12:14,766 --> 00:12:19,133 Okay, one output neuron encoding the dependent variable. 236 00:12:19,400 --> 00:12:23,333 And then the second change corresponds of course to that activation function, 237 00:12:23,333 --> 00:12:26,333 and more specifically to the value of the activation function. 238 00:12:26,533 --> 00:12:29,533 Well, once again, remember from the intuition lectures that 239 00:12:29,633 --> 00:12:32,300 for the activation function of the output layer, 240 00:12:32,300 --> 00:12:35,633 well, you don't want to have a rectifier activation function, 241 00:12:35,866 --> 00:12:38,933 but a sigmoid activation function. 242 00:12:39,300 --> 00:12:40,233 Why is that? 243 00:12:40,233 --> 00:12:44,700 It's because having a sigmoid activation function allows us to get 244 00:12:44,700 --> 00:12:48,900 not only ultimately the predictions, but even better, 245 00:12:48,900 --> 00:12:53,300 it will give you the probabilities that the binary outcome is one, 246 00:12:53,600 --> 00:12:57,600 so that we will not only get the predictions of whether the customers 247 00:12:57,600 --> 00:13:02,233 choose to leave the bank or not, but we will also have for each customer 248 00:13:02,400 --> 00:13:05,233 the probability that the customer leaves the bank. 249 00:13:05,233 --> 00:13:09,900 And all this thanks to that sigmoid activation function.
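The two ideas from this output-layer discussion, one-hot encoding for three unordered classes and a sigmoid probability for a binary outcome, can be illustrated in plain Python. This is only an illustrative sketch: the helper names one_hot and sigmoid, the raw output value 0.8 and the 0.5 cutoff are assumptions of ours, not something shown in the video.

```python
import math


def one_hot(label: str, classes: list[str]) -> list[int]:
    """Encode a class label as a vector with a single 1 at the label's position."""
    return [1 if label == c else 0 for c in classes]


def sigmoid(z: float) -> float:
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Three unordered classes need three output neurons.
classes = ["A", "B", "C"]
print(one_hot("A", classes), one_hot("B", classes), one_hot("C", classes))

# A binary outcome needs one output neuron; sigmoid turns its raw value into
# a probability, and a cutoff (0.5 here, an assumed choice) gives the 0/1 prediction.
raw_output = 0.8  # hypothetical pre-activation value of the output neuron
probability = sigmoid(raw_output)
prediction = 1 if probability > 0.5 else 0
print(probability, prediction)
```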
250 00:13:10,233 --> 00:13:13,300 So you definitely want that sigmoid activation function 251 00:13:13,500 --> 00:13:17,066 for the output layer only. You know, all the other layers, 252 00:13:17,066 --> 00:13:18,333 you know, the other fully connected 253 00:13:18,333 --> 00:13:22,400 layers, will get indeed that rectifier activation function. 254 00:13:22,833 --> 00:13:26,400 And now I really must say congratulations, 255 00:13:26,700 --> 00:13:28,366 because we are actually done 256 00:13:28,366 --> 00:13:32,466 with the creation of this very first artificial neural network. 257 00:13:32,666 --> 00:13:34,500 So you can be proud of yourself. 258 00:13:34,500 --> 00:13:37,066 You just built an artificial brain. 259 00:13:37,066 --> 00:13:39,733 Was that hard? Was that overwhelming? 260 00:13:39,733 --> 00:13:40,766 I don't think so. 261 00:13:40,766 --> 00:13:43,966 And that is the beauty of TensorFlow 2.0. 262 00:13:44,133 --> 00:13:45,900 So I hope you enjoyed this. 263 00:13:45,900 --> 00:13:48,800 I hope you enjoyed building your very first artificial brain. 264 00:13:48,800 --> 00:13:50,100 But it's not over. Now 265 00:13:50,100 --> 00:13:53,666 we only have a brain so far, you know, but one which is totally stupid 266 00:13:53,666 --> 00:13:57,466 actually, because it was not trained yet on the data set. 267 00:13:57,733 --> 00:14:01,600 So we're going to make it smart, and we're going to make it smart in part 268 00:14:01,600 --> 00:14:06,566 three, training the ANN, in which we will first compile the ANN 269 00:14:06,566 --> 00:14:09,900 with, you know, an optimizer and then a loss function, 270 00:14:10,200 --> 00:14:14,133 and then we will train finally our artificial neural network 271 00:14:14,133 --> 00:14:16,966 on the whole training set over a certain number of epochs. 272 00:14:16,966 --> 00:14:21,033 And you will see that the training process will be very exciting to visualize. 273 00:14:21,366 --> 00:14:23,133 I can't wait to show this to you.
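Putting the four steps of this tutorial together gives something like the sketch below. It assumes TensorFlow 2.x imported as tf; the batch of 4 customers with 12 features each is a made-up placeholder (the real feature count depends on the preprocessed dataset), used only to show that the untrained network already returns one value between 0 and 1 per customer.

```python
import tensorflow as tf

# Step 1: initialize the ANN as a sequence of layers.
ann = tf.keras.models.Sequential()

# Steps 2 and 3: input layer + first hidden layer, then second hidden layer.
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

# Step 4: output layer, one neuron with a sigmoid activation for the binary outcome.
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

# Hypothetical batch: 4 customers with 12 features each (the feature count is an assumption).
fake_customers = tf.random.uniform((4, 12))
probabilities = ann(fake_customers)  # untrained, so these values are meaningless for now

print(probabilities.shape)
```

The weights are still random at this point, which is exactly why the network has to be compiled and trained next.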
274 00:14:23,133 --> 00:14:25,900 Let's tackle part three together in the next tutorial. 275 00:14:25,900 --> 00:14:27,866 And until then, enjoy machine learning.