1 00:00:00,233 --> 00:00:00,966 Hello my friends. 2 00:00:00,966 --> 00:00:01,733 Welcome back. 3 00:00:01,733 --> 00:00:05,700 I'm sure you feel amazing after having built your very first artificial 4 00:00:05,700 --> 00:00:06,700 neural network. 5 00:00:06,700 --> 00:00:09,066 But remember, that's only half of the job. 6 00:00:09,066 --> 00:00:13,700 The second half will be, of course, to train it on the whole training set. 7 00:00:13,700 --> 00:00:16,066 And we are going to do this in two steps. 8 00:00:16,066 --> 00:00:20,233 The first one is to compile the ANN with an optimizer, 9 00:00:20,233 --> 00:00:23,933 a loss function and a metric, which will be of course the accuracy 10 00:00:24,166 --> 00:00:26,400 because we're doing some classification. 11 00:00:26,400 --> 00:00:29,600 And then the second step will be of course to train the ANN 12 00:00:29,600 --> 00:00:32,866 on the training set over a certain number of epochs. 13 00:00:33,266 --> 00:00:34,266 Are you ready? 14 00:00:34,266 --> 00:00:35,100 Let's do this. 15 00:00:35,100 --> 00:00:38,466 Starting with this first step, compiling the ANN. 16 00:00:39,100 --> 00:00:39,500 All right. 17 00:00:39,500 --> 00:00:43,000 So once again doing this will be super simple 18 00:00:43,000 --> 00:00:46,466 thanks to the TensorFlow library that integrated Keras. 19 00:00:46,700 --> 00:00:50,866 Because indeed, to compile our ANN, we first need to start from our 20 00:00:50,866 --> 00:00:56,166 ANN object, which I remind you was created as an instance of the Sequential class. 21 00:00:56,600 --> 00:01:00,233 And then from this object we're going to call a new method, 22 00:01:00,233 --> 00:01:02,733 which this time of course won't be the add method. 23 00:01:02,733 --> 00:01:05,666 But can you actually guess what this method is going to be? 24 00:01:05,666 --> 00:01:09,400 You know, there is no trap in TensorFlow nor any confusion. 
25 00:01:09,666 --> 00:01:13,600 Well, the method to compile an artificial neural network 26 00:01:13,800 --> 00:01:16,866 is simply the compile method. 27 00:01:17,066 --> 00:01:18,866 As simple as that, right? 28 00:01:18,866 --> 00:01:22,466 We didn't even have to look at the TensorFlow documentation, 29 00:01:22,466 --> 00:01:26,066 which, by the way, I still recommend you have a look at, because you will get 30 00:01:26,066 --> 00:01:30,000 a lot of information on the diverse tools you have in the TensorFlow library. 31 00:01:30,300 --> 00:01:32,866 But here it's super intuitive, it's super easy. 32 00:01:32,866 --> 00:01:35,100 So now I have another question for you. 33 00:01:35,100 --> 00:01:35,833 According to you, 34 00:01:35,833 --> 00:01:39,600 what do we have to enter as parameters inside this compile method? 35 00:01:39,833 --> 00:01:41,333 Well, I actually said it. 36 00:01:41,333 --> 00:01:43,200 We have to enter three parameters. 37 00:01:43,200 --> 00:01:46,200 The first one is optimizer, to choose an optimizer. 38 00:01:46,433 --> 00:01:49,466 Then the second one is loss, to choose the loss function. 39 00:01:49,700 --> 00:01:53,633 And the third one is the metrics parameter, with an s, 40 00:01:53,733 --> 00:01:54,166 because, 41 00:01:54,166 --> 00:01:58,033 note that you can actually choose several metrics to evaluate your 42 00:01:58,100 --> 00:02:00,833 ANN at the same time, but we will only choose one 43 00:02:00,833 --> 00:02:02,533 and we will choose the accuracy. 44 00:02:02,533 --> 00:02:03,233 But there you go. 45 00:02:03,233 --> 00:02:04,833 These are the three parameters. 46 00:02:04,833 --> 00:02:07,833 So I suggest that we start by entering them. 47 00:02:07,900 --> 00:02:10,533 And then we will enter their values, okay? 48 00:02:10,533 --> 00:02:15,433 So let's start with the first one, optimizer equals. All right. 49 00:02:15,433 --> 00:02:20,166 And comma, the next one is loss, for the loss function. 
50 00:02:20,500 --> 00:02:25,500 And finally the third one is the metrics parameter. 51 00:02:25,966 --> 00:02:26,533 All right. 52 00:02:26,533 --> 00:02:29,600 So for the optimizer, which one would you like to get? 53 00:02:29,933 --> 00:02:33,866 Well, in the intuition lectures Kirill mentioned that the best ones 54 00:02:33,866 --> 00:02:37,366 are the optimizers that can perform stochastic gradient descent. 55 00:02:37,366 --> 00:02:40,433 And the best of them, you know, the one that I recommend by default 56 00:02:40,566 --> 00:02:44,400 is the Adam optimizer, which is a very performant 57 00:02:44,400 --> 00:02:47,466 optimizer that can perform stochastic gradient descent. 58 00:02:47,466 --> 00:02:51,366 And by the way, let me just remind you what stochastic gradient descent allows us to do. 59 00:02:51,600 --> 00:02:56,966 Well, you know, it is what will update the weights in order to reduce the loss 60 00:02:57,000 --> 00:03:00,300 error between your predictions and the real results. 61 00:03:00,300 --> 00:03:04,800 You know, when we train the ANN on the training set, we will at each iteration 62 00:03:04,800 --> 00:03:09,900 compare the predictions in a batch to the real results in the same batch. 63 00:03:10,100 --> 00:03:13,966 And that optimizer here will update the weights through 64 00:03:13,966 --> 00:03:16,300 stochastic gradient descent, because we're going to choose the Adam 65 00:03:16,300 --> 00:03:20,533 optimizer, to hopefully reduce the loss at the next iteration. 66 00:03:21,033 --> 00:03:21,366 All right. 67 00:03:21,366 --> 00:03:25,366 So that's why right here we have to choose an optimizer but also a loss function, 68 00:03:25,466 --> 00:03:27,500 which is the way to compute the difference 69 00:03:27,500 --> 00:03:29,233 between the predictions and the real results. 70 00:03:29,233 --> 00:03:33,433 And then the accuracy of course because that's our final evaluation metric. 71 00:03:33,900 --> 00:03:34,200 All right. 
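To make the batch update described above concrete, here is a minimal sketch in plain Python. It is not the Adam optimizer: it swaps in plain mini-batch gradient descent on a single sigmoid unit, and the toy data, the learning rate and the helper names `batch_loss` and `sgd_step` are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def batch_loss(w, b, batch):
    """Mean binary cross-entropy of a single sigmoid unit on one batch."""
    total = 0.0
    for x, y in batch:
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(batch)

def sgd_step(w, b, batch, lr=0.1):
    """One plain mini-batch gradient descent update (not Adam)."""
    grad_w = grad_b = 0.0
    for x, y in batch:
        error = sigmoid(w * x + b) - y  # gradient of the loss w.r.t. the pre-activation
        grad_w += error * x
        grad_b += error
    n = len(batch)
    return w - lr * grad_w / n, b - lr * grad_b / n

random.seed(0)
# Hypothetical toy data: the label is 1 when the feature is positive.
data = [(x, 1 if x > 0 else 0) for x in (random.uniform(-1, 1) for _ in range(64))]
batch = data[:32]  # one batch of 32 predictions compared to 32 real results

w, b = 0.0, 0.0
loss_before = batch_loss(w, b, batch)
w, b = sgd_step(w, b, batch)
loss_after = batch_loss(w, b, batch)
print(loss_before, loss_after)
```

The second printed loss should be lower than the first, which is the whole point of the update: compare predictions to real results on a batch, then move the weights in the direction that reduces the loss.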
72 00:03:34,200 --> 00:03:37,000 So as we said we're going to choose the Adam optimizer. 73 00:03:37,000 --> 00:03:40,566 And the code name for that is simply, with no capital letter, 74 00:03:40,866 --> 00:03:43,700 'adam'. Okay. Congratulations. 75 00:03:43,700 --> 00:03:47,666 Now you know how to compile an artificial neural network with an optimizer. 76 00:03:48,033 --> 00:03:51,033 But then we also have to compile it with the loss function. 77 00:03:51,133 --> 00:03:53,633 And now you have to know something very important. 78 00:03:53,633 --> 00:03:56,966 When you are doing binary classification, you know, classification 79 00:03:57,100 --> 00:04:02,000 when you have to predict a binary outcome, well, the loss function must always be 80 00:04:02,366 --> 00:04:06,566 the following one, entered in quotes, of course, which is binary 81 00:04:07,100 --> 00:04:10,466 underscore crossentropy. 82 00:04:11,200 --> 00:04:12,400 Just like that. 83 00:04:12,400 --> 00:04:16,000 And now let me tell you what you would have to enter if you were doing 84 00:04:16,000 --> 00:04:17,500 non-binary classification. 85 00:04:17,500 --> 00:04:20,300 You know, like for example predicting three different categories. 86 00:04:20,300 --> 00:04:26,000 Well, here you would have to enter the categorical crossentropy loss. Okay. 87 00:04:26,000 --> 00:04:29,600 For binary classification the loss must be binary crossentropy. 88 00:04:29,600 --> 00:04:34,133 And for non-binary classification the loss must be categorical crossentropy. 89 00:04:34,333 --> 00:04:36,100 And then also, you know, when doing 90 00:04:36,100 --> 00:04:39,833 non-binary classification, when predicting more than two categories, 91 00:04:40,000 --> 00:04:45,000 well, the activation should not be sigmoid but softmax, right? 92 00:04:45,000 --> 00:04:49,133 I take this opportunity to also give you the other cases of classification 93 00:04:49,133 --> 00:04:51,866 which you could encounter. Okay. So now you know everything. 
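To show what the two losses actually compute, here is a small plain-Python sketch for a single observation. It is only an illustration of the standard formulas, not the Keras implementation, and the function names and example numbers are assumptions chosen for this sketch.

```python
import math

def binary_crossentropy(y_true, p):
    """Loss for one observation with a sigmoid output p = P(class 1)."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs):
    """Loss for one observation with a one-hot label and softmax probabilities."""
    return -sum(t * math.log(q) for t, q in zip(one_hot, probs))

# Binary case: the real result is 1 and the network predicts 0.9.
binary_loss = binary_crossentropy(1, 0.9)
print(binary_loss)

# Three-category case: the real category is the second one.
probs = softmax([1.0, 2.0, 0.5])
multi_loss = categorical_crossentropy([0, 1, 0], probs)
print(multi_loss)
```

In both cases the loss is small when the probability given to the real category is close to 1, which is why sigmoid pairs with binary crossentropy and softmax pairs with categorical crossentropy.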
94 00:04:51,866 --> 00:04:55,700 And then remember that for regression, because we can also do 95 00:04:55,700 --> 00:04:57,866 artificial neural networks for regression, 96 00:04:57,866 --> 00:05:00,533 well, we have this free course for which I gave you the link. 97 00:05:00,533 --> 00:05:04,733 You can just take this course for free and you will get the full implementation 98 00:05:04,733 --> 00:05:08,366 of an artificial neural network for a regression case study. 99 00:05:08,366 --> 00:05:13,133 So you have really everything that you can do with an artificial neural network. 100 00:05:13,633 --> 00:05:14,900 All right. Great. 101 00:05:14,900 --> 00:05:18,233 And now let's enter the final parameter here, metrics. 102 00:05:18,666 --> 00:05:22,066 As I said we can actually choose several metrics at the same time. 103 00:05:22,300 --> 00:05:25,200 Therefore in order to enter the values of this parameter, 104 00:05:25,200 --> 00:05:28,200 well, we have to enter them in a pair of square brackets, 105 00:05:28,200 --> 00:05:31,266 which is supposed to be, you know, the list of the different metrics 106 00:05:31,433 --> 00:05:35,100 with which you want to evaluate your ANN during the training, 107 00:05:35,366 --> 00:05:38,566 but we will only choose the main one, you know, the most essential one, 108 00:05:38,766 --> 00:05:42,933 which is the accuracy and which you have to enter in quotes. 109 00:05:42,933 --> 00:05:43,333 All right. 110 00:05:43,333 --> 00:05:46,233 So accuracy, just like the classic spelling. 111 00:05:46,233 --> 00:05:48,633 And now, congratulations. 112 00:05:48,633 --> 00:05:51,633 You know how to do a full compile of your 113 00:05:51,633 --> 00:05:55,600 ANN with an optimizer, a loss and some metrics. 114 00:05:56,100 --> 00:05:56,733 Perfect. 115 00:05:56,733 --> 00:06:01,166 So now let's move on to the ultimate step, meaning the step 116 00:06:01,166 --> 00:06:05,733 where we will train the ANN on the whole training set. 
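The compile step described above can be sketched as follows. The three arguments to compile come straight from the lecture; the variable name ann, the two hidden layers of 6 units and the relu activations are assumptions about the network built earlier, shown here only so that the snippet is self-contained.

```python
import tensorflow as tf

# Assumed architecture: two hidden layers and one sigmoid output unit.
# The unit counts and relu activations are illustrative, not taken from this lecture.
ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

# The three parameters discussed above: optimizer, loss and a list of metrics.
ann.compile(optimizer='adam',
            loss='binary_crossentropy',
            metrics=['accuracy'])
```

For a problem with more than two categories, the same call would presumably use loss='categorical_crossentropy' together with a softmax output layer, as mentioned in the lecture.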
117 00:06:05,933 --> 00:06:08,133 So let's create a new code cell. 118 00:06:08,133 --> 00:06:12,000 And now according to you, how do we need to start this training? 119 00:06:12,333 --> 00:06:14,933 Well once again, you know, it's always the same thing. 120 00:06:14,933 --> 00:06:18,633 We need to take our ANN object, then call a new method 121 00:06:18,633 --> 00:06:22,100 which will perform the training and then enter a couple of parameters. 122 00:06:22,333 --> 00:06:23,400 So let's do this. 123 00:06:23,400 --> 00:06:26,400 Let's start with ann, first our object. 124 00:06:26,400 --> 00:06:29,600 And then according to you, what will be the method 125 00:06:29,600 --> 00:06:32,600 that can train your artificial neural network on the training set? 126 00:06:32,700 --> 00:06:34,633 Well, nothing has changed here. 127 00:06:34,633 --> 00:06:39,600 And actually I think I said it earlier in the course, the method to train 128 00:06:39,600 --> 00:06:42,700 whatever machine learning model is always the same one. 129 00:06:42,700 --> 00:06:46,300 It is the fit method, the fit method, 130 00:06:46,500 --> 00:06:49,400 which will always take the same parameters. 131 00:06:49,400 --> 00:06:52,200 The first one is X_train 132 00:06:52,200 --> 00:06:55,200 for, you know, the matrix of features of the training set. 133 00:06:55,400 --> 00:07:00,300 Then y_train for the dependent variable vector of the training set. 134 00:07:00,466 --> 00:07:04,566 And then when training an artificial neural network, we actually need to enter 135 00:07:04,566 --> 00:07:09,066 two more parameters, which are first, the batch size. 136 00:07:09,300 --> 00:07:14,066 Because indeed batch learning is always more efficient and more performant 
137 00:07:14,066 --> 00:07:17,133 when training an artificial neural network, meaning that 138 00:07:17,133 --> 00:07:21,000 instead of comparing your prediction to the real result one by one, 139 00:07:21,000 --> 00:07:24,066 you know, to compute and reduce the loss, well, you're going to do that 140 00:07:24,066 --> 00:07:28,966 with several predictions compared to several real results in a batch. 141 00:07:29,133 --> 00:07:33,333 And the batch size here, you know, the batch size parameter gives exactly 142 00:07:33,333 --> 00:07:35,800 the number of predictions you want to have in the batch 143 00:07:35,800 --> 00:07:38,800 to be compared to that same number of real results. 144 00:07:39,000 --> 00:07:44,666 And the classic value of the batch size that is usually chosen is 32, right? 145 00:07:44,666 --> 00:07:48,766 If you don't want to spend too much time tuning this hyperparameter, 146 00:07:48,966 --> 00:07:52,066 well, I recommend choosing the default value, 32. 147 00:07:52,066 --> 00:07:55,866 But anyway, I wanted to highlight that hyperparameter here because indeed, 148 00:07:55,866 --> 00:07:59,766 it is very important to remember that we are doing batch learning. 149 00:07:59,766 --> 00:08:02,366 Okay, so batch_size equals 32. 150 00:08:02,366 --> 00:08:07,100 And finally I'm sure you know which final parameter we have to add here. 151 00:08:07,266 --> 00:08:10,333 That's of course the number of epochs. 152 00:08:10,333 --> 00:08:13,233 You know, a neural network has to be trained over 153 00:08:13,233 --> 00:08:17,700 a certain number of epochs so as to improve the accuracy over time. 154 00:08:17,700 --> 00:08:20,700 And we will clearly see that once we execute this cell. 155 00:08:21,066 --> 00:08:25,833 So the name of the parameter for the number of epochs is simply epochs. 156 00:08:26,200 --> 00:08:28,933 And well, you will see that it will go very fast. 157 00:08:28,933 --> 00:08:30,900 So we can just take 100 epochs. 
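The fit call described above can be sketched as follows. The four arguments (X_train, y_train, batch_size=32, epochs=100) come from the lecture; the random stand-in data, the layer sizes and the verbose=0 setting are assumptions added so the snippet runs on its own. With the real dataset, X_train and y_train would come from the data preprocessing phase instead.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: 256 observations, 12 features, binary labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(256, 12)).astype('float32')
y_train = (X_train[:, 0] > 0).astype('float32')

# Assumed architecture, matching the compile sketch: two hidden layers, sigmoid output.
ann = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units=6, activation='relu'),
    tf.keras.layers.Dense(units=6, activation='relu'),
    tf.keras.layers.Dense(units=1, activation='sigmoid'),
])
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train in batches of 32 observations over 100 epochs.
# verbose=0 silences the per-epoch log; drop it to watch the accuracy evolve.
history = ann.fit(X_train, y_train, batch_size=32, epochs=100, verbose=0)
print(len(history.history['accuracy']))
```

The returned history object keeps one loss value and one accuracy value per epoch, which is a convenient way to check after the fact at which epoch the accuracy stopped improving.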
158 00:08:30,900 --> 00:08:33,400 But once again feel free to choose another number 159 00:08:33,400 --> 00:08:36,433 as long as it is not too small, because you know your neural network 160 00:08:36,433 --> 00:08:39,600 needs a certain number of epochs in order to learn properly. 161 00:08:39,600 --> 00:08:43,200 You know, learn the correlations to get the ultimate best predictions. 162 00:08:43,900 --> 00:08:45,266 All right. Great. 163 00:08:45,266 --> 00:08:47,500 So we're actually done with part three now. 164 00:08:47,500 --> 00:08:49,500 So I suggest we no longer wait 165 00:08:49,500 --> 00:08:51,633 and that we execute all the cells 166 00:08:51,633 --> 00:08:56,300 we haven't executed so far, which I think, you know, start from part two, right? Yes. 167 00:08:56,300 --> 00:08:59,200 This was the last cell of the data preprocessing phase. 168 00:08:59,200 --> 00:09:00,633 It was run properly. 169 00:09:00,633 --> 00:09:03,366 So let's actually run each cell one by one 170 00:09:03,366 --> 00:09:06,766 and see what we're going to get in the end during the training. 171 00:09:06,766 --> 00:09:07,933 So let's start with this one. 172 00:09:07,933 --> 00:09:10,633 Initializing the ANN. Good. 173 00:09:10,633 --> 00:09:14,100 Now this one, adding the input layer and the first hidden layer. 174 00:09:14,766 --> 00:09:15,600 Good. 175 00:09:15,600 --> 00:09:18,233 Now this one, adding the second hidden layer. 176 00:09:18,233 --> 00:09:19,166 All good. 177 00:09:19,166 --> 00:09:21,666 And now this one, adding the output layer. 178 00:09:21,666 --> 00:09:22,866 All good still. 179 00:09:22,866 --> 00:09:25,000 Now we enter part three. 180 00:09:25,000 --> 00:09:29,333 Executing first this cell, compiling the ANN. All good. 181 00:09:29,566 --> 00:09:31,733 And now are you ready, my friends? 182 00:09:31,733 --> 00:09:35,100 We're about to train the artificial neural network 183 00:09:35,100 --> 00:09:39,366 on the training set over 100 epochs. 
184 00:09:39,366 --> 00:09:40,333 And here we go. 185 00:09:40,333 --> 00:09:41,766 The training is starting. 186 00:09:41,766 --> 00:09:44,733 And as I told you, it's going pretty fast. But look at this. 187 00:09:44,733 --> 00:09:48,466 Look at the accuracy and how it is evolving over the epochs. 188 00:09:48,466 --> 00:09:52,000 And we can see that it is actually increasing pretty fast. 189 00:09:52,000 --> 00:09:52,800 And mostly 190 00:09:52,800 --> 00:09:56,800 we see that it is actually converging, you know, converging pretty quickly. 191 00:09:56,800 --> 00:10:02,966 You know, we converged at 0.86, you know, at about 20 epochs. 192 00:10:02,966 --> 00:10:06,900 We actually didn't need those 100 epochs; 20 was fine. 193 00:10:06,900 --> 00:10:08,566 But anyway, you know, it's going really fast 194 00:10:08,566 --> 00:10:10,433 and I'm sure it will be over very soon 195 00:10:10,433 --> 00:10:13,200 now, because indeed, yes, there we go. 196 00:10:13,200 --> 00:10:17,700 The training was done in more or less 20 seconds, and the final accuracy 197 00:10:17,700 --> 00:10:20,733 we get on the training set (we'll have to check the same on 198 00:10:20,733 --> 00:10:24,300 the test set) is averaging around 0.86. 199 00:10:24,300 --> 00:10:25,000 That's really good. 200 00:10:25,000 --> 00:10:29,500 That means that out of 100 observations you have 86 correct predictions. 201 00:10:29,766 --> 00:10:34,133 So congratulations, you made a very good first deep learning model. 202 00:10:34,133 --> 00:10:35,433 You can be proud of yourself. 203 00:10:35,433 --> 00:10:39,300 And mostly now you can take a little break because we're going to enter part four. 
204 00:10:39,300 --> 00:10:41,366 And not only are we going to enter part four, 205 00:10:41,366 --> 00:10:44,266 but also you're going to see that you're going to have a little homework, 206 00:10:44,266 --> 00:10:48,266 which will consist of predicting the result of a single observation, 207 00:10:48,266 --> 00:10:49,966 meaning a single customer. 208 00:10:49,966 --> 00:10:53,666 You will have to predict if this customer will stay in or leave the bank. 209 00:10:53,833 --> 00:10:55,900 You will enter your solution here 210 00:10:55,900 --> 00:10:59,833 and we will implement the solution together in the next tutorial. 211 00:11:00,166 --> 00:11:01,533 So make sure to do it. 212 00:11:01,533 --> 00:11:03,700 Please try at least to do it. 213 00:11:03,700 --> 00:11:07,066 You actually know how to do it because we already learned how 214 00:11:07,066 --> 00:11:10,566 to do a single prediction before, you know, the prediction of a single observation. 215 00:11:10,800 --> 00:11:12,000 So you have everything. 216 00:11:12,000 --> 00:11:15,600 Maybe check out again part three, classification, if you have a doubt. 217 00:11:15,833 --> 00:11:17,900 But there you go. That's your homework. 218 00:11:17,900 --> 00:11:21,400 You have to use our ANN model to predict if the customer 219 00:11:21,400 --> 00:11:24,966 with the following information will leave the bank, yes or no. 220 00:11:25,300 --> 00:11:28,633 And this information is that it is a French customer 221 00:11:28,733 --> 00:11:31,600 with a credit score of 600, and a male one. 222 00:11:31,600 --> 00:11:33,033 He is 40 years old. 223 00:11:33,033 --> 00:11:35,033 He has been in the bank for three years. 224 00:11:35,033 --> 00:11:38,266 He has $60,000 in his account. 225 00:11:38,500 --> 00:11:40,166 He has two products in the bank. 226 00:11:40,166 --> 00:11:41,266 He has a credit card. 227 00:11:41,266 --> 00:11:47,033 Indeed, he is also an active member and he has an estimated salary of $50,000. 
228 00:11:47,233 --> 00:11:51,000 And the question is, so should we say goodbye to that customer? 229 00:11:51,266 --> 00:11:55,333 Well, please figure it out and we will see if you are right in the next tutorial. 230 00:11:55,700 --> 00:11:57,566 Until then, enjoy machine learning.