1 00:00:00,233 --> 00:00:02,533 Hello and welcome to this R tutorial. 2 00:00:02,533 --> 00:00:03,266 So we just 3 00:00:03,266 --> 00:00:06,266 trained our artificial neural network on the training set, 4 00:00:06,433 --> 00:00:08,666 and now it's time to make the predictions on the 5 00:00:08,666 --> 00:00:09,600 test set. 6 00:00:09,600 --> 00:00:13,233 So lucky for us, we already have everything ready here, 7 00:00:13,266 --> 00:00:15,366 thanks to our classification template 8 00:00:15,366 --> 00:00:17,433 that we pasted in the first tutorial. 9 00:00:17,433 --> 00:00:20,100 So actually this section predicts 10 00:00:20,100 --> 00:00:21,566 the test set results, 11 00:00:21,566 --> 00:00:24,233 and this section makes the confusion matrix, 12 00:00:24,233 --> 00:00:25,233 thanks to which we 13 00:00:25,233 --> 00:00:26,400 will obtain the 14 00:00:26,400 --> 00:00:28,433 accuracy on the test set, 15 00:00:28,433 --> 00:00:31,400 that is, the accuracy on new observations 16 00:00:31,400 --> 00:00:31,933 on which the 17 00:00:31,933 --> 00:00:32,600 artificial 18 00:00:32,600 --> 00:00:35,033 neural network model wasn't trained. 19 00:00:35,033 --> 00:00:36,833 So first let's take care of this 20 00:00:36,833 --> 00:00:38,366 section and 21 00:00:38,366 --> 00:00:39,966 let's see what we need to change. 22 00:00:39,966 --> 00:00:44,200 So first of all, this first line gets the predicted probabilities 23 00:00:44,200 --> 00:00:46,033 thanks to this predict function. 24 00:00:46,033 --> 00:00:47,133 But that was the 25 00:00:47,133 --> 00:00:49,133 predict function used for 26 00:00:49,133 --> 00:00:50,966 built-in R packages. 27 00:00:50,966 --> 00:00:52,200 But here, since we are using 28 00:00:52,200 --> 00:00:54,966 the H2O package, which is kind of special, there are 29 00:00:54,966 --> 00:00:57,166 actually some things that we need to change here, 30 00:00:57,166 --> 00:00:58,766 but very few things.
31 00:00:58,766 --> 00:01:00,800 So first, as for 32 00:01:00,800 --> 00:01:01,666 all the functions we 33 00:01:01,666 --> 00:01:04,433 used with this H2O package, 34 00:01:04,433 --> 00:01:06,866 well, you noticed that when we use a function we 35 00:01:06,866 --> 00:01:08,533 first write h2o, 36 00:01:08,533 --> 00:01:10,733 then a dot, and then the name of the function. 37 00:01:10,733 --> 00:01:13,200 Well, we need to do the same here for the predict 38 00:01:13,200 --> 00:01:13,800 function. 39 00:01:13,800 --> 00:01:19,033 So here we just need to add h2o.predict. 40 00:01:19,500 --> 00:01:21,700 Okay. So that's the first thing we need to change. 41 00:01:21,700 --> 00:01:24,100 And then let's see. Let's go inside the function. 42 00:01:24,100 --> 00:01:26,700 So the first argument is classifier. 43 00:01:26,700 --> 00:01:28,233 Let's press here 44 00:01:28,233 --> 00:01:31,033 F1 to get some information about 45 00:01:31,033 --> 00:01:32,200 the predict 46 00:01:32,200 --> 00:01:35,100 function of the H2O model. 47 00:01:35,100 --> 00:01:38,100 So let's scroll down to have a look at the arguments and let's see 48 00:01:38,100 --> 00:01:39,466 what they are. 49 00:01:39,466 --> 00:01:40,666 So as we can see, we have 50 00:01:40,666 --> 00:01:44,100 only two main arguments and then some additional arguments, 51 00:01:44,333 --> 00:01:46,800 but those we will not focus on. 52 00:01:46,800 --> 00:01:49,400 Instead, we will focus on the two main arguments here, 53 00:01:49,400 --> 00:01:52,400 which are object and newdata. 54 00:01:52,466 --> 00:01:54,100 So the first thing we can see 55 00:01:54,100 --> 00:01:56,966 is that there is no type argument. 56 00:01:56,966 --> 00:01:57,900 So here simply 57 00:01:57,900 --> 00:02:00,000 we will remove this type equals 58 00:02:00,000 --> 00:02:02,233 response argument input, 59 00:02:02,233 --> 00:02:05,033 because we actually don't need it.
60 00:02:05,033 --> 00:02:05,466 All right. 61 00:02:05,466 --> 00:02:07,133 And now we are left with the 62 00:02:07,133 --> 00:02:09,533 two arguments we are required to input, 63 00:02:09,533 --> 00:02:12,533 that is, the object, which is our classifier here, 64 00:02:12,533 --> 00:02:13,300 that is, the ANN 65 00:02:13,300 --> 00:02:15,700 model itself that we have just built 66 00:02:15,700 --> 00:02:17,000 on the training set, 67 00:02:17,000 --> 00:02:19,300 and then the second argument, newdata. 68 00:02:19,300 --> 00:02:21,333 And this newdata argument is expecting, 69 00:02:21,333 --> 00:02:22,400 of course, the 70 00:02:22,400 --> 00:02:24,600 observations for which it has to make the 71 00:02:24,600 --> 00:02:25,800 predictions. 72 00:02:25,800 --> 00:02:27,833 All right. So that's exactly our test set. 73 00:02:27,833 --> 00:02:29,200 And here we remove 74 00:02:29,200 --> 00:02:30,433 the dependent variable 75 00:02:30,433 --> 00:02:33,433 column thanks to this minus three here. 76 00:02:33,566 --> 00:02:36,000 But we need to replace this three because 77 00:02:36,000 --> 00:02:37,733 this number three here corresponds 78 00:02:37,733 --> 00:02:38,966 to the index of the 79 00:02:38,966 --> 00:02:39,333 dependent 80 00:02:39,333 --> 00:02:42,766 variable of the data set that we worked with in Part 3, 81 00:02:42,766 --> 00:02:43,900 Classification. 82 00:02:43,900 --> 00:02:46,900 And here of course the index of our dependent variable is not three 83 00:02:47,133 --> 00:02:49,066 but 11. 84 00:02:49,066 --> 00:02:50,500 Remember, we already replaced 85 00:02:50,500 --> 00:02:52,266 the index three here in this 86 00:02:52,266 --> 00:02:53,766 feature scaling part. 87 00:02:53,766 --> 00:02:58,400 So we replaced the four indexes three that were here by 11. 88 00:02:58,400 --> 00:03:00,066 And so here we need to do the same. 89 00:03:00,066 --> 00:03:03,100 We will replace this index three here by
90 00:03:03,333 --> 00:03:05,200 index 11. 91 00:03:05,200 --> 00:03:05,733 All right. 92 00:03:05,733 --> 00:03:06,900 So now this 93 00:03:06,900 --> 00:03:09,400 is taking the test set observations as newdata. 94 00:03:09,400 --> 00:03:13,666 That is, it will predict the probabilities that the dependent variable 95 00:03:13,666 --> 00:03:15,066 Exited equals one 96 00:03:15,066 --> 00:03:17,100 for the observations in the test set. 97 00:03:17,100 --> 00:03:17,833 And therefore 98 00:03:17,833 --> 00:03:19,766 it will predict, for each customer in the test 99 00:03:19,766 --> 00:03:22,800 set, the probability that this customer leaves the 100 00:03:22,800 --> 00:03:24,166 bank. And since we 101 00:03:24,166 --> 00:03:28,800 have the real results of whether the customers of the test set left or stayed 102 00:03:28,800 --> 00:03:32,100 in the bank, well, we will compare our predictions to 103 00:03:32,100 --> 00:03:36,400 these real results, these actual results, and that's how we'll get the accuracy, 104 00:03:36,600 --> 00:03:38,166 by computing the number of correct 105 00:03:38,166 --> 00:03:41,733 predictions divided by the total number of observations in the test set, 106 00:03:42,000 --> 00:03:43,800 that is, two thousand. 107 00:03:43,800 --> 00:03:46,000 And then if we get a good accuracy, 108 00:03:46,000 --> 00:03:48,266 then maybe we'll have a good and powerful model. 109 00:03:48,266 --> 00:03:50,433 And if that's the case, we will give it to 110 00:03:50,433 --> 00:03:54,233 the bank on a plate and tell the bank: okay, now you can rank 111 00:03:54,533 --> 00:03:57,000 all your customers, all the customers in the bank, 112 00:03:57,000 --> 00:03:59,300 by their probability to leave the bank. 113 00:03:59,300 --> 00:04:01,200 That is, for each of your customers 114 00:04:01,200 --> 00:04:03,800 you can predict with a good accuracy, and we'll be 115 00:04:03,800 --> 00:04:06,566 able to tell them precisely what this accuracy is.
116 00:04:06,566 --> 00:04:07,533 You'll be able to predict 117 00:04:07,533 --> 00:04:11,100 with a good accuracy the probability that the customer leaves the bank. 118 00:04:11,333 --> 00:04:12,566 And then you can add: 119 00:04:12,566 --> 00:04:14,766 therefore, I can give you a ranking of 120 00:04:14,766 --> 00:04:18,300 all your customers ranked by their probability to leave the bank, 121 00:04:18,300 --> 00:04:18,600 from 122 00:04:18,600 --> 00:04:21,566 the highest probability to the lowest probability. 123 00:04:21,566 --> 00:04:25,466 And therefore you can do some customer segmentation and consider, for example, 124 00:04:25,466 --> 00:04:27,233 the top 10% of probabilities 125 00:04:27,233 --> 00:04:29,100 that the customers leave the bank. 126 00:04:29,100 --> 00:04:31,333 And in this segment, you can analyze 127 00:04:31,333 --> 00:04:34,366 more deeply the factors that lead the customers 128 00:04:34,366 --> 00:04:35,400 to leave the bank 129 00:04:35,400 --> 00:04:37,266 by using some data mining techniques, 130 00:04:37,266 --> 00:04:40,266 like, for example, doing a chi-square test or 131 00:04:40,266 --> 00:04:43,400 applying the summary function on your independent variables 132 00:04:43,400 --> 00:04:46,533 to understand which independent variables have the most impact 133 00:04:46,766 --> 00:04:47,766 on the dependent variable, 134 00:04:47,766 --> 00:04:48,833 that is, which 135 00:04:48,833 --> 00:04:51,600 independent variable explains the most 136 00:04:51,600 --> 00:04:53,400 why customers are leaving. 137 00:04:53,400 --> 00:04:55,033 Well, you know how to do that. 138 00:04:55,033 --> 00:04:57,500 That's exactly what we did in Parts 2 and 3 139 00:04:57,500 --> 00:04:59,700 when we used this summary function to get 140 00:04:59,700 --> 00:05:00,166 the p- 141 00:05:00,166 --> 00:05:03,033 values and statistical significance levels 142 00:05:03,033 --> 00:05:05,700 to see which independent variables are the most
143 00:05:05,700 --> 00:05:08,866 statistically significant and therefore best explain the dependent 144 00:05:08,866 --> 00:05:11,533 variable, that is, why customers are leaving. 145 00:05:11,533 --> 00:05:12,966 So that's the purpose 146 00:05:12,966 --> 00:05:15,300 behind making these predictions on the test set. 147 00:05:15,300 --> 00:05:16,400 It's just to get the 148 00:05:16,400 --> 00:05:19,466 accuracy on new observations to validate the model, so 149 00:05:19,466 --> 00:05:21,600 that we can give this model to the bank. 150 00:05:21,600 --> 00:05:21,900 All right. 151 00:05:21,900 --> 00:05:24,133 So now let's make the predictions. 152 00:05:24,133 --> 00:05:27,133 So we are almost done here. 153 00:05:27,166 --> 00:05:29,333 We just need to add one more thing, 154 00:05:29,333 --> 00:05:30,233 which is 155 00:05:30,233 --> 00:05:31,200 again related 156 00:05:31,200 --> 00:05:34,066 to the fact that we are using the H2O package. 157 00:05:34,066 --> 00:05:35,600 And as you can see in 158 00:05:35,600 --> 00:05:40,033 this newdata argument, well, this newdata is of course the test set. 159 00:05:40,366 --> 00:05:42,300 But this test set is expected to 160 00:05:42,300 --> 00:05:44,333 be an H2O frame. 161 00:05:44,333 --> 00:05:46,233 Right now it is a standard data frame, 162 00:05:46,233 --> 00:05:48,066 but our h2o 163 00:05:48,066 --> 00:05:51,066 predict function is expecting an H2O frame. 164 00:05:51,466 --> 00:05:53,500 So how can we convert this test set data 165 00:05:53,500 --> 00:05:56,000 frame into an H2O frame? 166 00:05:56,000 --> 00:05:59,533 Well, by doing exactly the same as what we did 167 00:05:59,700 --> 00:06:00,800 to convert 168 00:06:00,800 --> 00:06:01,766 this training 169 00:06:01,766 --> 00:06:04,800 set data frame into this H2O frame, 170 00:06:05,100 --> 00:06:06,666 that is, by applying on the test 171 00:06:06,666 --> 00:06:10,666 set the as.h2o
172 00:06:11,166 --> 00:06:12,433 function. 173 00:06:12,433 --> 00:06:12,800 All right. 174 00:06:12,800 --> 00:06:15,333 So I'm putting the test set in the function 175 00:06:15,333 --> 00:06:16,700 like that. 176 00:06:16,700 --> 00:06:18,700 And here we go. Now I think 177 00:06:18,700 --> 00:06:19,866 everything is ready. 178 00:06:19,866 --> 00:06:24,166 We are ready to make the predictions, which so far will be the 179 00:06:24,166 --> 00:06:27,066 prediction of the probabilities that the class equals one, 180 00:06:27,066 --> 00:06:30,066 that is, the probabilities that the customers leave the bank. 181 00:06:30,366 --> 00:06:33,266 So let's select this 182 00:06:33,266 --> 00:06:36,033 and get the predicted probabilities. 183 00:06:37,066 --> 00:06:38,266 And here we go. 184 00:06:38,266 --> 00:06:41,133 We now have the prob_pred vector 185 00:06:41,133 --> 00:06:43,533 containing all the predicted probabilities 186 00:06:43,533 --> 00:06:45,966 in the form of an environment. 187 00:06:45,966 --> 00:06:46,633 So that's good. 188 00:06:46,633 --> 00:06:47,666 But we cannot have 189 00:06:47,666 --> 00:06:50,000 a look at these predicted probabilities yet. 190 00:06:50,000 --> 00:06:51,500 We will need to convert it 191 00:06:51,500 --> 00:06:53,700 back into a standard vector. 192 00:06:53,700 --> 00:06:54,833 But before we do that, 193 00:06:54,833 --> 00:06:56,633 convert it into a vector, 194 00:06:56,633 --> 00:07:01,100 well, we need to apply this line as well, which will, you know, 195 00:07:01,266 --> 00:07:02,266 transform the 196 00:07:02,266 --> 00:07:04,433 probabilities into the 197 00:07:04,433 --> 00:07:07,033 predictions in the form one or zero, 198 00:07:07,033 --> 00:07:07,966 that is, exactly the 199 00:07:07,966 --> 00:07:09,233 predictions of the 200 00:07:09,233 --> 00:07:11,133 dependent variable Exited. 201 00:07:11,133 --> 00:07:13,700 And to do this we're using this ifelse function.
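The prediction step described so far can be sketched in R roughly as follows. This is a minimal sketch, not the tutorial's exact script: it assumes the h2o package and a Java runtime are installed, and the small synthetic data set, the column names x1 and x2, the network settings, and the helper variable target_col are all hypothetical stand-ins (in the tutorial's data the dependent variable Exited sits at index 11).

```r
# Sketch only: needs the h2o package and a Java runtime.
library(h2o)
h2o.init(nthreads = -1)

# Hypothetical stand-in for the course data: two features and a 0/1 target.
set.seed(123)
n <- 200
dataset <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dataset$Exited <- as.numeric(dataset$x1 + dataset$x2 + rnorm(n, sd = 0.5) > 0)
training_set <- dataset[1:150, ]
test_set <- dataset[151:200, ]

# Small network with assumed settings, trained on an H2O frame.
classifier <- h2o.deeplearning(y = "Exited",
                               training_frame = as.h2o(training_set),
                               activation = "Rectifier",
                               hidden = c(6, 6),
                               epochs = 10)

# h2o.predict takes object and newdata only (no type argument).
# newdata must be an H2O frame, here without the dependent variable column.
target_col <- which(names(test_set) == "Exited")  # 11 in the tutorial's data
prob_pred <- h2o.predict(classifier, newdata = as.h2o(test_set[-target_col]))

# prob_pred lives on the H2O side; as.vector brings the scores back into R.
scores <- as.vector(prob_pred)
print(head(scores))

# Disconnect without the interactive confirmation prompt.
h2o.shutdown(prompt = FALSE)
```

With a numeric 0/1 target the model returns one score per test observation, which the tutorial reads as the probability that the customer leaves the bank.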
202 00:07:13,700 --> 00:07:17,700 And basically what we do is we choose a threshold such that 203 00:07:17,700 --> 00:07:22,300 if the predicted probability is above the threshold, then we predict one, 204 00:07:22,500 --> 00:07:27,400 and if the predicted probability is below the threshold, then we predict zero. 205 00:07:27,866 --> 00:07:29,000 So 0.5 is a natural 206 00:07:29,000 --> 00:07:31,200 threshold to take when we get our predictions 207 00:07:31,200 --> 00:07:32,833 in terms of probabilities. 208 00:07:32,833 --> 00:07:33,933 Note that it is not 209 00:07:33,933 --> 00:07:36,933 necessarily always 50 percent, 0.5. 210 00:07:37,000 --> 00:07:37,766 That's the case, 211 00:07:37,766 --> 00:07:40,800 for example, in medicine, when we have to predict some sensitive 212 00:07:40,800 --> 00:07:44,300 information, like, for example, predicting if a tumor is malignant. 213 00:07:44,433 --> 00:07:46,000 Well, that's more sensitive. 214 00:07:46,000 --> 00:07:47,033 So in that case we'd 215 00:07:47,033 --> 00:07:48,900 better be sure of our predictions, 216 00:07:48,900 --> 00:07:52,733 and therefore we would choose a higher threshold, like, for example, 80%. 217 00:07:53,266 --> 00:07:54,033 But here we are 218 00:07:54,033 --> 00:07:55,833 predicting if a customer leaves the bank, 219 00:07:55,833 --> 00:07:58,466 so we are fine with the 50% threshold. 220 00:07:58,466 --> 00:07:59,500 So that's okay. 221 00:07:59,500 --> 00:08:03,666 And by the way, there is a simpler way to get these 222 00:08:04,033 --> 00:08:06,133 predictions in the form 0 or 1 223 00:08:06,133 --> 00:08:07,200 without using 224 00:08:07,200 --> 00:08:10,133 this ifelse function. It's by simply 225 00:08:10,133 --> 00:08:12,900 removing this one and zero here and 226 00:08:12,900 --> 00:08:14,366 removing this ifelse 227 00:08:15,666 --> 00:08:17,033 and by using this 228 00:08:17,033 --> 00:08:19,100 prob_pred larger than 0.5.
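The two ways of applying the threshold can be compared in plain R. This is a small sketch on a hypothetical vector of five predicted probabilities (the values are invented for illustration), using only base R so it runs without H2O.

```r
# Hypothetical predicted probabilities for five customers.
prob_pred <- c(0.12, 0.48, 0.91, 0.50, 0.73)

# Option 1: explicit 1/0 labels with ifelse.
y_pred_ifelse <- ifelse(prob_pred > 0.5, 1, 0)

# Option 2: the comparison alone, which yields TRUE/FALSE values.
y_pred_bool <- prob_pred > 0.5

print(y_pred_ifelse)
print(y_pred_bool)

# A stricter cutoff (for example 0.8) flags fewer cases as positive.
y_pred_strict <- prob_pred > 0.8
print(y_pred_strict)
```

Both options carry the same information; a probability exactly equal to the cutoff (0.50 here) is not counted as positive because the comparison is strict.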
229 00:08:19,100 --> 00:08:24,133 This will return a boolean, which will be true if prob_pred is 230 00:08:24,133 --> 00:08:25,500 larger than 0.5 231 00:08:25,500 --> 00:08:28,466 and false if prob_pred is below 232 00:08:28,466 --> 00:08:32,266 0.5, and y_pred in the form of this boolean, true or false, 233 00:08:32,366 --> 00:08:33,366 will be accepted 234 00:08:33,366 --> 00:08:35,266 in this confusion matrix here. 235 00:08:35,266 --> 00:08:37,100 So that's simpler. And now 236 00:08:37,100 --> 00:08:38,933 let's get these 237 00:08:38,933 --> 00:08:41,233 predictions in the form of booleans. 238 00:08:41,233 --> 00:08:42,566 All right. So I'm going to 239 00:08:42,566 --> 00:08:45,066 select this line and execute it. 240 00:08:45,066 --> 00:08:45,433 All right. 241 00:08:45,433 --> 00:08:48,266 So now we have our y_pred in the form of booleans. 242 00:08:48,266 --> 00:08:50,066 But it is still 243 00:08:50,066 --> 00:08:52,866 an H2O object, because it is the result 244 00:08:52,866 --> 00:08:53,500 in the first 245 00:08:53,500 --> 00:08:56,266 place of this h2o.predict function. 246 00:08:56,266 --> 00:08:58,200 So it's still an H2O object. 247 00:08:58,200 --> 00:09:00,533 And therefore now what we have to do is 248 00:09:00,533 --> 00:09:01,500 to convert 249 00:09:01,500 --> 00:09:04,500 this H2O object back into a vector, 250 00:09:04,600 --> 00:09:05,466 because this table 251 00:09:05,466 --> 00:09:08,066 function here will only accept a vector, 252 00:09:08,066 --> 00:09:09,566 a standard vector, 253 00:09:09,566 --> 00:09:12,600 and of course will never accept this H2O object. 254 00:09:13,033 --> 00:09:15,233 So let's convert it back into a vector. 255 00:09:15,233 --> 00:09:16,800 And that's actually really simple. 256 00:09:16,800 --> 00:09:21,166 It's actually kind of the same as converting a data frame into an 257 00:09:21,166 --> 00:09:21,766 H2O frame.
258 00:09:21,766 --> 00:09:25,000 But instead of using h2o here we will use vector. 259 00:09:25,400 --> 00:09:28,066 So here we simply need to type y_pred 260 00:09:29,566 --> 00:09:32,633 equals as.vector, 261 00:09:33,233 --> 00:09:33,533 and in 262 00:09:33,533 --> 00:09:36,533 parentheses, of course, y_pred. 263 00:09:37,100 --> 00:09:38,733 All right. So let's check it out. 264 00:09:38,733 --> 00:09:41,733 I'm going to select this line and execute. 265 00:09:42,000 --> 00:09:43,066 And now, as you can 266 00:09:43,066 --> 00:09:44,266 see, y_pred 267 00:09:44,266 --> 00:09:48,100 became this vector of integers containing 2000 elements. 268 00:09:48,400 --> 00:09:49,433 And that's the standard 269 00:09:49,433 --> 00:09:52,100 vector of R we were used to working with. 270 00:09:52,100 --> 00:09:54,166 So we can actually have a look at the 271 00:09:54,166 --> 00:09:56,166 predictions of the test set 272 00:09:56,166 --> 00:09:59,000 observations by typing here in the console 273 00:09:59,000 --> 00:10:00,433 y_pred. 274 00:10:00,433 --> 00:10:01,566 Here we go. That's all 275 00:10:01,566 --> 00:10:02,866 the predictions of the test set 276 00:10:02,866 --> 00:10:05,666 observations, 2000 predictions. 277 00:10:05,666 --> 00:10:06,533 So here we go. 278 00:10:06,533 --> 00:10:08,900 According to the model, the first customer stayed 279 00:10:08,900 --> 00:10:11,700 in the bank. The second customer stayed in the bank. 280 00:10:11,700 --> 00:10:14,533 The third customer left the bank. 281 00:10:14,533 --> 00:10:16,866 The fourth one stayed, the fifth one stayed, 282 00:10:16,866 --> 00:10:17,900 etc. 283 00:10:17,900 --> 00:10:20,633 So if you want, you can actually compare these predictions with the 284 00:10:20,633 --> 00:10:24,400 real results that are in the last column of test_set, 285 00:10:24,900 --> 00:10:26,033 this column here. 286 00:10:26,033 --> 00:10:30,466 So for example, 0, 0, 1, 0, 0, 0 are the real
287 00:10:30,466 --> 00:10:31,800 outcomes of the 288 00:10:31,800 --> 00:10:33,000 first customers. 289 00:10:33,000 --> 00:10:35,233 And if we compare that with the predictions, 290 00:10:35,233 --> 00:10:36,800 well, we see that the 291 00:10:36,800 --> 00:10:37,466 predictions 292 00:10:37,466 --> 00:10:42,066 are quite correct, because here we get as well 0, 0, 1, 0, 293 00:10:42,066 --> 00:10:42,966 0, 0. 294 00:10:42,966 --> 00:10:45,633 So the first five predictions are correct. 295 00:10:45,633 --> 00:10:47,766 So that smells pretty good for the accuracy that 296 00:10:47,766 --> 00:10:49,066 we are about to compute, 297 00:10:49,066 --> 00:10:51,866 because when we look at the first observations we can only see 298 00:10:51,866 --> 00:10:53,033 correct predictions. 299 00:10:53,033 --> 00:10:55,500 So now actually I can't wait to see the accuracy. 300 00:10:55,500 --> 00:10:57,300 So let's compute it right now. 301 00:10:57,300 --> 00:10:59,300 We will start by making the confusion matrix. 302 00:10:59,300 --> 00:11:01,666 And of course here we need to replace 303 00:11:01,666 --> 00:11:02,700 this index 304 00:11:02,700 --> 00:11:04,033 three here by 11, 305 00:11:04,033 --> 00:11:07,033 because this corresponds to the index of the dependent variable. 306 00:11:07,066 --> 00:11:09,033 And so now we are ready to 307 00:11:09,033 --> 00:11:11,200 make this confusion matrix. 308 00:11:11,200 --> 00:11:14,266 So I'm going to select this line and execute. 309 00:11:14,700 --> 00:11:17,100 Here we go. Confusion matrix created. 310 00:11:17,100 --> 00:11:18,600 So now let's have a look. 311 00:11:18,600 --> 00:11:19,066 I'm going 312 00:11:19,066 --> 00:11:22,533 to type cm here in the console and press enter. 313 00:11:23,100 --> 00:11:25,133 That's our confusion matrix. 314 00:11:25,133 --> 00:11:27,133 We can see a lot of correct predictions. 315 00:11:27,133 --> 00:11:28,033 That's good. 316 00:11:28,033 --> 00:11:31,033 1500
and 36 correct 317 00:11:31,033 --> 00:11:33,066 predictions of customers who stayed 318 00:11:33,066 --> 00:11:37,000 in the bank, and 195 correct predictions of 319 00:11:37,000 --> 00:11:39,033 customers who left the bank. 320 00:11:39,033 --> 00:11:41,066 And then we have 212 plus 321 00:11:41,066 --> 00:11:43,300 57 incorrect predictions 322 00:11:43,300 --> 00:11:45,966 of customers who either left or stayed 323 00:11:45,966 --> 00:11:47,133 in the bank. 324 00:11:47,133 --> 00:11:48,533 So this looks pretty good. 325 00:11:48,533 --> 00:11:50,466 And now let's not wait anymore. 326 00:11:50,466 --> 00:11:52,200 Let's compute the accuracy. 327 00:11:52,200 --> 00:11:54,533 So the accuracy is the total number of 328 00:11:54,533 --> 00:11:55,633 correct predictions, 329 00:11:55,633 --> 00:11:57,566 that is, 1536 330 00:11:57,566 --> 00:11:59,966 plus 331 00:11:59,966 --> 00:12:02,400 195, divided 332 00:12:02,400 --> 00:12:05,233 by the total number of observations in the test set, 333 00:12:05,233 --> 00:12:07,766 that is, the total number of predictions actually, 334 00:12:07,766 --> 00:12:10,666 which is 2000. 335 00:12:10,666 --> 00:12:12,966 All right. So let's check it out. 336 00:12:12,966 --> 00:12:15,966 Let's see if we can offer this model to the bank. 337 00:12:15,966 --> 00:12:17,866 Let's see if we'll get the bonus. 338 00:12:17,866 --> 00:12:19,000 Let's find out about this 339 00:12:19,000 --> 00:12:21,566 accuracy in three, two, 340 00:12:21,566 --> 00:12:24,966 one, go: 86.5 341 00:12:24,966 --> 00:12:25,800 percent. 342 00:12:25,800 --> 00:12:29,100 That's actually not bad at all, 86.5%. 343 00:12:29,100 --> 00:12:29,900 Well, let's say 344 00:12:29,900 --> 00:12:34,133 87%. 87% means that out of 100 observations, 345 00:12:34,366 --> 00:12:36,566 87 predictions should be correct. 346 00:12:36,566 --> 00:12:37,766 So this is pretty good. 347 00:12:37,766 --> 00:12:40,533 And besides, we
haven't done any parameter tuning. 348 00:12:40,533 --> 00:12:42,500 And you will see that by doing some parameter 349 00:12:42,500 --> 00:12:45,966 tuning, using some techniques like k-fold cross-validation, 350 00:12:45,966 --> 00:12:48,900 well, we can get an even better accuracy score. 351 00:12:48,900 --> 00:12:51,133 No worries, we will do that later in the course. 352 00:12:51,133 --> 00:12:52,900 You can actually already practice 353 00:12:52,900 --> 00:12:55,633 to improve this accuracy score. 354 00:12:55,633 --> 00:12:58,066 And please let me know if you get an awesome one. 355 00:12:58,066 --> 00:12:59,633 And now just one last thing. 356 00:12:59,633 --> 00:13:00,966 Since we were connected to this 357 00:13:00,966 --> 00:13:02,466 H2O instance, 358 00:13:02,466 --> 00:13:04,566 it's better to disconnect from it now. 359 00:13:04,566 --> 00:13:05,800 And to do this we 360 00:13:05,800 --> 00:13:07,000 only need to 361 00:13:07,000 --> 00:13:08,100 apply a last 362 00:13:08,100 --> 00:13:08,766 function of 363 00:13:08,766 --> 00:13:14,933 H2O, which is h2o.shutdown. 364 00:13:14,933 --> 00:13:17,466 Here it is, with no arguments inside. 365 00:13:17,466 --> 00:13:19,600 You just need to select this 366 00:13:19,600 --> 00:13:21,866 and this will disconnect you from the server. 367 00:13:21,866 --> 00:13:23,500 So let's execute. 368 00:13:23,500 --> 00:13:24,200 "Are you sure 369 00:13:24,200 --> 00:13:27,033 you want to shut down the H2O instance running at this 370 00:13:27,033 --> 00:13:27,900 address?" 371 00:13:27,900 --> 00:13:29,966 Then you just need to type here capital 372 00:13:29,966 --> 00:13:32,000 Y and then enter. 373 00:13:32,000 --> 00:13:33,766 And now we are disconnected. 374 00:13:33,766 --> 00:13:36,500 TRUE means yes, we did disconnect. 375 00:13:36,500 --> 00:13:37,800 So congratulations. 376 00:13:37,800 --> 00:13:39,933 You have built your first artificial 377 00:13:39,933 --> 00:13:43,000 neural network in
R using the H2O package. 378 00:13:43,366 --> 00:13:47,000 I was very happy to build this first deep learning model with you, and we are 379 00:13:47,000 --> 00:13:49,233 getting to the end of this section. 380 00:13:49,233 --> 00:13:53,866 The next section will be about convolutional neural networks, another branch of machine 381 00:13:53,866 --> 00:13:56,966 learning specialized for computer vision, because it will consider 382 00:13:57,000 --> 00:13:58,100 spatial structure in the 383 00:13:58,100 --> 00:14:01,100 data, exactly as is the case for images, 384 00:14:01,200 --> 00:14:03,500 where the position of the pixels matters. 385 00:14:03,500 --> 00:14:04,066 So we will 386 00:14:04,066 --> 00:14:05,400 see that in the next section. 387 00:14:05,400 --> 00:14:07,200 And until then, enjoy machine learning.
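The accuracy computed from the confusion matrix in this tutorial can be reproduced with a few lines of base R. This is a sketch using the four counts read out in the video (1536, 195, 212 and 57); which of the two off-diagonal cells holds 212 and which holds 57 is an assumption here, and it does not affect the accuracy.

```r
# Counts from the tutorial's confusion matrix.
# Rows: actual class, columns: predicted class.
# The placement of 212 and 57 in the off-diagonal cells is assumed.
cm <- matrix(c(1536, 212, 57, 195),
             nrow = 2,
             dimnames = list(actual = c("0", "1"),
                             predicted = c("0", "1")))
print(cm)

# Accuracy = correct predictions (the diagonal) / all predictions.
correct <- sum(diag(cm))   # 1536 + 195
total <- sum(cm)           # 2000 test set observations
accuracy <- correct / total
print(accuracy)            # 0.8655, i.e. about 86.5%
```

In the tutorial's own script the matrix comes from the table function applied to the last column of the test set and y_pred; summing the diagonal works the same way on that object.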