1 00:00:00,300 --> 00:00:01,166 Hello, my friends. 2 00:00:01,166 --> 00:00:02,833 All right, let's see if you got this right. 3 00:00:02,833 --> 00:00:05,033 Again, predicting the test results 4 00:00:05,033 --> 00:00:07,833 and displaying the vector of the predictions next 5 00:00:07,833 --> 00:00:10,600 to the vector of the real results, meaning the real 6 00:00:10,600 --> 00:00:12,400 purchase decisions. 7 00:00:12,400 --> 00:00:14,100 And I really want you to 8 00:00:14,100 --> 00:00:16,033 juggle with all your toolkits, you know, 9 00:00:16,033 --> 00:00:19,566 whether it is the data preprocessing toolkit or your other 10 00:00:19,566 --> 00:00:21,500 machine learning models, where you have indeed 11 00:00:21,500 --> 00:00:23,066 several tools inside. 12 00:00:23,066 --> 00:00:24,000 Because indeed, now 13 00:00:24,000 --> 00:00:26,500 the tool we would like to use is that, you know, 14 00:00:26,500 --> 00:00:27,600 a little piece of code 15 00:00:27,600 --> 00:00:31,133 that allows to display the two vectors of predicted results 16 00:00:31,133 --> 00:00:32,633 and real results. 17 00:00:32,633 --> 00:00:35,333 So what I expected you to do, you know, to 18 00:00:35,333 --> 00:00:37,333 be the most efficient as you can, 19 00:00:37,333 --> 00:00:40,666 was to go to Part 2, Regression, 20 00:00:40,933 --> 00:00:45,300 and then into the multiple linear regression folder and then open this 21 00:00:45,300 --> 00:00:50,033 multiple linear regression implementation, which indeed contains inside, 22 00:00:50,033 --> 00:00:52,000 so I already opened it, 23 00:00:52,000 --> 00:00:55,533 that tool allowing to display the two vectors 24 00:00:55,533 --> 00:00:58,033 of predicted results and real results. 25 00:00:58,033 --> 00:01:00,733 I'm talking, of course, about this tool, right? 26 00:01:00,733 --> 00:01:02,700 We implemented it many times. 27 00:01:02,700 --> 00:01:05,333 That's why I didn't want to do it again here in our
28 00:01:05,333 --> 00:01:07,000 logistic regression implementation. 29 00:01:07,000 --> 00:01:07,833 Plus, I 30 00:01:07,833 --> 00:01:10,133 really want to train you and incentivize 31 00:01:10,133 --> 00:01:10,900 you to 32 00:01:10,900 --> 00:01:12,633 be the most efficient as you can 33 00:01:12,633 --> 00:01:16,466 by juggling between your different model implementations. 34 00:01:16,733 --> 00:01:17,233 And so, 35 00:01:17,233 --> 00:01:20,833 if you did this, if you had the reflex to grab that tool right here 36 00:01:20,833 --> 00:01:23,966 in multiple linear regression or many other models 37 00:01:23,966 --> 00:01:27,100 where we implemented that, well, congratulations. 38 00:01:27,100 --> 00:01:28,233 You did amazing. 39 00:01:28,233 --> 00:01:30,600 All right. So here we're going to take this. 40 00:01:30,600 --> 00:01:31,133 And of 41 00:01:31,133 --> 00:01:31,666 course, if you 42 00:01:31,666 --> 00:01:34,600 reimplemented it yourself, that's also amazing, 43 00:01:34,600 --> 00:01:38,700 especially if you managed to be more efficient than how we were 44 00:01:38,700 --> 00:01:39,400 able to be. 45 00:01:39,400 --> 00:01:39,900 All right. 46 00:01:39,900 --> 00:01:44,300 Because what we only have to do here is to copy this little piece of code, 47 00:01:44,300 --> 00:01:46,633 you know, that tool, and 48 00:01:46,633 --> 00:01:47,400 paste it 49 00:01:47,400 --> 00:01:48,100 in a new 50 00:01:48,100 --> 00:01:51,600 code cell here to predict the test results. 51 00:01:51,600 --> 00:01:53,200 Because indeed we have all the 52 00:01:53,200 --> 00:01:55,200 same names here for the vector of 53 00:01:55,200 --> 00:01:56,000 predictions, 54 00:01:56,000 --> 00:01:58,800 y_pred, which will be the result of the 55 00:01:58,800 --> 00:01:59,866 predict method 56 00:01:59,866 --> 00:02:00,800 applied to the test 57 00:02:00,800 --> 00:02:03,800 set and called, of course, from our, not
58 00:02:03,933 --> 00:02:07,133 regressor object, but classifier object. 59 00:02:07,366 --> 00:02:08,400 There we go. 60 00:02:08,400 --> 00:02:10,700 So that's the first change you had to make. 61 00:02:10,700 --> 00:02:13,633 And then, well, since this time, you know, 62 00:02:13,633 --> 00:02:14,966 our predicted purchase 63 00:02:14,966 --> 00:02:17,100 decisions and the real purchase decisions are 64 00:02:17,100 --> 00:02:18,766 either zero or one, 65 00:02:18,766 --> 00:02:21,333 well, we don't need to add anything here to, 66 00:02:21,333 --> 00:02:24,300 you know, force the number of decimals after the comma to 67 00:02:24,300 --> 00:02:25,500 be only two. 68 00:02:25,500 --> 00:02:27,833 Right, here we're only dealing with integers. 69 00:02:27,833 --> 00:02:29,000 So we can remove this. 70 00:02:29,000 --> 00:02:30,300 We don't need this. 71 00:02:30,300 --> 00:02:32,166 And then, final question here: 72 00:02:32,166 --> 00:02:34,200 do we have to change anything? 73 00:02:34,200 --> 00:02:35,833 Well, absolutely not. 74 00:02:35,833 --> 00:02:39,700 And that's what I mean by, you know, grabbing a tool and applying it 75 00:02:39,700 --> 00:02:43,133 on your new model by only having to change 1 or 2 things. 76 00:02:43,133 --> 00:02:47,066 Here we only changed the name of the model type, you know, from 77 00:02:47,066 --> 00:02:49,200 regressor to classifier. 78 00:02:49,200 --> 00:02:50,766 Okay. So let's check it out. 79 00:02:50,766 --> 00:02:51,900 Let's see if it works. 80 00:02:51,900 --> 00:02:53,966 Let's press play here. 81 00:02:53,966 --> 00:02:55,733 And indeed, 82 00:02:55,733 --> 00:02:58,966 we get the two vectors next 83 00:02:58,966 --> 00:03:00,266 to each other, with 84 00:03:00,266 --> 00:03:01,633 first, on the left, 85 00:03:01,633 --> 00:03:03,300 your vector of predictions, 86 00:03:03,300 --> 00:03:05,933 you know, of the predicted purchase decisions for 87 00:03:05,933 --> 00:03:06,400 all the
88 00:03:06,400 --> 00:03:09,333 customers of, of course, the test set, right? 89 00:03:09,333 --> 00:03:11,333 This was applied to X_test here. 90 00:03:11,333 --> 00:03:14,066 So that's all the customers of the test set. 91 00:03:14,066 --> 00:03:16,200 And on the right, in the second column, you 92 00:03:16,200 --> 00:03:19,200 have the real purchase decisions. 93 00:03:19,400 --> 00:03:21,900 And so here what's interesting to see is to compare 94 00:03:21,900 --> 00:03:22,933 the predicted purchase 95 00:03:22,933 --> 00:03:24,566 decisions to the real ones 96 00:03:24,566 --> 00:03:27,233 for all the customers in the test set. 97 00:03:27,233 --> 00:03:27,533 All right. 98 00:03:27,533 --> 00:03:28,933 So let's see for the first 99 00:03:28,933 --> 00:03:30,000 customer of the test set, 100 00:03:30,000 --> 00:03:33,766 you know, remember, of age 30 and estimated salary 87,000 101 00:03:33,766 --> 00:03:34,433 dollars. 102 00:03:34,433 --> 00:03:36,533 Well, the prediction is no, 103 00:03:36,533 --> 00:03:39,466 this customer didn't buy the new SUV. 104 00:03:39,466 --> 00:03:41,900 And the real result is indeed no. 105 00:03:41,900 --> 00:03:44,666 In reality, that customer didn't buy the new 106 00:03:44,666 --> 00:03:47,666 SUV. Good. Same for the second customer. 107 00:03:47,666 --> 00:03:48,966 That customer was predicted 108 00:03:48,966 --> 00:03:52,800 not to buy that new SUV, and indeed did not buy the new SUV. 109 00:03:53,133 --> 00:03:56,533 Third customer: this time the third customer actually 110 00:03:56,533 --> 00:03:58,333 bought that new SUV. 111 00:03:58,333 --> 00:04:01,866 And our model predicted that indeed, this new customer bought it. 112 00:04:02,133 --> 00:04:05,266 Well, it's funny, we actually have a lot of correct predictions. 113 00:04:05,266 --> 00:04:06,533 That's amazing, right? 114 00:04:06,533 --> 00:04:09,600 All this so far is correct. This is correct. 115 00:04:09,733 --> 00:04:10,333 And here
116 00:04:10,333 --> 00:04:13,300 we go. We have our first incorrect prediction. 117 00:04:13,300 --> 00:04:15,066 Here, our logistic regression model 118 00:04:15,066 --> 00:04:17,200 predicted that this particular 119 00:04:17,200 --> 00:04:19,066 customer didn't buy the 120 00:04:19,066 --> 00:04:21,700 SUV, because we have a prediction of zero here. 121 00:04:21,700 --> 00:04:23,566 But in reality, that 122 00:04:23,566 --> 00:04:26,600 customer bought that new amazing SUV. Okay? 123 00:04:26,600 --> 00:04:28,800 Because the real result here is a one. 124 00:04:28,800 --> 00:04:31,200 Then here is correct, correct. And here we go, 125 00:04:31,200 --> 00:04:33,300 another incorrect prediction, 126 00:04:33,300 --> 00:04:36,900 where our model predicted again that this customer didn't buy the 127 00:04:36,900 --> 00:04:38,900 SUV, whereas in reality 128 00:04:38,900 --> 00:04:41,166 that customer bought the new SUV. 129 00:04:41,166 --> 00:04:42,433 All right. And you see, so 130 00:04:42,433 --> 00:04:45,166 that looks really, really good. Actually, we will get a 131 00:04:45,166 --> 00:04:47,033 very nice confusion matrix. 132 00:04:47,033 --> 00:04:48,800 I will explain very soon what it is. 133 00:04:48,800 --> 00:04:49,566 And mostly a 134 00:04:49,566 --> 00:04:53,133 very good accuracy, because the accuracy 135 00:04:53,333 --> 00:04:54,133 in the test set, 136 00:04:54,133 --> 00:04:54,600 of course, 137 00:04:54,600 --> 00:04:57,600 is simply the number of correct predictions 138 00:04:57,733 --> 00:04:58,966 divided by the 139 00:04:58,966 --> 00:05:01,666 total number of observations in the test set. 140 00:05:01,666 --> 00:05:04,266 And this is exactly what we're about to get. 141 00:05:04,266 --> 00:05:05,033 In the test set, 142 00:05:05,033 --> 00:05:08,400 we will not only get the confusion matrix showing... 143 00:05:08,400 --> 00:05:10,600 So there you go, I'm about to explain what it is.
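A minimal sketch of the side-by-side comparison and the accuracy calculation described above, assuming NumPy arrays named y_pred and y_test as in the lesson. The values below are made up for illustration; in the notebook, y_pred would come from the classifier's predict method applied to the test set. This is one way to write the tool, not necessarily the exact code shown on screen.

```python
import numpy as np

# Hypothetical toy values standing in for the course variables.
# In the notebook, y_pred would be classifier.predict(X_test).
y_pred = np.array([0, 0, 1, 0, 1, 0])
y_test = np.array([0, 0, 1, 1, 1, 1])

# Turn each vector into a column and put them next to each other:
# predictions on the left, real results on the right.
side_by_side = np.concatenate(
    (y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)),
    axis=1,
)
print(side_by_side)

# Accuracy = number of correct predictions / total number of observations.
n_correct = int((y_pred == y_test).sum())
accuracy = n_correct / len(y_test)
print(n_correct, accuracy)
```

Because both columns hold only zeros and ones, no print option for the number of decimals is needed here.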
144 00:05:10,600 --> 00:05:13,733 The confusion matrix will show us exactly 145 00:05:13,733 --> 00:05:17,300 the number of correct predictions and the number of incorrect predictions 146 00:05:17,400 --> 00:05:20,500 for the two cases where the real result was zero 147 00:05:20,500 --> 00:05:21,266 or one. So 148 00:05:21,266 --> 00:05:22,800 basically, we will have a nice matrix 149 00:05:22,800 --> 00:05:26,400 showing how many mistakes and correct predictions our model made. 150 00:05:26,766 --> 00:05:29,466 And of course, inside the same new 151 00:05:29,466 --> 00:05:31,366 step or code cell, 152 00:05:31,366 --> 00:05:32,300 we will compute 153 00:05:32,300 --> 00:05:35,200 the accuracy, and we will see what is the 154 00:05:35,200 --> 00:05:39,266 percentage of correct predictions our model made on the test set. 155 00:05:39,666 --> 00:05:41,766 So should I ask you to try to do it 156 00:05:41,766 --> 00:05:44,700 on your own? Well, yes. Why not? Because, you know, you 157 00:05:44,700 --> 00:05:46,800 just have to go to the API of scikit- 158 00:05:46,800 --> 00:05:48,633 learn and figure out 159 00:05:48,633 --> 00:05:51,600 how to make that confusion matrix and how to compute the 160 00:05:51,600 --> 00:05:53,933 accuracy. And now I'll just give you a little hint: 161 00:05:53,933 --> 00:05:54,300 you will 162 00:05:54,300 --> 00:05:55,533 have to look 163 00:05:55,533 --> 00:05:58,966 into the metrics module from scikit-learn.
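One possible way to follow that hint, sketched with the confusion_matrix and accuracy_score functions from the scikit-learn metrics module. The two label vectors are made-up toy values, not the course data, and the variable names cm and acc are only illustrative; the solution shown in the course may differ.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical toy values; in the notebook these would be the real
# test labels and the classifier's predictions on the test set.
y_test = [0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 0]

# Rows are the real classes (0, 1), columns are the predicted classes (0, 1).
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Fraction of predictions that match the real results.
acc = accuracy_score(y_test, y_pred)
print(acc)
```

The diagonal of the matrix counts the correct predictions for each real class, and the off-diagonal entries count the mistakes, so the accuracy equals the sum of the diagonal divided by the total number of observations.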