1 00:00:00,066 --> 00:00:02,500 Hello and welcome to this R tutorial. 2 00:00:02,500 --> 00:00:05,666 So now that we fitted our logistic regression to the training set, 3 00:00:05,966 --> 00:00:09,533 we are going to predict the test set results using this classifier 4 00:00:09,533 --> 00:00:12,266 that we just built in the previous tutorial. 5 00:00:12,266 --> 00:00:14,766 So usually we can do this in one step. 6 00:00:14,766 --> 00:00:17,700 But for logistic regression we're going to do it in two steps. 7 00:00:17,700 --> 00:00:21,500 And we are more interested in having the zero and one predictions 8 00:00:21,833 --> 00:00:23,633 rather than the probabilities. 9 00:00:23,633 --> 00:00:25,733 So first let's take the probabilities anyway. 10 00:00:25,733 --> 00:00:28,733 So we are going to call them 11 00:00:29,100 --> 00:00:31,966 prob_pred, which will be 12 00:00:31,966 --> 00:00:36,166 the vector of the predicted probabilities of our test set 13 00:00:36,333 --> 00:00:38,500 observations by the classifier. 14 00:00:38,500 --> 00:00:41,500 So prob_pred equals predict. 15 00:00:41,766 --> 00:00:45,300 So we're going to use this predict function to predict the probabilities 16 00:00:45,566 --> 00:00:48,566 based on our GLM classifier. 17 00:00:48,700 --> 00:00:52,833 And speaking of classifier, this is our first argument in the predict function. 18 00:00:52,833 --> 00:00:55,066 So here we input classifier, 19 00:00:56,500 --> 00:00:57,000 comma. 20 00:00:57,000 --> 00:01:00,000 Then the next argument is type. 21 00:01:00,933 --> 00:01:05,366 And for logistic regression we should choose the 'response' type 22 00:01:05,800 --> 00:01:09,166 because that will give us the probabilities listed in a single vector. 23 00:01:09,666 --> 00:01:11,900 So that's what we want. 24 00:01:11,900 --> 00:01:16,100 And then we have to specify the new observations that we want to predict.
25 00:01:16,633 --> 00:01:19,200 So in our case these new observations 26 00:01:19,200 --> 00:01:22,200 are the test set observations. 27 00:01:22,200 --> 00:01:25,366 So newdata equals test_set. 28 00:01:26,866 --> 00:01:29,866 And we're just going to remove the last column 29 00:01:29,900 --> 00:01:33,966 of the test set because the last column is the dependent variable. 30 00:01:33,966 --> 00:01:35,900 And that's what we want to predict. 31 00:01:35,900 --> 00:01:38,100 So we will only take test_set[-3]. 32 00:01:38,100 --> 00:01:42,033 That is the two independent variables, age and salary, of the test set. 33 00:01:42,800 --> 00:01:45,333 All right, so let's just summarize this line. 34 00:01:45,333 --> 00:01:48,733 We are predicting the test set observations 35 00:01:49,133 --> 00:01:53,100 using our classifier, which is the logistic regression classifier. 36 00:01:54,266 --> 00:01:54,600 All right. 37 00:01:54,600 --> 00:01:58,566 But that returns the probabilities: if I select this line 38 00:01:58,566 --> 00:02:01,566 and execute, and then type 39 00:02:02,400 --> 00:02:05,400 prob_pred in the console, 40 00:02:06,233 --> 00:02:09,300 as you can see, I get the predicted probability for each of my test 41 00:02:09,300 --> 00:02:11,033 set observations. 42 00:02:11,033 --> 00:02:14,033 So here, for example, let's go to our test set. 43 00:02:14,533 --> 00:02:17,533 The first observation has index two. 44 00:02:18,066 --> 00:02:19,833 The real result is zero. 45 00:02:19,833 --> 00:02:23,533 That means that user number two didn't buy the SUV. 46 00:02:23,966 --> 00:02:26,833 And now if we go back to logistic regression 47 00:02:26,833 --> 00:02:29,866 we can see that for this user number two 48 00:02:30,033 --> 00:02:33,033 prob_pred returns 0.016. 49 00:02:33,100 --> 00:02:34,900 So what is this probability exactly?
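The prediction line described above can be sketched in R as follows. This is only a minimal sketch under stated assumptions: the data frame is simulated toy data rather than the tutorial's data set, the column names `Age`, `EstimatedSalary`, and `Purchased` are hypothetical, and the `glm` call is assumed to resemble the classifier fitted in the previous tutorial. The names `classifier`, `test_set`, and `prob_pred` follow the transcript.

```r
# Simulated toy data (an assumption, NOT the tutorial's data set):
# two independent variables and a 0/1 dependent variable in column 3.
set.seed(123)
n <- 200
Age <- round(runif(n, 18, 60))
EstimatedSalary <- round(runif(n, 15000, 150000))
p_true <- plogis(-8 + 0.12 * Age + 0.00003 * EstimatedSalary)
Purchased <- rbinom(n, 1, p_true)
dataset <- data.frame(Age, EstimatedSalary, Purchased)

# Hypothetical split into training and test sets.
training_set <- dataset[1:150, ]
test_set <- dataset[151:200, ]

# Logistic regression classifier (assumed form of the earlier fit).
classifier <- glm(formula = Purchased ~ ., family = binomial,
                  data = training_set)

# Step 1: predicted probabilities that the dependent variable equals 1.
# type = 'response' returns probabilities in a single vector;
# test_set[-3] drops the third column, the dependent variable.
prob_pred <- predict(classifier, type = 'response', newdata = test_set[-3])
head(prob_pred)
```

Each element of `prob_pred` is a value between 0 and 1, one per test set observation, and the vector keeps the row indices of `test_set` as names.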
50 00:02:34,900 --> 00:02:39,600 This probability is the probability that the dependent variable is equal to one. 51 00:02:39,900 --> 00:02:43,866 That is the probability that the user buys the SUV. 52 00:02:44,266 --> 00:02:47,266 So here, since it's very small, it's 0.016, 53 00:02:47,600 --> 00:02:50,566 that means that the classifier predicts a very low 54 00:02:50,566 --> 00:02:54,000 probability of the dependent variable being equal to one. 55 00:02:54,266 --> 00:02:57,066 That means that it predicts that user number two 56 00:02:57,066 --> 00:02:59,833 has a very low chance of buying the SUV. 57 00:03:00,966 --> 00:03:03,666 Therefore, in short, prob_pred returns 58 00:03:03,666 --> 00:03:08,400 the predicted probabilities that the user will buy the SUV. 59 00:03:09,033 --> 00:03:10,400 But we don't really want that. 60 00:03:10,400 --> 00:03:13,400 We would rather have the zero and one result. 61 00:03:13,466 --> 00:03:14,933 And to do this, it's very simple. 62 00:03:14,933 --> 00:03:17,933 We're just going to do some kind of a conversion. 63 00:03:19,033 --> 00:03:21,900 So we're going to create a vector of predicted results 64 00:03:21,900 --> 00:03:24,900 that we're going to call y_pred: 65 00:03:26,066 --> 00:03:27,000 y_pred equals. 66 00:03:27,000 --> 00:03:30,300 And then we're just going to use ifelse to transform 67 00:03:30,300 --> 00:03:33,533 those probabilities into zero and one results. 68 00:03:33,800 --> 00:03:36,800 So here I'm going to write ifelse. 69 00:03:36,900 --> 00:03:39,900 So the first argument of ifelse is the condition. 70 00:03:40,300 --> 00:03:43,300 And the condition is going to be 71 00:03:43,833 --> 00:03:47,066 prob_pred > 0.5, 72 00:03:48,766 --> 00:03:51,666 because if prob_pred is larger than 0.5, 73 00:03:51,666 --> 00:03:56,100 that means that the user is more likely to buy the SUV than not. 74 00:03:56,700 --> 00:03:59,700 So in that case, that means that we want to predict one.
75 00:03:59,766 --> 00:04:03,466 And actually the second argument of the ifelse function is the result 76 00:04:03,466 --> 00:04:06,433 you want to return when this condition is true. 77 00:04:06,433 --> 00:04:08,100 So here we're going to put a one. 78 00:04:09,066 --> 00:04:09,900 All right. 79 00:04:09,900 --> 00:04:13,133 And the last argument is the result we want to return 80 00:04:13,166 --> 00:04:14,866 if this condition is false. 81 00:04:14,866 --> 00:04:16,133 And if this condition is false 82 00:04:16,133 --> 00:04:20,333 that means that the predicted probability is lower than 0.5. 83 00:04:20,533 --> 00:04:23,833 That means that the user is less likely to buy the SUV, 84 00:04:24,266 --> 00:04:27,266 and therefore it's going to be a zero. 85 00:04:27,400 --> 00:04:28,233 And that's it. 86 00:04:28,233 --> 00:04:32,333 Now you're going to see, if I select this line and execute. 87 00:04:33,266 --> 00:04:34,233 All right. 88 00:04:34,233 --> 00:04:37,100 Let's look at y_pred. Now in 89 00:04:37,100 --> 00:04:39,933 y_pred, as you can see, 90 00:04:39,933 --> 00:04:42,066 I only have zeros or ones. 91 00:04:42,066 --> 00:04:44,900 I no longer have the probabilities. 92 00:04:44,900 --> 00:04:48,866 And remember the real result for our user two was zero. 93 00:04:48,866 --> 00:04:51,866 So that's what happened in reality. 94 00:04:52,900 --> 00:04:55,566 And here our classifier predicted zero. 95 00:04:55,566 --> 00:04:57,633 So that's a correct prediction. 96 00:04:57,633 --> 00:05:01,400 And that's because, remember, before the probability was very low. 97 00:05:01,600 --> 00:05:03,566 So it was below 0.5. 98 00:05:03,566 --> 00:05:05,566 And therefore it returned zero. 99 00:05:06,800 --> 00:05:07,200 All right. 100 00:05:07,200 --> 00:05:09,566 So we predicted our test set results. 101 00:05:09,566 --> 00:05:10,233 That's a good thing. 102 00:05:10,233 --> 00:05:14,733 Now we are going to evaluate this prediction.
103 00:05:15,000 --> 00:05:19,100 We'll do that with the confusion matrix that we are going to build in the next tutorial. 104 00:05:19,400 --> 00:05:20,866 So I look forward to seeing you there. 105 00:05:20,866 --> 00:05:23,866 And until then enjoy machine learning.
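The second step of this tutorial, converting probabilities into zero and one predictions with `ifelse`, can be sketched as below. The probability values are hypothetical stand-ins for `prob_pred`, chosen only for illustration (0.016 echoes the value quoted for user number two); they are not output from the tutorial's classifier.

```r
# Hypothetical predicted probabilities standing in for prob_pred.
prob_pred <- c(0.016, 0.73, 0.5, 0.49, 0.91)

# Step 2: threshold at 0.5.
# ifelse(condition, value if TRUE, value if FALSE), applied element-wise.
y_pred <- ifelse(prob_pred > 0.5, 1, 0)
print(y_pred)
```

Note that with the strict comparison `prob_pred > 0.5`, a probability of exactly 0.5 falls on the false branch and is predicted as zero.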