1 00:00:00,200 --> 00:00:00,866 All right. 2 00:00:00,866 --> 00:00:03,600 So what we're going to do now is we're going 3 00:00:03,600 --> 00:00:06,633 to get that whole code, you know, from the original file. 4 00:00:06,900 --> 00:00:11,433 We're going to paste it inside our new implementation. 5 00:00:12,233 --> 00:00:13,066 There we go. 6 00:00:13,066 --> 00:00:16,466 And we're going to do the same for the test set. 7 00:00:17,400 --> 00:00:19,766 And then I will actually try 8 00:00:19,766 --> 00:00:23,100 not to show you the test set now, but let's get the code. 9 00:00:23,533 --> 00:00:27,000 Let's paste that below in a new code cell. 10 00:00:27,300 --> 00:00:29,766 And now let's enjoy the results. 11 00:00:29,766 --> 00:00:33,400 So what we're going to do is first execute the cell. 12 00:00:33,833 --> 00:00:37,333 It's going to take a little while actually because, you know, the step is 0.25, 13 00:00:37,333 --> 00:00:42,233 meaning we will get a very, very dense grid, as I've just explained. 14 00:00:42,500 --> 00:00:44,066 And therefore, you know, the predict 15 00:00:44,066 --> 00:00:47,933 method is applied on each of these dense points of the grid. 16 00:00:48,233 --> 00:00:52,633 And that's why there are actually a lot, lot, lot of predictions to compute. 17 00:00:52,633 --> 00:00:54,533 And that's why it's taking a little time. 18 00:00:54,533 --> 00:00:55,666 But there you go. It's coming. 19 00:00:55,666 --> 00:00:56,233 Here we go. 20 00:00:56,233 --> 00:00:58,333 So not that long. That's good. 21 00:00:58,333 --> 00:01:00,900 We just got the results of the training set. 22 00:01:00,900 --> 00:01:03,766 Let's also plot now the results of the test set. 23 00:01:03,766 --> 00:01:07,000 And then I will make my comments on the results. 24 00:01:07,300 --> 00:01:07,533 All right. 25 00:01:07,533 --> 00:01:09,200 So we can just let it run 26 00:01:09,200 --> 00:01:14,500 and start by observing and interpreting the results of the training set. 
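To get a feel for why that cell takes a while, here is a small sketch that counts how many grid points a given step produces, since the predict method has to run once per point. The feature ranges are made-up assumptions for illustration, not values from the course file, and `axis_points` and `grid_points` are hypothetical helper names.

```python
import math

def axis_points(start, stop, step):
    """Number of values obtained by stepping from start (inclusive) to stop (exclusive)."""
    return max(0, math.ceil((stop - start) / step))

def grid_points(x_range, y_range, step):
    """Total number of points in a 2D grid that uses the same step on both axes."""
    return axis_points(x_range[0], x_range[1], step) * axis_points(y_range[0], y_range[1], step)

# Hypothetical ranges for the two features (assumptions, not from the original code)
age_range = (10, 70)
salary_range = (14_000, 151_000)

for step in (1.0, 0.25):
    # One prediction is needed for every grid point
    print("step", step, "->", grid_points(age_range, salary_range, step), "predictions")
```

Because the step applies to both axes, dividing it by 4 multiplies the number of predictions by about 16, which is why a step of 0.25 is noticeably slower than a step of 1.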
27 00:01:15,466 --> 00:01:15,733 All right. 28 00:01:15,733 --> 00:01:18,966 So just to recap, you have to understand four things in this plot. 29 00:01:19,200 --> 00:01:22,600 First, all the points that you see here, whether they're red or green, 30 00:01:22,800 --> 00:01:27,766 are the real customers and their real results in the training set. 31 00:01:28,133 --> 00:01:31,200 The green points correspond, of course, to the customers 32 00:01:31,200 --> 00:01:34,766 who bought the previous SUVs, and the red points correspond, 33 00:01:34,766 --> 00:01:38,700 of course, to the customers who didn't buy any previous SUV. 34 00:01:39,500 --> 00:01:43,233 And then the other two things to understand are that those colored regions 35 00:01:43,233 --> 00:01:47,733 here, you know, this red region and the green region, are the prediction regions. 36 00:01:47,733 --> 00:01:51,300 So this region is the region where our model predicts 37 00:01:51,300 --> 00:01:53,400 that the customer didn't buy the SUV. 38 00:01:53,400 --> 00:01:56,900 And this region is the region where our logistic regression 39 00:01:56,900 --> 00:02:00,400 model predicts that the customers bought the previous SUV. 40 00:02:00,866 --> 00:02:04,800 And so in order to figure out where the correct predictions are 41 00:02:04,800 --> 00:02:09,533 and the incorrect predictions are, well, the correct predictions are where we have 42 00:02:09,766 --> 00:02:13,000 some observation points with the same color 43 00:02:13,166 --> 00:02:16,833 as the prediction region, and the incorrect predictions are 44 00:02:16,833 --> 00:02:20,833 where we have some observation points with a color that is different 45 00:02:20,966 --> 00:02:22,100 from the prediction region. 
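The rule just described, same color as the region means correct and different color means incorrect, can be sketched in a few lines. Everything here is an assumption for illustration: the `predict` rule is a hypothetical stand-in for a trained classifier, and the four customers are invented values, not rows from the dataset.

```python
def predict(age, salary):
    """Hypothetical stand-in for a trained classifier: 1 = green region, 0 = red region."""
    # Made-up linear rule on made-up scaled features, used only to define two regions
    return 1 if age + salary > 0 else 0

# Invented observations: (scaled age, scaled salary, real outcome)
customers = [
    (-1.0, -0.5, 0),
    (1.2, 0.8, 1),
    (-0.3, -0.2, 1),
    (0.6, 0.1, 0),
]

def split_by_correctness(observations, classifier):
    """Separate points whose real outcome matches the region they fall into from those that don't."""
    correct, incorrect = [], []
    for age, salary, actual in observations:
        region = classifier(age, salary)  # color of the region the point lands in
        (correct if region == actual else incorrect).append((age, salary, actual))
    return correct, incorrect

correct, incorrect = split_by_correctness(customers, predict)
print("correct:", correct)
print("incorrect:", incorrect)
```

The two points that end up in `incorrect` are one buyer sitting in the red region and one non-buyer sitting in the green region.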
46 00:02:22,100 --> 00:02:25,466 So for example, this customer here who in reality 47 00:02:25,566 --> 00:02:28,766 bought the SUV, you know, corresponding to one here, 48 00:02:29,200 --> 00:02:33,333 is actually an incorrect prediction because it falls into the wrong region, 49 00:02:33,333 --> 00:02:35,266 the red region. And vice versa: 50 00:02:35,266 --> 00:02:39,600 this customer here, who in reality didn't buy the SUV because it corresponds 51 00:02:39,600 --> 00:02:44,200 to zero, well, it's the wrong prediction because it falls into the green region 52 00:02:44,200 --> 00:02:47,633 where customers are predicted to buy the SUV. 53 00:02:48,166 --> 00:02:49,266 And then finally, 54 00:02:49,266 --> 00:02:53,466 what's the most interesting in all this is the prediction boundary. 55 00:02:53,800 --> 00:02:56,766 As you understood, the prediction boundary is 56 00:02:56,766 --> 00:03:00,466 the boundary between those two prediction regions. 57 00:03:00,466 --> 00:03:03,500 You know, the green prediction region and the red prediction region. 58 00:03:03,800 --> 00:03:07,066 It is where your classifier separates 59 00:03:07,300 --> 00:03:10,800 basically the two classes, class one and class zero. 60 00:03:11,500 --> 00:03:14,800 And now you have to understand something very, very important. 61 00:03:15,133 --> 00:03:18,800 It is the fact, you know, it is the observation that the prediction 62 00:03:18,800 --> 00:03:21,933 curve of the logistic regression model is actually 63 00:03:21,933 --> 00:03:24,933 a straight line for one specific reason. 64 00:03:25,000 --> 00:03:29,466 It is because the logistic regression model is a linear classifier. 65 00:03:29,766 --> 00:03:33,566 For any linear classifier, the prediction boundary 66 00:03:33,566 --> 00:03:36,966 or the prediction curve will always be a straight line. 67 00:03:37,166 --> 00:03:38,433 You know, in two dimensions. 68 00:03:38,433 --> 00:03:40,966 In three dimensions it will be a flat plane. 
69 00:03:40,966 --> 00:03:44,166 Okay, so that's what we get for a linear classifier.
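As a minimal sketch of why the boundary is straight: logistic regression applies a sigmoid to a linear combination of the features, so the probability equals 0.5 exactly where that linear combination is zero, and in two dimensions that set of points is a line. The weights `W1`, `W2` and bias `B` below are hypothetical values chosen for illustration, not the ones learned in the course.

```python
import math

# Hypothetical weights and bias for two scaled features (assumptions, not the trained model)
W1, W2, B = 2.0, 1.0, -0.5

def probability(x1, x2):
    """Logistic regression output: sigmoid of a linear combination of the two features."""
    z = W1 * x1 + W2 * x2 + B
    return 1.0 / (1.0 + math.exp(-z))

def boundary_x2(x1):
    """Solve W1*x1 + W2*x2 + B = 0 for x2, i.e. the points where the probability is 0.5."""
    return -(W1 * x1 + B) / W2

for x1 in (-2.0, 0.0, 2.0):
    x2 = boundary_x2(x1)
    # Every point on the boundary gets probability 0.5, and x2 changes linearly with x1
    print(x1, x2, probability(x1, x2))
```

Since `boundary_x2` is a linear function of `x1`, the boundary has a constant slope of `-W1 / W2`; points on one side get a probability above 0.5 (class one) and points on the other side get a probability below 0.5 (class zero).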