All right, so now you understand it better. But remember that these are the training set results. All the customers we see here are actually in the training set, so these are the customers our logistic regression model was trained on. That makes it fairly easy to get such results, because these are exactly the observations the model learned from.

What we would really like to see now is how our logistic regression model performs on new observations, meaning the observations of the test set: the customers of the test set. Indeed, the test set customers are new customers that our logistic regression model was not trained on. So we have to check whether our logistic regression model is still able to separate the two classes well, meaning the customers who bought the SUV and the customers who didn't, even though these are new customers the model was never trained on. And that's exactly what we're about to see now by visualizing the test set results.

We already executed the cell here, and so these are the test set results. Our logistic regression model is still able to separate the two classes well: class 0, all those red points here.
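To make the training-versus-test distinction concrete, here is a minimal from-scratch sketch in plain Python; it is not the code used in the video. It fits a logistic regression by batch gradient descent on hypothetical synthetic customers (age, estimated salary), holds out 25% of them as a test set, scales the features with training-set statistics only, and reports accuracy on both sets. The data generator, the 75/25 split, the learning rate and the number of epochs are all illustrative assumptions.

```python
import math
import random


def make_customers(n, seed):
    """Hypothetical synthetic customers: (age, salary) -> bought SUV (1) or not (0).

    ASSUMPTION: the labelling rule below is invented for illustration;
    it is not the dataset used in the video."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        age = rng.uniform(18, 60)
        salary = rng.uniform(15_000, 150_000)
        # Noisy linear rule: older, higher-earning customers tend to buy.
        score = 0.1 * (age - 40) + 0.00003 * (salary - 80_000) + rng.gauss(0, 0.5)
        X.append((age, salary))
        y.append(1 if score > 0 else 0)
    return X, y


def standardize(X_train, X_test):
    """Scale each feature with the mean/std of the TRAINING set only."""
    cols = list(zip(*X_train))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]

    def scale(X):
        return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in X]

    return scale(X_train), scale(X_test)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Plain batch gradient descent on the average log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    # Class 1 when the predicted probability is at least 0.5.
    return [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5 else 0 for xi in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(seed=0):
    X, y = make_customers(400, seed)
    # 75% / 25% split; the rows are already in random order.
    X_train, X_test = standardize(X[:300], X[300:])
    y_train, y_test = y[:300], y[300:]
    w, b = fit_logistic(X_train, y_train)
    return accuracy(y_train, predict(w, b, X_train)), accuracy(y_test, predict(w, b, X_test))


train_acc, test_acc = evaluate()
print(f"training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The test figure is the one that tells you how the model copes with customers it has never seen; the training figure only tells you how well it fits the customers it learned from.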
And class 1, all those green points. There are still some incorrect predictions, of course, like this customer who in reality didn't buy the beautiful brand-new SUV but was predicted to buy it. And there are a few incorrect predictions of the other class here: these customers who in reality bought the SUV but were predicted not to, because they fall in the red region.

All right. So what should we conclude here, and what takeaways should we carry over to our future classification models? Well, first, the logistic regression model does a very good job at separating our two classes, and therefore at predicting whether customers buy the SUV. But we would still like to build a model that makes fewer prediction errors. How can we build one? What kind of prediction boundary would we need so that we no longer misclassify all these customers here? We would actually need a prediction boundary that is something other than a straight line. Even if you try to rotate your prediction line, for example like this, it will still leave many incorrect predictions.
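The two kinds of mistakes described above, and the reason rotating the line cannot remove them, can be written down directly. The helpers below are hypothetical plain-Python sketches, assuming labels 1 = bought the SUV (green) and 0 = did not buy (red): one counts true/false negatives and positives, the other solves w1*x1 + w2*x2 + b = 0 for x2 to show that a two-feature logistic regression's decision boundary is a straight line whatever weights training produces.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for 0/1 labels.

    ASSUMED encoding: 1 = bought the SUV (green), 0 = did not buy (red)."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1  # did not buy, but falls in the green region
        elif t == 1 and p == 0:
            fn += 1  # bought, but falls in the red region
        else:
            tp += 1
    return tn, fp, fn, tp


def boundary_x2(w1, w2, b, x1):
    """Point on the logistic regression decision boundary.

    The model predicts 1 when sigmoid(w1*x1 + w2*x2 + b) >= 0.5, i.e. when
    w1*x1 + w2*x2 + b >= 0, so the boundary is the straight line
    w1*x1 + w2*x2 + b = 0. Solving for x2 assumes w2 != 0."""
    return -(w1 * x1 + b) / w2


# Hypothetical labels and predictions for six customers.
print(confusion_counts([0, 0, 1, 1, 1, 0], [0, 1, 1, 0, 1, 0]))  # (2, 1, 1, 2)
print(boundary_x2(2.0, -1.0, 1.0, 3.0))  # 7.0
```

Training only changes w1, w2 and b, which tilts and shifts that line; with these two raw features it can never bend it.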
So what we ultimately need is some kind of curve: a prediction curve that goes this way, catches all the red points, all the red customers here, and then goes around like this to catch these red customers as well, leaving all the green points, the green customers, inside the green region. And, as you might guess, this is what we may be able to get with nonlinear classifiers. I won't tell you more for now, but get ready for even more powerful classification models that manage to separate these two classes even better.

So there we go. That was the big part of the job, and you did it. Now follow me into the next sections to implement the other classification models. Until then, enjoy machine learning.
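Here is a rough sketch of that idea under clearly stated assumptions: hypothetical data where the green class sits inside a circle, and k-nearest neighbours (k = 5) as one example of a nonlinear classifier, picked purely for illustration since the video does not name the upcoming models. The script rotates and shifts a straight line over a grid of angles and offsets, keeps the best accuracy any of those lines reaches, and compares it with the test accuracy of k-NN.

```python
import math
import random


def make_round_data(n, seed):
    """HYPOTHETICAL data: green (1) inside a circle of radius sqrt(0.5), red (0) outside."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        X.append((x1, x2))
        y.append(1 if x1 * x1 + x2 * x2 < 0.5 else 0)
    return X, y


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_line_accuracy(X, y, n_angles=36, n_offsets=41):
    """Rotate and shift the line x1*cos(a) + x2*sin(a) = c over a grid and
    return the best accuracy any of those straight lines achieves."""
    best = 0.0
    for i in range(n_angles):
        a = math.pi * i / n_angles
        ca, sa = math.cos(a), math.sin(a)
        for j in range(n_offsets):
            c = -1.5 + 3.0 * j / (n_offsets - 1)
            acc = accuracy(y, [1 if x1 * ca + x2 * sa > c else 0 for x1, x2 in X])
            best = max(best, acc, 1.0 - acc)  # 1 - acc: same line, colours swapped
    return best


def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k nearest training points (odd k avoids ties)."""
    nearest = sorted(
        range(len(X_train)),
        key=lambda i: (X_train[i][0] - x[0]) ** 2 + (X_train[i][1] - x[1]) ** 2,
    )[:k]
    return 1 if 2 * sum(y_train[i] for i in nearest) > k else 0


def compare(seed=0):
    X, y = make_round_data(400, seed)
    X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]
    # The line is tuned directly on the test points, which flatters it.
    line_acc = best_line_accuracy(X_test, y_test)
    knn_acc = accuracy(y_test, [knn_predict(X_train, y_train, x) for x in X_test])
    return line_acc, knn_acc


line_acc, knn_acc = compare()
print(f"best straight line: {line_acc:.2f}, k-NN (k=5): {knn_acc:.2f}")
```

The line search is deliberately generous, since it is tuned on the very points it is scored on; even so, a boundary that can bend, like the one k-NN produces, should follow the circle far more closely than any straight line.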