1 00:00:00,066 --> 00:00:01,033 All right, my friends, 2 00:00:01,033 --> 00:00:04,866 here we are at the final step of this implementation 3 00:00:04,866 --> 00:00:08,966 and actually the most exciting one, because this is the step where 4 00:00:08,966 --> 00:00:13,166 we're going to visualize, on a nice 2D plot, the prediction curve 5 00:00:13,166 --> 00:00:16,900 and the prediction regions of the logistic regression model. 6 00:00:17,466 --> 00:00:17,966 All right. 7 00:00:17,966 --> 00:00:21,366 So more specifically, what we're about to plot 8 00:00:21,800 --> 00:00:24,600 is a two-dimensional plot with, therefore, 9 00:00:24,600 --> 00:00:27,766 two axes, x and y. On the x axis, 10 00:00:27,766 --> 00:00:32,466 you will have the first feature, corresponding to the age, and on the y axis 11 00:00:32,466 --> 00:00:36,833 you'll have the second feature, corresponding to the estimated salary, 12 00:00:37,233 --> 00:00:41,200 and therefore each of the observation points you will see on the 2D 13 00:00:41,200 --> 00:00:44,666 plot will correspond to a specific customer. 14 00:00:44,966 --> 00:00:47,933 It will either be a customer of the training set, 15 00:00:47,933 --> 00:00:50,400 you know, on the plot of the training set results, 16 00:00:50,400 --> 00:00:54,333 or a customer of the test set, on the plot of the test set results. 17 00:00:54,800 --> 00:00:58,133 And what is most interesting to see in 18 00:00:58,133 --> 00:01:02,966 this plot will be the prediction regions, meaning the region 19 00:01:03,100 --> 00:01:07,133 where our logistic regression model predicts the class zero, 20 00:01:07,133 --> 00:01:11,533 meaning the customer didn't buy the SUV, and the other region 21 00:01:11,533 --> 00:01:15,366 where our logistic regression model predicts the class one, 22 00:01:15,366 --> 00:01:18,366 meaning the customer bought the SUV.
23 00:01:18,366 --> 00:01:21,366 And lastly, what will be really interesting to see 24 00:01:21,433 --> 00:01:25,533 is the curve separating these two regions, 25 00:01:25,533 --> 00:01:29,666 you know, the region of the prediction zero and the region of the prediction 26 00:01:29,666 --> 00:01:33,966 one. And this is exactly how we are going to see the difference 27 00:01:33,966 --> 00:01:38,066 between linear classifiers and nonlinear classifiers. 28 00:01:38,066 --> 00:01:42,033 So here we're only starting with one classification model, logistic regression. 29 00:01:42,266 --> 00:01:44,466 So we won't compare that yet. 30 00:01:44,466 --> 00:01:48,900 But you will see in the next sections of this part that the prediction 31 00:01:48,900 --> 00:01:52,833 boundary between these two prediction regions will be different 32 00:01:52,966 --> 00:01:56,700 depending on whether or not your classifier is linear. 33 00:01:56,866 --> 00:01:59,166 All right. So I can't wait to show you this. 34 00:01:59,166 --> 00:02:02,900 And let's start first by visualizing these training 35 00:02:02,900 --> 00:02:05,966 set and test set results for the logistic regression model. 36 00:02:06,700 --> 00:02:07,266 All right. 37 00:02:07,266 --> 00:02:11,266 So the code to visualize this is actually pretty advanced. 38 00:02:11,400 --> 00:02:14,600 And not only is it pretty advanced, but also you will 39 00:02:14,600 --> 00:02:17,600 probably never use it again in your career. 40 00:02:17,600 --> 00:02:20,600 Or let's say you will never have to implement that again. 41 00:02:20,700 --> 00:02:21,466 Why is that? 42 00:02:21,466 --> 00:02:23,933 It's because in your career you will mostly work 43 00:02:23,933 --> 00:02:27,300 with data sets having many features, you know, more than two.
44 00:02:27,733 --> 00:02:31,633 And here the only reason why we have a data set of two features 45 00:02:31,833 --> 00:02:36,333 is so that we can indeed visualize, well, these prediction regions 46 00:02:36,333 --> 00:02:37,566 and prediction boundary, 47 00:02:37,566 --> 00:02:41,900 because indeed, in order to visualize this, we need a maximum of two features, 48 00:02:41,900 --> 00:02:46,100 because one feature corresponds to one dimension in this plot. 49 00:02:46,500 --> 00:02:50,400 So what I suggest is that we don't waste too much time, 50 00:02:50,533 --> 00:02:54,200 you know, understanding the whole code and re-implementing it ourselves. 51 00:02:54,500 --> 00:02:57,300 Because really, I'm going to show it to you right away, you know, 52 00:02:57,300 --> 00:03:00,500 on the original logistic regression implementation, 53 00:03:00,933 --> 00:03:03,166 you will see that the code is pretty advanced. 54 00:03:03,166 --> 00:03:07,700 You know, it's not like plotting a regression curve like we did in part two. 55 00:03:08,133 --> 00:03:10,233 So that's the test set results. 56 00:03:10,233 --> 00:03:12,066 Let me show you the training set results. 57 00:03:12,066 --> 00:03:12,466 All right. 58 00:03:12,466 --> 00:03:13,533 So that's the code. 59 00:03:13,533 --> 00:03:18,066 You see, it uses a lot of tricks to plot all these observation points, 60 00:03:18,066 --> 00:03:20,400 prediction regions and prediction boundary. 61 00:03:20,400 --> 00:03:23,400 So if you want to have a look at it and understand it, fine. 62 00:03:23,400 --> 00:03:28,266 But really, for the others, it's totally okay if we don't cover this code in detail, 63 00:03:28,300 --> 00:03:30,933 because this is only for training purposes,
64 00:03:30,933 --> 00:03:32,166 just so that I can show you 65 00:03:32,166 --> 00:03:35,600 the differences between linear classifiers and nonlinear classifiers, 66 00:03:35,800 --> 00:03:39,600 and you will probably never use that again in your future machine learning projects. 67 00:03:39,933 --> 00:03:43,166 However, what I will do just now is explain how it's done. 68 00:03:43,500 --> 00:03:47,066 Basically, what we do is we create, as you can see, a grid, 69 00:03:47,266 --> 00:03:51,133 which is basically this frame here containing all the ages of your features 70 00:03:51,133 --> 00:03:54,133 and all the estimated salaries, you know, the ranges, 71 00:03:54,266 --> 00:03:57,300 and you create this grid with a high density, meaning that 72 00:03:57,300 --> 00:04:02,933 the pixels of this grid are not separated one by one, but every 0.25. 73 00:04:02,933 --> 00:04:05,400 So here, for example, for the age, it goes this way. 74 00:04:05,400 --> 00:04:11,366 It goes from 10 to 10.25 to 10.5 to 10.75 to 11, et cetera, 75 00:04:11,366 --> 00:04:15,400 up to 69, 69.25, 69.5, 76 00:04:15,400 --> 00:04:20,100 69.75, 70. Okay, and same for the estimated salary. 77 00:04:20,300 --> 00:04:25,800 It goes from 20,000, then 20,000.25, 20,000.5, etc. 78 00:04:25,800 --> 00:04:31,100 up to somewhere around 149,000, 149,000.25. 79 00:04:31,100 --> 00:04:36,000 You see, so resulting in having super dense points inside this grid. 80 00:04:36,400 --> 00:04:39,700 And then the trick, you know, what we did is not only 81 00:04:39,700 --> 00:04:43,133 we plotted all the real observation points in the grid. 82 00:04:43,133 --> 00:04:47,266 So all the points that you see here are the customers of either your training set 83 00:04:47,266 --> 00:04:50,666 or, later on, your test set. The green points are, of course, 84 00:04:50,666 --> 00:04:54,766 the customers who bought the SUV, you know, represented by one here.
85 00:04:55,166 --> 00:04:58,400 And the red points are, of course, the customers who didn't buy 86 00:04:58,566 --> 00:05:01,433 the SUV, represented by zero here. 87 00:05:01,433 --> 00:05:05,066 Okay, so all the points are your observation points, 88 00:05:05,066 --> 00:05:06,166 your customers. 89 00:05:06,166 --> 00:05:10,033 And then, so the trick in order to plot the prediction regions 90 00:05:10,033 --> 00:05:14,300 and therefore that prediction boundary here separating the two regions 91 00:05:14,800 --> 00:05:17,433 is to apply the predict method 92 00:05:17,433 --> 00:05:20,900 to each of these dense points in the grid, 93 00:05:21,166 --> 00:05:24,800 so that all the dense points here, you know, in this region 94 00:05:24,900 --> 00:05:29,600 were actually predicted to be zero, meaning all the customers, 95 00:05:29,600 --> 00:05:34,800 you know, other customers inside this region are predicted not to buy the SUV. 96 00:05:35,100 --> 00:05:38,266 And all the observation points in this green 97 00:05:38,266 --> 00:05:41,733 region are actually predicted to buy the SUV. 98 00:05:42,066 --> 00:05:43,800 So you see how this works. That's the trick. 99 00:05:43,800 --> 00:05:45,900 And then really you don't have to understand 100 00:05:45,900 --> 00:05:50,000 all the techniques used to implement this, because once again, 101 00:05:50,000 --> 00:05:53,766 you will probably never have to implement that kind of code in your career.
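For anyone who does want to see the trick in code, here is a minimal sketch of the idea described above: build a dense grid with a 0.25 step over both feature ranges, then call predict on every grid point. It is not the code shown in the lesson. The helper names `make_dense_grid` and `predict_on_grid`, the `margin` parameter, the `ToyLinearClassifier` stand-in, and the demo values are all assumptions made up for illustration. Any fitted model that exposes a `predict` method could be passed in its place.

```python
import numpy as np


def make_dense_grid(X, step=0.25, margin=1.0):
    """Build a dense 2D grid covering the range of both features.

    X is an (n_samples, 2) array; step is the spacing between grid points.
    The margin simply pads the grid a little beyond the observed range.
    """
    x1 = np.arange(X[:, 0].min() - margin, X[:, 0].max() + margin, step)
    x2 = np.arange(X[:, 1].min() - margin, X[:, 1].max() + margin, step)
    return np.meshgrid(x1, x2)


def predict_on_grid(classifier, X1, X2):
    """Call classifier.predict on every grid point, reshaped back to the grid."""
    grid_points = np.column_stack([X1.ravel(), X2.ravel()])
    return classifier.predict(grid_points).reshape(X1.shape)


class ToyLinearClassifier:
    """Hypothetical stand-in exposing predict(); not the model from the lesson."""

    def __init__(self, weights, bias):
        self.weights = np.asarray(weights, dtype=float)
        self.bias = float(bias)

    def predict(self, X):
        # Class 1 when the linear score is positive, class 0 otherwise.
        return (X @ self.weights + self.bias > 0).astype(int)


# Made-up (age, salary in thousands) points, for illustration only.
X_demo = np.array([[20.0, 30.0], [35.0, 60.0], [50.0, 90.0], [60.0, 120.0]])
X1, X2 = make_dense_grid(X_demo)
Z = predict_on_grid(ToyLinearClassifier([1.0, 0.5], -75.0), X1, X2)
```

With matplotlib, passing `X1`, `X2` and `Z` to `plt.contourf` would color the two prediction regions, and `plt.scatter` could overlay the real observation points on top. Because the stand-in classifier here is linear, the boundary between the two regions is a straight line.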