1 00:00:00,100 --> 00:00:02,066 And now we're finally at the last round 2 00:00:02,066 --> 00:00:06,400 of our logistic regression model, which is the fun part, because we're going 3 00:00:06,400 --> 00:00:09,733 to visualize the training set results as well as the test set results. 4 00:00:10,200 --> 00:00:13,800 And basically, we're going to make a plot that will represent 5 00:00:13,800 --> 00:00:16,800 everything that is happening with our logistic regression model. 6 00:00:16,933 --> 00:00:18,566 So let's make this chart. 7 00:00:18,566 --> 00:00:20,100 I'm going to do as in Python. 8 00:00:20,100 --> 00:00:23,966 I'm going to take the code that I've prepared and paste it here. 9 00:00:24,533 --> 00:00:27,300 I will select and execute the code. 10 00:00:27,300 --> 00:00:28,966 And then for those of you who are interested 11 00:00:28,966 --> 00:00:32,666 in the idea behind the code, that is, how the code plots the graph, 12 00:00:32,933 --> 00:00:35,933 I will explain how the code works at the end of this tutorial. 13 00:00:36,433 --> 00:00:38,733 Okay, so right now I'm going to paste the code. 14 00:00:38,733 --> 00:00:39,200 Here we go. 15 00:00:39,200 --> 00:00:42,200 So this code contains 15 lines of code. 16 00:00:42,266 --> 00:00:43,800 That was also the case in Python. 17 00:00:43,800 --> 00:00:47,933 And that's because it's the exact same code based on the same idea. 18 00:00:48,266 --> 00:00:49,600 And I will explain this idea. 19 00:00:49,600 --> 00:00:53,033 That means I will explain how this code works at the end of this tutorial. 20 00:00:53,700 --> 00:00:57,600 But for now let's select this code and let's see what happens. 21 00:00:57,900 --> 00:01:00,900 So Command or Control plus Enter to execute. 22 00:01:01,266 --> 00:01:02,700 And let's wait. 23 00:01:02,700 --> 00:01:04,933 It's taking a little time. That's normal. 24 00:01:04,933 --> 00:01:06,100 But it's going to plot something. 25 00:01:08,566 --> 00:01:09,766 There it is. 
26 00:01:09,766 --> 00:01:11,966 There it is. That's the plot. 27 00:01:11,966 --> 00:01:12,400 Okay. 28 00:01:12,400 --> 00:01:13,566 So for those of you 29 00:01:13,566 --> 00:01:17,300 who followed the Python tutorial, we obtain exactly the same plot. 30 00:01:17,866 --> 00:01:19,033 So let's describe this plot. 31 00:01:19,033 --> 00:01:22,033 I'm going to enlarge this. 32 00:01:22,566 --> 00:01:22,900 Okay. 33 00:01:22,900 --> 00:01:25,933 So let's analyze this graph step by step. 34 00:01:26,466 --> 00:01:29,000 First let's focus on all the points here. 35 00:01:29,000 --> 00:01:30,900 We can see that we have some red points 36 00:01:30,900 --> 00:01:32,900 and some green points. 37 00:01:32,900 --> 00:01:35,600 So all these points that we see on this graph 38 00:01:35,600 --> 00:01:38,666 are the observation points of our training set. 39 00:01:39,066 --> 00:01:42,466 That is, these are all the users of the social network 40 00:01:42,833 --> 00:01:45,833 that were selected to go to the training set. 41 00:01:45,833 --> 00:01:50,466 And each of these users here is characterized by their age, 42 00:01:50,466 --> 00:01:55,033 here on the x axis, and their estimated salary, here on the y axis. 43 00:01:55,566 --> 00:01:59,366 Now we can see that there are some red points here 44 00:01:59,766 --> 00:02:02,366 and some green points here. 45 00:02:02,366 --> 00:02:06,633 The red points are the training set observations for which the dependent 46 00:02:06,633 --> 00:02:12,033 variable purchased is equal to zero, and the green points are the training set 47 00:02:12,033 --> 00:02:16,633 observations for which the dependent variable purchased is equal to one. 48 00:02:17,366 --> 00:02:20,866 That means that the red points here are the users 49 00:02:20,866 --> 00:02:24,033 who didn't buy the SUV, and the green points 50 00:02:24,033 --> 00:02:27,300 here are the users who actually bought the SUV. 
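The way the plot encodes each training set user can be sketched in a few lines of Python. This is only an illustration of the mapping described above (age on the x axis, estimated salary on the y axis, color from the purchased variable). The observations and the helper name `point_color` are made up for the example and are not the course dataset or the tutorial's plotting code.

```python
# Hypothetical training observations: (age, estimated_salary, purchased).
# These values are invented for illustration; they are not the course dataset.
training_set = [
    (22, 25000, 0),
    (27, 90000, 1),
    (35, 40000, 0),
    (48, 30000, 1),
    (52, 110000, 1),
]


def point_color(purchased):
    """Map the dependent variable to the point color used on the plot."""
    return "green" if purchased == 1 else "red"


# Each user becomes one point: x = age, y = estimated salary, color = purchased.
points = [
    (age, salary, point_color(purchased))
    for age, salary, purchased in training_set
]

for age, salary, color in points:
    print(f"x (age)={age}, y (estimated salary)={salary}, color={color}")
```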
51 00:02:27,600 --> 00:02:32,033 So now, as a first step of analysis, let's give an interpretation 52 00:02:32,266 --> 00:02:35,066 of what we observe here with these users. 53 00:02:35,066 --> 00:02:35,366 Okay. 54 00:02:35,366 --> 00:02:41,633 So first we can see that the users who are young with a low estimated salary, 55 00:02:41,633 --> 00:02:44,766 so these users here, actually didn't buy the SUV. 56 00:02:45,433 --> 00:02:50,766 Then if we look at the users who are older and with a higher estimated salary, 57 00:02:50,933 --> 00:02:54,266 well, we can see that most of these users actually bought the SUV. 58 00:02:54,400 --> 00:02:58,366 And it actually makes sense because the SUV is more like a family car 59 00:02:58,466 --> 00:02:59,700 and therefore more interesting 60 00:02:59,700 --> 00:03:02,700 for these older users here with a high estimated salary. 61 00:03:03,033 --> 00:03:06,033 Besides, we can also see that some older people, 62 00:03:06,233 --> 00:03:09,833 even with a low estimated salary, actually bought the SUV, 63 00:03:10,266 --> 00:03:12,866 because we can see that we have some green points here 64 00:03:12,866 --> 00:03:16,166 that correspond to an age above the average, 65 00:03:16,166 --> 00:03:17,900 the average is here, 66 00:03:17,900 --> 00:03:21,166 but an estimated salary below the average, because the average is here. 67 00:03:22,000 --> 00:03:22,900 Okay. 68 00:03:22,900 --> 00:03:26,433 So these guys, these older guys, although they have a low estimated salary, 69 00:03:26,433 --> 00:03:29,866 actually bought the SUV, probably because they've been saving up some money. 70 00:03:29,866 --> 00:03:32,233 Or maybe they finished paying off their mortgage. 71 00:03:32,233 --> 00:03:36,133 I don't know, but what's for sure is that they couldn't resist buying this 72 00:03:36,566 --> 00:03:40,666 very cool luxury SUV offered at a ridiculously low price. 
73 00:03:42,000 --> 00:03:42,566 And on the other 74 00:03:42,566 --> 00:03:46,166 hand, we can also see that there are some young people here 75 00:03:46,333 --> 00:03:50,600 with a high estimated salary who actually bought the SUV. 76 00:03:50,933 --> 00:03:53,033 You know, maybe because it's a very cool SUV 77 00:03:53,033 --> 00:03:56,000 and they want to impress their friends and take them on road trips. 78 00:03:56,000 --> 00:03:58,900 Or maybe they already have a family, I don't know. 79 00:03:58,900 --> 00:04:01,400 Anyway, they bought the SUV. 80 00:04:01,400 --> 00:04:05,800 Actually, there are a lot of buyers, so this must be a very cool and cheap SUV. 81 00:04:06,400 --> 00:04:09,733 Okay, and now what is the goal of classification? 82 00:04:10,033 --> 00:04:12,233 Now we're talking machine learning. 83 00:04:12,233 --> 00:04:14,700 Why are we making some classifiers? 84 00:04:14,700 --> 00:04:16,733 And what will classifiers do? 85 00:04:16,733 --> 00:04:20,533 Or at least, what are we trying to make them do for this particular business problem? 86 00:04:21,100 --> 00:04:23,700 Well, the goal here is to classify 87 00:04:23,700 --> 00:04:26,700 the right users into the right categories. 88 00:04:26,700 --> 00:04:29,700 That is, we are trying to make a classifier 89 00:04:29,866 --> 00:04:32,966 that will catch the right users into the right categories, 90 00:04:33,300 --> 00:04:37,166 which are yes, they buy the SUV and no, they don't buy the SUV. 91 00:04:37,466 --> 00:04:41,100 And we represented the way our classifier catches these users 92 00:04:41,400 --> 00:04:44,400 by plotting what I called the prediction regions. 93 00:04:44,633 --> 00:04:48,866 And so the prediction regions are the two regions that we see on this graph. 94 00:04:49,200 --> 00:04:52,466 This red one here and this green one here. 
95 00:04:52,733 --> 00:04:56,600 And the red prediction region is the region where our classifier catches 96 00:04:56,600 --> 00:04:59,600 all the users that don't buy the SUV. 97 00:04:59,800 --> 00:05:01,266 And the green prediction region 98 00:05:01,266 --> 00:05:05,100 is the region where our classifier catches all the users that buy the SUV. 99 00:05:05,400 --> 00:05:06,566 But be careful. 100 00:05:06,566 --> 00:05:11,200 This is according to the classifier. That is, for each user in this 101 00:05:11,200 --> 00:05:13,266 red prediction region here, 102 00:05:13,266 --> 00:05:17,400 our logistic regression classifier predicts that the user doesn't 103 00:05:17,400 --> 00:05:18,833 buy the SUV. 104 00:05:18,833 --> 00:05:21,833 And for each user in this green prediction region here, 105 00:05:21,966 --> 00:05:25,533 our classifier will predict that the user buys the SUV. 106 00:05:25,933 --> 00:05:29,400 So that makes an awesome tool because for each 107 00:05:29,400 --> 00:05:33,033 new user of the social network, well, our logistic regression 108 00:05:33,033 --> 00:05:37,333 classifier will tell, based on their age and their estimated salary, 109 00:05:37,733 --> 00:05:41,100 if this user belongs to this red prediction region here, 110 00:05:41,333 --> 00:05:44,633 and therefore doesn't buy the SUV, or if this user belongs 111 00:05:44,633 --> 00:05:48,066 to this green prediction region here and therefore buys the SUV.
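To make the idea of prediction regions concrete, here is a minimal Python sketch of how a logistic regression classifier could assign a new user to the red or the green region. It is not the tutorial's plotting code. The coefficients `INTERCEPT`, `COEF_AGE` and `COEF_SALARY`, the 0.5 threshold, and the function names are all assumptions chosen for illustration; a real model would learn its coefficients from the training set.

```python
import math

# Hypothetical coefficients, NOT fitted on the course dataset.
# A real logistic regression would estimate these from the training set.
INTERCEPT = -12.0
COEF_AGE = 0.22
COEF_SALARY = 0.00004


def purchase_probability(age, salary):
    """Logistic function applied to a linear combination of the two features."""
    z = INTERCEPT + COEF_AGE * age + COEF_SALARY * salary
    return 1.0 / (1.0 + math.exp(-z))


def prediction_region(age, salary, threshold=0.5):
    """Return 'green' (predicted to buy) or 'red' (predicted not to buy)."""
    return "green" if purchase_probability(age, salary) >= threshold else "red"


# Classify every point of a coarse (age, salary) grid. Coloring each grid
# point by its predicted class is one way to draw the two regions.
ages = range(20, 61, 10)
salaries = range(20000, 140001, 40000)
grid = {(a, s): prediction_region(a, s) for a in ages for s in salaries}

# Print the grid with the highest salary on the top row, like the y axis.
for s in sorted(salaries, reverse=True):
    print(f"{s:>7}", " ".join(f"{grid[(a, s)]:<5}" for a in ages))
```

With these made-up coefficients, a call such as `prediction_region(22, 25000)` lands in the red region and `prediction_region(50, 100000)` in the green one. The region only reflects what the classifier predicts, not what the user actually did.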