1 00:00:00,366 --> 00:00:00,666 Okay. 2 00:00:00,666 --> 00:00:03,100 So let's go back to our classifier. 3 00:00:03,100 --> 00:00:05,000 So the pre-processing is done. 4 00:00:05,000 --> 00:00:07,933 And now we're going to create our classifier. 5 00:00:07,933 --> 00:00:09,533 Everything should go well. 6 00:00:09,533 --> 00:00:12,733 I don't have to select this because mine was already selected. 7 00:00:13,233 --> 00:00:16,500 So let's press Command or Control plus Enter to execute. 8 00:00:16,966 --> 00:00:20,166 And here it is: classifier created. 9 00:00:20,166 --> 00:00:21,833 All good. Okay. 10 00:00:21,833 --> 00:00:26,466 And now we can make some predictions on new observations, which are here the test 11 00:00:26,466 --> 00:00:27,000 set observations. 12 00:00:27,000 --> 00:00:30,100 So let's do it: y_pred. Right. 13 00:00:30,133 --> 00:00:30,633 All right. 14 00:00:30,633 --> 00:00:34,500 So let's have a look at y_pred. 15 00:00:36,033 --> 00:00:39,033 So these are all the predictions of the test set observations. 16 00:00:39,033 --> 00:00:40,966 That is, for each user of the test 17 00:00:40,966 --> 00:00:44,300 set, our classifier predicts whether the user buys the SUV or not. 18 00:00:44,800 --> 00:00:45,733 So let's compare 19 00:00:45,733 --> 00:00:49,500 the predictions with the truth, which is contained in the test set. 20 00:00:50,000 --> 00:00:54,033 So this column tells the truth about whether the users 21 00:00:54,033 --> 00:00:57,033 bought the SUV, yes or no, and these are the predictions. 22 00:00:57,033 --> 00:01:00,800 That is, our SVM classifier predicted whether each user bought 23 00:01:00,800 --> 00:01:02,100 the SUV, yes or no. 24 00:01:02,100 --> 00:01:04,633 So let's look, for example, at... well. 25 00:01:04,633 --> 00:01:04,933 Okay. 26 00:01:04,933 --> 00:01:08,400 So all these first users here were predicted not to buy the SUV. 27 00:01:08,400 --> 00:01:11,400 We have all zeros until 103.
28 00:01:11,433 --> 00:01:14,933 But here, as you can see, we have some users who actually bought 29 00:01:14,933 --> 00:01:19,333 the SUV in reality: users 18, 19, 20 and 22. 30 00:01:19,400 --> 00:01:21,966 So our classifier predicted that they didn't buy it, 31 00:01:21,966 --> 00:01:24,400 and the truth is that they actually bought it. 32 00:01:24,400 --> 00:01:26,400 So those are incorrect predictions. 33 00:01:26,400 --> 00:01:30,166 But let's look at the incorrect predictions in a more efficient way, 34 00:01:30,766 --> 00:01:32,866 by looking at the confusion matrix. 35 00:01:32,866 --> 00:01:35,933 So let's select this line here and execute. 36 00:01:36,133 --> 00:01:40,200 And now let's find out the real number of incorrect predictions. 37 00:01:40,600 --> 00:01:43,600 So let's type cm here in the console and press Enter. 38 00:01:43,933 --> 00:01:48,300 And wow, that's actually quite a large number of incorrect predictions. 39 00:01:48,633 --> 00:01:52,500 And by the way, we don't obtain the same results as in Python, because we have some 40 00:01:52,533 --> 00:01:53,933 random factors in the model. 41 00:01:53,933 --> 00:01:55,666 And here we didn't specify a seed. 42 00:01:55,666 --> 00:01:57,600 So you might actually have some different results. 43 00:01:57,600 --> 00:01:59,766 But the idea is the same. 44 00:01:59,766 --> 00:02:03,133 So, okay, let's now look at the graph to see how it's doing. 45 00:02:03,600 --> 00:02:08,000 So for those of you who didn't watch the Python tutorial about SVM, 46 00:02:08,533 --> 00:02:11,766 a good exercise is to try to guess what's going to happen. 47 00:02:11,966 --> 00:02:14,566 That is, what are going to be the prediction regions? 48 00:02:14,566 --> 00:02:17,033 What is going to be the prediction boundary? 49 00:02:17,033 --> 00:02:19,566 What do you guess you will see? 50 00:02:19,566 --> 00:02:21,100 So I will let you think for a second. 51 00:02:21,100 --> 00:02:22,500 You can pause the video.
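As a rough illustration of what the confusion matrix summarizes, here is a minimal Python sketch that tallies actual versus predicted 0/1 labels. The function name `confusion_matrix` and the toy label lists are made up for illustration; they are not the course's code or its test set. Rows are actual labels and columns are predicted labels, the same orientation R's `table(actual, predicted)` produces.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary 0/1 labels.

    Rows index the actual label, columns index the predicted label.
    """
    cm = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        cm[actual][predicted] += 1
    return cm


# Hypothetical toy labels (1 = bought the SUV), not real course data.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 0, 1, 0, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[3, 1], [2, 2]]
```

The two off-diagonal cells are the incorrect predictions. For example, `cm[1][0]` counts users who actually bought the SUV but were predicted not to, like the users discussed above.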
52 00:02:22,500 --> 00:02:24,133 And now I'm going to tell you. 53 00:02:24,133 --> 00:02:27,766 So as you noticed, we chose a linear kernel, 54 00:02:27,766 --> 00:02:32,033 which means that our SVM classifier is a linear classifier. 55 00:02:32,366 --> 00:02:37,133 So as I explained in the logistic regression tutorials in Python and R, 56 00:02:37,600 --> 00:02:42,500 well, a linear classifier in two-dimensional space is a straight line. 57 00:02:42,533 --> 00:02:46,466 So I'm telling you this right now: we are going to get a straight line. 58 00:02:46,500 --> 00:02:50,333 I don't want you to be disappointed, because I know we improved our model with K-NN. 59 00:02:50,400 --> 00:02:54,300 And before, we obtained a good prediction boundary that caught, 60 00:02:54,600 --> 00:02:59,166 you know, those users in the bottom right corner who actually bought the SUV, 61 00:02:59,266 --> 00:03:03,066 but who were incorrect predictions for logistic regression. 62 00:03:03,066 --> 00:03:07,033 Well, here there are also going to be some incorrect predictions for the linear 63 00:03:07,033 --> 00:03:10,166 SVM, because it's actually a linear classifier. 64 00:03:10,500 --> 00:03:12,133 So let's look at the results right now. 65 00:03:12,133 --> 00:03:15,133 Select this and press Command or Control. 66 00:03:15,300 --> 00:03:16,800 Press Enter to execute. 67 00:03:17,966 --> 00:03:20,033 And here are the results. 68 00:03:20,033 --> 00:03:22,566 Okay. So as you can see, that's exactly what I just told you. 69 00:03:22,566 --> 00:03:26,900 You know, those users here with a lower salary and a higher age. 70 00:03:27,066 --> 00:03:30,066 Well, they actually bought the SUV in reality, 71 00:03:30,266 --> 00:03:33,266 because the points are green, and the points are the real observations.
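Why is a linear classifier a straight line in two dimensions? A minimal Python sketch helps show it. A linear decision function has the form $w_1 x_1 + w_2 x_2 + b$. The boundary is where that expression equals zero, and solving for $x_2$ gives a line. The weights `W1`, `W2` and `B` below are hypothetical numbers chosen for illustration. In a fitted linear SVM they would come from training on the scaled age and estimated salary features.

```python
# Hypothetical weights; a trained linear SVM would learn these from data.
W1, W2, B = 1.5, 2.0, -1.0


def decision_function(x1, x2):
    """Signed score of a linear classifier: w1*x1 + w2*x2 + b."""
    return W1 * x1 + W2 * x2 + B


def predict(x1, x2):
    """Predict 1 on the positive side of the boundary, 0 otherwise."""
    return 1 if decision_function(x1, x2) > 0 else 0


def boundary_x2(x1):
    """Solve w1*x1 + w2*x2 + b = 0 for x2; the result is linear in x1."""
    return -(W1 * x1 + B) / W2


# Every point on the boundary has a score of zero, and the slope is constant.
for x1 in (-2.0, 0.0, 1.0, 3.0):
    print(x1, boundary_x2(x1), decision_function(x1, boundary_x2(x1)))
```

Because `boundary_x2` has the fixed slope `-W1 / W2`, the boundary can never bend around a cluster of points. That is why some green points end up in the red region.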
72 00:03:33,500 --> 00:03:38,833 But they fell into the red region here because, since the linear classifier 73 00:03:38,833 --> 00:03:42,700 is a straight line, it couldn't, you know, make some kind of curve here 74 00:03:42,700 --> 00:03:46,400 to catch all the green points in the right place. 75 00:03:46,700 --> 00:03:50,866 And therefore it cut off some green points, putting them in the red region. 76 00:03:51,400 --> 00:03:54,400 So yeah, that's exactly the same as for logistic regression. 77 00:03:54,666 --> 00:03:57,133 Sorry about the disappointment, but don't worry. 78 00:03:57,133 --> 00:04:00,966 In the next section we will introduce a new kind of classifier, 79 00:04:00,966 --> 00:04:05,300 which will be the kernel SVM, with a different kernel than the linear kernel. 80 00:04:05,566 --> 00:04:07,033 It's going to be a Gaussian kernel. 81 00:04:07,033 --> 00:04:09,000 Or we can even try some other kernels. 82 00:04:09,000 --> 00:04:11,166 Well, you can practice that yourself. 83 00:04:11,166 --> 00:04:13,166 It can be very good practice for you. 84 00:04:13,166 --> 00:04:15,766 But here, yes, it's a linear classifier. 85 00:04:15,766 --> 00:04:18,766 So basically it's the same as logistic regression. 86 00:04:19,266 --> 00:04:22,800 And if we look at the test set results, it's going to be the same. 87 00:04:22,833 --> 00:04:23,866 Let's have a look. 88 00:04:25,033 --> 00:04:26,933 And here is the test set. Okay. 89 00:04:26,933 --> 00:04:28,466 So yes, same thing here. 90 00:04:28,466 --> 00:04:32,566 We have some users of higher age and lower estimated salary 91 00:04:32,566 --> 00:04:37,033 who bought the SUV but fell into the red region, 92 00:04:37,033 --> 00:04:39,900 again because our classifier is a straight line, 93 00:04:39,900 --> 00:04:43,800 and that's the best it could do to classify these points 94 00:04:43,800 --> 00:04:46,600 and place them into the corresponding category.
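Here is a quick Python sketch of the difference between the linear kernel used here and the Gaussian (RBF) kernel mentioned for the next section. The linear kernel is a dot product. The Gaussian kernel is $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$, so it measures closeness. A kernel decision function built from it can therefore enclose a region instead of splitting the plane with a line. The single support vector, its weight, the offset `B` and `gamma=0.5` are all hypothetical values chosen to make the effect visible. This is not what the course's model uses.

```python
import math


def linear_kernel(x, z):
    """Dot product: the kernel behind the straight-line boundary above."""
    return sum(a * b for a, b in zip(x, z))


def gaussian_kernel(x, z, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - z||^2); gamma=0.5 is an arbitrary choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Hypothetical model: one positive support vector at the origin, offset -0.5.
SUPPORT_VECTOR, WEIGHT, B = (0.0, 0.0), 1.0, -0.5


def predict_rbf(x):
    """Predict 1 when the kernel score is positive, i.e. near the support vector."""
    score = WEIGHT * gaussian_kernel(SUPPORT_VECTOR, x) + B
    return 1 if score > 0 else 0


# On one line, the middle point is positive and both ends are negative.
# No straight-line boundary can produce this pattern.
print([predict_rbf(p) for p in [(-2.0, 0.0), (0.0, 0.0), (2.0, 0.0)]])  # [0, 1, 0]
```

The positive region here is a disc around the support vector. A curved boundary of this kind is what the linear classifier cannot draw around the older, lower-salary buyers.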
95 00:04:46,600 --> 00:04:50,100 So a few incorrect predictions here, a few incorrect predictions here. 96 00:04:50,500 --> 00:04:52,966 And if you want, you can count the number of incorrect predictions, 97 00:04:52,966 --> 00:04:55,266 that is, the number of green points in the red region 98 00:04:55,266 --> 00:04:58,266 plus the number of red points in the green region, 99 00:04:58,433 --> 00:05:02,900 and you will get the number of incorrect predictions we found in the confusion 100 00:05:02,900 --> 00:05:05,900 matrix, which was 20 incorrect predictions. 101 00:05:06,266 --> 00:05:10,200 But it's great that we have this result, because it gives us the motivation 102 00:05:10,200 --> 00:05:13,433 to improve our classifier, to improve our model, 103 00:05:13,700 --> 00:05:15,766 and that's what we're going to do in the next section. 104 00:05:15,766 --> 00:05:18,800 So I look forward to seeing you in the next section, 105 00:05:18,800 --> 00:05:22,800 where I'll show you how we can substantially improve our classification model. 106 00:05:23,033 --> 00:05:25,033 So I look forward to showing you the next level. 107 00:05:25,033 --> 00:05:26,866 And until then, enjoy machine learning.
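Counting green points in the red region plus red points in the green region is the same as adding the two off-diagonal cells of the confusion matrix. Here is a small Python sketch of that calculation. It assumes the layout rows = actual, columns = predicted, with 1 meaning the user bought the SUV. The matrix in the example is made up for illustration and is not the result from this video.

```python
def error_count(cm):
    """Off-diagonal cells of a 2x2 confusion matrix (rows actual, cols predicted).

    cm[1][0]: actually bought (green) but predicted not to buy (red region).
    cm[0][1]: actually did not buy (red) but predicted to buy (green region).
    """
    return cm[0][1] + cm[1][0]


def accuracy(cm):
    """Fraction of observations on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    return (cm[0][0] + cm[1][1]) / total


# Hypothetical confusion matrix, not the course's actual numbers.
cm = [[40, 5], [3, 52]]
print(error_count(cm), accuracy(cm))  # 8 0.92
```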