1
00:00:00,500 --> 00:00:03,466
Run all. This will run all the cells, including this one,

2
00:00:03,466 --> 00:00:06,900
which will get us a new model with a new non-linear kernel.

3
00:00:06,900 --> 00:00:09,933
Where is it? Right here: kernel equals 'rbf'.

4
00:00:10,200 --> 00:00:11,233
And so now,

5
00:00:11,233 --> 00:00:15,433
now it's very interesting, because this time we have a non-linear classifier.

6
00:00:15,633 --> 00:00:19,500
And therefore I look forward to showing you the visualization results

7
00:00:19,500 --> 00:00:21,833
at the end, for both the training set and the test set,

8
00:00:21,833 --> 00:00:25,300
because you will see that we get a beautiful curve

9
00:00:25,466 --> 00:00:27,666
that catches the right observation points.

10
00:00:27,666 --> 00:00:30,866
You know, the red customers in the red region

11
00:00:31,066 --> 00:00:34,200
and the green customers in the green prediction region.

12
00:00:34,466 --> 00:00:37,966
Not all of them, of course, but the ones that couldn't be

13
00:00:37,966 --> 00:00:41,833
caught properly by the previous linear models,

14
00:00:41,833 --> 00:00:45,566
such as the linear support vector machine or the logistic regression model.

15
00:00:45,600 --> 00:00:47,700
Right? Remember this straight line?

16
00:00:47,700 --> 00:00:50,400
The straight line could not catch these customers

17
00:00:50,400 --> 00:00:53,566
properly in the green region, and these ones as well.

18
00:00:53,566 --> 00:00:56,933
And so you will see that now we get much better results.

19
00:00:56,933 --> 00:00:58,000
But first, because

20
00:00:58,000 --> 00:01:01,833
right now the code is still running. Yes.

21
00:01:01,833 --> 00:01:03,800
So we don't have the final result yet.

22
00:01:03,800 --> 00:01:04,466
But that's okay.

23
00:01:04,466 --> 00:01:06,666
Let's look at the different predictions.
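The curved boundary promised above comes from the RBF (Gaussian) kernel selected with `kernel = 'rbf'`. It scores the similarity of two points as exp(-gamma * ||x - z||^2), and the SVM's decision function is a weighted sum of these similarities to the support vectors plus a bias, which is what lets the boundary bend. Below is a minimal pure-Python sketch of the kernel itself; the `gamma` value and the sample points are assumptions for illustration (scikit-learn's `SVC` defaults to `gamma='scale'`, which depends on the data).

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Hypothetical feature-scaled (age, salary) points, for illustration only
p = (0.0, 0.0)
near = (0.1, 0.1)
far = (2.0, 2.0)

print(rbf_kernel(p, p))     # identical points give the maximum similarity, 1.0
print(rbf_kernel(p, near))  # close points give a value near 1
print(rbf_kernel(p, far))   # distant points give a value near 0
```

Because similarity decays with distance, each training point only influences predictions in its own neighbourhood, so the resulting boundary can wrap around local clusters of customers instead of being a single straight line.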
24
00:01:06,666 --> 00:01:11,600
So first, our kernel SVM predicted the right result here.

25
00:01:11,600 --> 00:01:17,233
Remember that customer aged 30 with an estimated salary of $87,000,

26
00:01:17,466 --> 00:01:20,600
who indeed didn't buy the SUV in reality.

27
00:01:20,766 --> 00:01:22,800
And here we get the same prediction.

28
00:01:22,800 --> 00:01:23,933
So that's all good.

29
00:01:23,933 --> 00:01:26,700
Then let's look at the test results.

30
00:01:26,700 --> 00:01:29,700
All right. Let's scroll back up a bit.

31
00:01:29,933 --> 00:01:31,300
So these are the test results.

32
00:01:31,300 --> 00:01:34,766
And once again we have many correct predictions.

33
00:01:35,033 --> 00:01:36,366
An incorrect one here:

34
00:01:36,366 --> 00:01:38,766
the real result is zero, meaning that in reality

35
00:01:38,766 --> 00:01:41,366
this particular customer didn't buy the SUV,

36
00:01:41,366 --> 00:01:44,400
but our model predicted that this customer bought the SUV.

37
00:01:44,733 --> 00:01:46,933
Then we have the same incorrect prediction here.

38
00:01:46,933 --> 00:01:49,000
Then all correct, all correct.

39
00:01:49,000 --> 00:01:50,033
It looks very good.

40
00:01:50,033 --> 00:01:52,266
Here we have an incorrect prediction, the inverse one:

41
00:01:52,266 --> 00:01:54,866
in reality the customer bought the SUV,

42
00:01:54,866 --> 00:01:58,333
but our model predicted that the customer didn't buy the SUV.

43
00:01:58,566 --> 00:02:00,133
Then all correct, all correct,

44
00:02:00,133 --> 00:02:01,100
all correct.

45
00:02:01,100 --> 00:02:04,033
It looks very, very good, right?

46
00:02:04,033 --> 00:02:06,900
I really wonder if we can beat our best accuracy.

47
00:02:06,900 --> 00:02:07,600
And that's

48
00:02:07,600 --> 00:02:11,566
what we're about to find out right now with the confusion matrix.

49
00:02:11,866 --> 00:02:12,566
Are you ready?

50
00:02:12,566 --> 00:02:13,766
Are we going to beat it?
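The side-by-side check above (each predicted label next to the real label) can be automated by listing only the rows where the two disagree. This is a small stdlib sketch; `list_mismatches` is a hypothetical helper and the labels are made up, not the course's test set. It assumes 1 means "bought the SUV" and 0 means "didn't buy".

```python
def list_mismatches(y_pred, y_true):
    """Return (row index, predicted, actual) for every incorrect prediction."""
    return [(i, p, t) for i, (p, t) in enumerate(zip(y_pred, y_true)) if p != t]

# Hypothetical labels (1 = bought the SUV, 0 = didn't buy), not the course data
y_pred = [0, 1, 0, 1, 0, 0]
y_true = [0, 0, 0, 1, 1, 0]

for i, p, t in list_mismatches(y_pred, y_true):
    # p == 1 with t == 0 is a predicted purchase that never happened; the reverse is a missed purchase
    kind = "predicted a purchase that didn't happen" if p == 1 else "missed a real purchase"
    print(f"row {i}: predicted {p}, actual {t} ({kind})")
```

The two kinds of error printed here are the same two kinds read off the table in the transcript: a real 0 predicted as 1, and the inverse case.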
51
00:02:13,766 --> 00:02:17,566
So the best accuracy we got so far was, as a reminder,

52
00:02:17,600 --> 00:02:22,200
93%, from the k-nearest neighbors model.

53
00:02:22,200 --> 00:02:24,633
Here it is: 93%.

54
00:02:24,633 --> 00:02:27,266
And now let's see if we managed to beat it.

55
00:02:27,266 --> 00:02:28,400
All right, let's scroll down.

56
00:02:28,400 --> 00:02:32,400
And the accuracy is 93% again. Okay.

57
00:02:32,566 --> 00:02:36,766
So we didn't beat it, but we reached the same level as k-NN.

58
00:02:36,766 --> 00:02:38,833
And now let's see if we get the curve.

59
00:02:38,833 --> 00:02:41,600
No, not yet. All right, it's still running.

60
00:02:41,600 --> 00:02:44,566
That's because we have a small step here: 0.25.

61
00:02:44,566 --> 00:02:48,600
If you want this to run faster, you can increase the step to 0.5

62
00:02:48,600 --> 00:02:49,400
or even 1,

63
00:02:49,400 --> 00:02:52,300
but you will get a less smooth, less nice curve.

64
00:02:52,300 --> 00:02:54,033
But that's okay.

65
00:02:54,033 --> 00:02:58,433
Since it will take a minute or two to get us the results,

66
00:02:58,433 --> 00:02:59,233
which is a long time,

67
00:02:59,233 --> 00:03:03,600
let's have a look at them in the original implementation,

68
00:03:03,600 --> 00:03:05,266
which is right here. Okay.

69
00:03:05,266 --> 00:03:06,800
Let's do that.

70
00:03:06,800 --> 00:03:08,533
Let's observe the final results.

71
00:03:08,533 --> 00:03:09,933
And here they are.

72
00:03:09,933 --> 00:03:14,466
As you can see, we get a very nice prediction boundary curve

73
00:03:14,733 --> 00:03:18,733
that this time catches perfectly well those green customers

74
00:03:18,733 --> 00:03:20,300
who couldn't be caught properly

75
00:03:20,300 --> 00:03:23,300
by the linear classifiers with their straight line.

76
00:03:23,333 --> 00:03:25,466
This time it is catching them perfectly well,
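Why a step of 0.25 is slow: a decision-region plot of this kind typically asks the classifier for a prediction at every point of a 2-D grid, and the grid size grows with the inverse square of the step. The sketch below counts those points under loudly stated assumptions. The plotting ranges (ages roughly 18 to 60 padded by 10, salaries roughly 15,000 to 150,000 padded by 1,000) are hypothetical, and `arange_len` and `grid_points` are illustrative helpers, not code from the notebook.

```python
import math

def arange_len(start, stop, step):
    """Number of values np.arange(start, stop, step) would produce."""
    return max(0, math.ceil((stop - start) / step))

def grid_points(x_range, y_range, step):
    """Total grid points, i.e. classifier predictions needed, for a given step."""
    return arange_len(*x_range, step) * arange_len(*y_range, step)

# Hypothetical plotting ranges: ages 18-60 padded by 10, salaries 15k-150k padded by 1,000
age_range = (18 - 10, 60 + 10)
salary_range = (15_000 - 1_000, 150_000 + 1_000)

for step in (0.25, 0.5, 1.0):
    print(f"step={step}: {grid_points(age_range, salary_range, step):,} predictions")
```

Under these assumptions, doubling the step from 0.25 to 0.5 cuts the number of predictions by a factor of four, which is why the transcript suggests 0.5 or 1 for a faster (but less smooth) plot.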
77
00:03:25,466 --> 00:03:26,400
except these ones, of course,

78
00:03:26,400 --> 00:03:29,500
because they're trapped in the middle of many red customers.

79
00:03:29,500 --> 00:03:32,600
But you know, these ones that couldn't be caught properly

80
00:03:32,600 --> 00:03:37,166
by the logistic regression model or the linear SVM classifier, right?

81
00:03:37,166 --> 00:03:40,766
If we have a look at it again, remember that the

82
00:03:41,500 --> 00:03:44,733
customers here couldn't be caught properly because of that straight line.

83
00:03:44,733 --> 00:03:47,200
But now that we have a curve... oh, there we go.

84
00:03:47,200 --> 00:03:48,166
We have the results.

85
00:03:48,166 --> 00:03:51,166
Now that we have this curve, these same customers,

86
00:03:51,166 --> 00:03:52,633
and these are exactly the same ones,

87
00:03:52,633 --> 00:03:56,766
are now caught properly in the green region.

88
00:03:57,200 --> 00:03:59,900
And the same goes for these ones: they are caught

89
00:03:59,900 --> 00:04:03,533
in the green region by a very small margin, and the same for these ones.

90
00:04:03,533 --> 00:04:06,233
And here, these ones are incorrect predictions.

91
00:04:06,233 --> 00:04:08,133
But once again, that's totally fine.

92
00:04:08,133 --> 00:04:13,100
We want to avoid overfitting anyway, because what matters most

93
00:04:13,366 --> 00:04:16,833
are the results on the test set, which we still don't have.

94
00:04:16,833 --> 00:04:20,300
It is still running, but we should get them in a second.

95
00:04:20,300 --> 00:04:23,366
Or let's just have a look at them right here.

96
00:04:23,366 --> 00:04:27,500
So now the challenge is to see whether we get the same results on new

97
00:04:27,500 --> 00:04:31,233
observations, meaning observations on which our model wasn't trained.
98
00:04:31,633 --> 00:04:35,366
And those are exactly the observations of the test set, the 100 customers

99
00:04:35,366 --> 00:04:39,166
of the test set. And here it looks even better,

100
00:04:39,300 --> 00:04:40,300
which makes sense, right?

101
00:04:40,300 --> 00:04:42,633
Because we had 93% accuracy.

102
00:04:42,633 --> 00:04:45,200
And since there are 100 customers in the test set,

103
00:04:45,200 --> 00:04:49,500
we have 93 correct predictions and seven incorrect predictions.

104
00:04:49,500 --> 00:04:50,733
And we can actually count them here,

105
00:04:50,733 --> 00:04:52,533
the seven incorrect predictions.

106
00:04:52,533 --> 00:04:56,566
We have the first two here, red customers falling in the wrong green

107
00:04:56,566 --> 00:04:57,633
prediction region.

108
00:04:57,633 --> 00:05:01,333
Then these two here, two other red customers falling in the wrong green

109
00:05:01,333 --> 00:05:06,433
prediction region, and then these three green customers falling in the wrong red

110
00:05:06,600 --> 00:05:07,500
prediction region.

111
00:05:07,500 --> 00:05:09,000
But that's totally fine.

112
00:05:09,000 --> 00:05:11,666
Once again, thanks to this beautiful curve here,

113
00:05:11,666 --> 00:05:15,966
we managed to catch these customers in the right region, which wasn't possible

114
00:05:15,966 --> 00:05:20,233
before because of the straight line produced by the linear classifiers.

115
00:05:20,433 --> 00:05:21,400
But this time, there we go.

116
00:05:21,400 --> 00:05:24,933
We have a non-linear classifier, which is why we get this curve

117
00:05:25,166 --> 00:05:28,166
and why we get better accuracy.

118
00:05:28,366 --> 00:05:28,966
All right.

119
00:05:28,966 --> 00:05:31,966
So I'm both very happy and excited

120
00:05:31,966 --> 00:05:35,000
because this curve is beautiful and we got great results.
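The count above maps directly onto a 2x2 confusion matrix: red customers (didn't buy, 0) in the green region are false positives, green customers (bought, 1) in the red region are false negatives, and accuracy is the diagonal divided by the total. This stdlib sketch uses the [[TN, FP], [FN, TP]] layout that scikit-learn's `confusion_matrix` returns for labels 0 and 1. The 4 false positives and 3 false negatives follow the transcript; the 60/33 split of the 93 correct predictions is made up for illustration, since the transcript only gives the total.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are actual classes, columns predicted (0 then 1)."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def accuracy(cm):
    """Fraction of correct predictions: the diagonal over the total count."""
    correct = cm[0][0] + cm[1][1]
    total = sum(sum(row) for row in cm)
    return correct / total

# Hypothetical 100-customer test set (0 = didn't buy, 1 = bought).
# 4 red customers predicted green (FP) and 3 green customers predicted red (FN),
# as counted in the transcript; the 60/33 split of correct predictions is invented.
y_true = [0] * 60 + [0] * 4 + [1] * 3 + [1] * 33
y_pred = [0] * 60 + [1] * 4 + [0] * 3 + [1] * 33

cm = confusion_matrix(y_true, y_pred)
print(cm)            # [[60, 4], [3, 33]]
print(accuracy(cm))  # 0.93
```

Whatever the split of the 93 correct predictions, 7 errors out of 100 customers always gives an accuracy of 0.93.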
121
00:05:35,166 --> 00:05:39,500
But I'm still hoping that with a future classification model,

122
00:05:39,500 --> 00:05:45,600
one of the three we have left, we can beat that 93% accuracy,

123
00:05:45,866 --> 00:05:49,333
the best we've gotten so far, reached by two models:

124
00:05:49,500 --> 00:05:52,966
k-nearest neighbors and the kernel SVM.

125
00:05:53,466 --> 00:05:56,600
So I look forward to seeing you in the next practical activity.

126
00:05:56,600 --> 00:05:58,800
And until then, enjoy machine learning.