1 00:00:00,100 --> 00:00:01,633 And now let's do the whole thing 2 00:00:01,633 --> 00:00:04,633 and you're going to see that we won't get the error. 3 00:00:05,000 --> 00:00:07,466 So let's select this. Okay. 4 00:00:07,466 --> 00:00:12,200 So that's all the data preprocessing steps, now with the encoding included. 5 00:00:12,566 --> 00:00:15,200 So execute. All right. 6 00:00:15,200 --> 00:00:18,766 Now we're going to fit the Naive Bayes model to the training set. 7 00:00:19,900 --> 00:00:20,866 Execute. 8 00:00:20,866 --> 00:00:22,033 All good. 9 00:00:22,033 --> 00:00:25,200 Now we're going to create our vector of predictions, y_pred. 10 00:00:25,866 --> 00:00:26,366 Okay. 11 00:00:26,366 --> 00:00:31,900 You're going to see that if I type y_pred here, we will now have the predictions. 12 00:00:32,266 --> 00:00:36,433 So we can compare it to the actual y, which is the third column of the test set. 13 00:00:37,033 --> 00:00:38,700 Okay. So we can compare them. 14 00:00:38,700 --> 00:00:39,600 But let's not do this. 15 00:00:39,600 --> 00:00:42,333 Let's just get to the point, where I want to show you that 16 00:00:42,333 --> 00:00:46,800 now the confusion matrix is going to be created without any error. 17 00:00:46,833 --> 00:00:49,766 So let's select this and execute. 18 00:00:49,766 --> 00:00:54,300 And now as you can see the confusion matrix is created without any problem. 19 00:00:54,866 --> 00:00:58,666 So now we can have a look at the number of incorrect predictions, 20 00:00:58,666 --> 00:01:02,433 which is 7 plus 7 equals 14 incorrect predictions. 21 00:01:03,166 --> 00:01:03,733 Not bad, 22 00:01:03,733 --> 00:01:06,266 out of the 100 observations of the test set. 23 00:01:06,266 --> 00:01:07,500 Okay, great. 24 00:01:07,500 --> 00:01:11,600 And now we're finally getting to the fun part, which is to visualize the results. 25 00:01:11,833 --> 00:01:14,333 Okay. So you can pause the video. 26 00:01:14,333 --> 00:01:15,766 And now I'll just select this. 
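For readers following along without the script, the confusion matrix step above can be sketched in plain Python. This is not the code executed in the video; it is a minimal illustration of how the number of incorrect predictions is read off the off-diagonal cells. The function names and the toy labels are hypothetical. Only the two counts of 7 and the test set size of 100 come from the video.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels]
            for actual in labels]


def incorrect_predictions(cm):
    """Sum every cell that is not on the diagonal."""
    size = len(cm)
    return sum(cm[i][j] for i in range(size) for j in range(size) if i != j)


# Hypothetical toy labels, only to show the mechanics.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)
toy_incorrect = incorrect_predictions(cm)

# Figures quoted in the video: two off-diagonal cells of 7 each,
# out of 100 test set observations.
video_incorrect = 7 + 7
video_error_rate = video_incorrect / 100

print(cm, toy_incorrect, video_incorrect, video_error_rate)
```

With the figures from the video, 14 incorrect predictions out of 100 observations corresponds to an error rate of 14 percent on the test set.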
27 00:01:17,033 --> 00:01:18,633 So let's see, Command and Control, 28 00:01:18,633 --> 00:01:20,666 press Enter to execute. 29 00:01:20,666 --> 00:01:24,300 And there are our Naive Bayes graphic results. 30 00:01:24,633 --> 00:01:29,166 Isn't it beautiful how this prediction boundary is a smooth curve, 31 00:01:29,233 --> 00:01:32,633 classifying the data set quite well, 32 00:01:32,900 --> 00:01:34,666 that is, a data set of observations 33 00:01:34,666 --> 00:01:36,300 that are not linearly separable? 34 00:01:36,300 --> 00:01:38,566 It's kind of like the kernel SVM curve. 35 00:01:38,566 --> 00:01:42,900 You know, it's a beautiful smooth curve that manages to catch those green users 36 00:01:42,900 --> 00:01:46,200 that couldn't be caught by linear classifiers 37 00:01:46,200 --> 00:01:47,900 because we had the straight line 38 00:01:47,900 --> 00:01:50,533 and therefore it couldn't catch the green users here 39 00:01:50,533 --> 00:01:53,533 and put them in the green category, so they ended up in the red category. 40 00:01:53,700 --> 00:01:58,133 But thanks to this curve, we can see that it's making fewer incorrect predictions, 41 00:01:58,133 --> 00:02:01,733 but still some, like those one, two, three here. 42 00:02:02,266 --> 00:02:06,300 Regarding these users here, these are all the users with a low estimated salary 43 00:02:07,000 --> 00:02:10,066 that were incorrectly predicted by the linear classifiers. 44 00:02:10,566 --> 00:02:13,733 And then, however, it's still making a few mistakes here. 45 00:02:13,766 --> 00:02:17,733 We would have liked to have a lower curve here, like the curve starting from here. 46 00:02:18,133 --> 00:02:21,800 But that's what Naive Bayes could do here, and that's already quite a good job. 47 00:02:22,333 --> 00:02:25,566 So now let's see what it does on the test set results. 48 00:02:26,133 --> 00:02:29,000 Here we have the test set results code. 49 00:02:29,000 --> 00:02:30,900 Let's execute it. 50 00:02:30,900 --> 00:02:32,733 And here is the test set. 
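The smooth, curved prediction boundary described for the training set can be illustrated with a small from-scratch sketch. It assumes a Gaussian likelihood per feature, which is a common choice for numeric features; the video does not state which variant its script uses, and this is not that script. The function names and the toy points are hypothetical. The idea shown is that when the two classes have different variances, the class scores differ by a quadratic function of the features, so the boundary cannot be a straight line.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, target in zip(X, y) if target == label]
        n_features = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        # A tiny constant keeps the variance strictly positive.
        variances = [
            sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + 1e-9
            for j in range(n_features)
        ]
        model[label] = (len(rows) / len(y), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior score."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for value, mean, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        return score

    return max(model, key=lambda label: log_posterior(*model[label]))


# Hypothetical toy data: class 0 is a tight cluster at the origin,
# class 1 surrounds it on all sides.
X = [(0.1, 0.0), (-0.1, 0.1), (0.0, -0.1), (0.0, 0.0),
     (2.0, 2.0), (-2.0, -2.0), (2.0, -2.0), (-2.0, 2.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_gaussian_nb(X, y)
center = predict(model, (0.0, 0.0))
upper_right = predict(model, (3.0, 3.0))
lower_left = predict(model, (-3.0, -3.0))
print(center, upper_right, lower_left)
```

In this toy setup the center point is assigned to class 0 while points on opposite sides of it are both assigned to class 1, which no single straight line could achieve.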
51 00:02:32,733 --> 00:02:35,333 If the execution of this code is taking too much time, 52 00:02:35,333 --> 00:02:37,633 you can try to take a lower resolution, 53 00:02:37,633 --> 00:02:38,633 because right now you can see that 54 00:02:38,633 --> 00:02:42,966 we have a very high resolution with this 0.01 step here. 55 00:02:43,033 --> 00:02:46,033 We cannot see the pixels here thanks to this resolution. 56 00:02:46,500 --> 00:02:50,500 If you take a 0.1 resolution, the script will execute much faster, 57 00:02:50,500 --> 00:02:52,466 but then you will see the pixel points. 58 00:02:52,466 --> 00:02:53,700 So it's up to you. 59 00:02:53,700 --> 00:02:56,866 So the test set results are actually not bad as well. 60 00:02:57,366 --> 00:02:59,966 It did a pretty good job classifying these green users 61 00:02:59,966 --> 00:03:04,900 here into the right green category, but still some incorrect predictions 62 00:03:04,900 --> 00:03:06,033 persist, 63 00:03:06,033 --> 00:03:09,033 since those green points here stay in the red region. 64 00:03:09,600 --> 00:03:12,633 All right, so those are the Naive Bayes graphic results. 65 00:03:12,633 --> 00:03:14,100 I hope you enjoyed what you saw. 66 00:03:14,100 --> 00:03:17,400 We're going to have other, different surprises with other classifiers. 67 00:03:17,400 --> 00:03:20,600 You'll see that we will get very different kinds of prediction boundaries 68 00:03:20,600 --> 00:03:24,600 when we look at the decision tree classifiers and the random forest 69 00:03:24,600 --> 00:03:25,566 classifiers. 70 00:03:25,566 --> 00:03:27,566 So I look forward to showing that to you. 71 00:03:27,566 --> 00:03:29,400 And until then, enjoy machine learning.
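As a footnote to the resolution remark above, the speed difference between a 0.01 step and a 0.1 step comes from the number of grid points the classifier has to predict in order to color the background. The sketch below is only an illustration in Python, not the plotting code from the video. The function name and the axis range of -3 to 3 on both features are assumptions made for the example.

```python
def grid_point_count(x_min, x_max, y_min, y_max, step):
    """Count the points of a regular 2D grid, endpoints included."""
    nx = round((x_max - x_min) / step) + 1
    ny = round((y_max - y_min) / step) + 1
    return nx * ny


# Assumed axis range for two scaled features (hypothetical).
fine = grid_point_count(-3, 3, -3, 3, 0.01)    # 601 x 601 points
coarse = grid_point_count(-3, 3, -3, 3, 0.1)   # 61 x 61 points

print(fine, coarse, fine / coarse)
```

Under these assumptions the fine grid has 361201 points and the coarse grid 3721, so the coarse setting asks for roughly 97 times fewer predictions, at the cost of visible pixel blocks in the plot.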