1 00:00:00,333 --> 00:00:01,033 Perfect. 2 00:00:01,033 --> 00:00:04,933 The next step is, of course, to create an instance of this class, 3 00:00:04,933 --> 00:00:09,400 which will be an object representing exactly that Naive Bayes model. 4 00:00:09,700 --> 00:00:10,500 And so there we go. 5 00:00:10,500 --> 00:00:13,200 We're going to call this, as usual, classifier 6 00:00:13,200 --> 00:00:16,500 in order to be consistent with the next sections 7 00:00:16,500 --> 00:00:20,000 of this implementation, and mostly so that we don't have to change anything. 8 00:00:20,000 --> 00:00:20,400 Right. 9 00:00:20,400 --> 00:00:21,933 Because then we use this 10 00:00:21,933 --> 00:00:25,800 classifier variable to predict the results and visualize the results. 11 00:00:26,100 --> 00:00:27,866 So there we go, classifier here. 12 00:00:27,866 --> 00:00:32,400 And then we're going to call this GaussianNB class 13 00:00:32,633 --> 00:00:36,033 in order to create this Naive Bayes model. 14 00:00:36,600 --> 00:00:37,733 Okay. Perfect. 15 00:00:37,733 --> 00:00:39,933 And now you know how to finish this. 16 00:00:39,933 --> 00:00:44,300 We need to take our classifier again, from which we're going to call the fit 17 00:00:44,633 --> 00:00:49,166 method, which will train this classifier on the training set 18 00:00:49,400 --> 00:00:54,733 composed of X_train and y_train. 19 00:00:55,133 --> 00:00:55,500 Right? 20 00:00:55,500 --> 00:00:57,600 I hope you did it even faster than me, 21 00:00:57,600 --> 00:01:00,333 because indeed, this is exactly the same as before. 22 00:01:00,333 --> 00:01:02,533 And now you are also independent 23 00:01:02,533 --> 00:01:06,233 and know how to find the information you need in the API. 24 00:01:07,033 --> 00:01:09,533 Okay, great. So once again there we go. 25 00:01:09,533 --> 00:01:14,100 That implementation is over and we're ready to get the final result.
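The two steps described above (creating the classifier, then fitting it) could be sketched roughly as follows. This is only an illustration: the synthetic `X_train` and `y_train` built here are hypothetical stand-ins for the training set prepared earlier in the notebook, and the random seed and cluster positions are arbitrary assumptions.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for the training set: two scaled features per
# customer, two well-separated classes (not the tutorial's real data).
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(-1.0, 0.5, size=(50, 2)),  # class 0
    rng.normal(1.0, 0.5, size=(50, 2)),   # class 1
])
y_train = np.array([0] * 50 + [1] * 50)

# Create an instance of the GaussianNB class, named `classifier` so that
# later prediction and visualization code can reuse the same variable.
classifier = GaussianNB()

# Train the classifier on the training set.
classifier.fit(X_train, y_train)
```

Keeping the variable name `classifier` is what lets the later cells run unchanged when one model is swapped for another.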
26 00:01:14,100 --> 00:01:17,233 And mostly we're ready to find out if we can beat 27 00:01:17,233 --> 00:01:20,233 the record accuracy, you know, of 93%. 28 00:01:20,300 --> 00:01:22,766 So I can't wait to see. So let's do it right now. 29 00:01:22,766 --> 00:01:27,000 Let's click this folder button to upload the data set. 30 00:01:27,100 --> 00:01:27,433 Right. 31 00:01:27,433 --> 00:01:29,366 We have to do it in order to train 32 00:01:29,366 --> 00:01:31,666 that Naive Bayes model on the training set. 33 00:01:31,666 --> 00:01:36,466 Right now your notebook is connecting to a runtime to enable file browsing. 34 00:01:36,733 --> 00:01:39,866 And once again, in a second we should get the upload button. 35 00:01:39,866 --> 00:01:42,666 There we go. So we're going to click it. 36 00:01:42,666 --> 00:01:45,800 And here we are in the Kernel SVM folder. 37 00:01:45,900 --> 00:01:48,266 So let me show you the path once again. Please 38 00:01:48,266 --> 00:01:49,733 find your whole machine learning folder. 39 00:01:49,733 --> 00:01:54,000 It is that folder which you could download in the previous tutorial, if not already. 40 00:01:54,400 --> 00:01:57,233 And then inside we're going to go to Part 3, Classification. 41 00:01:57,233 --> 00:01:59,033 Then Section 18. 42 00:01:59,033 --> 00:02:00,600 We're making good progress here. 43 00:02:00,600 --> 00:02:05,200 Naive Bayes, then Python, and then 44 00:02:05,200 --> 00:02:08,633 Social_Network_Ads.csv. Okay, we press OK. 45 00:02:08,633 --> 00:02:10,800 Here we have the data set. All good. 46 00:02:10,800 --> 00:02:16,800 And now we can run everything in order to get our new result. 47 00:02:16,800 --> 00:02:19,033 So let's do this: Run all. 48 00:02:19,033 --> 00:02:21,166 And now all the cells are running. 49 00:02:21,166 --> 00:02:22,800 And especially this one. There we go. 50 00:02:22,800 --> 00:02:26,066 We now have our Gaussian Naive Bayes model.
51 00:02:26,433 --> 00:02:30,100 And well, let's see the results one by one, starting with this one. 52 00:02:30,100 --> 00:02:32,633 So that's the prediction of a single result. 53 00:02:32,633 --> 00:02:34,266 You know, that first customer of the test 54 00:02:34,266 --> 00:02:38,700 set, of age 30 and estimated salary $87,000. 55 00:02:38,933 --> 00:02:41,966 And remember, in y_test the real outcome 56 00:02:41,966 --> 00:02:45,000 was zero, meaning that this customer didn't buy the SUV. 57 00:02:45,200 --> 00:02:48,466 And that's the prediction, which is indeed the correct prediction. 58 00:02:48,900 --> 00:02:51,766 And then when predicting the test results, well, 59 00:02:51,766 --> 00:02:55,200 once again we see that we have a lot of correct predictions. 60 00:02:55,200 --> 00:02:56,233 All this is correct. 61 00:02:56,233 --> 00:02:57,300 All this is correct. 62 00:02:57,300 --> 00:02:59,400 This is our first incorrect prediction. 63 00:02:59,400 --> 00:03:02,866 Another one here and then another one here. 64 00:03:02,866 --> 00:03:05,933 All correct. All correct. Another one here. 65 00:03:06,300 --> 00:03:09,566 Oh, I'm not sure we're actually going to beat that accuracy. 66 00:03:09,566 --> 00:03:14,333 We seem to have more than seven incorrect predictions at first sight. I'm not sure. 67 00:03:14,333 --> 00:03:15,600 But let's see, let's see. 68 00:03:15,600 --> 00:03:18,233 Well, that's exactly what we're about to find out right now. 69 00:03:18,233 --> 00:03:19,333 So are you ready? 70 00:03:19,333 --> 00:03:23,166 The question is, will we beat the accuracy of 93%? 71 00:03:23,166 --> 00:03:27,966 Which was the best accuracy, resulting from both K-NN and kernel SVM. 72 00:03:28,200 --> 00:03:30,833 And so let's see what we get with Naive Bayes. 73 00:03:30,833 --> 00:03:33,833 And no, unfortunately we don't beat the record.
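The prediction and evaluation steps walked through here could be sketched as below. Everything about the data is an assumption: the customers are randomly generated rather than read from the course's CSV file, and the 25% test split, the `StandardScaler` named `sc`, and the seeds are choices made for this sketch, not details stated in this section. Only the single customer (age 30, estimated salary $87,000) comes from the narration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

# Hypothetical customers: columns are [age, estimated salary];
# label 1 means the customer bought the SUV, 0 means they did not.
rng = np.random.default_rng(0)
non_buyers = np.column_stack([rng.normal(30, 5, 200),
                              rng.normal(60_000, 15_000, 200)])
buyers = np.column_stack([rng.normal(50, 5, 200),
                          rng.normal(110_000, 15_000, 200)])
X = np.vstack([non_buyers, buyers])
y = np.array([0] * 200 + [1] * 200)

# Assumed split: 400 customers, 25% held out, giving 100 test observations.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Scale the features, then train the Naive Bayes classifier.
sc = StandardScaler()
classifier = GaussianNB()
classifier.fit(sc.fit_transform(X_train), y_train)

# Prediction of a single result: age 30, estimated salary 87,000.
single_prediction = classifier.predict(sc.transform([[30, 87_000]]))

# Predictions for the whole test set, then confusion matrix and accuracy.
y_pred = classifier.predict(sc.transform(X_test))
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)

print(single_prediction)
print(cm)
print(accuracy)
```

The accuracy is the sum of the diagonal of the confusion matrix divided by the number of test observations. With 100 test observations, an accuracy of 93% corresponds to seven incorrect predictions, which is why spotting more than seven errors by eye already suggests the record will not be beaten.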
74 00:03:34,100 --> 00:03:38,633 Indeed, the accuracy we get with that Naive Bayes model is 90%, 75 00:03:38,633 --> 00:03:42,666 which indeed beats logistic regression, but does the same 76 00:03:42,666 --> 00:03:46,000 as the classic SVM model with a linear kernel. 77 00:03:46,666 --> 00:03:50,566 All right, but still, I think we will get nice visualization results. 78 00:03:50,633 --> 00:03:53,733 That's the code cell where we visualize the training set results. 79 00:03:53,733 --> 00:03:56,266 And well, this time we got the results pretty fast. 80 00:03:56,266 --> 00:03:59,100 You can see that the cell is already executed. 81 00:03:59,100 --> 00:04:02,700 So let's see. I can show you that indeed the Naive Bayes 82 00:04:02,700 --> 00:04:04,833 curve is pretty nice, right? 83 00:04:04,833 --> 00:04:07,800 It is a nice smooth curve, right? 84 00:04:07,800 --> 00:04:10,533 That catches well these 85 00:04:10,533 --> 00:04:13,500 green customers here, you know, the ones who in reality 86 00:04:13,500 --> 00:04:18,400 bought the SUV, in the right green region. But unfortunately, you know, it separated 87 00:04:18,400 --> 00:04:19,066 the two classes, 88 00:04:19,066 --> 00:04:19,233 you know, 89 00:04:19,233 --> 00:04:23,100 with these two prediction regions a bit loosely, you know, not very precisely. 90 00:04:23,100 --> 00:04:26,533 And that's why we don't get an accuracy that is higher than 93%. 91 00:04:26,766 --> 00:04:30,066 But still, you know, we made progress with respect to logistic regression, 92 00:04:30,066 --> 00:04:33,066 because indeed remember that for logistic regression 93 00:04:33,200 --> 00:04:36,700 these green customers here could not be well classified. 94 00:04:36,766 --> 00:04:38,300 Right. Because of the straight line, 95 00:04:38,300 --> 00:04:40,366 they fall in the red region.
96 00:04:40,366 --> 00:04:44,266 And here in our Naive Bayes implementation, well, these green customers 97 00:04:44,266 --> 00:04:47,200 fall in the right region. So at least it corrected that. 98 00:04:47,200 --> 00:04:51,666 But since here there is kind of a large margin, well, these green customers, 99 00:04:51,666 --> 00:04:55,633 which were correctly classified with the kernel SVM, 100 00:04:55,633 --> 00:04:58,933 if you remember, right, these are the training set results. 101 00:04:59,300 --> 00:05:01,266 Right. So I'm talking about these ones here. 102 00:05:01,266 --> 00:05:04,466 They are correctly classified, except these two and the third one. 103 00:05:04,466 --> 00:05:06,400 But clearly with Naive Bayes, 104 00:05:06,400 --> 00:05:09,000 well, they fall into the wrong region. Okay. 105 00:05:09,000 --> 00:05:13,566 But anyway, at least you see the prediction curve of the Naive Bayes model. 106 00:05:13,566 --> 00:05:18,166 And mostly you see that the Naive Bayes model is clearly a nonlinear classifier. 107 00:05:18,400 --> 00:05:22,800 And, you know, in some other situations, because Naive Bayes causes less 108 00:05:22,800 --> 00:05:27,300 overfitting, well, in some situations it will do better than your other models. 109 00:05:27,300 --> 00:05:30,033 That's why it's always very important to try all of them. 110 00:05:30,033 --> 00:05:33,333 And remember, at the end of this part, I will actually deploy 111 00:05:33,333 --> 00:05:37,100 all our models with new simplified code templates for each model, 112 00:05:37,100 --> 00:05:39,000 you know, without all the prints and everything, 113 00:05:39,000 --> 00:05:42,566 in order to deploy them in a flash, so that we can quickly figure out 114 00:05:42,733 --> 00:05:46,266 what is the best classification model for any data set, 115 00:05:46,266 --> 00:05:48,300 you know, regardless of the number of features.
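The idea of trying all the models and keeping the best one could look something like the sketch below. It is not the course's code template: the data set is generated with `make_moons` as a hypothetical stand-in, and the model settings (for example `n_neighbors=5`), the 25% test split and the dictionary names are assumptions made for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical two-feature data set standing in for the real one.
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Feature scaling, fitted on the training set only.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# The classification models compared in this part (assumed settings).
models = {
    "Logistic Regression": LogisticRegression(),
    "K-NN": KNeighborsClassifier(n_neighbors=5),
    "Linear SVM": SVC(kernel="linear"),
    "Kernel SVM": SVC(kernel="rbf"),
    "Naive Bayes": GaussianNB(),
}

# Train each model on the same split and record its test accuracy.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

best_model = max(accuracies, key=accuracies.get)
for name, acc in accuracies.items():
    print(f"{name}: {acc:.2f}")
print("Best:", best_model)
```

Which model comes out on top depends on the data set, which is the point of running all of them rather than assuming one is best.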