1 00:00:00,133 --> 00:00:00,966 Hello, my friends. 2 00:00:00,966 --> 00:00:03,233 Welcome to this new practical activity. 3 00:00:03,233 --> 00:00:06,733 This time, on the Naive Bayes classification model. 4 00:00:06,900 --> 00:00:12,300 We already implemented logistic regression, K-nearest neighbors, SVM and kernel SVM. 5 00:00:12,500 --> 00:00:15,866 And so far the best accuracy we've got with these four models 6 00:00:16,066 --> 00:00:19,433 is 0.93, which we got with both 7 00:00:19,566 --> 00:00:22,566 K-nearest neighbors and kernel SVM. 8 00:00:22,600 --> 00:00:25,900 And now we're about to implement a new classification model, 9 00:00:25,900 --> 00:00:28,966 the Naive Bayes one, a very well known and widely used one. 10 00:00:29,100 --> 00:00:31,466 So you definitely want to have it in your toolkit. 11 00:00:31,466 --> 00:00:34,366 And so the big question is: will we beat 12 00:00:34,366 --> 00:00:37,366 that best accuracy of 0.93? 13 00:00:37,466 --> 00:00:38,600 And now let's do this. 14 00:00:38,600 --> 00:00:42,700 Let's start the implementation of Naive Bayes. 15 00:00:42,700 --> 00:00:44,300 So we're going to go into this folder 16 00:00:44,300 --> 00:00:46,300 and we're going to start with Python as usual. 17 00:00:46,300 --> 00:00:49,166 And inside this folder you will find two files: 18 00:00:49,166 --> 00:00:52,166 the Naive Bayes implementation in the ipynb format 19 00:00:52,333 --> 00:00:57,700 and the same Social Network Ads dataset, which I will quickly re-explain. 20 00:00:57,700 --> 00:01:00,866 So this is a dataset containing 400 customers. 21 00:01:00,866 --> 00:01:04,366 Each row here represents a customer, and for each customer 22 00:01:04,400 --> 00:01:07,733 we get the two features, age and estimated salary, 23 00:01:07,833 --> 00:01:12,000 with which we are going to predict the dependent variable, purchased, 24 00:01:12,200 --> 00:01:16,500 which tells whether or not the customer bought the SUV. 
25 00:01:16,500 --> 00:01:19,733 So zero means that the customer didn't buy the SUV, 26 00:01:19,733 --> 00:01:22,733 and one means that the customer bought the SUV. 27 00:01:22,733 --> 00:01:25,933 All right, so we're going to train a new classification model to understand 28 00:01:26,100 --> 00:01:26,866 the correlations 29 00:01:26,866 --> 00:01:31,100 between these two features, age and salary, and that dependent variable, purchased, 30 00:01:31,200 --> 00:01:35,133 in order to predict the customers that will buy the SUV. 31 00:01:35,133 --> 00:01:38,066 And once we get these predictions, we will target these customers 32 00:01:38,066 --> 00:01:41,733 on social networks with some beautiful ad of this beautiful car. 33 00:01:42,066 --> 00:01:43,900 All right, so that's the same story. 34 00:01:43,900 --> 00:01:47,500 And now we're going to start our implementation. 35 00:01:48,000 --> 00:01:51,333 And I can't wait to see if we're going to beat the accuracy. 36 00:01:51,333 --> 00:01:55,333 And I also can't wait to show you the visualization results at the end. 37 00:01:55,333 --> 00:01:55,600 All right. 38 00:01:55,600 --> 00:01:58,566 So right now it is loading and laying out the notebook. 39 00:01:58,566 --> 00:01:59,400 And here it is. 40 00:01:59,400 --> 00:02:01,366 Here is the Naive Bayes implementation, 41 00:02:01,366 --> 00:02:04,500 still resulting from the same classification template 42 00:02:04,500 --> 00:02:07,033 which we made in the first section of this part three, 43 00:02:07,033 --> 00:02:09,066 when implementing logistic regression. 44 00:02:09,066 --> 00:02:11,933 So basically all the cells here are the same 45 00:02:11,933 --> 00:02:16,166 as in this logistic regression implementation or classification template. 46 00:02:16,333 --> 00:02:20,500 The only cell that will change will be, you know, the one where we build 47 00:02:20,700 --> 00:02:24,066 and train the classification model, meaning this one. 
48 00:02:24,333 --> 00:02:26,466 That's the only cell that we will 49 00:02:26,466 --> 00:02:29,600 re-implement ourselves because all the rest is the same. 50 00:02:29,600 --> 00:02:32,533 But this notebook is in read-only mode. 51 00:02:32,533 --> 00:02:36,000 And therefore, in order to re-implement that cell, we need to create 52 00:02:36,000 --> 00:02:40,800 a copy of this notebook by clicking Save a Copy in Drive. 53 00:02:40,800 --> 00:02:42,900 And as you can see, this will create a copy 54 00:02:42,900 --> 00:02:46,800 in which we will be able to re-implement that cell. 55 00:02:47,266 --> 00:02:47,666 All right. 56 00:02:47,666 --> 00:02:50,100 So once again, laying out the notebook, and there we go. 57 00:02:50,100 --> 00:02:54,866 Here is our copy, in which we are authorized to modify anything, 58 00:02:54,866 --> 00:02:58,433 and especially that cell we want to re-implement. 59 00:02:58,433 --> 00:03:00,733 So I'm scrolling down here to find it. 60 00:03:00,733 --> 00:03:04,500 Here it is: training the Naive Bayes model on the training set. 61 00:03:04,500 --> 00:03:06,200 So let's immediately 62 00:03:06,200 --> 00:03:10,233 remove that cell because I want you to re-implement it from scratch, 63 00:03:10,233 --> 00:03:13,166 as if we had no idea how to implement this. 64 00:03:13,166 --> 00:03:15,166 All right, let's create a new code cell. 65 00:03:15,166 --> 00:03:17,066 And now your turn. 66 00:03:17,066 --> 00:03:22,600 I want you to train your machine learning independence by figuring out by yourself 67 00:03:22,600 --> 00:03:27,200 how to indeed build and train that Naive Bayes model on the training set. 68 00:03:27,866 --> 00:03:30,000 So of course, to do this, you have several options. 69 00:03:30,000 --> 00:03:33,600 You can directly type in the search bar of Google or Bing, 70 00:03:33,866 --> 00:03:37,033 well, Naive Bayes scikit-learn class, okay. 71 00:03:37,033 --> 00:03:41,466 Or you can navigate the scikit-learn API, which is right here, 
72 00:03:41,700 --> 00:03:46,066 in order to find that class which we need to build our Naive Bayes model. 73 00:03:46,400 --> 00:03:50,100 And my personal recommendation is to try it with the second option, 74 00:03:50,100 --> 00:03:53,766 because indeed, this will get you familiar with this API. 75 00:03:53,766 --> 00:03:56,033 And the more you get familiar with it, the better. 76 00:03:56,033 --> 00:03:57,600 Okay, so please press pause. 77 00:03:57,600 --> 00:03:59,400 Now try to find this class 78 00:03:59,400 --> 00:04:02,866 and then try to build this Naive Bayes model and train it on the training set. 79 00:04:03,166 --> 00:04:07,000 And now in two seconds we will implement the solution together. 80 00:04:08,000 --> 00:04:09,500 All right, let's do this. 81 00:04:09,500 --> 00:04:12,500 Let's build this Naive Bayes model with scikit-learn. 82 00:04:12,866 --> 00:04:14,566 So the API is huge. 83 00:04:14,566 --> 00:04:17,366 And this is organized in alphabetical order. 84 00:04:17,366 --> 00:04:21,300 And so the first thing we'll try is to find, you know, a module called 85 00:04:21,300 --> 00:04:23,200 maybe Naive Bayes, right? 86 00:04:23,200 --> 00:04:24,600 Just the name of the model. 87 00:04:24,600 --> 00:04:28,533 So we will scroll down to N, you know, the letter N, 88 00:04:28,800 --> 00:04:33,600 and we'll see if we get Naive Bayes somewhere or something close to it. 89 00:04:33,600 --> 00:04:34,200 Let's see. 90 00:04:34,200 --> 00:04:38,566 Linear model, manifold learning, metrics, we're getting close. 91 00:04:38,566 --> 00:04:42,900 And Gaussian mixture models, model selection, multiclass. 92 00:04:42,900 --> 00:04:47,100 Well, there is a lot of M. Okay, perfect: Naive Bayes. 93 00:04:47,100 --> 00:04:49,200 So let's click this Naive Bayes. 94 00:04:49,200 --> 00:04:49,800 All right. 95 00:04:49,800 --> 00:04:53,233 And so the module indeed is naive_bayes. 96 00:04:53,500 --> 00:04:54,733 And among these models, 
97 00:04:54,733 --> 00:04:57,600 well, according to you, which one are we going to take here? 98 00:04:57,600 --> 00:04:59,033 That's a good question actually. 99 00:04:59,033 --> 00:05:03,100 Well, in fact the classic one, and the one that you learned in the intuition 100 00:05:03,100 --> 00:05:07,700 lecture, is of course this one, the Gaussian Naive Bayes, GaussianNB. 101 00:05:07,733 --> 00:05:13,000 And that's exactly what we'll use to implement our Naive Bayes model. 102 00:05:13,566 --> 00:05:13,833 All right. 103 00:05:13,833 --> 00:05:16,066 So the name of the class is GaussianNB. 104 00:05:16,066 --> 00:05:17,033 And this time 105 00:05:17,033 --> 00:05:18,900 the good news is that we won't have to worry 106 00:05:18,900 --> 00:05:23,566 too much about the parameters because there are actually only two parameters. 107 00:05:23,866 --> 00:05:26,000 And so here, very simply, 108 00:05:26,000 --> 00:05:30,200 we will just call this class without inputting any parameters. 109 00:05:30,366 --> 00:05:31,800 So that's super easy. 110 00:05:31,800 --> 00:05:37,266 Let's copy this and let's go back to our implementation, the copy one. 111 00:05:37,733 --> 00:05:39,533 Let's paste that here. 112 00:05:39,533 --> 00:05:41,700 Let's remove this little thing here. 113 00:05:41,700 --> 00:05:43,800 And now you know how to adapt this. 114 00:05:43,800 --> 00:05:45,833 We need to start with from, 115 00:05:45,833 --> 00:05:49,666 you know, the naive_bayes module of the scikit-learn library. 116 00:05:49,933 --> 00:05:52,933 And then here you add import. 117 00:05:53,400 --> 00:05:56,533 There we go, that GaussianNB class.
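
The import described above, followed by building and training the model with no parameters, might look roughly like the sketch below. It assumes scikit-learn and NumPy are installed. The course dataset is not loaded here: `X_train` and `y_train` are filled with hypothetical synthetic values that stand in for the feature-scaled age and estimated salary columns and the purchased labels. The variable name `classifier` is also an assumption.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for the training set: two well-separated groups
# of customers described by two scaled features (age, estimated salary).
rng = np.random.default_rng(0)
X_not_bought = rng.normal(loc=[-1.0, -1.0], scale=0.3, size=(50, 2))
X_bought = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(50, 2))
X_train = np.vstack([X_not_bought, X_bought])
y_train = np.array([0] * 50 + [1] * 50)  # 0 = did not buy, 1 = bought

# Build the model with its default parameters, then train it.
classifier = GaussianNB()
classifier.fit(X_train, y_train)

# Predict the class of one customer near the centre of each group.
predictions = classifier.predict(np.array([[-1.0, -1.0], [1.0, 1.0]]))
print(predictions)
```

In the notebook itself, `X_train` and `y_train` would come from the earlier template cells rather than from synthetic data.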