1 00:00:00,133 --> 00:00:02,366 Hello and welcome to this R tutorial. 2 00:00:02,366 --> 00:00:06,633 Today we are going to implement the Naive Bayes algorithm in R. 3 00:00:06,900 --> 00:00:07,800 So let's start right now. 4 00:00:07,800 --> 00:00:10,800 Let's first set the folder as working directory. 5 00:00:10,866 --> 00:00:14,166 So I'm going to Part 3, Classification. 6 00:00:14,600 --> 00:00:16,100 And that's the folder. 7 00:00:16,100 --> 00:00:18,800 So make sure you have the Social Network Ads CSV file. 8 00:00:18,800 --> 00:00:19,933 And if that's the case, 9 00:00:19,933 --> 00:00:23,400 click on this More button here to set this folder as working directory. 10 00:00:24,000 --> 00:00:27,766 And now let's start implementing Naive Bayes in R. 11 00:00:28,200 --> 00:00:30,733 So we're going to take our classification template. 12 00:00:30,733 --> 00:00:34,366 We're going to take everything from here to here. 13 00:00:36,766 --> 00:00:38,466 And then 14 00:00:38,466 --> 00:00:40,300 paste right here. 15 00:00:40,300 --> 00:00:43,033 And now we just need to change a few things. 16 00:00:43,033 --> 00:00:47,166 So first let's change the titles as usual so that we don't forget. 17 00:00:47,233 --> 00:00:50,400 So here we are with the Naive Bayes classifier. 18 00:00:51,500 --> 00:00:54,400 Same for the training set. 19 00:00:54,400 --> 00:00:57,133 So that is the title plotted in our graph. 20 00:00:57,133 --> 00:00:59,700 And now the only thing we need to do 21 00:00:59,700 --> 00:01:03,666 is to create our Naive Bayes classifier. 22 00:01:04,366 --> 00:01:06,000 So let's do this. 23 00:01:06,000 --> 00:01:08,333 Going to remove this. 24 00:01:08,333 --> 00:01:09,833 And now what are we going to do? 25 00:01:09,833 --> 00:01:10,633 Well, it's actually funny 26 00:01:10,633 --> 00:01:14,300 because we are going to use a library that we already used before.
27 00:01:14,400 --> 00:01:17,400 But for a different classifier: 28 00:01:17,400 --> 00:01:20,400 it's the e1071 library. 29 00:01:20,566 --> 00:01:23,200 So it's no surprise that this library is very popular, 30 00:01:23,200 --> 00:01:27,133 one of the most popular ones, because it contains a lot of tools, 31 00:01:27,133 --> 00:01:30,900 tools for SVM and tools for Naive Bayes. 32 00:01:31,366 --> 00:01:33,633 So here again we'll use this library. 33 00:01:33,633 --> 00:01:37,066 So for those of you who didn't follow the SVM tutorials on R, 34 00:01:37,300 --> 00:01:41,033 and for those of you who are starting R for the first time, maybe you don't have 35 00:01:41,033 --> 00:01:45,600 the e1071 package in your packages list. 36 00:01:45,900 --> 00:01:49,666 So if that's the case, you just need to type the command here: 37 00:01:49,666 --> 00:01:54,433 install.packages, and then in quotes here in the parentheses 38 00:01:54,733 --> 00:01:59,866 you just add the name of the library, which is e1071. 39 00:02:00,533 --> 00:02:04,533 And then you just need to select this and execute to install the package. 40 00:02:04,533 --> 00:02:07,533 I won't do it here because I already have mine installed. 41 00:02:07,600 --> 00:02:10,300 But I promise you it will work properly. 42 00:02:10,300 --> 00:02:13,300 So I'm just going to put that in a comment. 43 00:02:13,533 --> 00:02:14,133 Here we go. 44 00:02:14,133 --> 00:02:18,400 And now let's also import the library, that is, select it 45 00:02:18,666 --> 00:02:19,633 in the packages list. 46 00:02:19,633 --> 00:02:21,033 Because here it's selected. 47 00:02:21,033 --> 00:02:22,700 But it might not always be the case. 48 00:02:22,700 --> 00:02:25,933 And if you want to automate your machine learning scripts, it's better 49 00:02:25,933 --> 00:02:30,200 to always have this script line here that automatically selects your packages 50 00:02:30,733 --> 00:02:33,333 through library(e1071).
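The two script lines described in these cues can be sketched as follows. This is a minimal sketch that assumes the package is already installed; as in the video, the installation call is left as a comment and only needs to be run once.

```r
# Run once if e1071 is not yet in your packages list:
# install.packages('e1071')

# Load the package at the top of the script so that it does not depend on
# e1071 being ticked manually in the packages list.
library(e1071)
```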
51 00:02:33,333 --> 00:02:36,900 And now we are ready to start creating our Naive Bayes classifier. 52 00:02:37,666 --> 00:02:39,133 Okay, so let's do this. 53 00:02:39,133 --> 00:02:41,900 As usual, we will create a variable classifier. 54 00:02:44,466 --> 00:02:45,966 And then we will use the 55 00:02:45,966 --> 00:02:50,500 naiveBayes function of the e1071 library. 56 00:02:50,833 --> 00:02:53,400 The e1071 library contains a lot of functions, 57 00:02:53,400 --> 00:02:57,000 and one of them creates a Naive Bayes classifier. 58 00:02:57,700 --> 00:03:00,800 So okay, careful with the capitals. 59 00:03:00,800 --> 00:03:04,933 So it's not a capital N, but a capital B, all right. 60 00:03:04,933 --> 00:03:07,133 But then, you know, R is helping you with the term. 61 00:03:07,133 --> 00:03:10,500 So you would have this naiveBayes popping up here. 62 00:03:10,500 --> 00:03:12,666 So you just have to press Enter. 63 00:03:12,666 --> 00:03:14,733 And now let's input the parameters. 64 00:03:14,733 --> 00:03:16,333 So the parameters, what are they? 65 00:03:16,333 --> 00:03:17,400 Let's look at them. 66 00:03:17,400 --> 00:03:20,400 We press F1 here to have a look. 67 00:03:20,600 --> 00:03:22,233 And here they are. 68 00:03:22,233 --> 00:03:26,666 So we will actually need to input only the first two arguments. 69 00:03:26,900 --> 00:03:30,133 We won't need to input the formula like for the other ones. 70 00:03:30,133 --> 00:03:33,133 It will work perfectly well with only the first two. 71 00:03:33,566 --> 00:03:34,266 So you'll see. 72 00:03:34,266 --> 00:03:38,533 Let's input x here, which is, by the way, the training set. 73 00:03:39,300 --> 00:03:40,200 So the training set. 74 00:03:40,200 --> 00:03:41,933 But we need to remove 75 00:03:41,933 --> 00:03:45,533 the last column of our training set because x is actually the matrix of features. 76 00:03:45,533 --> 00:03:49,233 You see it's written here: a numeric matrix or data frame.
77 00:03:49,566 --> 00:03:53,833 And by that they mean the matrix of features, that is, the matrix 78 00:03:54,133 --> 00:03:55,700 of the independent variables. 79 00:03:55,700 --> 00:03:59,466 So here, since the training set contains both the independent variables 80 00:03:59,466 --> 00:04:03,033 and the dependent variable, we need to exclude the dependent variable. 81 00:04:03,333 --> 00:04:06,200 So to do this we're going to add brackets here 82 00:04:06,200 --> 00:04:10,466 and remove the last column by inputting here a minus 83 00:04:10,466 --> 00:04:14,100 and then the index of the last column, which is, well, we can see here, 84 00:04:14,100 --> 00:04:17,100 it's three. So it's minus three. 85 00:04:17,233 --> 00:04:19,366 All right. So perfect. 86 00:04:19,366 --> 00:04:22,966 Then comma, and then the next argument. 87 00:04:23,100 --> 00:04:26,100 So can you guess what the next argument is? 88 00:04:26,100 --> 00:04:29,500 Well, of course, to train a classifier we need the independent variables 89 00:04:29,500 --> 00:04:30,733 and the response. 90 00:04:30,733 --> 00:04:32,700 And the response is the dependent variable. 91 00:04:32,700 --> 00:04:35,700 So of course y is going to be the dependent variable. 92 00:04:36,000 --> 00:04:40,466 So y equals, and to take it, we are going to do it this way. 93 00:04:40,466 --> 00:04:42,600 We're going to input training_set 94 00:04:44,466 --> 00:04:47,333 $Purchased. 95 00:04:47,333 --> 00:04:50,866 So I'm choosing to write it this way because we can clearly see that 96 00:04:51,000 --> 00:04:54,000 we are taking the dependent variable Purchased.
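The call built up in these cues can be sketched as below. This is only an illustration under stated assumptions: the small data frame is a synthetic stand-in for the real training set, and the column names Age and EstimatedSalary are hypothetical. Only Purchased and its position as the third column come from the narration. Purchased is stored as a factor here, which is also an assumption of this sketch.

```r
library(e1071)

# Hypothetical stand-in for the training set: two feature columns followed by
# the dependent variable Purchased in column 3 (synthetic values, not real data).
set.seed(123)
training_set <- data.frame(
  Age = c(rnorm(20, mean = 30, sd = 5), rnorm(20, mean = 50, sd = 5)),
  EstimatedSalary = c(rnorm(20, mean = 40000, sd = 8000),
                      rnorm(20, mean = 90000, sd = 8000)),
  Purchased = factor(rep(c(0, 1), each = 20))  # assumption: encoded as a factor
)

# x: the features only (drop column 3); y: the dependent variable.
classifier <- naiveBayes(x = training_set[-3],
                         y = training_set$Purchased)

# Class counts seen during training.
print(classifier$apriori)
```

Writing `training_set[-3]` for x and `training_set$Purchased` for y keeps the split between features and response visible in the call itself.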