1 00:00:00,400 --> 00:00:00,866 All right. 2 00:00:00,866 --> 00:00:03,866 And that's all; you will see that it will work perfectly well. 3 00:00:03,900 --> 00:00:06,300 So I'll show you right now: 4 00:00:06,300 --> 00:00:09,833 let's select what's above, which is all the pre-processing steps. 5 00:00:10,166 --> 00:00:12,966 So command or control plus enter to execute. 6 00:00:12,966 --> 00:00:15,666 All right. Pre-processing: well done. 7 00:00:15,666 --> 00:00:17,066 We are good to go. 8 00:00:17,066 --> 00:00:19,833 And now we can create our classifier. 9 00:00:19,833 --> 00:00:21,900 So yes, let's do this. 10 00:00:21,900 --> 00:00:26,066 If your e1071 package is not selected, you need to select that as well. 11 00:00:26,533 --> 00:00:29,166 So command or control plus enter to execute. 12 00:00:29,166 --> 00:00:32,766 And here is our classifier. 13 00:00:33,266 --> 00:00:34,333 Awesome. 14 00:00:34,333 --> 00:00:36,633 So as you can see, it works perfectly well. 15 00:00:36,633 --> 00:00:41,366 The classifier is well created using only those two pieces of information. 16 00:00:41,366 --> 00:00:43,133 And when you think about it, 17 00:00:43,133 --> 00:00:46,000 well, this is all we need to train a classifier, 18 00:00:46,000 --> 00:00:49,166 because we need the information of the independent variables 19 00:00:49,366 --> 00:00:50,366 and the information 20 00:00:50,366 --> 00:00:54,333 of the dependent variable, so that it can learn the correlations 21 00:00:54,333 --> 00:00:57,333 between the independent variables and the dependent variable. 22 00:00:57,900 --> 00:00:59,933 All right. So that makes sense. 23 00:00:59,933 --> 00:01:02,000 And now we can create our vector of predictions, 24 00:01:02,000 --> 00:01:06,600 y_pred, using the predict function on our classifier and on our new data, 25 00:01:06,766 --> 00:01:08,133 which is the test set. 26 00:01:08,133 --> 00:01:09,766 So here we go. 27 00:01:09,766 --> 00:01:11,033 y_pred is created. 
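The two steps just described (building the classifier from the independent variables and the dependent variable, then calling predict on the test set) might look roughly like the sketch below. It assumes the e1071 package is installed. The small random data frames and the column names Age and EstimatedSalary are hypothetical stand-ins, not the dataset used in the video.

```r
library(e1071)

# Hypothetical stand-in data: two numeric features plus a 0/1 target
# stored as plain numbers in the third column.
set.seed(123)
training_set <- data.frame(
  Age = rnorm(20),
  EstimatedSalary = rnorm(20),
  Purchased = rep(c(0, 1), times = 10)
)
test_set <- data.frame(
  Age = rnorm(5),
  EstimatedSalary = rnorm(5),
  Purchased = c(0, 1, 0, 1, 0)
)

# x holds the independent variables, y the dependent variable.
classifier <- naiveBayes(x = training_set[-3], y = training_set$Purchased)

# Predict on the test set without its target column.
# What comes back depends on how Purchased is stored (numeric vs factor).
y_pred <- predict(classifier, newdata = test_set[-3])
```

Dropping the third column with `[-3]` keeps only the features, so the classifier never sees the target among its inputs.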
28 00:01:11,033 --> 00:01:13,500 Let's have a quick look: y_pred. 29 00:01:14,600 --> 00:01:17,533 So for the first time we are obtaining something here. 30 00:01:17,533 --> 00:01:20,466 Remember before, when we entered y_pred in the console, 31 00:01:20,466 --> 00:01:23,833 we had all the predictions listed in the console. 32 00:01:24,133 --> 00:01:26,300 But here we have this factor(0). 33 00:01:26,300 --> 00:01:27,600 Here. 34 00:01:27,600 --> 00:01:32,000 So that means that y_pred is a vector of factors but with no factor. 35 00:01:32,333 --> 00:01:34,566 So it's basically an empty vector. 36 00:01:34,566 --> 00:01:38,300 And that is because the naiveBayes function of the e1071 library 37 00:01:38,666 --> 00:01:42,000 didn't recognize our dependent variable vector purchased 38 00:01:42,266 --> 00:01:45,433 as a categorical variable with zero and one factors. 39 00:01:46,200 --> 00:01:49,100 So far, the libraries and functions we've been using 40 00:01:49,100 --> 00:01:51,900 recognized the dependent variable as factors. 41 00:01:51,900 --> 00:01:54,466 So we didn't have any issues with our predictions, 42 00:01:54,466 --> 00:01:58,500 and we didn't have to encode the dependent variable purchased as factors. 43 00:01:59,100 --> 00:02:01,633 But here with Naive Bayes it's not the case. 44 00:02:01,633 --> 00:02:04,933 It doesn't recognize the dependent variable purchased as factors. 45 00:02:05,233 --> 00:02:08,166 So we need to encode it as factors. 46 00:02:08,166 --> 00:02:11,133 Because as you can see, if we try to compute the confusion 47 00:02:11,133 --> 00:02:14,300 matrix below, we will get the following error message: 48 00:02:14,933 --> 00:02:17,366 All arguments must have the same length. 49 00:02:17,366 --> 00:02:22,366 Well yes, indeed we have a problem, because the two arguments here are test_set[, 3], 50 00:02:22,700 --> 00:02:26,933 that is the third column purchased of the test set, and then y_pred. 
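The length mismatch behind that error can be reproduced in base R alone. This is a minimal sketch under stated assumptions: `purchased` is a hypothetical stand-in for the third column of a 100-row test set, and `y_pred` is built by hand as an empty factor rather than returned by a real classifier.

```r
# Hypothetical stand-in for the 100 values of the test set's target column.
set.seed(42)
purchased <- sample(c(0, 1), size = 100, replace = TRUE)

# Hand-built empty factor, standing in for the empty prediction vector.
y_pred <- factor(character(0))
print(y_pred)

# table() stops when its arguments differ in length (100 vs 0);
# tryCatch captures the error message instead of halting the script.
cm_error <- tryCatch(
  table(purchased, y_pred),
  error = function(e) conditionMessage(e)
)
print(cm_error)

# The fix named above: encode the target as a factor with levels 0 and 1.
purchased_factor <- factor(purchased, levels = c(0, 1))
print(levels(purchased_factor))
```

Once the target is a factor, its levels are the character labels "0" and "1" rather than numbers, which is what lets a classifier treat it as categorical.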
51 00:02:27,666 --> 00:02:31,966 So since test_set[, 3] has length 100 and y_pred has length 52 00:02:31,966 --> 00:02:34,966 zero, because it's an empty vector with no factors, 53 00:02:35,133 --> 00:02:38,133 then obviously we cannot compute any confusion matrix. 54 00:02:38,666 --> 00:02:41,933 So what we need to do is to jump back up to the pre-processing step 55 00:02:41,933 --> 00:02:46,433 first, to encode our dependent variable purchased as factors. 56 00:02:47,133 --> 00:02:50,400 Then the naiveBayes function will recognize the dependent variable 57 00:02:50,400 --> 00:02:55,000 as factors, and will be perfectly able to create a classifier 58 00:02:55,000 --> 00:02:58,966 that will allow the predict function to return the expected vector 59 00:02:58,966 --> 00:03:00,466 of predictions, y_pred. 60 00:03:00,466 --> 00:03:04,433 So it's great that you see this error, because this is a classic error in machine 61 00:03:04,433 --> 00:03:08,700 learning in R, and that way, from now on, you won't make this error in the future. 62 00:03:08,700 --> 00:03:11,000 Or if you make it, you'll know how to fix it. 63 00:03:12,833 --> 00:03:13,966 So let's do it right now. 64 00:03:13,966 --> 00:03:15,700 Let's add a new code section here. 65 00:03:15,700 --> 00:03:18,700 And let's call it Encoding 66 00:03:19,100 --> 00:03:22,100 the target feature 67 00:03:22,600 --> 00:03:25,133 as factor. 68 00:03:25,133 --> 00:03:25,733 All right. 69 00:03:25,733 --> 00:03:30,500 And now let's factorize our column purchased. Okay. 70 00:03:30,500 --> 00:03:31,366 Let's do this. 71 00:03:31,366 --> 00:03:33,966 So we will take it from the start. 72 00:03:33,966 --> 00:03:36,233 That means that we will take the purchased 73 00:03:36,233 --> 00:03:38,900 dependent variable column of the dataset. 74 00:03:38,900 --> 00:03:41,733 And then we will recompute all of this to set 75 00:03:41,733 --> 00:03:44,766 this purchased column everywhere, for all our sets. 
76 00:03:44,766 --> 00:03:47,500 That is, the training set and the test set. Okay. 77 00:03:47,500 --> 00:03:50,333 So we're going to take our last column from our dataset. 78 00:03:50,333 --> 00:03:54,466 So we're going to type dataset$Purchased 79 00:03:55,766 --> 00:03:58,066 equals 80 00:03:58,066 --> 00:04:01,500 factor, and in parentheses dataset. 81 00:04:01,666 --> 00:04:04,966 And then again we're taking the purchased column, 82 00:04:04,966 --> 00:04:07,033 that is the dependent variable column of our dataset. 83 00:04:07,033 --> 00:04:11,400 So again we are taking the dollar sign here and purchased to take this column. 84 00:04:12,333 --> 00:04:13,300 And then here 85 00:04:13,300 --> 00:04:16,733 we will just specify the levels; our levels are 86 00:04:17,100 --> 00:04:20,100 c(0, 1). 87 00:04:20,500 --> 00:04:21,600 All right. 88 00:04:21,600 --> 00:04:22,300 And that's it. 89 00:04:22,300 --> 00:04:27,033 That's how you encode your dependent variable column purchased into factors. 90 00:04:27,333 --> 00:04:29,700 All right. So we're going to start from scratch now. 91 00:04:29,700 --> 00:04:31,200 So to do this we can clear everything. 92 00:04:31,200 --> 00:04:33,966 So that's what we're going to do to everything here. 93 00:04:33,966 --> 00:04:37,433 So we're cleaning the data, clearing the script here. 94 00:04:37,433 --> 00:04:40,433 So I'm pressing control L to clear. 95 00:04:40,700 --> 00:04:43,700 And that's.