1 00:00:00,066 --> 00:00:00,800 Hello my friends. 2 00:00:00,800 --> 00:00:03,100 Congratulations again for having implemented 3 00:00:03,100 --> 00:00:06,100 your first classification model, logistic regression. 4 00:00:06,300 --> 00:00:09,133 And now, before we move on to the next model, 5 00:00:09,133 --> 00:00:12,966 I would like to make a classification template that will allow us 6 00:00:12,966 --> 00:00:17,800 to build the next classification models of this part much more efficiently. 7 00:00:18,100 --> 00:00:20,066 And not only that, it is a template 8 00:00:20,066 --> 00:00:23,933 that you will also be able to use for your personal data set. 9 00:00:24,100 --> 00:00:24,733 All right. 10 00:00:24,733 --> 00:00:25,666 So let's do this. 11 00:00:25,666 --> 00:00:29,600 At the top of RStudio we're going to click File and then click New File. 12 00:00:29,600 --> 00:00:31,800 And then R Script. All right. 13 00:00:31,800 --> 00:00:36,566 And we're going to save this as classification template. 14 00:00:36,866 --> 00:00:41,266 All right, then we click Save and you will see it appear in your files here. 15 00:00:41,300 --> 00:00:43,300 That's the new file we just created. 16 00:00:43,300 --> 00:00:46,800 All right, then we're going to take our logistic regression implementation 17 00:00:46,800 --> 00:00:47,200 again. 18 00:00:47,200 --> 00:00:48,733 We're going to select everything, 19 00:00:48,733 --> 00:00:53,166 copy it and then paste it in our classification template. 20 00:00:53,166 --> 00:00:55,033 And then let's change the title. 21 00:00:55,033 --> 00:00:58,766 We're going to replace this by classification template. 22 00:00:59,300 --> 00:01:02,100 And then in the data preprocessing sections here we're going 23 00:01:02,100 --> 00:01:05,400 to keep everything as is except one thing. 
24 00:01:05,700 --> 00:01:09,966 We're going to add one line of code that will make sure the classes of 25 00:01:09,966 --> 00:01:14,300 our dependent variable are indeed classes and, you know, not continuous numbers. 26 00:01:14,500 --> 00:01:17,700 So for most of the models, like logistic regression, K-NN, 27 00:01:17,733 --> 00:01:22,800 naive Bayes, SVM and kernel SVM, the package that we'll use to build 28 00:01:22,800 --> 00:01:26,433 those models will recognize the classes of our data set as classes. 29 00:01:26,633 --> 00:01:30,033 But there will be two models that we'll build, which are decision tree 30 00:01:30,033 --> 00:01:33,566 and random forest, that will not recognize them as classes. 31 00:01:33,833 --> 00:01:37,600 And therefore, to make this template applicable to all the models 32 00:01:37,600 --> 00:01:41,366 we're going to build in this Part 3, we just need to add this little code 33 00:01:41,366 --> 00:01:48,133 section here that I'm going to call: Encoding the target feature as factor. 34 00:01:48,300 --> 00:01:51,600 And that will make sure that the Purchased 35 00:01:51,900 --> 00:01:54,800 dependent variable of our data 36 00:01:54,800 --> 00:01:57,800 set will be encoded as factors. 37 00:01:57,800 --> 00:02:02,300 And to do so we use the factor function, inside which we enter 38 00:02:02,300 --> 00:02:06,266 first the dependent variable column, which we access from our data set. 39 00:02:06,400 --> 00:02:07,800 Then adding a dollar sign. 40 00:02:07,800 --> 00:02:10,033 And then there we go, Purchased. 41 00:02:10,033 --> 00:02:12,400 So that's the first argument of this factor function. 42 00:02:12,400 --> 00:02:16,500 And then we just need to specify the two classes here with this argument 43 00:02:16,500 --> 00:02:20,466 levels, which will be equal to our two classes zero and one, 44 00:02:20,466 --> 00:02:25,800 which we have to enter in this c function, which will take as input zero and one. 
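The factor encoding described above can be sketched as follows. This is a minimal illustration, not the course's actual script: the small data frame is a hypothetical stand-in for the real data set, and the `Age` and `EstimatedSalary` column names are assumptions. Only the `Purchased` column and the `factor(..., levels = c(0, 1))` call come from the walkthrough.

```r
# Hypothetical stand-in for the course data set (values are made up).
# Only the Purchased column name is taken from the walkthrough.
dataset <- data.frame(
  Age = c(19, 35, 26, 47, 32),
  EstimatedSalary = c(19000, 20000, 43000, 25000, 150000),
  Purchased = c(0, 0, 0, 1, 1)
)

# Encoding the target feature as factor:
# the numeric 0/1 column becomes a categorical variable with two levels.
dataset$Purchased <- factor(dataset$Purchased, levels = c(0, 1))

# Inspect the result: the column is now a factor, not a numeric vector.
print(is.factor(dataset$Purchased))
print(levels(dataset$Purchased))
```

After this step the column holds class labels rather than numbers, which is what lets a model treat the problem as classification.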
45 00:02:26,133 --> 00:02:28,300 All right. So let me recap this. 46 00:02:28,300 --> 00:02:32,466 This line of code is optional for the logistic regression model, 47 00:02:32,466 --> 00:02:35,033 the k-nearest neighbors model, which we'll implement next, 48 00:02:35,033 --> 00:02:38,833 then the SVM model, the kernel SVM model and the naive Bayes model. 49 00:02:39,100 --> 00:02:42,733 And it is compulsory for the decision tree classification model 50 00:02:42,733 --> 00:02:45,166 and for the random forest classification model. 51 00:02:45,166 --> 00:02:45,733 All right. 52 00:02:45,733 --> 00:02:48,900 So I just want to include it here so that we can apply this template 53 00:02:49,133 --> 00:02:51,600 to all the classification models of this Part 3. 54 00:02:51,600 --> 00:02:53,866 And for that let's make the other required changes. 55 00:02:53,866 --> 00:02:55,433 So here, when splitting the data 56 00:02:55,433 --> 00:02:58,566 set into the training set and the test set, we don't have to change anything. 57 00:02:58,800 --> 00:03:00,900 Same, we can keep feature scaling. 58 00:03:00,900 --> 00:03:02,766 And then that's where we will change some things. 59 00:03:02,766 --> 00:03:04,933 So first we will remove all this, 60 00:03:04,933 --> 00:03:07,300 because that's where we'll build our next classifier. 61 00:03:07,300 --> 00:03:12,800 And so we can add a comment saying: create your classifier here. Okay. 62 00:03:13,000 --> 00:03:16,766 Then we can replace logistic regression here by classifier. 63 00:03:17,333 --> 00:03:17,666 All right. 64 00:03:17,666 --> 00:03:20,533 To make it generic. Then here, very important: 65 00:03:20,533 --> 00:03:23,866 the logistic regression model is the only model of this Part 3 66 00:03:23,866 --> 00:03:27,500 that returns first its predictions in the form of probabilities. 67 00:03:27,666 --> 00:03:30,866 All the other models will return directly 0 or 1. 68 00:03:31,133 --> 00:03:33,600 And therefore we're going to remove this line. 
69 00:03:33,600 --> 00:03:37,633 We're going to replace here prob_pred by y_pred. 70 00:03:37,933 --> 00:03:41,000 And here in the predict function we're going to keep classifier, 71 00:03:41,000 --> 00:03:44,400 of course. We're going to remove type = 'response'. 72 00:03:44,700 --> 00:03:47,766 And we're going to keep newdata = test_set[-3]. 73 00:03:48,000 --> 00:03:48,400 All right. 74 00:03:48,400 --> 00:03:49,466 And this line of code 75 00:03:49,466 --> 00:03:53,233 will be applicable to all the remaining classification models of Part 3. 76 00:03:53,333 --> 00:03:56,133 We won't have to change anything here. Okay. 77 00:03:56,133 --> 00:03:58,033 And then same for the confusion matrix. 78 00:03:58,033 --> 00:04:02,433 This is only for the logistic regression model because it returns probabilities. 79 00:04:02,633 --> 00:04:05,933 And so to adapt this to a model that returns only zero or one, 80 00:04:05,933 --> 00:04:09,100 well, of course we need to remove "larger than 0.5". 81 00:04:09,133 --> 00:04:09,566 Right. 82 00:04:09,566 --> 00:04:13,933 Because the y_pred of our future classification models will only contain 83 00:04:13,933 --> 00:04:17,966 zero or one, exactly as the dependent variable column of the test set. 84 00:04:18,566 --> 00:04:19,300 Okay, great. 85 00:04:19,300 --> 00:04:23,300 And then finally, in the last two sections here, visualizing the training set 86 00:04:23,300 --> 00:04:24,500 and test set results. 87 00:04:24,500 --> 00:04:24,900 Same. 88 00:04:24,900 --> 00:04:27,666 We're going to remove the probabilities here. 89 00:04:27,666 --> 00:04:29,533 Then cut this y_grid. 90 00:04:29,533 --> 00:04:33,300 And then replace this prob_set variable by y_grid. 91 00:04:33,633 --> 00:04:37,733 And then remove the type = 'response' parameter so that 92 00:04:37,733 --> 00:04:39,200 now we don't have our predictions 93 00:04:39,200 --> 00:04:42,200 returned in the form of probabilities but zeros and ones. 
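The generic prediction and confusion matrix steps described above can be sketched like this. Everything other than the names `classifier`, `y_pred`, `test_set` and the `predict(classifier, newdata = test_set[-3])` call is an assumption: the data are made up, and `fit_toy` with its `predict.toy_classifier` method is a hypothetical placeholder standing in for "create your classifier here". It returns class labels directly, as the walkthrough says the later models do.

```r
# Made-up training and test sets; column 3 is the dependent variable.
training_set <- data.frame(
  Age = c(20, 25, 30, 45, 50, 55),
  EstimatedSalary = c(20000, 30000, 40000, 60000, 80000, 90000),
  Purchased = factor(c(0, 0, 0, 1, 1, 1), levels = c(0, 1))
)
test_set <- data.frame(
  Age = c(22, 40, 48, 35),
  EstimatedSalary = c(25000, 50000, 70000, 85000),
  Purchased = factor(c(0, 0, 1, 1), levels = c(0, 1))
)

# Hypothetical placeholder classifier: predicts class 1 when Age is
# above the mean Age of the training set. Not a real library model.
fit_toy <- function(training_set) {
  structure(list(cutoff = mean(training_set$Age)), class = "toy_classifier")
}
predict.toy_classifier <- function(object, newdata, ...) {
  factor(as.integer(newdata$Age > object$cutoff), levels = c(0, 1))
}

# Create your classifier here
classifier <- fit_toy(training_set)

# Predicting the test set results: no type = 'response', no probabilities.
y_pred <- predict(classifier, newdata = test_set[-3])

# Making the confusion matrix: y_pred already holds 0/1 labels,
# so no "> 0.5" threshold is needed.
cm <- table(test_set[, 3], y_pred)
print(cm)
```

Swapping in a real model only changes the `classifier <- ...` line, provided its `predict` method returns class labels; whether a given package needs an extra argument to do so is something to check in that package's documentation.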
94 00:04:42,500 --> 00:04:45,966 And then of course we're going to make it generic again by replacing logistic 95 00:04:45,966 --> 00:04:49,033 regression here by classifier. Okay. 96 00:04:49,033 --> 00:04:49,600 Excellent. 97 00:04:49,600 --> 00:04:53,433 So now let's copy this, because we're going to do the same down there 98 00:04:53,566 --> 00:04:56,533 for the visualization of the test set results. 99 00:04:56,533 --> 00:05:00,900 So I'm going to replace these two lines by what we just copied. 100 00:05:00,900 --> 00:05:01,600 There we go. 101 00:05:01,600 --> 00:05:05,500 And then replace logistic regression again by classifier. 102 00:05:05,700 --> 00:05:09,633 And now, congratulations, we have a generic template 103 00:05:09,633 --> 00:05:12,833 that we will be able to use to build very efficiently 104 00:05:12,833 --> 00:05:15,600 our next classification models of Part 3. 105 00:05:15,600 --> 00:05:16,933 So let's save this. 106 00:05:16,933 --> 00:05:19,900 And now I look forward to building those next models with you. 107 00:05:19,900 --> 00:05:22,133 And until then, enjoy machine learning.
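The change to the visualization sections can be sketched as below: the grid predictions go straight into `y_grid`, with no intermediate probabilities and no `type = 'response'`. This is an illustration under assumptions, not the course script. The data, the grid step sizes, the `Age` and `EstimatedSalary` column names, and the `fit_toy` placeholder classifier are all hypothetical, and the plotting calls are left out so the sketch stays short.

```r
# Made-up training set; columns 1 and 2 are the features.
training_set <- data.frame(
  Age = c(20, 25, 30, 45, 50, 55),
  EstimatedSalary = c(20000, 30000, 40000, 60000, 80000, 90000),
  Purchased = factor(c(0, 0, 0, 1, 1, 1), levels = c(0, 1))
)

# Hypothetical placeholder classifier that returns class labels directly.
fit_toy <- function(training_set) {
  structure(list(cutoff = mean(training_set$Age)), class = "toy_classifier")
}
predict.toy_classifier <- function(object, newdata, ...) {
  factor(as.integer(newdata$Age > object$cutoff), levels = c(0, 1))
}
classifier <- fit_toy(training_set)

# Build a grid covering the feature space (step sizes are arbitrary here).
set <- training_set
X1 <- seq(min(set[, 1]) - 1, max(set[, 1]) + 1, by = 1)
X2 <- seq(min(set[, 2]) - 1, max(set[, 2]) + 1, by = 10000)
grid_set <- expand.grid(X1, X2)
colnames(grid_set) <- c("Age", "EstimatedSalary")

# Generic version: predictions are stored in y_grid directly.
y_grid <- predict(classifier, newdata = grid_set)

# Reshape to one row per X1 value and one column per X2 value,
# the layout a contour-style plot of the decision regions expects.
region <- matrix(as.numeric(y_grid), length(X1), length(X2))
print(dim(region))
```

The same block, with `set <- test_set`, would serve the test set section, which is why the walkthrough copies it from one section to the other.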