1 00:00:00,166 --> 00:00:01,800 Hello my friends, and welcome 2 00:00:01,800 --> 00:00:06,366 to this new practical activity for Decision Tree Classification. 3 00:00:06,766 --> 00:00:10,166 And now please follow me into part three, classification, 4 00:00:10,200 --> 00:00:14,933 to implement, this time, the decision tree classification model. 5 00:00:15,366 --> 00:00:17,733 And as usual we're going to start with Python. 6 00:00:17,733 --> 00:00:20,033 This Python folder contains two files: 7 00:00:20,033 --> 00:00:24,733 the implementation of the decision tree classification model in ipynb format, 8 00:00:25,000 --> 00:00:28,566 and the same data set, Social_Network_Ads.csv, 9 00:00:28,800 --> 00:00:33,000 which contains information on 400 customers of a car dealership 10 00:00:33,233 --> 00:00:36,233 which just released a brand new luxury SUV. 11 00:00:36,300 --> 00:00:39,233 And the strategic team gathered this data to understand 12 00:00:39,233 --> 00:00:42,300 which customers buy the SUV the most, in order 13 00:00:42,300 --> 00:00:46,200 to target them with ads that will be posted on social networks. 14 00:00:46,400 --> 00:00:49,566 So each of these rows corresponds to a different customer, 15 00:00:49,766 --> 00:00:51,166 and for each of these customers, 16 00:00:51,166 --> 00:00:54,466 we have two features, the age and the estimated salary, with 17 00:00:54,466 --> 00:00:58,666 which we're going to predict the dependent variable purchased, which takes 18 00:00:58,666 --> 00:01:02,766 binary values: zero, meaning that the customer didn't buy the SUV, 19 00:01:03,066 --> 00:01:06,766 and one, meaning that the customer bought the SUV. 20 00:01:07,300 --> 00:01:08,700 All right, let's do this. 21 00:01:08,700 --> 00:01:12,000 Let's see if decision tree classification can beat 22 00:01:12,133 --> 00:01:15,566 the previous accuracy we got so far, meaning 93%. 23 00:01:15,900 --> 00:01:19,533 So without further ado, let's open that with Google Colab.
24 00:01:19,700 --> 00:01:20,966 Choose your favorite. 25 00:01:20,966 --> 00:01:25,100 And I'm going to put it here because this is our next classification model. 26 00:01:25,600 --> 00:01:26,633 All right. Perfect. 27 00:01:26,633 --> 00:01:31,133 So once again this implementation results from that classification template 28 00:01:31,133 --> 00:01:33,233 we made in the first section of this part three 29 00:01:33,233 --> 00:01:36,233 when we implemented the logistic regression model. 30 00:01:36,300 --> 00:01:40,366 And therefore the only cell that changes here in this implementation 31 00:01:40,366 --> 00:01:46,400 is, as usual, that cell where we build and train the classification model. 32 00:01:46,600 --> 00:01:47,266 Right. 33 00:01:47,266 --> 00:01:52,133 This cell here. And all the rest, you know, all the other cells are exactly the same 34 00:01:52,133 --> 00:01:55,366 as in the logistic regression code template 35 00:01:55,366 --> 00:01:57,433 or, you know, the classification code template, 36 00:01:57,433 --> 00:02:00,166 because indeed we use the same variable names: classifier, 37 00:02:00,166 --> 00:02:03,800 and then X_train, X_test. Everything is the same except this cell. 38 00:02:04,066 --> 00:02:07,033 So once again we're going to re-implement this cell. 39 00:02:07,033 --> 00:02:11,700 But to do this we need to create a copy because this is in read-only mode. 40 00:02:11,966 --> 00:02:14,966 So right now we're going to create that copy 41 00:02:15,000 --> 00:02:18,100 on which we will be able to re-implement that cell. 42 00:02:18,300 --> 00:02:18,966 All right. 43 00:02:18,966 --> 00:02:20,966 So let's scroll down again 44 00:02:20,966 --> 00:02:25,000 to find that cell where we build and train the decision tree classification model.
45 00:02:25,166 --> 00:02:29,700 Now let's put it in the trash immediately because I don't want you to see, 46 00:02:29,700 --> 00:02:33,533 you know, the name of the class, because indeed I want you to find it on your own. 47 00:02:33,733 --> 00:02:35,166 So there you go. 48 00:02:35,166 --> 00:02:40,300 Now is the time you press pause on this video to find the right class 49 00:02:40,300 --> 00:02:44,333 name that allows us to build the decision tree classification model. 50 00:02:44,633 --> 00:02:48,200 You will have to either look for it directly through an online search, 51 00:02:48,200 --> 00:02:52,900 or the other option is to look for it inside the scikit-learn API. 52 00:02:52,900 --> 00:02:55,900 And this is exactly what we're going to do together. 53 00:02:56,066 --> 00:02:58,066 And all right, let's do this. 54 00:02:58,066 --> 00:03:02,100 So let's go to the scikit-learn API to find the class 55 00:03:02,100 --> 00:03:05,766 that allows us to build a decision tree classification model. 56 00:03:06,066 --> 00:03:06,433 All right. 57 00:03:06,433 --> 00:03:08,000 So API, 58 00:03:08,000 --> 00:03:13,466 and by memory, you know, let's suppose actually that I have no idea of the module 59 00:03:13,466 --> 00:03:17,033 or even the class that builds this decision tree classification model. 60 00:03:17,166 --> 00:03:21,000 So here I'm scrolling down, you know, to observe the different modules. 61 00:03:21,000 --> 00:03:24,100 Maybe it is an ensemble method because, you know, I know that 62 00:03:24,100 --> 00:03:27,833 random forest, we built, you know, the Random Forest Regressor in part two. 63 00:03:28,000 --> 00:03:30,233 Random forest is an ensemble method. 64 00:03:30,233 --> 00:03:32,333 But no, here we don't see the decision tree. 65 00:03:32,333 --> 00:03:35,333 So let's scroll back down again. 66 00:03:35,400 --> 00:03:36,333 Which makes sense, right? 67 00:03:36,333 --> 00:03:38,333 Because the decision tree is just a single model.
68 00:03:38,333 --> 00:03:40,800 It is not an ensemble of models. 69 00:03:40,800 --> 00:03:43,800 All right. So manifold learning, metrics. No, no. 70 00:03:43,966 --> 00:03:44,866 You know, it's really important 71 00:03:44,866 --> 00:03:47,866 that you get familiar with this API because the more you get familiar, 72 00:03:48,000 --> 00:03:50,933 the more expert you'll become and the better you'll juggle 73 00:03:50,933 --> 00:03:51,833 with all the machine 74 00:03:51,833 --> 00:03:55,100 learning tools, you know, besides the ones that I'm giving you in this course. 75 00:03:55,600 --> 00:03:57,666 All right, so here I'm scrolling down more. 76 00:03:57,666 --> 00:03:57,966 All right. 77 00:03:57,966 --> 00:04:01,200 Preprocessing, random projection, semi-supervised 78 00:04:01,200 --> 00:04:04,266 learning, support vector machines, decision trees. 79 00:04:04,266 --> 00:04:05,433 There we go. 80 00:04:05,433 --> 00:04:08,500 You know, usually the module you're looking for in the scikit-learn 81 00:04:08,533 --> 00:04:11,833 API will have the same name here as the model you want to build. 82 00:04:12,033 --> 00:04:12,500 Right. 83 00:04:12,500 --> 00:04:15,266 And you know, this is organized in alphabetical order. 84 00:04:15,266 --> 00:04:18,000 So that's why I wanted you to do this exercise. 85 00:04:18,000 --> 00:04:20,966 It's actually very easy to find what you want. 86 00:04:20,966 --> 00:04:22,100 All right, decision trees. 87 00:04:22,100 --> 00:04:24,700 Now which one do we want to get among these? 88 00:04:24,700 --> 00:04:26,266 Well that's of course the first one. 89 00:04:26,266 --> 00:04:28,866 You know, that was the one we used in part two, regression. 90 00:04:28,866 --> 00:04:30,466 But now we're doing classification. 91 00:04:30,466 --> 00:04:33,633 So we want the decision tree classifier class. 92 00:04:33,700 --> 00:04:34,233 All right. 93 00:04:34,233 --> 00:04:36,366 So now what do we have to do first?
94 00:04:36,366 --> 00:04:38,700 Well, let's do, you know, what we have to do 95 00:04:38,700 --> 00:04:41,433 anyway, which is to import that class. 96 00:04:41,433 --> 00:04:42,966 And so I'm copying this. 97 00:04:42,966 --> 00:04:46,733 And then back in my copy we're going to create a new code cell. 98 00:04:46,966 --> 00:04:48,233 We're going to paste that. 99 00:04:48,233 --> 00:04:51,766 And once again we're going to start from scikit-learn. 100 00:04:51,766 --> 00:04:54,500 And then the tree module of scikit-learn. 101 00:04:54,500 --> 00:04:57,500 And from that tree module we're going to import 102 00:04:57,600 --> 00:05:00,233 the decision tree classifier. 103 00:05:00,233 --> 00:05:02,066 So that's the first step, right? 104 00:05:02,066 --> 00:05:03,533 You know it by heart now. 105 00:05:03,533 --> 00:05:06,166 And now the second step, I don't even have to tell you. 106 00:05:06,166 --> 00:05:09,533 It is of course to create our classifier object, 107 00:05:09,900 --> 00:05:13,266 which will be created as an instance of this decision 108 00:05:13,266 --> 00:05:16,566 tree classifier class and which will represent nothing else 109 00:05:16,566 --> 00:05:19,566 than the decision tree classifier model. 110 00:05:19,933 --> 00:05:22,366 All right. So here we add some parentheses. 111 00:05:22,366 --> 00:05:27,600 And now the second question is what do we have to input here as parameters, 112 00:05:27,600 --> 00:05:29,600 you know, of this decision tree classifier class. 113 00:05:29,600 --> 00:05:30,600 So let's see. 114 00:05:30,600 --> 00:05:33,666 Let's go back to the documentation in the scikit-learn API. 115 00:05:34,000 --> 00:05:37,566 And well, we will just change one, you know, default 116 00:05:37,566 --> 00:05:41,066 value of the parameters, which is this one: criterion. 117 00:05:41,333 --> 00:05:44,600 So the default value of criterion is Gini.
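The two steps described so far, importing the class from the tree module and creating the classifier object, might look like the sketch below. It assumes scikit-learn is installed; the variable name `classifier` follows the template mentioned earlier, while `default_criterion` is only an illustrative name used here to inspect the default value.

```python
# Step 1: import the class from scikit-learn's tree module.
from sklearn.tree import DecisionTreeClassifier

# Step 2: create the classifier object with no arguments,
# so every parameter keeps its default value.
classifier = DecisionTreeClassifier()

# Inspect the default split criterion (illustrative variable name).
default_criterion = classifier.get_params()["criterion"]
print(default_criterion)
```

Creating the object with empty parentheses is what leaves criterion at its default, which is why the next step is to decide which parameters to change.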
118 00:05:45,200 --> 00:05:48,466 But with respect to what you learned in the theory with Kirill's 119 00:05:48,466 --> 00:05:52,300 intuition lecture, we're going to choose the entropy criterion. 120 00:05:52,300 --> 00:05:56,100 So the criterion is of course the function to measure the quality of a split. 121 00:05:56,366 --> 00:05:58,966 And that quality is measured by entropy.
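To make "quality of a split measured by entropy" concrete, here is a minimal pure-Python sketch. It is an illustration under assumptions, not scikit-learn's internal implementation: the helper names `entropy` and `split_entropy` and the toy labels are hypothetical. In scikit-learn itself, this measure is selected by passing `criterion="entropy"` to `DecisionTreeClassifier`.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split_entropy(left, right):
    """Size-weighted average entropy of the two child nodes of a split."""
    n = len(left) + len(right)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)


# Hypothetical toy labels (0 = did not buy, 1 = bought), not the course data.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
pure_split = split_entropy([0, 0, 0, 0], [1, 1, 1, 1])   # classes fully separated
mixed_split = split_entropy([0, 0, 1, 1], [0, 0, 1, 1])  # no separation at all

# Information gain: how much a split reduces the parent's entropy.
gain_pure = entropy(parent) - pure_split
gain_mixed = entropy(parent) - mixed_split
print(gain_pure, gain_mixed)
```

A split that separates the two classes completely brings the weighted entropy down to zero, while a split that leaves both children as mixed as the parent gains nothing; the tree prefers splits with the larger gain.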