1 00:00:00,100 --> 00:00:01,833 Hello my friends, and welcome 2 00:00:01,833 --> 00:00:06,766 to the final practical activity of this part three on classification. 3 00:00:07,133 --> 00:00:10,166 And now we're all going to go into part three, classification, 4 00:00:10,166 --> 00:00:13,533 then to implement the final classification 5 00:00:13,533 --> 00:00:17,233 model of this part three, the random forest classification model. 6 00:00:17,766 --> 00:00:19,566 All right. And we're going to start with Python, 7 00:00:19,566 --> 00:00:20,200 of course. 8 00:00:20,200 --> 00:00:23,200 And inside this folder you will get the same two files: 9 00:00:23,200 --> 00:00:26,033 the implementation of this classification model 10 00:00:26,033 --> 00:00:29,033 and the social network ads data set, 11 00:00:29,100 --> 00:00:33,333 which contains 400 observations corresponding to 400 customers. 12 00:00:33,466 --> 00:00:37,100 You know, each row is a customer, and for each of them we get the two 13 00:00:37,100 --> 00:00:41,066 features, age and estimated salary, with which we're going to predict 14 00:00:41,066 --> 00:00:45,600 this dependent variable, purchased, which tells whether, yes or no, 15 00:00:45,766 --> 00:00:49,800 each customer bought an SUV from this car dealership. 16 00:00:50,066 --> 00:00:52,866 And then once we train this model to understand the correlations 17 00:00:52,866 --> 00:00:55,766 between these two features and the dependent variable vector, 18 00:00:55,766 --> 00:00:58,900 we will be able to predict which new customers will buy 19 00:00:59,033 --> 00:01:03,866 that brand new SUV just released by this car company, and therefore, we'll be able 20 00:01:03,866 --> 00:01:09,433 to target our customers in the best way through beautiful ads on social networks. 21 00:01:09,566 --> 00:01:11,633 All right, so let's do this. 22 00:01:11,633 --> 00:01:15,133 Let's start the implementation of random forest classification.
23 00:01:15,366 --> 00:01:18,600 And let's open this with either Google Colaboratory 24 00:01:18,600 --> 00:01:22,566 or Jupyter Notebook, whichever is your favorite. 25 00:01:23,233 --> 00:01:23,600 All right. 26 00:01:23,600 --> 00:01:27,000 So right now it is opening the notebook, loading it, laying it out. 27 00:01:27,000 --> 00:01:30,966 And here is the random forest classification implementation, 28 00:01:31,233 --> 00:01:34,700 which results once again from that classification template 29 00:01:34,700 --> 00:01:37,700 we made in the first section on logistic regression. 30 00:01:37,966 --> 00:01:41,966 So all these cells here are exactly the same as in logistic regression, 31 00:01:41,966 --> 00:01:45,600 you know, with the same variable names and everything, except 32 00:01:45,900 --> 00:01:48,900 this cell where we build and train 33 00:01:49,600 --> 00:01:54,000 the classification model, here the random forest classification model. 34 00:01:54,300 --> 00:01:56,400 So we're going to re-implement that cell. 35 00:01:56,400 --> 00:02:00,533 And since this is in read-only mode so that you can all access it, 36 00:02:00,800 --> 00:02:01,966 well, we are going to create 37 00:02:01,966 --> 00:02:05,866 a copy of this file by clicking here on Save a copy in Drive. 38 00:02:06,133 --> 00:02:08,900 This creates a copy, and there we go. 39 00:02:08,900 --> 00:02:12,233 We will be able to re-implement that cell to train 40 00:02:12,233 --> 00:02:15,366 our random forest classification model on the training set. 41 00:02:15,700 --> 00:02:16,433 All right. 42 00:02:16,433 --> 00:02:19,366 So first let's remove that cell. 43 00:02:19,366 --> 00:02:21,166 And now is the time 44 00:02:21,166 --> 00:02:21,600 where you are going 45 00:02:21,600 --> 00:02:25,533 to press pause on the video to, of course, implement this yourself,
46 00:02:25,533 --> 00:02:29,433 and also to learn how to be independent in machine learning and learn 47 00:02:29,433 --> 00:02:33,866 how to get familiar with that scikit-learn API, 48 00:02:33,866 --> 00:02:37,333 which is the way you're going to find the information you need right now 49 00:02:37,333 --> 00:02:40,333 to build this random forest classification model. 50 00:02:40,533 --> 00:02:41,066 All right. 51 00:02:41,066 --> 00:02:42,600 So let's do this together. 52 00:02:42,600 --> 00:02:46,966 Let's go to the API and let's find that class 53 00:02:46,966 --> 00:02:50,500 that we need to build a random forest classification model. 54 00:02:51,166 --> 00:02:51,500 All right. 55 00:02:51,500 --> 00:02:55,500 So here, as opposed to before, we won't find the model 56 00:02:55,500 --> 00:03:00,066 we need easily, you know, by scrolling down, for example, down to Random Forest, 57 00:03:00,066 --> 00:03:00,800 because, you know, 58 00:03:00,800 --> 00:03:01,733 the name of the module 59 00:03:01,733 --> 00:03:05,400 is not random forest, as was the case with the previous classification models. 60 00:03:05,700 --> 00:03:08,000 This time it's actually right here: 61 00:03:08,000 --> 00:03:09,200 Ensemble methods. 62 00:03:09,200 --> 00:03:11,833 And the name of the module is exactly ensemble. 63 00:03:11,833 --> 00:03:13,800 So that's what you had to find. 64 00:03:13,800 --> 00:03:16,700 But you know, if you looked for it by scrolling down, that's fine, 65 00:03:16,700 --> 00:03:20,700 because really I want you to get familiar with the scikit-learn API. 66 00:03:21,066 --> 00:03:24,266 And so now the question is, among all these ensemble methods, 67 00:03:24,500 --> 00:03:26,266 where is the one we want? 68 00:03:26,266 --> 00:03:29,733 Well, that's of course this one, RandomForestClassifier. 69 00:03:29,733 --> 00:03:30,833 Hard to miss, right? 70 00:03:30,833 --> 00:03:33,033 So we're going to click this link. 71 00:03:33,033 --> 00:03:34,300 And there we go.
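As a rough companion to browsing the API page, the same lookup can be sketched in code. This is only an illustration, assuming scikit-learn is installed; the variable name forest_like is made up for this example and does not come from the course notebook.

```python
# List the public names of the ensemble module and keep the "Forest" ones.
import sklearn.ensemble as ensemble

# __all__ holds the names the module exports publicly.
forest_like = [name for name in ensemble.__all__ if "Forest" in name]
print(forest_like)

# The class the lecture is looking for lives in this module.
print(ensemble.RandomForestClassifier.__module__)
```

The printed list should include RandomForestClassifier alongside the other forest-based estimators, which matches what the Ensemble methods section of the API page shows.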
72 00:03:34,300 --> 00:03:37,933 This is the RandomForestClassifier class with all the parameters. 73 00:03:37,933 --> 00:03:40,633 So check them out. We won't enter all of them. 74 00:03:40,633 --> 00:03:44,700 But let me tell you right now the ones we will enter. The first and most 75 00:03:44,700 --> 00:03:48,800 important one is the first one actually, n_estimators, which is of course 76 00:03:49,033 --> 00:03:52,633 the number of trees you want to have in your random forest classifier. 77 00:03:52,766 --> 00:03:55,266 Right? Number of trees in the forest. 78 00:03:55,266 --> 00:03:59,633 Then once again we'll choose another value for the criterion. 79 00:03:59,633 --> 00:04:03,233 And that's in order to be aligned with what you learned in the theory, 80 00:04:03,233 --> 00:04:05,600 you know, with Kirill's intuition lectures. 81 00:04:05,600 --> 00:04:08,900 He taught you about the random forest classification model with 82 00:04:09,066 --> 00:04:10,533 the entropy criterion. 83 00:04:10,533 --> 00:04:12,566 So we're going to select this. 84 00:04:12,566 --> 00:04:14,766 And that's it. No more parameters. 85 00:04:14,766 --> 00:04:17,933 You know, for the other parameters here we'll just keep the default values. 86 00:04:18,133 --> 00:04:22,533 However, we will just add a random_state parameter and set its value to zero 87 00:04:22,666 --> 00:04:26,200 just so that we can have the same results displayed in our notebook. 88 00:04:26,333 --> 00:04:27,066 All right. 89 00:04:27,066 --> 00:04:28,800 So first let's copy this. 90 00:04:28,800 --> 00:04:31,833 You know, the name of the class in the module, right? 91 00:04:31,833 --> 00:04:33,400 So I'm copying this, 92 00:04:33,400 --> 00:04:39,300 going back to our implementation, creating a new code cell here, pasting that. 93 00:04:39,700 --> 00:04:42,333 And then remember we have to start with from. 94 00:04:42,333 --> 00:04:44,733 So from the scikit-learn library.
95 00:04:44,733 --> 00:04:47,733 Then from the ensemble module of the scikit-learn library. 96 00:04:47,833 --> 00:04:51,266 And then remember we need to add here import. 97 00:04:51,766 --> 00:04:54,600 Well, that RandomForestClassifier, 98 00:04:54,600 --> 00:04:58,233 which will allow us to build this random forest classification model. 99 00:04:58,500 --> 00:05:02,000 And speaking of building it, well, that's exactly our next step here. 100 00:05:02,200 --> 00:05:05,166 We're going to build the classifier through this 101 00:05:05,166 --> 00:05:08,166 classifier variable, which will be nothing else 102 00:05:08,333 --> 00:05:12,833 than an instance of the RandomForestClassifier class, therefore nothing else 103 00:05:12,866 --> 00:05:15,666 than the random forest classifier model itself. 104 00:05:15,666 --> 00:05:19,900 So here I'm copying this and pasting it right here, adding some parentheses, 105 00:05:20,133 --> 00:05:20,933 and there we go. 106 00:05:20,933 --> 00:05:23,100 Now let's add our two parameters. 107 00:05:23,100 --> 00:05:26,100 You know, the ones of which we're changing the default values. 108 00:05:26,133 --> 00:05:29,800 The first one is n_estimators. 109 00:05:30,033 --> 00:05:32,200 So that's the number of trees in the forest. 110 00:05:32,200 --> 00:05:34,900 The default value is actually 100. 111 00:05:34,900 --> 00:05:38,533 But you know, it will be totally fine with ten estimators. 112 00:05:38,533 --> 00:05:40,100 You know, ten trees in the forest. 113 00:05:40,100 --> 00:05:40,866 Why is that? 114 00:05:40,866 --> 00:05:43,466 That's because our data set is actually quite simple. 115 00:05:43,466 --> 00:05:48,333 It only contains two features and only 400 customers, you know, 400 observations. 116 00:05:48,600 --> 00:05:53,333 So we will definitely be fine with only ten trees in the forest. 117 00:05:53,533 --> 00:05:56,200 All right. And feel free to try out other numbers if you wish.
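The cell being described can be sketched roughly as follows. This is a minimal sketch, not the course notebook itself: the names X_train and y_train and the synthetic stand-in data are assumptions made so the example runs on its own, since the real notebook would use the training set prepared in its earlier cells. The three arguments (n_estimators, criterion, random_state) are the ones named in the lecture.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in training data (assumption): 400 rows and two numeric features,
# playing the role of age and estimated salary. The label is synthetic.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# Ten trees, entropy criterion, fixed seed for reproducible results.
classifier = RandomForestClassifier(
    n_estimators=10, criterion="entropy", random_state=0
)
classifier.fit(X_train, y_train)

# After fitting, estimators_ holds one fitted decision tree per estimator.
print(len(classifier.estimators_))
```

Changing n_estimators here is an easy way to try other numbers of trees, as suggested above, and to see whether the results on such a small data set change much.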