1 00:00:00,066 --> 00:00:02,100 Hello and welcome to this R tutorial. 2 00:00:02,100 --> 00:00:05,066 We are finally getting to the final round of our classification 3 00:00:05,066 --> 00:00:07,000 adventure in our machine learning journey. 4 00:00:07,000 --> 00:00:11,633 Because today we're making the very last classifier, the random forest classifier. 5 00:00:12,000 --> 00:00:13,300 So let's start with the basics. 6 00:00:13,300 --> 00:00:14,700 We'll go to our Machine Learning A-Z 7 00:00:14,700 --> 00:00:17,533 folder to set the right folder as working directory. 8 00:00:17,533 --> 00:00:20,600 So this right folder is in Part 3, Classification. 9 00:00:20,933 --> 00:00:23,633 And here it is, Section 20, Random Forest Classification. 10 00:00:23,633 --> 00:00:25,300 That's the right folder. 11 00:00:25,300 --> 00:00:29,100 And as I was telling you, we are reaching the end of Part 12 00:00:29,100 --> 00:00:33,266 3, Classification, because the next section will be dedicated to evaluating 13 00:00:33,266 --> 00:00:36,633 classification models' performance in order to improve the models. 14 00:00:37,100 --> 00:00:39,300 And therefore this is our last classifier. 15 00:00:39,300 --> 00:00:40,600 So let's go to this folder. 16 00:00:40,600 --> 00:00:42,600 That's the folder we want to set as working directory. 17 00:00:42,600 --> 00:00:45,533 Make sure that you have the Social_Network_Ads.csv file. 18 00:00:45,533 --> 00:00:46,466 And if that's the case, 19 00:00:46,466 --> 00:00:49,966 click on this More button here and then Set As Working Directory. 20 00:00:50,466 --> 00:00:51,300 All good here. 21 00:00:51,300 --> 00:00:54,800 And now let's go to our classification template to build 22 00:00:54,800 --> 00:00:57,833 our random forest classifier with the best efficiency. 23 00:00:58,066 --> 00:01:00,900 So we'll just take everything from here 24 00:01:00,900 --> 00:01:03,400 to the bottom, copy, 25 00:01:03,400 --> 00:01:06,433 and then we'll paste it here.
26 00:01:06,966 --> 00:01:07,333 All right. 27 00:01:07,333 --> 00:01:10,333 So I'll just give a quick reminder about what this template is doing. 28 00:01:10,633 --> 00:01:13,233 So the first step is to import the data set. 29 00:01:13,233 --> 00:01:15,566 Then we encode here the target feature as factor. 30 00:01:15,566 --> 00:01:19,333 We're just doing this to specify to the classifier that our last column, 31 00:01:19,333 --> 00:01:23,866 Purchased, is a categorical variable with factor levels zero and one. 32 00:01:24,200 --> 00:01:26,566 So we don't need to do it for all the classifiers. 33 00:01:26,566 --> 00:01:30,300 But we do need to do it for some, like the Naive Bayes classifier as we saw. 34 00:01:30,600 --> 00:01:33,600 So it's good to keep this code section in the template. 35 00:01:33,600 --> 00:01:36,833 Then we split the data set into the training set and the test set. 36 00:01:36,833 --> 00:01:37,933 And then in the last section 37 00:01:37,933 --> 00:01:41,133 of the preprocessing phase we apply feature scaling to the data. 38 00:01:41,133 --> 00:01:44,900 And we are only doing this because at the end of this template we make this 39 00:01:44,900 --> 00:01:48,733 very cool graph for both the training set results and the test set results 40 00:01:48,733 --> 00:01:52,066 that will plot the prediction regions and the prediction boundary. 41 00:01:52,466 --> 00:01:55,466 Okay, so right now we need to change a few things in this template. 42 00:01:55,533 --> 00:01:59,100 First, let's not forget to change the title of the plot here. 43 00:01:59,100 --> 00:02:02,100 We'll replace Classifier by Random 44 00:02:02,366 --> 00:02:05,366 Forest Classification. 45 00:02:05,500 --> 00:02:09,700 We'll copy this because we'll do the same in the test set results here. 46 00:02:09,700 --> 00:02:12,066 Here it is. Perfect. Okay. 47 00:02:12,066 --> 00:02:16,600 And now all we need to do is to create our classifier here.
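The preprocessing steps recalled above (import, factor encoding, train/test split, feature scaling) can be sketched roughly as follows. This is only an illustration, not the course template itself: a synthetic data frame stands in for the CSV file so the snippet runs on its own, the column names `Age`, `EstimatedSalary` and `Purchased` follow the narration, and the 75/25 split ratio and the seed are assumptions.

```r
# Hypothetical stand-in for the imported data set (the real one comes from a CSV file).
set.seed(123)  # assumed seed, only for reproducibility of this sketch
n <- 400
dataset <- data.frame(
  Age = sample(18:60, n, replace = TRUE),
  EstimatedSalary = sample(seq(15000, 150000, by = 1000), n, replace = TRUE)
)
dataset$Purchased <- as.integer(
  dataset$Age + dataset$EstimatedSalary / 4000 + rnorm(n, sd = 8) > 60
)

# Encode the target feature as a factor with levels 0 and 1.
dataset$Purchased <- factor(dataset$Purchased, levels = c(0, 1))

# Split into training set and test set (75/25 is an assumption).
train_idx <- sample(seq_len(n), size = floor(0.75 * n))
training_set <- dataset[train_idx, ]
test_set <- dataset[-train_idx, ]

# Feature scaling on the two predictor columns (everything except column 3).
training_set[-3] <- scale(training_set[-3])
test_set[-3] <- scale(test_set[-3])
```

The scaling is kept here only because the plots at the end of the template are drawn on scaled axes; a random forest itself does not depend on it.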
48 00:02:16,700 --> 00:02:18,333 And then we'll be ready. 49 00:02:18,333 --> 00:02:20,766 So I'll just paste the same thing here. 50 00:02:20,766 --> 00:02:23,266 Random forest classification to the training set. 51 00:02:23,266 --> 00:02:25,933 And now let's create our classifier. 52 00:02:25,933 --> 00:02:26,333 All right. 53 00:02:26,333 --> 00:02:28,500 Let's do this. So it's very simple. 54 00:02:28,500 --> 00:02:31,900 We're going to use a library that is called randomForest. 55 00:02:32,300 --> 00:02:33,000 So let's do this. 56 00:02:33,000 --> 00:02:37,800 Let's first install the package with the command install.packages. 57 00:02:38,200 --> 00:02:42,300 So that's for those of you who are using this package for the first time. 58 00:02:42,300 --> 00:02:44,933 Then you won't have it installed in your packages 59 00:02:44,933 --> 00:02:46,033 list here. 60 00:02:46,033 --> 00:02:49,200 As you can see, mine should be already here because I used it 61 00:02:49,200 --> 00:02:49,700 several times. 62 00:02:49,700 --> 00:02:52,266 Yes, there it is, randomForest. 63 00:02:52,266 --> 00:02:56,400 So if it's not the case for you then you need to install the package here. 64 00:02:56,700 --> 00:02:59,433 And so in this install.packages 65 00:02:59,433 --> 00:03:03,066 function you just need to input the name of the library in quotes. 66 00:03:03,400 --> 00:03:06,033 So it's randomForest. 67 00:03:06,033 --> 00:03:10,033 Be careful with the capital F here and no capital R here. 68 00:03:10,300 --> 00:03:12,700 All right. So that's what you need to input anyway. 69 00:03:12,700 --> 00:03:14,200 That will install the package. 70 00:03:14,200 --> 00:03:16,800 I won't do it because I already have mine installed. 71 00:03:16,800 --> 00:03:20,666 But what you just need to do is to select this and press Command or Control 72 00:03:20,666 --> 00:03:21,900 plus Enter to execute.
73 00:03:21,900 --> 00:03:24,900 And this will install the package without any issues. 74 00:03:25,200 --> 00:03:25,500 All right. 75 00:03:25,500 --> 00:03:29,966 So here I'll just press Command plus Shift plus C to make it a comment. 76 00:03:30,200 --> 00:03:34,166 But then what I need to do is to add the command here, library, 77 00:03:34,600 --> 00:03:37,600 and then input the name of the randomForest library. 78 00:03:37,833 --> 00:03:39,900 Because as you can see here, it's not selected. 79 00:03:39,900 --> 00:03:44,900 So I need to add this library(randomForest) command to select it automatically, 80 00:03:44,900 --> 00:03:48,400 especially if I want to make some automated scripts in the future. 81 00:03:48,966 --> 00:03:51,833 So that's the thing to do, and it's very practical. 82 00:03:51,833 --> 00:03:53,166 And now we're all good. 83 00:03:53,166 --> 00:03:55,600 Let's create our classifier. Okay. 84 00:03:55,600 --> 00:03:58,600 So as usual we'll start by creating the variable classifier 85 00:03:59,233 --> 00:04:01,900 that will be the random forest classifier itself. 86 00:04:01,900 --> 00:04:05,433 And now we will use the randomForest function to build our classifier. 87 00:04:05,966 --> 00:04:08,966 So here I'll just type the function randomForest. 88 00:04:09,733 --> 00:04:12,033 And now let's see what arguments we need to input. 89 00:04:12,033 --> 00:04:17,566 So we'll just go here and press F1 to get some info about this function. 90 00:04:17,566 --> 00:04:19,966 So here we need to click here. 91 00:04:19,966 --> 00:04:24,366 And here is some info about the randomForest library and function. 92 00:04:24,600 --> 00:04:27,600 So let's look at the arguments which we are interested in. 93 00:04:28,200 --> 00:04:29,766 Okay. So the first argument is data. 94 00:04:29,766 --> 00:04:30,400 We won't need it. 95 00:04:30,400 --> 00:04:33,033 As you can see it's an optional data frame.
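The install-and-load step described above can be written so that it is safe to re-run in an automated script. This is a minimal sketch: the `requireNamespace` guard is an addition that the narration does not use (there the install line is simply commented out), and it assumes a CRAN mirror is configured and reachable.

```r
# Install the package only if it is missing (guard is an assumption, not from the video).
if (!requireNamespace("randomForest", quietly = TRUE)) {
  install.packages("randomForest")  # note the lowercase r and capital F
}

# Load the package so the script does not depend on ticking it in the Packages pane.
library(randomForest)

# The help page that F1 opens in RStudio can also be reached from the console:
# help("randomForest")
```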
96 00:04:33,033 --> 00:04:36,900 So it's not an argument that we need to build a random forest classifier. 97 00:04:37,333 --> 00:04:39,433 Same for subset and na.action. 98 00:04:39,433 --> 00:04:41,400 We actually don't need the first three parameters 99 00:04:41,400 --> 00:04:42,933 to build a random forest classifier. 100 00:04:42,933 --> 00:04:48,400 However, we of course need the x and y arguments here, which as you can 101 00:04:48,400 --> 00:04:52,500 guess, will be the matrix of features and the dependent variable vector. 102 00:04:53,100 --> 00:04:57,433 Indeed, as you can see, x is a data frame or a matrix of predictors. 103 00:04:57,433 --> 00:04:58,600 So that's pretty clear. 104 00:04:58,600 --> 00:05:01,966 x is our matrix of features, a matrix of predictors, 105 00:05:01,966 --> 00:05:05,433 which are the independent variables Age and Estimated Salary. 106 00:05:05,933 --> 00:05:08,566 And then y is said to be a response vector. 107 00:05:08,566 --> 00:05:11,400 So of course y is the dependent variable vector, 108 00:05:11,400 --> 00:05:14,933 that is, the Purchased column. Okay, perfect. 109 00:05:14,933 --> 00:05:16,200 And then the last argument 110 00:05:16,200 --> 00:05:19,266 that we will need to build our random forest classifier is 111 00:05:19,266 --> 00:05:22,533 of course the number of trees we want to have in the forest. 112 00:05:23,066 --> 00:05:27,033 And this number of trees is given by this ntree argument here, 113 00:05:27,300 --> 00:05:30,600 which as you can see, is the number of trees to grow. 114 00:05:30,933 --> 00:05:35,633 So to grow is a nice way of saying that the trees will learn from the data set 115 00:05:35,633 --> 00:05:37,133 how to make the predictions. 116 00:05:37,133 --> 00:05:40,100 And basically this argument is the number of trees. 117 00:05:40,100 --> 00:05:43,000 So remember in the Python tutorial we chose ten trees. 118 00:05:43,000 --> 00:05:44,500 Well let's do the same here.
119 00:05:44,500 --> 00:05:46,933 Let's pick ntree equals ten. 120 00:05:46,933 --> 00:05:52,000 And that is actually everything we need to build our random forest classifier. Okay. 121 00:05:52,000 --> 00:05:52,733 So let's do this. 122 00:05:52,733 --> 00:05:54,066 Let's input the arguments. 123 00:05:54,066 --> 00:05:56,133 Let's go back to R and here we are.
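The three arguments singled out above (`x`, `y` and `ntree`) can be put together in a call like the one below. This is a sketch under assumptions rather than the exact code typed in the video: it assumes the randomForest package is already installed, and a small synthetic `training_set` with already-scaled `Age` and `EstimatedSalary` columns and a factor `Purchased` column stands in for the real training data.

```r
library(randomForest)  # assumes the package is already installed

# Hypothetical stand-in for the scaled training set.
set.seed(123)  # assumed seed, only for reproducibility of this sketch
n <- 300
training_set <- data.frame(
  Age = rnorm(n),
  EstimatedSalary = rnorm(n)
)
training_set$Purchased <- factor(
  as.integer(training_set$Age + training_set$EstimatedSalary + rnorm(n, sd = 0.5) > 0),
  levels = c(0, 1)
)

# x: data frame of predictors; y: response vector (a factor, so the forest
# does classification); ntree: number of trees to grow.
classifier <- randomForest(
  x = training_set[-3],
  y = training_set$Purchased,
  ntree = 10
)

print(classifier)
```

Because `y` is a factor, the function builds a classification forest rather than a regression forest, which is one reason the factor encoding in the template matters here.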