1 00:00:00,610 --> 00:00:06,290 In this lecture, we will learn how to create a random forest classifier in Python.
2 00:00:07,800 --> 00:00:13,570 Note that we are only creating a classification model here, but you can also create a regression
3 00:00:13,570 --> 00:00:18,220 model using the same method, with different code for regression.
4 00:00:21,420 --> 00:00:27,740 Let's first look at the documentation of random forest on the scikit-learn website.
5 00:00:29,880 --> 00:00:31,470 This one is for regression.
6 00:00:31,590 --> 00:00:34,290 So let's open the documentation for the classifier.
7 00:00:44,810 --> 00:00:52,070 If you see, the parameters are almost similar to the ones we have for bagging. The first one is the
8 00:00:52,070 --> 00:00:53,420 number of estimators.
9 00:00:55,770 --> 00:00:58,410 This is the number of trees we want in our forest.
10 00:00:59,790 --> 00:01:01,300 So by default, this is ten.
11 00:01:01,920 --> 00:01:04,290 But we will use a thousand for our problem.
12 00:01:06,390 --> 00:01:09,330 Next is the criterion. By default,
13 00:01:09,420 --> 00:01:14,630 it is gini, but you can also change it to entropy for cross-entropy,
14 00:01:14,910 --> 00:01:16,680 as we have discussed with decision trees.
15 00:01:19,080 --> 00:01:26,610 Next are some of the parameters for our trees, such as maximum depth, minimum samples split, minimum
16 00:01:26,610 --> 00:01:27,370 samples leaf.
17 00:01:27,460 --> 00:01:31,200 We have already discussed these during our decision tree tutorial.
18 00:01:32,740 --> 00:01:36,650 Now, this here is an important parameter.
19 00:01:37,740 --> 00:01:38,970 This is max features.
20 00:01:39,420 --> 00:01:46,590 If you remember, we select only a subset of our features to create separate trees in a random forest.
21 00:01:48,240 --> 00:01:49,140 For example,
22 00:01:50,220 --> 00:01:57,480 in our case, we have 20 variables, so we will only take a subset of variables for each of the trees.
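The criterion switch mentioned here can be sketched in a couple of lines. This is a hedged illustration, not the course notebook's exact code; the variable names are assumptions:

```python
from sklearn.ensemble import RandomForestClassifier

# Sketch: the criterion parameter picks the split-quality measure.
# "gini" is the default; "entropy" uses information gain instead,
# just as with single decision trees.
clf_gini = RandomForestClassifier(criterion="gini", random_state=42)
clf_entropy = RandomForestClassifier(criterion="entropy", random_state=42)

print(clf_gini.get_params()["criterion"])
print(clf_entropy.get_params()["criterion"])
```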
23 00:01:59,510 --> 00:02:02,150 And by default, this is set to auto.
24 00:02:02,810 --> 00:02:08,360 And when we select auto, we get the square root of the number of features selected.
25 00:02:09,650 --> 00:02:11,690 So we have 20 features.
26 00:02:12,020 --> 00:02:15,110 So the square root of 20 is around four point four.
27 00:02:15,380 --> 00:02:20,520 So probably we will get four or five features. If we want,
28 00:02:20,630 --> 00:02:28,820 we can also give some numeric value, or we can also provide a fraction of the maximum features.
29 00:02:28,850 --> 00:02:33,620 So, for example, if I give point five, we will select 10 features.
30 00:02:34,490 --> 00:02:38,240 So by default, this is set to auto, and we will also go with auto.
31 00:02:43,540 --> 00:02:49,200 Now, you can go through all the other parameters, but we will stick to the default values here.
32 00:02:53,210 --> 00:02:54,880 The method is almost similar.
33 00:02:54,920 --> 00:02:57,620 We will first import RandomForestClassifier.
34 00:02:57,650 --> 00:03:00,560 We will create our random forest classifier object.
35 00:03:01,340 --> 00:03:07,760 We will train that object using our X train and y train data, and then we will predict the values
36 00:03:07,760 --> 00:03:10,470 of y using our X test data.
37 00:03:10,760 --> 00:03:17,960 So let's import it first, and then we are using these parameters.
38 00:03:18,290 --> 00:03:19,730 We want a thousand trees.
39 00:03:19,760 --> 00:03:23,630 So that's why we are giving n_estimators equal to a thousand.
40 00:03:24,620 --> 00:03:28,370 We want the full processing power of our machine or computer.
41 00:03:28,520 --> 00:03:33,470 That's why we are giving n_jobs equal to minus one. By default,
42 00:03:33,500 --> 00:03:40,730 this is set to one, that is, our computer will only use one processor, but it is recommended that you
43 00:03:40,730 --> 00:03:45,740 use n_jobs equal to minus one to get the full potential of your machine.
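The object creation described above can be sketched as follows. The variable name `clf_rf` is an assumption (the course notebook may use a different one), and note that newer scikit-learn versions name the default `max_features` setting `"sqrt"` rather than `"auto"`:

```python
from sklearn.ensemble import RandomForestClassifier

# Sketch: create the classifier with the parameters discussed above.
clf_rf = RandomForestClassifier(
    n_estimators=1000,  # a thousand trees in the forest
    n_jobs=-1,          # use all CPU cores instead of the default single one
    random_state=42,    # fix the randomness so results are reproducible
)
# max_features is left at its default, so roughly sqrt(20), i.e. 4-5,
# of the 20 features are considered at each split.
print(clf_rf.get_params()["n_estimators"])
```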
44 00:03:46,550 --> 00:03:49,160 Then we are giving random_state equal to 42,
45 00:03:51,010 --> 00:03:55,930 since we are randomly picking data and randomly picking features for each of our trees.
46 00:03:58,300 --> 00:04:05,380 So if we want to reproduce the same result in the future, we can use random_state equal to 42 to get
47 00:04:05,380 --> 00:04:06,930 the same result as this run.
48 00:04:07,690 --> 00:04:11,590 So always stick to some integer value of random_state.
49 00:04:11,950 --> 00:04:14,610 For example, we are using 42.
50 00:04:14,990 --> 00:04:17,800 But you can also use zero, one, two or three.
51 00:04:17,980 --> 00:04:19,060 This will not impact
52 00:04:19,060 --> 00:04:20,020 your model's performance.
53 00:04:20,050 --> 00:04:24,010 This is just for reproducibility of the results that we are getting.
54 00:04:26,040 --> 00:04:30,460 And you already know that our m here is four or five,
55 00:04:30,480 --> 00:04:36,990 since we are giving the default value, and at the default value, Python is going to take the square root of
56 00:04:36,990 --> 00:04:39,420 20, which is about 4.5.
57 00:04:41,100 --> 00:04:42,330 Let's run this.
58 00:04:44,340 --> 00:04:48,330 And now we will fit our X train and y train data to this classifier object.
59 00:04:50,950 --> 00:04:52,000 It will take some time.
60 00:04:54,690 --> 00:04:57,680 Now, let's draw the confusion matrix on our test data.
61 00:05:02,690 --> 00:05:05,650 You can compare it with your bagging results.
62 00:05:06,440 --> 00:05:12,310 And then we are going to find out the accuracy score of this model.
63 00:05:13,520 --> 00:05:17,210 So the accuracy for this model is sixty-three point one percent.
64 00:05:17,630 --> 00:05:24,970 So if you compare it with bagging, the bagging accuracy score was around sixty-two percent, but with random forest
65 00:05:25,010 --> 00:05:29,360 we have increased the accuracy score by one percent.
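The fit, predict, and evaluate steps described above can be sketched end to end. The synthetic dataset and variable names here are stand-ins for the course's own data, and the forest is smaller than the lecture's thousand trees just to keep the sketch fast:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in synthetic data; the course uses its own dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train the forest on the training split.
clf_rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
clf_rf.fit(X_train, y_train)

# Predict on the test split and evaluate.
y_pred = clf_rf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print(cm)
print(acc)
```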
66 00:05:31,400 --> 00:05:34,280 So that's all for this video. In the next video,
67 00:05:34,670 --> 00:05:39,470 we will learn how to optimize the values of our hyperparameters.
68 00:05:40,010 --> 00:05:43,730 So if you remember, there are different tree-related
69 00:05:44,920 --> 00:05:52,340 parameters here, such as max depth and minimum samples split. In our example, we have used the default
70 00:05:52,340 --> 00:05:57,290 values, but we can optimize these values to get the best result for our data.
71 00:05:59,090 --> 00:06:06,470 So we will use GridSearchCV, and we will find the best tree for our data.
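As a preview of that next step, here is a hedged sketch of tuning the tree-related parameters with GridSearchCV. The grid values and the synthetic data are illustrative assumptions, not the course's own choices:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in data; replace with the course's own training set.
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

# Illustrative values for the tree-related parameters mentioned above.
param_grid = {
    "max_depth": [3, 5, None],
    "min_samples_split": [2, 5, 10],
}

grid = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=42),
    param_grid,
    cv=3,  # 3-fold cross-validation for each parameter combination
)
grid.fit(X, y)
print(grid.best_params_)
```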