1 00:00:00,610 --> 00:00:06,290 In this lecture, we will learn how to create a random forest classifier in Python.
2 00:00:07,800 --> 00:00:13,570 Note that we are only creating a classification model here, but you can also create a regression
3 00:00:13,570 --> 00:00:18,220 model using the same method, with different code for regression.
4 00:00:21,420 --> 00:00:27,740 Let's first look at the documentation of random forest on the scikit-learn website.
5 00:00:29,880 --> 00:00:31,470 This one is for regression.
6 00:00:31,590 --> 00:00:34,290 So let's open the documentation for the classifier.
7 00:00:44,810 --> 00:00:52,070 If you see, the parameters are almost similar to the ones we have for bagging. The first one is the
8 00:00:52,070 --> 00:00:53,420 number of estimators.
9 00:00:55,770 --> 00:00:58,410 This is the number of trees we want in our forest.
10 00:00:59,790 --> 00:01:01,300 So by default, this is ten.
11 00:01:01,920 --> 00:01:04,290 But we will use a thousand for our problem.
12 00:01:06,390 --> 00:01:09,330 Next is the criterion. By default,
13 00:01:09,420 --> 00:01:14,630 it is gini, but you can also change it to entropy for cross-entropy,
14 00:01:14,910 --> 00:01:16,680 as we have discussed with decision trees.
15 00:01:19,080 --> 00:01:26,610 Next are some of the parameters for our trees, such as maximum depth, minimum samples split, minimum
16 00:01:26,610 --> 00:01:27,370 samples leaf.
17 00:01:27,460 --> 00:01:31,200 We have already discussed these during our decision tree tutorial.
18 00:01:32,740 --> 00:01:36,650 Now, this here is an important parameter.
19 00:01:37,740 --> 00:01:38,970 This is max features.
20 00:01:39,420 --> 00:01:46,590 If you remember, we select only a subset of our features to create separate trees in a random forest.
21 00:01:48,240 --> 00:01:49,140 For example,
22 00:01:50,220 --> 00:01:57,480 in our case, we have 20 variables, so we will only take a subset of variables for each of the trees.
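The criterion switch mentioned here can be sketched in a couple of lines. This is a hedged illustration, not the course notebook's exact code; the variable names are assumptions:

```python
from sklearn.ensemble import RandomForestClassifier

# Sketch: the criterion parameter picks the split-quality measure.
# "gini" is the default; "entropy" uses information gain instead,
# just as with single decision trees.
clf_gini = RandomForestClassifier(criterion="gini", random_state=42)
clf_entropy = RandomForestClassifier(criterion="entropy", random_state=42)

print(clf_gini.get_params()["criterion"])
print(clf_entropy.get_params()["criterion"])
```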
23 00:01:59,510 --> 00:02:02,150 And by default, this is set to auto.
24 00:02:02,810 --> 00:02:08,360 And when we select auto, we get the square root of the number of features selected.
25 00:02:09,650 --> 00:02:11,690 So we have 20 features.
26 00:02:12,020 --> 00:02:15,110 So the square root of 20 is around four point four.
27 00:02:15,380 --> 00:02:20,520 So probably we will get four or five features. If we want,
28 00:02:20,630 --> 00:02:28,820 we can also give some numeric value, or we can also provide a fraction of the maximum features.
29 00:02:28,850 --> 00:02:33,620 So, for example, if I give point five, we will select 10 features.
30 00:02:34,490 --> 00:02:38,240 So by default, this is set to auto, and we will also go with auto.
31 00:02:43,540 --> 00:02:49,200 Now, you can go through all the other parameters, but we will stick to the default values here.
32 00:02:53,210 --> 00:02:54,880 The method is almost similar.
33 00:02:54,920 --> 00:02:57,620 We will first import RandomForestClassifier.
34 00:02:57,650 --> 00:03:00,560 We will create our random forest classifier object.
35 00:03:01,340 --> 00:03:07,760 We will train that object using our X train and y train data, and then we will predict the values
36 00:03:07,760 --> 00:03:10,470 of y using our X test data.
37 00:03:10,760 --> 00:03:17,960 So let's import it first, and then we are using these parameters.
38 00:03:18,290 --> 00:03:19,730 We want a thousand trees.
39 00:03:19,760 --> 00:03:23,630 So that's why we are giving n_estimators equal to a thousand.
40 00:03:24,620 --> 00:03:28,370 We want the full processing power of our machine or computer.
41 00:03:28,520 --> 00:03:33,470 That's why we are giving n_jobs equal to minus one. By default,
42 00:03:33,500 --> 00:03:40,730 this is set to one, that is, our computer will only use one processor, but it is recommended that you
43 00:03:40,730 --> 00:03:45,740 use n_jobs equal to minus one to get the full potential of your machine.
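The object creation described above can be sketched as follows. The variable name `clf_rf` is an assumption (the course notebook may use a different one), and note that newer scikit-learn versions name the default `max_features` setting `"sqrt"` rather than `"auto"`:

```python
from sklearn.ensemble import RandomForestClassifier

# Sketch: create the classifier with the parameters discussed above.
clf_rf = RandomForestClassifier(
    n_estimators=1000,  # a thousand trees in the forest
    n_jobs=-1,          # use all CPU cores instead of the default single one
    random_state=42,    # fix the randomness so results are reproducible
)
# max_features is left at its default, so roughly sqrt(20), i.e. 4-5,
# of the 20 features are considered at each split.
print(clf_rf.get_params()["n_estimators"])
```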
44 00:03:46,550 --> 00:03:49,160 Then we are giving random_state equal to 42,
45 00:03:51,010 --> 00:03:55,930 since we are randomly picking data and randomly picking features for each of our trees.
46 00:03:58,300 --> 00:04:05,380 So if we want to reproduce the same result in the future, we can use random_state equal to 42 to get
47 00:04:05,380 --> 00:04:06,930 the same result as this run.
48 00:04:07,690 --> 00:04:11,590 So always stick to some integer value of random_state.
49 00:04:11,950 --> 00:04:14,610 For example, we are using 42.
50 00:04:14,990 --> 00:04:17,800 But you can also use zero, one, two or three.
51 00:04:17,980 --> 00:04:19,060 This will not impact
52 00:04:19,060 --> 00:04:20,020 your model's performance.
53 00:04:20,050 --> 00:04:24,010 This is just for reproducibility of the results that we are getting.
54 00:04:26,040 --> 00:04:30,460 And you already know that our m here is four or five,
55 00:04:30,480 --> 00:04:36,990 since we are giving the default value, and at the default value, Python is going to take the square root of
56 00:04:36,990 --> 00:04:39,420 20, which is about 4.5.
57 00:04:41,100 --> 00:04:42,330 Let's run this.
58 00:04:44,340 --> 00:04:48,330 And now we will fit our X train and y train data to this classifier object.
59 00:04:50,950 --> 00:04:52,000 It will take some time.
60 00:04:54,690 --> 00:04:57,680 Now, let's draw the confusion matrix on our test data.
61 00:05:02,690 --> 00:05:05,650 You can compare it with your bagging results.
62 00:05:06,440 --> 00:05:12,310 And then we are going to find out the accuracy score of this model.
63 00:05:13,520 --> 00:05:17,210 So the accuracy for this model is sixty-three point one percent.
64 00:05:17,630 --> 00:05:24,970 So if you compare it with bagging, the bagging accuracy score was around sixty-two percent, but with random forest
65 00:05:25,010 --> 00:05:29,360 we have increased the accuracy score by one percent.
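The fit, predict, and evaluate steps described above can be sketched end to end. The synthetic dataset and variable names here are stand-ins for the course's own data, and the forest is smaller than the lecture's thousand trees just to keep the sketch fast:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in synthetic data; the course uses its own dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train the forest on the training split.
clf_rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
clf_rf.fit(X_train, y_train)

# Predict on the test split and evaluate.
y_pred = clf_rf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print(cm)
print(acc)
```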
66 00:05:31,400 --> 00:05:34,280 So that's all for this video. In the next video,
67 00:05:34,670 --> 00:05:39,470 we will learn how to optimize the values of our hyperparameters.
68 00:05:40,010 --> 00:05:43,730 So if you remember, there are different tree-related
69 00:05:44,920 --> 00:05:52,340 parameters here, such as max depth and minimum samples split. In our example, we have used the default
70 00:05:52,340 --> 00:05:57,290 values, but we can optimize these values to get the best result for our data.
71 00:05:59,090 --> 00:06:06,470 So we will use GridSearchCV, and we will find the best tree for our data.
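As a preview of that next step, here is a hedged sketch of tuning the tree-related parameters with GridSearchCV. The grid values and the synthetic data are illustrative assumptions, not the course's own choices:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in data; replace with the course's own training set.
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

# Illustrative values for the tree-related parameters mentioned above.
param_grid = {
    "max_depth": [3, 5, None],
    "min_samples_split": [2, 5, 10],
}

grid = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=42),
    param_grid,
    cv=3,  # 3-fold cross-validation for each parameter combination
)
grid.fit(X, y)
print(grid.best_params_)
```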