1 00:00:00,170 --> 00:00:07,710 Hello, let's talk about what Random Forest is, which is a machine learning algorithm, but 2 00:00:07,710 --> 00:00:15,000 before going deep into random forest, let's have a quick overview of our previous session, in which 3 00:00:15,000 --> 00:00:17,850 we learned that a random forest is nothing 4 00:00:17,850 --> 00:00:20,960 but a collection of multiple decision trees. 5 00:00:20,970 --> 00:00:28,830 And then we also got some knowledge about what a decision tree is, which is exactly nothing 6 00:00:28,830 --> 00:00:31,940 but a hierarchy-like structure. 7 00:00:32,280 --> 00:00:33,450 Then we also learned, 8 00:00:33,750 --> 00:00:34,230 yeah, 9 00:00:34,560 --> 00:00:38,740 what the main key factors are using which you can build 10 00:00:38,740 --> 00:00:39,380 a decision tree, 11 00:00:39,390 --> 00:00:40,500 which are exactly 12 00:00:40,500 --> 00:00:42,750 information gain and Gini index. 13 00:00:42,960 --> 00:00:50,440 So whichever feature has the highest information gain will get selected as the root node, because the major goal 14 00:00:50,440 --> 00:00:52,580 while designing a decision tree is nothing: 15 00:00:53,040 --> 00:00:59,850 you have to select which feature becomes the root node. Similarly, you can use the Gini index in your building as 16 00:00:59,850 --> 00:01:00,120 well. 17 00:01:00,600 --> 00:01:03,510 Whichever feature has the lowest Gini index, 18 00:01:03,510 --> 00:01:07,860 it means that particular feature has the lowest impurity. 19 00:01:08,220 --> 00:01:13,830 So it means I'm going to select that feature as my root node; that is what the Gini index will do. 20 00:01:14,310 --> 00:01:21,150 So using either information gain or Gini index, you can build your decision tree, and once you have a decision tree, 21 00:01:21,330 --> 00:01:22,920 you can easily do prediction. 22 00:01:23,250 --> 00:01:25,140 So in this session, what am I going to do?
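The Gini index recapped above can be sketched in a few lines of Python. The labels below are invented toy data; this is only an illustration of the impurity formula the lecture refers to, not code from the session.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# A pure node has impurity 0; an even 50/50 split of two classes has the
# maximum impurity 0.5, so the feature whose split gives the lowest
# weighted Gini is the one selected for the node.
print(gini(["yes", "yes", "yes", "yes"]))  # 0.0
print(gini(["yes", "yes", "no", "no"]))    # 0.5
```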
23 00:01:25,140 --> 00:01:31,620 I'm going to give you an intuition behind what exactly a random forest is, which is nothing, which 24 00:01:31,620 --> 00:01:39,390 is just a collection of multiple decision trees, which is just a collection 25 00:01:39,390 --> 00:01:41,090 of multiple decision trees. 26 00:01:41,100 --> 00:01:47,370 And so, all in all, a random forest is nothing but just a collection of multiple decision 27 00:01:47,370 --> 00:01:47,880 trees. 28 00:01:49,220 --> 00:01:56,780 So inside the forest there are decision trees, like this is my decision tree one, this is my decision tree 29 00:01:56,780 --> 00:02:00,220 two, and let's say this is my decision tree three. 30 00:02:00,410 --> 00:02:08,270 So the decision trees which you see over here are basically built using a technique known as bagging, 31 00:02:08,810 --> 00:02:17,000 and you can also say bootstrap aggregation, because bagging is also termed as bootstrap 32 00:02:17,960 --> 00:02:18,980 aggregation. 33 00:02:20,220 --> 00:02:22,080 Bootstrap aggregation. 34 00:02:22,110 --> 00:02:27,360 So what exactly is this bagging, or what is this bootstrap aggregation? 35 00:02:27,810 --> 00:02:35,150 So bootstrap aggregation, or bagging, is nothing but a technique in which we are going to create multiple bags of data. 36 00:02:35,200 --> 00:02:39,420 This is my bag one, this is my bag two, and this is my bag three. 37 00:02:39,600 --> 00:02:42,540 Let's say this, this is tree one. 38 00:02:42,840 --> 00:02:45,050 It gives us some prediction for some data; 39 00:02:45,060 --> 00:02:45,710 let's say one. 40 00:02:45,720 --> 00:02:48,110 Let's say this gives zero, and let's say 41 00:02:48,120 --> 00:02:50,250 this gives one for this. 42 00:02:50,430 --> 00:02:51,720 So what will this bagging do? 43 00:02:51,960 --> 00:02:59,160 It will exactly go with the majority, or I can say it will basically aggregate my predictions.
44 00:02:59,160 --> 00:03:07,920 So once I aggregate, I will come up with a final prediction of one, because the majority always wins, or 45 00:03:07,960 --> 00:03:10,650 I can say the majority goes with one. 46 00:03:10,830 --> 00:03:12,650 So you will see over here, 47 00:03:12,870 --> 00:03:16,380 this is exactly my bagging technique. 48 00:03:16,710 --> 00:03:20,440 So let me consider a very simple use case here. 49 00:03:20,480 --> 00:03:22,020 Let me open a new page. 50 00:03:22,320 --> 00:03:23,910 So I'm just going to open a new page. 51 00:03:24,040 --> 00:03:28,670 Now suppose, suppose I have my entire dataset. 52 00:03:29,040 --> 00:03:30,890 This is my entire dataset. 53 00:03:30,900 --> 00:03:38,330 Suppose this is my entire dataset, which has some number of rows and columns. 54 00:03:38,760 --> 00:03:41,300 This is the dimension of my dataset. 55 00:03:41,520 --> 00:03:46,500 So what exactly will happen initially? So initially, what will random forest 56 00:03:46,500 --> 00:03:46,980 do? 57 00:03:47,160 --> 00:03:50,250 Initially, I will pick up some sample. 58 00:03:50,370 --> 00:03:50,910 Let's say 59 00:03:50,910 --> 00:03:57,870 I'm going to name it sample one; so I will pick up some sample of data and I will pass, I will pass 60 00:03:58,110 --> 00:04:04,490 this to my model, this to my model, which is exactly my decision tree one, which is exactly like this one. 61 00:04:04,740 --> 00:04:14,030 So I will pass some sample of data with row sampling, with row sampling, as well as feature sampling, 62 00:04:14,040 --> 00:04:16,560 row sampling and feature sampling, 63 00:04:16,560 --> 00:04:19,860 or you can say row sampling and column sampling. 64 00:04:19,890 --> 00:04:24,540 So what exactly are this row sampling and feature sampling? Suppose 65 00:04:25,170 --> 00:04:30,670 this dataset has six hundred rows and n columns. 66 00:04:31,050 --> 00:04:34,500 So initially, what will happen?
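The "bag" idea above, drawing a same-sized sample with replacement, is what the word bootstrap means. A minimal sketch, assuming a toy dataset of ten row indices (the seed and size are illustrative, not from the lecture):

```python
import random

random.seed(42)            # illustrative seed for reproducibility
rows = list(range(10))     # stand-in for the row indices of a dataset

# One bootstrap "bag": a same-sized draw *with replacement*. Some rows
# appear more than once and some not at all, which is why each tree in
# the forest ends up training on a slightly different view of the data.
bag = [random.choice(rows) for _ in rows]
print(sorted(bag))
print("unique rows in this bag:", len(set(bag)))
```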
67 00:04:36,020 --> 00:04:44,430 So what will happen in sample one: let's say two hundred rows and three columns will get selected, 68 00:04:44,490 --> 00:04:52,520 let's say randomly, randomly. So let's say randomly two hundred rows and three columns will get transferred 69 00:04:52,520 --> 00:04:56,860 to my decision tree one model, and it will do some kind of prediction on this. 70 00:04:57,050 --> 00:05:02,630 Let's say it gives a prediction; let's say we have some regression use case, so it will give a prediction 71 00:05:02,780 --> 00:05:04,490 in the form of some continuous data. 72 00:05:04,490 --> 00:05:11,810 And if we have, let's say, a classification use case, it will give a prediction of a discrete nature, 73 00:05:11,810 --> 00:05:14,090 either one or zero. 74 00:05:14,120 --> 00:05:17,700 And if regression, it will give something of a continuous nature; 75 00:05:17,730 --> 00:05:19,490 let's say ten point eleven, 76 00:05:19,490 --> 00:05:20,220 or anything else. 77 00:05:20,750 --> 00:05:23,440 So this is with respect to my sample one. 78 00:05:23,540 --> 00:05:31,430 Similarly, after it, after that, what will happen: whatever data I had there, it will go back to 79 00:05:31,430 --> 00:05:32,420 my dataset. 80 00:05:32,420 --> 00:05:40,040 It will go back, and now from the dataset, again for my sample two, my sample two, row sampling and feature 81 00:05:40,040 --> 00:05:41,380 sampling will happen. 82 00:05:41,780 --> 00:05:51,140 And again this data will come over here, and my decision tree two will get trained on this particular data, 83 00:05:51,140 --> 00:05:55,740 on this particular sample, and it will do some kind of prediction. 84 00:05:55,760 --> 00:05:57,580 Let's say if it is a regression use case, 85 00:05:57,740 --> 00:06:04,100 then I am going to predict, let's say, some value, and if it is a classification use case, I will predict some 86 00:06:04,100 --> 00:06:05,040 discrete value.
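The row sampling plus feature sampling described above can be sketched with NumPy. The 600 rows and the 200×3 per-tree sample come from the lecture; the 8 total columns are an assumption, since the lecture only says "n columns":

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 600 rows as in the lecture; 8 feature columns is an
# illustrative assumption for the unspecified "n".
X = rng.normal(size=(600, 8))

row_idx = rng.choice(600, size=200, replace=True)   # row sampling, with replacement
col_idx = rng.choice(8, size=3, replace=False)      # feature/column sampling
sample = X[np.ix_(row_idx, col_idx)]                # the subset one tree trains on

print(sample.shape)  # (200, 3)
```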
87 00:06:05,940 --> 00:06:12,890 Similarly, after some row sampling and feature sampling again happen, then I'm going to consider 88 00:06:12,890 --> 00:06:13,770 my sample three. 89 00:06:13,880 --> 00:06:15,920 Let's say this is my decision tree, 90 00:06:16,700 --> 00:06:17,680 decision tree three. 91 00:06:18,020 --> 00:06:26,780 And this decision tree three will get trained on this particular sample, on this particular sample, and 92 00:06:26,780 --> 00:06:30,580 it will give exactly a prediction as well. 93 00:06:30,980 --> 00:06:35,210 So it will exactly give a prediction as well, depending upon what use case we have. 94 00:06:36,640 --> 00:06:38,260 So what will happen here? 95 00:06:38,740 --> 00:06:42,160 So as we know, as we know... let me open a new page. 96 00:06:43,110 --> 00:06:51,330 As we know, so as we know, a decision tree basically has a property, it basically has a property that 97 00:06:51,330 --> 00:06:55,200 it has high variance, that it has high variance. 98 00:06:55,680 --> 00:06:58,380 So what exactly is the meaning of this high variance? 99 00:06:58,680 --> 00:07:07,350 It means basically there is a huge difference between what exactly my prediction is and what the 100 00:07:07,350 --> 00:07:08,430 actual data is. 101 00:07:09,030 --> 00:07:11,610 That is the exact meaning of variance. 102 00:07:12,270 --> 00:07:13,260 So let's say, 103 00:07:14,300 --> 00:07:17,960 on training data, let's say, on training data, 104 00:07:19,120 --> 00:07:24,340 let's say on training data you have some good accuracy; let's say your accuracy is somewhere around 105 00:07:24,610 --> 00:07:25,920 97 percent. 106 00:07:26,050 --> 00:07:34,530 So it means, it means my decision tree, my decision tree is able to learn the relationships very properly. 107 00:07:35,140 --> 00:07:41,310 But when I have some unseen data, when I have some test data, then I have some test data.
108 00:07:41,860 --> 00:07:48,100 So it will not give me that good a prediction, because what does this decision tree look like? It 109 00:07:48,400 --> 00:07:51,300 is nothing but something having some kind of decision rules. 110 00:07:51,760 --> 00:07:52,600 But what if, 111 00:07:52,960 --> 00:07:53,740 but what if 112 00:07:54,830 --> 00:08:04,070 I have a data point that has, that has some very extreme outlier, or I can say that is far beyond, that is 113 00:08:04,070 --> 00:08:07,480 far beyond the range of any one node of this tree? 114 00:08:07,490 --> 00:08:14,310 So it will basically give us a lower score, say 60 percent, in case of test data. 115 00:08:14,480 --> 00:08:20,060 So this is basically my variance: in case of training data you have a high score, whereas in case 116 00:08:20,060 --> 00:08:21,550 of test data you have a low score. 117 00:08:21,890 --> 00:08:24,320 But what will random forest do? 118 00:08:24,710 --> 00:08:31,670 It will convert this high variance, this high variance, into low variance. 119 00:08:32,560 --> 00:08:40,570 But the question is how, how exactly will this random forest convert this high variance into low variance? 120 00:08:40,570 --> 00:08:49,030 Because if you will notice, if you will notice, my decision tree one will get trained on some particular 121 00:08:49,030 --> 00:08:49,500 data. 122 00:08:49,510 --> 00:08:54,470 So it will basically become an expert for this particular subset. 123 00:08:54,640 --> 00:08:59,230 So it will basically give some kind of prediction; decision tree two will give some kind of prediction; 124 00:08:59,230 --> 00:09:05,350 and decision tree three will get trained and do some kind of prediction; and then we have our final prediction, then 125 00:09:05,350 --> 00:09:06,700 we have a final prediction. 126 00:09:06,760 --> 00:09:13,670 So basically each and every decision tree will get trained specifically on some subset of the data, 127 00:09:13,690 --> 00:09:16,620 so they basically become experts on that data.
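The high-train-score, low-test-score behaviour described above can be reproduced with a small scikit-learn experiment. Every number here (dataset size, feature counts, seeds) is an illustrative choice, not taken from the lecture, which quoted 97% train versus 60% test only as an example:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data (parameters are illustrative assumptions)
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# An unpruned tree memorises its training data (train accuracy 1.0) but
# drops on unseen data; averaging many bagged trees narrows that gap.
print("tree   train/test:", tree.score(X_tr, y_tr), tree.score(X_te, y_te))
print("forest train/test:", forest.score(X_tr, y_tr), forest.score(X_te, y_te))
```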
128 00:09:16,810 --> 00:09:18,840 That's why they convert, 129 00:09:19,180 --> 00:09:23,230 that's why they convert this high variance into low variance. 130 00:09:23,380 --> 00:09:29,800 So this is basically the main characteristic of random forest: it can convert that high variance, which 131 00:09:29,800 --> 00:09:32,980 is caused by the decision tree, into this low variance. 132 00:09:33,370 --> 00:09:35,170 So let me open the previous page. 133 00:09:35,560 --> 00:09:42,610 You will see what we have here: decision tree one, two and three. So here, let's say you have a regression 134 00:09:42,610 --> 00:09:43,190 use case. 135 00:09:43,480 --> 00:09:45,070 So what will the random forest do? 136 00:09:45,370 --> 00:09:52,710 What it will do: it will consider the mean of all the predictions given by these trees. 137 00:09:52,750 --> 00:09:58,710 So what it will do: it will consider the mean of decision tree one, two and three. 138 00:09:58,900 --> 00:10:06,130 So it will basically, basically consider the mean of these, whereas in case of classification it will basically 139 00:10:06,460 --> 00:10:08,020 go through majority voting, 140 00:10:08,020 --> 00:10:12,100 or I can say it will consider the mode of all the predictions. 141 00:10:12,380 --> 00:10:18,280 Whichever class has the highest count, my random forest will consider it as my final prediction. 142 00:10:19,030 --> 00:10:21,390 That's what happens in case of classification. 143 00:10:21,550 --> 00:10:28,660 In case of regression, it will basically consider the mean of the predictions. And about interpretation, if I 144 00:10:28,810 --> 00:10:29,200 talk, 145 00:10:29,350 --> 00:10:31,840 so if I will talk about interpretation, 146 00:10:31,840 --> 00:10:32,100 right, 147 00:10:32,260 --> 00:10:34,340 it is difficult to plot a random forest. 148 00:10:34,360 --> 00:10:41,250 It is difficult for us because it is nothing but a collection of multiple decision trees.
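The two aggregation rules above, mean for regression and mode (majority vote) for classification, fit in a few lines. The per-tree outputs below are hypothetical values for a single input row:

```python
from statistics import mean, mode

# Hypothetical per-tree outputs for one input row
regression_preds = [10.11, 9.80, 10.40]  # each tree predicts a number
classification_preds = [1, 0, 1]         # each tree predicts a class

print(mean(regression_preds))     # regression forest: average the tree outputs
print(mode(classification_preds)) # classification forest: majority class -> 1
```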
149 00:10:41,560 --> 00:10:47,740 And you will see with a decision tree: if you have one decision tree, you can easily interpret it, you can easily 150 00:10:47,740 --> 00:10:48,310 interpret it. 151 00:10:48,460 --> 00:10:55,000 If you have to plot it, you can easily plot that decision tree using some libraries, some libraries, 152 00:10:55,000 --> 00:10:57,370 using some functions like export 153 00:10:57,370 --> 00:11:01,630 underscore graphviz, which is exactly from a library in Python, 154 00:11:01,900 --> 00:11:10,460 and plot_tree. Using this, using both of these, you can easily plot your decision tree, but it is difficult 155 00:11:10,460 --> 00:11:15,310 to plot a random forest because it is nothing but a collection of, let's say, hundreds of decision trees. 156 00:11:15,520 --> 00:11:17,430 So it will become a little bit difficult; 157 00:11:17,440 --> 00:11:21,270 it will become practically difficult to interpret your random forest. 158 00:11:21,430 --> 00:11:24,740 That, if I can say, that is a drawback of random forest. 159 00:11:25,150 --> 00:11:26,640 So that's all about the session. 160 00:11:26,680 --> 00:11:28,540 Hopefully you will love the session very much. 161 00:11:28,780 --> 00:11:32,200 And this was my easiest way of explaining random forest to you. 162 00:11:32,350 --> 00:11:34,210 I hope you will love the session very much. 163 00:11:34,240 --> 00:11:34,810 Thank you. 164 00:11:35,140 --> 00:11:36,010 Have a nice day. 165 00:11:36,190 --> 00:11:36,940 Keep learning. 166 00:11:36,940 --> 00:11:37,660 Keep growing. 167 00:11:37,900 --> 00:11:38,710 Keep practicing.
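Assuming the garbled library names above refer to scikit-learn's tree-export utilities (`export_graphviz`, `plot_tree`, and the related `export_text`), visualising a single fitted tree looks like this; the Iris dataset and depth limit are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Plain-text rendering: one line per node/split of the fitted tree
print(export_text(tree))

# Graphviz DOT source; render it with the graphviz package or `dot -Tpng`.
# sklearn.tree.plot_tree(tree) draws the same tree via matplotlib.
dot = export_graphviz(tree, filled=True)
print(dot[:40])
```

A random forest has no single-call equivalent; you would have to export each of its hundreds of `estimators_` one by one, which is the interpretability drawback the session ends on.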