1 00:00:00,660 --> 00:00:05,520 In this video, we will learn how to execute bagging in Python. 2 00:00:07,970 --> 00:00:13,280 But before starting, let's have a look at the documentation of bagging. 3 00:00:14,400 --> 00:00:15,330 On scikit-learn. 4 00:00:17,410 --> 00:00:20,920 I have provided you the link to this documentation. 5 00:00:22,100 --> 00:00:23,960 You can also search for it on Google. 6 00:00:23,990 --> 00:00:28,550 Just search for scikit-learn and bagging and you will get this documentation. 7 00:00:31,210 --> 00:00:33,940 This is the official scikit-learn documentation. 8 00:00:34,030 --> 00:00:41,290 And if you need any help regarding any of the machine learning algorithms, you can always look at scikit-learn 9 00:00:41,530 --> 00:00:43,360 for an in-depth explanation. 10 00:00:45,300 --> 00:00:48,090 So at the top, you can see this is our syntax. 11 00:00:49,730 --> 00:00:51,480 All the parameters are listed here. 12 00:00:52,550 --> 00:01:00,010 And if you scroll down, you can get the definition of each parameter. We will go through all the parameters 13 00:01:00,020 --> 00:01:00,670 one by one. 14 00:01:02,860 --> 00:01:04,860 So the first one is the base estimator. 15 00:01:05,560 --> 00:01:13,170 So here you have to provide the classification or regression model on which you want to apply bagging. 16 00:01:15,330 --> 00:01:21,970 So for our example, we will use our decision tree classifier and apply the bagging method on that classifier. 17 00:01:23,050 --> 00:01:24,670 Next is n_estimators. 18 00:01:25,840 --> 00:01:34,300 You know that we are going to make subsets of our data and create a tree on each of those subsets. 19 00:01:34,660 --> 00:01:41,710 So here n_estimators means the number of base estimators in the ensemble, or the number of subsets we want 20 00:01:41,710 --> 00:01:43,570 to create from our data. 21 00:01:45,820 --> 00:01:47,100 The default value is ten. 22 00:01:48,040 --> 00:01:50,290 But for our example, we will use a thousand.
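The two parameters described so far can be sketched like this. This is a minimal illustration based on the scikit-learn documentation, not the exact notebook code from the video; note that scikit-learn 1.2+ renamed the keyword `base_estimator` to `estimator`, so the base model is passed positionally here to work in either version.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

# First argument: the base estimator that bagging is applied to.
# n_estimators: how many bootstrap subsets / trees to build (default is 10).
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=1000)
print(bag.n_estimators)  # 1000
```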
23 00:01:52,620 --> 00:01:54,940 Now next is max_samples. 24 00:01:55,110 --> 00:02:01,470 This is the number of samples we want to draw from our data frame. 25 00:02:01,530 --> 00:02:04,000 If we want to draw, suppose, 80 percent of the data 26 00:02:04,050 --> 00:02:08,040 each time, we can just write 0.8 instead of one. 27 00:02:08,220 --> 00:02:12,840 By default, it is set to one, and one means a hundred percent of our data. 28 00:02:14,250 --> 00:02:17,130 Now the next parameter is max_features. 29 00:02:17,760 --> 00:02:26,790 This is the number of variables we want in each of our estimators. Since in bagging we use all 30 00:02:26,820 --> 00:02:27,750 the variables, 31 00:02:28,530 --> 00:02:34,170 we will use the default value of one. The next parameter is bootstrap. 32 00:02:35,400 --> 00:02:44,010 While taking the samples out of our original data frame, we can either take these samples with replacement 33 00:02:44,520 --> 00:02:48,590 or we can take these samples out without replacement. 34 00:02:49,380 --> 00:02:53,370 Now, in bagging, we always take samples with replacement. 35 00:02:53,550 --> 00:02:56,120 So we will use bootstrap equal to true. 36 00:02:59,570 --> 00:03:04,710 Again, we have a parameter for bootstrap_features here also. 37 00:03:05,450 --> 00:03:12,020 If we are taking a subset of our features, we can either take our features with replacement or without 38 00:03:12,020 --> 00:03:12,650 replacement. 39 00:03:13,900 --> 00:03:18,520 With replacement means, suppose we are taking five features out of twenty features. 40 00:03:19,390 --> 00:03:23,230 We first select one feature out of the twenty features. 41 00:03:24,310 --> 00:03:27,730 Then for the second feature, we again consider all 20 features. 42 00:03:28,630 --> 00:03:32,270 So there is some chance that some features may get repeated. 43 00:03:35,620 --> 00:03:37,200 Next is oob_score.
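A small sketch showing these four sampling parameters together; the values here are illustrative, not the ones used later in the video.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

bag = BaggingClassifier(
    DecisionTreeClassifier(),
    max_samples=0.8,           # draw 80% of the rows for each tree (default 1.0 = 100%)
    max_features=1.0,          # bagging uses all the variables, so keep the default
    bootstrap=True,            # rows are sampled with replacement
    bootstrap_features=False,  # features, if subsetted, are drawn without replacement
)
print(bag.max_samples)  # 0.8
```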
44 00:03:39,680 --> 00:03:45,140 We have not discussed out-of-bag samples, but just to give you a small explanation: 45 00:03:45,170 --> 00:03:49,020 suppose we are taking a subset of our data for each tree. 46 00:03:49,280 --> 00:03:52,100 The remaining samples are known as out-of-bag samples. 47 00:03:54,850 --> 00:03:57,230 The next parameter here is warm_start. 48 00:03:57,790 --> 00:04:05,050 If we have previously created a bagging model, we can use the result of that bagging model to start 49 00:04:05,050 --> 00:04:07,540 our new bagging model. 50 00:04:07,570 --> 00:04:15,490 By default, this is set to false. Now, the n_jobs parameter is for optimizing performance. 51 00:04:16,030 --> 00:04:22,000 So if you have a multi-core processor in your computer, you can select minus one. 52 00:04:22,600 --> 00:04:27,790 If we select minus one, this means that we will use all the processing power of the computer. 53 00:04:30,450 --> 00:04:37,440 Now, next is random_state. Since we are going to take samples of our data, and these samples 54 00:04:37,440 --> 00:04:43,620 are random samples, we can provide a random_state so that we can reproduce our results. 55 00:04:44,590 --> 00:04:49,530 So this is just a random number; you can give either zero, one, two, 56 00:04:49,780 --> 00:04:51,190 or any integer value. 57 00:04:52,060 --> 00:04:57,750 And if you stick to that value, you will always get the same result for your bagging process. 58 00:05:01,200 --> 00:05:03,300 The next parameter is verbose. 59 00:05:04,080 --> 00:05:11,790 While running any model, you get some messages, so the verbose parameter controls the amount of 60 00:05:11,940 --> 00:05:16,650 information you get as output while running that model. 61 00:05:18,360 --> 00:05:22,320 This is not related to the performance of our bagging model. 62 00:05:26,900 --> 00:05:28,690 So let's go back to our 63 00:05:29,660 --> 00:05:30,790 Jupyter notebook.
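Before moving to the notebook, here is a sketch tying the remaining parameters together on synthetic data; the dataset and the seed 0 are placeholders, not the course data.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=300, random_state=0)

bag = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    oob_score=True,   # score the ensemble on the out-of-bag samples
    n_jobs=-1,        # use all the processing power of the computer
    random_state=0,   # fixed seed so the result is reproducible
    verbose=0,        # no progress messages
).fit(X, y)

# Out-of-bag estimate of accuracy, with no separate test set needed:
print(round(bag.oob_score_, 3))
```

Because `random_state` is fixed, rerunning this cell reproduces the same score.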
64 00:05:31,270 --> 00:05:39,040 And here we will first create a basic classification tree, because we are going to use that classification tree 65 00:05:39,100 --> 00:05:42,730 as the base estimator parameter of our bagging classifier. 66 00:05:45,800 --> 00:05:55,060 So we will import tree from sklearn and we will create clf_tree with a DecisionTreeClassifier, and 67 00:05:55,060 --> 00:05:58,200 you know that in bagging we grow full trees. 68 00:05:58,750 --> 00:06:04,940 That's why we are not providing any parameter for max depth, minimum leaf size, or minimum 69 00:06:05,010 --> 00:06:06,040 samples split. 70 00:06:06,430 --> 00:06:07,630 We are going for a full tree. 71 00:06:10,250 --> 00:06:11,120 Let's run this. 72 00:06:12,230 --> 00:06:14,660 Now we will import BaggingClassifier. 73 00:06:16,320 --> 00:06:18,600 This bagging classifier is available 74 00:06:19,800 --> 00:06:27,950 in sklearn.ensemble. We will run this, and our bagging classifier name is bag_clf. 75 00:06:28,060 --> 00:06:29,220 This is a variable name. 76 00:06:29,890 --> 00:06:32,130 Then we will call BaggingClassifier. 77 00:06:32,980 --> 00:06:35,770 And our first parameter here is the base estimator. 78 00:06:36,520 --> 00:06:44,200 Since we want to use the decision tree classifier, we will give the base estimator as clf_tree. 79 00:06:45,130 --> 00:06:49,360 Now here we want a thousand trees. 80 00:06:49,960 --> 00:06:52,960 That's why we are putting a thousand here. By default, 81 00:06:52,990 --> 00:06:53,970 this value is ten. 82 00:06:54,130 --> 00:07:01,940 So if you have a small dataset, you can increase this number to 1000, 2000, 3000, etc. 83 00:07:02,830 --> 00:07:06,160 So for our example, we are taking it as a thousand. 84 00:07:07,420 --> 00:07:13,450 In a way, we are saying that we are creating a thousand different trees and we are going 85 00:07:13,450 --> 00:07:17,290 to average out the results of those trees to get our final value.
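The construction described above looks roughly like this. The variable names are a reconstruction of the garbled audio, the seed 42 is a placeholder (the exact value spoken in the video is unclear), and the tree is passed positionally because scikit-learn 1.2+ renamed `base_estimator` to `estimator`.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

# A fully grown tree: no max_depth, min_samples_leaf, or min_samples_split limits.
clf_tree = DecisionTreeClassifier()

bag_clf = BaggingClassifier(
    clf_tree,           # the base estimator to bag
    n_estimators=1000,  # a thousand trees whose results are averaged
    bootstrap=True,     # bootstrap subsets drawn with replacement
    n_jobs=-1,          # use all CPU cores
    random_state=42,    # placeholder seed; any fixed integer works
)
```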
86 00:07:19,210 --> 00:07:27,550 Next is bootstrap equal to true, since we want the subsets for each of these thousand trees to 87 00:07:27,550 --> 00:07:30,490 be taken with replacement from our original dataset. 88 00:07:30,850 --> 00:07:33,510 That's why we have kept bootstrap equal to true. 89 00:07:34,600 --> 00:07:37,930 And then the next parameter is n_jobs equal to minus one, 90 00:07:38,050 --> 00:07:41,200 since we want the full processing power of our computer. 91 00:07:42,320 --> 00:07:45,430 That's why we have taken n_jobs as minus one. 92 00:07:45,820 --> 00:07:50,230 And we are giving a value for random_state. 93 00:07:50,260 --> 00:07:51,580 This is just a random number. 94 00:07:51,580 --> 00:07:52,630 You can give zero, 95 00:07:52,690 --> 00:07:54,430 one, two, three, etc. 96 00:07:55,180 --> 00:08:00,520 So if I don't give this number, the next time I run the same code I will get a somewhat different 97 00:08:00,520 --> 00:08:00,940 result. 98 00:08:02,170 --> 00:08:09,270 So to reproduce the same result, it is important to set random_state to some constant number. 99 00:08:11,480 --> 00:08:20,570 Let's create this bagging object, and we will fit the bagging object on our X_train and y_train variables. 100 00:08:22,740 --> 00:08:27,330 It will take some time, as we are creating a thousand trees this time. 101 00:08:33,170 --> 00:08:41,260 Now we have fit our model. It may take two to three minutes for you if you are not running it 102 00:08:41,360 --> 00:08:42,560 on an efficient machine. 103 00:08:46,320 --> 00:08:46,740 Next, 104 00:08:48,170 --> 00:08:52,950 we are going to run a confusion matrix on our test data here. 105 00:08:53,210 --> 00:08:54,650 These are our true values. 106 00:08:54,860 --> 00:09:00,170 And instead of creating another variable, we are just calling bag_clf 107 00:09:01,070 --> 00:09:04,010 and we are predicting the values for X_test here.
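A self-contained version of this fit-and-evaluate step, using a synthetic stand-in for the course's train/test split and fewer trees so it runs quickly:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

# Synthetic stand-in for the video's dataset.
X, y = make_classification(n_samples=400, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100,
    bootstrap=True, n_jobs=-1, random_state=0,
).fit(X_train, y_train)

# Predict inline rather than storing predictions in a separate variable.
cm = confusion_matrix(y_test, bag_clf.predict(X_test))
print(cm)  # rows = actual classes, columns = predicted classes
```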
108 00:09:04,520 --> 00:09:10,560 So you can also use this instead of creating a separate variable each time and then calling that 109 00:09:10,560 --> 00:09:10,720 variable. 110 00:09:12,530 --> 00:09:18,200 So let's run this to create our confusion matrix. 111 00:09:21,510 --> 00:09:27,770 Again, rows are for actual values, so this 28 and this 36 are for actual values. 112 00:09:28,920 --> 00:09:30,690 The first row is zero 113 00:09:30,720 --> 00:09:31,860 and the second is one. 114 00:09:32,190 --> 00:09:37,120 So the rows are for actual values and the columns are for predicted values. 115 00:09:37,210 --> 00:09:42,630 When you see 28, it means zero as actual and zero as predicted. 116 00:09:44,840 --> 00:09:47,590 So our model predicted these 28 117 00:09:47,670 --> 00:09:49,200 observations correctly. 118 00:09:49,460 --> 00:09:56,880 And also these 36 observations correctly, because our actual value is also one for these 36, 119 00:09:57,440 --> 00:10:00,620 and the predicted value is also one for these 36. 120 00:10:01,740 --> 00:10:08,460 And the accuracy score will be 28 plus 36, divided by the total number of observations, or we can directly 121 00:10:08,460 --> 00:10:10,640 call accuracy_score. 122 00:10:13,490 --> 00:10:15,500 So let's find out the accuracy. 123 00:10:16,480 --> 00:10:22,360 So our accuracy for the bagging model is sixty two point seven percent. 124 00:10:24,320 --> 00:10:26,650 If you remember, earlier, 125 00:10:28,710 --> 00:10:36,630 when we were using a single decision tree, our accuracy score was fifty four point nine percent. 126 00:10:36,750 --> 00:10:43,320 Then after pruning, our accuracy score increased to fifty five point eight eight percent. 127 00:10:43,920 --> 00:10:48,470 And now with bagging, our accuracy score is 62 percent. 128 00:10:49,230 --> 00:10:54,770 So you can see that creating a large number of trees improved our accuracy score.
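The arithmetic described above, that accuracy equals the diagonal of the confusion matrix divided by the total count, can be checked on a tiny hand-made example; these labels are invented for illustration, not the course data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
# Correct predictions sit on the diagonal (actual class == predicted class).
manual = np.trace(cm) / cm.sum()
print(manual, accuracy_score(y_true, y_pred))  # both 4/6 = 0.666...
```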
129 00:10:55,200 --> 00:11:03,420 And in the next lecture, we will look at the random forest method to further improve 130 00:11:03,450 --> 00:11:04,050 this score. 131 00:11:04,530 --> 00:11:04,920 Thank you.