1 00:00:00,970 --> 00:00:04,630 So in this video, we will learn how to do the bagging technique. 2 00:00:05,670 --> 00:00:08,070 We expect that after creating 3 00:00:09,160 --> 00:00:16,330 a model based on bagging, the performance of our model would be better than the pruned or the full tree. 4 00:00:18,960 --> 00:00:24,150 In R, bagging can be done using the randomForest package only. 5 00:00:25,320 --> 00:00:31,170 In R, bagging is like a special case of random forest. In bagging, 6 00:00:31,320 --> 00:00:37,830 we'll be using all the variables; only the dataset will be bootstrapped, and multiple training sets 7 00:00:37,830 --> 00:00:38,600 will be created. 8 00:00:40,090 --> 00:00:46,180 In random forest, there may be fewer variables, but the dataset will again be bootstrapped. 9 00:00:46,960 --> 00:00:51,970 So since bootstrapping is being done in both, there is just a difference in the number of variables to be considered. 10 00:00:52,690 --> 00:00:57,260 If I mention that I want all the variables to be considered, it will be considered as bagging. 11 00:00:57,880 --> 00:01:03,160 If I say that I want only a few of the variables to be considered, it will be random forest. 12 00:01:04,190 --> 00:01:09,770 So we will use the randomForest package only, to do both bagging and random forest. 13 00:01:11,830 --> 00:01:15,500 So the first thing is we have to install this package, randomForest. 14 00:01:15,880 --> 00:01:17,000 I'll go and run this command. 15 00:01:20,310 --> 00:01:23,580 Note that the F of Forest is capital. 16 00:01:29,870 --> 00:01:31,260 So randomForest is installed. 17 00:01:34,420 --> 00:01:37,180 You can scroll and check that there is randomForest. 18 00:01:37,980 --> 00:01:41,560 You can either tick it here or run the library command. 19 00:01:44,370 --> 00:01:47,240 And this randomForest is now loaded. 20 00:01:48,600 --> 00:01:49,630 Next comes set.seed. 21 00:01:50,250 --> 00:01:53,910 As I told you earlier, this is for reproducibility of the result.
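The distinction described above (bagging considers all predictors at every split, random forest considers only a random subset, and both bootstrap the training data) can be sketched outside R as well. The video works in R with the randomForest package; this is a hedged Python analogue using scikit-learn, with synthetic data and parameter values assumed purely for illustration:

```python
# Sketch (assumption: scikit-learn analogue of R's randomForest).
# max_features=None  -> all predictors considered at each split: bagging.
# max_features=k < p -> a random subset of k predictors per split: random forest.
# Both fit each tree on a bootstrapped sample (bootstrap=True is the default).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 17))          # 17 predictors, as in the lesson
y = 3 * X[:, 0] + rng.normal(size=200)  # synthetic target

bagging = RandomForestRegressor(n_estimators=100, max_features=None,
                                random_state=0).fit(X, y)   # bagging
forest = RandomForestRegressor(n_estimators=100, max_features=5,
                               random_state=0).fit(X, y)    # random forest
```

The only difference between the two fitted models is how many predictors each split is allowed to choose from, which mirrors the point made in the lesson.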
22 00:01:54,090 --> 00:01:59,460 So if you set the seed to zero, you and I will get the same result. Set the seed. 23 00:02:03,150 --> 00:02:04,920 So now, to train our model, 24 00:02:06,500 --> 00:02:13,560 we will use this randomForest function. bagging is the name of the variable which will get the information 25 00:02:13,720 --> 00:02:21,690 of the bagged model. So bagging gets information from randomForest, where the first parameter is the formula. 26 00:02:22,740 --> 00:02:25,520 The formula is the same as earlier. 27 00:02:25,740 --> 00:02:28,620 We have Collection as the variable to be predicted. 28 00:02:29,370 --> 00:02:35,840 After that, we need to give the variables that are going to be used for prediction, that is, 29 00:02:35,840 --> 00:02:36,930 the predictor variables. 30 00:02:37,710 --> 00:02:39,900 Since we are going to use all the variables, 31 00:02:40,410 --> 00:02:43,140 I have put a dot after the tilde. 32 00:02:45,260 --> 00:02:48,920 The next parameter is data; this is our train dataset. 33 00:02:50,560 --> 00:02:52,200 The last parameter is mtry. 34 00:02:52,620 --> 00:02:52,920 Right. 35 00:02:54,530 --> 00:03:00,420 mtry means how many of the predictor variables we want to consider while building our model. 36 00:03:01,380 --> 00:03:04,770 If we use all the predictors to create all the models, 37 00:03:05,700 --> 00:03:08,130 this will be bagging, since in bagging 38 00:03:08,130 --> 00:03:12,270 we have all the predictors, only different training datasets. 39 00:03:13,930 --> 00:03:18,880 If we reduce the number in mtry, it becomes a case of random forest. 40 00:03:19,870 --> 00:03:24,480 So mtry for us is going to be 17, because in the train dataset, 41 00:03:26,040 --> 00:03:30,330 if you look on the right, the train dataset has 18 variables. 42 00:03:30,960 --> 00:03:33,150 One of them is the dependent variable. 43 00:03:33,210 --> 00:03:36,360 So therefore, we have 17 independent predictor variables.
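The training call described here is R's randomForest(Collection ~ ., data = train, mtry = 17), with mtry set to the number of predictor columns. A hedged sketch of the same idea in Python with scikit-learn, where the column names and synthetic data are assumptions rather than the video's actual dataset:

```python
# Assumed setup: a train frame with 18 columns, one of which is the
# dependent variable (Collection), leaving 17 predictors.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
predictors = [f"x{i}" for i in range(1, 18)]              # 17 predictor columns
train = pd.DataFrame(rng.normal(size=(150, 17)), columns=predictors)
train["Collection"] = 10 * train["x1"] + rng.normal(size=150)  # 18th column

mtry = train.shape[1] - 1   # 18 variables minus the dependent one -> 17

# Analogue of randomForest(Collection ~ ., data = train, mtry = 17):
# max_features=mtry lets every split consider all 17 predictors, i.e. bagging.
model = RandomForestRegressor(n_estimators=500, max_features=mtry,
                              random_state=0)
model.fit(train.drop(columns="Collection"), train["Collection"])
```

Lowering mtry below 17 would turn the same call into an ordinary random forest, exactly as the lesson describes.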
44 00:03:36,720 --> 00:03:38,270 So mtry is equal to 17. 45 00:03:39,190 --> 00:03:39,960 I'll run this command. 46 00:03:43,680 --> 00:03:51,000 Now, there is a variable called bagging created. bagging contains the information of the bagged model. 47 00:03:53,180 --> 00:04:02,990 Now, using that model, we can predict the values on our test set. So we'll create a new column in the test dataset, 48 00:04:03,230 --> 00:04:05,270 called test$bagging. 49 00:04:06,630 --> 00:04:11,100 So, again, to predict the values using this model, we use the predict function. 50 00:04:12,130 --> 00:04:13,320 test$bagging 51 00:04:14,170 --> 00:04:16,960 is the variable that will get these values from the predict function. 52 00:04:17,410 --> 00:04:22,990 The first parameter is the model name, which is bagging, and the second parameter is the test set. 53 00:04:24,410 --> 00:04:27,200 So we predict with bagging on test. 54 00:04:27,400 --> 00:04:28,250 I'll run this command. 55 00:04:30,030 --> 00:04:35,970 And now the predicted values using the bagging model are saved in test$bagging. 56 00:04:37,150 --> 00:04:43,920 Now, to find out the MSE of these predicted bagging values, we'll use this formula. 57 00:04:44,720 --> 00:04:51,410 The MSE of bagging gets its value from the difference between the predicted and actual values, squared. 58 00:04:52,140 --> 00:04:54,480 And then we find the mean of these values. 59 00:04:54,660 --> 00:04:57,500 So we'll run this command to get the MSE of bagging. 60 00:04:59,210 --> 00:05:05,480 Now you can see that the MSE of bagging has a value here, which is nearly 52 million. 61 00:05:06,750 --> 00:05:09,420 You can compare this value with the other MSEs. 62 00:05:10,750 --> 00:05:16,930 This MSE was the value of a tree that we created with a depth of three levels. 63 00:05:17,830 --> 00:05:25,030 It had an MSE value of one hundred thirteen million. When we created the full tree, that is, 64 00:05:25,120 --> 00:05:27,490 there was no restriction on the number of levels,
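The prediction and evaluation steps just described, storing predict(bagging, test) in test$bagging and then taking the mean of the squared differences between predicted and actual values, can be mirrored in Python. The synthetic data below is an assumption for illustration, not the video's dataset:

```python
# Python analogue (assumed data) of the video's R steps:
# test$bagging <- predict(bagging, test)
# mse <- mean((test$bagging - test$Collection)^2)
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 17))            # 17 predictors, as in the lesson
y = 5 * X[:, 0] + rng.normal(size=300)    # synthetic target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_features=None -> every split may use all 17 predictors, i.e. bagging
bagging = RandomForestRegressor(n_estimators=200, max_features=None,
                                random_state=0).fit(X_train, y_train)

preds = bagging.predict(X_test)               # analogue of test$bagging
mse = float(np.mean((preds - y_test) ** 2))   # mean of squared differences
```

Computed this way, the bagging MSE can be compared directly against the MSEs of the earlier trees, which is exactly the comparison the video makes next.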
65 00:05:28,480 --> 00:05:31,660 then we had an MSE value of ninety-nine million. 66 00:05:32,770 --> 00:05:40,060 When we pruned the tree, we got some improvement, and we had an MSE value of 97 million. 67 00:05:40,690 --> 00:05:41,980 But when we are using bagging, 68 00:05:43,120 --> 00:05:46,180 the MSE value has reduced significantly. 69 00:05:46,540 --> 00:05:49,720 And now the MSE is nearly 52 to 53 million. 70 00:05:50,140 --> 00:05:57,610 So you can see clearly there is a huge improvement in prediction accuracy when we do bagging. 71 00:05:59,470 --> 00:06:01,210 The downside of using bagging is, 72 00:06:02,390 --> 00:06:07,640 unlike earlier, when you were able to plot the decision tree very easily, 73 00:06:08,360 --> 00:06:11,210 that is not possible when you have done bagging. 74 00:06:12,150 --> 00:06:19,050 So by trading off interpretability, you are able to get better prediction 75 00:06:19,050 --> 00:06:19,470 accuracy.