1 00:00:00,970 --> 00:00:04,630 So in this video, we will learn how to do the bagging technique. 2 00:00:05,670 --> 00:00:08,070 We expect that after creating 3 00:00:09,160 --> 00:00:16,330 a model based on bagging, the performance of our model would be better than the pruned or the full tree. 4 00:00:18,960 --> 00:00:24,150 In R, bagging can be done using the randomForest package only. 5 00:00:25,320 --> 00:00:31,170 In R, bagging is like a special case of random forest. In bagging, 6 00:00:31,320 --> 00:00:37,830 we'll be using all the variables; only the dataset will be bootstrapped, and multiple training sets 7 00:00:37,830 --> 00:00:38,600 will be created. 8 00:00:40,090 --> 00:00:46,180 In random forest, there may be fewer variables, but the dataset will again be bootstrapped. 9 00:00:46,960 --> 00:00:51,970 So since bootstrapping is being done in both, there is just a difference in the number of variables to be considered. 10 00:00:52,690 --> 00:00:57,260 If I mention that I want all the variables to be considered, it will be considered as bagging. 11 00:00:57,880 --> 00:01:03,160 If I say that I want only a few of the variables to be considered, it will be random forest. 12 00:01:04,190 --> 00:01:09,770 So we will use the randomForest package only, to do both bagging and random forest. 13 00:01:11,830 --> 00:01:15,500 So the first thing is we have to install this package, randomForest. 14 00:01:15,880 --> 00:01:17,000 I'll go and run this command. 15 00:01:20,310 --> 00:01:23,580 Note that the F of Forest is capital. 16 00:01:29,870 --> 00:01:31,260 So randomForest is installed. 17 00:01:34,420 --> 00:01:37,180 You can scroll and check that there is randomForest. 18 00:01:37,980 --> 00:01:41,560 You can either tick it here or run the library command. 19 00:01:44,370 --> 00:01:47,240 And this randomForest is now loaded. 20 00:01:48,600 --> 00:01:49,630 Next comes set.seed. 21 00:01:50,250 --> 00:01:53,910 As I told you earlier, this is for reproducibility of the result.
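The distinction described above (bagging considers all predictors at every split, random forest considers only a random subset, and both bootstrap the training data) can be sketched outside R as well. The video works in R with the randomForest package; this is a hedged Python analogue using scikit-learn, with synthetic data and parameter values assumed purely for illustration:

```python
# Sketch (assumption: scikit-learn analogue of R's randomForest).
# max_features=None  -> all predictors considered at each split: bagging.
# max_features=k < p -> a random subset of k predictors per split: random forest.
# Both fit each tree on a bootstrapped sample (bootstrap=True is the default).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 17))          # 17 predictors, as in the lesson
y = 3 * X[:, 0] + rng.normal(size=200)  # synthetic target

bagging = RandomForestRegressor(n_estimators=100, max_features=None,
                                random_state=0).fit(X, y)   # bagging
forest = RandomForestRegressor(n_estimators=100, max_features=5,
                               random_state=0).fit(X, y)    # random forest
```

The only difference between the two fitted models is how many predictors each split is allowed to choose from, which mirrors the point made in the lesson.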
22 00:01:54,090 --> 00:01:59,460 So if you set the seed to zero, you and I will get the same result. Set the seed. 23 00:02:03,150 --> 00:02:04,920 So now, to train our model, 24 00:02:06,500 --> 00:02:13,560 we will use this randomForest function. bagging is the name of the variable which will get the information 25 00:02:13,720 --> 00:02:21,690 of the bagged model. So bagging gets information from randomForest, where the first parameter is the formula. 26 00:02:22,740 --> 00:02:25,520 The formula is the same as earlier. 27 00:02:25,740 --> 00:02:28,620 We have Collection as the variable to be predicted. 28 00:02:29,370 --> 00:02:35,840 After that, we need to give the variables that are going to be used for prediction, that is, 29 00:02:35,840 --> 00:02:36,930 the predictor variables. 30 00:02:37,710 --> 00:02:39,900 Since we are going to use all the variables, 31 00:02:40,410 --> 00:02:43,140 I have put a dot after the tilde. 32 00:02:45,260 --> 00:02:48,920 The next parameter is data; this is our train dataset. 33 00:02:50,560 --> 00:02:52,200 The last parameter is mtry. 34 00:02:52,620 --> 00:02:52,920 Right. 35 00:02:54,530 --> 00:03:00,420 mtry means how many of the predictor variables we want to consider while building our model. 36 00:03:01,380 --> 00:03:04,770 If we use all the predictors to create all the models, 37 00:03:05,700 --> 00:03:08,130 this will be bagging, since in bagging 38 00:03:08,130 --> 00:03:12,270 we have all the predictors, only different training datasets. 39 00:03:13,930 --> 00:03:18,880 If we reduce the number in mtry, it becomes a case of random forest. 40 00:03:19,870 --> 00:03:24,480 So mtry for us is going to be 17, because in the train dataset, 41 00:03:26,040 --> 00:03:30,330 if you look on the right, the train dataset has 18 variables. 42 00:03:30,960 --> 00:03:33,150 One of them is the dependent variable. 43 00:03:33,210 --> 00:03:36,360 So therefore, we have 17 independent predictor variables.
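The training call described here is R's randomForest(Collection ~ ., data = train, mtry = 17), with mtry set to the number of predictor columns. A hedged sketch of the same idea in Python with scikit-learn, where the column names and synthetic data are assumptions rather than the video's actual dataset:

```python
# Assumed setup: a train frame with 18 columns, one of which is the
# dependent variable (Collection), leaving 17 predictors.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
predictors = [f"x{i}" for i in range(1, 18)]              # 17 predictor columns
train = pd.DataFrame(rng.normal(size=(150, 17)), columns=predictors)
train["Collection"] = 10 * train["x1"] + rng.normal(size=150)  # 18th column

mtry = train.shape[1] - 1   # 18 variables minus the dependent one -> 17

# Analogue of randomForest(Collection ~ ., data = train, mtry = 17):
# max_features=mtry lets every split consider all 17 predictors, i.e. bagging.
model = RandomForestRegressor(n_estimators=500, max_features=mtry,
                              random_state=0)
model.fit(train.drop(columns="Collection"), train["Collection"])
```

Lowering mtry below 17 would turn the same call into an ordinary random forest, exactly as the lesson describes.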
44 00:03:36,720 --> 00:03:38,270 So mtry is equal to 17. 45 00:03:39,190 --> 00:03:39,960 I'll run this command. 46 00:03:43,680 --> 00:03:51,000 Now, there is a variable called bagging created. bagging contains the information of the bagged model. 47 00:03:53,180 --> 00:04:02,990 Now, using that model, we can predict the values on our test set. So we'll create a new column in the test dataset, 48 00:04:03,230 --> 00:04:05,270 called test$bagging. 49 00:04:06,630 --> 00:04:11,100 So, again, to predict the values using this model, we use the predict function. 50 00:04:12,130 --> 00:04:13,320 test$bagging 51 00:04:14,170 --> 00:04:16,960 is the variable that will get these values from the predict function. 52 00:04:17,410 --> 00:04:22,990 The first parameter is the model name, which is bagging, and the second parameter is the test set. 53 00:04:24,410 --> 00:04:27,200 So we predict with bagging on test. 54 00:04:27,400 --> 00:04:28,250 I'll run this command. 55 00:04:30,030 --> 00:04:35,970 And now the predicted values using the bagging model are saved in test$bagging. 56 00:04:37,150 --> 00:04:43,920 Now, to find out the MSE of these predicted bagging values, we'll use this formula. 57 00:04:44,720 --> 00:04:51,410 The MSE of bagging gets its value from the difference between the predicted and actual values, squared. 58 00:04:52,140 --> 00:04:54,480 And then we find the mean of these values. 59 00:04:54,660 --> 00:04:57,500 So we'll run this command to get the MSE of bagging. 60 00:04:59,210 --> 00:05:05,480 Now you can see that the MSE of bagging has a value here, which is nearly 52 million. 61 00:05:06,750 --> 00:05:09,420 You can compare this value with the other MSEs. 62 00:05:10,750 --> 00:05:16,930 This MSE was the value of a tree that we created with a depth of three levels. 63 00:05:17,830 --> 00:05:25,030 It had an MSE value of one hundred thirteen million. When we created the full tree, that is, 64 00:05:25,120 --> 00:05:27,490 there was no restriction on the number of levels,
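The prediction and evaluation steps just described, storing predict(bagging, test) in test$bagging and then taking the mean of the squared differences between predicted and actual values, can be mirrored in Python. The synthetic data below is an assumption for illustration, not the video's dataset:

```python
# Python analogue (assumed data) of the video's R steps:
# test$bagging <- predict(bagging, test)
# mse <- mean((test$bagging - test$Collection)^2)
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 17))            # 17 predictors, as in the lesson
y = 5 * X[:, 0] + rng.normal(size=300)    # synthetic target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_features=None -> every split may use all 17 predictors, i.e. bagging
bagging = RandomForestRegressor(n_estimators=200, max_features=None,
                                random_state=0).fit(X_train, y_train)

preds = bagging.predict(X_test)               # analogue of test$bagging
mse = float(np.mean((preds - y_test) ** 2))   # mean of squared differences
```

Computed this way, the bagging MSE can be compared directly against the MSEs of the earlier trees, which is exactly the comparison the video makes next.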
65 00:05:28,480 --> 00:05:31,660 then we had an MSE value of ninety-nine million. 66 00:05:32,770 --> 00:05:40,060 When we pruned the tree, we got some improvement, and we had an MSE value of 97 million. 67 00:05:40,690 --> 00:05:41,980 But when we are using bagging, 68 00:05:43,120 --> 00:05:46,180 the MSE value has reduced significantly. 69 00:05:46,540 --> 00:05:49,720 And now the MSE is nearly 52 to 53 million. 70 00:05:50,140 --> 00:05:57,610 So you can see clearly there is a huge improvement in prediction accuracy when we do bagging. 71 00:05:59,470 --> 00:06:01,210 The downside of using bagging is, 72 00:06:02,390 --> 00:06:07,640 unlike earlier, when you were able to plot the decision tree very easily, 73 00:06:08,360 --> 00:06:11,210 that is not possible when you have done bagging. 74 00:06:12,150 --> 00:06:19,050 So by trading off interpretability, you are able to get better prediction 75 00:06:19,050 --> 00:06:19,470 accuracy.