1 00:00:01,110 --> 00:00:05,610 Now we are going to discuss the third ensemble technique, which is boosting. 2 00:00:07,530 --> 00:00:10,630 In boosting also, we create a number of trees. 3 00:00:11,380 --> 00:00:15,190 But the difference is that the trees are grown sequentially. 4 00:00:15,790 --> 00:00:20,740 That is, each tree is grown using information from previously grown trees. 5 00:00:22,540 --> 00:00:26,110 We'll be discussing three boosting techniques: gradient boosting, 6 00:00:26,830 --> 00:00:29,560 AdaBoost, and extreme gradient boosting (XGBoost). 7 00:00:31,660 --> 00:00:35,320 First, let's understand the concept behind gradient boosting. 8 00:00:37,970 --> 00:00:46,340 Gradient boosting is a slow learning procedure; that is, we fit a tree using the current residuals rather 9 00:00:46,340 --> 00:00:47,880 than the outcome as the response. 10 00:00:50,510 --> 00:00:53,570 So this is what happens in gradient boosting. 11 00:00:54,110 --> 00:00:58,400 We start with a single-node tree, which has all the observations. 12 00:01:00,200 --> 00:01:02,600 This single-node tree has some predictions. 13 00:01:04,130 --> 00:01:08,930 We find the difference between predictions and actuals to get the residuals. 14 00:01:11,300 --> 00:01:14,730 Then we use all the variables to fit these residuals 15 00:01:15,350 --> 00:01:19,790 using a small tree. But note that 16 00:01:20,730 --> 00:01:24,290 we control the depth of the tree in boosting; unlike bagging, 17 00:01:24,730 --> 00:01:25,370 we do not create 18 00:01:25,500 --> 00:01:26,520 full-length trees. 19 00:01:28,440 --> 00:01:34,600 So this small tree we just fitted on the residuals is multiplied by a shrinkage parameter, 20 00:01:35,430 --> 00:01:37,560 and then it is added to the original tree. 21 00:01:39,660 --> 00:01:49,290 So the second tree is basically the sum of the first tree and the newly created tree multiplied by the shrinkage 22 00:01:49,290 --> 00:01:49,830 parameter.
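The sequential procedure described above can be sketched in a few lines of Python. This is a minimal illustration, assuming scikit-learn is available; the synthetic data and the parameter values are made up for the example, not taken from the lecture.

```python
# A minimal sketch of the gradient boosting loop: start from a constant
# prediction, then repeatedly fit a small tree to the current residuals,
# shrink it, and add it to the ensemble.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

n_trees, lam, depth = 100, 0.1, 2   # the three tuning parameters

# Single-node "tree": predict the mean for every observation.
pred = np.full_like(y, y.mean())
trees = []
for _ in range(n_trees):
    residuals = y - pred                  # current residuals, not y itself
    tree = DecisionTreeRegressor(max_depth=depth)
    tree.fit(X, residuals)                # fit a small tree to the residuals
    pred += lam * tree.predict(X)         # shrink and add to the ensemble
    trees.append(tree)

print(np.mean((y - pred) ** 2))  # training MSE shrinks as trees are added
```

Note that each iteration only fits the part of the signal the ensemble has not yet captured, which is exactly the "slow learning" the lecture describes.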
23 00:01:52,390 --> 00:01:58,520 Now, using this second tree, we again find the residuals, and then, using these residuals, 24 00:01:58,690 --> 00:02:01,570 we again fit a small tree on these residuals. 25 00:02:03,610 --> 00:02:10,060 This small tree is again multiplied by lambda and added to the second tree to create the third tree. 26 00:02:11,150 --> 00:02:11,870 And so on. 27 00:02:13,410 --> 00:02:20,300 And this way, we sequentially continue to create trees and add them while fitting the current residuals. 28 00:02:22,220 --> 00:02:27,190 So instead of learning by creating the whole tree in one go, gradient boosting 29 00:02:27,470 --> 00:02:31,100 learns slowly by creating one small tree at a time. 30 00:02:33,530 --> 00:02:35,600 This is why it is called a slow learner. 31 00:02:37,010 --> 00:02:45,140 Additionally, if you keep a very small value of this shrinkage parameter lambda, it will learn even 32 00:02:45,140 --> 00:02:45,860 more slowly. 33 00:02:47,390 --> 00:02:54,860 So when we are running gradient boosting in our software, we need to provide three tuning parameters. 34 00:02:55,460 --> 00:03:03,290 One of them is the number of trees to be built, which will mean how many small trees will be created and 35 00:03:03,410 --> 00:03:04,580 added to the main tree. 36 00:03:05,420 --> 00:03:09,260 Unlike bagging, boosting can overfit 37 00:03:09,980 --> 00:03:17,360 if the number of trees is too large. The second tuning parameter is the shrinkage parameter lambda. 38 00:03:18,680 --> 00:03:21,740 This parameter controls the rate at which the model learns. 39 00:03:22,580 --> 00:03:27,560 Typically it has values between 0.01 and 0.001. 40 00:03:30,310 --> 00:03:33,340 The third parameter is the depth of the boosting trees. 41 00:03:34,990 --> 00:03:39,040 This basically controls the growth of these small boosting trees. 42 00:03:39,580 --> 00:03:43,180 Even trees of size one sometimes work well.
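In scikit-learn, for example, these three tuning parameters map directly onto the arguments of GradientBoostingRegressor. The sketch below uses synthetic data, and the specific values are illustrative, not recommendations:

```python
# The three tuning parameters from the lecture, as scikit-learn arguments:
# number of trees -> n_estimators, shrinkage lambda -> learning_rate,
# depth of the boosting trees -> max_depth.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 2))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(0, 0.2, size=300)

model = GradientBoostingRegressor(
    n_estimators=500,    # how many small trees to build and add
    learning_rate=0.01,  # the shrinkage parameter lambda
    max_depth=1,         # depth 1: each small tree is a stump
)
model.fit(X, y)
print(model.score(X, y))  # R^2 on the training data
```

With a smaller learning_rate you typically need a larger n_estimators, since each tree contributes less; the two parameters are tuned together in practice.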
43 00:03:44,400 --> 00:03:46,440 Such small trees are known as stumps. 44 00:03:47,940 --> 00:03:54,540 So we will need to provide the values of these three tuning parameters in our software mandatorily to 45 00:03:54,540 --> 00:03:56,040 run a gradient boosting model. 46 00:03:59,260 --> 00:04:03,040 Next is AdaBoost, or adaptive boosting. 47 00:04:04,330 --> 00:04:06,640 In this, first we create a tree. 48 00:04:07,930 --> 00:04:11,150 We find out the predictions using that tree. 49 00:04:12,500 --> 00:04:21,680 Wherever that tree has misclassified, or in case of regression, wherever the residual of that tree is 50 00:04:21,680 --> 00:04:26,720 very large, we increase the importance of that particular observation. 51 00:04:28,010 --> 00:04:30,950 Then we again create a tree on our observations. 52 00:04:32,180 --> 00:04:36,650 Now this time, since we have increased the importance of those observations, 53 00:04:37,240 --> 00:04:42,710 our tree will try to capture, or rightly classify, those observations. 54 00:04:44,290 --> 00:04:49,040 So this time we'll get a second tree, which will be a little bit different from the first tree. 55 00:04:50,730 --> 00:04:55,680 Again, we will find out the residuals or misclassified observations. 56 00:04:56,580 --> 00:05:02,910 We will increase the weightage of those observations and run the model again. In this way, 57 00:05:03,120 --> 00:05:07,590 we will continue to build models for some pre-decided number of times. 58 00:05:08,810 --> 00:05:12,720 This pre-decided number of times has to be given as a parameter 59 00:05:12,770 --> 00:05:20,840 when we run this model in our software. This is also a form of boosting, because each of our trees, 60 00:05:21,450 --> 00:05:24,980 our new tree, is learning from the previously created trees. 61 00:05:26,570 --> 00:05:28,730 The third technique is XGBoost. 62 00:05:29,740 --> 00:05:32,720 XGBoost is almost similar to gradient boosting.
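The adaptive boosting loop described above can be sketched as follows. This is a simplified version of the classic AdaBoost weight update, assuming scikit-learn for the stumps; real implementations differ in details such as the loss and the weight formula:

```python
# A minimal AdaBoost sketch: after each stump, up-weight the
# misclassified observations so the next stump focuses on them.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

n_rounds = 20
w = np.full(len(y), 1 / len(y))       # start with equal importance
stumps, alphas = [], []
for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)
    miss = stump.predict(X) != y
    err = w[miss].sum()                              # weighted error rate
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # stump's say in the vote
    w *= np.exp(alpha * np.where(miss, 1, -1))       # up-weight the misses
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: a weighted vote of all the small trees.
votes = sum(a * np.where(s.predict(X) == 1, 1, -1) for a, s in zip(alphas, stumps))
pred = (votes > 0).astype(int)
print((pred == y).mean())  # training accuracy of the combined model
```

Each new stump sees a reweighted version of the data, which is how it "learns from the previously created trees" without ever fitting residuals directly.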
63 00:05:33,350 --> 00:05:35,990 The only thing is, in XGBoost, 64 00:05:36,470 --> 00:05:40,330 we use a regularized model to control overfitting. 65 00:05:41,630 --> 00:05:49,760 I hope you're aware of regularization methods in linear models, such as Lasso and Ridge regression. 66 00:05:50,030 --> 00:05:53,690 If you are aware of these techniques, 67 00:05:54,290 --> 00:05:54,840 that's great. 68 00:05:54,980 --> 00:06:02,530 If you are not, you can find the links to understand Ridge and Lasso in the description of this video. 69 00:06:04,220 --> 00:06:06,710 Basically, when we are doing regularization, 70 00:06:07,780 --> 00:06:11,340 we are adding a cost to the number of variables. 71 00:06:12,230 --> 00:06:21,200 So when we are optimizing MSE in case of regression, or Gini in case of classification, we add an additional 72 00:06:21,200 --> 00:06:25,760 penalty term for the number of variables that are going to be used in that model. 73 00:06:26,690 --> 00:06:35,630 And this way, we try to minimize the number of variables that come into the final model. By doing regularization, 74 00:06:35,750 --> 00:06:39,310 our model basically avoids overfitting. 75 00:06:40,340 --> 00:06:47,180 So XGBoost contains regularization terms in the cost function. Otherwise, 76 00:06:47,750 --> 00:06:51,470 it is exactly similar to gradient boosting; the only difference is 77 00:06:51,590 --> 00:06:53,610 that it includes regularization. 78 00:06:53,870 --> 00:06:57,470 That is why it has a better chance of preventing overfitting. 79 00:06:58,250 --> 00:07:05,450 There are some other minor differences between XGBoost and gradient boosting, which we will not be 80 00:07:05,450 --> 00:07:06,850 covering as part of this course.
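As a toy illustration of such a regularized cost (not XGBoost's actual internals): in XGBoost the penalty on each tree has the form gamma times the number of leaves plus half lambda times the sum of squared leaf weights, so a more complex tree must reduce the error enough to pay for its extra complexity. The numbers below are made up for the example:

```python
# Toy regularized objective in the spirit of XGBoost:
# loss + gamma * (number of leaves) + 0.5 * lambda * sum(leaf_weight^2).
import numpy as np

def regularized_objective(y, pred, leaf_weights, gamma=1.0, lam=1.0):
    mse = np.mean((y - pred) ** 2)            # the plain loss (MSE here)
    penalty = gamma * len(leaf_weights) + 0.5 * lam * np.sum(np.square(leaf_weights))
    return mse + penalty                      # complexity raises the cost

rng = np.random.default_rng(3)
y = rng.normal(size=100)
pred = 0.9 * y   # some fixed predictions, just for illustration

# Same fit quality, but the bigger tree carries a larger penalty:
small_tree = regularized_objective(y, pred, leaf_weights=np.ones(2))
big_tree = regularized_objective(y, pred, leaf_weights=np.ones(8))
print(small_tree < big_tree)  # more leaves -> higher cost for the same error
```

Because every candidate split is judged against this penalized cost, XGBoost prunes away complexity that does not earn its keep, which is where its extra resistance to overfitting comes from.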