1 00:00:01,110 --> 00:00:05,610 Now we are going to discuss the third ensemble technique, which is boosting. 2 00:00:07,530 --> 00:00:10,630 In boosting also, we create a number of trees. 3 00:00:11,380 --> 00:00:15,190 But the difference is that the trees are grown sequentially. 4 00:00:15,790 --> 00:00:20,740 That is, each tree is grown using information from previously grown trees. 5 00:00:22,540 --> 00:00:26,110 We'll be discussing three boosting techniques: gradient boosting, 6 00:00:26,830 --> 00:00:29,560 AdaBoost, and extreme gradient boosting (XGBoost). 7 00:00:31,660 --> 00:00:35,320 First, let's understand the concept behind gradient boosting. 8 00:00:37,970 --> 00:00:46,340 Gradient boosting is a slow learning procedure; that is, we fit a tree using the current residuals rather 9 00:00:46,340 --> 00:00:47,880 than the outcome as the response. 10 00:00:50,510 --> 00:00:53,570 So this is what happens in gradient boosting. 11 00:00:54,110 --> 00:00:58,400 We start with a single-node tree, which has all the observations. 12 00:01:00,200 --> 00:01:02,600 This single-node tree has some predictions. 13 00:01:04,130 --> 00:01:08,930 We find the difference between predictions and actuals to get the residuals. 14 00:01:11,300 --> 00:01:14,730 Then we use all the variables to fit these residuals 15 00:01:15,350 --> 00:01:19,790 using a small tree. But note that 16 00:01:20,730 --> 00:01:24,290 we control the depth of the tree in boosting; unlike bagging, 17 00:01:24,730 --> 00:01:25,370 we do not create 18 00:01:25,500 --> 00:01:26,520 full-length trees. 19 00:01:28,440 --> 00:01:34,600 So this small tree we just fitted on the residuals is multiplied by a shrinkage parameter, 20 00:01:35,430 --> 00:01:37,560 and then it is added to the original tree. 21 00:01:39,660 --> 00:01:49,290 So the second tree is basically the sum of the first tree and the newly created tree multiplied by the shrinkage 22 00:01:49,290 --> 00:01:49,830 parameter.
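The sequential procedure described above can be sketched in a few lines of Python. This is a minimal illustration, assuming scikit-learn is available; the synthetic data and the parameter values are made up for the example, not taken from the lecture.

```python
# A minimal sketch of the gradient boosting loop: start from a constant
# prediction, then repeatedly fit a small tree to the current residuals,
# shrink it, and add it to the ensemble.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

n_trees, lam, depth = 100, 0.1, 2   # the three tuning parameters

# Single-node "tree": predict the mean for every observation.
pred = np.full_like(y, y.mean())
trees = []
for _ in range(n_trees):
    residuals = y - pred                  # current residuals, not y itself
    tree = DecisionTreeRegressor(max_depth=depth)
    tree.fit(X, residuals)                # fit a small tree to the residuals
    pred += lam * tree.predict(X)         # shrink and add to the ensemble
    trees.append(tree)

print(np.mean((y - pred) ** 2))  # training MSE shrinks as trees are added
```

Note that each iteration only fits the part of the signal the ensemble has not yet captured, which is exactly the "slow learning" the lecture describes.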
23 00:01:52,390 --> 00:01:58,520 Now, using this second tree, we again find the residuals, and then, using these residuals, 24 00:01:58,690 --> 00:02:01,570 we again fit a small tree on these residuals. 25 00:02:03,610 --> 00:02:10,060 This small tree is again multiplied by lambda and added to the second tree to create the third tree. 26 00:02:11,150 --> 00:02:11,870 And so on. 27 00:02:13,410 --> 00:02:20,300 And this way, we sequentially continue to create trees and add them while fitting the current residuals. 28 00:02:22,220 --> 00:02:27,190 So instead of learning by creating the whole tree in one go, gradient boosting 29 00:02:27,470 --> 00:02:31,100 learns slowly by creating one small tree at a time. 30 00:02:33,530 --> 00:02:35,600 This is why it is called a slow learner. 31 00:02:37,010 --> 00:02:45,140 Additionally, if you keep a very small value of this shrinkage parameter lambda, it will learn even 32 00:02:45,140 --> 00:02:45,860 more slowly. 33 00:02:47,390 --> 00:02:54,860 So when we are running gradient boosting in our software, we need to provide three tuning parameters. 34 00:02:55,460 --> 00:03:03,290 One of them is the number of trees to be built, which will mean how many small trees will be created and 35 00:03:03,410 --> 00:03:04,580 added to the main tree. 36 00:03:05,420 --> 00:03:09,260 Unlike bagging, boosting can overfit 37 00:03:09,980 --> 00:03:17,360 if the number of trees is too large. The second tuning parameter is the shrinkage parameter lambda. 38 00:03:18,680 --> 00:03:21,740 This parameter controls the rate at which the model learns. 39 00:03:22,580 --> 00:03:27,560 Typically it has values between 0.01 and 0.001. 40 00:03:30,310 --> 00:03:33,340 The third parameter is the depth of the boosting trees. 41 00:03:34,990 --> 00:03:39,040 This basically controls the growth of these small boosting trees. 42 00:03:39,580 --> 00:03:43,180 Even trees of size one sometimes work well.
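In scikit-learn, for example, these three tuning parameters map directly onto the arguments of GradientBoostingRegressor. The sketch below uses synthetic data, and the specific values are illustrative, not recommendations:

```python
# The three tuning parameters from the lecture, as scikit-learn arguments:
# number of trees -> n_estimators, shrinkage lambda -> learning_rate,
# depth of the boosting trees -> max_depth.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 2))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(0, 0.2, size=300)

model = GradientBoostingRegressor(
    n_estimators=500,    # how many small trees to build and add
    learning_rate=0.01,  # the shrinkage parameter lambda
    max_depth=1,         # depth 1: each small tree is a stump
)
model.fit(X, y)
print(model.score(X, y))  # R^2 on the training data
```

With a smaller learning_rate you typically need a larger n_estimators, since each tree contributes less; the two parameters are tuned together in practice.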
43 00:03:44,400 --> 00:03:46,440 Such small trees are known as stumps. 44 00:03:47,940 --> 00:03:54,540 So we will need to provide the values of these three tuning parameters in our software mandatorily to 45 00:03:54,540 --> 00:03:56,040 run a gradient boosting model. 46 00:03:59,260 --> 00:04:03,040 Next is AdaBoost, or adaptive boosting. 47 00:04:04,330 --> 00:04:06,640 In this, first we create a tree. 48 00:04:07,930 --> 00:04:11,150 We find out the predictions using that tree. 49 00:04:12,500 --> 00:04:21,680 Wherever that tree has misclassified, or in case of regression, wherever the residual of that tree is 50 00:04:21,680 --> 00:04:26,720 very large, we increase the importance of that particular observation. 51 00:04:28,010 --> 00:04:30,950 Then we again create a tree on our observations. 52 00:04:32,180 --> 00:04:36,650 Now this time, since we have increased the importance of those observations, 53 00:04:37,240 --> 00:04:42,710 our tree will try to capture, or rightly classify, those observations. 54 00:04:44,290 --> 00:04:49,040 So this time we'll get a second tree, which will be a little bit different from the first tree. 55 00:04:50,730 --> 00:04:55,680 Again, we will find out the residuals or misclassified observations. 56 00:04:56,580 --> 00:05:02,910 We will increase the weightage of those observations and run the model again. In this way, 57 00:05:03,120 --> 00:05:07,590 we will continue to build models for some pre-decided number of times. 58 00:05:08,810 --> 00:05:12,720 This pre-decided number of times has to be given as a parameter 59 00:05:12,770 --> 00:05:20,840 when we run this model in our software. This is also a form of boosting, because each of our trees, 60 00:05:21,450 --> 00:05:24,980 our new tree, is learning from the previously created trees. 61 00:05:26,570 --> 00:05:28,730 The third technique is XGBoost. 62 00:05:29,740 --> 00:05:32,720 XGBoost is almost similar to gradient boosting.
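The adaptive boosting loop described above can be sketched as follows. This is a simplified version of the classic AdaBoost weight update, assuming scikit-learn for the stumps; real implementations differ in details such as the loss and the weight formula:

```python
# A minimal AdaBoost sketch: after each stump, up-weight the
# misclassified observations so the next stump focuses on them.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

n_rounds = 20
w = np.full(len(y), 1 / len(y))       # start with equal importance
stumps, alphas = [], []
for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)
    miss = stump.predict(X) != y
    err = w[miss].sum()                              # weighted error rate
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # stump's say in the vote
    w *= np.exp(alpha * np.where(miss, 1, -1))       # up-weight the misses
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: a weighted vote of all the small trees.
votes = sum(a * np.where(s.predict(X) == 1, 1, -1) for a, s in zip(alphas, stumps))
pred = (votes > 0).astype(int)
print((pred == y).mean())  # training accuracy of the combined model
```

Each new stump sees a reweighted version of the data, which is how it "learns from the previously created trees" without ever fitting residuals directly.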
63 00:05:33,350 --> 00:05:35,990 The only thing is, in XGBoost, 64 00:05:36,470 --> 00:05:40,330 we use a regularized model to control overfitting. 65 00:05:41,630 --> 00:05:49,760 I hope you're aware of regularization methods in linear models, such as Lasso and Ridge regression. 66 00:05:50,030 --> 00:05:53,690 If you are aware of these techniques, 67 00:05:54,290 --> 00:05:54,840 that's great. 68 00:05:54,980 --> 00:06:02,530 If you are not, you can find the links to understand Ridge and Lasso in the description of this video. 69 00:06:04,220 --> 00:06:06,710 Basically, when we are doing regularization, 70 00:06:07,780 --> 00:06:11,340 we are adding a cost to the number of variables. 71 00:06:12,230 --> 00:06:21,200 So when we are optimizing MSE in case of regression, or Gini in case of classification, we add an additional 72 00:06:21,200 --> 00:06:25,760 penalty term for the number of variables that are going to be used in that model. 73 00:06:26,690 --> 00:06:35,630 And this way, we try to minimize the number of variables that come into the final model. By doing regularization, 74 00:06:35,750 --> 00:06:39,310 our model basically avoids overfitting. 75 00:06:40,340 --> 00:06:47,180 So XGBoost contains regularization terms in the cost function. Otherwise, 76 00:06:47,750 --> 00:06:51,470 it is exactly similar to gradient boosting; the only difference is 77 00:06:51,590 --> 00:06:53,610 that it includes regularization. 78 00:06:53,870 --> 00:06:57,470 That is why it has a better chance of preventing overfitting. 79 00:06:58,250 --> 00:07:05,450 There are some other minor differences between XGBoost and gradient boosting, which we will not be 80 00:07:05,450 --> 00:07:06,850 covering as part of this course.
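As a toy illustration of such a regularized cost (not XGBoost's actual internals): in XGBoost the penalty on each tree has the form gamma times the number of leaves plus half lambda times the sum of squared leaf weights, so a more complex tree must reduce the error enough to pay for its extra complexity. The numbers below are made up for the example:

```python
# Toy regularized objective in the spirit of XGBoost:
# loss + gamma * (number of leaves) + 0.5 * lambda * sum(leaf_weight^2).
import numpy as np

def regularized_objective(y, pred, leaf_weights, gamma=1.0, lam=1.0):
    mse = np.mean((y - pred) ** 2)            # the plain loss (MSE here)
    penalty = gamma * len(leaf_weights) + 0.5 * lam * np.sum(np.square(leaf_weights))
    return mse + penalty                      # complexity raises the cost

rng = np.random.default_rng(3)
y = rng.normal(size=100)
pred = 0.9 * y   # some fixed predictions, just for illustration

# Same fit quality, but the bigger tree carries a larger penalty:
small_tree = regularized_objective(y, pred, leaf_weights=np.ones(2))
big_tree = regularized_objective(y, pred, leaf_weights=np.ones(8))
print(small_tree < big_tree)  # more leaves -> higher cost for the same error
```

Because every candidate split is judged against this penalized cost, XGBoost prunes away complexity that does not earn its keep, which is where its extra resistance to overfitting comes from.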