1 00:00:01,540 --> 00:00:04,090 In this video, we will learn about shrinkage methods. 2 00:00:05,370 --> 00:00:10,220 So we discussed the subset selection methods, where we were still using the least squares technique, 3 00:00:10,810 --> 00:00:12,880 but on a subset of variables. 4 00:00:14,060 --> 00:00:19,860 In this video, we will learn about techniques where our model will contain all predictor variables and 5 00:00:19,860 --> 00:00:26,910 we will try to regularize these estimated coefficients, or shrink these estimated coefficients towards 6 00:00:27,060 --> 00:00:27,530 zero. 7 00:00:29,080 --> 00:00:31,870 We will also see how this leads to a reduction in variance. 8 00:00:33,500 --> 00:00:36,440 We'll be discussing the two best-known techniques for shrinkage. 9 00:00:36,680 --> 00:00:39,470 That is, ridge regression and the lasso. 10 00:00:40,920 --> 00:00:43,050 Let's start with ridge regression first. 11 00:00:46,300 --> 00:00:49,890 So in the ordinary least squares method, we minimized 12 00:00:50,140 --> 00:00:56,040 the RSS, which was the squared sum of differences between the actual value of y and the predicted value 13 00:00:56,140 --> 00:00:56,500 of y. 14 00:00:58,470 --> 00:01:00,960 Ridge regression is similar to ordinary least squares, 15 00:01:01,070 --> 00:01:05,540 except that the coefficients will be estimated by minimizing a slightly different quantity. 16 00:01:07,080 --> 00:01:11,140 This quantity is given by this formula: lambda times 17 00:01:11,430 --> 00:01:12,060 the sum of 18 00:01:13,310 --> 00:01:14,750 squares of all the betas. 19 00:01:17,160 --> 00:01:20,790 So since we will be minimizing this quantity, this whole quantity, 20 00:01:21,930 --> 00:01:28,560 therefore, we will also be attempting to shrink or reduce the values of these coefficients towards 21 00:01:28,680 --> 00:01:29,130 zero. 22 00:01:31,540 --> 00:01:34,480 This whole term is called the shrinkage penalty. 23 00:01:36,260 --> 00:01:38,970 And this lambda is called a tuning parameter.
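The quantity described above (the RSS plus lambda times the sum of squared betas) can be computed directly. Here is a minimal Python sketch with made-up numbers; the data, the beta values, and the choice of lambda (`lam`) are all hypothetical, purely to make the two pieces of the objective concrete.

```python
import numpy as np

# Toy data: 4 observations, 2 predictors (all values hypothetical).
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0]])
y = np.array([3.0, 2.5, 5.0, 7.0])

beta0, betas = 0.5, np.array([1.2, 0.4])  # candidate coefficients
lam = 2.0                                 # tuning parameter lambda

# RSS: squared sum of differences between actual y and predicted y.
rss = np.sum((y - (beta0 + X @ betas)) ** 2)

# Shrinkage penalty: lambda times the sum of squared betas (beta0 excluded).
penalty = lam * np.sum(betas ** 2)

# Ridge regression minimizes this whole quantity.
objective = rss + penalty
print(rss, penalty, objective)
```

Note that the intercept `beta0` is left out of the penalty, matching the point made later that we shrink the variables' impact but not the intercept.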
24 00:01:41,110 --> 00:01:47,520 The tuning parameter serves to control the relative impact of these two terms, the RSS and the shrinkage 25 00:01:47,520 --> 00:01:47,990 penalty. 26 00:01:49,160 --> 00:01:50,840 For different values of lambda, 27 00:01:51,650 --> 00:01:53,420 we will get different values of betas. 28 00:01:54,380 --> 00:01:56,990 Therefore, selecting a good value of lambda is critical. 29 00:01:58,340 --> 00:01:59,770 We'll come back to this topic later. 30 00:02:02,940 --> 00:02:04,120 Note that we are shrinking 31 00:02:04,230 --> 00:02:10,020 all betas except beta zero, because we want to shrink the impact of the variables and not the intercept. 32 00:02:12,630 --> 00:02:18,680 Another important thing to note when we are doing ridge regression is that, since the betas are now part 33 00:02:18,680 --> 00:02:24,720 of the formula which we are trying to minimize, the scale of the values of the independent variables, 34 00:02:25,160 --> 00:02:27,050 that is the X variables, 35 00:02:28,330 --> 00:02:29,380 now matters. 36 00:02:30,010 --> 00:02:33,640 That is, if you have the value of one of your predictors in dollars 37 00:02:35,520 --> 00:02:36,820 and then calculate your beta, 38 00:02:37,170 --> 00:02:39,340 and if it is in pounds and you then calculate your beta, 39 00:02:39,410 --> 00:02:44,430 these two betas will not be related directly by the currency exchange rate. 40 00:02:45,300 --> 00:02:49,770 Therefore, ridge regression is not scale invariant, as the least squares method is. 41 00:02:51,040 --> 00:02:55,910 But to handle this issue, we will be standardizing the values of the predictor variables. 42 00:02:58,670 --> 00:03:04,210 We will not be covering how to standardize the variables mathematically; our software packages will 43 00:03:04,210 --> 00:03:05,740 be handling that part for us. 44 00:03:06,040 --> 00:03:11,140 But just remember that before running ridge regression, we need to standardize all the variables.
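The workflow just described (standardize the predictors, then fit ridge for a chosen lambda) can be sketched with scikit-learn, which is one of the software packages that handles standardization for us. The data here is synthetic, and the penalty value is an arbitrary illustration; note that scikit-learn calls the tuning parameter `alpha` rather than lambda.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: two predictors on very different scales
# (think dollars vs. some other unit), plus noise.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 200),
                     rng.normal(0, 1000, 200)])
y = 3 * X[:, 0] + 0.003 * X[:, 1] + rng.normal(0, 1, 200)

# Standardize first: ridge regression is not scale invariant,
# so the scaling step matters before the penalty is applied.
model = make_pipeline(StandardScaler(), Ridge(alpha=10.0))
model.fit(X, y)
print(model.named_steps["ridge"].coef_)
```

With the scaler in the pipeline, both predictors are penalized on a comparable footing regardless of their original units.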
45 00:03:12,130 --> 00:03:15,400 So how does ridge regression improve over least squares? 46 00:03:17,510 --> 00:03:24,650 If you remember our discussion on bias and variance, a less flexible model has more bias, but less 47 00:03:24,650 --> 00:03:25,220 variance. 48 00:03:26,400 --> 00:03:29,550 This is what this additional shrinkage penalty is doing. 49 00:03:30,390 --> 00:03:32,280 It makes the model less flexible. 50 00:03:32,580 --> 00:03:38,320 And as we continue to increase the value of lambda, the model continues to become less and less flexible. 51 00:03:39,560 --> 00:03:46,280 So as lambda increases, our model's bias increases, but its variance decreases. 52 00:03:48,280 --> 00:03:50,650 Now, the decrease in variance is more 53 00:03:50,980 --> 00:03:54,610 and the increase in bias is less, up to a certain value of lambda. 54 00:03:57,600 --> 00:04:01,590 At this critical value of lambda, the total error is minimum. 55 00:04:02,280 --> 00:04:06,170 And it is even less than the total error of the ordinary least squares method. 56 00:04:07,370 --> 00:04:11,660 If you look at this graph, this green line is showing you the variance. 57 00:04:12,950 --> 00:04:17,560 As you continue to increase lambda, this x axis is lambda values. 58 00:04:18,230 --> 00:04:24,530 If you continue to increase the lambda value, the part of the mean squared error due to variance keeps on decreasing. 59 00:04:26,010 --> 00:04:27,980 And this black line is the bias. 60 00:04:28,950 --> 00:04:30,720 That continues to increase. 61 00:04:31,850 --> 00:04:35,830 And the sum of these two values is the total error in the model. 62 00:04:36,860 --> 00:04:38,330 If you look at this pink line, 63 00:04:39,380 --> 00:04:41,960 this point is giving you the minimum error. 64 00:04:43,340 --> 00:04:45,230 And it is at a certain value of lambda.
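The bias–variance tradeoff in the graph can be reproduced numerically: fit ridge for a range of lambda (`alpha`) values on synthetic data and watch the held-out test error. The data and the alpha grid below are hypothetical, so the exact minimizing value will vary; the point is only that test error is a function of lambda and is minimized at some particular value.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression problem with many predictors and noisy response.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
beta = rng.normal(size=20)
y = X @ beta + rng.normal(0, 5, 100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Test MSE as a function of lambda: variance falls, bias rises,
# and the total error typically bottoms out in between.
errors = {}
for alpha in [0.01, 0.1, 1, 10, 100, 1000]:
    model = Ridge(alpha=alpha).fit(X_tr, y_tr)
    errors[alpha] = np.mean((model.predict(X_te) - y_te) ** 2)

best_alpha = min(errors, key=errors.get)
print(best_alpha, errors)
```

In practice this selection is done with cross-validation rather than a single test split, which is the topic the lecture promises to return to.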
65 00:04:45,560 --> 00:04:52,220 So this shrinkage of coefficients is actually helping you minimize the mean squared error, so that it 66 00:04:52,220 --> 00:04:54,410 is even less than that of ordinary least squares. 67 00:04:57,250 --> 00:04:59,260 The next technique is called the lasso. 68 00:05:00,700 --> 00:05:07,160 So one major disadvantage of ridge regression is that it will include all predictor variables in the final 69 00:05:07,160 --> 00:05:07,550 model. 70 00:05:08,660 --> 00:05:12,820 The coefficients are shrunk towards zero, but they do not become exactly zero. 71 00:05:13,920 --> 00:05:18,390 This leaves us with a model with p variables, which may be less interpretable. 72 00:05:21,130 --> 00:05:25,720 We can overcome this problem by allowing the model 73 00:05:27,050 --> 00:05:28,780 to shrink the values to zero. 74 00:05:30,080 --> 00:05:33,410 This is provided by the alternative technique called the lasso. 75 00:05:34,110 --> 00:05:38,330 In this technique, we will be minimizing RSS plus a value. 76 00:05:39,260 --> 00:05:43,890 This value is similar to the previous one, but instead of beta squared, 77 00:05:44,030 --> 00:05:46,580 here we are using the absolute value of beta. 78 00:05:48,040 --> 00:05:54,120 This small change has the impact of forcing some of the coefficients to become exactly zero when the 79 00:05:54,130 --> 00:05:55,900 tuning parameter is sufficiently large. 80 00:05:57,520 --> 00:06:01,690 If coefficients become zero, those variables will not be part of the model. 81 00:06:02,830 --> 00:06:07,630 Therefore, like the subset selection technique, the lasso also does variable selection. 82 00:06:08,620 --> 00:06:14,650 The resulting model is usually more interpretable than the resulting model under ridge regression. 83 00:06:16,010 --> 00:06:19,130 If I were to compare the lasso and ridge regression:
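The lasso's variable-selection behaviour can be seen directly in scikit-learn: with a sufficiently large penalty, the coefficients of irrelevant predictors come out exactly zero. The data below is synthetic, constructed so that only the first two of ten predictors actually drive the response, and the penalty value is an arbitrary illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: 10 predictors, but only the first two matter.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
y = 4 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 1, 200)

# The absolute-value (L1) penalty forces small coefficients
# to become exactly zero, unlike the ridge (L2) penalty.
lasso = Lasso(alpha=0.5).fit(X, y)
print(lasso.coef_)
```

The surviving nonzero coefficients tell you which variables remain in the model, which is what makes the lasso result easier to interpret than the ridge result.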
84 00:06:20,240 --> 00:06:26,660 In terms of model interpretability, the lasso will always be more interpretable than ridge regression, because 85 00:06:26,660 --> 00:06:28,310 it will have a smaller number of variables. 86 00:06:29,610 --> 00:06:35,880 But if I have to compare in terms of prediction accuracy, there is no universal dominance of one 87 00:06:35,880 --> 00:06:37,020 method over another. 88 00:06:38,830 --> 00:06:44,770 In general, if the response variable is expected to be dependent on a lot of predictor variables, 89 00:06:45,400 --> 00:06:47,830 then ridge regression is the technique of choice. 90 00:06:48,910 --> 00:06:55,780 But if the response variable is expected to be dependent on a smaller number of predictor variables, then 91 00:06:55,900 --> 00:06:57,880 the lasso will be the technique of choice. 92 00:06:58,910 --> 00:07:04,450 In a practical scenario, because it is easy to run all types of regression models with just a single 93 00:07:04,450 --> 00:07:05,140 line of code, 94 00:07:05,710 --> 00:07:12,190 we run all types of regression models and then we select the one which gives us the best result 95 00:07:12,310 --> 00:07:13,460 on the test data.
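That practical recipe, fit each model with one line and keep whichever does best on held-out test data, can be sketched as follows. The dataset and the penalty values are hypothetical; in real work the penalties would themselves be tuned, e.g. by cross-validation.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic data with a held-out test set.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 15))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(0, 1, 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

# One line per model, then compare on the test data.
models = {
    "least squares": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}
test_mse = {name: np.mean((m.fit(X_tr, y_tr).predict(X_te) - y_te) ** 2)
            for name, m in models.items()}

best = min(test_mse, key=test_mse.get)
print(best, test_mse)
```

Which model wins depends on the data, consistent with the point that neither ridge nor the lasso universally dominates.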