1 00:00:01,100 --> 00:00:04,520 In this video, we will discuss the bias-variance trade-off. 2 00:00:07,110 --> 00:00:13,720 So as I told you in the test-train split lecture, our agenda is to find the model with the lowest 3 00:00:13,770 --> 00:00:14,500 test error. 4 00:00:16,590 --> 00:00:20,730 Now, fundamentally, there are three contributors to the expected test error. 5 00:00:22,710 --> 00:00:29,730 These three contributors are called variance, bias, and the variance of the error term, which is represented 6 00:00:29,730 --> 00:00:30,360 by epsilon. 7 00:00:32,830 --> 00:00:33,650 This term 8 00:00:35,090 --> 00:00:42,260 comes from the fact that there is some inherent randomness in the process, and the given sample observations 9 00:00:42,860 --> 00:00:45,920 also do not follow the intended function. 10 00:00:48,300 --> 00:00:50,250 So this is an irreducible error. 11 00:00:50,790 --> 00:00:54,000 And since we cannot do much about it, we will not focus on it. 12 00:00:55,470 --> 00:01:00,030 We'll focus on the two other terms, so let's talk about them one by one. 13 00:01:01,910 --> 00:01:07,540 First, variance. Variance refers to the amount by which f-hat would change 14 00:01:07,970 --> 00:01:09,710 if we change our training dataset. 15 00:01:11,650 --> 00:01:18,580 And bias refers to that part of the error which is introduced by approximating a complicated real-life relationship 16 00:01:18,970 --> 00:01:19,930 with a simpler model. 17 00:01:21,990 --> 00:01:23,370 So let's look at them one by one. 18 00:01:25,750 --> 00:01:31,150 So as I told you, variance refers to the amount by which the predicted function would change 19 00:01:31,420 --> 00:01:33,040 if I change my training dataset.
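The three contributors described in the transcript can be written as a worked equation. The notation (f-hat for the fitted model, x_0 for a test point) is my own sketch of the standard decomposition, not something stated explicitly in the video:

```latex
% Expected test error at a point x_0 decomposes into three terms:
% variance of the fitted model, squared bias, and irreducible noise.
\mathbb{E}\!\left[\big(y_0 - \hat{f}(x_0)\big)^2\right]
  = \operatorname{Var}\!\big(\hat{f}(x_0)\big)
  + \Big[\operatorname{Bias}\!\big(\hat{f}(x_0)\big)\Big]^2
  + \operatorname{Var}(\varepsilon)
```

The last term, Var(epsilon), is the irreducible error the speaker sets aside; the other two are the focus of the rest of the lecture.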
20 00:01:34,690 --> 00:01:40,930 If you remember, when we talked about simple linear regression, I told you that there is this true 21 00:01:40,930 --> 00:01:49,690 population regression line, which is the best line if we were fitting the line on the whole 22 00:01:49,690 --> 00:01:50,290 population. 23 00:01:52,760 --> 00:01:57,950 But when we are fitting it on a sample, the sample regression line is different from the population 24 00:01:57,950 --> 00:01:58,670 regression line. 25 00:02:00,200 --> 00:02:04,130 And as the sample data changes, the sample regression line also changes. 26 00:02:05,310 --> 00:02:11,490 So basically, variance is capturing the part of the error which is coming from that particular sample. 27 00:02:13,930 --> 00:02:17,900 So if we have two models, one of them more flexible than the other, 28 00:02:18,580 --> 00:02:20,290 which one will have more variance? 29 00:02:22,260 --> 00:02:26,820 Well, since the more flexible method will be trying to touch each and every point, 30 00:02:28,720 --> 00:02:33,820 even if I change one or two points, it will give out a completely different predicted function 31 00:02:34,150 --> 00:02:35,620 to accommodate this small change. 32 00:02:36,910 --> 00:02:40,630 This means that more flexible methods have high variance. 33 00:02:43,260 --> 00:02:44,820 This is shown graphically as well. 34 00:02:45,420 --> 00:02:47,010 In this first graph on the left, 35 00:02:47,640 --> 00:02:52,950 we are trying to predict this relationship with a straight line. 36 00:02:54,190 --> 00:02:56,200 A straight line is a much less flexible method. 37 00:02:58,160 --> 00:03:02,270 Even if I change one or two data points, like this blue point, 38 00:03:03,260 --> 00:03:07,400 the slope and the intercept of this line will not change as much. 39 00:03:09,530 --> 00:03:16,630 However, if you look at the function on the right, if I change even one or two points on this curve,
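The point above, that a flexible method gives a very different fitted function on each new training sample, can be checked with a small simulation. This is my own sketch, not from the video: the data-generating curve sin(2*pi*x), the noise level, and the polynomial degrees 1 and 9 are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_and_predict(degree, x0=0.5, n=30):
    """Fit a polynomial of the given degree to a fresh noisy sample
    drawn from y = sin(2*pi*x) + noise, then predict at the point x0."""
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)
    coeffs = np.polyfit(x, y, degree)
    return np.polyval(coeffs, x0)

# Refit each model on 200 different training sets and measure how much
# the prediction at the same fixed point jumps around between refits.
var_rigid = np.var([fit_and_predict(degree=1) for _ in range(200)])
var_flexible = np.var([fit_and_predict(degree=9) for _ in range(200)])
print(f"prediction variance, degree 1: {var_rigid:.4f}")
print(f"prediction variance, degree 9: {var_flexible:.4f}")
```

The straight line barely moves between training sets, while the flexible polynomial swings around to chase each particular sample, which is exactly the variance the speaker describes.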
40 00:03:17,450 --> 00:03:20,660 The predicted output function will be very different. 41 00:03:23,090 --> 00:03:26,810 So you can see that the variance is very high 42 00:03:27,200 --> 00:03:33,440 if the flexibility is high, and vice versa; so the more flexible the method, the higher the variance. 43 00:03:35,970 --> 00:03:40,860 This phenomenon of following the data too closely, as you see in the right graph, 44 00:03:41,890 --> 00:03:46,990 that is, even following the error in the observations, is called overfitting. 45 00:03:48,220 --> 00:03:50,830 When we overfit, we do get low training error, 46 00:03:51,520 --> 00:03:53,530 but the test error increases. 47 00:03:55,970 --> 00:04:03,250 Now, let's talk about bias. Bias refers to that part of the error which is introduced by approximating 48 00:04:03,250 --> 00:04:06,370 a complicated real-life relationship with a simpler model. 49 00:04:07,750 --> 00:04:08,500 For example, 50 00:04:09,920 --> 00:04:16,200 we may be trying to fit a linear model between dependent and independent variables where a linear relationship 51 00:04:16,290 --> 00:04:17,400 is highly unlikely. 52 00:04:18,820 --> 00:04:23,280 You can see in this graph that the points can never be fitted with a straight line, 53 00:04:23,600 --> 00:04:24,430 no matter what. 54 00:04:25,580 --> 00:04:30,390 But still, if we select a linear model, it is always going to have some error. 55 00:04:30,770 --> 00:04:33,110 And that part of the error is called the bias. 56 00:04:35,730 --> 00:04:38,450 And how is bias related to the flexibility of the model? 57 00:04:39,610 --> 00:04:43,930 You can see that the linear model, which is less flexible, is unable to fit this data. 58 00:04:45,250 --> 00:04:50,320 If I increase flexibility and allow it to curve, then it will better fit the points. 59 00:04:51,430 --> 00:04:55,870 So generally, if we increase flexibility, the bias error reduces. 60 00:04:57,250 --> 00:05:00,700 So you can see where the bias-variance trade-off is coming from.
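The overfitting claim above, low training error but high test error for a very flexible model, can also be demonstrated numerically. Again a sketch under assumptions of my own (the sin(2*pi*x) curve, the noise level, and degrees 1 and 15), not the video's exact figures:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    """Draw n noisy observations from the curve y = sin(2*pi*x)."""
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)
    return x, y

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

x_train, y_train = sample(30)
x_test, y_test = sample(300)

errors = {}
for degree in (1, 15):
    # Fit on the training set, then evaluate on both sets.
    coeffs = np.polyfit(x_train, y_train, degree)
    errors[degree] = (mse(y_train, np.polyval(coeffs, x_train)),  # train
                      mse(y_test, np.polyval(coeffs, x_test)))    # test
    print(f"degree {degree:2d}: train MSE {errors[degree][0]:.3f}, "
          f"test MSE {errors[degree][1]:.3f}")
```

The degree-15 polynomial drives the training error well below the noise level by chasing the errors in the observations, but its test error is far larger than its training error, which is the overfitting pattern the transcript describes.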
61 00:05:03,050 --> 00:05:09,860 As we increase flexibility, error due to variance increases and error due to bias decreases. 62 00:05:11,840 --> 00:05:18,410 Although we want to decrease both, when we try to decrease one, the other one starts to increase. 63 00:05:19,520 --> 00:05:23,870 So the challenge is to find that point where their sum is minimum. 64 00:05:25,900 --> 00:05:27,430 This is depicted graphically here. 65 00:05:28,350 --> 00:05:32,880 This orange line is showing us the variance, which is increasing with flexibility. 66 00:05:33,690 --> 00:05:37,380 And this blue line is for bias, which is decreasing with flexibility. 67 00:05:38,190 --> 00:05:42,120 And this red line is the sum of these two terms. 68 00:05:43,100 --> 00:05:46,760 We want to find the minimum point, where this sum is the minimum. 69 00:05:47,900 --> 00:05:52,070 Although we will not be able to compute bias and variance for our model explicitly, 70 00:05:52,700 --> 00:05:58,820 this concept will be used when we compare different models and their potential accuracy in 71 00:05:58,820 --> 00:06:00,410 predicting the dependent variable.
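The U-shaped curve the transcript describes, where the sum of bias and variance is minimized at an intermediate flexibility, can be estimated in a simulation where we know the true function. This is an illustrative sketch under my own assumptions (true curve sin(2*pi*x), noise standard deviation 0.3, polynomial degree as the flexibility knob):

```python
import numpy as np

rng = np.random.default_rng(2)

def expected_test_error(degree, n_train=30, n_reps=100):
    """Estimate the expected test MSE of a degree-d polynomial fit by
    averaging the error over many independently drawn training sets."""
    x_test = np.linspace(0.05, 0.95, 200)
    y_true = np.sin(2 * np.pi * x_test)
    errs = []
    for _ in range(n_reps):
        x = rng.uniform(0, 1, n_train)
        y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n_train)
        coeffs = np.polyfit(x, y, degree)
        pred = np.polyval(coeffs, x_test)
        # Squared error against the true curve captures bias^2 + variance;
        # adding the noise variance (0.3^2) gives the full expected test error.
        errs.append(np.mean((pred - y_true) ** 2) + 0.3 ** 2)
    return float(np.mean(errs))

errors = {d: expected_test_error(d) for d in (1, 5, 15)}
for d, e in errors.items():
    print(f"degree {d:2d}: estimated expected test error {e:.3f}")
```

The rigid degree-1 model loses to high bias, the degree-15 model loses to high variance, and an intermediate degree sits near the minimum of the red curve the speaker points to.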