[00:00:00] Hello and welcome back. Today we've got a very exciting tutorial: the bias-variance trade-off.
[00:00:05] To work with these terms, let's first define them. Bias is a systematic error in the machine learning model itself, caused by incorrect assumptions in the machine learning process, for example assuming a straight-line relationship when the real one is curved. Technically, it can be thought of as the difference between the average model prediction and the ground truth. Variance, on the other hand, is how much the model changes depending on the data set it is given. It refers to how much the model, and therefore its predictions, change when we train it on different portions of the training set.
[00:00:33] Now let's visualize this to understand it better. But first, let's go back to something we discussed previously: k-fold cross-validation, which gives us a set of metrics. We've already split our data, so we have a training set, and we split that training set into ten folds. We train the model on nine of those folds combined and validate (test) it on the one left-over fold. Then we train the model on a different combination of nine folds and test it on the fold left out this time. We keep going until each fold has been the validation fold once.
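The ten-fold procedure above can be sketched in plain Python. This is a minimal illustration, not the tutorial's own code. Several pieces are assumptions made for the example: the straight-line least-squares model, the synthetic data (a noisy line y = 2x + 1), and the hypothetical helper names `k_fold_indices`, `fit_line`, and `cross_validate`. The inputs are assumed to be the training set left over after the train/test split. The sketch returns one validation metric (mean squared error) per fold, so ten folds give ten numbers.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k roughly equal folds.

    Every index lands in exactly one fold, so each sample is used for
    validation exactly once across the k rounds.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x (hypothetical stand-in model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_validate(xs, ys, k=10):
    """Train on k-1 folds, score on the held-out fold; return one MSE per fold."""
    folds = k_fold_indices(len(xs), k)
    scores = []
    for held_out in range(k):
        # Combine every fold except the held-out one into the training portion.
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        val_idx = folds[held_out]
        intercept, slope = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in val_idx]
        scores.append(sum(errors) / len(errors))
    return scores


# Hypothetical training data: a noisy straight line y = 2x + 1.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

fold_scores = cross_validate(xs, ys, k=10)
print(len(fold_scores))  # one metric per fold
print([round(s, 3) for s in fold_scores])
```

Each of the ten models is trained on a slightly different 90% of the training data. That small perturbation is what the bias-variance picture relies on.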
[00:01:07] As a result, we're changing the training data just a little bit every time, and we're validating each resulting model on data it did not see during training. So we end up with a set of metrics; with ten-fold cross-validation, we have ten of them.
[00:01:25] Now, suppose we plot these results on a bias-variance diagram, which is, metaphorically speaking, a target. In the top-left corner we have high bias, low variance. The center of the target is what you want to predict, and each point is the prediction of one of the models we just trained. The predictions are far away from the target, but they're clustered together. What does that mean? It means the model is too simple and does not capture the underlying trend of the data. The predictions are far from the target, even though they agree with each other.
[00:02:04] On the other hand, you might have low bias and high variance. Here the average of all of these models is on the target. But, as you can see, every time we change the training data slightly, the model's result is different: it varies.
[00:02:24] That's low bias, high variance, and it means the model is too sensitive: it captures noise as if it were a real trend. It's overfitting to our data. As you can imagine, both of these scenarios are bad. In the first case, we are away from the target; in the second, we are overfitting to the data.
[00:02:47] You can also have a scenario with high bias and high variance. The average of the predictions lands somewhere away from where we want it to be, and the predictions are also scattered. This is probably the worst of the three. The model is too simple to capture the trend in the data, and at the same time it's too sensitive, so it captures noise as well.
[00:03:08] So those are the three non-ideal scenarios. And here's the ideal scenario, which is something of a unicorn. The predictions are clustered together, so you have low variance. They also sit in the right spot, so their average is the value we're actually aiming to predict, which means low bias. This is a great model.
[00:03:32] It accurately captures the underlying trends in the data and generalizes well to unseen data. The thing is, this is very rare; catching this scenario is like catching a unicorn. Most of the time you'll be trading off between the low-bias, high-variance case and the high-bias, low-variance case. If you make your model very sophisticated, you will indeed get a very good average score: on average, you'll be very close to what you're predicting. But you'll also be capturing the noise in the data, so as soon as the data changes slightly, your model's predictions will be off. In that sense, it's not a good model.
[00:04:07] If instead you make the model less complex, it ends up in the high-bias, low-variance corner. All the predictions are close to each other, but the model is too simple to hit the target. So you'll be trading off between these two, making the model either simpler or more complex. You look for a middle ground between them, aiming to get as close as possible to the ideal scenario.
[00:04:33] So that's how the bias-variance trade-off works, and that's how we can combine it with k-fold cross-validation to assess and understand our models better.
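To make the trade-off concrete, here is a sketch that measures squared bias and variance directly. It is an illustration under stated assumptions, not the tutorial's method.

The assumptions are:
- The ground truth is a known function, sin(2πx). A real data set would not give us this.
- The model is k-nearest-neighbours regression. A small `n_neighbors` plays the role of the "sophisticated" model and a large one the "simple" model.
- Each model is trained on its own freshly drawn training set. K-fold training sets overlap heavily, which would tend to understate the spread between models.

The helper names `true_fn`, `sample_training_set`, `knn_predict`, and `bias_variance` are hypothetical. Squared bias is the squared distance between the average prediction and the truth. Variance is the spread of the individual predictions around that average. Both are averaged over a grid of evaluation points.

```python
import math
import random


def true_fn(x):
    """Assumed ground truth for this synthetic example."""
    return math.sin(2 * math.pi * x)


def sample_training_set(n, noise, rng):
    """Draw a fresh, independent noisy training set from the assumed truth."""
    xs = [rng.random() for _ in range(n)]
    ys = [true_fn(x) + rng.gauss(0, noise) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x, n_neighbors):
    """k-nearest-neighbours regression: average the targets of the closest points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    return sum(train_y[i] for i in nearest) / n_neighbors


def bias_variance(n_neighbors, n_models=30, n_train=100, noise=0.3, grid_size=50, seed=0):
    """Estimate squared bias and variance of k-NN regression over a fixed grid."""
    rng = random.Random(seed)
    grid = [(j + 0.5) / grid_size for j in range(grid_size)]
    # predictions[m][j] is the prediction of model m at grid point j.
    predictions = []
    for _ in range(n_models):
        xs, ys = sample_training_set(n_train, noise, rng)
        predictions.append([knn_predict(xs, ys, g, n_neighbors) for g in grid])
    bias_sq = 0.0
    variance = 0.0
    for j, g in enumerate(grid):
        preds = [p[j] for p in predictions]
        avg = sum(preds) / n_models
        # How far the average prediction sits from the truth (the "aim").
        bias_sq += (avg - true_fn(g)) ** 2
        # How scattered the individual predictions are (the "grouping").
        variance += sum((p - avg) ** 2 for p in preds) / n_models
    return bias_sq / grid_size, variance / grid_size


for n_neighbors in (1, 15, 60):
    b, v = bias_variance(n_neighbors)
    print(f"n_neighbors={n_neighbors:>2}: bias^2={b:.3f}  variance={v:.3f}  sum={b + v:.3f}")
```

With this setup, `n_neighbors=1` should show near-zero squared bias but high variance, and `n_neighbors=60` the reverse. A middle value such as 15 should give the smallest sum of the two: the middle ground described above. The exact numbers depend on the random seed.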
[00:04:42] On that note, I look forward to seeing you back here next time. Until then, enjoy machine learning.