1
00:00:00,533 --> 00:00:01,400
Hello and welcome back.

2
00:00:01,400 --> 00:00:02,766
Today we've got a very exciting tutorial.

3
00:00:02,766 --> 00:00:04,666
The bias-variance tradeoff.

4
00:00:04,666 --> 00:00:05,000
All right.

5
00:00:05,000 --> 00:00:07,800
So to work with these terms,
let's first define them.

6
00:00:07,800 --> 00:00:12,266
Bias is a systematic error that occurs
in the machine learning model itself

7
00:00:12,266 --> 00:00:15,300
due to incorrect assumptions
in the machine learning process.

8
00:00:16,066 --> 00:00:18,900
Technically, it can be thought of
as the error between

9
00:00:18,900 --> 00:00:23,833
the average model prediction
and the ground truth, whereas variance is

10
00:00:23,833 --> 00:00:27,166
how much the model can adjust
depending on the given data set.

11
00:00:27,966 --> 00:00:31,200
Variance refers to changes in the model
when using different portions

12
00:00:31,200 --> 00:00:33,000
of the training data set.
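
As a side note to the two definitions just given, here is one common way to write them down. The symbols are assumptions added for illustration and do not come from the talk: f is the true function, \hat{f}_D is the model trained on training set D, and y = f(x) + \varepsilon, where the noise \varepsilon has mean zero and variance \sigma^2.

```latex
\begin{align}
\mathrm{Bias}\big[\hat{f}(x)\big] &= \mathbb{E}_D\big[\hat{f}_D(x)\big] - f(x) \\
\mathrm{Var}\big[\hat{f}(x)\big]  &= \mathbb{E}_D\Big[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\Big] \\
\mathbb{E}_{D,\varepsilon}\Big[\big(y - \hat{f}_D(x)\big)^2\Big]
  &= \mathrm{Bias}\big[\hat{f}(x)\big]^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2
\end{align}
```

The first line is the gap between the average prediction and the ground truth, the second is how much the prediction moves when the training set changes, and the third shows how both add up, together with the noise, in the expected squared error.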

13
00:00:33,000 --> 00:00:36,300
Now let's visualize this
to better understand it.

14
00:00:36,566 --> 00:00:38,066
But first we're going to

15
00:00:38,066 --> 00:00:41,600
refer to something we discussed previously
which is k-fold cross-validation.

16
00:00:41,600 --> 00:00:46,233
So in k-fold cross-validation,
we've got these metrics.

17
00:00:46,233 --> 00:00:49,233
So we've split our data set.

18
00:00:49,233 --> 00:00:49,966
We've got the training set.

19
00:00:49,966 --> 00:00:52,466
Then the training set
we split into ten folds.

20
00:00:52,466 --> 00:00:55,466
And so then we train the model
on these nine folds combined.

21
00:00:55,466 --> 00:01:00,433
And we validate, or test, it on this one
leftover fold.

22
00:01:00,733 --> 00:01:03,266
Then we train the model
on these nine folds.

23
00:01:03,266 --> 00:01:06,066
And then we
test it on this one fold, and so on.

24
00:01:06,066 --> 00:01:07,300
And so, as a result,

25
00:01:07,300 --> 00:01:10,966
We're slightly changing the training data
just a little bit every time.

26
00:01:11,333 --> 00:01:14,366
And we are validating
or testing the resulting model

27
00:01:14,366 --> 00:01:19,500
on new data, that is, on data that
was unseen during training.

28
00:01:19,500 --> 00:01:21,633
And so we're getting these
sets of metrics.

29
00:01:21,633 --> 00:01:24,900
And in the case of ten-fold cross-validation,
we have ten metrics.
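
The procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the tutorial: the synthetic data, the straight-line model, the mean squared error metric, and the helper names (`k_fold_indices`, `fit_line`, `cross_validate`, `make_data`) are all assumptions chosen so the example runs with the standard library alone.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (a stand-in for any model)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def cross_validate(xs, ys, k=10):
    """Train on k-1 folds, score on the leftover fold, once per fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Mean squared error on data the model did not see during training.
        mse = statistics.fmean((ys[i] - (a * xs[i] + b)) ** 2 for i in held_out)
        scores.append(mse)
    return scores


def make_data(n=100, seed=1):
    """Hypothetical data set: a noisy straight line."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_data()
    scores = cross_validate(xs, ys, k=10)
    print(len(scores), "fold scores, mean MSE:", statistics.fmean(scores))
```

With k=10 the list returned by `cross_validate` holds ten numbers, one per leftover fold, and those are the ten metrics mentioned here.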

30
00:01:25,300 --> 00:01:28,533
Now, if we plot this on the bias

31
00:01:28,533 --> 00:01:31,533
variance curve,
metaphorically speaking,

32
00:01:31,800 --> 00:01:36,066
in the top left corner
we'll have high bias, low variance.

33
00:01:36,066 --> 00:01:39,933
So this is what happens
when you have high bias and low variance.

34
00:01:40,200 --> 00:01:42,866
This is what you want to predict.

35
00:01:42,866 --> 00:01:46,900
This is the target,
but all of the

36
00:01:46,900 --> 00:01:49,900
predictions of those models
that we just looked at

37
00:01:49,933 --> 00:01:54,100
are far away from the target,
but they're clustered together, right?

38
00:01:54,100 --> 00:01:55,133
And what does that mean?

39
00:01:55,133 --> 00:01:57,666
Well,
that means that the model is too simple

40
00:01:57,666 --> 00:02:00,400
and does not capture the underlying
trend of the data.

41
00:02:00,400 --> 00:02:02,766
So it's far away from the target.

42
00:02:02,766 --> 00:02:04,800
But at the same time, they're clustered.

43
00:02:04,800 --> 00:02:06,900
On the other hand,
you might have this situation

44
00:02:06,900 --> 00:02:11,100
where you have low bias and high variance,
where

45
00:02:11,266 --> 00:02:14,666
the average of all of these models
is on the target,

46
00:02:14,966 --> 00:02:19,433
but at the same time, as you can see,
every time we change the underlying

47
00:02:19,433 --> 00:02:24,466
training data slightly,
the model's result is different; it varies.

48
00:02:24,833 --> 00:02:27,700
And that's low bias, high variance.

49
00:02:27,700 --> 00:02:30,533
And it means that
the model is too sensitive

50
00:02:30,533 --> 00:02:33,500
and is capturing noise
as if it were a real trend.

51
00:02:33,500 --> 00:02:35,633
It's overfitting to our data.

52
00:02:35,633 --> 00:02:39,533
And both of these scenarios, as you can
imagine, are bad.

53
00:02:39,666 --> 00:02:42,400
So in this case,
we are away from the target.

54
00:02:42,400 --> 00:02:46,533
In this case,
we're overfitting to the data.

55
00:02:47,700 --> 00:02:51,000
And of course,
you can also have a scenario like this

56
00:02:51,000 --> 00:02:52,333
where you have high bias
and high variance.

57
00:02:52,333 --> 00:02:56,000
So the average of these predictions
is somewhere here, which is also away

58
00:02:56,100 --> 00:02:57,633
from where we want it to be.

59
00:02:57,633 --> 00:02:59,800
And they're also scattered all over.

60
00:02:59,800 --> 00:03:02,133
And this is probably
the worst of the three.

61
00:03:02,133 --> 00:03:05,133
And here the model is too simple
to capture the data trend

62
00:03:05,266 --> 00:03:06,866
and it's too sensitive.

63
00:03:06,866 --> 00:03:08,633
It captures noise as well.

64
00:03:08,633 --> 00:03:11,966
So those are the three
kind of non-ideal scenarios.

65
00:03:11,966 --> 00:03:16,000
And here's the ideal scenario,
which is kind of like the unicorn

66
00:03:16,000 --> 00:03:19,100
where your predictions are clustered together.

67
00:03:19,100 --> 00:03:22,933
So you have low variance,
and they're in the right spot.

68
00:03:23,100 --> 00:03:27,300
So the average is the value
that we actually want,

69
00:03:27,300 --> 00:03:28,000
that we're aiming to predict.

70
00:03:28,000 --> 00:03:30,300
So it therefore has low bias.

71
00:03:30,300 --> 00:03:31,633
And this is a great model.

72
00:03:31,633 --> 00:03:32,566
It accurately

73
00:03:32,566 --> 00:03:33,900
captures

74
00:03:33,900 --> 00:03:37,133
the underlying trends of the data
and generalizes well to unseen data.

75
00:03:37,600 --> 00:03:40,233
Now the thing is that this is very rare.

76
00:03:40,233 --> 00:03:45,600
And catching this kind of scenario
is kind of like catching a unicorn.

77
00:03:45,766 --> 00:03:49,100
Most of the time, you'll be trading off
between this and this.

78
00:03:49,100 --> 00:03:52,566
So if you make your model
very sophisticated, then

79
00:03:52,666 --> 00:03:56,433
you will indeed
get a very good average score.

80
00:03:56,433 --> 00:03:57,966
You'll be very close
to what you're predicting,

81
00:03:57,966 --> 00:04:01,100
but you'll be capturing
noise in the data.

82
00:04:01,100 --> 00:04:04,666
And as soon as the data changes slightly,
your model predictions will be off.

83
00:04:04,666 --> 00:04:06,566
So it's not a good model in that sense.

84
00:04:07,500 --> 00:04:08,733
Or you can make it

85
00:04:08,733 --> 00:04:11,933
less complex. If your model is super
sophisticated, it'll be here.

86
00:04:11,933 --> 00:04:14,600
If it's less sophisticated,
it'll end up here,

87
00:04:14,600 --> 00:04:17,833
where it's too simple
to even capture this.

88
00:04:17,833 --> 00:04:19,466
Like, all the predictions
will be close to each other,

89
00:04:19,466 --> 00:04:21,800
but it's too simple
to capture the target.

90
00:04:21,800 --> 00:04:25,866
And so you will have to be trading off
between these two, making it either simpler

91
00:04:25,866 --> 00:04:29,233
or more complex, and finding your middle
ground somewhere between them

92
00:04:29,466 --> 00:04:33,200
and aiming to be as close
as possible to this scenario.

93
00:04:33,566 --> 00:04:36,300
So that's how the bias-variance
tradeoff works, and that's how

94
00:04:36,300 --> 00:04:39,766
we can combine it
with k-fold cross-validation to assess

95
00:04:40,300 --> 00:04:42,300
and understand our models better.
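
To make the two ends of the tradeoff concrete, here is a small, hedged experiment. Everything in it is an assumption made for illustration and not taken from the talk: the true function (a sine wave), the noise level, the query point, and the two deliberately extreme models (a constant predictor that is too simple, and a nearest-neighbour predictor that is too sensitive). Drawing a fresh training set each time stands in for swapping folds.

```python
import math
import random
import statistics

NOISE_STD = 0.5          # assumed noise level in the labels
QUERY_X = math.pi / 2    # point at which all predictions are compared


def true_f(x):
    """Hypothetical ground truth that the models try to recover."""
    return math.sin(x)


def sample_training_set(rng, n=50):
    """Draw one noisy training set of n points on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, NOISE_STD) for x in xs]
    return xs, ys


def predict_constant(xs, ys, x0):
    """Too-simple model: ignores x0 and predicts the mean label."""
    return statistics.fmean(ys)


def predict_nearest(xs, ys, x0):
    """Too-sensitive model: copies the label of the closest training point."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]


def bias_and_variance(predict, n_sets=300, seed=0):
    """Estimate bias and variance of `predict` at QUERY_X over many training sets."""
    rng = random.Random(seed)
    preds = [predict(*sample_training_set(rng), QUERY_X) for _ in range(n_sets)]
    bias = statistics.fmean(preds) - true_f(QUERY_X)   # average prediction vs. truth
    variance = statistics.pvariance(preds)             # spread across training sets
    return bias, variance


if __name__ == "__main__":
    for name, model in [("constant", predict_constant), ("nearest", predict_nearest)]:
        bias, variance = bias_and_variance(model)
        print(f"{name:>8}: bias={bias:+.3f} variance={variance:.3f}")
```

Under these assumptions the constant model lands in the high bias, low variance picture (tightly clustered but far from the target), while the nearest-neighbour model lands in the low bias, high variance picture (centred on the target but scattered). A model of intermediate complexity would sit somewhere between the two.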

96
00:04:42,300 --> 00:04:44,700
On that note, I look forward
to seeing you back here next time.

97
00:04:44,700 --> 00:04:46,600
And until then, enjoy machine learning.