Modelling Part 4 is the final part of modelling, and it's comparison. What we want to know here is: how will our model perform in the real world?

After you've tuned and improved your model's performance through hyperparameter tuning, it's time to see how it performs on the test set. The test set is like the final exam for machine learning models. If you've created your data splits correctly, it should give you an indication of how your model will perform once deployed in production (meaning customer-facing) rather than just sitting on your local computer.

Since your model has never seen the data in the test set, evaluating your model on it is a good way to see how it generalizes. And remember, by generalizing I mean adapts to data it hasn't seen before, such as how a heart disease prediction machine learning model would perform at classifying whether a patient has heart disease or not, on a patient who wasn't in our original dataset.

A good model will yield similar results on the training, validation and test sets, and it's not uncommon to see a slight decline in performance from the training and validation sets to the test set. For example, your model might achieve 98 percent accuracy on the training dataset and 96 percent accuracy on the test set. What you should be worried about is if the training set performance is dramatically higher than the test set performance, a classic sign of overfitting, or if the test set performance is higher than the training set performance, which usually means something like data leakage has crept into your splits. Underfitting is the opposite problem: the model performs poorly even on the training set because it hasn't learned the patterns in the data well enough.

Overfitting and underfitting are both examples of a model not being able to generalize well, which is what we don't want. The ideal model shows up in the Goldilocks zone: it fits just right, not too well but not too poorly.

You can see it here with this machine learning model. If these green data points were your data, this line kind of fits the shape, but this would be classified as underfitting. This is not what we want our model to do. And this one over here, well, it's doing a good job of fitting all the data points, but it's getting far too close. It's an almost too perfect model, just snaking between them, so this example would mean the model has learned the patterns in this dataset too well. It would be like seeing the final exam before actually taking the final exam. This one has the Goldilocks zone right.
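To make this concrete, here's a minimal sketch of checking for those performance gaps yourself. The library, dataset and split sizes here are stand-ins I've chosen for illustration, not something the lesson prescribes; the pattern of comparing scores across all three splits is the point.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# A stand-in dataset; the lesson's heart disease data would work the same way
X, y = load_breast_cancer(return_X_y=True)

# 70/15/15 train/validation/test split via two calls to train_test_split
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

# Fit on the training set only
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# A good model yields similar scores on all three sets:
# train >> test hints at overfitting; low scores everywhere hint at underfitting
print(f"Train accuracy: {model.score(X_train, y_train):.3f}")
print(f"Val accuracy:   {model.score(X_val, y_val):.3f}")
print(f"Test accuracy:  {model.score(X_test, y_test):.3f}")
```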
This is an iterative process. Where exactly this Goldilocks zone of a balanced model sits will really depend on your data and the problem you're trying to solve. That's why, again, it's an iterative process, finding this balanced zone. After some experience and practice working on different machine learning problems, you'll start to be able to tell whether your model is overfitting or underfitting.

There are several reasons why overfitting and underfitting can happen, but two of the main ones are data leakage and data mismatch.

Data leakage happens when some of your test data leaks into your training data. This often results in misleadingly good test results, with a model doing better on the test set than on the training dataset. Remember, it's like if everyone got to look at the final exam as the practice exam: your machine learning model has just learned exactly what it's about to be tested on. So when it comes time to modelling, it has learned the data way too well and starts to fit it like the overfit example from before.

This is why it's important to do your splits correctly and ensure that machine learning model training happens only on the training dataset, validation and model tuning happen only on the validation (or training) dataset, and testing and model comparison happen only on the test dataset. And remember, some approaches use only a training and test set and do model tuning on the training set, but the test set always stays the same. It's like when you go to university and do a course: you want the final exam to be an indication of how well you actually understand things. Same with the test dataset in machine learning: it's used as an indication of how well your model will generalize in the real world. So you want to avoid data leakage.

Data mismatch happens when the data you're testing on is different to the data you're training on, such as having different features in the training data to the test data. Having this kind of mismatch can lead to models performing poorly on test data compared to training data. This is why it's important to ensure that training is done on the same kind of data as you'll be testing on, and as close as possible to what you'll be using in your future applications.
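One easy-to-miss form of leakage is preprocessing computed over the whole dataset before splitting. Here's a small sketch of that pitfall and one way around it, reusing the `X_train`/`X_test` splits from the earlier sketch; the scaler-plus-logistic-regression pairing is just an illustrative choice, not the lesson's.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# LEAKY: scaling X before splitting lets test-set statistics (mean/std)
# influence the data the model trains on.
# scaler = StandardScaler().fit(X)   # sees the test rows too!
# X_scaled = scaler.transform(X)

# SAFER: a pipeline learns the scaling from the training data only,
# then applies that same transform to the test data at evaluation time.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)          # preprocessing fitted on train only
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```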
To combat underfitting, you can use a more advanced model. This could mean a totally different model, or increasing the number of hyperparameters on your current model. Remember when we were cooking our chicken dish, how we might alter one of the hyperparameters of our oven by turning it up? That might be something you do with a machine learning model: instead of only using two layers in a neural network, you might use four. We'll see more of this in a future project.

You could also reduce the number of features you're trying to model. Maybe your data has too many features and the model you're using is struggling to find patterns in them.

Finally, you could train your model for longer. Sometimes models take longer to train, or longer to learn, than you'd expect, so one of your experiments may involve a longer training phase.

To reduce overfitting, useful solutions are to collect more data. More data will provide more potential patterns for a model to find, and thus lower the chance of it simply memorizing them all. Or you could try using a less advanced model. This is uncommon, but it's a possibility: the model you're using is too good at learning, and it models your data too well. Be cautious of models performing too well, as they might lead to incorrect predictions. Remember, no model is perfect, so be sure to check your good results as much as you check your poor results.

Finally, when comparing two different models to each other, it's important to ensure you're comparing apples with apples and oranges with oranges: for example, Model 2 trained on dataset 1 versus Model 3 trained on dataset 1. During comparison you'll want to make sure you take into account not only the final result but what it took to get there. If Model 2 takes 1 second to make a prediction at 93.1 percent accuracy, and Model 3 takes 4 seconds to make a prediction at 94.7 percent accuracy, is that extra 1.6 percent accuracy worth the extra 3 seconds of prediction time? This will depend on what your goal is, but if you're optimizing for prediction time and want to make predictions as fast as possible, you might choose Model 2 because it makes predictions 4 times faster than Model 3, even though Model 3 has a higher accuracy.
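Here's a rough sketch of what that kind of side-by-side comparison could look like in code, again reusing the splits from the first sketch. The two candidate models (a single decision tree and a larger random forest) are hypothetical stand-ins for the lesson's Model 2 and Model 3; the point is measuring prediction time alongside accuracy on the same test set.

```python
import time

from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

def evaluate(model, X, y):
    """Return (accuracy, seconds taken to predict) for a fitted model."""
    start = time.perf_counter()
    preds = model.predict(X)
    elapsed = time.perf_counter() - start
    return (preds == y).mean(), elapsed

# Two candidates trained on the SAME training data (apples to apples)
candidates = {
    "Model 2 (single tree)":     DecisionTreeClassifier(random_state=42),
    "Model 3 (500-tree forest)": RandomForestClassifier(n_estimators=500, random_state=42),
}

for name, candidate in candidates.items():
    candidate.fit(X_train, y_train)
    acc, secs = evaluate(candidate, X_test, y_test)
    print(f"{name}: {acc:.1%} test accuracy, {secs:.4f}s to predict")

# Whether extra accuracy justifies extra prediction time depends on your use case.
```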
Again, this will be different depending on what kind of application or production use case you have. It's just something to keep in mind: there's more to choosing a model than just how well it performs.

A couple of things you want to remember from this lesson.

Avoid overfitting and underfitting: you want a model that heads towards generality. It's like when you do your practice exam. If you saw the final exam beforehand, you might just become an expert memorization machine rather than someone who could use that knowledge in the real world.

Keep the test set separate at all costs. When you split your data, create your training set, then lock the test dataset away. Once your model has been trained, you can unlock it, take it out of the safe, and see how your model performs.

When comparing models, compare apples to apples: Model 1 on dataset 1 versus Model 2 on dataset 1. You want to make sure the two models you're comparing have been created in the same sort of environment, so that you can ensure you're making a legitimate comparison.

Finally, one best performance metric does not equal the best model. Remember our example: you might be optimizing for prediction time. So although a model that makes faster predictions doesn't get as high an accuracy as another model that takes a little bit longer, that might not matter, because you need something that can predict as fast as possible.

That was a lot, but we'll see plenty more of this in action throughout the course. You'll also be using it throughout your entire machine learning career, so it's important to remember these concepts. Let's push on to the next lesson and see how we can put all of the previous steps together in Step 6: experimentation.