So once we have trained our model, I told you that we can quantify the quality of fit of our model using the mean squared error term, which is given by this formula: it is basically the residual sum of squares divided by n.

If the predicted responses are very close to the observations, the mean squared error will be small.

But if we compute the mean squared error on the same data which we used to train our model, it is called the training MSE.

The training error is not really what we are interested in. We are interested in the accuracy of the predictions when we apply our method to previously unseen test data.

For example, suppose I am predicting house prices. I don't really care how well our method predicts the house prices of previously completed transactions; I care about how well it will predict the house prices of future transactions.

Similarly, if I want to predict the risk of a particular disease in different individuals, I want to do it for future patients, and not for the ones whose outcome I already know.

So what we are going to do is split our data into two parts. One will be called the training set; this will be used to train the model. The other part will be called the test set; this will be the unseen data, and it will be used to assess the accuracy of our model.
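The training MSE just described is easy to compute directly. Here is a minimal sketch; the data points and predictions are made-up toy values, not anything from the lecture:

```python
def mse(y_true, y_pred):
    """Mean squared error: the residual sum of squares divided by n."""
    n = len(y_true)
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n

# Toy training responses and the fitted model's predictions on the SAME data,
# so this is the training MSE (illustrative numbers only).
y_train = [2.0, 4.1, 6.0, 7.9]
y_hat = [2.1, 4.0, 5.8, 8.2]

train_mse = mse(y_train, y_hat)
print(round(train_mse, 4))  # → 0.0375
```

A small training MSE only tells us the model fits the points it was trained on; it says nothing yet about how it will do on unseen data.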
So mathematically, I have these n pairs of observations, (x1, y1), (x2, y2), ..., (xn, yn). These will be part of my training data, and I will use them to train my model. Once I have used them, I will have identified the functional form of f, that is, f̂(x).

Now I will use a previously unseen observation, (x0, y0). These observations will come from our test set. And I will try to find out the test error, which is given by this formula: the test mean squared error is the average of the squared differences between the predicted values of y and the actual values of y on the given test data.

So for different types of models, I will compare the value of this test error and then select the model with the least test error.

I hope you understand the idea behind having separate test and training data. Basically, we have training data and a corresponding training error, by which the model is fitted; but there is no guarantee that the method with the lowest training error will also have a low test error.

Roughly speaking, many statistical methods specifically estimate coefficients so that we are able to minimize the training error. For such methods, the training error will be small, but the actual test error can be quite large.

In this graph, you can see four different curves. This black one
is the true function that we want to predict. This orange line is the output of a linear regression model, and these blue and green lines are the results of some other, more flexible models. The small circles that we are seeing are the data points which were used to train the model.

You can see that as I increase the flexibility of the model, that is, as I allow it to change its shape or its direction many times, it touches more points on this graph. So this green curve, which has high flexibility, is fitting the maximum number of points, whereas the orange curve, which has the least flexibility, is touching very few points.

You can see that after a certain level of flexibility, the extra flexibility is making the curve more wiggly; that is, it is following the individual data points and not the overall function.

The effect of flexibility on the training error and the test error can be seen in the graph on the right. You can see that this grey plot is of the training error: as you keep on increasing the flexibility, the training error keeps coming down. That is, the model will be fitting, or passing through, more and more of the sample points. But after a certain point, the test error, which is given by this red curve, starts increasing with increasing flexibility.
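This training-versus-test behaviour can be sketched with a toy example. Here I use k-nearest-neighbour regression as a stand-in for the flexible models in the figure (a smaller k means a more flexible fit); the data are invented. The training error necessarily shrinks as flexibility grows, and at maximum flexibility (k = 1) the curve passes through every training point exactly, while the test error is measured on held-out points and is the number we would actually compare between models:

```python
def knn_predict(x_train, y_train, x, k):
    """Average the y-values of the k training points nearest to x."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x))[:k]
    return sum(y_train[i] for i in nearest) / k

def mse(xs, ys, x_train, y_train, k):
    """MSE of the k-NN fit on the points (xs, ys)."""
    preds = [knn_predict(x_train, y_train, x, k) for x in xs]
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Invented noisy samples of an underlying function, plus held-out test points.
x_tr = [0.0, 1.0, 2.0, 3.0, 4.0]
y_tr = [0.1, 1.2, 1.9, 3.3, 3.8]
x_te = [0.5, 1.5, 2.5, 3.5]
y_te = [0.5, 1.5, 2.5, 3.5]

for k in (5, 3, 1):  # left to right: increasing flexibility
    print(k,
          round(mse(x_tr, y_tr, x_tr, y_tr, k), 3),  # training MSE keeps falling
          round(mse(x_te, y_te, x_tr, y_tr, k), 3))  # test MSE is what we compare
```

With more data and more noise, the test column would eventually turn back up as the very flexible fit starts chasing individual noisy points, giving the U-shape of the red curve in the figure.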
You can see that this orange point marks the training and test error for this orange curve, which is the inflexible linear fit. This blue point is for the blue curve, which is approximating the true function very closely. And this green point is for the green curve, which is very flexible: it has a lower training error than the blue curve, since it is fitting the points more closely, but it has a higher test error, because it is not approximating the true function.

So we want to identify this blue point, where we get the minimum test error. There are several techniques to split the data into training and test sets so that we can find this minimum point.

So we are going to discuss the three most popular techniques. The first is called the validation set approach. The second is leave-one-out cross-validation. And the third one is k-fold cross-validation.

The first technique, the validation set approach, is the simplest approach. In this method, we will randomly divide the data into two parts, a training set and a test set. The model will be fitted on the training set, and once the model is trained, the error on the test set will be calculated to estimate the test error.

We usually do a split of 80/20; that is, we use 80 percent of the data for training purposes and 20 percent of the data for testing purposes.
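An 80/20 validation split like the one described can be sketched as follows. This is a toy sketch with invented data; the random seed is fixed only so the example is reproducible:

```python
import random

def validation_split(data, train_frac=0.8, seed=0):
    """Randomly divide the data into a training set and a test (validation) set."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)                      # random assignment of observations
    cut = int(len(data) * train_frac)
    train = [data[i] for i in idx[:cut]]  # 80% used to fit the model
    test = [data[i] for i in idx[cut:]]   # 20% held out to estimate test error
    return train, test

data = list(range(10))
train, test = validation_split(data)
print(len(train), len(test))  # → 8 2
```

The model would then be fitted on `train`, and its MSE computed on `test` to estimate the test error.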
We'll be running this approach in our software package in a separate video.

There are basically two limitations of this approach. One is that part of the available data will not be used for training. As we know, the more data we use during training, the better the performance of the model will be. So if we keep some data aside for testing, the trained model will not be as good. And if you have a limited number of observations, your training will be severely impacted.

Secondly, the test error can be highly variable, depending on which observations are selected for training and which observations are selected for testing.

So to handle these two issues, there are two alternative approaches.

In leave-one-out cross-validation, suppose we have n observations. We will keep the first observation for testing purposes and train our model on the remaining n − 1 observations. Then we will keep the second observation for testing purposes and train the model on the remaining n − 1 observations. We will repeat this n times, so that every time we keep one observation for testing and the other n − 1 for training, and we will take the average of the error on each of these test observations.

So since we will need to fit the model n times, this method can be computationally expensive.
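The leave-one-out loop can be sketched generically. To keep the sketch self-contained I plug in a deliberately trivial "model" that always predicts the mean of its training responses; any real fit/predict pair would slot in the same way (all names and data here are invented for illustration):

```python
def loocv_mse(xs, ys, fit, predict):
    """Leave-one-out CV: hold each observation out once, train on the
    remaining n - 1, and average the squared test errors."""
    errors = []
    for i in range(len(xs)):
        x_tr = xs[:i] + xs[i + 1:]   # all observations except the i-th
        y_tr = ys[:i] + ys[i + 1:]
        model = fit(x_tr, y_tr)
        errors.append((ys[i] - predict(model, xs[i])) ** 2)
    return sum(errors) / len(errors)

# Trivial stand-in model: always predict the mean of the training responses.
fit = lambda x_tr, y_tr: sum(y_tr) / len(y_tr)
predict = lambda model, x: model

print(loocv_mse([1, 2, 3], [2.0, 4.0, 6.0], fit, predict))  # → 6.0
```

Note that `fit` is called once per observation, which is exactly why the method becomes expensive for large n.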
So an alternative to this leave-one-out cross-validation is k-fold cross-validation. In this, we will divide the data into k sets. Then we will train the model on k − 1 sets and use the k-th set for testing purposes.

You can see that leave-one-out cross-validation is a special case of k-fold cross-validation: if you have k equal to n, then k-fold cross-validation and leave-one-out cross-validation are the same thing.

So we will not be covering these two techniques in this software package; we will only be running the validation set approach in our software package.
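A k-fold version of the same sketch, again with the trivial mean-predicting stand-in model and invented data: each fold is held out once, the model is trained on the other k − 1 folds, and the errors are averaged. Setting k equal to n reproduces leave-one-out exactly:

```python
def kfold_indices(n, k):
    """Split the indices 0..n-1 into k consecutive folds of near-equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def kfold_mse(xs, ys, k, fit, predict):
    """k-fold CV: train on k - 1 folds, test on the held-out fold, average."""
    errors = []
    for test_idx in kfold_indices(len(xs), k):
        held_out = set(test_idx)
        x_tr = [xs[i] for i in range(len(xs)) if i not in held_out]
        y_tr = [ys[i] for i in range(len(ys)) if i not in held_out]
        model = fit(x_tr, y_tr)
        errors += [(ys[i] - predict(model, xs[i])) ** 2 for i in test_idx]
    return sum(errors) / len(errors)

# Trivial stand-in model: always predict the mean of the training responses.
fit = lambda x_tr, y_tr: sum(y_tr) / len(y_tr)
predict = lambda model, x: model

# With k equal to n (= 3 here), this is exactly leave-one-out CV.
print(kfold_mse([1, 2, 3], [2.0, 4.0, 6.0], 3, fit, predict))  # → 6.0
```

With a smaller k, such as 5 or 10, each model is trained on most of the data but only k fits are needed, which is the usual compromise in practice.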