1
00:00:00,610 --> 00:00:07,740
I continue with our course. In this video, I will talk about data splitting. Training the

2
00:00:07,740 --> 00:00:14,040
parameters of a prediction function and testing it on the same data is an incorrect procedure

3
00:00:14,040 --> 00:00:16,850
from a methodological point of view.

4
00:00:17,190 --> 00:00:23,280
A model that simply repeats the labels of the samples it was trained on

5
00:00:24,240 --> 00:00:32,580
would have a perfect score, but would not be able to predict anything useful on data it has not previously

6
00:00:33,210 --> 00:00:34,020
seen.

7
00:00:34,500 --> 00:00:36,920
So this situation is called overfitting.
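To make the overfitting argument concrete, here is a minimal Python sketch. The `MemorizingModel` class and the toy data are hypothetical, made up for illustration: the "model" only stores the labels it was trained on, so it scores perfectly on its training data and gets nothing right on inputs it has never seen.

```python
# Toy data (hypothetical): 2-feature points with binary labels.
train_X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
train_y = [0, 1, 0, 1]
test_X = [(1.5, 2.5), (3.5, 3.5)]
test_y = [0, 1]


class MemorizingModel:
    """Hypothetical model that stores (input -> label) pairs verbatim."""

    def fit(self, X, y):
        self.table = dict(zip(X, y))
        return self

    def predict(self, X):
        # Unseen inputs get None: nothing general was learned.
        return [self.table.get(x) for x in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


model = MemorizingModel().fit(train_X, train_y)
train_acc = accuracy(train_y, model.predict(train_X))
test_acc = accuracy(test_y, model.predict(test_X))
print(f"training accuracy: {train_acc:.2f}")  # 1.00
print(f"test accuracy: {test_acc:.2f}")       # 0.00
```

Scoring on the training data alone would report this model as flawless, which is exactly why a separate test set is needed.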

8
00:00:37,260 --> 00:00:44,460
So, to avoid it, it is common practice in machine learning experiments to split

9
00:00:44,460 --> 00:00:48,180
the available data into a training set and a test set.

10
00:00:49,140 --> 00:00:56,930
So data splitting is an operation that allows us to divide the available data into two sets, generally for

11
00:00:57,390 --> 00:01:00,320
cross-validation purposes.
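As a rough illustration of splitting for cross-validation, the sketch below generates k-fold train/test index pairs in plain Python. The helper `k_fold_indices` is hypothetical; it assumes unshuffled, contiguous folds, where the first `n_samples % k` folds get one extra row (scikit-learn's `KFold` without shuffling distributes rows the same way).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # The first n_samples % k folds get one extra row, so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Every row outside the current fold is used for training.
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

Each row lands in exactly one test fold, so every sample is used for testing once and for training k - 1 times.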

12
00:01:00,810 --> 00:01:08,520
One set is used to train a predictive model and the other to test the model's performance. Training and testing the model

13
00:01:08,520 --> 00:01:14,790
form the basis for using the model for prediction

14
00:01:14,790 --> 00:01:21,900
in predictive analytics. For example, if a dataset has 100 rows,

15
00:01:23,480 --> 00:01:26,350
which include the predictor and response variables,

16
00:01:26,690 --> 00:01:36,950
we will split it in a convenient ratio, say 70:30, and allocate 70 rows for training and 30

17
00:01:36,950 --> 00:01:37,930
rows for testing.
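The 100-row, 70:30 example can be sketched with the standard library alone. The row list is a stand-in for real data and the seed value 5 is an arbitrary assumption; shuffling before cutting is what makes the selection random.

```python
import random

rows = list(range(100))             # stand-in for 100 rows of data
rng = random.Random(5)              # fixed seed so the split is repeatable
shuffled = rows[:]
rng.shuffle(shuffled)               # random order reduces selection bias
n_train = len(shuffled) * 70 // 100 # integer arithmetic: 70 rows for training
train_rows = shuffled[:n_train]
test_rows = shuffled[n_train:]      # the remaining 30 rows for testing
print(len(train_rows), len(test_rows))  # 70 30
```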

18
00:01:38,510 --> 00:01:44,150
The rows are selected randomly to reduce bias. Once the training data is available,

19
00:01:44,450 --> 00:01:49,730
it is fed to the neural network to fit a universal approximating function.

20
00:01:50,330 --> 00:01:57,680
The training data determines the weights, biases, and activation function to be used so that we can get

21
00:01:58,190 --> 00:01:59,770
from input to output.
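How weights, a bias, and an activation function turn an input into an output can be illustrated with a single neuron. This is a hypothetical sketch with hand-picked weights and a sigmoid activation chosen as an assumption; in a real network, training on the training rows is what sets these values.

```python
import math


def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


# Hypothetical parameters; training would normally learn these.
weights = [0.5, -0.25]
bias = 0.1
output = neuron([2.0, 4.0], weights, bias)
print(round(output, 4))
```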

22
00:02:00,500 --> 00:02:07,970
Once sufficient convergence is achieved, the model is stored in memory, and the next step is testing

23
00:02:07,970 --> 00:02:08,560
the model.
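Testing on held-out rows comes down to comparing the model's predictions with the actual outputs and summarizing the differences in a metric. The values below are made up, and mean squared error and mean absolute error are assumed here because they suit a numeric target; for a classification task, accuracy would be the usual choice.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual outputs of the test rows and the model's predictions.
y_test = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(mse, mae)  # 0.375 0.5
```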

24
00:02:09,320 --> 00:02:12,590
So we use the 30 held-out rows of data to check whether

25
00:02:12,620 --> 00:02:21,110
the actual output matches the output predicted by the model. The evaluation is used to get the various metrics

26
00:02:21,470 --> 00:02:25,880
that can validate the model. If the accuracy is too low,

27
00:02:27,020 --> 00:02:34,070
the model has to be rebuilt with changes to the training data and the other parameters fed back to the

28
00:02:34,070 --> 00:02:41,510
neural network builder. To split the data, the scikit-learn library has been used, more specifically

29
00:02:41,750 --> 00:02:43,480
the sklearn.model_selection module.

30
00:02:44,000 --> 00:02:48,510
And of course, the model_selection.train_test_split() function has been used.

31
00:02:49,340 --> 00:02:51,470
This function quickly computes

32
00:02:52,370 --> 00:02:57,320
a random split into training and test sets. So let's write the code.
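A minimal sketch of calling `train_test_split` on small hypothetical lists, assuming scikit-learn is installed. The function shuffles the rows and returns the training and test parts of each input in the order the inputs were passed, keeping each X row paired with its y value.

```python
from sklearn.model_selection import train_test_split

X = [[i, i * 2] for i in range(100)]  # 100 hypothetical samples, 2 features each
y = list(range(100))                  # matching target values

# Returns X_train, X_test, y_train, y_test, in that order.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=5)
print(len(X_train), len(X_test))  # 70 30
```

Because the same shuffled positions are applied to every array, the first feature of each X row still equals its target after the split.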

33
00:03:01,970 --> 00:03:03,920
First, we have to import the library.

34
00:03:07,840 --> 00:03:09,940
So it is from sklearn

35
00:03:13,070 --> 00:03:14,780
.model_selection

36
00:03:16,650 --> 00:03:19,230
import

37
00:03:20,100 --> 00:03:25,330
train_test_split. And to make it easier for us,

38
00:03:25,380 --> 00:03:32,340
we will divide the DataScaled data frame into two: the predictors X and the target Y.

39
00:03:35,620 --> 00:03:36,310
Define

40
00:03:38,350 --> 00:03:39,410
the predictors X

41
00:03:40,930 --> 00:03:41,290
and

42
00:03:43,580 --> 00:03:44,360
the target, which is Y.

43
00:03:51,460 --> 00:03:52,480
And to do this,

44
00:03:54,320 --> 00:03:57,590
we will use the pandas DataFrame drop() function.

45
00:04:00,780 --> 00:04:03,090
So X equals

46
00:04:04,110 --> 00:04:07,230
DataScaled.drop(

47
00:04:09,090 --> 00:04:09,480
'MEDV',

48
00:04:13,380 --> 00:04:14,130
Um, uh.

49
00:04:15,590 --> 00:04:17,060
axis equals 1).

50
00:04:18,730 --> 00:04:23,290
And then print X.shape.

51
00:04:24,930 --> 00:04:25,470
Right.

52
00:04:30,910 --> 00:04:33,220
I think it should be a capital X.

53
00:04:35,360 --> 00:04:43,250
And to make it easier, we do the same with the character Y, so there won't be any confusion.

54
00:04:46,000 --> 00:04:49,390
Y equals DataScaled

55
00:04:51,800 --> 00:04:52,280
['MEDV'].

56
00:04:54,600 --> 00:05:00,480
And then print Y.shape.

57
00:05:16,890 --> 00:05:21,570
I did make a mistake here, because it needs a square bracket.

58
00:05:23,400 --> 00:05:26,280
I ran the cell, and we got our result.

59
00:05:28,200 --> 00:05:37,380
So the pandas DataFrame drop() function removes rows or columns

60
00:05:37,380 --> 00:05:45,750
by specifying label names and the corresponding axis, or by specifying the index or column names directly.

61
00:05:46,200 --> 00:05:52,920
When using a MultiIndex, labels on different levels can be removed by specifying the level.
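The different ways of calling `DataFrame.drop` described above can be tried on a tiny hypothetical frame, assuming pandas is installed. The column names and the two-level index are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# Label plus axis: axis=1 means "drop columns".
by_axis = df.drop("B", axis=1)
# Equivalent: name the columns directly with the columns= keyword.
by_columns = df.drop(columns=["B"])
# Dropping a row by its index label (axis=0 is the default).
no_first_row = df.drop(0)

# With a MultiIndex, `level` says which index level the label refers to.
mi = pd.MultiIndex.from_tuples([("x", 1), ("x", 2), ("y", 1)], names=["outer", "inner"])
mdf = pd.DataFrame({"val": [10, 20, 30]}, index=mi)
without_inner_1 = mdf.drop(1, level="inner")

print(list(by_axis.columns), list(by_columns.columns))
print(no_first_row.index.tolist())
print(without_inner_1.index.tolist())
```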

62
00:05:53,370 --> 00:05:54,560
So, to extract X,

63
00:05:54,960 --> 00:05:59,610
we had to remove the target column MEDV from the starting DataScaled data frame.
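Separating the predictors from the target can be sketched as follows, assuming pandas is installed. `DataScaled` here is a small hypothetical stand-in for the scaled data frame used in the video, with made-up feature columns `f1` and `f2`; only the target name `MEDV` comes from the transcript.

```python
import pandas as pd

# Hypothetical stand-in for the scaled data frame; MEDV is the target column.
DataScaled = pd.DataFrame({
    "f1": [0.0, 0.2, 0.5, 1.0],
    "f2": [0.3, 0.6, 0.9, 0.1],
    "MEDV": [0.4, 0.7, 1.0, 0.2],
})

X = DataScaled.drop("MEDV", axis=1)  # every column except the target
Y = DataScaled["MEDV"]               # square brackets select the target column
print("X shape =", X.shape)
print("Y shape =", Y.shape)
```

Writing `DataScaled("MEDV")` with round brackets would fail, because a DataFrame is not callable; that is the square-bracket mistake fixed in the video.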

64
00:06:00,150 --> 00:06:00,900
So now

65
00:06:01,800 --> 00:06:02,910
let's print the

66
00:06:04,780 --> 00:06:05,380
X shape.

67
00:06:12,340 --> 00:06:13,000
So.

68
00:06:16,390 --> 00:06:18,370
Now, let's do the same for Y.

69
00:06:21,080 --> 00:06:25,820
So X has all the other columns, and Y has only the MEDV column.

70
00:06:27,790 --> 00:06:28,540
So.

71
00:06:31,010 --> 00:06:37,730
X has the predictor columns, and Y has only one column, which is the target. Now we can split the data frame.

72
00:06:39,370 --> 00:06:41,560
So the code for that is:

73
00:06:42,750 --> 00:06:44,370
X_train,

74
00:06:46,070 --> 00:06:55,520
comma, X_test, comma, Y_train, comma, Y_test equals

75
00:06:56,850 --> 00:07:02,370
train_test_split(X, Y, test_size

76
00:07:03,330 --> 00:07:12,420
equals 0.30, because we split 70 percent for training and

77
00:07:15,000 --> 00:07:19,290
30 percent for testing, and then we set random_state

78
00:07:20,380 --> 00:07:21,220
equals 5).
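Putting the pieces together, the dictated code can be reconstructed roughly as below, assuming pandas, NumPy, and scikit-learn are installed. The random 10-row `DataScaled` frame and its `f1`/`f2` columns are hypothetical placeholders for the real scaled dataset, so the printed shapes will differ from the ones in the video.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical scaled data: 10 rows, two predictors plus the MEDV target.
rng = np.random.default_rng(0)
DataScaled = pd.DataFrame(rng.random((10, 3)), columns=["f1", "f2", "MEDV"])

X = DataScaled.drop("MEDV", axis=1)
Y = DataScaled["MEDV"]

# 70% training / 30% test, with a fixed seed so the split is reproducible.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.30, random_state=5)

print("X train shape =", X_train.shape)
print("X test shape =", X_test.shape)
print("Y train shape =", Y_train.shape)
print("Y test shape =", Y_test.shape)
```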

79
00:07:22,460 --> 00:07:26,690
Now it's very simple: we just print

80
00:07:29,490 --> 00:07:30,570
the X_train

81
00:07:32,660 --> 00:07:34,700
shape equals

82
00:07:35,940 --> 00:07:40,110
X_train.shape.

83
00:07:42,450 --> 00:07:45,630
And then the X_test

84
00:07:48,250 --> 00:07:50,050
shape equals

85
00:07:53,100 --> 00:07:54,780
X_test.shape.

86
00:07:56,840 --> 00:07:57,610
And then the

87
00:08:00,970 --> 00:08:03,250
Y_train shape

88
00:08:05,200 --> 00:08:06,760
equals

89
00:08:08,420 --> 00:08:09,140
Y_train.shape.

90
00:08:11,390 --> 00:08:16,730
And then we should print the Y_test shape, so

91
00:08:18,380 --> 00:08:18,920
Y_test shape equals

92
00:08:24,340 --> 00:08:25,150
Y_test.shape.

93
00:08:27,410 --> 00:08:32,070
So let me explain it before we run the code.

94
00:08:32,750 --> 00:08:34,670
So here we

95
00:08:36,360 --> 00:08:39,610
used the train_test_split() function from sklearn.model_selection with the following parameters.

96
00:08:40,880 --> 00:08:47,090
X and Y are the predictor and target data frames.

97
00:08:47,340 --> 00:08:49,610
So X is the predictors,

98
00:08:49,650 --> 00:08:55,580
and Y is the target. The test_size parameter can take the following types: float, int, or None.

99
00:08:55,950 --> 00:09:04,860
The default is 0.25. If it is a float, it should be between 0.0 and 1.0 and represents

100
00:09:04,860 --> 00:09:07,170
the proportion of the dataset to include

101
00:09:08,530 --> 00:09:17,830
in the test split. If the parameter is an int, it represents the absolute number of test samples. If the parameter

102
00:09:17,830 --> 00:09:20,980
is None, the value is set to the complement of the

103
00:09:21,900 --> 00:09:26,010
train size. If train_size is also None, the value is set to 0.25.

104
00:09:26,340 --> 00:09:33,000
So in our case, we set it to 0.30, which means 30 percent of the data is set aside as test

105
00:09:33,000 --> 00:09:33,450
data.
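The three accepted types of `test_size` can be compared on 100 hypothetical samples. This sketch assumes scikit-learn is installed and a recent release, where leaving both `test_size` and `train_size` unset gives a 25 percent test set. With a single input array, `train_test_split` returns a two-element list `[train, test]`.

```python
from sklearn.model_selection import train_test_split

data = list(range(100))  # 100 hypothetical samples

as_float = train_test_split(data, test_size=0.3, random_state=5)  # proportion of rows
as_int = train_test_split(data, test_size=20, random_state=5)     # absolute row count
as_default = train_test_split(data, random_state=5)               # both sizes None -> 0.25

for name, (train, test) in [("float", as_float), ("int", as_int), ("default", as_default)]:
    print(name, len(train), len(test))
```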

106
00:09:34,080 --> 00:09:42,540
The last one is the random_state parameter, which sets the seed used by the random number generator.
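To check the reproducibility claim, the sketch below runs the same split twice with the same `random_state`, assuming scikit-learn is installed; the 20-element list is hypothetical data.

```python
from sklearn.model_selection import train_test_split

data = list(range(20))  # hypothetical samples

# Same seed, same inputs: the shuffled order, and therefore the split, is identical.
first = train_test_split(data, test_size=0.3, random_state=5)
second = train_test_split(data, test_size=0.3, random_state=5)
print(first == second)  # True
```

Leaving `random_state` unset would draw a fresh shuffle on each call, so results would not be repeatable between runs.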

107
00:09:42,930 --> 00:09:44,610
In this way, the

108
00:09:46,480 --> 00:09:53,080
data-splitting operation can be reproduced. Now let's execute it and see what

109
00:09:53,090 --> 00:09:53,830
result we get?

110
00:09:58,340 --> 00:09:59,090
And.

111
00:10:01,000 --> 00:10:02,860
I got an error here, because

112
00:10:04,070 --> 00:10:05,320
it has to be a comma,

113
00:10:06,360 --> 00:10:07,110
not a dot.

114
00:10:15,240 --> 00:10:16,730
And we got our result.

115
00:10:17,830 --> 00:10:18,550
So.

116
00:10:20,000 --> 00:10:28,580
The data frame is split into two datasets: 354 rows for X_train and

117
00:10:29,210 --> 00:10:36,650
152 rows for X_test, with a similar subdivision for Y.
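The row counts follow from simple arithmetic. For a float `test_size`, scikit-learn (in the versions we are aware of) rounds the test count up and gives the remaining rows to the training set; the hypothetical helper `split_sizes` below mirrors that rule. A 506-row data frame is assumed as an example, since that total is consistent with a 152-row test set at 30 percent.

```python
import math


def split_sizes(n_samples, test_size):
    """Train/test row counts for a float test_size: test rounded up, train gets the rest."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


print(split_sizes(506, 0.30))  # (354, 152)
print(split_sizes(100, 0.30))  # (70, 30)
```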

118
00:10:37,920 --> 00:10:39,670
And that is the end of this video.

119
00:10:39,930 --> 00:10:43,760
I hope you enjoyed it, and I will see you in the next video.