1 00:00:00,570 --> 00:00:08,340 In this video, we will learn about the train-test split for time series data, that is, the train-test split 2 00:00:08,340 --> 00:00:09,070 for time series. 3 00:00:09,180 --> 00:00:12,570 This split is very different from the train-test split 4 00:00:12,570 --> 00:00:18,690 used for other machine learning algorithms such as linear regression, logistic regression, decision 5 00:00:18,690 --> 00:00:19,610 trees, etc. 6 00:00:21,060 --> 00:00:25,050 The first difference: for other machine learning algorithms, 7 00:00:25,230 --> 00:00:31,720 we randomly choose a subset of our data as the test set or validation set. 8 00:00:33,510 --> 00:00:42,210 But we cannot randomly choose data from a time series, because in a time series the data 9 00:00:42,300 --> 00:00:47,550 is organized in a particular order using dates or time values. 10 00:00:48,270 --> 00:00:53,100 So we cannot randomly pick a particular value from the time series. 11 00:00:54,540 --> 00:01:01,140 The second difference is that for other machine learning algorithms, we create three different sets: 12 00:01:01,590 --> 00:01:04,410 test, train, and validation. 13 00:01:05,040 --> 00:01:07,400 We usually train our model on the train set. 14 00:01:08,730 --> 00:01:11,850 We tune our model's hyperparameters using the validation set. 15 00:01:12,210 --> 00:01:18,370 And then we finally use the model to predict values on the test set. 16 00:01:20,910 --> 00:01:22,320 But with time series data, 17 00:01:22,530 --> 00:01:26,310 we usually divide our data into just test and train. 18 00:01:27,270 --> 00:01:36,450 This is mainly because in most cases we have limited data, and using the validation data for training 19 00:01:36,450 --> 00:01:41,910 the model makes more sense than using it only to validate the model. 20 00:01:43,140 --> 00:01:47,230 So for time series, we just divide our data into test and train.
21 00:01:47,940 --> 00:01:53,130 And we do not randomly pick some values to create the test set. 22 00:01:54,960 --> 00:02:01,650 We generally keep the last few values of our time series to act as the test set. 23 00:02:03,600 --> 00:02:11,550 So if you have monthly data for some time series, you may take the last three or four months as 24 00:02:11,550 --> 00:02:12,480 your test set. 25 00:02:15,400 --> 00:02:19,030 Now, let's start creating train and test data in Python. 26 00:02:21,340 --> 00:02:26,680 We will be using the daily minimum temperatures dataset that we were using earlier. 27 00:02:27,560 --> 00:02:31,980 So the data frame name is temp_df. 28 00:02:32,540 --> 00:02:34,730 Let's look at the first five values. 29 00:02:37,220 --> 00:02:45,020 You can see that we have two columns, Date and Temp, where Temp stands for temperature; the Date column contains 30 00:02:45,020 --> 00:02:47,600 the datetime values for our time series. 31 00:02:48,470 --> 00:02:51,230 You can see this is the daily data. 32 00:02:52,040 --> 00:02:58,610 So in the first row we have the data of 1st of January, and in the second row the data of 2nd January. 33 00:03:00,230 --> 00:03:02,120 We can also look at the last five values 34 00:03:10,630 --> 00:03:13,210 to get an idea of the last values of the series. 35 00:03:13,520 --> 00:03:15,530 So these are the last five rows. 36 00:03:16,150 --> 00:03:21,850 You can see we have data of 10 years, from 1981 to 1990. 37 00:03:23,920 --> 00:03:30,270 Now, let's look at how many values we have in our data frame. 38 00:03:30,580 --> 00:03:32,470 So we are using the .shape attribute. 39 00:03:37,020 --> 00:03:45,250 We have three thousand six hundred and fifty rows and just two columns. 40 00:03:46,510 --> 00:03:55,120 We are planning to use 80 percent of this data as our train set and the remaining 20 percent as our test set.
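These exploration steps can be sketched in pandas roughly as follows. The video loads a real CSV file that is not shown here, so the synthetic stand-in frame below (its dates and constant temperature values) is only an assumption to make the example runnable:

```python
import pandas as pd

# Stand-in for the daily minimum temperatures data in the video; the real
# frame is read from a CSV, so these 3650 rows of dummy values are synthetic.
dates = pd.date_range("1981-01-01", periods=3650, freq="D")
temp_df = pd.DataFrame({"Date": dates, "Temp": 11.0})

print(temp_df.head())   # first five rows: Date and Temp columns
print(temp_df.tail())   # last five rows, near the end of the series
print(temp_df.shape)    # (3650, 2) -> 3650 rows, 2 columns
```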
41 00:03:58,120 --> 00:04:06,760 So what we are going to do: out of these 10 years of data, we will use the first eight years as our train set and 42 00:04:06,760 --> 00:04:09,520 the last two years as our test set. 43 00:04:10,150 --> 00:04:13,750 So now let's get the first value of the shape. 44 00:04:14,320 --> 00:04:18,250 This is the number of rows we have in our data. 45 00:04:19,630 --> 00:04:26,140 Now, we will create another variable, train_size, in which we will store how many records we 46 00:04:26,140 --> 00:04:27,940 want in our train set. 47 00:04:28,600 --> 00:04:35,450 So we have three thousand six hundred and fifty records, and we want to take 80 percent of these records 48 00:04:35,470 --> 00:04:36,400 into our train set. 49 00:04:36,730 --> 00:04:41,660 So we are using temp_df.shape, 50 00:04:42,160 --> 00:04:47,470 and we are getting the first element of our shape tuple, which is three six five zero. 51 00:04:47,980 --> 00:04:50,570 And we are multiplying this value by 0.8. 52 00:04:51,370 --> 00:04:54,390 And finally, we are converting this value to an int. 53 00:04:54,620 --> 00:05:02,770 So suppose after multiplication we get a value of, say, two thousand six hundred four point five. 54 00:05:03,070 --> 00:05:06,520 We don't want that point five decimal value. 55 00:05:06,790 --> 00:05:13,570 So we are converting this value into an integer, because a size should be an integer value. 56 00:05:15,990 --> 00:05:20,830 Let's print this. So our train size is two nine two zero records. 57 00:05:21,610 --> 00:05:26,500 So we want two nine two zero records in our train set and the remaining records in the test set. 58 00:05:29,500 --> 00:05:33,910 Now, to create the train set, we will split our temp_df. 59 00:05:34,050 --> 00:05:34,260 So.
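The train-size computation described above can be sketched like this; the stand-in frame is an assumption (the real temp_df holds the temperature data), but the arithmetic is exactly as narrated:

```python
import pandas as pd

# Stand-in frame with the same length as the dataset in the video.
temp_df = pd.DataFrame({"Temp": [11.0] * 3650})

# 80% of the 3650 rows; int() truncates any fractional part (e.g. 2604.5
# would become 2604), because a record count must be a whole number.
train_size = int(temp_df.shape[0] * 0.8)
print(train_size)  # 2920
```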
60 00:05:34,890 --> 00:05:42,430 We will take the first two nine two zero records as train, and the remaining records from index two nine 61 00:05:42,430 --> 00:05:48,460 two zero to three six five zero as test. To do that, 62 00:05:48,610 --> 00:05:54,680 we will use slicing of data frames. So we will just write temp_df. 63 00:05:55,690 --> 00:05:59,240 Now, we want all the records starting from the first record. 64 00:05:59,410 --> 00:06:06,580 The index is zero for the first record, so we want all rows from zero to the train size; the train 65 00:06:06,580 --> 00:06:08,090 size is two nine two zero. 66 00:06:08,770 --> 00:06:10,600 So we will have all the records 67 00:06:11,620 --> 00:06:16,090 whose indexes are between zero and two nine two zero. 68 00:06:18,520 --> 00:06:22,720 You can see these are the indexes, just before the column names. 69 00:06:23,620 --> 00:06:25,100 We get these indexes 70 00:06:25,630 --> 00:06:27,160 when we look at our data frame. 71 00:06:30,340 --> 00:06:34,240 So let's run this. Now, for the test set, 72 00:06:35,050 --> 00:06:39,250 we will need all the records whose indexes are two nine two zero or greater. 73 00:06:39,790 --> 00:06:43,200 So here we are again selecting our temp_df, 74 00:06:43,690 --> 00:06:45,610 and we are selecting all the data 75 00:06:46,450 --> 00:06:50,650 where the index is two nine two zero or greater. 76 00:06:50,890 --> 00:06:54,220 So we write train_size here and then a colon, 77 00:06:57,400 --> 00:07:04,050 and after the colon we are not writing anything, leaving it blank, meaning till the end. 78 00:07:06,390 --> 00:07:09,070 Similarly, we can also mention the last number here. 79 00:07:09,600 --> 00:07:13,980 So train_size, then colon, then three six five zero. 80 00:07:14,040 --> 00:07:15,010 This will give the same result. 81 00:07:20,640 --> 00:07:21,790 So let's run this. 82 00:07:23,890 --> 00:07:25,570 Let's review what we have done.
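The slicing described above can be sketched as below; a stand-in frame replaces the real temperature data, but the split itself follows the narration (first 2920 rows as train, the rest as test):

```python
import pandas as pd

temp_df = pd.DataFrame({"Temp": [11.0] * 3650})  # stand-in for the real frame
train_size = int(temp_df.shape[0] * 0.8)         # 2920

# With the default integer index, df[a:b] slices rows by position,
# including a and excluding b.
train = temp_df[0:train_size]    # rows 0 .. 2919
test = temp_df[train_size:]      # rows 2920 .. 3649; same as temp_df[train_size:3650]

print(train.shape, test.shape)   # (2920, 1) (730, 1)
```

Leaving the end of the slice blank means "till the end", so `temp_df[train_size:]` and `temp_df[train_size:3650]` select the same rows.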
83 00:07:25,900 --> 00:07:35,830 We have selected all the data from index zero to the train size as train, and we have selected the 84 00:07:35,830 --> 00:07:42,260 remaining data, which is from the train size index to the last index, as our test set. 85 00:07:43,880 --> 00:07:47,830 Now let's look at the shape of our train and test datasets. 86 00:07:50,930 --> 00:08:00,410 You can see we have two nine two zero records in train, and we have the remaining 730 records in test. 87 00:08:04,000 --> 00:08:08,920 This is how we split data into test and train for time series. 88 00:08:11,200 --> 00:08:15,670 Now let's discuss another concept here, that is, walk-forward validation. 89 00:08:16,900 --> 00:08:20,380 So suppose this light gray is your training data. 90 00:08:21,880 --> 00:08:30,940 What we usually do is we train a model, say M1, on this training set, and then we will 91 00:08:30,940 --> 00:08:33,580 use this model to predict the future values. 92 00:08:34,030 --> 00:08:42,230 So we will use the same model to predict the value at time t1, then at time t2, then at time 93 00:08:42,430 --> 00:08:43,960 t3, and so on. 94 00:08:46,540 --> 00:08:53,460 But at time t2, we already have all the information available for time t1. 95 00:08:54,070 --> 00:09:02,200 So if you want to predict the value at time t3, and you have the additional values at time t1 and time 96 00:09:02,200 --> 00:09:11,470 t2, you can use these two new records to improve your model and then predict the value at time t3.
97 00:09:13,330 --> 00:09:21,760 Similarly, if you want to predict the value at time t5, and you already have data of all the values 98 00:09:21,960 --> 00:09:29,530 at the times before, you want to use all the available information with you. You don't want to use the 99 00:09:29,530 --> 00:09:36,670 same M1 model that you created at time t1, because in that case you are missing out on values 100 00:09:36,700 --> 00:09:39,900 that are available at times t2, t3 and t4. 101 00:09:42,070 --> 00:09:49,610 So what we can do is: if we want to train a model for time t2, we can take all the values 102 00:09:49,650 --> 00:09:54,940 till time t1, create a model, and use that model to predict the value at time 103 00:09:54,950 --> 00:09:59,720 t2. If we want to predict the value at time t3, 104 00:10:00,190 --> 00:10:03,040 we will take all the values till time t2. 105 00:10:03,730 --> 00:10:05,410 So we will take all those values, 106 00:10:05,440 --> 00:10:09,090 we'll create another model, and predict the value at time t3. 107 00:10:10,130 --> 00:10:17,590 Similarly, if we want to predict the value at time t5, we will take all the information that 108 00:10:17,590 --> 00:10:25,150 is available to us till time t4, we'll create a model on that, and then we will predict the value at 109 00:10:25,150 --> 00:10:25,540 time 110 00:10:25,630 --> 00:10:25,880 t5. 111 00:10:28,540 --> 00:10:37,350 So for our test set also, we are not going to create a single model and predict the values for all the 112 00:10:37,370 --> 00:10:43,240 seven hundred and thirty records. For the first record in our test set, 113 00:10:43,540 --> 00:10:51,160 we are going to use the train set, create a model, and predict the first value. For the second 114 00:10:51,160 --> 00:10:52,090 value of our test set, 115 00:10:52,100 --> 00:10:55,120 we will take all the values of our train set,
116 00:10:55,360 --> 00:11:02,230 we will add the first value of the test set, and then we will predict the value for the second record 117 00:11:02,350 --> 00:11:03,340 of our test set. 118 00:11:05,050 --> 00:11:12,490 So this way of validation is known as walk-forward validation, and it usually gives us more accuracy 119 00:11:12,700 --> 00:11:14,010 than a single time series 120 00:11:14,020 --> 00:11:14,490 model. 121 00:11:17,740 --> 00:11:21,120 So we will learn how to use walk-forward validation as well, 122 00:11:22,540 --> 00:11:25,240 along with creating a single time series model. 123 00:11:26,500 --> 00:11:27,590 That's all for this video. 124 00:11:27,730 --> 00:11:28,170 Thank you.
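The walk-forward loop described above can be sketched as below. The video has not yet fixed a concrete forecasting model, so a naive last-value forecast stands in for the "create a model" step; the point of the sketch is only the loop that folds each new test observation into the history before the next prediction:

```python
import pandas as pd

series = pd.Series(range(100), dtype=float)   # stand-in time series
train_size = int(len(series) * 0.8)           # 80 train points, 20 test points

history = list(series[:train_size])           # start from the full train set
predictions = []

for actual in series[train_size:]:
    # "Create a model" on everything seen so far; the naive last-value
    # forecast here is a placeholder for whatever model the course builds.
    forecast = history[-1]
    predictions.append(forecast)
    history.append(actual)                    # walk forward: add the new value

print(len(predictions))                       # one forecast per test record
```

Each test record thus gets a prediction from a model that has seen all values before it, which is why this scheme usually beats a single model trained once on the train set.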