1 00:00:00,830 --> 00:00:07,400 In this video, we will learn how to perform upsampling and downsampling in Python. 2 00:00:10,130 --> 00:00:16,790 We have already covered downsampling and upsampling in an earlier lecture. Downsampling means 3 00:00:18,220 --> 00:00:24,610 converting our data from higher frequency data to some lower frequency data. 4 00:00:26,470 --> 00:00:34,930 Which means converting monthly data into yearly data, converting daily data into weekly data, or converting 5 00:00:34,930 --> 00:00:37,480 any weekly data into monthly data. 6 00:00:40,060 --> 00:00:44,220 So let's look at how to perform downsampling in Python. 7 00:00:45,550 --> 00:00:53,530 We will be using the same dataset that we were using earlier, which is the US airline miles data. 8 00:00:54,010 --> 00:00:57,640 This data is available in the resources section of this video. 9 00:00:59,590 --> 00:01:01,610 So let's first import the data. 10 00:01:03,760 --> 00:01:12,730 I am already mentioning which column contains the dates, and I am also mentioning that the headers are present 11 00:01:12,820 --> 00:01:13,430 in the first row. 12 00:01:15,160 --> 00:01:19,000 So let's look at the top five rows of this data. 13 00:01:22,090 --> 00:01:26,890 So here you can see this is the monthly data. In the first row, 14 00:01:27,010 --> 00:01:29,500 we have the data of Jan 1963. 15 00:01:29,710 --> 00:01:33,810 And in the second row, we have the data of Feb 1963. 16 00:01:34,780 --> 00:01:38,170 The dates are available in a column named Month. 17 00:01:38,590 --> 00:01:41,460 And the values are available in the column 18 00:01:41,980 --> 00:01:44,590 Miles MM, that is, miles in millions. 19 00:01:47,720 --> 00:01:51,210 So let's first convert this data to quarterly data. 20 00:01:52,890 --> 00:01:55,290 The function we are going to use is very simple. 21 00:01:56,580 --> 00:02:03,270 So here we are creating a new data frame, that is, the quarterly miles data frame.
22 00:02:04,260 --> 00:02:05,950 And we are taking it from the miles data frame. 23 00:02:06,180 --> 00:02:10,200 And we are using the simple resample function. In the resample function, 24 00:02:10,320 --> 00:02:13,230 you have to provide at least two arguments. 25 00:02:14,220 --> 00:02:17,040 First, you have to provide the new frequency of your data. 26 00:02:17,550 --> 00:02:20,160 So since we are going to generate quarterly data, 27 00:02:20,670 --> 00:02:21,890 I have mentioned Q here. 28 00:02:22,770 --> 00:02:27,240 And next, you have to mention the date column of your data. 29 00:02:27,810 --> 00:02:31,950 So in our data, the date column is present in this Month column. 30 00:02:32,670 --> 00:02:37,680 So here we write on equal to, and then the name of the column. 31 00:02:37,860 --> 00:02:43,740 So the name of the column is Month, and then we need some aggregate function. 32 00:02:44,160 --> 00:02:46,850 So in a quarter, there are three values. 33 00:02:47,430 --> 00:02:49,340 So suppose this is the first quarter, 34 00:02:49,380 --> 00:02:50,400 these three values. 35 00:02:51,690 --> 00:02:55,500 Now we need to use an aggregate function to summarize this data. 36 00:02:56,130 --> 00:02:59,430 We can either take the mean of these three values, 37 00:02:59,490 --> 00:03:07,230 we can take the max of these three values or the minimum of these three values, or we can take the total 38 00:03:07,230 --> 00:03:08,940 sum of all these three values. 39 00:03:09,930 --> 00:03:13,020 So here we are planning to take the mean of the values. 40 00:03:13,060 --> 00:03:16,620 That's why we are using the dot mean aggregate method. 41 00:03:18,270 --> 00:03:25,460 So again, we are creating a new data frame, the quarterly miles data frame, from our miles data frame. 42 00:03:26,200 --> 00:03:30,900 Then we are using the simple resample function for aggregating the data. 43 00:03:32,010 --> 00:03:33,840 And we want our new data
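The quarterly downsampling described above can be sketched as follows. This is a minimal stand-in: the column names `Month` and `MilesMM` and the sample values are assumptions in place of the real US airline miles dataset from the resources (the first two values, 6827 and 6178, are the ones read out later in the video).

```python
import pandas as pd

# Stand-in for the US airline miles dataset (column names "Month" and
# "MilesMM" and the values are illustrative assumptions).
miles_df = pd.DataFrame({
    "Month": pd.date_range("1963-01-01", periods=6, freq="MS"),
    "MilesMM": [6827.0, 6178.0, 7084.0, 8162.0, 8462.0, 9644.0],
})

# resample needs the new frequency ("Q" for quarterly; newer pandas
# spells it "QE") and, via on=, the date column; .mean() then averages
# the three monthly values of each quarter.
quarterly_miles_df = miles_df.resample("Q", on="Month").mean()
print(quarterly_miles_df.head())
```

With these values, the first quarter's mean is (6827 + 6178 + 7084) / 3 ≈ 6696.33, matching the number quoted in the video.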
44 00:03:35,140 --> 00:03:39,540 to be, of course, at quarterly frequency; the date column is the Month column. 45 00:03:39,910 --> 00:03:43,240 And we want to get the mean of the values. 46 00:03:43,930 --> 00:03:48,260 So let's run this and let's look at the first 47 00:03:48,280 --> 00:03:50,110 five values of my dataset. 48 00:03:52,540 --> 00:03:53,680 So these are the values. 49 00:03:54,340 --> 00:03:59,350 You can see we have downsampled our data from monthly data to quarterly data. 50 00:03:59,950 --> 00:04:01,810 Here we have the data of our first quarter. 51 00:04:02,380 --> 00:04:09,240 The mean value of these three values is around six six nine six point three three. For the next quarter, 52 00:04:09,240 --> 00:04:10,540 the mean value is this. 53 00:04:10,690 --> 00:04:11,350 And so on. 54 00:04:12,340 --> 00:04:17,350 So overall, we have reduced the frequency of our data. 55 00:04:22,050 --> 00:04:25,530 So, again, these are keywords; you can use other keywords as well. 56 00:04:25,890 --> 00:04:35,790 So for annual, you can use A instead of Q; for weekly data, you can use W instead of Q, and 57 00:04:35,790 --> 00:04:36,330 so on. 58 00:04:37,800 --> 00:04:45,120 So let's again convert this monthly data into yearly data. 59 00:04:45,840 --> 00:04:51,370 So we are creating another data frame, that is, the total miles data frame. 60 00:04:52,080 --> 00:04:54,350 We are taking values from the miles data frame. 61 00:04:54,720 --> 00:04:58,350 And again, we are resampling it at a new level. 62 00:04:58,890 --> 00:05:03,250 So A stands for year or annual. 63 00:05:04,590 --> 00:05:06,600 And the date column is the Month column. 64 00:05:08,300 --> 00:05:11,570 And here we want the sum of all the values. 65 00:05:12,140 --> 00:05:17,770 So for the first year, we want the sum of all the twelve values, and for the next year, 66 00:05:17,990 --> 00:05:22,350 we want the sum of all the twelve values of the next year. 67 00:05:23,510 --> 00:05:25,340 So let's run this.
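The yearly downsampling just described might look like this; again, the column names and the synthetic monthly values are assumptions, not the real dataset.

```python
import pandas as pd

# Two years of synthetic monthly data (names and values are assumptions).
miles_df = pd.DataFrame({
    "Month": pd.date_range("1963-01-01", periods=24, freq="MS"),
    "MilesMM": list(range(100, 124)),
})

# "Y" (or "A" for annual in older pandas; "YE" in newer pandas)
# resamples to yearly frequency; .sum() totals the twelve monthly
# values of each year.
yearly_miles_df = miles_df.resample("Y", on="Month").sum()
print(yearly_miles_df.head())
```

The resulting index holds year-end dates, which is why the printed rows show the last day of each year.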
68 00:05:26,620 --> 00:05:30,460 And let's look at the first five values of the data frame. 69 00:05:31,550 --> 00:05:39,440 So you can see we have the last day of the year 1963, and we have the total miles of that year. 70 00:05:41,600 --> 00:05:44,090 So these are the two examples of downsampling. 71 00:05:44,150 --> 00:05:47,780 First, we have converted our monthly data to quarterly data. 72 00:05:48,350 --> 00:05:55,970 And here we have converted our monthly data into annual data, that is, the sum of all the monthly 73 00:05:55,970 --> 00:05:56,540 values. 74 00:05:58,400 --> 00:06:01,970 Now, here is a list of all the arguments that you can pass. 75 00:06:03,170 --> 00:06:04,250 So this is the list. 76 00:06:05,150 --> 00:06:11,780 If you have daily data, you can convert it into weekly data using W as an argument. 77 00:06:13,190 --> 00:06:15,940 We have already used Q and A. 78 00:06:16,320 --> 00:06:19,430 Q stands for quarter and A stands for year. 79 00:06:19,480 --> 00:06:24,350 And you can also try the other arguments as well on this data. 80 00:06:27,510 --> 00:06:36,330 So if you notice, in the last video, we also summarized our data using the group by function. 81 00:06:37,590 --> 00:06:46,230 So when we were converting our monthly data into yearly data there, we used the group by function. 82 00:06:46,500 --> 00:06:49,830 So here we first created the year column. 83 00:06:50,430 --> 00:06:57,780 And then we used this year column to summarize this Miles MM data, getting the mean value at 84 00:06:57,900 --> 00:06:58,950 a yearly level. 85 00:07:00,690 --> 00:07:07,200 So you can use either the resample function or the group by function to get the same result. 86 00:07:12,790 --> 00:07:14,350 So this is downsampling. 87 00:07:14,710 --> 00:07:21,250 Next, we have upsampling. Upsampling means increasing the frequency of your data. 88 00:07:21,520 --> 00:07:25,570 So if you have monthly data and you want daily data,
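The group-by alternative mentioned above could be sketched like this, using the same assumed column names and synthetic values as before.

```python
import pandas as pd

# Synthetic monthly data (column names are illustrative assumptions).
miles_df = pd.DataFrame({
    "Month": pd.date_range("1963-01-01", periods=24, freq="MS"),
    "MilesMM": list(range(100, 124)),
})

# First derive a Year column, then group on it and take the mean --
# the same yearly summary that resample("Y", on="Month").mean() gives.
miles_df["Year"] = miles_df["Month"].dt.year
yearly_mean = miles_df.groupby("Year")["MilesMM"].mean()
print(yearly_mean)
```

Both routes produce one row per year; resample keeps a datetime index, while group by keeps the integer year as the index.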
89 00:07:26,640 --> 00:07:29,320 then you will convert your monthly data to daily data. 90 00:07:29,870 --> 00:07:35,640 And for each month, you will have 30, 31, or 28 values, depending on the month. 91 00:07:36,600 --> 00:07:42,610 So decreasing the frequency of your data means downsampling, and increasing the frequency means upsampling. 92 00:07:44,880 --> 00:07:50,070 So for upsampling also, we are going to use the same miles data. 93 00:07:50,490 --> 00:07:54,360 And here we are going to convert our monthly data into daily data. 94 00:07:57,330 --> 00:07:58,380 The function is similar. 95 00:07:58,770 --> 00:08:00,140 We will use resample. 96 00:08:01,020 --> 00:08:06,870 And earlier, when we were downsampling, we were using quarterly or annual. 97 00:08:07,530 --> 00:08:10,740 Here we are using D. D stands for daily. 98 00:08:12,150 --> 00:08:14,940 And we will use the same statement. 99 00:08:15,450 --> 00:08:19,230 We have just changed the frequency from annual to daily. 100 00:08:20,370 --> 00:08:21,750 So let's run this. 101 00:08:23,770 --> 00:08:27,130 Let's look at the first thirty-five values of what we get. 102 00:08:30,080 --> 00:08:37,360 We are calling this new dataset the upsampled miles data frame, and printing the first 35 values. 103 00:08:39,200 --> 00:08:44,380 You can see earlier we only had 1963-01-01, 104 00:08:44,510 --> 00:08:48,260 and the next value was 1963-02-01. 105 00:08:51,600 --> 00:08:53,790 So these were the first two values. 106 00:08:53,820 --> 00:08:57,830 Now we have around thirty more values between these two dates. 107 00:09:01,030 --> 00:09:08,500 So you can see the second row here is of 2nd January 1963, and so on. 108 00:09:11,380 --> 00:09:15,020 And now this is the second value of our 109 00:09:15,300 --> 00:09:19,180 original column, 1st Feb 1963. 110 00:09:20,620 --> 00:09:27,260 You can also notice that for the newly created rows, we have NaN as the values of miles.
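The upsampling step can be sketched as below; notice that the newly created daily rows come out as NaN. As before, the column names and values are stand-in assumptions.

```python
import pandas as pd

# Three synthetic monthly values (names and values are assumptions).
miles_df = pd.DataFrame({
    "Month": pd.date_range("1963-01-01", periods=3, freq="MS"),
    "MilesMM": [6827.0, 6178.0, 7084.0],
})

# "D" stands for daily. Resampling upward only creates the structure:
# the known monthly values are carried over, and every new daily row
# in between is NaN until we fill it.
upsampled_miles_df = miles_df.resample("D", on="Month").mean()
print(upsampled_miles_df.head(35))
```

Jan 1 keeps its value, Jan 2 through Jan 31 are NaN, and Feb 1 keeps the next known value, exactly as described in the video.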
111 00:09:29,470 --> 00:09:35,380 So overall, you can see that we have only created the structure of our new data frame. 112 00:09:35,540 --> 00:09:39,410 We have not filled the values of miles in our new data. 113 00:09:40,570 --> 00:09:49,390 So the next step of upsampling is to fill these empty values with some values, using the already present 114 00:09:49,390 --> 00:09:49,930 values. 115 00:09:52,150 --> 00:09:55,360 So just look at this first month's values here, 116 00:09:55,520 --> 00:09:58,960 starting with the value six thousand eight hundred twenty-seven 117 00:09:59,680 --> 00:10:04,420 and ending with the value six thousand one hundred and seventy-eight. 118 00:10:07,470 --> 00:10:09,890 Now, how can we fill these empty values? 119 00:10:11,600 --> 00:10:15,830 Now, it would make more sense to draw a straight line between these two values, 120 00:10:15,950 --> 00:10:20,610 six thousand eight hundred twenty-seven and six thousand one hundred and seventy-eight, 121 00:10:21,480 --> 00:10:22,910 and fill all these values 122 00:10:22,980 --> 00:10:24,510 according to that straight line. 123 00:10:25,560 --> 00:10:31,520 So suppose this value can be six thousand eight hundred and twenty-two, 124 00:10:31,710 --> 00:10:35,560 this value can be six thousand eight hundred and seventeen, 125 00:10:36,120 --> 00:10:40,020 this value can be six thousand eight hundred and twelve, and so on. 126 00:10:40,320 --> 00:10:41,910 Then we reach this value, 127 00:10:41,940 --> 00:10:45,720 six thousand one hundred and seventy-eight, at the end of this month. 128 00:10:47,940 --> 00:10:54,600 Basically, we are trying to fit a linear line between these two values. In that way, 129 00:10:54,780 --> 00:10:59,310 these newly created values will make a lot more sense to us. 130 00:11:02,430 --> 00:11:03,930 So let's see how to do that. 131 00:11:07,270 --> 00:11:11,490 To fill these values, we will use the interpolate function. 132 00:11:12,690 --> 00:11:14,460 We are creating a new data frame,
133 00:11:14,520 --> 00:11:16,390 that is, the interpolated miles 134 00:11:16,630 --> 00:11:23,700 data frame. We have taken the upsampled miles data frame, and we are using the interpolate 135 00:11:23,700 --> 00:11:31,380 function, and we want to fill these values using the linear line between the available values. 136 00:11:32,310 --> 00:11:34,050 So let's run this. 137 00:11:35,430 --> 00:11:39,280 Let's again take a head of the first 35 values. 138 00:11:39,900 --> 00:11:45,950 So now we have filled all these empty values with a linear line between the available values. 139 00:11:47,700 --> 00:11:52,650 So you can see the first value is the same, six thousand eight hundred and twenty-seven. 140 00:11:53,310 --> 00:11:56,490 And the next value is six thousand eight hundred and six. 141 00:11:57,180 --> 00:12:03,450 So there is a decrease of 21 units. In the next row as well, 142 00:12:03,510 --> 00:12:07,560 you can see there is a decrease of 21 units, then 21 units, and so on. 143 00:12:07,920 --> 00:12:09,360 Then we reach this value, 144 00:12:11,440 --> 00:12:13,660 six thousand one hundred and seventy-eight. 145 00:12:13,910 --> 00:12:18,630 Here you can see the values decrease almost linearly, one after another. 146 00:12:21,500 --> 00:12:24,830 So we have spread out the difference between these two values, 147 00:12:27,090 --> 00:12:31,830 six thousand eight hundred twenty-seven and six thousand one hundred and seventy-eight, and computed 148 00:12:32,100 --> 00:12:34,440 equally spaced values. 149 00:12:36,060 --> 00:12:41,520 So the difference between any two values will be around twenty-one units. 150 00:12:43,560 --> 00:12:45,150 Let's plot the graph as well. 151 00:12:50,270 --> 00:12:59,210 So this is the graph. You can see that there are linear lines in between any two points in our data. 152 00:13:00,440 --> 00:13:02,370 So just look at these two values. 153 00:13:02,840 --> 00:13:09,580 So this may be a value of December 1965.
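The linear interpolation step might look like this; it uses the two values quoted in the video (6827 and 6178) with assumed column names.

```python
import pandas as pd

# The two monthly values quoted in the video, in a synthetic frame.
miles_df = pd.DataFrame({
    "Month": pd.date_range("1963-01-01", periods=2, freq="MS"),
    "MilesMM": [6827.0, 6178.0],
})
upsampled_miles_df = miles_df.resample("D", on="Month").mean()

# Fill the NaN daily rows with points on the straight line between
# the two known monthly values -- a drop of about 21 units per day.
interpolated_miles_df = upsampled_miles_df.interpolate(method="linear")
print(interpolated_miles_df.head())
```

The total drop of 649 units is spread over 31 daily steps, so each step is roughly 649 / 31 ≈ 21 units, matching the video.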
154 00:13:09,920 --> 00:13:13,390 And this may be a value of January 1966. 155 00:13:13,490 --> 00:13:19,730 We have a straight line between these two values to create interpolated values at the daily level. 156 00:13:23,350 --> 00:13:24,590 Now, you can also see that 157 00:13:24,660 --> 00:13:28,270 there are sharp edges on this graph. 158 00:13:29,500 --> 00:13:32,290 This is due to the linear method that we are using. 159 00:13:33,160 --> 00:13:40,510 We are just passing two values, and we are drawing a straight line between those two values. 160 00:13:41,200 --> 00:13:47,080 What we can do is we can also use a polynomial function instead of this linear function. 161 00:13:48,160 --> 00:13:55,820 So if somewhere there is a decrease and then an increase, our model will try to fit a curved line instead 162 00:13:55,860 --> 00:14:01,480 of a straight line, to smooth out the edges of this graph. 163 00:14:03,040 --> 00:14:05,440 So let's look at how to do that. 164 00:14:06,340 --> 00:14:14,050 So earlier we were using method equal to linear, since we wanted to fit a linear line between the 165 00:14:14,140 --> 00:14:15,280 existing points. 166 00:14:17,510 --> 00:14:24,520 So now, to fit a polynomial or spline curve, we can use method equal to spline. 167 00:14:24,710 --> 00:14:27,730 And then we can provide the order of that spline. 168 00:14:27,830 --> 00:14:32,600 So if we want to fit a quadratic spline, we can use order equal to two. 169 00:14:32,840 --> 00:14:38,000 If we want to fit a cubic spline, we can use order equal to three, and so on. 170 00:14:39,080 --> 00:14:44,170 So we will be using the same interpolate function; just instead of method equal 171 00:14:44,270 --> 00:14:45,270 to linear, 172 00:14:45,380 --> 00:14:50,230 we are using method equal to spline, and then we are mentioning the order of that spline. 173 00:14:51,200 --> 00:14:52,880 Let's run this as well.
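The spline variant could be sketched as below. Note that pandas delegates `method="spline"` to SciPy, so SciPy must be installed; the data is again a synthetic stand-in with assumed column names.

```python
import pandas as pd

# Four synthetic monthly values, so a second-order spline has enough
# known points to fit (names and values are assumptions).
miles_df = pd.DataFrame({
    "Month": pd.date_range("1963-01-01", periods=4, freq="MS"),
    "MilesMM": [6827.0, 6178.0, 7084.0, 8162.0],
})
upsampled_miles_df = miles_df.resample("D", on="Month").mean()

# method="spline" fits a smooth curve instead of straight segments;
# order=2 gives a quadratic spline, order=3 a cubic, and so on.
poly_interpolated_miles_df = upsampled_miles_df.interpolate(
    method="spline", order=2
)
print(poly_interpolated_miles_df.head())
```

Because the synthetic series dips and then rises, the fitted curve bends through the turning point instead of meeting it at a sharp corner.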
174 00:14:53,450 --> 00:14:58,390 We are saving our newly created data frame as the poly interpolated miles data frame. 175 00:15:00,020 --> 00:15:01,320 Let's look at 176 00:15:02,020 --> 00:15:02,290 the plot now. 177 00:15:06,550 --> 00:15:12,040 You can see, now, there are no sharp edges on this graph. 178 00:15:12,730 --> 00:15:17,320 We have a spline which is smoothing out these edges. 179 00:15:20,320 --> 00:15:30,840 You can also compare the values of this newly created spline with the linear interpolation. 180 00:15:51,470 --> 00:15:58,670 So you can see the difference between any two values at the start is larger than the difference 181 00:16:00,910 --> 00:16:12,100 between any two values a little later. So our interpolate model is trying to fit a curve and remove 182 00:16:12,280 --> 00:16:14,110 the edges from our graph. 183 00:16:16,020 --> 00:16:24,310 Now, let's look at all the different aggregate functions that are available in our resample method. 184 00:16:25,170 --> 00:16:31,020 So you can see earlier we were using mean, but you can also use sum. 185 00:16:31,230 --> 00:16:32,510 You can also use min. 186 00:16:32,610 --> 00:16:34,350 You can also use max. 187 00:16:35,070 --> 00:16:37,770 So all these options are available. 188 00:16:37,920 --> 00:16:42,840 You can look at all these values and try these parameters on your own. 189 00:16:45,090 --> 00:16:45,500 Thank you.
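The other aggregates mentioned at the end can be tried on the same resampler object, for example (synthetic data, assumed column names):

```python
import pandas as pd

# Synthetic monthly data (column names and values are assumptions).
miles_df = pd.DataFrame({
    "Month": pd.date_range("1963-01-01", periods=6, freq="MS"),
    "MilesMM": [6827.0, 6178.0, 7084.0, 8162.0, 8462.0, 9644.0],
})

# The resampler supports many aggregates besides .mean():
quarterly = miles_df.resample("Q", on="Month")
print(quarterly.sum())   # total of the three monthly values per quarter
print(quarterly.min())   # smallest monthly value per quarter
print(quarterly.max())   # largest monthly value per quarter
```

The same resampler object can be reused for each aggregate, so you only specify the frequency and the date column once.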