In this lecture, we will learn how to remove trend and seasonality from our data. We have already discussed the differencing technique for removing trend and seasonality. Now we will implement differencing in Python.

I am going to use the same dataset that we have been using, the US airline miles data, and I am saving this dataset into a data frame named miles_df. Let's run this. Let's take a look at the first five values of our dataset. We have date values in the column named Month, and we have the values of our time series, miles in millions, in the second column.

Now, as we have discussed, in order to remove trend we apply differencing with period equal to one. So we will subtract from each value of our time series its lagged value from the previous period. So let's create another column for lag one. We will be using the shift method with argument 1 to create the lag-1 column. Now let's look at the head again. You can see we have the miles data, and then we have the lag-1 data of miles. In the second row of the lag-1 column, we have the miles value of the first row. So this lag-1 column consists of the lag-1 values of the miles column. Now, to remove trend, we need to subtract this lag-1 value from the miles value.
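The loading and lag-1 steps described so far can be sketched as follows. This is a minimal sketch: the column names Month and MilesMM and the sample values are my assumptions, standing in for the CSV file used in the lecture.

```python
import pandas as pd

# Small inline stand-in for the US airline miles dataset; the lecture
# reads it from a CSV file instead (names and values are illustrative).
miles_df = pd.DataFrame({
    "Month": ["1963-01", "1963-02", "1963-03", "1963-04", "1963-05"],
    "MilesMM": [6000, 6649, 7555, 8461, 8900],
})

# shift(1) pushes every value down one row, producing the lag-1 series;
# the first row has no previous period, so it becomes NaN.
miles_df["lag_1"] = miles_df["MilesMM"].shift(1)
print(miles_df.head())
```

Each row of lag_1 now holds the previous row's MilesMM value, which is exactly the lagged series the lecture subtracts to remove trend.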
The resultant series should not contain any trend after we difference out this lag-1 value.

Now, we can either subtract these two columns ourselves, or there is a separate method available in pandas that can do that automatically. The method's name is diff, and in it you have to mention the time period for which you want to apply the differencing. So if you want to difference with the lag-1 value, you write periods equal to one, and if you want to difference with a time period of twelve, that is, differencing to remove seasonality, you write periods equal to twelve.

So here I am going to use this method instead of subtracting the two columns. I am creating another column, that is, miles, underscore diff, underscore one. First, we have to mention the column name which contains the values of our time series, which is the miles column, and then we have to call the diff method with periods equal to one. So let's run this. Let's look at the data frame once again. So now you can see that we have another column, miles_diff_1, which is the difference of the two values, miles and its lag-1 value. Since the lag-1 value for the first row was NaN, the diff value is also NaN.
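The diff step can be sketched like this. The column names and the sample values are illustrative assumptions, not the lecture's actual numbers.

```python
import pandas as pd

# diff(periods=1) subtracts each value's previous-period value,
# which is exactly MilesMM minus its lag-1 column.
miles_df = pd.DataFrame({"MilesMM": [6000, 6649, 7555]})
miles_df["miles_diff_1"] = miles_df["MilesMM"].diff(periods=1)
print(miles_df)
```

The first row of miles_diff_1 is NaN because there is no previous period to subtract, just as the lecture observes.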
For the second row, you can see that 649 is the difference of the two values; for the third row, 906 is the difference of the two values, miles and its lag-1 value, and so on.

So this newly created differenced series should not contain any trend. Let's prove that by decomposing our miles series and decomposing this differenced series. So first I am creating the decomposition plot for our original miles series, and then we will plot the decomposition for our differenced series. We know how to plot a decomposition.

First, we have to change the indexes. Right now, the indexes are in numeric format; we have to change these indexes to datetime format. That's why I am changing my index to the Month values in the first line of this command. In the second line, I am plotting the decomposition using the seasonal_decompose function from statsmodels. I have already imported this in our previous lecture, so if you are using it for the first time, you have to import it from statsmodels. I am passing the miles data as the values, and we are going to use an additive model. I am saving this result in a result object, and then I am plotting it. So this is our decomposition graph.
You can see this was our original series. We have a constantly increasing trend, somewhat of a linear trend. And we have seasonality, and these are the residuals. So now let's plot the decomposition of our differenced series.

Now, one thing to notice before plotting the decomposition graph for the differenced series is that the first value of our differenced series does not contain any value; it is NaN. So we have to leave out this particular record before decomposing the series.

So how do we do that? We will be using iloc. We are using iloc because we want to apply conditions on both rows and columns. Earlier, we were only selecting the columns, and we were selecting all the rows. Now we want to select all the rows except the first row, and we want to select this one column. So we will be using iloc to segment our dataset. We will be writing miles_df, then .iloc, then we will select the rows starting from the second row. That's why we are writing 1 and then a colon: indexing starts from zero, so we are ignoring the row that is present at index zero, which is this row. We are selecting all the rows from index equal to one. So you can see that the index here is one, and we are selecting all the rows from index one onward.
That is why we are writing 1 and a colon. One is the starting point, and since there is no end point after the colon, it will take all the observations after the first observation. So in the first argument we are providing the rows, and in the second argument we are providing 3, because we want the fourth column. This is the zeroth column, this is the first column, this is the second column, and this is the third column. That's why we are writing comma 3.

Now, this is how we are selecting the part of the data frame that we need to decompose. And again, we are using an additive model. Let's run this.

So this is the decomposition for our differenced series. You can see that earlier we had the linear trend; now there is no trend. We are getting random values centred around zero in our trend line. You can see that we have successfully removed trend from our data using lag-1 differencing.

Now, one thing you can also notice is that seasonality is still present. There is a seasonal pattern around here as well, and in the differenced plot as well you can see there is some kind of seasonal pattern occurring in each cycle.
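The iloc selection just described can be sketched with a toy frame. The column layout is assumed to match the lecture's data frame, where the differenced column sits at position 3.

```python
import pandas as pd

# Four columns, so the differenced column is at position 3;
# row 0 holds the NaN produced by diff(periods=1).
df = pd.DataFrame({
    "Month": ["m1", "m2", "m3"],
    "MilesMM": [6000, 6649, 7555],
    "lag_1": [None, 6000, 6649],
    "miles_diff_1": [None, 649, 906],
})

# iloc[1:, 3]: all rows from positional index 1 to the end,
# and only the column at position 3.
diff_series = df.iloc[1:, 3]
print(diff_series)
```

The slice drops the leading NaN row so the series can be passed to seasonal_decompose, which does not accept missing values.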
So we have to remove seasonality also. To remove seasonality, we do differencing with a lag of twelve. So first, let's plot the line plot for our data as well. You can see there is a visible seasonality present in our data. Even in our differenced series you can notice the seasonality: for the first three periods you can clearly see there is a pattern that is repeating after every time interval.

Now, to remove seasonality, let's apply differencing with period equal to twelve. We will be using the same pandas function, that is, the diff method. And this time we are applying the differencing on our newly created column, that is, miles_diff_1. Let's apply this, and let's plot it again.

So this is the plot after the double differencing: first we differenced using a lag of one, and now we have differenced using a lag of twelve. So at the start our time series was like this; after the first differencing we removed the trend, and after the lag-twelve differencing we removed the seasonality as well.

Now let's look at the decomposition graph of our newly created series and see whether we have removed trend and seasonality or not. So here I am saving this twice-differenced series into a new column.
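The second differencing step can be sketched the same way: diff(periods=12) is applied to the already lag-1-differenced column. The series below is synthetic, built so that its yearly pattern repeats exactly.

```python
import pandas as pd

# 25 months of a lag-1-differenced series with a repeating yearly pattern
# (illustrative values, not the lecture's data).
pattern = [100, -50, 80, 20, -30, 60, 90, -10, 40, 70, -20, 50]
miles_diff_1 = pd.Series(pattern * 2 + [100])

# Subtracting the value twelve periods earlier removes the yearly cycle;
# the first twelve entries have no value a year back, so they become NaN.
seasonal_diff = miles_diff_1.diff(periods=12)
print(seasonal_diff.tail())
```

Because the toy pattern repeats perfectly, every defined value of seasonal_diff is zero; on real data the result is a series with the seasonal cycle stripped out.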
So, similarly, we will be using iloc only, and this time we will have around the first twelve values as NaN in our dataset. So we would be starting our data frame from index equal to twelve, that is, the thirteenth row. And since this is the fifth column, it is present at index equal to four. So this is zero, one, two, three, and four; that's why I am writing comma 4. So let's run this.

We are getting an error that this function does not handle missing values, because we are somehow still including a NaN value in our data. So here is the thing: we applied the differencing on miles_diff_1, which already had one NaN value. So now, I guess, we have 13 rows in which the last column is NaN, and so we have to change this 12 to 13. Let's run this again. This should work.

So you can see there is no trend right now, and there is no seasonality as well.

So earlier we were getting a graph like this. Remember, one year is around five units on our x axis. And if you see, we still have some pattern, but within that one-year span there is no seasonal pattern; there are no regular ups and downs. This month may be June, but this may be December.
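The NaN bookkeeping that caused the error can be verified directly: lag-1 differencing leaves one leading NaN, and differencing that result with lag twelve leaves twelve more, so thirteen rows must be skipped (with iloc[13:] or dropna()) before decomposing. A small sketch on a stand-in series:

```python
import numpy as np
import pandas as pd

values = pd.Series(np.arange(30, dtype=float))  # stand-in time series

diff_1 = values.diff(periods=1)       # 1 leading NaN
diff_1_12 = diff_1.diff(periods=12)   # 12 more leading NaNs -> 13 in total

n_missing = diff_1_12.isna().sum()
print(n_missing)  # 13 rows must be dropped before seasonal_decompose
clean = diff_1_12.iloc[13:]
```

This is exactly why changing the slice start from 12 to 13 fixes the missing-values error in the lecture.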
This may be January. But there is no clear pattern saying that in winter or in summer it is performing high or low.

You can also look at the values on our y axis here. For seasonality, we now have values from minus 250 to 250, whereas when we had seasonality in our data, the values were between minus 2,500 and plus 2,500 here, and between minus 2,000 and plus 2,000 there. So you can see the change in scale as well.

Definitely, we have removed some seasonality from our data. The scale has also changed, from around 2,000 down to 250, which signifies that there is a significant reduction in the amount of seasonality after the differencing.

So this is how we remove trend and seasonality from our data. To remove trend, we apply differencing with period equal to one, and we use the diff function of pandas. And to remove seasonality, we apply differencing with period equal to twelve on the detrended data.

So that's all for this video. Thank you.