1 00:00:02,020 --> 00:00:05,700 Now we will discuss a concept called seasonality in data. 2 00:00:07,680 --> 00:00:14,820 When we have time-series data and it shows recurring patterns, such as highs coming every three months 3 00:00:15,870 --> 00:00:23,570 or lows coming in a particular season, such data is called seasonal data. Data can have seasonality 4 00:00:23,600 --> 00:00:24,980 because of the weather. 5 00:00:25,700 --> 00:00:26,930 For example, every year, 6 00:00:27,020 --> 00:00:30,350 ice cream sales are higher in summer and lower in winter. 7 00:00:32,130 --> 00:00:35,100 Or seasonality for the tourism industry 8 00:00:36,470 --> 00:00:38,870 depends on holidays and vacations. 9 00:00:40,720 --> 00:00:46,810 Now, the variance in sales due to the season of the year may not be explained well by the other factors 10 00:00:46,810 --> 00:00:49,370 in the model, and hence our model will not fit well. 11 00:00:51,440 --> 00:00:55,460 Therefore, it is advisable that we remove seasonality from the data. 12 00:00:56,830 --> 00:01:03,760 To do that, we usually find a correction factor, which we multiply with the data to get a normalized 13 00:01:03,760 --> 00:01:04,240 value. 14 00:01:06,170 --> 00:01:08,970 To calculate this factor, we can use this formula: 15 00:01:09,710 --> 00:01:13,730 M is equal to the mean of the year divided by the mean of the month. 16 00:01:15,270 --> 00:01:18,780 This will give us a multiplication factor for each observation. 17 00:01:20,270 --> 00:01:22,700 Let me show you an example of how this is done. 18 00:01:26,370 --> 00:01:34,010 Here I have data of sales in the last three years, and every year in the first six months, sales show 19 00:01:34,010 --> 00:01:36,080 a rise and then a dip. 20 00:01:37,340 --> 00:01:41,240 You can look at this graph: from month one to month three, 21 00:01:41,300 --> 00:01:42,140 there's a rise. 22 00:01:43,260 --> 00:01:44,460 Between month three and four, 23 00:01:44,650 --> 00:01:45,410 it is flat.
24 00:01:46,000 --> 00:01:47,930 And after month four, there is a dip. 25 00:01:51,440 --> 00:01:57,080 After understanding the business context, if we are sure that this is due to the seasonal nature of 26 00:01:57,080 --> 00:02:01,730 sales, we find the factors to normalize these sales figures. 27 00:02:05,270 --> 00:02:07,370 If we use the formula shown earlier, 28 00:02:08,700 --> 00:02:11,940 we will get a multiplication factor for each value. 29 00:02:15,050 --> 00:02:16,970 So this value of 21 30 00:02:18,270 --> 00:02:20,010 will get its multiplication factor 31 00:02:21,230 --> 00:02:24,920 by dividing the mean of this year by the mean of this month. 32 00:02:27,810 --> 00:02:32,410 But because the means of these three years are almost the same, 33 00:02:34,580 --> 00:02:42,230 I have used the population mean divided by the month mean. I would suggest that you try doing it for 34 00:02:42,320 --> 00:02:43,010 each cell. 35 00:02:45,080 --> 00:02:52,310 So using the population mean divided by the month mean, I have found these factors here, which I have 36 00:02:52,310 --> 00:02:54,030 multiplied with these sales figures. 37 00:02:56,350 --> 00:02:59,410 And below, in this table, I have these multiplied values. 38 00:03:00,940 --> 00:03:06,750 So you can see, if you multiply 21 with one point five two, you get thirty one point nine four. 39 00:03:07,920 --> 00:03:12,330 If you multiply twenty three with one point five two, you get thirty four point nine nine. 40 00:03:13,020 --> 00:03:16,590 So this whole table is created using the multiplication factors. 41 00:03:17,670 --> 00:03:23,920 You can see that all these values are now in the same range and the data is no longer seasonal. 42 00:03:26,020 --> 00:03:31,390 Now this data can be used for analysis and can be fed to our model. 43 00:03:32,590 --> 00:03:34,280 So this is how to remove seasonality.
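To make the correction factor concrete, here is a minimal Python sketch of the calculation described in the lecture. It is an illustration under stated assumptions, not the spreadsheet shown on screen: the `sales` table is made up (three years by six months, with a rise, a flat stretch, and a dip), and the function names `overall_factors`, `per_cell_factors`, and `apply_factors` are hypothetical. The first function follows the shortcut used in the lecture (population mean divided by month mean); the second follows the per-cell formula (mean of the year divided by mean of the month).

```python
from statistics import mean

# Hypothetical sales figures: one row per year, one column per month.
# These are illustrative values, not the numbers from the lecture's table.
sales = [
    [21, 30, 40, 40, 30, 20],
    [23, 32, 42, 41, 31, 22],
    [22, 31, 41, 42, 29, 21],
]

def month_means(rows):
    """Mean of each month (column) across all years."""
    return [mean(col) for col in zip(*rows)]

def overall_factors(rows):
    """One factor per month: population mean / month mean."""
    population_mean = mean(v for row in rows for v in row)
    return [population_mean / m for m in month_means(rows)]

def per_cell_factors(rows):
    """One factor per cell: mean of that year / mean of that month."""
    m_means = month_means(rows)
    return [[mean(row) / m for m in m_means] for row in rows]

def apply_factors(rows, factors):
    """Multiply every sales value by the factor for its month."""
    return [[v * f for v, f in zip(row, factors)] for row in rows]

factors = overall_factors(sales)
adjusted = apply_factors(sales, factors)

for row in adjusted:
    print([round(v, 2) for v in row])
```

After the adjustment, every month has the same average as the whole data set, which is why the adjusted values fall in a similar range. The per-cell variant matters more when the yearly means differ noticeably, for example when sales grow from year to year.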