1 00:00:01,110 --> 00:00:04,410 The first step in the process is goal definition. 2 00:00:05,730 --> 00:00:13,980 This basically means that we should be clear about our objective. Determining and clearly defining the 3 00:00:13,980 --> 00:00:18,420 forecasting goal is essential for arriving at useful results. 4 00:00:20,430 --> 00:00:25,950 Here is a list of some of the questions that you must answer before starting the analysis. 5 00:00:27,600 --> 00:00:31,320 You must first determine the purpose of generating forecasts. 6 00:00:32,340 --> 00:00:34,110 The type of forecasts that are needed. 7 00:00:35,040 --> 00:00:37,540 How the forecasts will be used by the organization. 8 00:00:39,360 --> 00:00:43,110 What are the costs associated with forecast errors? 9 00:00:44,160 --> 00:00:50,110 What data will be available in the future, etc. In the next few slides, 10 00:00:50,880 --> 00:00:55,440 I'm covering some of the important considerations during the goal-setting step. 11 00:00:57,270 --> 00:01:05,100 Pay attention, because these issues will affect every step in the forecasting process, from data collection 12 00:01:05,520 --> 00:01:09,180 through data processing, modeling, and performance evaluation. 13 00:01:12,330 --> 00:01:17,880 First comes whether our goal is descriptive analysis or predictive analysis. 14 00:01:19,980 --> 00:01:27,750 Time series data modeling is done for either descriptive or predictive purposes. In descriptive modeling, 15 00:01:27,960 --> 00:01:35,070 or time series analysis, a time series is modeled to determine its components in terms of seasonal patterns, trends, 16 00:01:35,070 --> 00:01:37,830 relation to external factors, and the like. 17 00:01:39,660 --> 00:01:43,380 These can then be used for decision making and policy formulation. 18 00:01:44,700 --> 00:01:51,780 For example, if you want to find out the effect of the rainy season on travel bookings, that is descriptive 19 00:01:51,780 --> 00:01:52,380 analysis. 
20 00:01:54,680 --> 00:02:01,760 In contrast, predictive analysis uses the information in a time series to forecast future values of 21 00:02:01,790 --> 00:02:02,310 that series. 22 00:02:04,190 --> 00:02:09,050 For example, if you want to find out the number of travel bookings that will be made in the coming 23 00:02:09,050 --> 00:02:12,710 three months, that is predictive analysis. 24 00:02:14,480 --> 00:02:20,750 The difference between descriptive and predictive goals leads to differences in the types of methods 25 00:02:20,750 --> 00:02:23,040 used and in the modeling process itself. 26 00:02:24,740 --> 00:02:32,690 For example, in selecting a method for describing a time series, or even for explaining its patterns, 27 00:02:33,640 --> 00:02:37,600 priority is given to methods that produce explainable results 28 00:02:38,510 --> 00:02:47,300 and to models which are based on causal arguments. On the other hand, a predictive model is judged 29 00:02:47,300 --> 00:02:53,190 by its predictive accuracy rather than its ability to provide a correct explanation. 30 00:02:55,810 --> 00:03:04,380 In short, for descriptive analysis, we prefer those models which are more explainable, and 31 00:03:04,390 --> 00:03:11,500 for predictive analysis, we prefer those models which give higher accuracy, even if they are complex 32 00:03:11,680 --> 00:03:13,390 and not easily explainable. 33 00:03:14,320 --> 00:03:15,790 Let's consider an example 34 00:03:15,940 --> 00:03:24,670 to understand this more clearly. Amtrak is a U.S. railway company which routinely collects data on 35 00:03:24,760 --> 00:03:25,360 ridership. 36 00:03:26,890 --> 00:03:34,930 In this example, we will look at the series of monthly Amtrak ridership between January 1991 and 37 00:03:34,930 --> 00:03:37,900 March 2004 in the United States. 
38 00:03:39,460 --> 00:03:46,960 This dataset is part of the course resources. Now, what could be the different analysis goals for a 39 00:03:46,960 --> 00:03:47,770 railway company? 40 00:03:48,280 --> 00:03:48,820 Let's see. 41 00:03:51,630 --> 00:03:59,880 One possible analysis goal that Amtrak might have is to forecast future monthly ridership on its trains 42 00:04:00,510 --> 00:04:01,950 for the purpose of pricing. 43 00:04:03,900 --> 00:04:06,240 Clearly, this is a predictive goal. 44 00:04:08,180 --> 00:04:13,550 Setting prices according to demand is a popular practice among airlines and hotel chains. 45 00:04:14,450 --> 00:04:21,470 You may have noticed that hotel and flight prices are higher in the peak travel season as compared to other 46 00:04:21,470 --> 00:04:21,860 seasons. 47 00:04:23,930 --> 00:04:26,390 This pricing strategy is called revenue management. 48 00:04:28,560 --> 00:04:37,460 Anyway, predicting future ridership is a predictive goal. A different goal for which Amtrak might want 49 00:04:37,460 --> 00:04:45,530 to use these ridership data is impact assessment, that is, evaluating the effect of some event, 50 00:04:47,120 --> 00:04:54,410 for example, an airport closure due to bad weather or the opening of a new large national highway. 51 00:04:56,330 --> 00:04:58,530 This goal is retrospective in nature. 52 00:04:59,510 --> 00:05:07,310 That is, the airport was closed, probably last month, and we are trying to establish the impact of that event 53 00:05:07,820 --> 00:05:08,860 on rail ridership now. 54 00:05:08,900 --> 00:05:16,480 This goal is therefore descriptive, as it will describe the impact of that event. 55 00:05:18,560 --> 00:05:25,130 This analysis would compare the series before and after the event, and no direct interest is shown 56 00:05:25,250 --> 00:05:26,960 in the future values of the series. 
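The before-and-after comparison described for impact assessment can be sketched in a few lines of Python. This is only a minimal illustration, not the method used in the course: the function name `before_after_effect`, the window length, and the ridership numbers are all made up for the example, and a plain difference of means ignores trend and seasonality, which a real analysis would need to handle.

```python
from statistics import mean


def before_after_effect(series, event_index, window=12):
    """Difference in mean level between the `window` observations
    after an event and the `window` observations before it."""
    before = series[max(0, event_index - window):event_index]
    after = series[event_index:event_index + window]
    if not before or not after:
        raise ValueError("need observations on both sides of the event")
    return mean(after) - mean(before)


# Made-up monthly ridership (in thousands); this is NOT the Amtrak data.
ridership = [100, 102, 98, 101, 99, 100, 110, 112, 108, 111, 109, 110]
event_index = 6  # hypothetical event takes effect from the 7th month

effect = before_after_effect(ridership, event_index, window=6)
print(f"Estimated change in mean monthly ridership: {effect}")
```

Note that the output describes what already happened; nothing here says anything about future values of the series.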
57 00:05:28,730 --> 00:05:34,700 With these two examples, I hope that you can clearly differentiate between descriptive and predictive 58 00:05:34,700 --> 00:05:43,700 goals. Predictive analysis is also called time series forecasting, whereas descriptive analysis is called 59 00:05:43,970 --> 00:05:45,230 time series analysis. 60 00:05:46,920 --> 00:05:53,100 The focus in this course is on time series forecasting, where the goal is to predict future values 61 00:05:53,190 --> 00:05:54,090 of the time series. 62 00:05:55,720 --> 00:06:02,190 However, some of the methods that we will cover can also be used for descriptive purposes.
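To make the predictive side concrete, here is a minimal sketch of judging a forecast purely by its accuracy on held-out data. It assumes a seasonal naive forecast (repeat the value from the same position one season earlier) as a stand-in method; the lecture does not name a specific method at this point, and the helper names and the numbers are hypothetical.

```python
def seasonal_naive_forecast(history, horizon, season_length=12):
    """Forecast each future period with the value observed at the
    same position in the last full season of `history`."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and forecast values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up series with a season of length 4; this is NOT the Amtrak data.
series = [10, 20, 30, 40, 12, 22, 32, 42, 14, 24, 34, 44]
train, valid = series[:8], series[8:]  # hold out the last season

forecast = seasonal_naive_forecast(train, horizon=len(valid), season_length=4)
mae = mean_absolute_error(valid, forecast)
print(forecast, mae)
```

The model is scored only on how close its forecasts come to the held-out values, not on whether it explains why ridership moves the way it does.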