1 00:00:01,010 --> 00:00:07,160 An effective initial step for characterizing the nature of a time series and for detecting potential 2 00:00:07,160 --> 00:00:10,040 problems is to use data visualization. 3 00:00:11,940 --> 00:00:19,830 By visualizing the series, we can detect initial patterns, identify its components, and spot potential 4 00:00:19,830 --> 00:00:25,200 problems such as outliers, unequal spacing, and missing values. 5 00:00:27,120 --> 00:00:33,420 The most basic and informative plot for visualizing a time series is the time plot. 6 00:00:34,510 --> 00:00:39,470 A time plot is simply a line chart of the series values over time. 7 00:00:40,160 --> 00:00:45,970 So the time values go on the horizontal axis, and the value of the variable 8 00:00:46,130 --> 00:00:48,620 we are monitoring goes on the y axis. 9 00:00:50,180 --> 00:00:57,320 If you look at this example here, we had Amtrak data, monthly data with ridership numbers. 10 00:00:58,300 --> 00:01:02,510 It ran from January 1991 to March 2004. 11 00:01:03,310 --> 00:01:10,150 It was monthly data in the form of a table, and we convert this table into a line plot. 12 00:01:11,750 --> 00:01:15,890 And in this, you can see the pattern of ridership over time. 13 00:01:17,830 --> 00:01:23,150 A second step in visualizing a time series is to examine it more carefully. 14 00:01:24,110 --> 00:01:28,940 The following operations are some examples of how you can dig deeper 15 00:01:29,090 --> 00:01:30,350 using visualization. 16 00:01:33,000 --> 00:01:40,440 First is zooming in. Zooming in is basically looking at a shorter period of time within the series 17 00:01:40,800 --> 00:01:42,530 instead of the larger period of time. 18 00:01:44,190 --> 00:01:50,220 The idea is that we will not be able to figure out weekly patterns when we are looking at a plot of 19 00:01:50,220 --> 00:01:51,090 several years. 
20 00:01:52,740 --> 00:01:59,340 Zooming in to a shorter period can reveal patterns that are hidden when viewing the entire series. 21 00:02:01,110 --> 00:02:06,420 This is especially important when the time series is long. In the example 22 00:02:06,420 --> 00:02:14,520 here, in the first graph, we plotted the entire data from January 1991 to October 2003. 23 00:02:15,990 --> 00:02:17,090 That was a lot of data. 24 00:02:18,290 --> 00:02:24,920 And the seasonal effect is not coming out clearly in this data because the data points are getting squished 25 00:02:24,920 --> 00:02:25,400 together. 26 00:02:26,830 --> 00:02:29,200 But if you take out only two years from it, 27 00:02:29,560 --> 00:02:34,750 that is, if you plot from January 1996 to December 1997, 28 00:02:36,100 --> 00:02:41,170 then you can clearly see that within each year there is a rise and a dip. 29 00:02:42,820 --> 00:02:49,200 This seasonal effect in ridership can be viewed only if you look at a shorter window of time. 30 00:02:50,020 --> 00:02:54,040 So you can see that the ridership is low 31 00:02:54,190 --> 00:02:55,280 in Jan and Feb. 32 00:02:55,780 --> 00:03:00,670 It increases and reaches a maximum point in the summer season. 33 00:03:01,210 --> 00:03:05,980 And then it again starts to dip and again reaches a low point in December. Then 34 00:03:07,270 --> 00:03:08,590 it again starts to increase. 35 00:03:08,710 --> 00:03:10,540 And this is a seasonal type of pattern. 36 00:03:11,500 --> 00:03:17,830 We were able to view this because we zoomed in to a smaller part of the data instead of looking at the 37 00:03:17,830 --> 00:03:18,580 larger picture. 38 00:03:19,840 --> 00:03:24,960 So zooming in helps us view the patterns that are hidden in a smaller part of the same series. 39 00:03:27,060 --> 00:03:29,550 The next operation is adding trend lines. 
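Before moving on, the time plot and the zooming-in step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ridership numbers are synthetic (a made-up upward trend plus a summer peak), not the data from the lecture, and the helper names `make_monthly_series`, `zoom`, and `plot_time` are hypothetical. The plotting helper assumes matplotlib is installed; it is defined but not called here.

```python
from datetime import date
import math


def make_monthly_series(start_year, end_year):
    """Build a synthetic monthly series (made-up numbers, not the lecture data)."""
    series = []
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            # Slow upward trend: 5 units per month.
            trend = 1500 + 5 * ((year - start_year) * 12 + month - 1)
            # Seasonal swing: the cosine peaks in July and is lowest in January.
            seasonal = 200 * math.cos(2 * math.pi * (month - 7) / 12)
            series.append((date(year, month, 1), trend + seasonal))
    return series


def zoom(series, start, end):
    """Zoom in: keep only the observations dated between start and end (inclusive)."""
    return [(d, v) for d, v in series if start <= d <= end]


def plot_time(series, title):
    """Draw a time plot: time on the x axis, value on the y axis (needs matplotlib)."""
    import matplotlib.pyplot as plt

    dates, values = zip(*series)
    plt.plot(dates, values)
    plt.xlabel("Month")
    plt.ylabel("Ridership")
    plt.title(title)
    plt.show()


full = make_monthly_series(1991, 2003)                      # the whole series
window = zoom(full, date(1996, 1, 1), date(1997, 12, 1))    # two years only

print(len(full), "points in the full series,", len(window), "in the zoomed window")
```

Calling `plot_time(full, "Full series")` and then `plot_time(window, "1996 to 1997")` would draw the two views side by side in spirit: in the full plot the yearly rise and dip is squeezed together, while the two-year window shows it clearly.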
40 00:03:31,110 --> 00:03:38,560 So we are usually interested in finding the overall trend in the variable, and one way of better capturing 41 00:03:38,680 --> 00:03:45,460 the shape of the trend is to add a trend line. Trend lines approximate the trend in the data. 42 00:03:46,700 --> 00:03:53,510 There are several types of trend lines, such as linear, quadratic, and exponential trend lines. 43 00:03:55,350 --> 00:04:01,590 We will learn how to draw a trend line on our data in the coming videos. The next operation that we can 44 00:04:01,590 --> 00:04:03,540 do is suppressing seasonality. 45 00:04:05,780 --> 00:04:09,290 Here also, we are trying to identify the overall trend in our data. 46 00:04:10,530 --> 00:04:18,480 And it is often easier to see the trend when seasonality is suppressed. By suppressed, we mean that we 47 00:04:18,480 --> 00:04:21,930 are trying to remove the effects of seasonal variations in the variable. 48 00:04:23,110 --> 00:04:26,110 Suppressing seasonal patterns can be done in three ways. 49 00:04:27,640 --> 00:04:31,530 One is by plotting the series at a cruder timescale. 50 00:04:32,830 --> 00:04:39,280 For example, if there is seasonal variation, such as ice cream sales increasing in summer and decreasing 51 00:04:39,280 --> 00:04:39,760 in winter, 52 00:04:41,080 --> 00:04:43,210 and we plot this yearly, 53 00:04:44,310 --> 00:04:49,710 this seasonal effect will be removed and we will be able to see the overall trend in ice cream sales. 54 00:04:51,710 --> 00:04:56,370 A second option is to plot a separate time plot for each season. 55 00:04:57,690 --> 00:05:03,670 So instead of plotting for all the seasons, we plot the sales for the summers of different years. 56 00:05:06,060 --> 00:05:10,590 A third and more popular option is to use moving average plots. 57 00:05:11,750 --> 00:05:17,460 We will discuss moving average plots in the coming sections too. Now, look at the example here. 
58 00:05:19,140 --> 00:05:26,400 This is the same ridership data from January 1991 to September 2003. When we zoomed in, 59 00:05:26,850 --> 00:05:34,020 we were able to identify a seasonal effect, and that is, the ridership was increasing in the months of 60 00:05:34,020 --> 00:05:37,140 summer and decreasing in the months of winter. 61 00:05:38,970 --> 00:05:45,990 Now, to identify a trend, we are aggregating the total ridership in a year 62 00:05:46,970 --> 00:05:47,970 and plotting it here. 63 00:05:49,430 --> 00:05:51,230 This is the first point that I told you, 64 00:05:51,500 --> 00:05:55,070 that is, by plotting the series at a cruder timescale, 65 00:05:56,310 --> 00:06:00,190 we can remove this seasonal effect and look at the trend. 66 00:06:01,790 --> 00:06:08,670 So if you look at the total ridership in each particular year and you plot that, 67 00:06:08,850 --> 00:06:16,110 this is the line plot we get, and this plot is telling us the overall trend in ridership over the years. 68 00:06:16,980 --> 00:06:26,370 And from this trend, you can observe that since 1996, the ridership is nearly constantly increasing. 69 00:06:26,910 --> 00:06:33,090 There is a somewhat linear increase in ridership since 1996. 70 00:06:34,380 --> 00:06:39,870 So this kind of observation can be made when you remove seasonality and only look at the trend. 71 00:06:42,190 --> 00:06:44,480 The next operation is plotting a lag 72 00:06:44,820 --> 00:06:45,610 scatterplot. 73 00:06:47,650 --> 00:06:52,180 Scatterplots are basically used to explore the relationship between two variables. 74 00:06:53,790 --> 00:07:00,450 And it is often the case in time series forecasting that the next value of the variable is somewhat dependent 75 00:07:00,570 --> 00:07:02,280 on the previous value of the variable. 76 00:07:04,310 --> 00:07:08,190 Previous observations in a time series are called lags. 77 00:07:09,650 --> 00:07:15,050 So the previous observation, that is, the previous time step, is called lag one. 
78 00:07:16,230 --> 00:07:21,420 The observation two time steps ago is called lag two, and so on. 79 00:07:23,520 --> 00:07:31,230 So you often plot a scatterplot to explore the relationship between each observation and a lag of that 80 00:07:31,230 --> 00:07:31,830 observation. 81 00:07:32,920 --> 00:07:40,900 For example, if you're trying to plot a scatterplot of a day's temperature versus the previous day's 82 00:07:40,900 --> 00:07:43,900 temperature, it looks something like this. 83 00:07:45,700 --> 00:07:52,630 You can see here that, although it is not a perfect straight line, this type of graph is suggesting 84 00:07:52,630 --> 00:07:59,510 that there is a positive relation between the previous day's temperature and the next day's temperature. 85 00:07:59,980 --> 00:08:03,460 That is, if the previous day's value is large, 86 00:08:04,060 --> 00:08:06,430 the next day's value is likely to be large. 87 00:08:06,970 --> 00:08:15,130 For example, if the previous day's value is more than 15, it is very likely that the next day's value will 88 00:08:15,130 --> 00:08:19,560 be more than 15, because most of the observations lie there. 89 00:08:21,610 --> 00:08:25,760 So this was a plot of lag one, that is, on the y axis 90 00:08:26,110 --> 00:08:29,860 we have y(t+1) and on the x axis we have y(t). 91 00:08:31,930 --> 00:08:35,260 We can similarly plot other lag values also. 92 00:08:35,410 --> 00:08:40,100 So if we want to plot lag five, on the y axis we can have y(t+1), 93 00:08:40,330 --> 00:08:43,270 and on the x axis we can take y(t-4). 94 00:08:44,730 --> 00:08:46,380 But practically, 95 00:08:47,800 --> 00:08:52,960 the impact of older values on the recent values is less. 96 00:08:54,120 --> 00:08:57,870 So the relationship that we will see will be more diffused. 97 00:08:58,380 --> 00:09:03,360 So you'll see something like this, where the values are more randomly distributed. 
98 00:09:03,870 --> 00:09:11,280 And there is no linear relationship clearly visible in the distribution. So a lag 99 00:09:11,280 --> 00:09:20,330 scatterplot helps us get a hint of whether there is a relationship between current values and the lagged values. 100 00:09:21,300 --> 00:09:28,230 And if you see such a relationship, it makes sense to use lag features in your analysis.
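The lag idea above can be sketched in plain Python. This is a minimal illustration, not the lecture's own code: the "daily temperature" series is synthetic (each day is generated to stay close to the previous day), and the helper names `lag_pairs` and `pearson` are hypothetical. Each pair is the (x, y) point of a lag scatterplot, and the Pearson correlation is used here as a numeric stand-in for how tight or diffuse the cloud of points looks.

```python
import math
import random


def lag_pairs(values, lag):
    """Pair each observation with the one `lag` steps earlier: (earlier value, current value)."""
    return [(values[i - lag], values[i]) for i in range(lag, len(values))]


def pearson(pairs):
    """Pearson correlation between the x and y coordinates of the pairs."""
    xs, ys = zip(*pairs)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Synthetic daily temperatures (illustrative only): each day is pulled
# toward the previous day's value, plus some random noise.
random.seed(0)
temps = [15.0]
for _ in range(364):
    temps.append(15.0 + 0.8 * (temps[-1] - 15.0) + random.gauss(0, 2))

# Lag 1 compares each day with the previous day; lag 5 looks five days back.
for lag in (1, 5):
    pairs = lag_pairs(temps, lag)
    print(f"lag {lag}: {len(pairs)} points, correlation {pearson(pairs):.2f}")
```

The pairs returned by `lag_pairs` can be handed to any scatter plotting routine to draw the lag scatterplot itself. On data like this, the lag one correlation should come out clearly higher than the lag five correlation, which matches the more diffuse cloud described for older lags.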