1
00:00:01,010 --> 00:00:07,160
An effective initial step for characterizing the nature of a time series and for detecting potential

2
00:00:07,160 --> 00:00:10,040
problems is to use data visualization.

3
00:00:11,940 --> 00:00:19,830
By visualizing the series, we can detect initial patterns, identify its components and spot potential

4
00:00:19,830 --> 00:00:25,200
problems such as outliers, unequal spacing and missing values.

5
00:00:27,120 --> 00:00:33,420
The most basic and informative plot for visualizing a time series is the time plot.

6
00:00:34,510 --> 00:00:39,470
A time plot is simply a line chart of the series values over time.

7
00:00:40,160 --> 00:00:45,970
So the time values go on the horizontal axis and the value of the variable

8
00:00:46,130 --> 00:00:48,620
we are monitoring goes on the y axis.

9
00:00:50,180 --> 00:00:57,320
If you look at this example here, we had Amtrak data, monthly data with ridership numbers.

10
00:00:58,300 --> 00:01:02,510
It was from January 1991 to March 2004.

11
00:01:03,310 --> 00:01:10,150
It was monthly data in the form of a table. We converted this table into a line plot.

12
00:01:11,750 --> 00:01:15,890
And in this, you can see the pattern of ridership over time.

13
00:01:17,830 --> 00:01:23,150
A second step in visualizing a time series is to examine it more carefully.

14
00:01:24,110 --> 00:01:28,940
The following operations are some examples of how you can dig deeper

15
00:01:29,090 --> 00:01:30,350
using visualization.

16
00:01:33,000 --> 00:01:40,440
First is zooming in. Zooming in is basically looking at a shorter period of time within the series

17
00:01:40,800 --> 00:01:42,530
instead of the larger period of time.

18
00:01:44,190 --> 00:01:50,220
The idea is that we will not be able to figure out weekly patterns when we are looking at a plot of

19
00:01:50,220 --> 00:01:51,090
several years.

20
00:01:52,740 --> 00:01:59,340
Zooming in to a shorter period can reveal patterns that are hidden when viewing the entire series.

21
00:02:01,110 --> 00:02:06,420
This is especially important when the time series is long. In the example

22
00:02:06,420 --> 00:02:14,520
here, in the first graph, we plotted the entire data from January 1991 to October 2003.

23
00:02:15,990 --> 00:02:17,090
There was a lot of data.

24
00:02:18,290 --> 00:02:24,920
And the seasonal effect is not coming out clearly in this data because the data points are getting squished

25
00:02:24,920 --> 00:02:25,400
together.

26
00:02:26,830 --> 00:02:29,200
But if you take out only two years from it,

27
00:02:29,560 --> 00:02:34,750
that is, if you plot from January 1996 to December 1997,

28
00:02:36,100 --> 00:02:41,170
then you can clearly see that within each year there is a rise and a dip.

29
00:02:42,820 --> 00:02:49,200
This seasonal effect in ridership can be viewed only if you look at a shorter window of time.

30
00:02:50,020 --> 00:02:54,040
So you can see that the ridership is low

31
00:02:54,190 --> 00:02:55,280
in Jan and Feb.

32
00:02:55,780 --> 00:03:00,670
It increases and reaches a maximum point in the summer season.

33
00:03:01,210 --> 00:03:05,980
And then it again starts to dip and again reaches a low point in December. Then

34
00:03:07,270 --> 00:03:08,590
it again starts to increase.

35
00:03:08,710 --> 00:03:10,540
And this is a seasonal type of pattern.

36
00:03:11,500 --> 00:03:17,830
We were able to view this because we zoomed in on a smaller part of the data instead of looking at the

37
00:03:17,830 --> 00:03:18,580
larger picture.

38
00:03:19,840 --> 00:03:24,960
So zooming in helps us view the patterns that are hidden in a smaller part of the same series.
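The zooming-in step can be sketched in a few lines of Python. This is only an illustration: the values are synthetic (an assumed upward drift plus a yearly cycle), not the real ridership numbers, and the helper names monthly_dates and zoom are made up for this sketch. The selected window can then be passed to any plotting library.

```python
from datetime import date
import math

def monthly_dates(start_year, start_month, n):
    """Return n consecutive month-start dates."""
    out = []
    for i in range(n):
        total = (start_month - 1) + i
        out.append(date(start_year + total // 12, total % 12 + 1, 1))
    return out

def zoom(dates, values, start, end):
    """Keep only the (date, value) points whose date falls in [start, end]."""
    return [(d, v) for d, v in zip(dates, values) if start <= d <= end]

# Synthetic stand-in for monthly ridership (NOT the real data):
# a slow upward drift plus a yearly cycle that peaks in summer.
dates = monthly_dates(1991, 1, 159)  # Jan 1991 .. Mar 2004
values = [1700 + 2 * i - 200 * math.cos(2 * math.pi * d.month / 12)
          for i, d in enumerate(dates)]

# Zoom in on two years so the within-year rise and dip becomes visible.
window = zoom(dates, values, date(1996, 1, 1), date(1997, 12, 1))
for d, v in window:
    print(d.strftime("%Y-%m"), round(v))
```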

39
00:03:27,060 --> 00:03:29,550
The next operation is adding trend lines.

40
00:03:31,110 --> 00:03:38,560
So we are usually interested in finding the overall trend in the variable, and one way of better capturing

41
00:03:38,680 --> 00:03:45,460
the shape of the trend is to add a trend line. Trend lines approximate the trend in the data.

42
00:03:46,700 --> 00:03:53,510
There are several types of trend lines, such as linear, quadratic and exponential trend lines.

43
00:03:55,350 --> 00:04:01,590
We will learn how to draw a trend line on our data in the coming videos. The next operation that we can

44
00:04:01,590 --> 00:04:03,540
do is suppressing seasonality.

45
00:04:05,780 --> 00:04:09,290
Here also, we are trying to identify the overall trend in our data.

46
00:04:10,530 --> 00:04:18,480
And it is often easier to see the trend when seasonality is suppressed. By suppressed, we mean that we

47
00:04:18,480 --> 00:04:21,930
are trying to remove the effects of seasonal variations in the variable.

48
00:04:23,110 --> 00:04:26,110
Suppressing seasonal patterns can be done in three ways.

49
00:04:27,640 --> 00:04:31,530
One is by plotting the series at a cruder timescale.

50
00:04:32,830 --> 00:04:39,280
For example, if there is seasonal variation, such as ice cream sales increasing in summer and decreasing

51
00:04:39,280 --> 00:04:39,760
in winter,

52
00:04:41,080 --> 00:04:43,210
if we plot this yearly,

53
00:04:44,310 --> 00:04:49,710
this seasonal effect will be removed and we will be able to see the overall trend in ice cream sales.

54
00:04:51,710 --> 00:04:56,370
A second option is to plot a separate time plot for each season.

55
00:04:57,690 --> 00:05:03,670
So instead of plotting for all the seasons, we plot the sales for summers of different years.

56
00:05:06,060 --> 00:05:10,590
A third and more popular option is to use moving average plots.

57
00:05:11,750 --> 00:05:17,460
We will discuss moving average plots in the coming sections too. If you look at the example here,

58
00:05:19,140 --> 00:05:26,400
this is the same ridership data from January 1991 to September 2003. When we zoomed in,

59
00:05:26,850 --> 00:05:34,020
we were able to identify the seasonal effect, and that is the ridership was increasing in the months of

60
00:05:34,020 --> 00:05:37,140
summer and decreasing in the months of winter.

61
00:05:38,970 --> 00:05:45,990
Now, to identify a trend, we are aggregating the total ridership in a year

62
00:05:46,970 --> 00:05:47,970
and plotting it here.

63
00:05:49,430 --> 00:05:51,230
This is the first point that I told you,

64
00:05:51,500 --> 00:05:55,070
that is, by plotting the series at a cruder timescale,

65
00:05:56,310 --> 00:06:00,190
we can remove this seasonal effect and look at the trend.

66
00:06:01,790 --> 00:06:08,670
So if you look at the total ridership in that particular year and you plot that,

67
00:06:08,850 --> 00:06:16,110
this is the line plot we get, and this plot is telling us the overall trend in ridership over the years.

68
00:06:16,980 --> 00:06:26,370
And from this trend, you can observe that since 1996, the ridership is nearly constantly increasing.

69
00:06:26,910 --> 00:06:33,090
There is a somewhat linear increase in ridership since 1996.

70
00:06:34,380 --> 00:06:39,870
So these kinds of observations can be made when you remove seasonality and only look at the trend.
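Plotting at a cruder timescale can be sketched as a simple yearly aggregation. This is a minimal example on synthetic monthly values (assumed numbers, not the real ridership data), and the function name yearly_totals is made up for illustration. Incomplete years are dropped here, because a year with only a few months would look like an artificial dip in the totals.

```python
from collections import defaultdict
import math

# Synthetic monthly records (year, month, value); NOT real ridership data.
records = []
for i in range(159):  # Jan 1991 .. Mar 2004
    year, month = 1991 + i // 12, i % 12 + 1
    seasonal = -200 * math.cos(2 * math.pi * month / 12)  # peaks in June
    records.append((year, month, 1700 + 2 * i + seasonal))

def yearly_totals(records):
    """Sum the monthly values per year, keeping only years with all 12 months."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for year, _month, value in records:
        totals[year] += value
        counts[year] += 1
    return {y: totals[y] for y in sorted(totals) if counts[y] == 12}

# One point per year: the seasonal rise and dip cancels out inside each
# total, so only the overall trend remains.
totals = yearly_totals(records)
for year, total in totals.items():
    print(year, round(total))
```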

71
00:06:42,190 --> 00:06:44,480
The next operation is plotting a lag

72
00:06:44,820 --> 00:06:45,610
scatterplot.

73
00:06:47,650 --> 00:06:52,180
Scatterplots are basically used to explore the relationship between two variables.

74
00:06:53,790 --> 00:07:00,450
And it is often the case in time series forecasting that the next value of the variable is somewhat dependent

75
00:07:00,570 --> 00:07:02,280
on the previous value of the variable.

76
00:07:04,310 --> 00:07:08,190
Previous observations in a time series are called lags.

77
00:07:09,650 --> 00:07:15,050
So the previous observation, that is, the previous time step, is called lag one.

78
00:07:16,230 --> 00:07:21,420
The observation two time steps ago is called lag two, and so on.

79
00:07:23,520 --> 00:07:31,230
So you often plot a scatterplot to explore the relationship between each observation and a lag of that

80
00:07:31,230 --> 00:07:31,830
observation.

81
00:07:32,920 --> 00:07:40,900
For example, if you're trying to plot a scatterplot for that day's temperature vs. the previous day's

82
00:07:40,900 --> 00:07:43,900
temperature, it looks something like this.

83
00:07:45,700 --> 00:07:52,630
You can see here, although it is not a perfect straight line, this type of graph is suggesting

84
00:07:52,630 --> 00:07:59,510
that there is a positive relation between previous day's temperature and the next day's temperature.

85
00:07:59,980 --> 00:08:03,460
That is, if the previous day's value is large,

86
00:08:04,060 --> 00:08:06,430
the next day's value is likely to be large.

87
00:08:06,970 --> 00:08:15,130
For example, if the previous day's value is more than 15, it is very likely that the next day's value will

88
00:08:15,130 --> 00:08:19,560
be more than 15 because most of the observations lie there.

89
00:08:21,610 --> 00:08:25,760
So this was a plot of lag one, that is, on the y axis

90
00:08:26,110 --> 00:08:29,860
we have y(t+1) and on the x axis we have y(t).

91
00:08:31,930 --> 00:08:35,260
We can similarly plot other lag values also.

92
00:08:35,410 --> 00:08:40,100
So if we want to plot lag five, on the y axis we can have y(t+1),

93
00:08:40,330 --> 00:08:43,270
and on the x axis we can take y(t-4).

94
00:08:44,730 --> 00:08:46,380
But practically,

95
00:08:47,800 --> 00:08:52,960
the impact of older values on the recent values is less.

96
00:08:54,120 --> 00:08:57,870
So the relationship that we will see will be more diffused.

97
00:08:58,380 --> 00:09:03,360
So you'll see something like this, where the values are more randomly distributed.

98
00:09:03,870 --> 00:09:11,280
And there is no linear relationship clearly visible in the distribution. So a lag

99
00:09:11,280 --> 00:09:20,330
scatterplot helps us get a hint whether there is a relationship between current values and the lagged values.

100
00:09:21,300 --> 00:09:28,230
And if you see such a relationship, it makes sense to use lag features in your analysis.
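The points of a lag scatterplot can be built as in the rough sketch below. The daily temperature-like series is simulated (an assumed process where each day depends on the previous day plus noise), and the helper names lag_pairs and pearson are made up for illustration. The correlation is only a numeric summary of how tight the scatter would look.

```python
import math
import random

def lag_pairs(series, lag):
    """Return (x, y) points: x is the value `lag` steps earlier, y the later value."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return list(zip(series[:-lag], series[lag:]))

def pearson(points):
    """Pearson correlation of the x and y coordinates of the points."""
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in points)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Simulated daily values around 15 (NOT real temperature data): each day
# is pulled toward the previous day's value, plus random noise.
random.seed(0)
series = [15.0]
for _ in range(499):
    series.append(15 + 0.8 * (series[-1] - 15) + random.gauss(0, 1))

lag1 = lag_pairs(series, 1)  # x = y(t),   y = y(t+1)
lag5 = lag_pairs(series, 5)  # x = y(t-4), y = y(t+1)
r1, r5 = pearson(lag1), pearson(lag5)
print("lag 1 correlation:", round(r1, 2))
print("lag 5 correlation:", round(r5, 2))
```

With a series like this, the lag one points should lie much closer to a straight line than the lag five points, which is the more diffused picture described above.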