In this lecture, we will learn how to create a naive forecast model in Python. In a naive forecast, we simply assume that the value from the last period is the forecast for the current period.

So let's create this model on our daily minimum temperature dataset. First, let's import the dataset into a data frame, df, and look at its first five values. You can see that we have two columns: the first one is the date and the second one is the temperature. The Date column holds the dates, and the Temp column holds the temperature values.

Now, since we are creating a naive forecast model, the forecast for the second of January is the previous day's value, 20.7, but the actual value is 17.9. Similarly, the forecast for the third of January is 17.9, and the actual value is 18.8.

So to create the forecasted values, we can simply create a lagged copy of the temperature data. We will use the .shift() method to do that. We are creating a new column named t, and the Temp value in the same row acts as the actual value at t + 1.
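A minimal sketch of the lag step, assuming the column names Date and Temp and a lag column named t. In the lecture the data is read from a CSV file; here the first five temperatures quoted in the lecture are typed in directly, and the dates are illustrative.

```python
import pandas as pd

# Illustrative frame: the five temperatures quoted in the lecture.
# The dates and the column names "Date"/"Temp" are assumptions for this sketch.
df = pd.DataFrame(
    {
        "Date": pd.date_range("1981-01-01", periods=5, freq="D"),
        "Temp": [20.7, 17.9, 18.8, 14.6, 15.8],
    }
)

# Shift the temperatures down by one row: row i of "t" holds the value from row i-1.
# "t" is the naive forecast; "Temp" in the same row is the actual value at t + 1.
df["t"] = df["Temp"].shift(1)

print(df.head())
```

The first row of t is NaN because there is no earlier value to shift into it, which is why that row is dropped before splitting the data.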
Let's run this and take another look at the head values. For the second of January, the forecasted value is 20.7 and the actual value is 17.9. For the fifth of January, the forecasted value is 14.6 and the actual value is 15.8.

So you can say that the Temp column is our Y variable and the t column is our X variable, and the predicted (forecasted) value is the same as the X value.

Now let's divide this data into train and test sets. We are creating two different data frames: the first one is df_train and the second one is df_test. We take the last seven values as the test data and the remaining values as the training data. So here we are selecting rows from index 1 up to df.shape[0] - 7 for training. In other words, we take all the values except the first value and the last seven values as training data, and the last seven values as test data. We ignore the first value because the first record has a NaN in the t column: there is no previous day to shift from.

Let's run this and look at the first five values of our training dataset. You can see that we have dropped the first row and kept the remaining entries. Again, Temp is our Y variable and t is our X variable. Now let's divide the train and test sets into train X, train Y, test X, and test Y.
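The split described above can be sketched as follows. This assumes a lag column named t and a hold-out of seven rows; the helper name split_train_test is hypothetical, and iloc is used as the explicit positional form of the df[1:df.shape[0] - 7] slice. The temperatures are illustrative: the first five come from the lecture and the rest are made up so the example has enough rows.

```python
import pandas as pd


def split_train_test(df: pd.DataFrame, n_test: int = 7):
    """Hypothetical helper: split a lagged frame into train and test parts.

    Row 0 is skipped because its lagged value is NaN; the last `n_test`
    rows become the test set and everything in between is the train set.
    """
    n = df.shape[0]
    if n <= n_test + 1:
        raise ValueError("need more than n_test + 1 rows")
    df_train = df.iloc[1 : n - n_test]
    df_test = df.iloc[n - n_test :]
    return df_train, df_test


# Illustrative temperatures: first five from the lecture, the rest made up.
temps = [20.7, 17.9, 18.8, 14.6, 15.8, 16.1, 17.3, 19.0, 18.2, 15.5, 14.9, 16.4]
df = pd.DataFrame({"Temp": temps})
df["t"] = df["Temp"].shift(1)  # naive forecast column, NaN in row 0

df_train, df_test = split_train_test(df, n_test=7)
print(df_train.head())
print(len(df_train), "train rows,", len(df_test), "test rows")
```

Splitting by position rather than at random keeps the time order intact, so the test set is always the most recent stretch of the series.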
For train X, we want the t column, and for train Y, we want the Temp column. Similarly, we do the same thing for the test data to get test X and test Y. So now we have divided our data into four parts: train X, train Y, test X, and test Y.

Usually, we train a model on the training data and use the test data to evaluate the performance of that model. But since we are building a naive forecast model, there is no need to train another model: we can just use the X values as our forecasted values. So let's create another data frame, predictions, which contains the contents of the test X dataset. These values become the predictions for the test data. Let's run this.

Now let's look at the predicted values and the Y values together. First we have the predicted values, and here we have the actual values. You can also see that the actual value at index 3643 is the predicted value for record 3644. So we now have a naive forecast model for our data.

Now, let's evaluate our predictions.
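A sketch of the X/Y split and the "model-free" predictions, under the same assumptions as before (columns Temp and t, seven test rows, illustrative temperatures). The variable names train_X, train_y, test_X, test_y are chosen for this example.

```python
import pandas as pd

# Illustrative temperatures: first five from the lecture, the rest made up.
temps = [20.7, 17.9, 18.8, 14.6, 15.8, 16.1, 17.3, 19.0, 18.2, 15.5, 14.9, 16.4]
df = pd.DataFrame({"Temp": temps})
df["t"] = df["Temp"].shift(1)

n_test = 7
n = df.shape[0]
df_train = df.iloc[1 : n - n_test]  # skip the NaN row, hold out the last 7
df_test = df.iloc[n - n_test :]

# X is yesterday's value (the lag column), y is today's value.
train_X, train_y = df_train["t"], df_train["Temp"]
test_X, test_y = df_test["t"], df_test["Temp"]

# The naive model needs no fitting: the forecast is simply the lagged value.
predictions = test_X.copy()

# Side by side: each prediction equals the actual value of the previous row.
comparison = pd.DataFrame({"predicted": predictions, "actual": test_y})
print(comparison)
```

In the printed table, the actual value in one row reappears as the predicted value in the next row, which is the same pattern as index 3643 and 3644 in the lecture.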
We will use mean squared error (MSE), which is the average of the squared differences between the predicted values and the Y values. We are going to use mean_squared_error from sklearn.metrics and save the result to a variable named mse. Let's run this. Notice that we have passed two series of data: first the actual values, and then the predicted values. The MSE for this dataset is 3.42.

We can also plot the predictions and Y on a graph. The blue line is the actual values, and the red line is the predicted values. We are just using matplotlib's pyplot.plot to plot this data.

Now, why is the MSE of the naive forecast important? Because we are going to evaluate our advanced models against this MSE value. If an advanced model gives us an MSE that is not lower than this value, then we can say that the model is not able to extract any useful information from the data, and you can consider this time series to be a random walk. For a random walk, the best available forecast of the next value is simply the current value, which is exactly what the naive forecast uses. That is why the naive forecast's MSE is important: it tells you whether your data behaves like a random walk or not.
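To make the definition concrete, here is MSE computed by hand, as a mean rather than a sum. The helper name mean_squared_error_manual is hypothetical; it computes the same quantity as sklearn.metrics.mean_squared_error(y_true, y_pred). The data is only the five temperatures quoted in the lecture, so the result covers four days and is not comparable to the 3.42 reported on the seven-day test set.

```python
def mean_squared_error_manual(actual, predicted):
    """Hypothetical helper: average of squared differences (not the sum)."""
    actual = list(actual)
    predicted = list(predicted)
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Naive forecast on the five temperatures quoted in the lecture:
# each day is predicted with the previous day's value.
temps = [20.7, 17.9, 18.8, 14.6, 15.8]
actual = temps[1:]      # 17.9, 18.8, 14.6, 15.8
predicted = temps[:-1]  # 20.7, 17.9, 18.8, 14.6

mse = mean_squared_error_manual(actual, predicted)
print(f"naive MSE on these four days: {mse:.4f}")
```

The errors are -2.8, 0.9, -4.2 and 1.2; their squares sum to 27.73, and dividing by the four days gives about 6.93.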
If your advanced models, such as an AR model, an MA model, or an ARIMA model, are not able to improve on this MSE value, then you can say that your time series is a random walk.

That's all for this lecture. Thank you.
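The benchmark idea can be illustrated on synthetic data. This sketch simulates a random walk with standard normal steps and compares the naive forecast against a simple alternative, the running mean of all past values. The running mean is only a stand-in for "some other model", not an AR, MA, or ARIMA model, and the helper names are hypothetical.

```python
import random


def mse(actual, predicted):
    """Mean of squared differences between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def naive_mse(series):
    """MSE of the naive forecast: predict each value with the previous one."""
    return mse(series[1:], series[:-1])


def improves_on_naive(model_mse, baseline_mse):
    """True only if the candidate's MSE is strictly below the naive baseline."""
    return model_mse < baseline_mse


# Synthetic random walk: each step adds N(0, 1) noise to the previous value.
random.seed(0)
walk = [0.0]
for _ in range(500):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))

baseline = naive_mse(walk)

# Alternative forecast for walk[i]: the mean of walk[0] .. walk[i-1].
mean_forecasts = []
running_sum = 0.0
for i in range(1, len(walk)):
    running_sum += walk[i - 1]
    mean_forecasts.append(running_sum / i)
alt = mse(walk[1:], mean_forecasts)

print(f"naive MSE: {baseline:.3f}, running-mean MSE: {alt:.3f}")
print("running mean improves on naive:", improves_on_naive(alt, baseline))
```

On a random walk the naive errors are just the random steps, so its MSE stays close to the step variance (about 1 here), while the running mean lags behind the wandering series and does much worse. A model that cannot get below the naive MSE on real data is behaving the same way.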