1 00:00:00,240 --> 00:00:05,210 Now we've got our date column parsed and it's in a good format that we can work with, the pandas datetime 2 00:00:05,210 --> 00:00:06,070 format. 3 00:00:06,120 --> 00:00:10,240 Let's keep modifying our data frame so that we can work with it and eventually build a model. 4 00:00:10,270 --> 00:00:14,370 You can see here we're viewing the first five rows and we've got 53 columns along here. 5 00:00:14,730 --> 00:00:19,890 But what you'll notice in the middle is that when there are too many columns to display in a Jupyter notebook, it gets 6 00:00:19,890 --> 00:00:21,000 truncated. 7 00:00:21,060 --> 00:00:22,230 So we've got dot dot dot. 8 00:00:22,590 --> 00:00:29,400 But if you wanted to see every column, a trick you can do is df.head() and then .T for transpose. 9 00:00:29,400 --> 00:00:33,090 So now we'll see the first five rows across the top. 10 00:00:33,150 --> 00:00:34,350 These are columns now. 11 00:00:34,410 --> 00:00:43,680 And if we scroll down, these are all of our columns, so you can see all 53. There's a fair few missing 12 00:00:43,680 --> 00:00:46,260 values up towards the back end here. 13 00:00:46,260 --> 00:00:51,420 Now this might be different depending on what rows we're looking at, but that's just a little trick: if 14 00:00:51,630 --> 00:00:56,970 your data frame is getting truncated, you can try to transpose it and you might be able to view them all 15 00:00:57,120 --> 00:00:58,020 in one hit. 16 00:00:58,020 --> 00:01:02,730 This could take us a fair while if we're going to go through each and every single column trying to 17 00:01:02,730 --> 00:01:03,950 figure out what's going on. 18 00:01:04,560 --> 00:01:13,200 So maybe to begin with, let's just look at the sale date column to remind ourselves of what it looks like now.
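The transpose trick mentioned above can be sketched roughly as follows. The data frame here is a hypothetical stand-in with 53 made-up columns, not the course dataset; only `head()` and `.T` come from the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's data frame: 53 columns, 5 rows.
df = pd.DataFrame({f"col_{i}": range(5) for i in range(53)})

# head() keeps the first five rows; .T transposes the result, so each
# original column becomes a row and the five rows run across the top.
transposed = df.head().T

print(transposed.shape)  # (53, 5)
```

With 53 rows instead of 53 columns, the notebook can usually show everything by scrolling down rather than truncating the middle.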
19 00:01:13,440 --> 00:01:19,320 Let's check out the first 20 or so. Wonderful, so we've got different dates. We can see we have good examples 20 00:01:19,320 --> 00:01:26,250 from different years, different months, different days. What would be better would be if it was all in order. 21 00:01:26,280 --> 00:01:27,350 That would make sense, right? 22 00:01:27,360 --> 00:01:33,060 If our data frame, from the top, from row zero all the way down to the bottom, was in order. 23 00:01:33,090 --> 00:01:39,080 So this is the first date, and then as the examples went on, as the rows went on, the dates went up. 24 00:01:39,090 --> 00:01:45,700 So let's do that. What we might do is go: sort data frame by sale date. 25 00:01:45,810 --> 00:01:48,290 This is good practice for a time series problem. 26 00:01:48,450 --> 00:01:48,690 Right. 27 00:01:48,690 --> 00:01:49,790 It's a good idea. 28 00:01:49,890 --> 00:01:55,200 Usually with a time series problem you're trying to use examples from the past, or past events, to make 29 00:01:55,200 --> 00:01:57,240 some inference or predictions on future events. 30 00:01:57,240 --> 00:02:02,340 So it's always a good idea to sort your values by the date that they occur. 31 00:02:02,340 --> 00:02:13,810 So what we'll write is: when working with time series data, it's a good idea to sort it by date. 32 00:02:13,810 --> 00:02:21,720 We're just sorting by date, so we can do that with a little pandas function. So what we might do 33 00:02:21,720 --> 00:02:32,970 is go, sort data frame in date order: df.sort_values, go by. 34 00:02:32,970 --> 00:02:34,510 This is where we pass it 35 00:02:35,420 --> 00:02:36,430 a column name. 36 00:02:37,110 --> 00:02:41,640 And then we want to do it in place. inplace just applies our change to df directly; if we didn't have 37 00:02:41,640 --> 00:02:43,070 inplace equals True, 38 00:02:43,140 --> 00:02:48,440 we'd have to go df equals; this just does it in place. And we do want our dates to be in order.
39 00:02:48,510 --> 00:02:49,770 We'll do ascending. 40 00:02:49,890 --> 00:02:55,320 So we want it to go from smallest date to highest date: equals True. 41 00:02:55,410 --> 00:03:03,110 And now because we've parsed our dates, and by parsed I mean parsed as in P-A-R-S-E-D, 42 00:03:03,330 --> 00:03:09,100 we can run these sorts of functions, like sort_values, because pandas is going to understand sale date as a datetime 43 00:03:09,170 --> 00:03:09,910 object. 44 00:03:09,960 --> 00:03:11,790 So it's got some special properties about that. 45 00:03:12,330 --> 00:03:13,380 So we go here. 46 00:03:13,380 --> 00:03:16,370 df.saledate.head(20). 47 00:03:16,440 --> 00:03:21,810 So what this is going to do is sort our data frame by sale date, starting with the earliest 48 00:03:21,810 --> 00:03:22,620 date first, 49 00:03:22,650 --> 00:03:29,610 hopefully, if this works. And then we'll view the same 20 rows. We'll just call the same line of code to view 50 00:03:29,610 --> 00:03:33,500 the first 20 examples and see what's changed. 51 00:03:33,610 --> 00:03:34,960 Wonderful. 52 00:03:34,990 --> 00:03:37,700 So these numbers on the left here are the index. 53 00:03:37,960 --> 00:03:45,120 So that's the index of the rows that we've just sorted, in df.head. So the original index of that row is 54 00:03:45,120 --> 00:03:52,360 205,615, but the sale date was 1989, or the 17th of 55 00:03:52,360 --> 00:03:54,270 the first, 1989. 56 00:03:54,280 --> 00:03:58,910 So now you see how, instead of having our dates scrambled, we've got them in order. 57 00:03:58,960 --> 00:04:02,450 That's exactly what we wanted. And now what we might do: 58 00:04:02,450 --> 00:04:13,690 it's also good practice to make a copy of the original data frame. And why is this? So, we make a copy 59 00:04:14,890 --> 00:04:16,180 of the original data frame. 60 00:04:16,210 --> 00:04:17,120 We'll see it in a second.
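The sorting step walked through above might look something like this. It is a minimal sketch on a tiny made-up data frame: the column name `saledate` is an assumption about how the spoken "sale date" column is spelled, and the `SalePrice` column and its values are invented for illustration.

```python
import pandas as pd

# Made-up rows; "saledate" and "SalePrice" are assumed column names.
df = pd.DataFrame({
    "saledate": pd.to_datetime(
        ["2006-11-16", "1989-01-17", "2004-03-26", "1989-01-31"]
    ),
    "SalePrice": [66000, 9500, 57000, 14000],
})

# Without inplace=True, sort_values returns a new, sorted data frame
# and leaves df untouched, so the result has to be assigned.
df_sorted = df.sort_values(by=["saledate"], ascending=True)

# With inplace=True, df itself is reordered and nothing is returned.
df.sort_values(by=["saledate"], inplace=True, ascending=True)

# The numbers on the left are the original row labels, now out of order.
print(df.saledate.head(20))
```

Sorting works chronologically here because the column holds datetime values rather than strings. Each row keeps its original index label, which is why the index looks scrambled after the sort.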
61 00:04:18,940 --> 00:04:26,120 So when we manipulate the copy, we've still got our original data. 62 00:04:26,120 --> 00:04:32,290 So if we do something wrong on our copy when we're exploring our data or when we're manipulating it, we 63 00:04:32,290 --> 00:04:35,320 can still always revert back to the original data frame. 64 00:04:35,320 --> 00:04:40,240 So what we might do is make a copy, and you can call this whatever you want. I usually call it something 65 00:04:40,240 --> 00:04:43,480 like df_tmp, as in data frame temporary. 66 00:04:43,870 --> 00:04:47,720 You could call it df_experiment or something like that, just something simple that you can remember. 67 00:04:47,860 --> 00:04:53,080 We're going to use df.copy(), so basically it's going to say, hey, take the data frame we've 68 00:04:53,080 --> 00:04:55,960 been working with, including the change that it's in order of 69 00:04:55,960 --> 00:05:00,270 sale date, and just make a copy of it and save it to the variable df_tmp. 70 00:05:00,580 --> 00:05:02,200 So let's do that. 71 00:05:02,200 --> 00:05:02,710 Wonderful. 72 00:05:02,800 --> 00:05:11,880 So if we check, just to make sure, df_tmp.saledate, actually no, we want just the head. Yeah, exact same. 73 00:05:12,360 --> 00:05:15,240 Wonderful. And we can also... 74 00:05:15,750 --> 00:05:23,320 It's just the exact same copy of the data frame, but now if we make changes, and that's what we're going 75 00:05:23,320 --> 00:05:26,520 to do, we make changes to df_tmp, 76 00:05:26,590 --> 00:05:32,440 those changes won't be reflected in df. If we ever need to, we can revert back to df as our original, 77 00:05:32,560 --> 00:05:38,220 and we can manipulate df_tmp as much as we want because we know we've still got the original. 78 00:05:38,440 --> 00:05:39,760 So, beautiful.
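To illustrate why the copy matters, here is a small sketch on made-up data. The names `df_tmp`, `saledate` and `SalePrice` are assumed spellings, and the `alias` variable is added purely for contrast; it is not part of the lesson.

```python
import pandas as pd

# Made-up rows standing in for the sorted data frame.
df = pd.DataFrame({
    "saledate": pd.to_datetime(["1989-01-17", "1989-01-31"]),
    "SalePrice": [9500, 14000],
})

# Plain assignment does not copy anything: both names point at the
# same object, so a change made through one shows up in the other.
alias = df

# copy() builds an independent data frame with the same contents.
df_tmp = df.copy()

# Change the copy only.
df_tmp["SalePrice"] = df_tmp["SalePrice"] * 2

print(df["SalePrice"].tolist())      # [9500, 14000]
print(df_tmp["SalePrice"].tolist())  # [19000, 28000]
```

Because `df` is untouched, it can always be copied again if an experiment on `df_tmp` goes wrong.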
79 00:05:39,760 --> 00:05:46,750 Now we've got df_tmp. What we might do is, although we've ordered our data frame by date, we might 80 00:05:46,840 --> 00:05:52,830 use these dates to enrich the data in our data frame. 81 00:05:52,930 --> 00:05:56,380 And if you're wondering what that means, don't worry, we'll cover it in the next video. 82 00:05:56,440 --> 00:05:59,880 But for now, just have a play around with sorting a data frame. 83 00:05:59,890 --> 00:06:04,870 You could even sort it by another value; just don't save it in place, just see what it does. 84 00:06:04,870 --> 00:06:10,750 You can change the by to any column name that's a number, and you can change this to False if you 85 00:06:10,750 --> 00:06:12,260 wanted to, and see how that works out. 86 00:06:12,280 --> 00:06:15,770 But otherwise, if you're following along, let's make some changes to df_tmp.
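One way to try the suggested experiment is sketched below, again on made-up data with assumed column names. It reads "change this to False" as referring to `ascending`, which is an interpretation rather than something the lesson spells out.

```python
import pandas as pd

# Made-up rows, already in date order.
df_tmp = pd.DataFrame({
    "saledate": pd.to_datetime(
        ["1989-01-17", "1989-01-31", "2004-03-26", "2006-11-16"]
    ),
    "SalePrice": [9500, 66000, 14000, 57000],
})

# Sort by a numeric column, largest first. No inplace=True here, so the
# result is a new data frame and df_tmp keeps its date order.
by_price = df_tmp.sort_values(by="SalePrice", ascending=False)

print(by_price["SalePrice"].tolist())  # [66000, 57000, 14000, 9500]
print(df_tmp["SalePrice"].tolist())    # [9500, 66000, 14000, 57000]
```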