Before we deep-dive into this session, let's have a quick recap of what we have done in this project. We did a lot of preprocessing and a lot of analysis, and we drew various insights from the data. In the previous session, we automated our analysis and also performed spatial analysis to get a clear picture of which places have the maximum number of pickups. That's what we did in the previous session.

So what do we have to do in this session? Let's go ahead with the assignment. The very first problem statement is data preparation. We have to perform it on the overall January to June 2015 data, which means we first have to read this data. So let me copy the entire path of the file from here. To read it, I call pd.read_csv and paste the path inside. If you press Tab here, you will get all the options available at this particular spot. So you read this file, and it returns the data, which I'm going to store in a data frame.
I'll call it uber_15, but the name is entirely up to you. Now just execute it. It will take a couple of seconds, depending on your system's specifications. Once it has executed, I call head() to get a preview of what my data looks like. You will see the dispatching base number and the pickup date; this pickup date will play a major role in the analysis. The third column is the affiliated base number, and the fourth is the location ID.

Now let's check the data type of each column. You will see that pandas has assigned Pickup_date a string type by default: it has the object dtype, but we know it is in timestamp format. That means we have to convert it, which is what we did in our first session. Let me show you the code we wrote in the very first session. Yes, this is exactly that code. Let me copy it, paste it here, and make some modifications, starting with the data frame name, which is uber_15.
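A minimal sketch of the reading and inspection steps. It assumes the file's columns are named Dispatching_base_num, Pickup_date, Affiliated_base_num, and locationID (an assumption; check them against your own file). To keep it self-contained, a few made-up rows in an in-memory string stand in for the real CSV; with the actual file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Made-up rows standing in for the real January-June 2015 CSV.
# Column names are assumed; check them against your own file.
SAMPLE_CSV = """Dispatching_base_num,Pickup_date,Affiliated_base_num,locationID
B02617,2015-05-17 09:47:00,B02617,141
B02617,2015-05-17 09:47:00,B02617,65
B02764,2015-01-03 18:12:00,B02764,100
"""

# With the real data: uber_15 = pd.read_csv("<path to your CSV>")
uber_15 = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(uber_15.head())   # preview the first rows
print(uber_15.dtypes)   # Pickup_date comes back as text, not as datetime
```

The dtypes output is what tells you Pickup_date still needs converting before any date-based features can be extracted.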
Then I specify the feature name, which is nothing but Pickup_date. After that, I also have to update this column, so I assign the converted result back to it. When I execute it, it takes a while, and then I get an error saying the data does not match this format. That's because this data uses a different format: first comes the year, then the month, then the day. So let me make some modifications here. The first part is the year, followed by a hyphen. The second part is the month, so I remove the slash here, and similarly I remove the slash after it. The third part is the day, and after that come the hour, minute, and second, so the format becomes %Y-%m-%d %H:%M:%S. I execute it again, and this time everything runs successfully.
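The format fix can be reproduced on a couple of sample strings. This sketch assumes the earlier session's code used the month/day/year pattern %m/%d/%Y %H:%M:%S (inferred from the slashes being removed above). Passing that pattern to pd.to_datetime on the year-first 2015 strings raises a ValueError, while %Y-%m-%d %H:%M:%S parses them.

```python
import pandas as pd

# Sample pickup strings in the year-month-day layout of the 2015 data.
pickups = pd.Series(["2015-05-17 09:47:00", "2015-01-03 18:12:00"])

# Assumed month/day/year pattern from the earlier session: it does not match
# these year-first strings, so pandas raises a ValueError.
old_format = "%m/%d/%Y %H:%M:%S"
try:
    pd.to_datetime(pickups, format=old_format)
    old_format_failed = False
except ValueError as err:
    old_format_failed = True
    print("Format mismatch:", err)

# Year first, then month and day separated by hyphens, then the time.
new_format = "%Y-%m-%d %H:%M:%S"
parsed = pd.to_datetime(pickups, format=new_format)
print(parsed)

# On the real frame the converted values replace the original column:
# uber_15["Pickup_date"] = pd.to_datetime(uber_15["Pickup_date"], format=new_format)
```

Passing an explicit format also keeps pandas from guessing the layout, which would be slow and error-prone on a file of this size.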
Now you will see that the column has been converted to a datetime type. From this pickup date, we want to extract the year, the day, the weekday, the minute, the month, the hour, and so on. These derived features, or derived attributes, will play a major role in the analysis; that's what we have done in our previous sessions. So let me extract the weekday, the day, and the rest. First, access the column's dt accessor. I'm going to copy it, but you can type it manually as well. Let's say I first need the weekday, so I call dt.day_name(). This returns the weekday, so I store it in a column: I access my data frame and define a new column named weekday. After that, I need the day, so I call dt.day here, which returns the day of the month, and again I have to store it somewhere.
So I write uber_15 and this time store it in a column called day. Once that's done, I need the minute, month, and hour. dt.minute gives you the minute, so I define a column named minute to hold it. Next, dt.month gives you the month, and I store it in a column with the same name, month. Finally, I call dt.hour, and this time I define an hour column in uber_15 to store it. Now execute all of this; everything executes successfully. If I call head() to see how my data frame looks now, you'll find five new features that will play a major role in the analysis: weekday, day, minute, month, and hour.
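The five derived columns can be sketched with pandas' .dt accessor. This uses a two-row stand-in frame rather than the real data; the column names (weekday, day, minute, month, hour) follow the ones described above.

```python
import pandas as pd

# Small stand-in for uber_15 after Pickup_date has been converted to datetime.
uber_15 = pd.DataFrame({
    "Pickup_date": pd.to_datetime(
        ["2015-05-17 09:47:00", "2015-01-03 18:12:00"],
        format="%Y-%m-%d %H:%M:%S",
    )
})

# The .dt accessor exposes the components of a datetime column.
pickup = uber_15["Pickup_date"].dt
uber_15["weekday"] = pickup.day_name()  # e.g. "Sunday"
uber_15["day"] = pickup.day             # day of the month, 1-31
uber_15["minute"] = pickup.minute       # 0-59
uber_15["month"] = pickup.month         # 1-12
uber_15["hour"] = pickup.hour           # 0-23

print(uber_15.head())
```

Binding the accessor to a variable (pickup) once keeps the five assignments short; it is equivalent to writing uber_15["Pickup_date"].dt each time.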
Let's go ahead with our next problem statement, in which we have to analyze which month has the highest number of Uber pickups in New York City. For this, I first access my month column, uber_15['month'], and on this I call value_counts(). You will get a count for each month, so you can easily see how many rides there are in each month. Now you simply have to visualize this data. For this you can use either a bar plot or a pie chart; it's up to you. As for the library, you can draw the bar chart with Plotly, Matplotlib, Seaborn, whatever you want. But I decided to go ahead with Plotly, because whenever you work on real-world projects, Plotly plays a very important role there. So I call px.bar, and if you press Shift+Tab, you will get the entire documentation of this function. On the x-axis I just need the index, and on the y-axis I copy this value_counts() expression and paste it here. Now just execute the cell.
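A sketch of the counting step on a small made-up month column. Note that value_counts() orders results by frequency, not by month, so sorting by the index before plotting keeps the bars in calendar order; that extra sort_index() call is a suggestion, not something done in the session.

```python
import pandas as pd

# Made-up month numbers standing in for uber_15["month"].
uber_15 = pd.DataFrame({"month": [1, 1, 2, 6, 6, 6, 3, 3]})

# Number of pickups per month, ordered from most to least frequent.
counts = uber_15["month"].value_counts()

# Same counts reordered by month number (1 = January) for a chronological chart.
monthly = counts.sort_index()

busiest_month = counts.idxmax()
print(monthly)
print("Busiest month:", busiest_month)
```

To draw the chart as in the session, pass these into Plotly Express after import plotly.express as px, for example px.bar(x=monthly.index, y=monthly.values).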
It returns a nice bar chart. You can easily see that month 6, June, has the highest number of pickups. And if we have to draw an inference from this, we can say that the number of Uber pickups increased steadily throughout the first half of 2015 in New York. So that's the type of analysis, the type of inferences and conclusions you can draw from your data, and how you can build your visuals. That's all for this session. I hope you liked it. Thank you. Keep learning, keep going, keep practicing.
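Whether the counts rise steadily can be checked directly rather than only by eye. A minimal sketch with hypothetical monthly totals (not the real figures): Series.is_monotonic_increasing tests for a non-decreasing sequence, and diff() shows the month-over-month change.

```python
import pandas as pd

# Hypothetical pickup totals per month (1 = January); not the real figures.
monthly = pd.Series([100, 120, 150, 170, 180, 200], index=range(1, 7))

peak_month = monthly.idxmax()
# True when every month is at least as high as the month before it.
non_decreasing = monthly.is_monotonic_increasing
# Change relative to the previous month (NaN for the first month).
month_over_month = monthly.diff()

print("Peak month:", peak_month)
print("Non-decreasing:", non_decreasing)
print(month_over_month)
```

Run on the real per-month counts sorted by month, the same two checks would confirm or contradict the "steady increase" reading of the chart.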