1 00:00:00,090 --> 00:00:05,580 Hello and welcome to the very first session of this flight fare prediction project, in which we have 2 00:00:05,580 --> 00:00:11,600 to predict what exactly the fare of the flights can be, depending upon all the other features in our 3 00:00:11,610 --> 00:00:12,300 data as well. 4 00:00:12,480 --> 00:00:16,470 So we will gradually start from data importing, data cleaning. 5 00:00:16,470 --> 00:00:19,670 Then we have to perform data preprocessing on our data. 6 00:00:19,680 --> 00:00:25,530 After doing all that stuff, we have to perform lots of analysis on our data to understand our data as 7 00:00:25,530 --> 00:00:26,550 much as we can. 8 00:00:26,760 --> 00:00:32,330 So I'm just going to open my Jupyter notebook, where I'm going to code in the Python programming language. 9 00:00:32,460 --> 00:00:38,910 So this is exactly the IDE where I'm going to code in Python. So very first, I'm going to import 10 00:00:39,270 --> 00:00:45,330 some basic libraries, all the necessary libraries and all the necessary modules, that will be highly 11 00:00:45,330 --> 00:00:51,660 helpful for us to do all the manipulation tasks on our data, all the analysis, and all the 12 00:00:51,660 --> 00:00:54,750 modeling tasks, whatever we have to do. So very first, 13 00:00:54,750 --> 00:00:59,790 I'm just going to import my very first module, which is exactly the pandas module, and I'm going to create 14 00:00:59,790 --> 00:01:01,020 the alias pd. 15 00:01:01,290 --> 00:01:06,330 So pandas is extensively used in case of data manipulation and data analysis. 16 00:01:06,480 --> 00:01:11,390 Let's say you have to extract data from some CSV into your data frame. 17 00:01:11,520 --> 00:01:14,610 So in such a case, pandas will be your partner.
18 00:01:14,880 --> 00:01:22,020 So now, for your numerical computation, you guys can import your numpy library, and I will create the alias 19 00:01:22,230 --> 00:01:26,730 np. And for visualization purposes, you guys can import the very first module 20 00:01:26,730 --> 00:01:35,280 I'm going to import, seaborn, as sns. So I'm going to just import it. And let's say for visualization tasks, let's 21 00:01:35,280 --> 00:01:40,470 also import this matplotlib, the most common visualization library. 22 00:01:40,470 --> 00:01:47,260 So I'm just going to execute it using Shift+Enter, and all the stuff gets executed over here. 23 00:01:47,490 --> 00:01:54,030 So now we have to read our data, which is exactly available over here, which is my data on the 24 00:01:54,030 --> 00:01:54,600 screen. 25 00:01:54,630 --> 00:02:00,240 So we have to read this data before performing any sort of analysis on your data. 26 00:02:00,450 --> 00:02:04,040 So I'm just going to read this data, and we will observe over here 27 00:02:04,140 --> 00:02:09,000 my data is exactly in the Excel format. 28 00:02:09,030 --> 00:02:10,360 You would observe that over here. 29 00:02:10,530 --> 00:02:18,060 So for this, we have to call a function, which is exactly read_excel, which is exactly 30 00:02:18,060 --> 00:02:18,530 this one. 31 00:02:18,540 --> 00:02:23,190 And here, very first, we have to mention where exactly my data is available. 32 00:02:23,310 --> 00:02:27,630 So I'm just going to copy this entire path from here, and I'm just going to paste it over here. 33 00:02:27,780 --> 00:02:33,750 And after it, if you press Tab over here, you will get all your stuff available at this particular 34 00:02:33,750 --> 00:02:34,110 spot. 35 00:02:34,320 --> 00:02:36,320 So we have to basically read this file. 36 00:02:36,600 --> 00:02:42,880 Let's say I'm going to store its data frame object in, let's say, train_data.
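The import-and-read step narrated above can be sketched as follows. The video reads an Excel file from a path copied on screen; that path is not in the transcript, so this sketch builds a tiny stand-in frame instead (column names and values are assumptions for illustration):

```python
import pandas as pd   # data manipulation / analysis
import numpy as np    # numerical computation
# import matplotlib.pyplot as plt  # visualization, used in later sessions
# import seaborn as sns

# In the video the frame comes from pd.read_excel("<path copied on screen>").
# A small in-memory stand-in keeps the sketch self-contained:
train_data = pd.DataFrame({
    "Airline": ["IndiGo", "Air India"],
    "Date_of_Journey": ["24/03/2019", "1/05/2019"],
    "Price": [3897, 7662],
})
print(train_data.shape)
```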
37 00:02:43,080 --> 00:02:51,270 So basically, I'm going to execute it, and let's say to get a preview of the data frame, you guys can 38 00:02:51,270 --> 00:02:53,620 basically call head on your dataset. 39 00:02:54,120 --> 00:02:58,350 So I'm just going to call head, and this is exactly a preview of your data, 40 00:02:58,350 --> 00:03:04,600 how exactly your data looks. You will see over here you have your Airline, Date of Journey, Source 41 00:03:04,600 --> 00:03:06,720 and Destination and all the other features. 42 00:03:06,870 --> 00:03:13,020 And basically, you have to predict what exactly the price can be, depending upon all the other independent 43 00:03:13,020 --> 00:03:13,530 features. 44 00:03:14,250 --> 00:03:20,250 So let's say I'm going to open my assignment for the session. The very first task is we have 45 00:03:20,250 --> 00:03:24,300 to deal with all the missing values that are available in the data. 46 00:03:24,690 --> 00:03:29,330 So let's say I'm going to check what the missing values available in the data are, 47 00:03:29,430 --> 00:03:31,980 how many missing values are available in the data. 48 00:03:32,070 --> 00:03:34,520 So for this, you guys can call isnull. 49 00:03:34,560 --> 00:03:38,190 And if you will call sum on that, you get all the information 50 00:03:38,190 --> 00:03:40,970 about the missing values available in your data. 51 00:03:41,220 --> 00:03:45,260 You will see you have just two missing values, one over there, one over here. 52 00:03:45,540 --> 00:03:50,940 So basically, you guys can drop these missing values, because here you have very few missing values. 53 00:03:51,120 --> 00:03:54,890 So let me check what exactly the shape of this data frame is. 54 00:03:55,110 --> 00:04:01,350 So if I call shape on that, you will see it has that much number of rows and that number of 55 00:04:01,350 --> 00:04:03,120 columns in my data frame. 56 00:04:03,300 --> 00:04:07,330 So I'm just going to drop whatever missing values I have in my data.
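A minimal sketch of the missing-value check described above, on a hypothetical three-row frame (the columns and the placement of the two missing entries are invented for illustration, not taken from the actual dataset):

```python
import pandas as pd

# Hypothetical stand-in frame; None marks the missing entries.
train_data = pd.DataFrame({
    "Route": ["BLR - DEL", None, "DEL - BOM"],
    "Total_Stops": ["non-stop", "1 stop", None],
    "Price": [3897, 7662, 13882],
})

missing = train_data.isnull().sum()  # count of missing values per column
print(missing)
print(train_data.shape)              # (number of rows, number of columns)
```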
57 00:04:07,620 --> 00:04:09,570 So for this, this is what we can do. 58 00:04:09,600 --> 00:04:10,710 We can simply drop them. 59 00:04:10,710 --> 00:04:13,590 And for this, we have to call a function, dropna. 60 00:04:13,770 --> 00:04:18,490 And here I have to pass my inplace parameter to update my data frame as well. 61 00:04:18,810 --> 00:04:21,090 So I just executed it. After it, 62 00:04:21,240 --> 00:04:28,440 what we have to do, let's say we do a cross-check whether my missing values are still there or not. So I just copy 63 00:04:28,440 --> 00:04:30,900 from here and just paste it over here. 64 00:04:31,080 --> 00:04:32,490 And now we will observe 65 00:04:32,500 --> 00:04:34,650 we don't have any missing values in my data. 66 00:04:34,650 --> 00:04:41,430 It means up to some extent my data is clean. Now let's move ahead to the next problem statement 67 00:04:41,430 --> 00:04:42,870 that we have to deal with. 68 00:04:43,290 --> 00:04:47,130 So this is exactly the next problem statement that we have to deal with. 69 00:04:47,160 --> 00:04:54,150 So basically, we have to perform data cleaning on our data to make this data ready for the analysis as 70 00:04:54,150 --> 00:04:55,590 well as modeling purposes. 71 00:04:55,590 --> 00:04:59,600 Because whenever you are going to work on a real-world project, you will never 72 00:04:59,960 --> 00:05:05,880 get your clean data; you will always get your raw data, whether in the form of some CSV, whether 73 00:05:05,880 --> 00:05:11,070 you have to extract the data from some APIs, whether you have to extract the data from some databases; 74 00:05:11,160 --> 00:05:13,040 it all depends on the use case. 75 00:05:13,350 --> 00:05:18,900 So whenever you are going to work on some real-world scenarios, you will always get your raw data. 76 00:05:18,900 --> 00:05:20,630 And you have to prepare this data. 77 00:05:20,760 --> 00:05:22,310 You have to do lots of analysis. 78 00:05:22,320 --> 00:05:23,820 You have to understand your features.
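The drop-and-cross-check step above, sketched on an assumed two-row stand-in frame:

```python
import pandas as pd

# Stand-in frame with one missing entry (assumed data, for illustration).
train_data = pd.DataFrame({
    "Route": ["BLR - DEL", None],
    "Price": [3897, 7662],
})

# dropna with inplace=True removes rows that hold missing values
# and updates the frame itself, so no reassignment is needed.
train_data.dropna(inplace=True)

# cross-check: every per-column missing count should now be zero
print(train_data.isnull().sum())
```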
79 00:05:23,940 --> 00:05:25,410 You have to understand your data. 80 00:05:25,500 --> 00:05:30,910 And then you have to build such a model that can do prediction depending upon what use case you have. 81 00:05:30,930 --> 00:05:37,350 So let's say very first, I'm going to check what exactly is the data type of each and every variable, of 82 00:05:37,350 --> 00:05:39,630 each and every column, available in my data. 83 00:05:39,650 --> 00:05:42,600 So I just call this stuff, which is my dtypes attribute. 84 00:05:42,810 --> 00:05:44,790 And here we will observe 85 00:05:45,030 --> 00:05:51,960 these are all the data types by default over here, and we will observe this Date_of_Journey column; by default, 86 00:05:52,110 --> 00:05:54,240 pandas will assign it as object. 87 00:05:54,360 --> 00:05:58,770 But we know if it is of timestamp format, it is of datetime format. 88 00:05:58,920 --> 00:06:05,970 So it means we have to convert this Date_of_Journey, this departure time, as well as this arrival time 89 00:06:05,970 --> 00:06:10,370 as well, because these three variables are basically of datetime format. 90 00:06:10,420 --> 00:06:11,730 We have to convert them. 91 00:06:11,940 --> 00:06:18,540 So I'm just going to define a function that can convert all my three variables to some datetime 92 00:06:18,540 --> 00:06:19,130 format. 93 00:06:19,140 --> 00:06:26,170 So next, I'm going to define a function, and I'm going to say its name is, let's say, change_into_ 94 00:06:26,460 --> 00:06:27,100 datetime. 95 00:06:27,150 --> 00:06:32,130 So I'm just going to define that function so that, whatever column I'm going to pass over here, 96 00:06:32,130 --> 00:06:38,910 it will convert the data type of that particular column to datetime.
97 00:06:39,100 --> 00:06:45,870 Very first, I have to call the pd.to_datetime function, which will convert this object data type 98 00:06:45,870 --> 00:06:53,340 into some datetime data type. For that, I have to access this data frame, which is exactly train_data, and 99 00:06:53,340 --> 00:06:59,700 here I have to access this train_data of column. Similarly, I have to update this as well. 100 00:06:59,890 --> 00:07:03,350 So I have to update this using this, this, this. 101 00:07:03,600 --> 00:07:10,290 So let's say I have to just execute this function, and basically whichever feature I have to convert, 102 00:07:10,500 --> 00:07:13,620 I can basically pass that column name into this function. 103 00:07:13,630 --> 00:07:14,400 That's it. 104 00:07:14,430 --> 00:07:19,590 So basically I have to update this Date_of_Journey, this departure time, as well as this arrival time. 105 00:07:19,920 --> 00:07:26,660 So let's say I'm going to iterate, as "for i in", and let's say I want all the three columns. 106 00:07:26,950 --> 00:07:30,480 So what I'm going to do: very first, I have to access the data frame. 107 00:07:30,480 --> 00:07:35,700 And on this, if I'm going to call dot columns, we will get all the columns in the form of a list. 108 00:07:35,850 --> 00:07:36,700 So that is one way. 109 00:07:36,750 --> 00:07:39,390 Instead, I'm just going to, let's say, complete this one. 110 00:07:39,390 --> 00:07:41,790 And here I'm going to assign a list. 111 00:07:41,970 --> 00:07:50,250 And in this, I'm going to store the Date_of_Journey and these two column names as well that I have to deal 112 00:07:50,250 --> 00:07:50,540 with. 113 00:07:50,760 --> 00:07:57,110 And in this loop, basically, I have to access this function, which is my change_into_datetime. 114 00:07:57,300 --> 00:07:59,720 And here I have to pass this i. 115 00:07:59,810 --> 00:08:01,650 I just executed it. 116 00:08:01,800 --> 00:08:06,770 And after that, that's it, I'm going to call dtypes again.
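The change_into_datetime helper and the loop over the three columns described above could look roughly like this; the sample values are invented for illustration, and dayfirst=True is an assumption matching the dd/mm/yyyy dates shown on screen:

```python
import pandas as pd

# Hypothetical sample with the three object-dtype columns from the video.
train_data = pd.DataFrame({
    "Date_of_Journey": ["24/03/2019", "1/05/2019"],
    "Dep_Time": ["22:20", "05:50"],
    "Arrival_Time": ["01:10", "13:15"],
})

def change_into_datetime(col):
    # convert one column from object dtype to datetime64
    train_data[col] = pd.to_datetime(train_data[col], dayfirst=True)

for i in ["Date_of_Journey", "Dep_Time", "Arrival_Time"]:
    change_into_datetime(i)

print(train_data.dtypes)
```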
117 00:08:07,080 --> 00:08:14,220 Now we will observe over here these three features get converted into datetime format just because of 118 00:08:14,400 --> 00:08:17,350 this block of code that I have written over here. 119 00:08:17,370 --> 00:08:18,030 That's it. 120 00:08:18,130 --> 00:08:23,560 Now, what we have to do: let's say this is my Date_of_Journey column, so what am I going to do? 121 00:08:23,940 --> 00:08:28,560 So whenever you are going to pass this column to your machine learning model and you will say, predict 122 00:08:28,560 --> 00:08:34,560 on the basis of this entire date, my machine learning model isn't able to understand what this 123 00:08:34,560 --> 00:08:38,460 entire date stands for. So what do we have to do in such a case? 124 00:08:38,610 --> 00:08:43,540 So basically, we have to split this date, and we have to tell our machine learning model: 125 00:08:43,680 --> 00:08:45,320 yeah, this is the day, 126 00:08:45,600 --> 00:08:46,500 this is the month, 127 00:08:46,500 --> 00:08:51,570 and this is the year. Then only my machine learning model is able to understand 128 00:08:51,720 --> 00:08:54,630 what exactly is the day of the journey, 129 00:08:54,630 --> 00:08:57,180 what is the month of the journey, and what is the year. 130 00:08:57,600 --> 00:08:59,440 That's what I am going to do now. 131 00:08:59,820 --> 00:09:03,210 So very first, what I have to do: very first, I'm going to access my data frame. 132 00:09:03,270 --> 00:09:09,090 And in this, I have to access this Date_of_Journey column, which is exactly this one. 133 00:09:09,090 --> 00:09:16,440 And on this, what I have to do: to access your day, you have to call the dt.day attribute to access 134 00:09:16,440 --> 00:09:16,770 the day. 135 00:09:17,100 --> 00:09:19,270 Similarly, what do you have to do? 136 00:09:19,500 --> 00:09:27,810 Similarly, I'm going to call this dt.month to access your month. And after it, let's say whatever 137 00:09:27,810 --> 00:09:28,560 month it will return,
138 00:09:29,070 --> 00:09:32,640 I have to store it in some variable or in some column. 139 00:09:32,640 --> 00:09:37,840 I will store it in, let's say, a month column. Similarly, whatever day it will return, 140 00:09:38,110 --> 00:09:41,520 I'm going to store it somewhere else, let's say in some column. 141 00:09:41,790 --> 00:09:44,210 So I have to define that column name as well. 142 00:09:44,490 --> 00:09:47,580 So that column name is Journey_ 143 00:09:47,940 --> 00:09:56,610 day. Similarly, over here, let's say Journey_month. After it, I just executed it. 144 00:09:56,610 --> 00:09:59,220 And if I'm going to call head on 145 00:09:59,350 --> 00:10:02,930 my data, to get a rough idea what exactly is going on in my data, 146 00:10:03,220 --> 00:10:10,750 now we will observe over here that two columns, Journey_day and Journey_month, have been added to your data 147 00:10:10,750 --> 00:10:11,070 frame. 148 00:10:11,260 --> 00:10:18,010 Now, what we have to do: we have to drop this Date_of_Journey column, because we have already extracted all 149 00:10:18,010 --> 00:10:20,590 the necessary things from this feature. 150 00:10:20,620 --> 00:10:23,270 It means we can drop this column. 151 00:10:23,460 --> 00:10:28,360 So what I am going to do: I have to access this data frame, and here I have to call a drop function. 152 00:10:28,480 --> 00:10:30,750 And here I have to mention what I have to drop. 153 00:10:30,970 --> 00:10:36,270 Then I'm going to say axis equal to one, because I have to drop this column in a vertical way. 154 00:10:36,490 --> 00:10:42,560 Then I have to set my inplace parameter to True, because I have to update my data frame as well. 155 00:10:42,580 --> 00:10:46,390 So I just executed it; all the stuff gets executed. 156 00:10:46,630 --> 00:10:49,370 So that's all about this session. In the upcoming session, 157 00:10:49,390 --> 00:10:54,340 we are going to deal with this departure time as well as this arrival time feature as well.
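The day/month extraction and the column drop walked through above, sketched on an assumed two-row frame:

```python
import pandas as pd

# Stand-in frame with Date_of_Journey already converted to datetime.
train_data = pd.DataFrame({
    "Date_of_Journey": pd.to_datetime(["2019-03-24", "2019-05-01"]),
    "Price": [3897, 7662],
})

# split the date into plain numbers the model can work with
train_data["Journey_day"] = train_data["Date_of_Journey"].dt.day
train_data["Journey_month"] = train_data["Date_of_Journey"].dt.month

# the original column is now redundant; axis=1 drops column-wise,
# inplace=True updates the frame itself
train_data.drop("Date_of_Journey", axis=1, inplace=True)

print(train_data.head())
```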
158 00:10:54,550 --> 00:10:59,160 So that next session is all about data cleaning and data preprocessing. 159 00:10:59,410 --> 00:11:01,480 So I hope you loved the session very much. 160 00:11:01,510 --> 00:11:02,110 Thank you. 161 00:11:02,320 --> 00:11:03,260 Have a nice day. 162 00:11:03,310 --> 00:11:04,180 Keep learning. 163 00:11:04,180 --> 00:11:04,930 Keep growing. 164 00:11:05,290 --> 00:11:06,220 Keep practicing.