1 00:00:00,090 --> 00:00:05,580 Hello and welcome to the very first session of this flight fare prediction project, in which we have 2 00:00:05,580 --> 00:00:11,600 to predict what exactly the fare of the flights can be, depending upon all the other features in our 3 00:00:11,610 --> 00:00:12,300 data as well. 4 00:00:12,480 --> 00:00:16,470 So we will gradually start from data importing, data cleaning. 5 00:00:16,470 --> 00:00:19,670 Then we have to perform data preprocessing on our data. 6 00:00:19,680 --> 00:00:25,530 After doing all that stuff, we have to perform lots of analysis on our data to understand our data as 7 00:00:25,530 --> 00:00:26,550 much as we can. 8 00:00:26,760 --> 00:00:32,330 So I'm just going to open my Jupyter notebook, where I'm going to code in the Python programming language. 9 00:00:32,460 --> 00:00:38,910 So this is exactly the IDE where I'm going to code in Python. So very first, I'm going to import 10 00:00:39,270 --> 00:00:45,330 some basic libraries, all the necessary libraries and all the necessary modules, that will be highly 11 00:00:45,330 --> 00:00:51,660 helpful for us to do all the manipulation tasks on our data, all the analysis, and all the 12 00:00:51,660 --> 00:00:54,750 modeling tasks, whatever we have to do. So very first, 13 00:00:54,750 --> 00:00:59,790 I'm just going to import my very first module, which is exactly the pandas module, and I'm going to create 14 00:00:59,790 --> 00:01:01,020 the alias pd. 15 00:01:01,290 --> 00:01:06,330 So pandas is extensively used in case of data manipulation and data analysis. 16 00:01:06,480 --> 00:01:11,390 Let's say you have to extract data from some CSV into your data frame. 17 00:01:11,520 --> 00:01:14,610 So in such a case, pandas will be your partner.
18 00:01:14,880 --> 00:01:22,020 So now, for your numerical computation, you guys can import your numpy library, and I will create the alias 19 00:01:22,230 --> 00:01:26,730 np. And for visualization purposes, you guys can import the very first module 20 00:01:26,730 --> 00:01:35,280 I'm going to import, seaborn, as sns. So I'm going to just import it. And let's say for visualization tasks, let's 21 00:01:35,280 --> 00:01:40,470 also import this matplotlib, the most common visualization library. 22 00:01:40,470 --> 00:01:47,260 So I'm just going to execute it using Shift+Enter, and all the stuff gets executed over here. 23 00:01:47,490 --> 00:01:54,030 So now we have to read our data, which is exactly available over here, which is my data on the 24 00:01:54,030 --> 00:01:54,600 screen. 25 00:01:54,630 --> 00:02:00,240 So we have to read this data before performing any sort of analysis on your data. 26 00:02:00,450 --> 00:02:04,040 So I'm just going to read this data, and we will observe over here 27 00:02:04,140 --> 00:02:09,000 my data is exactly in the Excel format. 28 00:02:09,030 --> 00:02:10,360 You would observe that over here. 29 00:02:10,530 --> 00:02:18,060 So for this, we have to call a function, which is exactly read_excel, which is exactly 30 00:02:18,060 --> 00:02:18,530 this one. 31 00:02:18,540 --> 00:02:23,190 And here, very first, we have to mention where exactly my data is available. 32 00:02:23,310 --> 00:02:27,630 So I'm just going to copy this entire path from here, and I'm just going to paste it over here. 33 00:02:27,780 --> 00:02:33,750 And after it, if you press Tab over here, you will get all your stuff available at this particular 34 00:02:33,750 --> 00:02:34,110 spot. 35 00:02:34,320 --> 00:02:36,320 So we have to basically read this file. 36 00:02:36,600 --> 00:02:42,880 Let's say I'm going to store its data frame object in, let's say, train_data.
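The import-and-read step narrated above can be sketched as follows. The video reads an Excel file from a path copied on screen; that path is not in the transcript, so this sketch builds a tiny stand-in frame instead (column names and values are assumptions for illustration):

```python
import pandas as pd   # data manipulation / analysis
import numpy as np    # numerical computation
# import matplotlib.pyplot as plt  # visualization, used in later sessions
# import seaborn as sns

# In the video the frame comes from pd.read_excel("<path copied on screen>").
# A small in-memory stand-in keeps the sketch self-contained:
train_data = pd.DataFrame({
    "Airline": ["IndiGo", "Air India"],
    "Date_of_Journey": ["24/03/2019", "1/05/2019"],
    "Price": [3897, 7662],
})
print(train_data.shape)
```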
37 00:02:43,080 --> 00:02:51,270 So basically, I'm going to execute it, and let's say to get a preview of the data frame, you guys can 38 00:02:51,270 --> 00:02:53,620 basically call head on your dataset. 39 00:02:54,120 --> 00:02:58,350 So I'm just going to call head, and this is exactly a preview of your data, 40 00:02:58,350 --> 00:03:04,600 how exactly your data looks. You will see over here you have your Airline, Date of Journey, Source 41 00:03:04,600 --> 00:03:06,720 and Destination and all the other features. 42 00:03:06,870 --> 00:03:13,020 And basically, you have to predict what exactly the price can be, depending upon all the other independent 43 00:03:13,020 --> 00:03:13,530 features. 44 00:03:14,250 --> 00:03:20,250 So let's say I'm going to open my assignment for the session. The very first task is we have 45 00:03:20,250 --> 00:03:24,300 to deal with all the missing values that are available in the data. 46 00:03:24,690 --> 00:03:29,330 So let's say I'm going to check what the missing values available in the data are, 47 00:03:29,430 --> 00:03:31,980 how many missing values are available in the data. 48 00:03:32,070 --> 00:03:34,520 So for this, you guys can call isnull. 49 00:03:34,560 --> 00:03:38,190 And if you will call sum on that, you get all the information 50 00:03:38,190 --> 00:03:40,970 about the missing values available in your data. 51 00:03:41,220 --> 00:03:45,260 You will see you have just two missing values, one over there, one over here. 52 00:03:45,540 --> 00:03:50,940 So basically, you guys can drop these missing values, because here you have very few missing values. 53 00:03:51,120 --> 00:03:54,890 So let me check what exactly the shape of this data frame is. 54 00:03:55,110 --> 00:04:01,350 So if I call shape on that, you will see it has that much number of rows and that number of 55 00:04:01,350 --> 00:04:03,120 columns in my data frame. 56 00:04:03,300 --> 00:04:07,330 So I'm just going to drop whatever missing values I have in my data.
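A minimal sketch of the missing-value check described above, on a hypothetical three-row frame (the columns and the placement of the two missing entries are invented for illustration, not taken from the actual dataset):

```python
import pandas as pd

# Hypothetical stand-in frame; None marks the missing entries.
train_data = pd.DataFrame({
    "Route": ["BLR - DEL", None, "DEL - BOM"],
    "Total_Stops": ["non-stop", "1 stop", None],
    "Price": [3897, 7662, 13882],
})

missing = train_data.isnull().sum()  # count of missing values per column
print(missing)
print(train_data.shape)              # (number of rows, number of columns)
```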
57 00:04:07,620 --> 00:04:09,570 So for this, this is what we can do. 58 00:04:09,600 --> 00:04:10,710 We can simply drop them. 59 00:04:10,710 --> 00:04:13,590 And for this, we have to call a function, dropna. 60 00:04:13,770 --> 00:04:18,490 And here I have to pass my inplace parameter to update my data frame as well. 61 00:04:18,810 --> 00:04:21,090 So I just executed it. After it, 62 00:04:21,240 --> 00:04:28,440 what we have to do, let's say we do a cross-check whether my missing values are still there or not. So I just copy 63 00:04:28,440 --> 00:04:30,900 from here and just paste it over here. 64 00:04:31,080 --> 00:04:32,490 And now we will observe 65 00:04:32,500 --> 00:04:34,650 we don't have any missing values in my data. 66 00:04:34,650 --> 00:04:41,430 It means up to some extent my data is clean. Now let's move ahead to the next problem statement 67 00:04:41,430 --> 00:04:42,870 that we have to deal with. 68 00:04:43,290 --> 00:04:47,130 So this is exactly the next problem statement that we have to deal with. 69 00:04:47,160 --> 00:04:54,150 So basically, we have to perform data cleaning on our data to make this data ready for the analysis as 70 00:04:54,150 --> 00:04:55,590 well as modeling purposes. 71 00:04:55,590 --> 00:04:59,600 Because whenever you are going to work on a real-world project, you will never 72 00:04:59,960 --> 00:05:05,880 get your clean data; you will always get your raw data, whether in the form of some CSV, whether 73 00:05:05,880 --> 00:05:11,070 you have to extract the data from some APIs, whether you have to extract the data from some databases; 74 00:05:11,160 --> 00:05:13,040 it all depends on the use case. 75 00:05:13,350 --> 00:05:18,900 So whenever you are going to work on some real-world scenarios, you will always get your raw data. 76 00:05:18,900 --> 00:05:20,630 And you have to prepare this data. 77 00:05:20,760 --> 00:05:22,310 You have to do lots of analysis. 78 00:05:22,320 --> 00:05:23,820 You have to understand your features.
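The drop-and-cross-check step above, sketched on an assumed two-row stand-in frame:

```python
import pandas as pd

# Stand-in frame with one missing entry (assumed data, for illustration).
train_data = pd.DataFrame({
    "Route": ["BLR - DEL", None],
    "Price": [3897, 7662],
})

# dropna with inplace=True removes rows that hold missing values
# and updates the frame itself, so no reassignment is needed.
train_data.dropna(inplace=True)

# cross-check: every per-column missing count should now be zero
print(train_data.isnull().sum())
```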
79 00:05:23,940 --> 00:05:25,410 You have to understand your data. 80 00:05:25,500 --> 00:05:30,910 And then you have to build such a model that can do prediction depending upon what use case you have. 81 00:05:30,930 --> 00:05:37,350 So let's say very first, I'm going to check what exactly is the data type of each and every variable, of 82 00:05:37,350 --> 00:05:39,630 each and every column, available in my data. 83 00:05:39,650 --> 00:05:42,600 So I just call this stuff, which is my dtypes attribute. 84 00:05:42,810 --> 00:05:44,790 And here we will observe 85 00:05:45,030 --> 00:05:51,960 these are all the data types by default over here, and we will observe this Date_of_Journey column; by default, 86 00:05:52,110 --> 00:05:54,240 pandas will assign it as object. 87 00:05:54,360 --> 00:05:58,770 But we know if it is of timestamp format, it is of datetime format. 88 00:05:58,920 --> 00:06:05,970 So it means we have to convert this Date_of_Journey, this departure time, as well as this arrival time 89 00:06:05,970 --> 00:06:10,370 as well, because these three variables are basically of datetime format. 90 00:06:10,420 --> 00:06:11,730 We have to convert them. 91 00:06:11,940 --> 00:06:18,540 So I'm just going to define a function that can convert all my three variables to some datetime 92 00:06:18,540 --> 00:06:19,130 format. 93 00:06:19,140 --> 00:06:26,170 So next, I'm going to define a function, and I'm going to say its name is, let's say, change_into_ 94 00:06:26,460 --> 00:06:27,100 datetime. 95 00:06:27,150 --> 00:06:32,130 So I'm just going to define that function so that, whatever column I'm going to pass over here, 96 00:06:32,130 --> 00:06:38,910 it will convert the data type of that particular column to datetime.
97 00:06:39,100 --> 00:06:45,870 Very first, I have to call the pd.to_datetime function, which will convert this object data type 98 00:06:45,870 --> 00:06:53,340 into some datetime data type. For that, I have to access this data frame, which is exactly train_data, and 99 00:06:53,340 --> 00:06:59,700 here I have to access this train_data of column. Similarly, I have to update this as well. 100 00:06:59,890 --> 00:07:03,350 So I have to update this using this, this, this. 101 00:07:03,600 --> 00:07:10,290 So let's say I have to just execute this function, and basically whichever feature I have to convert, 102 00:07:10,500 --> 00:07:13,620 I can basically pass that column name into this function. 103 00:07:13,630 --> 00:07:14,400 That's it. 104 00:07:14,430 --> 00:07:19,590 So basically I have to update this Date_of_Journey, this departure time, as well as this arrival time. 105 00:07:19,920 --> 00:07:26,660 So let's say I'm going to iterate, as "for i in", and let's say I want all the three columns. 106 00:07:26,950 --> 00:07:30,480 So what I'm going to do: very first, I have to access the data frame. 107 00:07:30,480 --> 00:07:35,700 And on this, if I'm going to call dot columns, we will get all the columns in the form of a list. 108 00:07:35,850 --> 00:07:36,700 So that is one way. 109 00:07:36,750 --> 00:07:39,390 Instead, I'm just going to, let's say, complete this one. 110 00:07:39,390 --> 00:07:41,790 And here I'm going to assign a list. 111 00:07:41,970 --> 00:07:50,250 And in this, I'm going to store the Date_of_Journey and these two column names as well that I have to deal 112 00:07:50,250 --> 00:07:50,540 with. 113 00:07:50,760 --> 00:07:57,110 And in this loop, basically, I have to access this function, which is my change_into_datetime. 114 00:07:57,300 --> 00:07:59,720 And here I have to pass this i. 115 00:07:59,810 --> 00:08:01,650 I just executed it. 116 00:08:01,800 --> 00:08:06,770 And after that, that's it, I'm going to call dtypes again.
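The change_into_datetime helper and the loop over the three columns described above could look roughly like this; the sample values are invented for illustration, and dayfirst=True is an assumption matching the dd/mm/yyyy dates shown on screen:

```python
import pandas as pd

# Hypothetical sample with the three object-dtype columns from the video.
train_data = pd.DataFrame({
    "Date_of_Journey": ["24/03/2019", "1/05/2019"],
    "Dep_Time": ["22:20", "05:50"],
    "Arrival_Time": ["01:10", "13:15"],
})

def change_into_datetime(col):
    # convert one column from object dtype to datetime64
    train_data[col] = pd.to_datetime(train_data[col], dayfirst=True)

for i in ["Date_of_Journey", "Dep_Time", "Arrival_Time"]:
    change_into_datetime(i)

print(train_data.dtypes)
```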
117 00:08:07,080 --> 00:08:14,220 Now we will observe over here these three features get converted into datetime format just because of 118 00:08:14,400 --> 00:08:17,350 this block of code that I have written over here. 119 00:08:17,370 --> 00:08:18,030 That's it. 120 00:08:18,130 --> 00:08:23,560 Now, what we have to do: let's say this is my Date_of_Journey column, so what am I going to do? 121 00:08:23,940 --> 00:08:28,560 So whenever you are going to pass this column to your machine learning model and you will say, predict 122 00:08:28,560 --> 00:08:34,560 on the basis of this entire date, my machine learning model isn't able to understand what this 123 00:08:34,560 --> 00:08:38,460 entire date stands for. So what do we have to do in such a case? 124 00:08:38,610 --> 00:08:43,540 So basically, we have to split this date, and we have to tell our machine learning model: 125 00:08:43,680 --> 00:08:45,320 yeah, this is the day, 126 00:08:45,600 --> 00:08:46,500 this is the month, 127 00:08:46,500 --> 00:08:51,570 and this is the year. Then only my machine learning model is able to understand 128 00:08:51,720 --> 00:08:54,630 what exactly is the day of the journey, 129 00:08:54,630 --> 00:08:57,180 what is the month of the journey, and what is the year. 130 00:08:57,600 --> 00:08:59,440 That's what I am going to do now. 131 00:08:59,820 --> 00:09:03,210 So very first, what I have to do: very first, I'm going to access my data frame. 132 00:09:03,270 --> 00:09:09,090 And in this, I have to access this Date_of_Journey column, which is exactly this one. 133 00:09:09,090 --> 00:09:16,440 And on this, what I have to do: to access your day, you have to call the dt.day attribute to access 134 00:09:16,440 --> 00:09:16,770 the day. 135 00:09:17,100 --> 00:09:19,270 Similarly, what do you have to do? 136 00:09:19,500 --> 00:09:27,810 Similarly, I'm going to call this dt.month to access your month. And after it, let's say whatever 137 00:09:27,810 --> 00:09:28,560 month it will return,
138 00:09:29,070 --> 00:09:32,640 I have to store it in some variable or in some column. 139 00:09:32,640 --> 00:09:37,840 I will store it in, let's say, a month column. Similarly, whatever day it will return, 140 00:09:38,110 --> 00:09:41,520 I'm going to store it somewhere else, let's say in some column. 141 00:09:41,790 --> 00:09:44,210 So I have to define that column name as well. 142 00:09:44,490 --> 00:09:47,580 So that column name is Journey_ 143 00:09:47,940 --> 00:09:56,610 day. Similarly, over here, let's say Journey_month. After it, I just executed it. 144 00:09:56,610 --> 00:09:59,220 And if I'm going to call head on 145 00:09:59,350 --> 00:10:02,930 my data, to get a rough idea what exactly is going on in my data, 146 00:10:03,220 --> 00:10:10,750 now we will observe over here that two columns, Journey_day and Journey_month, have been added to your data 147 00:10:10,750 --> 00:10:11,070 frame. 148 00:10:11,260 --> 00:10:18,010 Now, what we have to do: we have to drop this Date_of_Journey column, because we have already extracted all 149 00:10:18,010 --> 00:10:20,590 the necessary things from this feature. 150 00:10:20,620 --> 00:10:23,270 It means we can drop this column. 151 00:10:23,460 --> 00:10:28,360 So what I am going to do: I have to access this data frame, and here I have to call a drop function. 152 00:10:28,480 --> 00:10:30,750 And here I have to mention what I have to drop. 153 00:10:30,970 --> 00:10:36,270 Then I'm going to say axis equal to one, because I have to drop this column in a vertical way. 154 00:10:36,490 --> 00:10:42,560 Then I have to set my inplace parameter to True, because I have to update my data frame as well. 155 00:10:42,580 --> 00:10:46,390 So I just executed it; all the stuff gets executed. 156 00:10:46,630 --> 00:10:49,370 So that's all about this session. In the upcoming session, 157 00:10:49,390 --> 00:10:54,340 we are going to deal with this departure time as well as this arrival time feature as well.
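The day/month extraction and the column drop walked through above, sketched on an assumed two-row frame:

```python
import pandas as pd

# Stand-in frame with Date_of_Journey already converted to datetime.
train_data = pd.DataFrame({
    "Date_of_Journey": pd.to_datetime(["2019-03-24", "2019-05-01"]),
    "Price": [3897, 7662],
})

# split the date into plain numbers the model can work with
train_data["Journey_day"] = train_data["Date_of_Journey"].dt.day
train_data["Journey_month"] = train_data["Date_of_Journey"].dt.month

# the original column is now redundant; axis=1 drops column-wise,
# inplace=True updates the frame itself
train_data.drop("Date_of_Journey", axis=1, inplace=True)

print(train_data.head())
```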
158 00:10:54,550 --> 00:10:59,160 So that next session is all about data cleaning and data preprocessing. 159 00:10:59,410 --> 00:11:01,480 So I hope you loved the session very much. 160 00:11:01,510 --> 00:11:02,110 Thank you. 161 00:11:02,320 --> 00:11:03,260 Have a nice day. 162 00:11:03,310 --> 00:11:04,180 Keep learning. 163 00:11:04,180 --> 00:11:04,930 Keep growing. 164 00:11:05,290 --> 00:11:06,220 Keep practicing.