Hello all. So before diving deep into this session, let's have a quick recap of what we did in our previous session. In the previous session we basically performed one-hot encoding on Airline, Destination, as well as Source. Still, we have lots of columns left to deal with: those are Route, Total_Stops, and Additional_Info. So in this session we have to deal with this Route column. You will see here that the arrow sign acts exactly as a separator. If you split this data on the basis of that separator, you will get each airport code separately. I have to separate it because this is text data, and machine learning just doesn't understand what this Route column means. You have to make the machine learning algorithm understand: this is my Route 1, this is my Route 2, this is my Route 3, and so on, whatever algorithm you are going to use in the upcoming sessions. So here is what I am going to do. Very first, I'm going to access the Route column, and you will see we have all these different routes.
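As a quick reminder of the previous session, here is a minimal sketch of the one-hot encoding step on a tiny hypothetical stand-in for the flight data; the column names and toy values are assumptions, not the actual dataset.

```python
import pandas as pd

# Toy stand-in for the flight-price training data (column names assumed)
train_data = pd.DataFrame({
    "Airline": ["IndiGo", "Air India", "IndiGo"],
    "Source": ["Banglore", "Kolkata", "Delhi"],
    "Destination": ["New Delhi", "Banglore", "Cochin"],
})

# One-hot encode each categorical column, as done in the previous session
Airline = pd.get_dummies(train_data["Airline"])
Source = pd.get_dummies(train_data["Source"])
Destination = pd.get_dummies(train_data["Destination"])

print(Airline.columns.tolist())
```

Each frame now has one indicator column per category, which is exactly what creates the dimensionality concern discussed later for high-cardinality columns.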
So now I basically have to split this column on the basis of that arrow separator. If I execute it, you will see it returns a list, split on whatever separator you pass. Let's say I have to access the very first leg of the route, BLR, which denotes Bangalore. To access it, you use the str accessor with index zero. Just execute it and you will get your Route 1 data. So what I'm going to do is copy this, because I have to store it somewhere: I'm going to say categorical['Route_1'], and that is my Route 1. After that, what do we have to do? I'm just going to copy this entire code forward for Route 2, Route 3, Route 4, and Route 5, since a flight can have that many legs. So just modify it to 2, similarly 4, similarly 5, and accordingly I have to access index 1 here, index 2 here, index 3 here, and index 4 here. Just execute it, and if I call head on my data frame, you will now see all five route columns have been added to your data frame. So at this point the original Route column makes no sense at all.
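The split-and-spread step above can be sketched like this on a toy Route column; the arrow separator matches the session, but the sample routes and the loop form (instead of five copy-pasted lines) are my own illustration.

```python
import pandas as pd

# Toy Route column; the separator in the real data is the arrow character
categorical = pd.DataFrame({
    "Route": [
        "BLR → DEL",
        "CCU → IXR → BBI → BLR",
        "DEL → LKO → BOM → COK",
        "BLR → BOM → AMD → DEL → HYD",
    ]
})

# Split each route on the arrow and spread the legs into Route_1 .. Route_5.
# str.get(i) returns NaN when a route has fewer than i+1 legs.
legs = categorical["Route"].str.split("→")
for i in range(5):
    categorical[f"Route_{i + 1}"] = legs.str.get(i).str.strip()

print(categorical[["Route_1", "Route_2", "Route_5"]])
```

Shorter routes simply get NaN in the later Route columns, which is exactly the missing-value situation handled next.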
So what you can do is simply delete this Route column. For this, I'm going to use the drop method: very first I have to mention my axis, and then I have to mention my feature name. Just execute it, and it will remove the Route column from your data frame. After that, let's check whether we have any null values or not. For this, I'm going to call isnull().sum(), and you will see how many missing values each feature has. So what I'm going to do is, wherever I have a missing value, just replace it with the string 'None'. Basically, I'm going to iterate: for i in — and very first, if I print my column names, you will see that I have to iterate over Route_3, Route_4, and Route_5, because these are the columns that actually contain missing values. So I'm going to iterate over each of these columns and say categorical[i].fillna, and I have to fill it with 'None'. After that I'm going to use the inplace parameter and pass True, because I have to update my data frame as well. So if I execute this, all of it gets executed, and then let's cross-check.
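A minimal sketch of the drop-and-fill step just described; the toy frame and the short column list are assumptions (the real data iterates over Route_3, Route_4, and Route_5), and I use plain reassignment in the loop, which is equivalent to the session's fillna with inplace=True.

```python
import pandas as pd
import numpy as np

# Toy frame after the Route split: Route_3 is missing for short routes
categorical = pd.DataFrame({
    "Route": ["BLR → DEL", "CCU → BLR"],
    "Route_1": ["BLR", "CCU"],
    "Route_2": ["DEL", "BLR"],
    "Route_3": [np.nan, np.nan],
})

# The original Route column is redundant once the legs are split out
categorical.drop("Route", axis=1, inplace=True)

# Replace every missing leg with the string 'None', column by column
for col in ["Route_3"]:
    categorical[col] = categorical[col].fillna("None")

print(categorical.isnull().sum())
```

After this, isnull().sum() should report zero missing values in every column.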
So again you can call isnull().sum(), and this time you will see that you don't have any missing values in the data, because you wrote the code very smartly with a loop. And if you are not that comfortable with this code, you can do it separately for Route_3, then for Route_4, then for Route_5. But if you are going to work on real-world projects, you have to write code this way; that's what I'm showing you, and that's about it. So after this, what do you have to do? Let's say I have to print how many categories each feature has. So what I'm going to do is: for i in the categorical columns, and then I have to print how many categories are in each feature. I have to write all of this in my print statement, so I'm going to add a placeholder, and whatever value goes into that placeholder I'm simply going to supply via the format function. So I'm going to say '{} has total {} categories', and I have to supply these values via the format function.
The first placeholder will be replaced by i, and the second placeholder will be replaced by categorical[i].value_counts(), because you have to count the categories in each and every feature, and on this you can call len to get its length. So if I execute it now, you will see how many categories each feature has, and you will observe one thing over here: Route_3 and Route_4 have a high number of categories. If you use one-hot encoding here, it will create a large number of columns, and that will definitely create problems for the algorithm you are going to use, because your data becomes huge. That's the issue with one-hot encoding. So to get rid of this high-dimensionality issue, we are going to use LabelEncoder. Very first you have to import that class: from sklearn, we use the preprocessing module, and in that I have my LabelEncoder. Just execute it, and after that I simply have to initialize this LabelEncoder. Using this encoder, I have to encode all the Route columns.
So what I'm going to do: very first, let's print all the column names; then I'm going to iterate over each feature, for i in those columns, and then I have to encode it. For this I use the encoder, and it has a function, fit_transform, and with it I have to transform each and every i. So here I'm going to say categorical[i], and similarly I have to update it as well: I have to update each column with its encoded version. So I write that out and just execute it. After that, if I call head on my data, you will now see all your Route features get converted into integer format. That's exactly what we need, because machine learning doesn't understand what the meaning of a text value is; it does not understand my text data. That's why we are doing this feature encoding. Now we have to deal with Total_Stops as well as Additional_Info. But notice over here: in most of the rows, 'No info' is all that's provided in Additional_Info, so basically we can drop this column.
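The encoding loop described above can be sketched like this, assuming scikit-learn is available; the toy columns are placeholders. One caveat worth hedging: because the encoder is re-fit on each column, the integer codes are only meaningful within a column, not across columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy Route columns already split out; the real data has Route_1 .. Route_5
categorical = pd.DataFrame({
    "Route_1": ["BLR", "CCU", "BLR"],
    "Route_2": ["DEL", "IXR", "DEL"],
})

encoder = LabelEncoder()
for col in categorical.columns:
    # fit_transform learns the sorted unique labels of this column
    # and replaces each label with its integer index
    categorical[col] = encoder.fit_transform(categorical[col])

print(categorical.dtypes)
```

After the loop, every Route column holds small integers instead of airport codes, which is what the downstream algorithm needs.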
So for this, I'm going to say just drop this column: very first I have to mention the axis, and after that I have to mention the feature name, which is exactly Additional_Info. Just execute it. After that, we have to deal with this Total_Stops feature. For this I have to access categorical['Total_Stops'], and if I call unique() on it, you will now see each unique item available in this Total_Stops feature: 'non-stop', '2 stops', and all these things. Machine learning won't be able to understand what the meaning of 'non-stop' or '2 stops' is, so what can we do? We can replace 'non-stop' with 0, '2 stops' with 2, '1 stop' with 1, '3 stops' with 3, and '4 stops' with 4. For this, instead of using sklearn's LabelEncoder class, we are going to use our own approach, a custom approach. Basically, we are going to define a dictionary, and in this dictionary I'm going to say: wherever I have 'non-stop', replace it with 0; wherever I have '2 stops', replace it with 2; '1 stop', replace it with 1; '3 stops', replace it with 3; '4 stops', replace it with 4. That's it.
And after that, what do I have to do? I have to map this dictionary onto my Total_Stops column. For this I have to use the map method, and what I have to map is this dictionary. After that, I have to update this feature as well, so I'm going to say categorical['Total_Stops'] equals the mapped result, and just execute it. And if I call head on my data frame, you will see this feature gets converted into integer format. That's exactly what I want. Now, what you have to do is simply concatenate this categorical data frame with all the previous data frames you have defined: the Airline data frame, the Source data frame, the Destination data frame, as well as your continuous data. To concatenate all of this, I'm going to use the concat function of the pandas module. Here, very first, I have to mention my categorical frame; the second one is exactly my Airline; the third one is exactly my Source; the fourth one is exactly my Destination; and the fifth one is exactly my train_data, where you have to pass that list which you created, which contains all the continuous columns.
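The custom-dictionary step for Total_Stops can be sketched as below; the stop labels and their ordinal codes follow the session, while the toy rows are made up. The point of the dictionary over LabelEncoder is that the stop counts carry a natural order we want to preserve.

```python
import pandas as pd

# Toy Total_Stops column with the label spellings described in the session
categorical = pd.DataFrame({
    "Total_Stops": ["non-stop", "2 stops", "1 stop", "non-stop", "3 stops"]
})

# Hand-written ordinal mapping: each label goes to its actual stop count
stop_map = {"non-stop": 0, "1 stop": 1, "2 stops": 2, "3 stops": 3, "4 stops": 4}

# map replaces every label with its dictionary value; reassigning
# updates the feature in the data frame
categorical["Total_Stops"] = categorical["Total_Stops"].map(stop_map)

print(categorical["Total_Stops"].tolist())
```

One thing to watch with map: any label missing from the dictionary becomes NaN rather than raising an error, so it is worth re-checking isnull().sum() afterwards.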
After that, what you have to do is set the axis parameter to 1, because you have to concatenate column-wise, side by side. So just assign it: I'm going to store all this stuff in data_train. Just execute all of this, and if I call head on this data, you'll see this is the data frame you actually need. But you will notice a new issue here: you still have the raw Airline column, the Source column, and the Destination column sitting next to their encoded versions. So what I'm going to do is simply drop all these features. I'm going to use drop on the columns: here I have to mention my data frame, data_train, and my column name is nothing but my Airline column. Then I just copy and paste, and paste again: this time I'm going to remove my Source column as well, and this time I have to remove my Destination feature. Just execute it; all this stuff gets executed, and if I call head again, you will see it all worked.
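Here is a minimal end-of-session sketch of the concat-and-drop step, on toy stand-ins for the frames built so far; the frame names follow the session, but the contents (and the continuous column) are hypothetical. It also folds in the display.max_columns option used at the end to make every column of the wide frame visible.

```python
import pandas as pd

# Toy stand-ins for the pieces built during the session (contents assumed)
categorical = pd.DataFrame({
    "Airline": ["IndiGo", "Air India"],
    "Source": ["Banglore", "Kolkata"],
    "Destination": ["New Delhi", "Cochin"],
    "Total_Stops": [0, 2],
})
Airline = pd.get_dummies(categorical["Airline"])
Source = pd.get_dummies(categorical["Source"])
Destination = pd.get_dummies(categorical["Destination"])
continuous = pd.DataFrame({"Duration_hours": [2, 7]})  # hypothetical continuous data

# axis=1 glues the frames side by side (column-wise); row indices must align
data_train = pd.concat([categorical, Airline, Source, Destination, continuous], axis=1)

# The raw text columns are redundant next to their one-hot versions, so drop them
data_train.drop(columns=["Airline", "Source", "Destination"], inplace=True)

# Widen the pandas display limit so all columns of the final frame are shown
pd.set_option("display.max_columns", 35)
print(data_train.head())
```

Note that axis=1 in concat joins frames horizontally by adding columns; axis=0 would stack rows instead, which is not what is wanted here.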
Now, let's say you have to visualize all thirty-five columns. In that case, you have to extend the display limit of pandas. For this, what I'm going to do is call set_option: here you can set 'display.max_columns', and for the value you give thirty-five columns. After that, I'm just going to call head on my data_train. Just execute it, and now you will see all my column names; they are exactly these ones. That's all for this session. Hopefully you loved the session, so thank you, guys. Have a nice day. Keep learning, keep growing, keep practicing.