Hello all. So before diving deep into this session, let's have a quick recap of what we did in our previous session. In the previous session we basically performed one-hot encoding on Airline, Destination, as well as Source. Still, we have lots of columns left to deal with: those are Route, Total_Stops, and Additional_Info. So in this session we have to deal with this Route column. You will see here that the arrow sign acts exactly as a separator. If you split this data on the basis of that separator, you will get each airport code separately. I have to separate it because this is text data, and machine learning just doesn't understand what this Route column means. You have to make the machine learning algorithm understand: this is my Route 1, this is my Route 2, this is my Route 3, and so on, whatever algorithm you are going to use in the upcoming sessions. So here is what I am going to do. Very first, I'm going to access the Route column, and you will see we have all these different routes.
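As a quick reminder of the previous session, here is a minimal sketch of the one-hot encoding step on a tiny hypothetical stand-in for the flight data; the column names and toy values are assumptions, not the actual dataset.

```python
import pandas as pd

# Toy stand-in for the flight-price training data (column names assumed)
train_data = pd.DataFrame({
    "Airline": ["IndiGo", "Air India", "IndiGo"],
    "Source": ["Banglore", "Kolkata", "Delhi"],
    "Destination": ["New Delhi", "Banglore", "Cochin"],
})

# One-hot encode each categorical column, as done in the previous session
Airline = pd.get_dummies(train_data["Airline"])
Source = pd.get_dummies(train_data["Source"])
Destination = pd.get_dummies(train_data["Destination"])

print(Airline.columns.tolist())
```

Each frame now has one indicator column per category, which is exactly what creates the dimensionality concern discussed later for high-cardinality columns.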
So now I basically have to split this column on the basis of that arrow separator. If I execute it, you will see it returns a list, split on whatever separator you pass. Let's say I have to access the very first leg of the route, BLR, which denotes Bangalore. To access it, you use the str accessor with index zero. Just execute it and you will get your Route 1 data. So what I'm going to do is copy this, because I have to store it somewhere: I'm going to say categorical['Route_1'], and that is my Route 1. After that, what do we have to do? I'm just going to copy this entire code forward for Route 2, Route 3, Route 4, and Route 5, since a flight can have that many legs. So just modify it to 2, similarly 4, similarly 5, and accordingly I have to access index 1 here, index 2 here, index 3 here, and index 4 here. Just execute it, and if I call head on my data frame, you will now see all five route columns have been added to your data frame. So at this point the original Route column makes no sense at all.
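The split-and-spread step above can be sketched like this on a toy Route column; the arrow separator matches the session, but the sample routes and the loop form (instead of five copy-pasted lines) are my own illustration.

```python
import pandas as pd

# Toy Route column; the separator in the real data is the arrow character
categorical = pd.DataFrame({
    "Route": [
        "BLR → DEL",
        "CCU → IXR → BBI → BLR",
        "DEL → LKO → BOM → COK",
        "BLR → BOM → AMD → DEL → HYD",
    ]
})

# Split each route on the arrow and spread the legs into Route_1 .. Route_5.
# str.get(i) returns NaN when a route has fewer than i+1 legs.
legs = categorical["Route"].str.split("→")
for i in range(5):
    categorical[f"Route_{i + 1}"] = legs.str.get(i).str.strip()

print(categorical[["Route_1", "Route_2", "Route_5"]])
```

Shorter routes simply get NaN in the later Route columns, which is exactly the missing-value situation handled next.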
So what you can do is simply delete this Route column. For this, I'm going to use the drop method: very first I have to mention my axis, and then I have to mention my feature name. Just execute it, and it will remove the Route column from your data frame. After that, let's check whether we have any null values or not. For this, I'm going to call isnull().sum(), and you will see how many missing values each feature has. So what I'm going to do is, wherever I have a missing value, just replace it with the string 'None'. Basically, I'm going to iterate: for i in — and very first, if I print my column names, you will see that I have to iterate over Route_3, Route_4, and Route_5, because these are the columns that actually contain missing values. So I'm going to iterate over each of these columns and say categorical[i].fillna, and I have to fill it with 'None'. After that I'm going to use the inplace parameter and pass True, because I have to update my data frame as well. So if I execute this, all of it gets executed, and then let's cross-check.
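A minimal sketch of the drop-and-fill step just described; the toy frame and the short column list are assumptions (the real data iterates over Route_3, Route_4, and Route_5), and I use plain reassignment in the loop, which is equivalent to the session's fillna with inplace=True.

```python
import pandas as pd
import numpy as np

# Toy frame after the Route split: Route_3 is missing for short routes
categorical = pd.DataFrame({
    "Route": ["BLR → DEL", "CCU → BLR"],
    "Route_1": ["BLR", "CCU"],
    "Route_2": ["DEL", "BLR"],
    "Route_3": [np.nan, np.nan],
})

# The original Route column is redundant once the legs are split out
categorical.drop("Route", axis=1, inplace=True)

# Replace every missing leg with the string 'None', column by column
for col in ["Route_3"]:
    categorical[col] = categorical[col].fillna("None")

print(categorical.isnull().sum())
```

After this, isnull().sum() should report zero missing values in every column.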
So again you can call isnull().sum(), and this time you will see that you don't have any missing values in the data, because you wrote the code very smartly with a loop. And if you are not that comfortable with this code, you can do it separately for Route_3, then for Route_4, then for Route_5. But if you are going to work on real-world projects, you have to write code this way; that's what I'm showing you, and that's about it. So after this, what do you have to do? Let's say I have to print how many categories each feature has. So what I'm going to do is: for i in the categorical columns, and then I have to print how many categories are in each feature. I have to write all of this in my print statement, so I'm going to add a placeholder, and whatever value goes into that placeholder I'm simply going to supply via the format function. So I'm going to say '{} has total {} categories', and I have to supply these values via the format function.
The first placeholder will be replaced by i, and the second placeholder will be replaced by categorical[i].value_counts(), because you have to count the categories in each and every feature, and on this you can call len to get its length. So if I execute it now, you will see how many categories each feature has, and you will observe one thing over here: Route_3 and Route_4 have a high number of categories. If you use one-hot encoding here, it will create a large number of columns, and that will definitely create problems for the algorithm you are going to use, because your data becomes huge. That's the issue with one-hot encoding. So to get rid of this high-dimensionality issue, we are going to use LabelEncoder. Very first you have to import that class: from sklearn, we use the preprocessing module, and in that I have my LabelEncoder. Just execute it, and after that I simply have to initialize this LabelEncoder. Using this encoder, I have to encode all the Route columns.
So what I'm going to do: very first, let's print all the column names; then I'm going to iterate over each feature, for i in those columns, and then I have to encode it. For this I use the encoder, and it has a function, fit_transform, and with it I have to transform each and every i. So here I'm going to say categorical[i], and similarly I have to update it as well: I have to update each column with its encoded version. So I write that out and just execute it. After that, if I call head on my data, you will now see all your Route features get converted into integer format. That's exactly what we need, because machine learning doesn't understand what the meaning of a text value is; it does not understand my text data. That's why we are doing this feature encoding. Now we have to deal with Total_Stops as well as Additional_Info. But notice over here: in most of the rows, 'No info' is all that's provided in Additional_Info, so basically we can drop this column.
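The encoding loop described above can be sketched like this, assuming scikit-learn is available; the toy columns are placeholders. One caveat worth hedging: because the encoder is re-fit on each column, the integer codes are only meaningful within a column, not across columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy Route columns already split out; the real data has Route_1 .. Route_5
categorical = pd.DataFrame({
    "Route_1": ["BLR", "CCU", "BLR"],
    "Route_2": ["DEL", "IXR", "DEL"],
})

encoder = LabelEncoder()
for col in categorical.columns:
    # fit_transform learns the sorted unique labels of this column
    # and replaces each label with its integer index
    categorical[col] = encoder.fit_transform(categorical[col])

print(categorical.dtypes)
```

After the loop, every Route column holds small integers instead of airport codes, which is what the downstream algorithm needs.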
So for this, I'm going to say just drop this column: very first I have to mention the axis, and after that I have to mention the feature name, which is exactly Additional_Info. Just execute it. After that, we have to deal with this Total_Stops feature. For this I have to access categorical['Total_Stops'], and if I call unique() on it, you will now see each unique item available in this Total_Stops feature: 'non-stop', '2 stops', and all these things. Machine learning won't be able to understand what the meaning of 'non-stop' or '2 stops' is, so what can we do? We can replace 'non-stop' with 0, '2 stops' with 2, '1 stop' with 1, '3 stops' with 3, and '4 stops' with 4. For this, instead of using sklearn's LabelEncoder class, we are going to use our own approach, a custom approach. Basically, we are going to define a dictionary, and in this dictionary I'm going to say: wherever I have 'non-stop', replace it with 0; wherever I have '2 stops', replace it with 2; '1 stop', replace it with 1; '3 stops', replace it with 3; '4 stops', replace it with 4. That's it.
And after that, what do I have to do? I have to map this dictionary onto my Total_Stops column. For this I have to use the map method, and what I have to map is this dictionary. After that, I have to update this feature as well, so I'm going to say categorical['Total_Stops'] equals the mapped result, and just execute it. And if I call head on my data frame, you will see this feature gets converted into integer format. That's exactly what I want. Now, what you have to do is simply concatenate this categorical data frame with all the previous data frames you have defined: the Airline data frame, the Source data frame, the Destination data frame, as well as your continuous data. To concatenate all of this, I'm going to use the concat function of the pandas module. Here, very first, I have to mention my categorical frame; the second one is exactly my Airline; the third one is exactly my Source; the fourth one is exactly my Destination; and the fifth one is exactly my train_data, where you have to pass that list which you created, which contains all the continuous columns.
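The custom-dictionary step for Total_Stops can be sketched as below; the stop labels and their ordinal codes follow the session, while the toy rows are made up. The point of the dictionary over LabelEncoder is that the stop counts carry a natural order we want to preserve.

```python
import pandas as pd

# Toy Total_Stops column with the label spellings described in the session
categorical = pd.DataFrame({
    "Total_Stops": ["non-stop", "2 stops", "1 stop", "non-stop", "3 stops"]
})

# Hand-written ordinal mapping: each label goes to its actual stop count
stop_map = {"non-stop": 0, "1 stop": 1, "2 stops": 2, "3 stops": 3, "4 stops": 4}

# map replaces every label with its dictionary value; reassigning
# updates the feature in the data frame
categorical["Total_Stops"] = categorical["Total_Stops"].map(stop_map)

print(categorical["Total_Stops"].tolist())
```

One thing to watch with map: any label missing from the dictionary becomes NaN rather than raising an error, so it is worth re-checking isnull().sum() afterwards.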
After that, what you have to do is set the axis parameter to 1, because you have to concatenate column-wise, side by side. So just assign it: I'm going to store all this stuff in data_train. Just execute all of this, and if I call head on this data, you'll see this is the data frame you actually need. But you will notice a new issue here: you still have the raw Airline column, the Source column, and the Destination column sitting next to their encoded versions. So what I'm going to do is simply drop all these features. I'm going to use drop on the columns: here I have to mention my data frame, data_train, and my column name is nothing but my Airline column. Then I just copy and paste, and paste again: this time I'm going to remove my Source column as well, and this time I have to remove my Destination feature. Just execute it; all this stuff gets executed, and if I call head again, you will see it all worked.
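Here is a minimal end-of-session sketch of the concat-and-drop step, on toy stand-ins for the frames built so far; the frame names follow the session, but the contents (and the continuous column) are hypothetical. It also folds in the display.max_columns option used at the end to make every column of the wide frame visible.

```python
import pandas as pd

# Toy stand-ins for the pieces built during the session (contents assumed)
categorical = pd.DataFrame({
    "Airline": ["IndiGo", "Air India"],
    "Source": ["Banglore", "Kolkata"],
    "Destination": ["New Delhi", "Cochin"],
    "Total_Stops": [0, 2],
})
Airline = pd.get_dummies(categorical["Airline"])
Source = pd.get_dummies(categorical["Source"])
Destination = pd.get_dummies(categorical["Destination"])
continuous = pd.DataFrame({"Duration_hours": [2, 7]})  # hypothetical continuous data

# axis=1 glues the frames side by side (column-wise); row indices must align
data_train = pd.concat([categorical, Airline, Source, Destination, continuous], axis=1)

# The raw text columns are redundant next to their one-hot versions, so drop them
data_train.drop(columns=["Airline", "Source", "Destination"], inplace=True)

# Widen the pandas display limit so all columns of the final frame are shown
pd.set_option("display.max_columns", 35)
print(data_train.head())
```

Note that axis=1 in concat joins frames horizontally by adding columns; axis=0 would stack rows instead, which is not what is wanted here.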
Now, let's say you have to visualize all thirty-five columns. In that case, you have to extend the display limit of pandas. For this, what I'm going to do is call set_option: here you can set 'display.max_columns', and for the value you give thirty-five columns. After that, I'm just going to call head on my data_train. Just execute it, and now you will see all my column names; they are exactly these ones. That's all for this session. Hopefully you loved the session, so thank you, guys. Have a nice day. Keep learning, keep growing, keep practicing.