1 00:00:00,300 --> 00:00:03,160 So before going for a deep dive into this session, 2 00:00:03,360 --> 00:00:08,100 let's have a walkthrough of what we have done in all our previous sessions. 3 00:00:08,310 --> 00:00:14,160 So we have done exploratory data analysis, then we performed lots of data preprocessing on the data to make 4 00:00:14,310 --> 00:00:18,030 our data ready for analysis as well as for the modeling purpose. 5 00:00:18,300 --> 00:00:21,500 Then we performed lots of feature encoding techniques. 6 00:00:21,960 --> 00:00:29,700 Then we dealt with our outliers as well, using our distribution plot and box plot approach. 7 00:00:29,700 --> 00:00:35,000 After that, what we had to do was separate our independent features from our dependent feature. 8 00:00:35,220 --> 00:00:36,750 So in this session, what do we have to do? 9 00:00:36,750 --> 00:00:40,180 We have to apply feature selection on the data. 10 00:00:40,290 --> 00:00:45,030 So what exactly is feature selection, and why is there a need for feature selection? 11 00:00:45,300 --> 00:00:53,460 So feature selection is all about finding your best features, the ones which will contribute 12 00:00:53,670 --> 00:00:58,410 most and that have a good relationship with the target variable. 13 00:00:58,530 --> 00:00:59,250 So that's it. 14 00:00:59,250 --> 00:01:06,390 That's all about feature selection. And why apply feature selection? Simply to select important 15 00:01:06,390 --> 00:01:12,390 features so that you don't have an issue of too many dimensions, so that you don't have an issue of 16 00:01:12,540 --> 00:01:13,740 too many columns. 17 00:01:13,950 --> 00:01:18,150 That's why you have to perform several techniques of feature selection. 18 00:01:18,330 --> 00:01:23,670 So basically we are going to select important features using the concept of mutual information. 
19 00:01:24,120 --> 00:01:30,900 So for this, I'm going to say: from my sklearn module, you have to use the feature_selection sub 20 00:01:30,900 --> 00:01:31,350 module. 21 00:01:31,620 --> 00:01:38,360 And from this, I have to import mutual_info_classif. 22 00:01:38,640 --> 00:01:40,020 So just execute it. 23 00:01:40,020 --> 00:01:42,330 And now I'm just going to call this. 24 00:01:42,760 --> 00:01:44,570 So I'm going to say this. 25 00:01:44,580 --> 00:01:50,800 And here you have to pass what exactly is your independent data and what is the dependent data. 26 00:01:51,090 --> 00:01:55,350 So my independent data and my dependent data; just execute it. 27 00:01:55,350 --> 00:01:57,720 It will take a couple of seconds as well. 28 00:01:57,870 --> 00:02:05,190 And it will return you some kind of importance, some kind of priority, with respect to your 29 00:02:05,190 --> 00:02:06,180 target variable. 30 00:02:06,300 --> 00:02:11,100 Now you will see this is exactly the importance with respect to your target variable. 31 00:02:11,100 --> 00:02:13,410 But let's say I have to do this: here 32 00:02:13,440 --> 00:02:15,480 I have to create a data frame. 33 00:02:15,480 --> 00:02:22,740 And in that data frame, I have to map all the columns with respect to their priority, with respect to their 34 00:02:22,740 --> 00:02:23,350 importance. 35 00:02:23,460 --> 00:02:25,200 So for this, this is what I'm going to do. 36 00:02:25,200 --> 00:02:27,450 I'm just going to create a data frame over here. 37 00:02:27,630 --> 00:02:31,960 So here I'm going to say pd.DataFrame. 38 00:02:31,980 --> 00:02:40,230 And here, very first, I have to use this one, because it will return me my importances. 39 00:02:40,380 --> 00:02:47,580 And for the index, I have to just assign my X.columns. 40 00:02:47,790 --> 00:02:50,100 Let's say I have to store it somewhere. 
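
The scoring and data frame steps described so far can be sketched roughly as follows. This is only a minimal illustration, not the session's actual notebook: the toy `X` and `y`, the column names `informative` and `noise`, and the choices `discrete_features=True` and `random_state=0` are all assumptions made so the example is small and repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Hypothetical toy data standing in for the session's X (independent) and y (dependent).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "informative": rng.integers(0, 3, size=300),  # will determine the target
    "noise": rng.integers(0, 3, size=300),        # unrelated to the target
})
y = X["informative"]

# One non-negative score per column; a higher score means a stronger dependency on y.
# discrete_features=True is an assumption that fits this integer-coded toy data.
scores = mutual_info_classif(X, y, discrete_features=True, random_state=0)

# Attach the column names as the index so each score is labelled by its feature.
imp = pd.DataFrame(scores, index=X.columns)
print(imp)
```

If the target were continuous rather than categorical, scikit-learn's `mutual_info_regression` would be the analogous call.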
41 00:02:50,140 --> 00:02:54,570 I'm going to say this is nothing but my imp, and after that 42 00:02:54,750 --> 00:02:56,530 I have to print it as well. 43 00:02:56,850 --> 00:02:58,540 So just execute it. 44 00:02:58,560 --> 00:03:00,360 So all of this stuff gets executed. 45 00:03:00,370 --> 00:03:04,500 Let's say I have to do some manipulations on this data frame. 46 00:03:04,860 --> 00:03:07,100 Let's say I have to rename this column. 47 00:03:07,410 --> 00:03:14,310 So what I'm going to do, I'm going to say imp.columns equals to... actually, I have to assign my 48 00:03:14,310 --> 00:03:15,090 own column name. 49 00:03:15,090 --> 00:03:17,700 So I'm going to say this is importance. 50 00:03:17,700 --> 00:03:23,490 And after it, let's say I have to sort this data frame; I have to sort this data frame on the 51 00:03:23,490 --> 00:03:25,320 basis of its importance. 52 00:03:25,590 --> 00:03:33,300 So what I'm going to do for this, I'm going to say imp.sort_values, and how do you 53 00:03:33,300 --> 00:03:34,230 have to sort it? 54 00:03:34,530 --> 00:03:37,770 So if you press Shift+Tab, you will see over here. 55 00:03:38,070 --> 00:03:43,420 Here you have a parameter, which is by; it means on the basis of which column you have to sort it. 56 00:03:43,740 --> 00:03:51,140 So here I am going to say by equals to importance, which you have defined earlier. 57 00:03:51,450 --> 00:03:55,680 So by equals to importance, and you also have to mention: 58 00:03:55,860 --> 00:04:00,270 yeah, I just want descending order of importance. 59 00:04:00,310 --> 00:04:04,110 For this you have to set ascending equals to False. 60 00:04:04,440 --> 00:04:11,440 So just execute it and you will see this beautiful data frame that gets sorted on the basis of importance. 61 00:04:11,610 --> 00:04:14,090 So you will easily get to know over here: 
62 00:04:14,250 --> 00:04:20,640 yeah, this Route 2 column, this Route 3 column, this total stops column are basically the top three columns that 63 00:04:20,640 --> 00:04:29,780 will contribute most to my target variable, whereas this Jet Airways Business is the column that contributes 64 00:04:29,790 --> 00:04:31,290 least to my target. 65 00:04:31,800 --> 00:04:34,610 So you can simply skip this column if you really want. 66 00:04:34,800 --> 00:04:40,540 After having a conversation with your domain experts, you guys can simply skip this column. 67 00:04:40,770 --> 00:04:45,810 Similarly, you have all the importances with respect to each and every column of the data. 68 00:04:46,260 --> 00:04:48,120 That's what I'm trying to show you. 69 00:04:48,420 --> 00:04:49,820 So that's all about this session. 70 00:04:49,860 --> 00:04:51,750 Hopefully you will love this session very much. 71 00:04:52,080 --> 00:04:52,940 So thank you, guys. 72 00:04:52,950 --> 00:04:53,860 Have a nice day. 73 00:04:53,880 --> 00:04:54,740 Keep learning. 74 00:04:54,750 --> 00:04:55,560 Keep growing. 75 00:04:55,830 --> 00:04:56,730 Keep practicing.
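
The renaming, sorting, and optional column-dropping steps from this session can be sketched as below. The scores and the column names `col_a` to `col_d` are made up purely for illustration, and dropping the lowest-ranked column automatically is an assumption for the sake of the example; in practice that decision would follow the conversation with domain experts mentioned above.

```python
import pandas as pd

# Hypothetical importance scores and column names, for illustration only.
imp = pd.DataFrame([0.12, 0.85, 0.03, 0.47],
                   index=["col_a", "col_b", "col_c", "col_d"])

# Give the single score column a readable name.
imp.columns = ["importance"]

# sort_values returns a new, sorted frame; ascending=False puts the highest score first.
imp = imp.sort_values(by="importance", ascending=False)
print(imp)

# The last row after a descending sort is the least informative column.
least_important = imp.index[-1]

# Hypothetical feature frame with the same column names, minus the weakest one.
X = pd.DataFrame({"col_a": [1, 2], "col_b": [3, 4], "col_c": [5, 6], "col_d": [7, 8]})
X_reduced = X.drop(columns=[least_important])
print(X_reduced.columns.tolist())
```

Note that `sort_values` does not modify the frame in place by default, so the result has to be assigned back (or displayed directly, as in a notebook cell).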