1 00:00:00,520 --> 00:00:07,650 Before going deep into this session, let's have a quick recap of what we have done in all
2 00:00:07,690 --> 00:00:08,330 our sessions.
3 00:00:08,940 --> 00:00:13,920 We did lots of analysis and applied lots of feature engineering techniques as well.
4 00:00:13,930 --> 00:00:16,850 And we have found some meaningful insights as well.
5 00:00:17,130 --> 00:00:23,640 And for the machine learning aspect, what we have done: we performed correlation, applied feature
6 00:00:23,640 --> 00:00:28,270 encoding techniques on our data, and saw how to handle outliers as well.
7 00:00:28,290 --> 00:00:34,700 What we did in the previous session was just define a function in which we simply apply a
8 00:00:34,710 --> 00:00:35,850 log transform on the data.
9 00:00:35,860 --> 00:00:39,260 That's what we have done in our previous sessions. In this session,
10 00:00:39,270 --> 00:00:45,720 what we have to do, and this is exactly the assignment for this particular session, is to apply
11 00:00:46,020 --> 00:00:55,080 feature importance techniques on our data to select exactly the most important features to
12 00:00:55,080 --> 00:00:57,110 build a machine learning model.
13 00:00:57,240 --> 00:01:01,300 Because if you observe here, you have plenty of features.
14 00:01:01,440 --> 00:01:02,400 Look at that here.
15 00:01:02,400 --> 00:01:03,720 You have tons of features.
16 00:01:04,230 --> 00:01:11,430 So from this n number of features, you have to select some subset of features that are going to contribute
17 00:01:11,430 --> 00:01:13,430 more to your machine learning model.
18 00:01:13,830 --> 00:01:17,530 That's exactly what the feature importance technique will do.
19 00:01:18,000 --> 00:01:21,840 So first, let me check whether I have any null value in my data frame or not.
20 00:01:21,960 --> 00:01:24,320 I'm just going to call isnull().
21 00:01:24,570 --> 00:01:30,870 And let me just call sum() to get a summation of all the missing values, and you will figure out here
22 00:01:30,870 --> 00:01:33,730 I have a single missing value, so you can just drop it.
23 00:01:33,750 --> 00:01:34,320 That's it.
24 00:01:34,560 --> 00:01:41,430 So I'm going to say DataFrame dot dropna() and just pass inplace equal to True so that you
25 00:01:41,430 --> 00:01:44,160 update the frame. That's it, just execute it.
26 00:01:44,160 --> 00:01:51,480 Now, what you have to do: you need your independent features as well as your dependent feature.
27 00:01:51,720 --> 00:01:56,420 So let me just define two variables, X and Y.
28 00:01:56,730 --> 00:01:58,860 So Y is exactly your dependent feature.
29 00:01:58,870 --> 00:02:04,200 So I'm going to say data frame of is_canceled.
30 00:02:04,440 --> 00:02:11,220 So this is exactly your dependent feature that you have to predict with respect to a number
31 00:02:11,220 --> 00:02:12,360 of independent features.
32 00:02:12,780 --> 00:02:16,800 So now, how do you have to get your independent features?
33 00:02:16,830 --> 00:02:19,340 Let me show you a thing.
34 00:02:19,350 --> 00:02:28,070 If I say data frame dot drop and here pass this is_canceled
35 00:02:28,320 --> 00:02:34,670 column, and I don't set any inplace parameter, it means I'm not going to update the frame.
36 00:02:34,860 --> 00:02:37,060 And I'm going to execute that in a new cell.
37 00:02:37,080 --> 00:02:42,270 But before executing, you have to set one more parameter, which is exactly axis.
38 00:02:42,390 --> 00:02:44,700 It means how you have to drop this column.
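The steps just described can be sketched as follows. This is a minimal sketch: the DataFrame contents are made up, and the target column name `is_canceled` is assumed from the session's booking data.

```python
import pandas as pd

# Toy stand-in for the session's DataFrame; "is_canceled" is the
# dependent (target) column assumed from the transcript.
df = pd.DataFrame({
    "lead_time": [10.0, 20.0, None, 40.0],
    "adults": [2, 1, 2, 3],
    "is_canceled": [0, 1, 0, 1],
})

print(df.isnull().sum())      # per-column count of missing values
df.dropna(inplace=True)       # single missing row, so just drop it

y = df["is_canceled"]                   # dependent feature
X = df.drop("is_canceled", axis=1)      # independent features (axis=1 drops a column)
```

Without `inplace=True`, `drop` returns a new frame and leaves `df` untouched, which is exactly why it can be assigned to `X` here.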
39 00:02:44,970 --> 00:02:52,470 So if I execute it now, you will see here you have all your features except this one.
40 00:02:52,860 --> 00:03:00,500 It means this is your data frame, or you can say this is the data set, that is exactly your independent features.
41 00:03:00,510 --> 00:03:03,680 So let me just copy it from here.
42 00:03:03,960 --> 00:03:04,870 Let me just delete it.
43 00:03:05,130 --> 00:03:07,930 And I have to just assign it to this X.
44 00:03:07,950 --> 00:03:08,590 That's it.
45 00:03:08,880 --> 00:03:10,830 Now just execute it.
46 00:03:10,830 --> 00:03:17,190 Now you have to perform the feature importance concept on your data. For this,
47 00:03:17,190 --> 00:03:23,540 you have to import some libraries that will definitely help you to do all these tasks.
48 00:03:24,060 --> 00:03:31,800 So for this, I'm going to say from this sklearn package I have something which is exactly my
49 00:03:31,800 --> 00:03:32,720 linear_model module.
50 00:03:32,730 --> 00:03:38,970 And from this I have to import my algorithm, which is exactly my Lasso, which is exactly this one.
51 00:03:39,540 --> 00:03:46,510 After that, what we have to do: from this sklearn, I have something which is exactly my
52 00:03:46,510 --> 00:03:51,210 feature_selection module, to select your most important features.
53 00:03:51,240 --> 00:03:55,620 This is exactly the module that is going to help you a lot.
54 00:03:55,650 --> 00:04:01,110 So from this module, you have to import something known as SelectFromModel.
55 00:04:01,110 --> 00:04:09,100 And here, this is a class that is going to help you a lot to perform this feature importance concept.
56 00:04:09,120 --> 00:04:14,040 So here, what I have to do very first: I have to initialize this SelectFromModel.
57 00:04:14,040 --> 00:04:19,430 And here, what I have to do: I have to specify my Lasso regression model.
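The two imports the session walks through are the Lasso estimator from `sklearn.linear_model` and the `SelectFromModel` class from `sklearn.feature_selection`:

```python
# Estimator whose L1 penalty drives unimportant coefficients to zero.
from sklearn.linear_model import Lasso

# Meta-transformer that keeps only the features an estimator marks
# as important (for Lasso: the ones with non-zero coefficients).
from sklearn.feature_selection import SelectFromModel
```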
58 00:04:19,650 --> 00:04:23,040 So here what I'm going to do, I'm just going to initialize this.
59 00:04:23,220 --> 00:04:26,330 And here, very first, I have to mention the Lasso.
60 00:04:26,700 --> 00:04:31,350 And if I press Shift+Tab over here, you will get all its arguments.
61 00:04:31,590 --> 00:04:37,380 What exactly is the alpha parameter, what is your fit_intercept; you can set all these custom parameters
62 00:04:37,580 --> 00:04:37,770 here.
63 00:04:38,250 --> 00:04:42,240 So the main parameter that you have to take care of is exactly this
64 00:04:42,240 --> 00:04:45,390 alpha. So what exactly is this alpha?
65 00:04:45,390 --> 00:04:48,330 It is equivalent to your penalty parameter.
66 00:04:48,390 --> 00:04:54,210 It means the bigger the value of the alpha, the fewer features will get selected.
67 00:04:54,360 --> 00:04:54,690 Let's say
68 00:04:54,690 --> 00:04:56,550 I just need more features.
69 00:04:56,560 --> 00:04:59,550 It means I have to set some low value of alpha.
70 00:04:59,910 --> 00:05:05,520 So here I'm going to say alpha equal to, let's say, zero point zero zero five. Let's say I'm going
71 00:05:05,520 --> 00:05:12,480 to set this value from my own experience, or if you want to customize this value, you can use something
72 00:05:12,480 --> 00:05:14,310 known as cross-validation.
73 00:05:14,490 --> 00:05:19,620 But here I am going to set this value from my own experience, working on different
74 00:05:19,620 --> 00:05:21,690 projects in different domains.
75 00:05:21,990 --> 00:05:28,530 So after it, what I have to do: I have to set something known as my random_state parameter and assign its
76 00:05:28,620 --> 00:05:32,710 value as zero. After doing all this stuff,
77 00:05:33,000 --> 00:05:41,070 what I have to do then: I have to use this SelectFromModel, which is exactly inside my scikit-learn.
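Putting those two pieces together gives the setup described above. The `alpha=0.005` value mirrors the session's choice (picked from experience rather than tuned); in practice you would validate it with cross-validation.

```python
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel

# alpha is the L1 penalty strength: the bigger alpha is, the more
# coefficients are driven to exactly zero, so fewer features survive.
# A small alpha like 0.005 keeps more features in play.
lasso = Lasso(alpha=0.005, random_state=0)

# Wrap the estimator; after fitting, features whose Lasso coefficients
# are (effectively) non-zero are the ones that get selected.
feature_selection_model = SelectFromModel(lasso)
```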
78 00:05:41,070 --> 00:05:47,530 And the documentation tells us this will select the features whose coefficients are non-zero.
79 00:05:47,550 --> 00:05:50,920 That's what the SelectFromModel will do.
80 00:05:51,180 --> 00:05:54,740 So here I am going to store its object somewhere.
81 00:05:54,780 --> 00:06:00,900 Let's say feature_selection_model.
82 00:06:00,910 --> 00:06:03,390 Let's say this is exactly its object.
83 00:06:03,600 --> 00:06:05,670 After it, I have to just execute it.
84 00:06:05,670 --> 00:06:10,030 Now, what we have to do: we have to basically fit my data to this object.
85 00:06:10,320 --> 00:06:12,660 So for this, I'm just going to say dot fit.
86 00:06:13,020 --> 00:06:20,870 And here I have to mention this X and Y, so I'm going to say X, comma, Y. Just execute this, and
87 00:06:20,880 --> 00:06:28,190 it will take a while, and after all that, in front of you, you will see your Lasso model has been executed.
88 00:06:28,380 --> 00:06:34,590 And these are all the custom parameters that you have defined, and these are all the things that you
89 00:06:34,590 --> 00:06:35,180 have done.
90 00:06:35,820 --> 00:06:37,350 So if you want to check,
91 00:06:37,470 --> 00:06:37,890 yeah,
92 00:06:37,920 --> 00:06:42,600 what are the columns that are selected by this SelectFromModel class:
93 00:06:42,600 --> 00:06:50,550 for this I'm just going to say feature_selection_model dot get_support, which
94 00:06:50,550 --> 00:06:53,540 is exactly this one, and just execute this.
95 00:06:53,770 --> 00:07:00,670 You will see all your columns in the form of a list where you have values of True and False.
96 00:07:01,050 --> 00:07:07,020 So whenever there is a False, it means the model, or you can say the feature selection technique, is
97 00:07:07,020 --> 00:07:07,850 going to recommend that
98 00:07:09,040 --> 00:07:14,650 this feature isn't contributing much to my machine learning model, and True means
99 00:07:14,880 --> 00:07:18,830 you have to consider this feature for your machine learning model.
100 00:07:19,290 --> 00:07:25,920 So for this, let's say I have to visualize here which feature gets selected and which feature doesn't
101 00:07:25,920 --> 00:07:26,730 get selected.
102 00:07:27,000 --> 00:07:31,740 So let's say I have to create a data frame, or let's say you want to print it.
103 00:07:31,740 --> 00:07:34,320 Let's say whatever you want to print, I have to print.
104 00:07:34,620 --> 00:07:39,650 So let's say I'm going to call my X dot columns; these are all my columns.
105 00:07:39,660 --> 00:07:40,830 Let me store it somewhere.
106 00:07:40,830 --> 00:07:44,200 Let's say these are exactly all my columns over here.
107 00:07:44,490 --> 00:07:47,220 Now, what I have to do is I have to print something.
108 00:07:47,460 --> 00:07:51,360 So I'm going to say, let's say, I have to print. What do I have to print?
109 00:07:51,360 --> 00:07:56,970 I have to print something like, say, my total features.
110 00:07:56,970 --> 00:07:59,850 Then I have to add something known as a placeholder.
111 00:07:59,850 --> 00:08:05,550 So here I'm going to add my placeholder, and this placeholder will receive a value from my format function.
112 00:08:05,550 --> 00:08:11,790 So I'm going to say dot format, and after it, what I have to do: I'm going to say X dot shape.
113 00:08:12,000 --> 00:08:14,190 And index one of the shape gives the number of columns.
114 00:08:14,190 --> 00:08:19,260 So just pass one over there. And in a similar way, what you guys can do:
115 00:08:19,260 --> 00:08:25,710 let me just copy this and let me just paste it. This time, let's say, I have to print my selected features.
116 00:08:25,740 --> 00:08:31,920 I'm going to say it is nothing but, let's say, my selected features, and here what I have to pass:
117 00:08:32,160 --> 00:08:34,620 Let me just store it somewhere.
118 00:08:34,620 --> 00:08:42,060 Let's say selected features are nothing but, let's say, basically passing this filter to my
119 00:08:42,240 --> 00:08:42,810 columns.
120 00:08:42,810 --> 00:08:48,240 So here I'm going to say I have to pass this filter to my columns. That said, if I execute
121 00:08:48,240 --> 00:08:53,610 it, you will get all your columns that have been selected by your SelectFromModel technique.
122 00:08:53,880 --> 00:08:54,770 So here, what
123 00:08:54,780 --> 00:09:01,380 I have to do is store it somewhere, so I'm going to say selected_features.
124 00:09:01,680 --> 00:09:03,780 So just execute it now.
125 00:09:03,780 --> 00:09:09,320 Here, what I have to do: I am going to simply print its length.
126 00:09:09,330 --> 00:09:14,400 I'm going to say basically len of whatever list I have defined.
127 00:09:14,410 --> 00:09:21,360 And if I execute this all together, you will see your total number of features, but
128 00:09:21,360 --> 00:09:24,480 your model selects only 14 features.
129 00:09:24,690 --> 00:09:30,710 So these are exactly the 14 features that you have to consider for the machine learning aspect.
130 00:09:31,140 --> 00:09:33,920 So let me just print what exactly those features are.
131 00:09:34,260 --> 00:09:36,990 So these are exactly all your features.
132 00:09:36,990 --> 00:09:40,290 It means you just need these features to build a machine learning model.
133 00:09:40,290 --> 00:09:43,920 It means you have to consider those features as your independent features.
134 00:09:44,220 --> 00:09:48,660 So let me update my X, which is exactly my independent data frame.
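The whole pipeline from fitting to filtering the columns can be sketched end to end. The data here is synthetic (the `feat_*` column names are made up, not the session's actual columns), so the selected count will differ from the session's 14; only `feat_0` and `feat_3` really influence `y`, so those should survive the Lasso.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Illustrative frame standing in for the session's X and y.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 5)),
                 columns=[f"feat_{i}" for i in range(5)])
y = 3 * X["feat_0"] - 2 * X["feat_3"] + rng.normal(scale=0.1, size=100)

# Fit the selector: Lasso zeroes out weak coefficients, and
# get_support() returns one True/False per column.
selector = SelectFromModel(Lasso(alpha=0.005, random_state=0)).fit(X, y)
selected_features = X.columns[selector.get_support()]

print("Total features: {}".format(X.shape[1]))          # X.shape[1] = column count
print("Selected features: {}".format(len(selected_features)))

X = X[selected_features]   # keep only the columns the selector kept
```

The boolean mask from `get_support()` is used directly to index `X.columns`, which is the "pass this filter to my columns" step from the session.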
135 00:09:48,660 --> 00:09:55,140 So I'm going to say, if I pass this selected_features to my X, and if I
136 00:09:55,140 --> 00:09:58,590 execute it now, you will see here you have these fourteen columns.
137 00:09:58,680 --> 00:09:59,400 It means these are
138 00:09:59,470 --> 00:10:06,970 your independent features, so let me just assign it to X. That's it. So now your dependent data
139 00:10:06,970 --> 00:10:09,520 and independent data are almost ready.
140 00:10:09,520 --> 00:10:14,650 And in all of your upcoming sessions, we are going to build our model, and we are also going to check
141 00:10:14,650 --> 00:10:21,520 what exactly the accuracy is, how we can cross-validate our model, and all these things in much more
142 00:10:21,520 --> 00:10:21,850 depth.
143 00:10:21,880 --> 00:10:24,390 So that's all about this session. Thank you
144 00:10:24,430 --> 00:10:24,970 very much.
145 00:10:25,270 --> 00:10:25,900 Thank you.
146 00:10:26,110 --> 00:10:27,040 Have a nice day.
147 00:10:27,340 --> 00:10:28,180 Keep learning.
148 00:10:28,180 --> 00:10:28,990 Keep growing.
149 00:10:29,110 --> 00:10:29,740 Keep practicing.