All right, all right, all right. To finish off this proof of concept we're working through, of predicting whether or not someone has heart disease, we're going to fulfill the last requirement our boss asked of us, and that is feature importance. So let's make a little heading here: Feature Importance. Beautiful.

Now, what is feature importance? Let's put a little note to ourselves: feature importance is another way of asking, "Which features contributed most to the outcomes of the model?" You could also extend it to, "And how did they contribute?"

This is useful to know, since we're trying to predict heart disease using a patient's medical characteristics. Which characteristics? For a refresher (we've seen this a few times now), we're using this data here. So which of these characteristics - age, sex, cp, and all these other ones like chol, thalach, exang - contributes most to predicting the target, and how, or how much, does each one contribute? If you want a breakdown of what they are, we can revisit our data dictionary up the top.

So how would you find this out? Well, finding feature importance is different for each machine learning model, much like we saw before with tuning the hyperparameters of a certain model. Since we're using logistic regression to calculate all of these metrics - logistic regression being the model we found through grid search, by tuning the hyperparameters, to get the best results so far - what you might look up is something like: "how to find feature importance using logistic regression".

If we go here, I'd look into "feature importance using logistic regression", "model-based feature importance", "how to find the importance of features of a logistic regression model". You could substitute almost any model into that search - random forest, k-nearest neighbours classifiers, something like that - and you'd find a bunch of different methods.

So what we're going to do, again, is pretend that we've done our research. Our boss has gone, "Can you get the feature importance of that model?", and we've gone, "Hold on, I'm not entirely sure what feature importance is."
So we've gone away overnight, or for a couple of hours, done our research, and figured it out: okay, this is how we find feature importance. Beautiful. That's what we're going to do here. We've done this research - and remember, part of being a machine learning engineer, part of being a data scientist, the most important thing is researching and experimenting. It's built into the framework we're using: experiments. That's really what we're doing here. We're in the modelling phase - we've created the heading for it - but really, we're experimenting.

So let's figure it out. Let's find the feature importance for our logistic regression model. I'll put a little note here: one way to find feature importance is to search for "(MODEL NAME) feature importance" - I'll put the model name in brackets. That's one way to do it; that's what we just saw in that little search above, and it's what you should try, whichever model you're using, if you're curious to find its feature importance.

So first of all, we're going to fit an instance of LogisticRegression. We'll create one with the best parameters: gs_log_reg.best_params_ - we'll see what that is. Shift and Enter... wonderful. Then we'll go clf = LogisticRegression(), we'll pass it a C value of this long, long, long decimal, and then we'll pass it solver="liblinear". Beautiful. So we'll instantiate that - that's our classifier - and then we'll go clf.fit(X_train, y_train). Wonderful.
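Here's a minimal sketch of those steps, assuming gs_log_reg is the fitted GridSearchCV instance from the earlier hyperparameter tuning videos and X_train/y_train are the training splits we've been using (the exact C value comes from your own grid search results, so yours may differ):

```python
from sklearn.linear_model import LogisticRegression

# Check the best hyperparameters found with GridSearchCV earlier
gs_log_reg.best_params_
# e.g. {'C': 0.20433597178569418, 'solver': 'liblinear'}

# Instantiate a LogisticRegression using those hyperparameters
clf = LogisticRegression(C=0.20433597178569418,  # your grid search value may differ
                         solver="liblinear")

# Fit it on the training data
clf.fit(X_train, y_train)
```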
Now, through our research, we've found that there's an attribute on our fitted logistic regression model called coef_, which stands for coefficient. And the way I remember this for logistic regression is that coefficient kind of reminds me of correlation - but let's see what happens. Remember, we found this through our research: we've looked up how to find the feature importance for a logistic regression model, and it's told us that if we're using scikit-learn, specifically LogisticRegression, once we've fit the model we can access the coef_ attribute, which gives us the coefficients: a value for how much each of the independent variables (let's look at our dataset again - the columns of the X_train dataset) contributes to our target labels.

That was a bit of a mouthful, but what we're going to do is manipulate this coefficient array so that it makes sense, because right now it's just a list of numbers. If you count them up, though, it's actually the same length as the number of columns here. So let's zip them together: we're going to match the coef's of features to columns - that's what will make it make sense. We'll create a dictionary, feature_dict, zipping together df.columns with list(clf.coef_[0]), then view feature_dict. Boom - look at that, how good is that? We've matched all the different values to the right columns.

Let's remind ourselves of what's happening here. All we've done is taken the coef_ array, which is an attribute of our classifier, taken the columns from our DataFrame, and mapped them to each other. So what this is telling us is how much - and in what way (see, we've got some negative values here, so whether it's a negative or a positive correlation) - each of these features contributes to predicting the target variable. That's a bit of a mouthful too, so another way: let's visualize the feature importance. Actually, I shouldn't have told you that - that was our little magic trick, and a magician never reveals his secrets. What we're doing is visualizing this so we can get an idea of what's going on: pd.DataFrame with index=[0], then transpose it (a lot of transposing plots here), plot it as a bar chart with the title "Feature Importance", and legend? No thank you.
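As a sketch of those two cells, assuming df is the full heart disease DataFrame and clf is the fitted model from above:

```python
import pandas as pd

# Match the coef's of features to columns
# (zip stops at the shorter sequence, so the "target" column is left out)
feature_dict = dict(zip(df.columns, list(clf.coef_[0])))
feature_dict

# Visualize feature importance
feature_df = pd.DataFrame(feature_dict, index=[0])
feature_df.T.plot.bar(title="Feature Importance", legend=False);
```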
There we go. So, does this make sense? What is happening here? This, in essence, is how much each feature contributes to predicting the target variable - whether someone has heart disease or not. You can see that some are negative and some are positive.

Now you might be thinking: we have seen something similar to this before, right? Where one feature is negative and one feature is positive. If it's not coming to you right now, that's okay - it took me a while to connect these two. But if we go right back up here, right back up to our correlation matrix - and this is how I remember that for logistic regression we use coef_ to figure out feature importance: correlation matrix, coefficient, kind of the same thing, kind of not really, but you get my picture - we can see some different values. We've got cp = 0.43, thalach = 0.42, and what else do we have... exang = -0.44. Mm hmm. And if we come back down here, we've got the same kind of thing: cp is 0.66, a high number here; thalach is 0.02; and exang is -0.6.

So what we're doing now is model-driven exploratory data analysis. These values have come from building a machine learning model, which has found patterns in the data, and it's telling us how each feature contributes - or how it correlates - to our target variable. That's what we're looking at here.

So what can we do with these values? Well, let's explore them a little and see if they make sense. We can see that sex has a fairly negative coefficient - it's almost right down the bottom. If the value is negative, we saw with the correlation matrix that means a negative correlation: when the value for sex increases, the target value should decrease, because of the negative coefficient. Let's see if this actually reflects the data by comparing the sex column to the target with pd.crosstab.
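A quick sketch, assuming df is still the heart disease DataFrame with its sex and target columns:

```python
# Compare the sex column to the target column
pd.crosstab(df["sex"], df["target"])
```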
So what did we say? As the value for sex increases, the target value should decrease, because of the negative coefficient. At first glance that doesn't really seem to hold - look, sex goes up, but so does the count in the target column. Oh, you know what - after looking at this, you can see it's the ratio we should be thinking about here. As sex goes up, the ratio in the target decreases. You can see that when sex is 0, for female, there's almost a three-to-one ratio: 72 divided by 24 is - look at that - exactly three. And then as sex increases to 1, the target ratio comes down to roughly one-to-one; see, it's close to 50/50 here. Okay, so that's a negative correlation - or a negative coefficient, sorry, because we're using coef_, the coef_ attribute up there.

Now let's have a look at a positive one. cp? Well, maybe slope, since we've already explored cp before. So let's do that with pd.crosstab. I'm not actually sure what slope is, so we might have to revisit our data dictionary - and again, this is model-driven exploratory data analysis: we're trying to figure out what's going on using the results from our model, and seeing whether what our model has learned holds water. It's saying here that as slope increases - because it has a positive coefficient - the target should also increase. So let's have a look with another crosstab:
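Under the same assumptions as before:

```python
# Compare the slope column to the target column
pd.crosstab(df["slope"], df["target"])
```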
So slope takes the values 0, 1 and 2. As it increases from 0 to 1 to 2, does the number of positive samples increase? I think it does: here we've got slightly more, here we've got almost double, and then as slope gets really high we've got basically triple the people with heart disease. So let's go back up to the top and see what slope is in our data dictionary. What we want to do is copy that so we can come right back down to where we were, and see if this makes sense - we've got a positive correlation according to our model, a positive coefficient. That doesn't look very nice, so I'll change the cell to Markdown and put these as dot points:

slope - the slope of the peak exercise ST segment
* 0: Upsloping: better heart rate with exercise (uncommon)
* 1: Flatsloping: minimal change (typical healthy heart)
* 2: Downsloping: signs of an unhealthy heart

So zero is upsloping, better heart rate with exercise, uncommon - yeah, that is pretty uncommon. One is flatsloping, minimal change, typical healthy heart. Okay. And then two is downsloping, signs of an unhealthy heart. Okay, that's making sense, right, this correlation here? You'd want a medical expert, but you know a little bit about how these things go. If someone potentially has an unhealthy heart because their slope value is 2, would you say they're more likely to have heart disease - a target value of 1 - or not to have heart disease? Well, according to our model, which is giving slope a positive coefficient, as the slope value increases the model is more likely to predict a higher value of the target. And what's the higher value of our target? Since we're only predicting zero or one, the higher value is one.

Now, we could keep going with this, but what is the importance of having something like this? First of all, you can find out more: if some of the correlations and feature importances here are confusing, a subject matter expert may be able to shed some light on the situation. This is something you might take to one of your partners, one of your colleagues, and go, "Hey, I'm not sure what's actually going on here with chest pain having a positive coefficient - are you able to help me out?" Number two, you could redirect your efforts: if some of these features offer more value than others, this may change how you collect data for different problems. See here, these ones don't really influence much - age, chol, trestbps all have really low coefficients - and that might influence how you go about collecting data in the future. Maybe finding someone's cholesterol level is really hard to do; if it's not contributing much to the patterns the model is finding, you might scrap it in future data collection. And then the third point - really a continuation of point two, they're kind of the same thing - is less but better: if some features are offering far more value than others, you could reduce the number of features your model tries to find patterns in, as well as improving those valuable ones. You could combine them in some way, improve them, or just make the ones offering the most value better. A minimal sketch of that idea follows below.
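This isn't the method from our research above - just an illustrative sketch of "less but better", assuming the same clf and X_train as before, and a purely arbitrary cutoff of 0.2 on the absolute coefficient:

```python
import numpy as np
import pandas as pd

# Pair each training column with its coefficient
coefs = pd.Series(clf.coef_[0], index=X_train.columns)

# Keep only features whose absolute coefficient clears the (hypothetical) cutoff
top_features = coefs[np.abs(coefs) > 0.2].index
X_train_reduced = X_train[top_features]
X_train_reduced.head()
```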
Reducing features this way would not only save you on computation, by having the model find patterns across fewer features - you could potentially still achieve the same performance using only the features that offer the most value. So that's something to keep in mind.

All right, so we've fulfilled all of the requirements in our project that our boss was asking for. We've got feature importance, we've got cross-validation, classification metrics - yes, we've got a confusion matrix, all sorts of plots here, my goodness - we've got a ROC curve, we've got area under the curve. Beautiful. I wonder if there's anything we're missing. Well, let's figure that out in the next video.