All right, all right, all right. To finish off this proof of concept we're working through, of predicting whether or not someone has heart disease, we're going to fulfill the last requirement our boss asked of us, and that is feature importance. So let's make a little heading here: Feature Importance. Beautiful.

Now, what is feature importance? Let's put a little note to ourselves: feature importance is another way of asking, "Which features contributed most to the outcomes of the model?" You could also extend it to, "And how did they contribute?"

This is useful to know, since we're trying to predict heart disease using a patient's medical characteristics. Which characteristics? For a refresher (we've seen this a few times now), we're using this data here. So which of these characteristics - age, sex, cp, and all these other ones like chol, thalach, exang - contributes most to predicting the target, and how, or how much, does each one contribute? If you want a breakdown of what they are, we can revisit our data dictionary up the top.

So how would you find this out? Well, finding feature importance is different for each machine learning model, much like we saw before with tuning the hyperparameters of a certain model. Since we're using logistic regression to calculate all of these metrics - logistic regression being the model we found through grid search, by tuning the hyperparameters, to get the best results so far - what you might look up is something like: "how to find feature importance using logistic regression".

If we go here, I'd look into "feature importance using logistic regression", "model-based feature importance", "how to find the importance of features of a logistic regression model". You could substitute almost any model into that search - random forest, k-nearest neighbours classifiers, something like that - and you'd find a bunch of different methods.

So what we're going to do, again, is pretend that we've done our research. Our boss has gone, "Can you get the feature importance of that model?", and we've gone, "Hold on, I'm not entirely sure what feature importance is."
So we've gone away overnight, or for a couple of hours, done our research, and figured it out: okay, this is how we find feature importance. Beautiful. That's what we're going to do here. We've done this research - and remember, part of being a machine learning engineer, part of being a data scientist, the most important thing is researching and experimenting. It's built into the framework we're using: experiments. That's really what we're doing here. We're in the modelling phase - we've created the heading for it - but really, we're experimenting.

So let's figure it out. Let's find the feature importance for our logistic regression model. I'll put a little note here: one way to find feature importance is to search for "(MODEL NAME) feature importance" - I'll put the model name in brackets. That's one way to do it; that's what we just saw in that little search above, and it's what you should try, whichever model you're using, if you're curious to find its feature importance.

So first of all, we're going to fit an instance of LogisticRegression. We'll create one with the best parameters: gs_log_reg.best_params_ - we'll see what that is. Shift and Enter... wonderful. Then we'll go clf = LogisticRegression(), we'll pass it a C value of this long, long, long decimal, and then we'll pass it solver="liblinear". Beautiful. So we'll instantiate that - that's our classifier - and then we'll go clf.fit(X_train, y_train). Wonderful.
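Here's a minimal sketch of those steps, assuming gs_log_reg is the fitted GridSearchCV instance from the earlier hyperparameter tuning videos and X_train/y_train are the training splits we've been using (the exact C value comes from your own grid search results, so yours may differ):

```python
from sklearn.linear_model import LogisticRegression

# Check the best hyperparameters found with GridSearchCV earlier
gs_log_reg.best_params_
# e.g. {'C': 0.20433597178569418, 'solver': 'liblinear'}

# Instantiate a LogisticRegression using those hyperparameters
clf = LogisticRegression(C=0.20433597178569418,  # your grid search value may differ
                         solver="liblinear")

# Fit it on the training data
clf.fit(X_train, y_train)
```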
Now, through our research, we've found that there's an attribute on our fitted logistic regression model called coef_, which stands for coefficient. And the way I remember this for logistic regression is that coefficient kind of reminds me of correlation - but let's see what happens. Remember, we found this through our research: we've looked up how to find the feature importance for a logistic regression model, and it's told us that if we're using scikit-learn, specifically LogisticRegression, once we've fit the model we can access the coef_ attribute, which gives us the coefficients: a value for how much each of the independent variables (let's look at our dataset again - the columns of the X_train dataset) contributes to our target labels.

That was a bit of a mouthful, but what we're going to do is manipulate this coefficient array so that it makes sense, because right now it's just a list of numbers. If you count them up, though, it's actually the same length as the number of columns here. So let's zip them together: we're going to match the coef's of features to columns - that's what will make it make sense. We'll create a dictionary, feature_dict, zipping together df.columns with list(clf.coef_[0]), then view feature_dict. Boom - look at that, how good is that? We've matched all the different values to the right columns.

Let's remind ourselves of what's happening here. All we've done is taken the coef_ array, which is an attribute of our classifier, taken the columns from our DataFrame, and mapped them to each other. So what this is telling us is how much - and in what way (see, we've got some negative values here, so whether it's a negative or a positive correlation) - each of these features contributes to predicting the target variable. That's a bit of a mouthful too, so another way: let's visualize the feature importance. Actually, I shouldn't have told you that - that was our little magic trick, and a magician never reveals his secrets. What we're doing is visualizing this so we can get an idea of what's going on: pd.DataFrame with index=[0], then transpose it (a lot of transposing plots here), plot it as a bar chart with the title "Feature Importance", and legend? No thank you.
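As a sketch of those two cells, assuming df is the full heart disease DataFrame and clf is the fitted model from above:

```python
import pandas as pd

# Match the coef's of features to columns
# (zip stops at the shorter sequence, so the "target" column is left out)
feature_dict = dict(zip(df.columns, list(clf.coef_[0])))
feature_dict

# Visualize feature importance
feature_df = pd.DataFrame(feature_dict, index=[0])
feature_df.T.plot.bar(title="Feature Importance", legend=False);
```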
There we go. So, does this make sense? What is happening here? This, in essence, is how much each feature contributes to predicting the target variable - whether someone has heart disease or not. You can see that some are negative and some are positive.

Now you might be thinking: we have seen something similar to this before, right? Where one feature is negative and one feature is positive. If it's not coming to you right now, that's okay - it took me a while to connect these two. But if we go right back up here, right back up to our correlation matrix - and this is how I remember that for logistic regression we use coef_ to figure out feature importance: correlation matrix, coefficient, kind of the same thing, kind of not really, but you get my picture - we can see some different values. We've got cp = 0.43, thalach = 0.42, and what else do we have... exang = -0.44. Mm hmm. And if we come back down here, we've got the same kind of thing: cp is 0.66, a high number here; thalach is 0.02; and exang is -0.6.

So what we're doing now is model-driven exploratory data analysis. These values have come from building a machine learning model, which has found patterns in the data, and it's telling us how each feature contributes - or how it correlates - to our target variable. That's what we're looking at here.

So what can we do with these values? Well, let's explore them a little and see if they make sense. We can see that sex has a fairly negative coefficient - it's almost right down the bottom. If the value is negative, we saw with the correlation matrix that means a negative correlation: when the value for sex increases, the target value should decrease, because of the negative coefficient. Let's see if this actually reflects the data by comparing the sex column to the target with pd.crosstab.
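A quick sketch, assuming df is still the heart disease DataFrame with its sex and target columns:

```python
# Compare the sex column to the target column
pd.crosstab(df["sex"], df["target"])
```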
So what did we say? As the value for sex increases, the target value should decrease, because of the negative coefficient. At first glance that doesn't really seem to hold - look, sex goes up, but so does the count in the target column. Oh, you know what - after looking at this, you can see it's the ratio we should be thinking about here. As sex goes up, the ratio in the target decreases. You can see that when sex is 0, for female, there's almost a three-to-one ratio: 72 divided by 24 is - look at that - exactly three. And then as sex increases to 1, the target ratio comes down to roughly one-to-one; see, it's close to 50/50 here. Okay, so that's a negative correlation - or a negative coefficient, sorry, because we're using coef_, the coef_ attribute up there.

Now let's have a look at a positive one. cp? Well, maybe slope, since we've already explored cp before. So let's do that with pd.crosstab. I'm not actually sure what slope is, so we might have to revisit our data dictionary - and again, this is model-driven exploratory data analysis: we're trying to figure out what's going on using the results from our model, and seeing whether what our model has learned holds water. It's saying here that as slope increases - because it has a positive coefficient - the target should also increase. So let's have a look with another crosstab:
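Under the same assumptions as before:

```python
# Compare the slope column to the target column
pd.crosstab(df["slope"], df["target"])
```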
So slope takes the values 0, 1 and 2. As it increases from 0 to 1 to 2, does the number of positive samples increase? I think it does: here we've got slightly more, here we've got almost double, and then as slope gets really high we've got basically triple the people with heart disease. So let's go back up to the top and see what slope is in our data dictionary. What we want to do is copy that so we can come right back down to where we were, and see if this makes sense - we've got a positive correlation according to our model, a positive coefficient. That doesn't look very nice, so I'll change the cell to Markdown and put these as dot points:

slope - the slope of the peak exercise ST segment
* 0: Upsloping: better heart rate with exercise (uncommon)
* 1: Flatsloping: minimal change (typical healthy heart)
* 2: Downsloping: signs of an unhealthy heart

So zero is upsloping, better heart rate with exercise, uncommon - yeah, that is pretty uncommon. One is flatsloping, minimal change, typical healthy heart. Okay. And then two is downsloping, signs of an unhealthy heart. Okay, that's making sense, right, this correlation here? You'd want a medical expert, but you know a little bit about how these things go. If someone potentially has an unhealthy heart because their slope value is 2, would you say they're more likely to have heart disease - a target value of 1 - or not to have heart disease? Well, according to our model, which is giving slope a positive coefficient, as the slope value increases the model is more likely to predict a higher value of the target. And what's the higher value of our target? Since we're only predicting zero or one, the higher value is one.

Now, we could keep going with this, but what is the importance of having something like this? First of all, you can find out more: if some of the correlations and feature importances here are confusing, a subject matter expert may be able to shed some light on the situation. This is something you might take to one of your partners, one of your colleagues, and go, "Hey, I'm not sure what's actually going on here with chest pain having a positive coefficient - are you able to help me out?" Number two, you could redirect your efforts: if some of these features offer more value than others, this may change how you collect data for different problems. See here, these ones don't really influence much - age, chol, trestbps all have really low coefficients - and that might influence how you go about collecting data in the future. Maybe finding someone's cholesterol level is really hard to do; if it's not contributing much to the patterns the model is finding, you might scrap it in future data collection. And then the third point - really a continuation of point two, they're kind of the same thing - is less but better: if some features are offering far more value than others, you could reduce the number of features your model tries to find patterns in, as well as improving those valuable ones. You could combine them in some way, improve them, or just make the ones offering the most value better. A minimal sketch of that idea follows below.
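This isn't the method from our research above - just an illustrative sketch of "less but better", assuming the same clf and X_train as before, and a purely arbitrary cutoff of 0.2 on the absolute coefficient:

```python
import numpy as np
import pandas as pd

# Pair each training column with its coefficient
coefs = pd.Series(clf.coef_[0], index=X_train.columns)

# Keep only features whose absolute coefficient clears the (hypothetical) cutoff
top_features = coefs[np.abs(coefs) > 0.2].index
X_train_reduced = X_train[top_features]
X_train_reduced.head()
```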
Reducing features this way would not only save you on computation, by having the model find patterns across fewer features - you could potentially still achieve the same performance using only the features that offer the most value. So that's something to keep in mind.

All right, so we've fulfilled all of the requirements in our project that our boss was asking for. We've got feature importance, we've got cross-validation, classification metrics - yes, we've got a confusion matrix, all sorts of plots here, my goodness - we've got a ROC curve, we've got area under the curve. Beautiful. I wonder if there's anything we're missing. Well, let's figure that out in the next video.