1
00:00:00,560 --> 00:00:01,070
Oh right.

2
00:00:01,060 --> 00:00:02,220
All right.

3
00:00:02,250 --> 00:00:04,470
So we're going to start off on the desktop.

4
00:00:04,470 --> 00:00:05,850
Like we always do.

5
00:00:05,860 --> 00:00:10,830
We're going to come into terminal and I'll zoom in here a little bit.

6
00:00:10,860 --> 00:00:16,980
First thing is to get an environment activated, so conda env list to remind ourselves of where

7
00:00:16,980 --> 00:00:18,180
our environments are.

8
00:00:18,180 --> 00:00:23,670
I've got three here. The one we've been working with is in the sample project folder, so I'm going to go

9
00:00:23,670 --> 00:00:29,490
conda activate and then copy this. I could copy and paste it, but I'm just gonna type it out because

10
00:00:29,880 --> 00:00:31,520
that's what we're in the habit of doing.

11
00:00:31,790 --> 00:00:44,430
So it's Users slash daniel slash Desktop slash ML course, oh, typos galore today, slash sample project. Your

12
00:00:44,670 --> 00:00:51,220
file path here might be different, unless your name's Daniel, in which case, great name.

13
00:00:51,760 --> 00:00:59,010
And then here, once we've got our environment activated, we can see that this has changed from base to

14
00:00:59,100 --> 00:01:00,690
this string here.

15
00:01:00,690 --> 00:01:01,200
Wonderful.

16
00:01:01,200 --> 00:01:04,590
That means our conda environment is activated.

17
00:01:04,590 --> 00:01:05,650
So then we'll go here.

18
00:01:05,700 --> 00:01:07,000
jupyter notebook.

19
00:01:07,020 --> 00:01:14,260
So we can get our notebook server loaded up. Wonderful, this is gonna open our browser.

20
00:01:14,580 --> 00:01:18,480
I haven't changed directory so that's why I'm at the top folder here.

21
00:01:18,510 --> 00:01:19,060
But that's all right.

22
00:01:19,080 --> 00:01:26,070
I can change into Desktop, ML course, sample project, which is where we've been working. We've got our introduction

23
00:01:26,070 --> 00:01:28,600
to Matplotlib, NumPy and pandas notebooks.

24
00:01:28,620 --> 00:01:40,280
Wonderful, but I'm going to create a new one because we're going through the introduction to scikit-learn.

25
00:01:40,310 --> 00:01:41,310
Wonderful.

26
00:01:41,330 --> 00:01:43,820
We'll put a beautiful heading up the top here.

27
00:01:44,030 --> 00:01:51,600
Introduction to Scikit-Learn, brackets, sklearn, to remind ourselves. Go.

28
00:01:51,620 --> 00:02:01,630
This notebook demonstrates some of the most useful functions of the beautiful, which it is, beautiful,

29
00:02:01,800 --> 00:02:06,670
scikit-learn library. Wonderful, we'll hit escape

30
00:02:06,680 --> 00:02:11,680
and M to change that into markdown. We'll run it, and maybe we'll put in here what we're going to cover,

31
00:02:11,690 --> 00:02:18,880
just so we get a little bit of a highlight of what we're going to cover.

32
00:02:19,170 --> 00:02:27,060
So we're gonna start off with zero, because we're going to index Python style. Zero will be an end to

33
00:02:27,060 --> 00:02:27,500
end.

34
00:02:27,780 --> 00:02:30,440
scikit-learn workflow.

35
00:02:30,630 --> 00:02:32,430
Wonderful.

36
00:02:32,430 --> 00:02:35,660
And then we're going to deep dive into each step of this workflow.

37
00:02:35,820 --> 00:02:42,550
So we'll look at getting the data ready, then we'll choose the right estimator.

38
00:02:42,550 --> 00:02:47,540
Now, estimator is another word in scikit-learn, you'll see in a second, for machine learning models.

39
00:02:47,550 --> 00:02:53,000
So when you hear the word model, or machine learning algorithm, or estimator,

40
00:02:53,130 --> 00:02:54,650
that's what sklearn uses for it.

41
00:02:54,660 --> 00:03:02,980
They use the term estimator for machine learning model. We'll put slash algorithm for our problems.

42
00:03:03,090 --> 00:03:13,950
Three, we'll fit the model slash algorithm slash estimator and use it to make predictions on our data.

43
00:03:14,070 --> 00:03:14,910
Wonderful.

44
00:03:15,000 --> 00:03:24,160
And then four, we'll look at evaluating a model, and then five, we're going to improve the model, and then

45
00:03:24,160 --> 00:03:31,540
six, we'll save and load a trained model, and then seven, why don't we put it all together.

46
00:03:33,160 --> 00:03:36,260
Maybe putting it all together sounds a bit better there.

47
00:03:37,090 --> 00:03:38,750
Wonderful.

48
00:03:38,840 --> 00:03:40,370
All right well let's dive into it.

49
00:03:40,400 --> 00:03:45,800
Let's start with zero. We'll create another heading, so zero, an end-to-end.

50
00:03:45,830 --> 00:03:50,300
Now follow along if you can because that's the best but I'll go a little fast to be honest because I'm

51
00:03:50,300 --> 00:03:52,400
kind of just talking through this as I go.

52
00:03:52,430 --> 00:03:57,860
So if you can't keep pace that is absolutely fine you could potentially slow the video down or just

53
00:03:57,860 --> 00:04:03,200
revisit it a few times until you see what's going on but don't worry you'll have access to all this

54
00:04:03,200 --> 00:04:05,060
code here sorry.

55
00:04:05,440 --> 00:04:11,950
An end-to-end scikit-learn workflow, and we've got the steps right here, what we have to do. So let's

56
00:04:11,950 --> 00:04:15,210
do it. Step one, getting the data ready. Let's see what that looks like.

57
00:04:15,220 --> 00:04:20,020
We had a little bit of experience in the past few sections using the heart disease data.

58
00:04:20,050 --> 00:04:25,820
So let's see if we can do that and maybe build a machine learning model on that heart disease data set.

59
00:04:26,230 --> 00:04:33,310
All right, we'll do one, get the data ready. import pandas as pd.

60
00:04:33,310 --> 00:04:36,090
Now remember, this is the end-to-end workflow.

61
00:04:36,190 --> 00:04:38,490
So we're going to kind of breeze through these steps.

62
00:04:38,650 --> 00:04:41,020
So we'll deep dive into each of them as we go.

63
00:04:41,410 --> 00:04:48,970
So we'll go heart_disease equals pd.read_csv. Now I believe I've moved everything into a data

64
00:04:48,970 --> 00:04:55,840
folder, so usually we could just put the CSV name here, but I've changed it up and now all of our CSV

65
00:04:55,840 --> 00:05:02,500
files are within this data folder, just to kind of clean up and keep our sample project folder nice and

66
00:05:02,500 --> 00:05:03,460
neat looking.

67
00:05:03,550 --> 00:05:06,610
So that's why I've got the file path data here.

68
00:05:06,880 --> 00:05:12,210
Heart disease dot CSV, and then we'll go heart_disease.

69
00:05:12,210 --> 00:05:17,930
We'll view it just to make sure. Wonderful, there's all of our rows there.

70
00:05:18,340 --> 00:05:25,600
So what we're going to do with this is use these columns all across here, age, sex, cp, whatever these

71
00:05:25,600 --> 00:05:30,640
are, to try and predict the target, which is 1 or 0, heart disease or not.

72
00:05:30,700 --> 00:05:37,450
So first things first, we need an X, and you'll see this a lot in sklearn. X is kind of the features

73
00:05:37,450 --> 00:05:43,660
matrix which is essentially these columns and then Y will be this column.

74
00:05:43,660 --> 00:05:45,320
So let's see how we do that.

75
00:05:45,400 --> 00:05:54,700
So we create X. Maybe we'll put here, create X, which is called the features matrix. Oftentimes it has other names,

76
00:05:54,700 --> 00:05:59,290
could be data, could be feature variables. heart_disease dot drop.

77
00:05:59,320 --> 00:06:04,740
We want every column except the target column.

78
00:06:04,750 --> 00:06:06,420
So target, and then axis equals one.

79
00:06:06,460 --> 00:06:07,170
Beautiful.

80
00:06:07,180 --> 00:06:14,650
And then we're going to create y, which is the labels, label matrix, or labels.

81
00:06:14,650 --> 00:06:16,390
Let's just call it labels.

82
00:06:16,390 --> 00:06:18,690
Beautiful. y equals,

83
00:06:18,730 --> 00:06:20,020
we want heart_disease.

84
00:06:20,030 --> 00:06:25,620
Now we only want the target column, so target.

85
00:06:25,840 --> 00:06:26,260
Wonderful.

86
00:06:26,290 --> 00:06:30,720
So we've created X and y. So basically X and y, X is just these here,

87
00:06:30,760 --> 00:06:33,240
and y is this column here.

88
00:06:33,370 --> 00:06:34,290
Wonderful.
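
The X and y split described above can be sketched roughly as follows. This is only an illustration: the small random table is a hypothetical stand-in for the heart disease CSV (which would normally be loaded with pd.read_csv), and only the column names age, sex, cp and target come from the video.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the heart disease table; the real data would be
# loaded with pd.read_csv from the data folder mentioned in the video.
rng = np.random.default_rng(42)
heart_disease = pd.DataFrame({
    "age": rng.integers(29, 78, size=100),
    "sex": rng.integers(0, 2, size=100),
    "cp": rng.integers(0, 4, size=100),
    "target": rng.integers(0, 2, size=100),
})

# X: the features matrix, every column except the target column.
X = heart_disease.drop("target", axis=1)

# y: the labels, only the target column.
y = heart_disease["target"]

print(X.shape, y.shape)
```

Passing axis=1 tells drop to remove a column rather than a row, which is why the target column disappears from X but every row is kept.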

89
00:06:34,320 --> 00:06:35,090
Then what's next.

90
00:06:35,090 --> 00:06:41,710
In scikit-learn, well, we go up here: two, choose the right estimator slash algorithm for our problems.

91
00:06:41,800 --> 00:06:46,270
Well our problem is classification because we want to classify whether someone has heart disease or

92
00:06:46,270 --> 00:06:47,070
not.

93
00:06:47,080 --> 00:06:48,050
So let's see that in action.

94
00:06:48,110 --> 00:06:57,740
So two is choose the right model and hyperparameters. Hyperparameters are like dials on a model that you

95
00:06:57,740 --> 00:07:00,110
can tune to make it better or worse.

96
00:07:00,110 --> 00:07:06,800
So let's use a random forest from sklearn. If you're wondering, what is Daniel talking about with the random

97
00:07:06,800 --> 00:07:07,400
forest?

98
00:07:07,400 --> 00:07:10,750
Well, we're going to see this in a little more depth in a future video.

99
00:07:10,780 --> 00:07:12,070
But it's so far away.

100
00:07:12,080 --> 00:07:16,480
Just imagine random forest. from sklearn.ensemble

101
00:07:16,610 --> 00:07:18,470
import RandomForestClassifier.

102
00:07:18,470 --> 00:07:21,820
This is just a classification machine learning model.

103
00:07:21,860 --> 00:07:23,900
That is all you need to know for now.

104
00:07:23,900 --> 00:07:30,470
So it's capable of learning patterns in data and then classifying whether a sample a.k.a. a row is one

105
00:07:30,470 --> 00:07:32,040
thing or another thing.

106
00:07:32,150 --> 00:07:38,300
And so we'll instantiate that class using clf, which is short for classifier in scikit-learn.

107
00:07:38,300 --> 00:07:43,940
Other times you'll see it called model, but we're gonna use clf because we'll stick with the documentation.

108
00:07:44,330 --> 00:07:49,610
So clf equals RandomForestClassifier. Beautiful.

109
00:07:49,670 --> 00:07:56,790
And we're going to keep the default parameters, hyperparameters.

110
00:07:56,810 --> 00:07:57,530
The default

111
00:08:00,110 --> 00:08:06,940
hyperparameters. So clf, you can see what parameters your model's using with clf.get_params.

112
00:08:06,940 --> 00:08:13,740
So we'll see this. It may take a little while to import. In the meantime, we'll create the heading for this

113
00:08:13,740 --> 00:08:23,390
one, so three is fit the model to the data. I'm wondering why that's taking so long.

114
00:08:23,540 --> 00:08:25,090
Maybe it will finish while we're typing.

115
00:08:25,100 --> 00:08:30,050
So first of all, we need to train our model on a training set and test it on a test set.

116
00:08:30,050 --> 00:08:31,540
So we need to create that.

117
00:08:31,880 --> 00:08:34,670
What is this? Oh, my bad.

118
00:08:34,690 --> 00:08:35,360
We need this.

119
00:08:35,370 --> 00:08:36,580
It's not an attribute.

120
00:08:36,880 --> 00:08:38,170
It's a method that we get.

121
00:08:38,410 --> 00:08:42,910
So these are the hyperparameters, the little knobs on our machine learning model, RandomForestClassifier.
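
As a small sketch of the step above: instantiate the classifier and look at its hyperparameters. Note that get_params is a method, so it needs parentheses, which is the slip made in the video. Setting n_estimators=100 explicitly is an assumption here so the result does not depend on the installed scikit-learn version.

```python
from sklearn.ensemble import RandomForestClassifier

# clf is the conventional short name for a classifier in the scikit-learn docs.
clf = RandomForestClassifier(n_estimators=100)

# get_params is a method, not an attribute: call it to get a dict of hyperparameters.
params = clf.get_params()
print(params["n_estimators"])
```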

122
00:08:43,030 --> 00:08:49,810
We'll have a look at these in a future section but we want to also fit the model to the training data

123
00:08:51,620 --> 00:08:54,760
Beautiful, let's do that.

124
00:08:54,800 --> 00:08:57,680
So we want to go from sklearn.model_selection.

125
00:08:57,680 --> 00:09:03,620
We need to split our data into training and test, and we can do that with sklearn's train test split,

126
00:09:04,040 --> 00:09:05,840
train_test_split.

127
00:09:05,870 --> 00:09:06,560
Wonderful.

128
00:09:06,570 --> 00:09:23,660
And we're going to go X_train, X_test, y_train, y_test equals train_test_split, X, y, test_size.

129
00:09:23,670 --> 00:09:28,980
We want test_size to be 0.2, or whatever size you'd like to put in.

130
00:09:29,130 --> 00:09:35,760
Essentially what this is going to do is split our data from X and y into X_train,

131
00:09:35,760 --> 00:09:39,330
so training data, y_train, X_test and y_test,

132
00:09:39,330 --> 00:09:40,640
so testing data.

133
00:09:40,740 --> 00:09:47,370
So if you recall from way back in Section 1 we usually fit the model to training data and then evaluate

134
00:09:47,370 --> 00:09:52,330
it, see what it's learned, on test data, so data it's never seen before. And you'll see this test_size.

135
00:09:52,440 --> 00:09:58,320
Well, that means that 80 percent of the data will be used for training, because it's 0.2, because

136
00:09:58,320 --> 00:10:00,580
test size is going to be 20 percent.

137
00:10:00,630 --> 00:10:05,880
So if we had a thousand rows eight hundred of them would be used for training and 200 of them would

138
00:10:05,880 --> 00:10:08,460
be used for testing but we're getting distracted here.
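
To make the 80/20 arithmetic concrete, here is a minimal sketch using a made-up array of 1000 rows (not the heart disease data). The random_state value is an assumption added so the split is repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 1000 rows, 2 feature columns, alternating 0/1 labels.
X = np.arange(2000).reshape(1000, 2)
y = np.arange(1000) % 2

# test_size=0.2 holds out 20 percent of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))
```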

139
00:10:08,460 --> 00:10:11,740
This is a whirlwind tour of sklearn.

140
00:10:11,760 --> 00:10:13,710
We want to go clf.

141
00:10:13,770 --> 00:10:14,670
Now we want to fit it.

142
00:10:14,700 --> 00:10:22,530
We can do that with clf.fit, X_train, y_train. This is basically saying, hey, classification model, random forest,

143
00:10:22,860 --> 00:10:28,710
find the patterns in the training data. We'll go shift and enter, and beautiful.

144
00:10:28,770 --> 00:10:33,230
This is going to give us a little warning here because there's something that's happening so let's read

145
00:10:33,230 --> 00:10:34,050
a FutureWarning.

146
00:10:34,050 --> 00:10:40,410
The default value of n_estimators, which is a parameter, if we go up here, n_estimators is warn.

147
00:10:40,410 --> 00:10:47,790
That makes sense that it's giving us a warning. Will change from 10 in version 0.20 to 100

148
00:10:47,880 --> 00:10:49,800
in 0.22.

149
00:10:49,830 --> 00:10:50,680
Mm hmm.

150
00:10:50,730 --> 00:10:52,800
Well let's see if we can just update that now.

151
00:10:52,800 --> 00:10:57,650
So n_estimators, we're altering our hyperparameters here, equals 100.

152
00:10:57,690 --> 00:11:01,840
Now we shouldn't get that warning. Beautiful.

153
00:11:01,920 --> 00:11:03,410
We don't.

154
00:11:03,420 --> 00:11:07,960
We can get rid of that output by just putting that little semicolon.

155
00:11:07,970 --> 00:11:08,690
Excellent.

156
00:11:08,690 --> 00:11:10,980
Now our model is fit to the data.

157
00:11:11,030 --> 00:11:12,290
So what can we do.

158
00:11:12,290 --> 00:11:14,390
Well we can make a prediction.

159
00:11:14,460 --> 00:11:16,010
So let's do that.

160
00:11:16,010 --> 00:11:18,180
Make a prediction.

161
00:11:18,210 --> 00:11:19,680
This is still step three by the way

162
00:11:22,450 --> 00:11:25,550
y_label equals clf

163
00:11:25,600 --> 00:11:28,660
dot predict, and we have to pass a NumPy array.

164
00:11:28,670 --> 00:11:35,260
So let's do that. np.array, 0, 2, 3, 4. Let's see what happens.

165
00:11:35,290 --> 00:11:36,600
Ah, we get an error.

166
00:11:36,610 --> 00:11:39,960
np is not defined. We should have done this right at the top.

167
00:11:40,060 --> 00:11:41,580
Let's do that.

168
00:11:41,600 --> 00:11:44,210
import numpy as np.

169
00:11:44,530 --> 00:11:45,750
Beautiful.

170
00:11:45,760 --> 00:11:48,050
Now this should work.

171
00:11:48,110 --> 00:11:49,230
No it doesn't work.

172
00:11:49,230 --> 00:11:54,580
Why doesn't it work? Ah, the input, so the shape is incorrect.

173
00:11:54,580 --> 00:12:00,700
Now the reason for this is that when you train a model on something that looks like X_train, which

174
00:12:00,940 --> 00:12:08,560
X_train is up here, we can only make predictions on arrays that look like this, because that is what

175
00:12:08,560 --> 00:12:10,300
our model has learned.

176
00:12:10,300 --> 00:12:14,860
So does this np.array, 0, 2, 3, 4,

177
00:12:14,860 --> 00:12:16,680
look anything like this?

178
00:12:16,690 --> 00:12:17,820
No not at all.

179
00:12:17,920 --> 00:12:24,300
But what does look like that is X_test. Beautiful.

180
00:12:24,320 --> 00:12:27,550
So what we can do is we can make some predictions.

181
00:12:27,560 --> 00:12:33,530
So y_preds, a conventional name for predictions made on test data, equals clf,

182
00:12:33,540 --> 00:12:38,540
our model, dot predict, X_test. Now what does this look like?

183
00:12:38,540 --> 00:12:39,920
y_preds.

184
00:12:40,090 --> 00:12:43,070
We've got an array of zeros and ones and now what does this look like.

185
00:12:43,070 --> 00:12:44,930
We want y_test.

186
00:12:44,930 --> 00:12:48,140
We also have an array of zeros and ones.
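
The fit and predict steps, including the shape error, can be sketched like this. The data is an assumption: make_classification generates a synthetic 13-feature table in place of the heart disease data, and the random_state values are only there to make the run repeatable.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 samples, 13 features, binary labels.
X, y = make_classification(n_samples=200, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit: find the patterns in the training data.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# A 1-D array does not look like X_train, so predict raises a ValueError.
try:
    clf.predict(np.array([0, 2, 3, 4]))
    shape_error = None
except ValueError as err:
    shape_error = err

# X_test has the same columns as X_train, so this works: one label per row.
y_preds = clf.predict(X_test)
print(y_preds.shape, y_test.shape)
```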

187
00:12:48,330 --> 00:12:50,180
Hmm hmm I wonder what we could do with this.

188
00:12:50,330 --> 00:12:55,620
We'll go step four, evaluate the model.

189
00:12:55,640 --> 00:13:00,560
This is where we evaluate how good the predictions are, or how well the machine learning model we've just

190
00:13:00,560 --> 00:13:08,030
trained, our RandomForestClassifier, has done learning on the training data.

191
00:13:08,030 --> 00:13:16,980
So y_train, let's do it on the training data. Training data and test data is what we're doing here.

192
00:13:17,100 --> 00:13:21,750
So clf.score. Whoa, it's actually done very well.

193
00:13:22,360 --> 00:13:28,270
So this is going to return if we press shift tab the mean accuracy on the given test data and labels.

194
00:13:28,270 --> 00:13:29,880
So we've passed it the training data.

195
00:13:29,890 --> 00:13:33,150
So the model has got 100 percent, 1.0.

196
00:13:33,160 --> 00:13:36,180
That's the maximum you can get with score on our training data.

197
00:13:36,180 --> 00:13:38,970
Let's see how it's done on the test data.

198
00:13:39,100 --> 00:13:42,790
Now if it gets 100 percent on the test data we might think that something's wrong.

199
00:13:42,790 --> 00:13:43,780
Beautiful.

200
00:13:43,780 --> 00:13:44,210
Excellent.

201
00:13:44,210 --> 00:13:45,910
So what has happened here.

202
00:13:45,910 --> 00:13:50,140
Well the model has found patterns in the training data so well that it's got 100 percent.

203
00:13:50,200 --> 00:13:54,260
Why? Because it got trained on the features.

204
00:13:54,490 --> 00:13:56,120
So way back up here.

205
00:13:56,140 --> 00:14:02,710
It got trained on these as well as the labels, so it had a chance to correct itself if it got something

206
00:14:02,710 --> 00:14:09,430
wrong but it performs at 75 percent accuracy which means it gets three out of four predictions correct.

207
00:14:09,460 --> 00:14:15,580
On the test data because it's never seen that data nor has it ever seen the labels.
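
Here is a rough sketch of comparing the score on training data with the score on test data. It assumes synthetic data from make_classification rather than the heart disease table, so the numbers will not match the 100 percent and 75 percent seen in the video.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, split 80/20.
X, y = make_classification(n_samples=200, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# score returns the mean accuracy on the data and labels it is given.
train_score = clf.score(X_train, y_train)  # data the model has already seen
test_score = clf.score(X_test, y_test)     # data the model has never seen

print(f"Train accuracy: {train_score:.2f}")
print(f"Test accuracy: {test_score:.2f}")
```

The test score is the one to trust, since the model saw both the features and the labels of the training rows while fitting.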

208
00:14:15,580 --> 00:14:16,150
All right.

209
00:14:16,390 --> 00:14:20,890
So there are some more metrics that we can use rather than just accuracy.

210
00:14:20,920 --> 00:14:30,190
So let's see that. from sklearn.metrics import classification_report, because we have a classification

211
00:14:30,190 --> 00:14:30,910
problem.

212
00:14:31,060 --> 00:14:36,850
We want confusion_matrix, and we want accuracy_score, just to see another way of getting the same thing

213
00:14:36,850 --> 00:14:37,680
that we have here.

214
00:14:38,200 --> 00:14:46,640
So we'll start off, we'll go print classification_report, and we want to compare. What does this

215
00:14:46,640 --> 00:14:47,130
take?

216
00:14:47,140 --> 00:14:49,650
We can't press shift tab because we haven't imported it yet.

217
00:14:49,700 --> 00:14:54,740
I believe it takes y_test, so the test labels, our true labels, versus our predictions.

218
00:14:54,740 --> 00:15:01,110
So what this is going to return is, spoiler, let's see.

219
00:15:01,260 --> 00:15:07,670
So what this shows us is some classification metrics that compare the test labels to the predictions.

220
00:15:07,680 --> 00:15:10,990
The prediction labels that we made up here with our model.

221
00:15:11,070 --> 00:15:16,650
So using the predict function. And then if we do the same again, we can do our confusion matrix, so

222
00:15:16,650 --> 00:15:23,930
we go confusion_matrix and we want y_test, y_preds. Wonderful.

223
00:15:23,970 --> 00:15:31,750
And then we can do accuracy_score, and we want y_test, y_preds. Beautiful.

224
00:15:31,750 --> 00:15:35,530
These are just some more steps that we can do to evaluate our model.
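
The three metric functions mentioned above can be sketched as follows, again on synthetic data as an assumption. Each one takes the true labels first and the predicted labels second. Passing labels=[0, 1] to confusion_matrix is an addition here so the matrix is always 2 by 2.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, split 80/20.
X, y = make_classification(n_samples=200, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_preds = clf.predict(X_test)

# Compare the true test labels to the predicted labels.
print(classification_report(y_test, y_preds))

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_preds, labels=[0, 1])
print(cm)

# Same number that clf.score(X_test, y_test) gives.
acc = accuracy_score(y_test, y_preds)
print(acc)
```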

225
00:15:35,550 --> 00:15:40,130
And so the next step would be, we're not really happy with how the accuracy score is doing there.

226
00:15:40,130 --> 00:15:48,490
So let's go to step five, which is, come back up here right to the top, improve a model.

227
00:15:48,590 --> 00:15:49,080
OK.

228
00:15:49,400 --> 00:15:49,900
Let's do it.

229
00:15:49,930 --> 00:15:52,330
Let's see how we can improve a model.

230
00:15:52,700 --> 00:16:02,650
We might go, try a different amount of n_estimators, which is one of the hyperparameters, a.k.a. dials

231
00:16:02,650 --> 00:16:05,800
on our machine learning model that we can turn to try and improve it.

232
00:16:06,310 --> 00:16:07,490
So let's see this in action.

233
00:16:07,530 --> 00:16:10,120
np, we'll start off with a random seed

234
00:16:10,120 --> 00:16:14,630
so the results are replicable. np.random.seed.

235
00:16:14,630 --> 00:16:20,310
And I want for i in range, 10, 100,

236
00:16:20,320 --> 00:16:23,730
step can be 10 here. And we want print,

237
00:16:23,830 --> 00:16:31,570
we'll give a little communication: trying model with i estimators. Wonderful.

238
00:16:31,690 --> 00:16:34,550
Oh, I've hit shift enter too quickly.

239
00:16:34,580 --> 00:16:36,480
It's like I'm trigger happy. And again.

240
00:16:36,530 --> 00:16:37,870
My goodness.

241
00:16:37,890 --> 00:16:41,540
OK, now then we want to go clf

242
00:16:41,570 --> 00:16:47,210
equals, we'll instantiate another model, RandomForest, press tab, Classifier. Beautiful.

243
00:16:47,210 --> 00:16:55,250
Thank you, autocomplete. Now n_estimators equals i. Wonderful, so this is going to loop through and

244
00:16:55,250 --> 00:17:00,980
try 10 and then 20 and then all the way up to 100 estimators, because we're trying to figure out if we can improve

245
00:17:00,980 --> 00:17:07,120
our model by adjusting one of the hyperparameters, because that's step five, improve a model. So we'll go

246
00:17:07,530 --> 00:17:19,620
print, f, model accuracy on test set, and we'll go model.score, X_test, y_test, because we want to evaluate our

247
00:17:19,620 --> 00:17:28,200
model on the test data. It's just gonna learn on the training data and then evaluate itself on the test

248
00:17:28,200 --> 00:17:34,260
data and then we'll put a little bit of a percentage sign here maybe limit it to two decimal places.

249
00:17:34,260 --> 00:17:36,180
Does that make sense.

250
00:17:36,180 --> 00:17:39,640
Let me read this back: model accuracy on test set, model.score.

251
00:17:39,650 --> 00:17:46,890
Yep, we're just adding a little bit of communication here, and then maybe we'll print a space here so

252
00:17:46,890 --> 00:17:47,740
we can kind of see.

253
00:17:47,760 --> 00:17:49,080
Let's see this in action.

254
00:17:49,570 --> 00:17:51,900
Trying model with 10 estimators. Why didn't this work?

255
00:17:51,910 --> 00:17:53,290
Oh, model is not defined.

256
00:17:53,330 --> 00:18:00,640
Whoops, see, I told you model can be used. We need clf. clf, remember, is short for classifier. And let's

257
00:18:00,640 --> 00:18:00,910
do it.

258
00:18:02,540 --> 00:18:03,980
How come it's not working again.

259
00:18:06,650 --> 00:18:11,490
RandomForestClassifier is not fitted yet. Call fit.

260
00:18:12,410 --> 00:18:14,660
Yes, we've instantiated it, but we haven't...

261
00:18:14,660 --> 00:18:19,740
How would you score a model we've never fitted on our data yet?

262
00:18:20,110 --> 00:18:24,800
See, this is great. Live coding, full of errors.

263
00:18:25,040 --> 00:18:26,030
Wonderful.

264
00:18:26,030 --> 00:18:30,560
Now if we look back through this, you would see that the best model with the best accuracy, if we

265
00:18:30,560 --> 00:18:34,310
go back, is 20 estimators.

266
00:18:34,430 --> 00:18:35,270
Well there we go.

267
00:18:35,300 --> 00:18:42,380
We've gone from, what was our original one, 75 percent, to 83 percent by adjusting one of the

268
00:18:42,380 --> 00:18:47,460
hyperparameters of our model to 20 instead of the default, which is 10 currently.

269
00:18:47,540 --> 00:18:51,010
But it's gonna be updated to 100. And we're going really fast here.
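
The tuning loop from this step can be sketched like this, with synthetic data standing in for the heart disease table (an assumption), so the accuracies will differ from the video. Note that range(10, 100, 10) stops at 90, and that each new model has to be fitted before it can be scored.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, split 80/20.
X, y = make_classification(n_samples=200, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Seed NumPy's global random state so the results are replicable.
np.random.seed(42)

results = {}
for i in range(10, 100, 10):  # 10, 20, ..., 90
    print(f"Trying model with {i} estimators...")
    clf = RandomForestClassifier(n_estimators=i)
    clf.fit(X_train, y_train)  # fit first, otherwise score raises NotFittedError
    score = clf.score(X_test, y_test)
    results[i] = score
    print(f"Model accuracy on test set: {score * 100:.2f}%")
    print("")

# The n_estimators value with the highest test accuracy.
best_n = max(results, key=results.get)
print(best_n)
```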

270
00:18:51,020 --> 00:18:52,340
What's our final step?

271
00:18:52,340 --> 00:18:57,450
We go back up here right to the top save a model and load it.

272
00:18:57,500 --> 00:18:58,010
Okay.

273
00:18:58,130 --> 00:18:59,420
Let's do it.

274
00:18:59,420 --> 00:19:03,670
So we want to go six save a model and load it.

275
00:19:03,680 --> 00:19:09,320
Now we can save the model with Python's pickle library, so we're going to import pickle, and then to save

276
00:19:09,320 --> 00:19:17,180
the model we'll go pickle.dump, and we'll pass it our model, a.k.a. clf for classifier, and we'll also

277
00:19:17,300 --> 00:19:19,680
give it open with the file name that we want.

278
00:19:19,700 --> 00:19:27,050
If I can remember these: random forest model 1, fancy name, haha, dot pkl, and then we want to write binary

279
00:19:27,140 --> 00:19:29,820
with wb. Wonderful.

280
00:19:29,990 --> 00:19:32,690
So this is going to save, what?

281
00:19:32,840 --> 00:19:35,260
See, I'm not even following my own motto here.

282
00:19:35,290 --> 00:19:38,650
If in doubt, run the code. Now this should have saved a model to file.

283
00:19:38,870 --> 00:19:45,960
Let's go up and have a look if we refresh this beautiful random forest model there.

284
00:19:45,960 --> 00:19:48,800
Now what happens if we tried to import that.

285
00:19:48,840 --> 00:20:00,940
So we want to go loaded_model equals pickle.load, and we want to go open, pass the file name, random...

286
00:20:01,350 --> 00:20:02,370
Can we do that.

287
00:20:02,370 --> 00:20:03,570
Yes we can.

288
00:20:03,570 --> 00:20:04,910
Tab auto complete.

289
00:20:04,920 --> 00:20:13,200
And now this time we need rb after that, read binary, and then we'll go loaded_model.score,

290
00:20:13,860 --> 00:20:20,910
X_test, y_test, see what happens. 73.

291
00:20:21,010 --> 00:20:25,510
Now that should line up, 73.7, with the last model that we tried up here, and it does.

292
00:20:26,110 --> 00:20:27,040
Wonderful.
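
Saving and reloading with pickle can be sketched as follows. The file name random_forest_model_1.pkl and the temporary directory are assumptions for illustration, and the model is trained on synthetic data. The with blocks are a small change from the video so the files get closed.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data and a fitted model to save.
X, y = make_classification(n_samples=200, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = RandomForestClassifier(n_estimators=90, random_state=42)
clf.fit(X_train, y_train)
original_score = clf.score(X_test, y_test)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, written to a temporary folder.
    model_path = os.path.join(tmp_dir, "random_forest_model_1.pkl")

    # "wb" means write binary.
    with open(model_path, "wb") as f:
        pickle.dump(clf, f)

    # "rb" means read binary.
    with open(model_path, "rb") as f:
        loaded_model = pickle.load(f)

# The reloaded model should score the same as the one that was saved.
loaded_score = loaded_model.score(X_test, y_test)
print(original_score, loaded_score)
```

Only unpickle files you trust, since loading a pickle can run arbitrary code.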

293
00:20:27,070 --> 00:20:32,680
Now we've just zoomed through that. It took about 20 minutes or so to do an entire scikit-learn workflow

294
00:20:33,040 --> 00:20:34,430
with a lot of talking.

295
00:20:34,450 --> 00:20:36,810
So we started off with one getting the data ready.

296
00:20:36,820 --> 00:20:39,060
We saw our data frame of heart disease.

297
00:20:39,190 --> 00:20:43,880
Then we split it into X and y, which is often what you'll do in supervised learning problems in

298
00:20:43,890 --> 00:20:44,600
scikit-learn.

299
00:20:44,650 --> 00:20:47,920
We're going to use the features here to predict the target.

300
00:20:47,920 --> 00:20:48,920
Wonderful.

301
00:20:49,000 --> 00:20:50,520
And then we chose the right model.

302
00:20:50,540 --> 00:20:53,560
We kind of skimmed over a bit of how you'd actually do that.

303
00:20:53,590 --> 00:20:59,050
But from experience I know random forest classifier is pretty good at doing classification problems.

304
00:20:59,110 --> 00:21:00,880
So we'll see how to do that.

305
00:21:00,880 --> 00:21:05,050
And then we decided to use the default hyperparameters, which are those little knobs you can turn on

306
00:21:05,050 --> 00:21:05,830
a model.

307
00:21:06,010 --> 00:21:11,860
And we saw how to fit the model to the training data so a.k.a. a model finding patterns in the training

308
00:21:11,860 --> 00:21:12,670
data.

309
00:21:12,670 --> 00:21:18,850
And first of all, to even get our data from a big set into training and test, we used this little helper

310
00:21:18,850 --> 00:21:20,620
function from scikit-learn.

311
00:21:20,620 --> 00:21:21,560
Thank you, scikit-learn.

312
00:21:22,210 --> 00:21:24,510
And then we made a prediction or at least we tried to.

313
00:21:24,520 --> 00:21:30,550
And then we realized hold on our model can't make predictions on things that aren't the same shape because

314
00:21:30,550 --> 00:21:33,110
remember scikit-learn is built on NumPy arrays.

315
00:21:33,160 --> 00:21:39,490
So if we pass that array this bad boy here that isn't the same shape as the data that it's trained on

316
00:21:39,840 --> 00:21:41,180
well, it's going to throw us an error.

317
00:21:41,650 --> 00:21:49,090
So we fixed that error by asking our model to predict on X_test, and in turn it predicted a label, y_preds,

318
00:21:49,300 --> 00:21:55,600
for each of our samples. Then we compared our predictions to y_test through evaluation.

319
00:21:55,820 --> 00:22:01,990
Our model got 100 percent on the training data but only got 75 percent accuracy on the test data which

320
00:22:01,990 --> 00:22:03,460
actually is still pretty good.

321
00:22:03,490 --> 00:22:09,940
A coin toss for 0 or 1 would be 50 percent, so our model's halfway to being the perfect model.

322
00:22:09,940 --> 00:22:14,920
And then we go here we tried a few more classification metrics so we got a little bit more of an insight

323
00:22:14,950 --> 00:22:16,740
rather than just accuracy.

324
00:22:16,740 --> 00:22:22,540
We did a confusion matrix and used the accuracy score function. And then step five.

325
00:22:22,540 --> 00:22:28,150
We tried a bunch of different hyperparameters, a.k.a. different n_estimators, and we found out that maybe

326
00:22:28,150 --> 00:22:32,800
20 estimators is the best for our model because it got the highest accuracy.

327
00:22:32,800 --> 00:22:39,520
And then finally we saved a model and reloaded it, and it got the same score as our most recently trained

328
00:22:39,520 --> 00:22:43,280
model, which was 90 estimators, 73.77.

329
00:22:43,480 --> 00:22:46,270
And now this is a whole lot to take in.

330
00:22:46,360 --> 00:22:51,880
I completely understand that we've gone through it relatively well actually quite fast but what we're

331
00:22:51,880 --> 00:22:55,770
going to do is break down each of these steps throughout the next few videos.

332
00:22:56,050 --> 00:23:01,690
So if you want to go back through try and put all the code that we've used into one cell if you can

333
00:23:01,840 --> 00:23:04,500
see if you can get a model trained. Can you get it saved?

334
00:23:04,510 --> 00:23:09,310
Can you then reimport it? Have a little practice, but otherwise I'll see you in the next video and we're

335
00:23:09,310 --> 00:23:13,870
going to check out how to get our data ready for using it with machine learning models.