1 00:00:00,780 --> 00:00:07,960 In the last lecture, we saw how to create, how to compile and how to train our classification model 2 00:00:08,220 --> 00:00:09,780 in TensorFlow Keras. 3 00:00:11,590 --> 00:00:17,170 In this video, we will be creating and training a regression model using Keras. 4 00:00:18,860 --> 00:00:26,510 For this, we will be using the very popular regression dataset, that is, the California housing dataset. 5 00:00:28,560 --> 00:00:33,300 This dataset is available in the scikit-learn datasets library. 6 00:00:35,140 --> 00:00:42,460 The objective here is to predict the prices of homes using eight different independent variables. 7 00:00:43,540 --> 00:00:44,980 So let's get started. 8 00:00:45,790 --> 00:00:52,080 First, we are importing some basic libraries, such as pandas and matplotlib. 9 00:00:54,920 --> 00:00:57,770 Then we are importing TensorFlow and Keras. 10 00:01:00,250 --> 00:01:08,050 And then, since this data is available in the scikit-learn datasets, we are also importing fetch_california_housing 11 00:01:08,230 --> 00:01:09,570 from the sklearn package. 12 00:01:10,930 --> 00:01:15,760 And we are saving this dataset into another variable called housing. 13 00:01:18,090 --> 00:01:23,670 I also want to share one small shortcut to access the help of any function. 14 00:01:23,910 --> 00:01:32,310 So if you just click in between the parentheses of any function and then hold the Shift key and press Tab. 15 00:01:32,670 --> 00:01:39,900 So if you hit Shift plus Tab, it will open the help or documentation of that function. 16 00:01:41,260 --> 00:01:46,530 If you hit Shift plus Tab two times, it will expand the documentation. 17 00:01:48,050 --> 00:01:50,990 You can see here this function will return 18 00:01:51,080 --> 00:01:54,380 different attributes, such as dot data, 19 00:01:54,780 --> 00:01:57,260 which will give us the independent variables. 20 00:01:57,490 --> 00:02:00,600 Dot target will give us the dependent variable.
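The imports and dataset loading narrated above can be sketched as follows. This is a minimal reconstruction of the notebook cell; the variable name `housing` comes from the narration, while the exact import aliases are assumptions:

```python
# Basic libraries the lecture imports: pandas and matplotlib,
# then TensorFlow/Keras, and the dataset loader from scikit-learn.
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from sklearn.datasets import fetch_california_housing

# Save the dataset into a variable called housing
housing = fetch_california_housing()

print(housing.data.shape)    # .data   -> independent variables, (20640, 8)
print(housing.target.shape)  # .target -> dependent variable (house value in $100,000s)
```

As the narration notes, `housing.data`, `housing.target`, and `housing.feature_names` are the attributes returned by this loader.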
21 00:02:01,760 --> 00:02:05,270 And the feature_names attribute will give us the details of the features. 22 00:02:07,130 --> 00:02:08,810 Let's just close this. 23 00:02:09,590 --> 00:02:16,850 So if you have any doubt regarding any function, just click between the parentheses and hit Shift plus Tab. 24 00:02:19,410 --> 00:02:24,330 So I have already listed here some of the information about this dataset. 25 00:02:25,920 --> 00:02:29,840 So in this dataset, there are around 20,000 records. 26 00:02:30,780 --> 00:02:33,240 There are eight independent variables. 27 00:02:34,230 --> 00:02:37,930 We have the first variable as MedInc. 28 00:02:38,730 --> 00:02:41,910 This is the median income in that particular block 29 00:02:41,970 --> 00:02:43,090 where the house is located. 30 00:02:44,100 --> 00:02:46,940 Then we have a second variable, that is, HouseAge. 31 00:02:47,430 --> 00:02:50,550 This is the median house age in that block. 32 00:02:51,570 --> 00:02:55,560 Then we have AveRooms, which is the average number of rooms. 33 00:02:56,010 --> 00:03:01,050 And then we have AveBedrms for the average number of bedrooms. 34 00:03:01,710 --> 00:03:03,660 Next, we have the Population variable. 35 00:03:04,080 --> 00:03:05,700 That is the block population. 36 00:03:07,680 --> 00:03:10,260 Then we have AveOccup. 37 00:03:10,470 --> 00:03:12,650 That is the average house occupancy. 38 00:03:13,260 --> 00:03:21,090 And then the latitude and longitude of that house's block. Using these eight independent variables, 39 00:03:21,510 --> 00:03:24,540 we want to predict the value of the house. 40 00:03:25,020 --> 00:03:27,810 The values are in hundreds of thousands. 41 00:03:28,080 --> 00:03:30,660 So suppose for one record the y value is five. 42 00:03:30,720 --> 00:03:32,850 That means the value of that house is 43 00:03:33,360 --> 00:03:34,980 five hundred thousand dollars. 44 00:03:37,370 --> 00:03:38,960 So this is our dataset.
45 00:03:39,440 --> 00:03:44,150 We have these eight independent variables and one target variable, the price. 46 00:03:45,650 --> 00:03:50,960 If you want some more detail about this dataset, you can click on this documentation link. 47 00:03:52,700 --> 00:03:57,550 This will open the official scikit-learn documentation of this dataset. 48 00:03:58,090 --> 00:04:01,220 Here, you will get to know about all the parameters that you can give 49 00:04:02,040 --> 00:04:05,360 and what is included in this dataset. 50 00:04:06,620 --> 00:04:07,910 Let's go back. 51 00:04:10,990 --> 00:04:15,160 So this housing object is in the form of a dictionary. 52 00:04:17,650 --> 00:04:20,950 We have one key-value pair as feature_names. 53 00:04:21,860 --> 00:04:25,100 So let's just look at the feature names first. 54 00:04:26,680 --> 00:04:31,120 You can see these are the eight variable names that we have discussed already. 55 00:04:33,720 --> 00:04:41,950 Now, to access the independent data we have to use housing dot data, and to access the dependent dataset, 56 00:04:42,330 --> 00:04:44,540 we have to use housing dot target. 57 00:04:47,430 --> 00:04:56,190 So in this line of code, we are splitting our data first into a train full and a test dataset. 58 00:04:57,450 --> 00:05:04,890 Then we are further dividing this train full dataset into X train and X validation datasets. 59 00:05:08,200 --> 00:05:16,240 We will be importing train_test_split from sklearn model_selection, and then we will use this train 60 00:05:16,330 --> 00:05:19,240 test split method to divide our data. 61 00:05:20,860 --> 00:05:24,760 We are not giving any additional parameter for test size. 62 00:05:25,330 --> 00:05:31,080 That's because, by default, the test size is 25 percent of the total data. 63 00:05:32,200 --> 00:05:40,120 So 25 percent of the total data, that is around 20,000 records, will go into the test set. 64 00:05:40,630 --> 00:05:45,780 And then the remaining 75 percent will go into the training set.
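The two splits described above can be sketched like this. A minimal reconstruction of the cell, assuming the variable names implied by the narration (`X_train_full`, `X_train`, `X_valid`, and so on):

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

housing = fetch_california_housing()

# First split: full training set vs. test set.
# No test_size given, so the default of 25% of the data goes to the test set.
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target)

# Second split: carve a validation set (again 25% by default)
# out of the full training set.
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full)

print(X_train.shape, X_valid.shape, X_test.shape)
```

With 20,640 total records this gives roughly 11,600 training, 3,900 validation, and 5,200 test rows, matching the shapes quoted later in the lecture.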
65 00:05:47,160 --> 00:05:49,320 Out of that 75 percent, 66 00:05:49,920 --> 00:05:56,890 again, 25 percent will go into the validation set and the rest of the 75 percent will go into the training set. 67 00:05:58,270 --> 00:05:59,800 Let's run this. 68 00:06:02,700 --> 00:06:11,430 The next step is to preprocess our data, and we will be using the StandardScaler from sklearn to standardize 69 00:06:11,430 --> 00:06:11,930 our data. 70 00:06:14,180 --> 00:06:20,490 In standardizing, we subtract the mean of each variable from its individual values, 71 00:06:21,020 --> 00:06:30,080 and then we also divide it by the standard deviation, because at the end we want all the variables with mean as 72 00:06:30,080 --> 00:06:33,080 zero and their variance as one. 73 00:06:34,340 --> 00:06:42,230 This is a standard procedure for creating any machine learning model. The steps here are very simple. 74 00:06:42,530 --> 00:06:47,270 First, we are importing StandardScaler from sklearn preprocessing. 75 00:06:48,320 --> 00:06:53,900 Then we are creating the scaler object using the StandardScaler method. 76 00:06:55,150 --> 00:06:56,780 Then we are training 77 00:06:56,780 --> 00:07:00,140 this scaler object using our X train data. 78 00:07:01,920 --> 00:07:09,230 So on our X train data, this scaler will find the values to subtract, as the mean, and to divide 79 00:07:09,230 --> 00:07:10,020 by, as the scale. 80 00:07:10,620 --> 00:07:12,480 Then we will use those values, 81 00:07:12,660 --> 00:07:18,120 or this scaler object, to standardize our validation and test sets as well. 82 00:07:19,550 --> 00:07:22,200 Just to repeat, we are fitting 83 00:07:22,260 --> 00:07:24,780 this scaler object on our training data, 84 00:07:25,320 --> 00:07:29,070 and we are transforming our validation and test sets using 85 00:07:29,070 --> 00:07:33,150 this scaler object that we have fitted on our X train data. 86 00:07:35,840 --> 00:07:40,190 First, we will be using the fit underscore transform method.
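The standardization steps just described can be sketched as follows: fit the scaler on the training data only, then reuse the learned mean and scale on the validation and test sets. The split is repeated here so the snippet is self-contained:

```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on X_train, then transform it
X_valid = scaler.transform(X_valid)      # transform only, using X_train's statistics
X_test = scaler.transform(X_test)        # same for the test set

print(X_train.mean(axis=0).round(6))  # roughly 0 for every column
print(X_train.std(axis=0).round(6))   # roughly 1 for every column
```

Fitting only on the training data and reusing that scaler everywhere else avoids leaking validation/test statistics into preprocessing.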
87 00:07:40,440 --> 00:07:44,700 And we will be using X train as the parameter. To transform, 88 00:07:44,760 --> 00:07:51,990 we will be using the dot transform method of this scaler object and we will be using the relevant datasets 89 00:07:51,990 --> 00:07:52,220 here. 90 00:07:55,720 --> 00:07:59,210 So let's create the standardized datasets. 91 00:08:00,430 --> 00:08:03,400 We are saving these objects under their original names only. 92 00:08:03,760 --> 00:08:08,110 So we are replacing the original X train with the standardized version of X train, 93 00:08:09,280 --> 00:08:14,370 the original X validation set with the standardized version of the X validation set, 94 00:08:14,470 --> 00:08:17,420 and the same for the test set as well. 95 00:08:17,830 --> 00:08:22,340 And if you want some more information on this scaler, 96 00:08:22,690 --> 00:08:25,900 you can always refer to the scikit-learn documentation 97 00:08:26,100 --> 00:08:27,640 for StandardScaler. 98 00:08:29,380 --> 00:08:31,990 The next step is to set random seeds. 99 00:08:33,830 --> 00:08:37,760 This is to generate the same result every time we run this model. 100 00:08:42,220 --> 00:08:49,630 Now, as I said earlier, our initial dataset was of around 20,000-plus rows or records. 101 00:08:50,440 --> 00:08:55,060 Now let us see the shape of the X train set. 102 00:08:57,560 --> 00:09:04,580 Here you can see we have eight columns and around eleven thousand six hundred records 103 00:09:05,480 --> 00:09:05,960 in our 104 00:09:06,020 --> 00:09:06,980 X train dataset. 105 00:09:08,950 --> 00:09:17,710 We have around five thousand records in our X test dataset and around 4,000 in the validation set. 106 00:09:20,820 --> 00:09:26,220 Now, let's create the structure for our regression neural network. 107 00:09:28,890 --> 00:09:31,590 Here we will be first having an input layer. 108 00:09:32,960 --> 00:09:37,520 Then we will be having the first dense layer with thirty neurons.
109 00:09:38,180 --> 00:09:42,920 Then we want to create a second dense layer with another thirty neurons. 110 00:09:43,760 --> 00:09:51,620 And then, since this is a regression problem, we will be having a single output neuron without any 111 00:09:51,620 --> 00:09:52,730 activation function. 112 00:09:54,620 --> 00:09:59,840 A single neuron, since we want a continuous value as our output. 113 00:10:01,370 --> 00:10:04,060 Again, we will be using the Sequential API. 114 00:10:05,060 --> 00:10:10,040 We are saving this structure of our model as model. 115 00:10:12,550 --> 00:10:14,800 And then for the first layer, 116 00:10:16,850 --> 00:10:17,630 we are writing 117 00:10:17,690 --> 00:10:19,730 keras dot layers dot Dense. 118 00:10:20,360 --> 00:10:25,950 Here in the parentheses, we have to provide the number of neurons, which is thirty. 119 00:10:26,280 --> 00:10:31,710 Then, as discussed in our theory lectures, we will be using the activation function as ReLU. 120 00:10:32,450 --> 00:10:40,010 And then, since this is our first hidden layer, we need to provide the input shape. Since the number 121 00:10:40,010 --> 00:10:43,630 of independent variables in our data is eight, 122 00:10:44,400 --> 00:10:47,240 we will be using input shape equal to eight. 123 00:10:52,130 --> 00:10:54,860 You can also write the input shape like this: 124 00:10:55,220 --> 00:11:02,680 X train dot shape, and then taking the second and onward elements of our input shape. 125 00:11:03,860 --> 00:11:09,140 This way you don't have to worry about changing this number every time you change your dataset. 126 00:11:09,650 --> 00:11:17,870 You can just write X train dot shape and it will automatically get the number of variables from the 127 00:11:17,870 --> 00:11:20,900 shape attribute of our X train object. 128 00:11:23,680 --> 00:11:29,050 So this is the structure of our first dense layer. We will create the 129 00:11:29,140 --> 00:11:31,330 second dense layer in a similar fashion.
130 00:11:31,780 --> 00:11:34,950 We will be using keras dot layers dot Dense 131 00:11:35,380 --> 00:11:39,580 and then the number of neurons in the parentheses, which is 30, 132 00:11:40,030 --> 00:11:42,300 and the activation function as ReLU. 133 00:11:44,070 --> 00:11:49,700 Similarly, for the output layer, we will be using keras dot layers dot Dense. 134 00:11:50,130 --> 00:11:56,550 And since this is a regression problem, we will be using a single neuron without any activation function. 135 00:11:58,670 --> 00:11:59,570 Let's just run this. 136 00:12:01,580 --> 00:12:08,000 And again, one important thing: you can comment using the hash symbol inside the cells. 137 00:12:09,560 --> 00:12:16,980 So Python will execute only this part of the code and will not be executing any code which starts with 138 00:12:16,980 --> 00:12:20,050 hash, as hash is meant for starting a comment. 139 00:12:21,670 --> 00:12:30,700 So now we have created the structure or architecture of our neural network. To confirm and view 140 00:12:30,700 --> 00:12:31,310 this structure, 141 00:12:31,670 --> 00:12:35,860 we can call the dot summary method. Just write model dot 142 00:12:40,890 --> 00:12:47,490 summary. Here, you will get the information about the structure that we have created. 143 00:12:47,640 --> 00:12:50,280 So we have the first dense layer with thirty neurons. 144 00:12:51,330 --> 00:12:53,460 We have the second dense layer with thirty neurons. 145 00:12:53,940 --> 00:12:58,350 And lastly, we have a single output layer with one neuron. 146 00:13:00,450 --> 00:13:02,030 This is what we wanted. 147 00:13:03,180 --> 00:13:06,680 The next step should be to compile this model. 148 00:13:09,280 --> 00:13:14,830 Again, the compile method works similarly for both classification and regression models. 149 00:13:15,970 --> 00:13:19,900 First, we have to mention the loss. In classification, 150 00:13:19,930 --> 00:13:21,380 we were using cross-entropy.
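The architecture just described (two 30-neuron ReLU layers and a single linear output neuron), including the seed-setting mentioned earlier, can be sketched as below. The literal `(8,)` input shape is used here; as the narration notes, `X_train.shape[1:]` is an equivalent, dataset-independent way to write it:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Random seeds, so the run is reproducible (as set earlier in the notebook)
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    # First hidden layer: 30 neurons, ReLU, input shape = 8 features
    keras.layers.Dense(30, activation="relu", input_shape=(8,)),  # or X_train.shape[1:]
    # Second hidden layer: another 30 ReLU neurons
    keras.layers.Dense(30, activation="relu"),
    # Output layer: a single neuron, no activation (continuous output)
    keras.layers.Dense(1),
])

model.summary()  # confirm and view the structure
```

The summary lists the three Dense layers (30, 30, and 1 units) and their parameter counts.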
151 00:13:22,360 --> 00:13:29,970 But here, since we are performing regression, we have to use mean squared error, also known as MSE. 152 00:13:31,840 --> 00:13:34,140 The second parameter is optimizer. 153 00:13:36,010 --> 00:13:41,200 Again, here also we are using SGD, stochastic gradient descent. 154 00:13:42,370 --> 00:13:49,870 And here we are also providing the learning rate. By default, the value of the learning rate is zero point 155 00:13:49,870 --> 00:13:52,740 zero one, and to change it, 156 00:13:53,740 --> 00:13:54,850 you can just write 157 00:13:56,150 --> 00:13:58,330 the new value in the parentheses. 158 00:14:00,540 --> 00:14:04,090 We have already discussed what the learning rate is in our theory lecture. 159 00:14:05,080 --> 00:14:08,330 So if you have any doubts, just revisit that lecture. 160 00:14:09,940 --> 00:14:13,270 And then the next parameter that we are passing is metrics. 161 00:14:13,600 --> 00:14:15,250 This is an optional parameter. 162 00:14:17,490 --> 00:14:20,060 In classification, we were using accuracy. 163 00:14:21,080 --> 00:14:31,490 But in regression, we can use mean absolute error, or MAE. Absolute error is the difference between 164 00:14:31,490 --> 00:14:34,040 the predicted value and the actual value, 165 00:14:35,060 --> 00:14:39,960 whereas the squared error is the square of that difference. 166 00:14:41,390 --> 00:14:49,160 So we are including both: mean squared error as the loss function and MAE as the metric we additionally 167 00:14:49,160 --> 00:14:50,060 want to calculate. 168 00:14:51,480 --> 00:14:59,370 Again, just remember, if you want to look at the documentation or help, just click inside any of 169 00:14:59,370 --> 00:15:00,840 the parentheses 170 00:15:01,940 --> 00:15:07,550 and press Shift plus Tab. You will get this kind of documentation. 171 00:15:07,880 --> 00:15:12,850 And here you can see that by default the learning rate value is 0.01.
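The compile step described above can be sketched like this (the model is rebuilt here so the snippet is self-contained; 0.01 is both the value passed explicitly and the SGD default the narration mentions):

```python
from tensorflow import keras

# Same structure as before, rebuilt for a self-contained snippet
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=(8,)),
    keras.layers.Dense(30, activation="relu"),
    keras.layers.Dense(1),
])

# Loss: mean squared error (regression), optimizer: stochastic gradient
# descent with an explicit learning rate, plus MAE as an extra metric
# to report alongside the loss.
model.compile(loss="mean_squared_error",
              optimizer=keras.optimizers.SGD(learning_rate=0.01),
              metrics=["mean_absolute_error"])
```

Passing the optimizer as an object rather than the string `"sgd"` is what lets you set the learning rate in the parentheses.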
172 00:15:13,970 --> 00:15:17,420 So in classification, we did not provide any learning rate. 173 00:15:17,750 --> 00:15:22,530 So the learning rate that was used there was zero point zero one. 174 00:15:24,260 --> 00:15:27,430 But you can always tweak these values according to your needs. 175 00:15:29,450 --> 00:15:30,550 Let's run this. 176 00:15:31,360 --> 00:15:33,140 So we have compiled our model. 177 00:15:35,640 --> 00:15:40,740 The next step is to train our model using the training data. 178 00:15:42,240 --> 00:15:44,340 The method or the process is the same. 179 00:15:44,940 --> 00:15:48,760 We are creating another object, model underscore history, for training. 180 00:15:49,530 --> 00:15:57,930 Then we are using model dot fit, and we are passing our training dataset, the number of epochs, and the validation 181 00:15:57,980 --> 00:15:59,730 dataset that we have created. 182 00:16:00,890 --> 00:16:02,950 Let's just run this statement. 183 00:16:05,860 --> 00:16:06,250 Again, 184 00:16:08,740 --> 00:16:14,980 just like the classification model, you will get the loss value, which is the mean squared error. 185 00:16:15,910 --> 00:16:20,380 You will get the MAE value, the mean absolute error. 186 00:16:21,340 --> 00:16:25,960 And similarly, you will get these two values for your validation set 187 00:16:26,000 --> 00:16:26,380 as well. 188 00:16:30,070 --> 00:16:37,150 And you can see that the loss on both the training set and validation set is decreasing with each epoch. 189 00:16:39,550 --> 00:16:43,900 Now we have these values for our training and validation set. 190 00:16:44,680 --> 00:16:49,600 We can also evaluate the performance of this trained model on our test set. 191 00:16:50,860 --> 00:16:55,120 And we are going to use the same method as we did with the classification model. 192 00:16:55,540 --> 00:16:58,140 We'll call model dot evaluate, 193 00:16:58,450 --> 00:17:00,860 and then we will pass our test set.
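The training and evaluation cells described above can be sketched end-to-end as follows. Variable names follow the narration; `epochs=20` is an assumption inferred from the later remark that 20 more epochs give "a total of 40":

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Data preparation, as in the earlier cells
housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)

np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=(8,)),
    keras.layers.Dense(30, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(loss="mean_squared_error",
              optimizer=keras.optimizers.SGD(learning_rate=0.01),
              metrics=["mean_absolute_error"])

# Train, passing the validation data so val_loss / val_mae are reported per epoch
model_history = model.fit(X_train, y_train, epochs=20,
                          validation_data=(X_valid, y_valid))

# Evaluate the trained model on the held-out test set: returns [mse, mae]
test_mse, test_mae = model.evaluate(X_test, y_test)
```

Each epoch prints the training loss (MSE) and MAE followed by the validation loss and MAE, and `evaluate` reports the same two numbers for the test set.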
194 00:17:01,860 --> 00:17:02,500 Let's run this. 195 00:17:04,990 --> 00:17:10,180 You can see on our test set the loss is zero point three. 196 00:17:10,420 --> 00:17:19,870 That is the MSE, or mean squared error, and the MAE, mean absolute error, is zero point four four nine three. 197 00:17:22,420 --> 00:17:29,200 Now, just like in classification, we can call model history dot history. That will give us the 198 00:17:29,320 --> 00:17:32,860 values of all these metrics in the form of a dictionary. 199 00:17:33,800 --> 00:17:40,210 And here you will get the loss and MAE on the training 200 00:17:40,300 --> 00:17:40,990 dataset, 201 00:17:41,130 --> 00:17:43,200 and the validation loss and validation MAE. 202 00:17:43,890 --> 00:17:53,410 The beauty of this is we can plot this dictionary on a graph, just like we did for classification, and 203 00:17:53,410 --> 00:17:55,630 that will show us how our training 204 00:17:55,630 --> 00:18:03,160 loss and validation loss are changing with each epoch, and whether we have achieved convergence or 205 00:18:03,160 --> 00:18:03,540 not. 206 00:18:04,770 --> 00:18:05,830 Let's run this. 207 00:18:07,970 --> 00:18:14,480 So you can see we have the loss values and the MAE values for both the training and validation 208 00:18:14,480 --> 00:18:16,100 sets on this graph. 209 00:18:16,880 --> 00:18:24,920 And one thing to notice is this graph is still going down, meaning that if we run some more epochs, 210 00:18:25,700 --> 00:18:31,190 this will further decrease the losses and improve the accuracy of our model. 211 00:18:33,220 --> 00:18:40,140 So this is one way to tell whether you have achieved convergence or not, or whether you have to increase 212 00:18:40,140 --> 00:18:41,650 your epoch value or not. 213 00:18:42,700 --> 00:18:46,440 You have to look at this validation loss and validation MAE value. 214 00:18:47,620 --> 00:18:49,090 So this is the validation loss.
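The learning-curve plot described above can be sketched as below. This is a self-contained reconstruction, so the model is trained again here; a short 5-epoch run is used for brevity (an assumption, not the lecture's epoch count), since the plotting idiom is the point:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
from tensorflow import keras
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=(8,)),
    keras.layers.Dense(30, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(loss="mean_squared_error",
              optimizer=keras.optimizers.SGD(learning_rate=0.01),
              metrics=["mean_absolute_error"])
model_history = model.fit(X_train, y_train, epochs=5,  # short run for brevity
                          validation_data=(X_valid, y_valid))

# history.history is a dict of per-epoch lists: training loss and MAE,
# plus val_loss and validation MAE. Loading it into a DataFrame and
# calling .plot() draws all the curves against the epoch number.
pd.DataFrame(model_history.history).plot()
plt.xlabel("epoch")
plt.savefig("learning_curves.png")
```

If the validation-loss curve is still sloping downward at the last epoch, the model has not yet converged and more epochs are worth running.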
215 00:18:49,420 --> 00:18:53,410 And you can clearly see that it is going down. 216 00:18:55,490 --> 00:18:59,630 So to improve the accuracy, we can rerun this code. 217 00:19:02,060 --> 00:19:04,210 Run it for twenty more epochs. 218 00:19:09,140 --> 00:19:11,240 So let's do that. 219 00:19:13,310 --> 00:19:20,780 Now, one important thing about Keras: Keras keeps the weights and biases values in the memory. 220 00:19:21,020 --> 00:19:27,850 So if you just rerun this whole fit statement again, this will not train the model from scratch, 221 00:19:28,190 --> 00:19:32,720 but it will start training the model from this position. 222 00:19:34,790 --> 00:19:40,960 So if we run this fit statement two times, that is similar to running the statement with double the epochs. 223 00:19:42,660 --> 00:19:45,420 Let's just rerun it one more time. 224 00:19:48,020 --> 00:19:50,450 You can see earlier the loss values were 225 00:19:51,600 --> 00:19:55,160 around point seven or point eight for the first epoch, 226 00:19:55,660 --> 00:19:57,730 and then gradually decreasing. 227 00:19:58,240 --> 00:20:02,170 But now we have started from after the 20th epoch. 228 00:20:07,620 --> 00:20:11,190 Last time, the loss value on our test set was zero point three zero. 229 00:20:11,790 --> 00:20:16,020 Let's see whether we have improved this loss value or not. 230 00:20:18,000 --> 00:20:23,310 You can see the loss has decreased from zero point three to zero point two five. 231 00:20:25,230 --> 00:20:32,260 So our hypothesis was correct that the model had not converged in 20 epochs. 232 00:20:33,000 --> 00:20:38,360 There was room for improvement, and we trained the whole model for 20 more epochs. 233 00:20:38,460 --> 00:20:40,370 That is a total of 40 epochs. 234 00:20:43,630 --> 00:20:45,300 And we can see this graph. 235 00:20:46,850 --> 00:20:48,160 You can just 236 00:20:49,050 --> 00:20:51,240 focus on this validation loss line.
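The "Keras keeps the weights in memory" behaviour described above can be demonstrated with a toy sketch (synthetic data and epoch counts are illustrative assumptions, not the lecture's setup): calling `fit` a second time continues from the current weights rather than starting from scratch, so two 5-epoch runs behave like one 10-epoch run.

```python
import numpy as np
from tensorflow import keras

# A simple synthetic regression problem: predict the sum of 8 features
np.random.seed(0)
X = np.random.rand(1024, 8).astype("float32")
y = X.sum(axis=1, keepdims=True)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=(8,)),
    keras.layers.Dense(1),
])
model.compile(loss="mean_squared_error", optimizer="sgd")

h1 = model.fit(X, y, epochs=5, verbose=0)  # first run, from random weights
h2 = model.fit(X, y, epochs=5, verbose=0)  # second run: continues training

# The second run's first epoch starts roughly where the first run left off,
# far below the loss of the very first epoch.
print(h1.history["loss"][0], h2.history["loss"][0])
```

To instead retrain from scratch, you would have to rebuild (or re-initialize) and recompile the model before calling `fit` again.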
237 00:20:51,870 --> 00:20:56,910 Earlier, it was still going down. For a long period after this, 238 00:20:59,300 --> 00:21:02,780 there is only a slight decrease in the validation loss. 239 00:21:03,020 --> 00:21:06,320 Now you can see that the line has flattened out. 240 00:21:06,890 --> 00:21:11,060 This means we have achieved convergence on this model. 241 00:21:12,440 --> 00:21:14,360 So not just with regression: 242 00:21:14,780 --> 00:21:16,880 if you are running a classification model as well, 243 00:21:18,050 --> 00:21:22,640 just look at this graph to identify whether you have achieved convergence or not. 244 00:21:24,250 --> 00:21:30,740 Now, to predict the values on a new dataset, you can always use the dot predict method. 245 00:21:31,400 --> 00:21:34,310 So your object name and the dot predict method, 246 00:21:34,760 --> 00:21:36,330 and then the new dataset. 247 00:21:37,310 --> 00:21:38,830 I don't have any new dataset, 248 00:21:38,960 --> 00:21:44,000 so I'm just taking a sample of the first three rows of my X test dataset 249 00:21:44,120 --> 00:21:53,420 and considering it as my new dataset, and then saving the predictions in y pred, using the 250 00:21:53,520 --> 00:21:54,070 model dot 251 00:21:54,080 --> 00:21:59,870 predict method to predict the values using this model. 252 00:22:01,760 --> 00:22:04,580 That's all for this lecture. In the next lecture, 253 00:22:04,640 --> 00:22:08,820 we will be looking at the functional API of Keras. 254 00:22:09,830 --> 00:22:10,220 Thank you.
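The prediction step described above can be sketched as follows. The model is rebuilt and given a minimal one-epoch fit here just so the snippet is self-contained (the lecture, of course, predicts with the fully trained model); `X_new` and `y_pred` follow the narration's names:

```python
import numpy as np
from tensorflow import keras
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=(8,)),
    keras.layers.Dense(30, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(loss="mean_squared_error", optimizer="sgd")
model.fit(X_train, y_train, epochs=1, verbose=0)  # minimal fit for illustration

# Pretend the first three rows of the (already standardized) test set are new data
X_new = X_test[:3]
y_pred = model.predict(X_new)
print(y_pred)  # one predicted house value (in $100,000s) per input row
```

Note that any genuinely new data would need to be transformed with the same fitted scaler before being passed to `predict`.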