Now, in this lecture, you're going to build the model using Keras. But before that, we will do one small thing: we will set aside another small part of our training examples as a validation set. The validation set is used to tune the model's hyperparameters. So we'll just take our first 5,000 examples as the validation set, and the remaining examples we will keep in another variable called the partial training set.

Here I am storing the indices of the first 5,000 examples in a variable called val_indices. Now I use this variable to pick out the first 5,000 images and store them in val_images. So the first 5,000 training images are now stored in val_images. The remaining part of the training images is stored in partial_train_images.

We do the same thing with the labels. The first 5,000 labels are stored in val_labels, and the rest of the labels are stored in partial_train_labels.

So we have created two parts of the training examples: one is the validation set and the other is the partial training set. We will use the partial training set to train the model, and the validation set will be used to tune the hyperparameters. You will see the use of the validation set in the coming lectures.

Now, there are two ways in which we can define the model using Keras.
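The split described above can be sketched in a few lines. The lecture does this in R with Keras; this is a plain-Python equivalent with made-up data, and the function name and the tiny demo values are illustrative, not from the course code.

```python
def split_validation(images, labels, n_val=5000):
    """Return (val_images, val_labels, partial_train_images, partial_train_labels).

    The first n_val examples become the validation set; the rest form the
    partial training set, exactly as described in the lecture.
    """
    val_images = images[:n_val]
    val_labels = labels[:n_val]
    partial_train_images = images[n_val:]
    partial_train_labels = labels[n_val:]
    return val_images, val_labels, partial_train_images, partial_train_labels

# Tiny demo: 8 fake examples, validation size of 3.
imgs = list(range(8))
labs = [i % 2 for i in range(8)]
v_img, v_lab, t_img, t_lab = split_validation(imgs, labs, n_val=3)
print(v_img, t_img)  # [0, 1, 2] [3, 4, 5, 6, 7]
```

The same slicing, with n_val=5000, produces the val_images / partial_train_images pair used throughout the rest of the lecture.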
One is using the sequential API, and the other is using the functional API. The sequential API is used when you want to build a normal neural network with a linear stack of layers. The functional API is used for big, complex network structures where you have multiple usages of several smaller ones. We will see an example of the functional API when we build the regression model; for this classification example, we will use the sequential API.

Now, we'll be taking three steps in defining the model. The first step is defining the network structure. This includes setting up the number of layers, the number of neurons in each layer, and the activation function to be used in each layer. This is captured in this part of the code.

The second step is configuring the learning process. This includes selecting the loss function, the optimizer, and some metrics to be monitored. This is this part of the code.

The third step is feeding in the data and training the model. This is done in the last part of the code.

So let's start discussing each line of code, one by one. Here we start by creating a new variable called model. This variable will contain the information about the structure of our network.
This is the function we use for the sequential API: keras_model_sequential(). Then, to start defining the structure, we use this pipe symbol. This pipe operator comes with the magrittr package, which is automatically installed when we install the keras package. It is used for passing values as arguments to a function. We could do away with this symbol, but using it makes the code more readable and compact. So, as a good practice, we will use this operator. This is the pipe operator.

And if you remember, the operator we used earlier for assigning to train images and train labels, the multiple assignment operator, comes from the zeallot package, which was also installed as part of the keras setup. We are using that one, too, because it makes the code compact.

So first, we work on the input layer by flattening. I mean that we have a 28 by 28 2D image, and we can turn it into one dimension by putting all the pixels in one line. What happens is, if you have a 3 by 3 two-dimensional array, you can flatten it by putting the rows one after another, so it becomes a one-dimensional array.

This step is important because we have to give a straight sequence of input values in place of a 2D array. You could convert it to one dimension using the array_reshape function as well.
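The 3 by 3 flattening described above can be shown concretely. This is a plain-Python sketch of the idea (the lecture uses R and layer_flatten; the helper name here is made up for illustration):

```python
def flatten(grid):
    """Lay the rows of a 2-D list end to end, in row-major order."""
    return [pixel for row in grid for pixel in row]

grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
print(flatten(grid))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Applied to a 28 by 28 image, the same operation yields 28 * 28 = 784 pixel values in one line, which is exactly what the flatten layer produces for the next layer.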
But when we have Keras, why should we bother? Just specify layer_flatten here, like this, and specify the input shape, that is, the kind of input this layer receives. It will automatically convert this 28 by 28 input into 784 pixel values for the next layer.

Next, we specify the details of the layer that comes after the flatten layer. That is, we are telling Keras that this layer is dense, meaning each of its neurons is connected to every neuron of the previous layer. In this layer, we want 128 neurons, and the activation function for all these neurons will be ReLU, that is, the Rectified Linear Unit. In this way we have defined one hidden layer: it is a dense layer, it has 128 neurons, and it uses the ReLU activation function.

Next, we specify the output layer. You can add more layers as well, but here I am using only one hidden layer and one output layer. In this last layer, we have 10 neurons, because we have 10 classes to be predicted. Each of these neurons will be predicting the probability of one class, such as whether it is a shirt or a boot. And if you remember from the theory lecture, the softmax activation makes sure that the sum of all the probabilities comes out to one.
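The softmax property just mentioned, that the 10 output probabilities always sum to one, is easy to verify with a small sketch (plain Python, standing in for the R/Keras activation; the example scores are arbitrary):

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(round(sum(probs), 6))  # 1.0 -- the probabilities always sum to one
```

Whichever 10 raw scores the output layer produces, softmax rescales them so they can be read as class probabilities, and the largest score keeps the largest probability.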
So this last layer has 10 neurons with softmax activation. That's all for the structure. In short, it is a 784-128-10 neural network.

Once we have run the entire code, I would suggest that you come back to this point and experiment a little bit here, to see what is the effect of having more layers, and what is the effect of increasing or decreasing the number of neurons in these layer_dense calls.

You can see that a new variable called model is created, and it has the structure stored in it.

Now, let's look at the second step. At this step, we configure the learning process. Here, the first thing is specifying the optimizer. We have discussed the concept behind stochastic gradient descent. SGD is not the only one; there are other optimizers as well, with small differences. Other optimizers include Adam, RMSprop and a few others. In fact, in the coming years, we may see a few more added to this list.

But to answer the question of which should be used when: ideally, the choice of optimizer depends on the shape of the error function curve. But we do not know that shape, so we do not know the ideal optimizer. Practically, though, in most scenarios all of these work very well.
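As a quick sanity check on the 784-128-10 structure just summarized, we can count its trainable parameters by hand. This is a back-of-envelope sketch (weights plus one bias per neuron), not tied to any framework:

```python
def dense_params(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus one bias per neuron."""
    return n_in * n_out + n_out

hidden = dense_params(784, 128)   # 784*128 weights + 128 biases = 100480
output = dense_params(128, 10)    # 128*10 weights + 10 biases  = 1290
print(hidden + output)            # 101770 parameters in total
```

This is also a useful number to watch while experimenting: adding layers or neurons, as suggested above, grows the parameter count and therefore the training cost.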
It's just that for some scenarios SGD converges faster, and for some situations RMSprop converges faster. So my suggestion would be to simply train the model once with SGD. If you think it is taking too long to converge, or you are not getting much improvement in training, it is always worth a shot to try the RMSprop optimizer as well.

Let's move on to the second parameter, which is the loss function. We have discussed this in the theory part: for classification models we use cross-entropy, and for regression models we usually use mean squared error.

But within cross-entropy you will find three options. Which of these three you should use depends on the type of problem you have. The three names are: sparse categorical cross-entropy, binary cross-entropy and categorical cross-entropy.

If your problem has two classes to be predicted, like whether an email is spam or not spam, use binary cross-entropy.

If you have multiple classes, such as this problem where we have fashion objects, and each example is exclusive, meaning each image contains only one object to be predicted, then we use sparse categorical cross-entropy. That is why I have written loss equal to sparse categorical cross-entropy here.
Sparse categorical cross-entropy expects the class labels as plain integers, which is what we have here. If instead your labels are one-hot encoded, that is, each label is a vector with a one in the position of its class, then you use categorical cross-entropy; the two compute the same loss, they just read the labels in different formats.

And if one observation can belong to many classes at the same time, for example, if we are labelling whether an email is from someone you know or not, and we are also labelling whether the email is important or not, then one email can be both: it can be from someone you know, and it can be important. So it may belong to two classes at the same time. Such multi-label problems are handled with binary cross-entropy applied to each label separately.

I hope you understood this. Here's a summary of what I just said; you can look at this comparison chart to understand the three cross-entropies.

The third parameter is metrics. This is not mandatory, but we specify it to monitor the performance of the model during training. Basically, we would like to see the improvement in the accuracy of our classification model, or in the mean squared error of our regression model, over each epoch. As I told you, we go over the entire training dataset several times; each time, we will calculate the accuracy of our model at that instant and store it, so that we can see whether the learning process is producing any improvement in accuracy or not.

So with these three parameters set, we can run this part of the code. Now we have configured the learning process as well.
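The point that sparse and (one-hot) categorical cross-entropy differ only in label format can be checked directly. This is a single-example sketch in plain Python (the real Keras losses average over a batch; the probabilities here are made up):

```python
import math

def categorical_ce(one_hot, probs):
    """Cross-entropy with a one-hot label vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_ce(label_index, probs):
    """Cross-entropy with an integer class label."""
    return -math.log(probs[label_index])

probs = [0.1, 0.7, 0.2]   # model's predicted class probabilities
one_hot = [0, 1, 0]       # categorical form of the label...
label = 1                 # ...and its sparse (integer) form
print(categorical_ce(one_hot, probs) == sparse_categorical_ce(label, probs))  # True
```

Both calls pick out -log of the probability assigned to the true class; only the way the true class is written down differs.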
This brings us to the third part, where we actually train the model. The training is done using the fit function. Within the fit function, we have to specify the input variable first; that is the partial training dataset that we will input. Then comes the actual output corresponding to those inputs; the actual output is stored in partial_train_labels, so that is the second parameter.

Next, we specify the epochs number. This is the number of times the entire training data will be put through the model. We set this to 30 for this example.

Then we have the batch size. This is the number of observations which will be used during each forward and backward propagation step. Here we take a batch size of 100.

Lastly, we tell it that we also have separate validation data, which is a list of val_images and val_labels, and that we would like to see the accuracy scores on this validation data as well.

Keep in mind that only partial_train_images and partial_train_labels will be used to train the model; the validation data is like test data in this scenario. That is, our model will not have seen the validation data when it calculates the accuracy on it.

Now, let's run this line of code as well.
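A quick bit of bookkeeping makes the epochs and batch-size settings above concrete. Assuming the full Fashion-MNIST training set of 60,000 images minus the 5,000-example validation split (both numbers as described earlier in the course), we can count how many weight-update steps fit() will perform:

```python
import math

n_train = 60000 - 5000   # partial training examples after the validation split
batch_size = 100         # observations per forward/backward propagation step
epochs = 30              # passes over the entire partial training set

steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 550 16500
```

So each epoch is 550 batches, and over 30 epochs the weights are updated 16,500 times, which is why training takes a noticeable amount of time in the next step.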
Well, you can see that the neural network model is getting trained, and the accuracy and loss values are being recorded for each epoch. In the next video, we will see the performance of this trained model.