1
00:00:00,233 --> 00:00:00,966
Hello my friends.

2
00:00:00,966 --> 00:00:01,733
Welcome back.

3
00:00:01,733 --> 00:00:05,700
I'm sure you feel amazing after having
built your very first artificial

4
00:00:05,700 --> 00:00:06,700
neural network.

5
00:00:06,700 --> 00:00:09,066
But remember,
that's only half of the job.

6
00:00:09,066 --> 00:00:13,700
The second half will be, of course,
to train it on the whole training set.

7
00:00:13,700 --> 00:00:16,066
And we are going to do this in two steps.

8
00:00:16,066 --> 00:00:20,233
The first one is to compile the ANN
with an optimizer,

9
00:00:20,233 --> 00:00:23,933
a loss function and a metric,
which will be of course the accuracy

10
00:00:24,166 --> 00:00:26,400
because we're doing some classification.

11
00:00:26,400 --> 00:00:29,600
And then the second step will be
of course to train the ANN

12
00:00:29,600 --> 00:00:32,866
on the training set
over a certain number of epochs.

13
00:00:33,266 --> 00:00:34,266
Are you ready?

14
00:00:34,266 --> 00:00:35,100
Let's do this.

15
00:00:35,100 --> 00:00:38,466
Starting with this first step
compiling the ANN.

16
00:00:39,100 --> 00:00:39,500
All right.

17
00:00:39,500 --> 00:00:43,000
So once again doing
this will be super simple

18
00:00:43,000 --> 00:00:46,466
thanks to the TensorFlow library
that integrated Keras.

19
00:00:46,700 --> 00:00:50,866
Because indeed, to compile our ANN,
we first need to start from our

20
00:00:50,866 --> 00:00:56,166
ann object, which I remind you was created
as an instance of the Sequential class.

21
00:00:56,600 --> 00:01:00,233
And then from this object
we're going to call a new method,

22
00:01:00,233 --> 00:01:02,733
which this time of course
won't be the add method.

23
00:01:02,733 --> 00:01:05,666
But can you actually guess what
this method is going to be?

24
00:01:05,666 --> 00:01:09,400
You know, there is no trap in TensorFlow
nor any confusion.

25
00:01:09,666 --> 00:01:13,600
Well, the method to compile
an artificial neural network

26
00:01:13,800 --> 00:01:16,866
is simply the compile method.

27
00:01:17,066 --> 00:01:18,866
As simple as that, right?

28
00:01:18,866 --> 00:01:22,466
We didn't even have to look
at the TensorFlow documentation,

29
00:01:22,466 --> 00:01:26,066
which, by the way, I still recommend
having a look at, because you will get

30
00:01:26,066 --> 00:01:30,000
a lot of information on the diverse tools
you have in the TensorFlow library.

31
00:01:30,300 --> 00:01:32,866
But here it's super intuitive,
it's super easy.

32
00:01:32,866 --> 00:01:35,100
So now I have a next question for you.

33
00:01:35,100 --> 00:01:35,833
According to you,

34
00:01:35,833 --> 00:01:39,600
what do we have to enter as parameters
inside this compile method?

35
00:01:39,833 --> 00:01:41,333
Well I actually said it.

36
00:01:41,333 --> 00:01:43,200
We have to enter three parameters.

37
00:01:43,200 --> 00:01:46,200
The first one is optimizer, to choose
an optimizer.

38
00:01:46,433 --> 00:01:49,466
Then the second one is loss,
to choose the loss function.

39
00:01:49,700 --> 00:01:53,633
And the third one is the metrics
parameter, with an s,

40
00:01:53,733 --> 00:01:54,166
because,

41
00:01:54,166 --> 00:01:58,033
note that you can actually choose
several metrics to evaluate your

42
00:01:58,100 --> 00:02:00,833
ANN at the same time,
but we will only choose one

43
00:02:00,833 --> 00:02:02,533
and we will choose the accuracy.

44
00:02:02,533 --> 00:02:03,233
But there you go.

45
00:02:03,233 --> 00:02:04,833
These are the three parameters.

46
00:02:04,833 --> 00:02:07,833
So I suggest that
we start by entering them.

47
00:02:07,900 --> 00:02:10,533
And then we will enter their values okay.

48
00:02:10,533 --> 00:02:15,433
So let's start with the first one
optimizer equals. All right.

49
00:02:15,433 --> 00:02:20,166
And, comma, the next one is loss,
for the loss function.

50
00:02:20,500 --> 00:02:25,500
And finally
the third one is the metrics parameter.

51
00:02:25,966 --> 00:02:26,533
All right.

52
00:02:26,533 --> 00:02:29,600
So for the optimizer
which one would you like to get?

53
00:02:29,933 --> 00:02:33,866
Well, in the intuition lectures
Kirill mentioned that the best ones

54
00:02:33,866 --> 00:02:37,366
are the optimizers that can perform
stochastic gradient descent.

55
00:02:37,366 --> 00:02:40,433
And the best of them, you know,
the one that I recommend by default

56
00:02:40,566 --> 00:02:44,400
is the Adam optimizer,
which is a very performant

57
00:02:44,400 --> 00:02:47,466
optimizer that can perform
stochastic gradient descent.

58
00:02:47,466 --> 00:02:51,366
And by the way, let me just remind you what
stochastic gradient descent allows us to do.

59
00:02:51,600 --> 00:02:56,966
Well, you know, it is what will update
the weights in order to reduce the loss

60
00:02:57,000 --> 00:03:00,300
error between your predictions
and the real results.

61
00:03:00,300 --> 00:03:04,800
You know, when we train the ANN on the
training set, we will at each iteration

62
00:03:04,800 --> 00:03:09,900
compare the predictions in a batch
to the real results in the same batch.

63
00:03:10,100 --> 00:03:13,966
And that optimizer here
will update the weights through

64
00:03:13,966 --> 00:03:16,300
stochastic gradient descent,
because we're going to choose Adam

65
00:03:16,300 --> 00:03:20,533
optimizer, so as to, at the next iteration,
hopefully reduce the loss.
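To make that update concrete, here is a minimal sketch of one batch update. It uses plain gradient descent on a tiny one-feature logistic model rather than Adam, and the helper names (`batch_loss`, `sgd_step`), the toy data and the learning rate are all assumptions made up for illustration. It only shows the idea: compare predictions to real results in a batch, then nudge the weights so the loss goes down.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def batch_loss(w, b, xs, ys):
    """Mean binary cross-entropy of a one-feature logistic model on a batch."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

def sgd_step(w, b, xs, ys, lr=0.1):
    """One plain gradient-descent update computed from a single batch."""
    grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        err = sigmoid(w * x + b) - y  # gradient of the loss w.r.t. the pre-activation
        grad_w += err * x
        grad_b += err
    n = len(xs)
    return w - lr * grad_w / n, b - lr * grad_b / n

# Toy batch (hypothetical): negative inputs are class 0, positive are class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

w, b = 0.0, 0.0
before = batch_loss(w, b, xs, ys)
w, b = sgd_step(w, b, xs, ys)
after = batch_loss(w, b, xs, ys)
print(f"loss before: {before:.4f}, loss after: {after:.4f}")
```

Adam follows the same loop but adapts the step size per weight using running averages of the gradients.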

66
00:03:21,033 --> 00:03:21,366
All right.

67
00:03:21,366 --> 00:03:25,366
So that's why right here we have to choose
an optimizer but also a loss function

68
00:03:25,466 --> 00:03:27,500
which is the way to compute the difference

69
00:03:27,500 --> 00:03:29,233
between the predictions
and the real results.

70
00:03:29,233 --> 00:03:33,433
And then the accuracy of course because
that's our final evaluation metric.

71
00:03:33,900 --> 00:03:34,200
All right.

72
00:03:34,200 --> 00:03:37,000
So as we said
we're going to choose the Adam optimizer.

73
00:03:37,000 --> 00:03:40,566
And the code name for
that is simply, with no capital letter,

74
00:03:40,866 --> 00:03:43,700
adam. Okay. Congratulations.

75
00:03:43,700 --> 00:03:47,666
Now you know how to compile an artificial
neural network with an optimizer.

76
00:03:48,033 --> 00:03:51,033
But then we also have to compile it
with the loss function.

77
00:03:51,133 --> 00:03:53,633
And now you have to know something
very important.

78
00:03:53,633 --> 00:03:56,966
When you are doing binary classification,
you know, classification

79
00:03:57,100 --> 00:04:02,000
when you have to predict a binary outcome,
well, the loss function must always be

80
00:04:02,366 --> 00:04:06,566
the following one entered
in quotes, of course, which is binary

81
00:04:07,100 --> 00:04:10,466
underscore crossentropy.

82
00:04:11,200 --> 00:04:12,400
Just like that.

83
00:04:12,400 --> 00:04:16,000
And now let me tell you what
you would have to enter if you were doing

84
00:04:16,000 --> 00:04:17,500
non-binary classification.

85
00:04:17,500 --> 00:04:20,300
You know, like for example
predicting three different categories.

86
00:04:20,300 --> 00:04:26,000
Well, here you would have to enter
categorical underscore crossentropy. Okay.

87
00:04:26,000 --> 00:04:29,600
For binary classification
the loss must be binary cross entropy.

88
00:04:29,600 --> 00:04:34,133
And for non-binary classification the loss
must be categorical cross entropy.

89
00:04:34,333 --> 00:04:36,100
And then also you know when doing

90
00:04:36,100 --> 00:04:39,833
non-binary classification,
when predicting more than two categories,

91
00:04:40,000 --> 00:04:45,000
well, the activation should not be sigmoid
but softmax, right?
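As a rough illustration of the non-binary case, here is a small pure-Python sketch of what softmax and categorical cross-entropy compute for one observation with three categories. The function names and the example scores are assumptions chosen for illustration, not Keras internals.

```python
import math

def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Loss for one observation: minus the log-probability of the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

# Hypothetical raw scores for three categories; the true class is the first one.
probs = softmax([2.0, 1.0, 0.1])
loss = categorical_crossentropy([1, 0, 0], probs)
print(probs, loss)
```

In Keras terms this corresponds to an output layer with one unit per category and activation 'softmax', compiled with loss 'categorical_crossentropy' on one-hot encoded labels.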

92
00:04:45,000 --> 00:04:49,133
I take this opportunity to also give you
the other cases of classification

93
00:04:49,133 --> 00:04:51,866
which you could encounter. Okay.
So now you know everything.

94
00:04:51,866 --> 00:04:55,700
And then remember that for regression
because we can also do

95
00:04:55,700 --> 00:04:57,866
artificial neural networks for regression.

96
00:04:57,866 --> 00:05:00,533
Well, we have this free course,
for which I gave you the link.

97
00:05:00,533 --> 00:05:04,733
You can just take this course for free
and you will get the full implementation

98
00:05:04,733 --> 00:05:08,366
of an artificial neural network
for a regression case study.

99
00:05:08,366 --> 00:05:13,133
So you have really everything that you can
do with an artificial neural network.

100
00:05:13,633 --> 00:05:14,900
All right. Great.

101
00:05:14,900 --> 00:05:18,233
And now let's enter
the final parameter here metrics.

102
00:05:18,666 --> 00:05:22,066
As I said we can actually choose
several metrics at the same time.

103
00:05:22,300 --> 00:05:25,200
Therefore in order
to enter the values of this parameter

104
00:05:25,200 --> 00:05:28,200
well we have to enter them
in a pair of square brackets,

105
00:05:28,200 --> 00:05:31,266
which is supposed to be, you know,
the list of the different metrics

106
00:05:31,433 --> 00:05:35,100
with which you want to evaluate your ANN
during the training,

107
00:05:35,366 --> 00:05:38,566
but we will only choose the main one,
you know, the most essential one,

108
00:05:38,766 --> 00:05:42,933
which is the accuracy
and which you have to enter in quotes.

109
00:05:42,933 --> 00:05:43,333
All right.

110
00:05:43,333 --> 00:05:46,233
So accuracy,
just like the classic spelling.

111
00:05:46,233 --> 00:05:48,633
And now, congratulations.

112
00:05:48,633 --> 00:05:51,633
You know how to do a full compile of your

113
00:05:51,633 --> 00:05:55,600
ANN with an optimizer,
a loss and some metrics.

114
00:05:56,100 --> 00:05:56,733
Perfect.
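Put together, the compile step might look like the sketch below. The variable name `ann`, the three Dense layers and their sizes (6, 6, 1) are assumptions standing in for the network built in the previous tutorial; only the three compile arguments come from this lecture.

```python
import tensorflow as tf

# Stand-in network (assumed sizes): two hidden layers and a sigmoid output.
ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))
ann.add(tf.keras.layers.Dense(units=1, activation="sigmoid"))

# Compile with the optimizer, the loss and a list of metrics.
ann.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
```

The metrics argument is a list, so more names can be added inside the square brackets if you want to track several metrics during training.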

115
00:05:56,733 --> 00:06:01,166
So now let's move on to the ultimate step
meaning the step

116
00:06:01,166 --> 00:06:05,733
where we will train the ANN
on the whole training set.

117
00:06:05,933 --> 00:06:08,133
So let's create a new code cell.

118
00:06:08,133 --> 00:06:12,000
And now according to you
how do we need to start this training?

119
00:06:12,333 --> 00:06:14,933
Well once again
you know it's always the same thing.

120
00:06:14,933 --> 00:06:18,633
We need to take our ANN object,
then call a new method

121
00:06:18,633 --> 00:06:22,100
which will perform the training
and then enter a couple of parameters.

122
00:06:22,333 --> 00:06:23,400
So let's do this.

123
00:06:23,400 --> 00:06:26,400
Let's start with ann
first, our object.

124
00:06:26,400 --> 00:06:29,600
And then according to you
what will be the method

125
00:06:29,600 --> 00:06:32,600
that can train your artificial neural
network on the training set.

126
00:06:32,700 --> 00:06:34,633
Well nothing has changed here.

127
00:06:34,633 --> 00:06:39,600
And actually I think I said it
earlier in the course, the method to train

128
00:06:39,600 --> 00:06:42,700
whatever machine learning
model is always the same one.

129
00:06:42,700 --> 00:06:46,300
It is the fit method, the fit method,

130
00:06:46,500 --> 00:06:49,400
and which will take always
the same parameters.

131
00:06:49,400 --> 00:06:52,200
The first one is X_train

132
00:06:52,200 --> 00:06:55,200
for you know, the matrix of features
of the training set.

133
00:06:55,400 --> 00:07:00,300
Then y_train for the dependent variable
vector of the training set.

134
00:07:00,466 --> 00:07:04,566
And then when training an artificial
neural network, we actually need to enter

135
00:07:04,566 --> 00:07:09,066
two more parameters
which are first, the batch size.

136
00:07:09,300 --> 00:07:14,066
Because indeed batch learning
is generally more efficient and more performant

137
00:07:14,066 --> 00:07:17,133
when training
an artificial neural network, meaning that

138
00:07:17,133 --> 00:07:21,000
instead of comparing your prediction
to the real result one by one,

139
00:07:21,000 --> 00:07:24,066
you know to compute and reduce the loss,
well, you're going to do that

140
00:07:24,066 --> 00:07:28,966
with several predictions compared
to several real results in a batch.

141
00:07:29,133 --> 00:07:33,333
And the batch size here, you know,
the batch size parameter gives exactly

142
00:07:33,333 --> 00:07:35,800
the number of predictions
you want to have in the batch

143
00:07:35,800 --> 00:07:38,800
to be compared
to that same number of real results.

144
00:07:39,000 --> 00:07:44,666
And the classic value of the batch size
that is usually chosen is 32, right?

145
00:07:44,666 --> 00:07:48,766
If you don't want to spend too much time
tuning this hyperparameter,

146
00:07:48,966 --> 00:07:52,066
well, I recommend choosing the default
value, 32.

147
00:07:52,066 --> 00:07:55,866
But anyway, I wanted to highlight
that hyperparameter here because indeed,

148
00:07:55,866 --> 00:07:59,766
it is very important to remember
that we are doing batch learning.

149
00:07:59,766 --> 00:08:02,366
Okay, so batch_size equals 32.
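To see what a batch size of 32 means in practice, here is a small sketch of how a training set gets sliced into batches within one epoch. The helper `iterate_batches` and the 1000-row training set are hypothetical, used only to count the weight updates per epoch.

```python
def iterate_batches(n_samples, batch_size=32):
    """Yield (start, end) row indices for each batch in one pass over the data."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

# Hypothetical training set of 1000 observations.
batches = list(iterate_batches(1000, batch_size=32))
print(len(batches))   # number of weight updates in one epoch
print(batches[0])     # rows covered by the first batch
print(batches[-1])    # the last batch is smaller when 32 does not divide 1000
```

Each batch leads to one comparison of predictions against real results and one weight update, so a smaller batch size means more updates per epoch.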

150
00:08:02,366 --> 00:08:07,100
And finally I'm sure you know which
final parameter we have to add here.

151
00:08:07,266 --> 00:08:10,333
That's of course the number of epochs.

152
00:08:10,333 --> 00:08:13,233
You know,
a neural network has to be trained over

153
00:08:13,233 --> 00:08:17,700
a certain number of epochs
so as to improve the accuracy over time.

154
00:08:17,700 --> 00:08:20,700
And we will clearly see that
once we execute this cell.

155
00:08:21,066 --> 00:08:25,833
So the name of the parameter
for the number of epochs is simply epochs.

156
00:08:26,200 --> 00:08:28,933
And well,
you will see that it will go very fast.

157
00:08:28,933 --> 00:08:30,900
So we can just take 100 epochs.

158
00:08:30,900 --> 00:08:33,400
But once again
feel free to choose another number

159
00:08:33,400 --> 00:08:36,433
as long as it is not too small,
because you know your neural network

160
00:08:36,433 --> 00:08:39,600
needs a certain number of epochs
in order to learn properly.

161
00:08:39,600 --> 00:08:43,200
You know, learn the correlations
to get the ultimate best predictions.
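Here is a minimal, self-contained sketch of the training call. The data is synthetic (an assumption, since the real X_train and y_train come from the preprocessing part), the layer sizes are assumed, and epochs is set to 5 instead of 100 only so the sketch runs quickly.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data (assumption): 256 rows, 12 features, binary labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(256, 12)).astype("float32")
y_train = (X_train[:, 0] > 0).astype("float32")

# Stand-in network with assumed layer sizes, compiled as in the lecture.
ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))
ann.add(tf.keras.layers.Dense(units=1, activation="sigmoid"))
ann.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train with batch learning over a fixed number of epochs.
history = ann.fit(X_train, y_train, batch_size=32, epochs=5, verbose=0)

# fit returns a History object holding the per-epoch loss and metrics.
for epoch, acc in enumerate(history.history["accuracy"], start=1):
    print(f"epoch {epoch}: accuracy {acc:.3f}")
```

With the lecture's settings you would pass the real training set and epochs=100, and leave verbose at its default to watch the accuracy evolve epoch by epoch.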

162
00:08:43,900 --> 00:08:45,266
All right. Great.

163
00:08:45,266 --> 00:08:47,500
So we're actually done with part
three now.

164
00:08:47,500 --> 00:08:49,500
So I suggest we no longer wait

165
00:08:49,500 --> 00:08:51,633
and that we execute all the cells

166
00:08:51,633 --> 00:08:56,300
we haven't executed so far, which I think,
you know, start from part two, right? Yes.

167
00:08:56,300 --> 00:08:59,200
This was the last cell of the data
preprocessing phase.

168
00:08:59,200 --> 00:09:00,633
It was run properly.

169
00:09:00,633 --> 00:09:03,366
So let's actually run each cell one by one

170
00:09:03,366 --> 00:09:06,766
and see what we're going to get in the end
during the training.

171
00:09:06,766 --> 00:09:07,933
So let's start with this one.

172
00:09:07,933 --> 00:09:10,633
Initializing the ANN. Good.

173
00:09:10,633 --> 00:09:14,100
Now this one adding the input layer
and the first hidden layer.

174
00:09:14,766 --> 00:09:15,600
Good.

175
00:09:15,600 --> 00:09:18,233
Now this one adding
the second hidden layer.

176
00:09:18,233 --> 00:09:19,166
All good.

177
00:09:19,166 --> 00:09:21,666
And now this one adding the output layer.

178
00:09:21,666 --> 00:09:22,866
All good still.

179
00:09:22,866 --> 00:09:25,000
Now we enter part three.

180
00:09:25,000 --> 00:09:29,333
Executing
first this cell, compiling the ANN. All good.

181
00:09:29,566 --> 00:09:31,733
And now, are you ready, my friends?

182
00:09:31,733 --> 00:09:35,100
We're about to train
the artificial neural network

183
00:09:35,100 --> 00:09:39,366
on the training set over 100 epochs.

184
00:09:39,366 --> 00:09:40,333
And here we go.

185
00:09:40,333 --> 00:09:41,766
The training is starting.

186
00:09:41,766 --> 00:09:44,733
And as I told you, it's going pretty fast.
But look at this.

187
00:09:44,733 --> 00:09:48,466
Look at the accuracy
and how it is evolving over the epochs.

188
00:09:48,466 --> 00:09:52,000
And we can see that
it is actually increasing pretty fast.

189
00:09:52,000 --> 00:09:52,800
And mostly

190
00:09:52,800 --> 00:09:56,800
we see that it is actually converging,
you know, converging pretty quickly.

191
00:09:56,800 --> 00:10:02,966
You know, we converged at 0.86,
you know, at about the 20th epoch.

192
00:10:02,966 --> 00:10:06,900
We actually didn't need 100 epochs;
20 would have been fine.

193
00:10:06,900 --> 00:10:08,566
But anyway,
you know, it's going really fast

194
00:10:08,566 --> 00:10:10,433
and I'm sure it will be over very soon now.

195
00:10:10,433 --> 00:10:13,200
Because indeed, yes, there we go.

196
00:10:13,200 --> 00:10:17,700
The training was done in more or less 20 seconds,
and the final accuracy

197
00:10:17,700 --> 00:10:20,733
we get on the training set
(we'll have to check the same on

198
00:10:20,733 --> 00:10:24,300
the test set) is averaging around
0.86.

199
00:10:24,300 --> 00:10:25,000
That's really good.

200
00:10:25,000 --> 00:10:29,500
That means that out of 100 observations
you have 86 correct predictions.
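As a quick sketch of what that number means, accuracy is simply the share of correct predictions. The function below, the 0.5 decision threshold and the five example observations are assumptions for illustration, not the output of the trained model.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of observations whose thresholded prediction matches the label."""
    correct = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)

# Hypothetical labels and sigmoid outputs for five observations.
y_true = [1, 0, 0, 1, 0]
y_prob = [0.8, 0.3, 0.6, 0.4, 0.1]

acc = accuracy(y_true, y_prob)
print(acc)  # 3 of the 5 predictions are correct
```

An accuracy of 0.86 on the training set is the same calculation over every training observation.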

201
00:10:29,766 --> 00:10:34,133
So congratulations, you made a very good
first deep learning model.

202
00:10:34,133 --> 00:10:35,433
You can be proud of yourself.

203
00:10:35,433 --> 00:10:39,300
And mostly now you can take a little break
because we're going to enter part four.

204
00:10:39,300 --> 00:10:41,366
And not only are we going to enter
part four,

205
00:10:41,366 --> 00:10:44,266
but also you're going to see that
you're going to have a little homework,

206
00:10:44,266 --> 00:10:48,266
which will consist of predicting
the result of a single observation,

207
00:10:48,266 --> 00:10:49,966
meaning a single customer.

208
00:10:49,966 --> 00:10:53,666
You will have to predict if this customer
will stay in or leave the bank.

209
00:10:53,833 --> 00:10:55,900
You will enter your solution here

210
00:10:55,900 --> 00:10:59,833
and we will implement the solution
together in the next tutorial.

211
00:11:00,166 --> 00:11:01,533
So make sure to do it.

212
00:11:01,533 --> 00:11:03,700
Please try at least to do it.

213
00:11:03,700 --> 00:11:07,066
You actually know how to do it
because we already learned how

214
00:11:07,066 --> 00:11:10,566
to do a single prediction before you know
the prediction of a single observation.

215
00:11:10,800 --> 00:11:12,000
So you have everything.

216
00:11:12,000 --> 00:11:15,600
Maybe check out Part 3, Classification,
again if you have a doubt.

217
00:11:15,833 --> 00:11:17,900
But there you go. That's your homework.

218
00:11:17,900 --> 00:11:21,400
You have to use our ANN model to predict
if the customer

219
00:11:21,400 --> 00:11:24,966
with the following information
will leave the bank yes or no.

220
00:11:25,300 --> 00:11:28,633
And this information
are that it is a French customer

221
00:11:28,733 --> 00:11:31,600
with a credit score of 600, and male.

222
00:11:31,600 --> 00:11:33,033
He is 40 years old.

223
00:11:33,033 --> 00:11:35,033
He has been in the bank for three years.

224
00:11:35,033 --> 00:11:38,266
He has $60,000 in his account.

225
00:11:38,500 --> 00:11:40,166
He has two products in the bank.

226
00:11:40,166 --> 00:11:41,266
He has a credit card.

227
00:11:41,266 --> 00:11:47,033
Indeed, he is also an active member
and he has an estimated salary of $50,000.

228
00:11:47,233 --> 00:11:51,000
And the question is,
so should we say goodbye to that customer?

229
00:11:51,266 --> 00:11:55,333
Well, please figure it out and we will see
if you are right in the next tutorial.

230
00:11:55,700 --> 00:11:57,566
Until then, enjoy machine learning.