1
00:00:00,766 --> 00:00:03,133
Hello and welcome back to the course
on Deep Learning.

2
00:00:03,133 --> 00:00:06,633
Now that we've seen neural networks
in action, it's time for us to find out

3
00:00:06,633 --> 00:00:08,333
how they learn.

4
00:00:08,333 --> 00:00:10,333
So let's dive straight into it.

5
00:00:10,333 --> 00:00:12,966
There are two fundamentally different

6
00:00:12,966 --> 00:00:15,966
approaches to getting a program
to do what you want it to do.

7
00:00:16,066 --> 00:00:20,533
One is hard coding,
where you actually tell

8
00:00:20,533 --> 00:00:25,000
the program specific rules
and what outcomes you want,

9
00:00:25,000 --> 00:00:28,166
and you just guide it
throughout the whole way, and you account

10
00:00:28,166 --> 00:00:32,700
for all the possible options
that the program has to deal with.

11
00:00:33,166 --> 00:00:39,333
On the other hand, you have neural
networks where you create a facility

12
00:00:39,333 --> 00:00:43,433
for the program to be able to understand
what it needs to do on its own,

13
00:00:43,433 --> 00:00:48,166
to basically create this neural network
where you provide the inputs.

14
00:00:48,400 --> 00:00:52,033
You tell it what you want as outputs,
and then you let it figure everything

15
00:00:52,033 --> 00:00:53,300
out on its own.

16
00:00:53,300 --> 00:00:55,866
Two fundamentally different approaches.

17
00:00:55,866 --> 00:01:00,233
And that is something to keep in mind
as we go through these tutorials.

18
00:01:00,666 --> 00:01:05,900
Our goal is to create this network
which then learns on its own.

19
00:01:06,200 --> 00:01:10,933
We are going to avoid
trying to put in the rules.

20
00:01:10,933 --> 00:01:14,400
And a good example
that I can give you right now is,

21
00:01:14,400 --> 00:01:18,000
this will come further in the course,
but it's just a very visual example.

22
00:01:18,000 --> 00:01:21,366
For instance, how do you distinguish
between a dog and a cat?

23
00:01:21,666 --> 00:01:25,900
With the approach
that's depicted on the left, you would

24
00:01:26,400 --> 00:01:30,466
program things in, like:
the cat's ears have to be like this.

25
00:01:30,466 --> 00:01:32,533
Look out for whiskers.

26
00:01:32,533 --> 00:01:34,066
Look out for this type of nose.

27
00:01:34,066 --> 00:01:37,466
Look out for this type of face shape.

28
00:01:37,766 --> 00:01:38,800
Look out for these colors.

29
00:01:38,800 --> 00:01:39,166
So you

30
00:01:39,166 --> 00:01:41,433
describe all of these things,
and you'd have conditions

31
00:01:41,433 --> 00:01:45,200
like: if the ears are pointy,
then cat; if the ears are

32
00:01:46,600 --> 00:01:49,466
sloping down, then possibly dog and so on.
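
To make the hard-coded approach concrete, here is a minimal Python sketch of the kind of rules just described. The function name `classify_by_rules` and the string labels are hypothetical, chosen only for illustration, and a real rule-based system would need far more conditions than this.

```python
def classify_by_rules(ears: str) -> str:
    """Hard-coded rules: the programmer spells out every condition by hand."""
    if ears == "pointy":
        return "cat"
    if ears == "sloping down":
        return "possibly dog"
    # Any case the programmer did not anticipate falls through here.
    return "unknown"


print(classify_by_rules("pointy"))
print(classify_by_rules("round"))
```

Notice that every option has to be accounted for in advance, which is exactly what the neural network approach tries to avoid.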

33
00:01:49,466 --> 00:01:53,100
On the other hand, for a neural network,
you'd just code the neural network,

34
00:01:53,100 --> 00:01:56,666
you code the architecture,
and then you point the neural network

35
00:01:56,666 --> 00:01:59,666
at a folder

36
00:01:59,666 --> 00:02:02,566
with images of cats and dogs,
which are already categorized.

37
00:02:02,566 --> 00:02:04,500
And you tell it: okay,

38
00:02:04,500 --> 00:02:06,700
I've got some images of cats and dogs.

39
00:02:06,700 --> 00:02:08,766
Go and learn what a cat is.

40
00:02:08,766 --> 00:02:10,466
Go and learn what a dog is.

41
00:02:10,466 --> 00:02:13,600
And the neural network will on its own
understand

42
00:02:13,600 --> 00:02:15,133
everything it needs to understand

43
00:02:15,133 --> 00:02:18,133
and then further down, once it's trained
up, when you give it

44
00:02:18,133 --> 00:02:21,366
a new image of a cat or a dog,
it'll be able to understand what it is.

45
00:02:21,433 --> 00:02:23,100
So there they are.

46
00:02:23,100 --> 00:02:25,533
Those are the two fundamentally
different approaches.

47
00:02:25,533 --> 00:02:27,633
And today we're going to slowly start

48
00:02:27,633 --> 00:02:30,700
getting into
how that second approach works.

49
00:02:30,966 --> 00:02:33,266
All right. So let's get straight to it.

50
00:02:33,266 --> 00:02:37,066
Here we have a very basic neural network
with one layer.

51
00:02:37,066 --> 00:02:40,200
This is called a single-layer
feedforward neural network.

52
00:02:40,500 --> 00:02:42,633
And it is also called a perceptron.

53
00:02:42,633 --> 00:02:46,966
Now before we proceed one thing that we do
need to adjust is that output value.

54
00:02:47,233 --> 00:02:50,900
Right now you can see that it's just a
y; we need to put a y hat in there.

55
00:02:51,000 --> 00:02:55,366
And the reason for that
is usually y stands for the actual value.

56
00:02:55,366 --> 00:02:56,400
And that's what we're going to be using.

57
00:02:56,400 --> 00:03:00,600
So y is going to be the actual value,
which we see in reality.

58
00:03:01,100 --> 00:03:04,900
The output value is the value predicted
by the algorithm,

59
00:03:04,900 --> 00:03:09,133
by the neural network,
and y hat denotes that output value.

60
00:03:09,133 --> 00:03:11,600
Basically, that's the notation
for the output value.

61
00:03:11,600 --> 00:03:17,100
And the perceptron was first invented
in 1957 by Frank Rosenblatt.

62
00:03:17,333 --> 00:03:21,800
And his whole idea was to create
something that can actually learn,

63
00:03:22,066 --> 00:03:25,066
and adjust itself.

64
00:03:25,066 --> 00:03:27,866
And this is what we're going
to be looking at now.

65
00:03:27,866 --> 00:03:30,133
So, we've got a perceptron.

66
00:03:30,133 --> 00:03:31,900
Let's see how a perceptron learns.

67
00:03:31,900 --> 00:03:35,700
So let's say we have some input values
that have been supplied

68
00:03:35,700 --> 00:03:39,900
to the perceptron,
or basically to our neural network.

69
00:03:40,200 --> 00:03:40,666
Then,

70
00:03:41,700 --> 00:03:45,033
the activation function is applied,
we have an output,

71
00:03:45,400 --> 00:03:48,933
and now we're going
to plot the output on a chart.

72
00:03:49,033 --> 00:03:51,666
So there it is, our output y hat.
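
As a rough sketch of what just happened, here is a single forward pass in Python: the inputs are multiplied by the weights, summed, and passed through an activation function to give y hat. The sigmoid activation, the function names, and the example numbers are all assumptions made for illustration; the lecture does not specify them at this point.

```python
import math


def sigmoid(z: float) -> float:
    """One possible activation function (an assumption, not from the lecture)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, weights, activation=sigmoid):
    """Weighted sum of the inputs, followed by the activation function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)


# Hypothetical input values and weights for a three-input perceptron.
y_hat = forward([0.6, 0.8, 0.75], [0.2, 0.1, 0.4])
print(y_hat)
```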

73
00:03:51,666 --> 00:03:53,033
Now, what we need

74
00:03:53,033 --> 00:03:56,800
to do, in order to be able to learn,
is compare the output value

75
00:03:56,800 --> 00:04:01,000
to the actual value that we want
the neural network to get right.

76
00:04:01,466 --> 00:04:04,466
And that is the value y.

77
00:04:04,666 --> 00:04:07,666
And so if we plot it here, you'll see that
there's a bit of a difference.

78
00:04:08,166 --> 00:04:10,733
Now we're going to calculate
a function called the cost

79
00:04:10,733 --> 00:04:13,433
function. It is calculated
as one half

80
00:04:13,433 --> 00:04:16,733
of the squared difference
between the actual value and output value.
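
Written out in Python, the cost for a single prediction might look like the sketch below. The function name `cost` and the sample prediction of 0.80 are hypothetical; the 0.93 is the 93% exam result used as the actual value later in this lecture, scaled to a fraction.

```python
def cost(y_hat: float, y: float) -> float:
    """One half of the squared difference between output and actual value."""
    return 0.5 * (y_hat - y) ** 2


# A prediction of 0.80 against an actual value of 0.93 gives a small positive cost.
print(cost(0.80, 0.93))
# A perfect prediction gives a cost of exactly zero.
print(cost(0.93, 0.93))
```

The cost is never negative, and it shrinks as y hat moves closer to y.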

81
00:04:17,066 --> 00:04:20,400
Now, there are many ways
you can come up with a cost function.

82
00:04:20,400 --> 00:04:23,266
There are many different cost functions
that you can use.

83
00:04:23,266 --> 00:04:25,666
This is probably the most commonly
used cost function.

84
00:04:25,666 --> 00:04:30,466
And why it is specifically this function
that we use,

85
00:04:30,466 --> 00:04:34,200
we'll find out further down when we're
talking about gradient descent.

86
00:04:34,200 --> 00:04:37,666
But for now we're just going to agree
that this is the cost function.

87
00:04:37,666 --> 00:04:40,400
And basically
what the cost function is telling us is,

88
00:04:40,400 --> 00:04:43,900
what is the error that you have
in your prediction?

89
00:04:44,133 --> 00:04:47,833
And our goal is to minimize
the cost function, because

90
00:04:47,833 --> 00:04:51,300
the lower the cost function, the closer
y hat is to y.

91
00:04:52,033 --> 00:04:52,300
Okay.

92
00:04:52,300 --> 00:04:54,333
So as long as we agree on that,
let's proceed.

93
00:04:54,333 --> 00:04:58,300
So basically,
there's our cost function.

94
00:04:58,300 --> 00:05:02,833
And from here,
once we've compared,

95
00:05:02,966 --> 00:05:08,666
we're going to feed this information
back into the neural network.

96
00:05:08,833 --> 00:05:09,600
So there we go.

97
00:05:09,600 --> 00:05:12,600
There's the information
going back into the neural network,

98
00:05:12,866 --> 00:05:15,600
and it goes to the weights,
and the weights get updated.

99
00:05:15,600 --> 00:05:17,966
Basically,
the only thing that we have control of

100
00:05:17,966 --> 00:05:23,033
in this very simple neural network
are the weights w1, w2, all the way to wm.

101
00:05:23,866 --> 00:05:26,700
And
our goal is to minimize the cost function.

102
00:05:26,700 --> 00:05:29,366
So all we can do is update the weights.

103
00:05:29,366 --> 00:05:30,900
So we update the weights.

104
00:05:30,900 --> 00:05:36,133
We tweak them a little bit, and how exactly
we'll find out further down.

105
00:05:36,133 --> 00:05:40,000
But for now we agree that we update
the weights and then we continue.

106
00:05:40,000 --> 00:05:44,700
But here I've put up this
screenshot of the data

107
00:05:44,700 --> 00:05:49,766
just to make one point very clear:
that right now, throughout this whole

108
00:05:49,766 --> 00:05:53,900
experiment, everything we're doing right
now, we're dealing with just the one row.

109
00:05:53,900 --> 00:05:57,800
So we have a data
set of one row where,

110
00:05:58,166 --> 00:06:03,733
for instance,
the variable

111
00:06:03,733 --> 00:06:08,033
that we're predicting is what
result you're going to get on an exam.

112
00:06:08,266 --> 00:06:11,433
And the independent variables
that we have are how many hours

113
00:06:11,433 --> 00:06:13,833
did you study,
how many hours did you sleep,

114
00:06:13,833 --> 00:06:16,700
and what did you get on the quiz
in the mid semester.

115
00:06:16,700 --> 00:06:18,866
So in the middle of the semester
there was a quiz.

116
00:06:18,866 --> 00:06:19,800
What percentage did you get there?

117
00:06:19,800 --> 00:06:23,933
So based on those variables
we're trying to predict what score

118
00:06:23,933 --> 00:06:24,600
you'll get for the exam.

119
00:06:24,600 --> 00:06:26,700
And in the exam you got 93%.

120
00:06:26,700 --> 00:06:29,466
That's the actual value. So that's y.

121
00:06:29,466 --> 00:06:32,200
So we feed these three values

122
00:06:32,200 --> 00:06:35,200
into our neural network again
for the second time now,

123
00:06:35,533 --> 00:06:38,733
and then we're going
to be comparing the result to Y.

124
00:06:39,000 --> 00:06:40,600
So let's see how this works.

125
00:06:40,600 --> 00:06:42,666
We feed these values
into the neural network.

126
00:06:43,700 --> 00:06:46,600
Everything gets adjusted, and the weights
get adjusted.

127
00:06:46,600 --> 00:06:50,433
So as you can see,
we're going to feed the values in again.

128
00:06:50,433 --> 00:06:53,100
The point here is
that we're feeding in these same values.

129
00:06:53,100 --> 00:06:54,400
So we only have one row.

130
00:06:54,400 --> 00:06:56,300
We're training on one row.

131
00:06:56,300 --> 00:06:59,300
This is because this is just a very simple
basic example.

132
00:06:59,466 --> 00:07:01,633
Then we'll see what happens
when there are more rows.

133
00:07:01,633 --> 00:07:06,066
So again we feed this row
in, and our cost function gets adjusted.

134
00:07:06,066 --> 00:07:10,433
As you can see everything happens along
those lines again.

135
00:07:10,433 --> 00:07:13,600
So as you can see,
every time our y hat is changing

136
00:07:13,600 --> 00:07:16,366
because we've tweaked the weights,
our y hat is changing.

137
00:07:16,366 --> 00:07:18,266
Our cost function is changing.
Let's have a look again.

138
00:07:18,266 --> 00:07:21,366
So we feed those in. Y hat is changing.

139
00:07:21,366 --> 00:07:22,733
Cost function is changing.

140
00:07:22,733 --> 00:07:25,266
We feed the information
back to the weights

141
00:07:25,266 --> 00:07:26,933
so that the weights get adjusted. Again.

142
00:07:26,933 --> 00:07:28,566
We feed in the same values.

143
00:07:28,566 --> 00:07:32,700
Every time, everything gets adjusted and
goes back to the weights, and one more time

144
00:07:33,033 --> 00:07:34,333
we feed in. Okay,

145
00:07:35,600 --> 00:07:36,600
and another time.

146
00:07:36,600 --> 00:07:39,600
So we've adjusted
the weights, we feed in the information.

147
00:07:40,066 --> 00:07:41,300
And there we go.

148
00:07:41,300 --> 00:07:45,633
So now, this time, y hat is equal
to y and the cost function is zero.

149
00:07:45,833 --> 00:07:48,333
Usually we won't get a cost
function equal to zero.

150
00:07:48,333 --> 00:07:50,700
But this is a very simple example.

151
00:07:50,700 --> 00:07:54,600
So hopefully all that made sense.
Every time, we feed in exactly

152
00:07:54,600 --> 00:07:56,166
that same row,

153
00:07:56,166 --> 00:07:59,166
because in this case
we're just dealing with that one row,

154
00:07:59,700 --> 00:08:04,400
into our neural network, where
the values get

155
00:08:04,433 --> 00:08:06,900
multiplied by the weights and
the activation function is applied.

156
00:08:06,900 --> 00:08:09,900
We get y hat, and y hat is compared to y.

157
00:08:10,200 --> 00:08:12,233
Then we see how the cost function
has changed.

158
00:08:12,233 --> 00:08:13,566
We feed that information

159
00:08:13,566 --> 00:08:16,800
back into the neural network
and then just adjust the weights again.

160
00:08:17,700 --> 00:08:21,066
And then we repeat the same process again
with the same exact row.

161
00:08:21,266 --> 00:08:23,333
We're trying to minimize
that cost function.
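
The loop we just walked through can be sketched in a few lines of Python. Keep in mind that the lecture has not yet explained how exactly the weights are tweaked, so the update rule here (plain gradient descent on the half squared error), the identity activation, the learning rate, and the input numbers are all assumptions made so that the example runs. Only the target of 0.93 comes from the 93% exam result above.

```python
def train_one_row(x, y, weights, learning_rate=0.1, steps=200):
    """Feed the same row again and again, nudging the weights each time."""
    for _ in range(steps):
        # Forward pass: weighted sum with an identity activation (assumption).
        y_hat = sum(w * xi for w, xi in zip(weights, x))
        error = y_hat - y
        # The gradient of 0.5 * (y_hat - y)**2 with respect to w_i is error * x_i.
        weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
    return weights


# Hypothetical scaled inputs: study hours, sleep hours, quiz result.
x = [0.6, 0.8, 0.75]
y = 0.93
weights = train_one_row(x, y, [0.1, 0.1, 0.1])
y_hat = sum(w * xi for w, xi in zip(weights, x))
print(y_hat, 0.5 * (y_hat - y) ** 2)
```

With a single row, the cost can be driven essentially to zero, which matches what we saw on the slides.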

162
00:08:23,333 --> 00:08:26,566
So up until now we've been dealing with
just that one row.

163
00:08:26,866 --> 00:08:29,366
Let's see what happens
when you have multiple rows.

164
00:08:29,366 --> 00:08:31,200
So here's the full data set.

165
00:08:31,200 --> 00:08:35,266
We have eight rows.

166
00:08:35,266 --> 00:08:39,133
Say these are different students
taking the same exam:

167
00:08:39,133 --> 00:08:43,166
how many hours they studied,
how many hours they slept before the exam,

168
00:08:43,166 --> 00:08:47,033
what they got on the quiz,
and their final result on the test.

169
00:08:47,366 --> 00:08:51,800
And as you can see here on the left,
I've got eight of these perceptrons.

170
00:08:51,800 --> 00:08:54,666
Actually,
they are all the same perceptron.

171
00:08:54,666 --> 00:08:55,900
So this is also important to understand.

172
00:08:55,900 --> 00:09:01,133
I just duplicated it
eight times just so that we can

173
00:09:01,733 --> 00:09:04,200
understand it conceptually.

174
00:09:04,200 --> 00:09:06,666
But the important thing
here is that it's the same neural network.

175
00:09:06,666 --> 00:09:10,300
We're going to be feeding these
into one and the same neural network.

176
00:09:10,300 --> 00:09:11,566
So let's go. Let's get started.

177
00:09:11,566 --> 00:09:14,566
So one epoch,

178
00:09:14,700 --> 00:09:18,166
as you'll hear
mentioned, one epoch is

179
00:09:18,166 --> 00:09:22,233
when we go through our whole data set
and we train our

180
00:09:22,600 --> 00:09:26,233
neural network on all of these rows.

181
00:09:26,233 --> 00:09:27,333
So let's go, let's get started.

182
00:09:27,333 --> 00:09:31,200
So there's our first row,
and there's y hat for the first row.

183
00:09:32,400 --> 00:09:35,133
There's the second
row, there's y hat for the second row.

184
00:09:35,133 --> 00:09:39,266
So again it's being fed
into the same neural network every time.

185
00:09:39,300 --> 00:09:41,100
I've just copied them several times.

186
00:09:41,100 --> 00:09:44,100
So we can visually see how
this is happening.

187
00:09:44,933 --> 00:09:47,733
Then it's happening again.

188
00:09:47,733 --> 00:09:50,533
That's the third row. Fourth row.

189
00:09:50,533 --> 00:09:53,533
There's our y hat for the fourth row
and so on basically.

190
00:09:53,666 --> 00:09:56,500
Then we get values in the same way
for the remaining four rows as well.

191
00:09:56,500 --> 00:10:02,400
So every time we just feed in a row into 
our neural network, we get a value.

192
00:10:02,800 --> 00:10:06,900
Then we compare to the actual values.

193
00:10:06,900 --> 00:10:08,600
So there are the actual values.

194
00:10:08,600 --> 00:10:11,500
So for every single row
we have an actual value.

195
00:10:11,500 --> 00:10:14,666
And now based on all of these differences

196
00:10:14,666 --> 00:10:18,233
between y hat
and y, we can calculate the cost function

197
00:10:18,233 --> 00:10:22,200
which is the sum of all of those

198
00:10:22,200 --> 00:10:25,366
squared differences between y hat and y.

199
00:10:25,366 --> 00:10:27,066
And all of that is halved.

200
00:10:28,100 --> 00:10:30,200
And there's our cost function.
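
For several rows, the same idea can be sketched in Python as below: sum the squared differences over all rows, then halve the total. The function name `total_cost` and the sample numbers are hypothetical.

```python
def total_cost(y_hats, ys):
    """Half the sum of squared differences between outputs and actual values."""
    return 0.5 * sum((y_hat - y) ** 2 for y_hat, y in zip(y_hats, ys))


# Two hypothetical rows: the predictions are off by 1.0 and by 2.0.
print(total_cost([1.0, 2.0], [0.0, 0.0]))  # 0.5 * (1 + 4) = 2.5
```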

201
00:10:30,200 --> 00:10:33,833
And basically now what we do
after we have the full cost function,

202
00:10:34,166 --> 00:10:39,433
we go back and we update the weights,
we update w1, w2, w3.

203
00:10:39,433 --> 00:10:42,433
And the important thing to remember here
is that all of these

204
00:10:42,600 --> 00:10:47,266
perceptrons, all of these neural networks,
are actually one neural network.

205
00:10:47,266 --> 00:10:49,500
So there's not eight of them,
there's just one.

206
00:10:49,500 --> 00:10:52,766
And when we update the weights
we're going to update the weights

207
00:10:53,100 --> 00:10:54,400
in that one neural network.

208
00:10:54,400 --> 00:10:57,466
So basically the weights are going
to be the same for all of the rows.

209
00:10:57,766 --> 00:11:00,433
So it's not the case
that every row has its own weights.

210
00:11:00,433 --> 00:11:02,733
Now all the rows share the weights.

211
00:11:02,733 --> 00:11:06,300
And so that's why we 
looked at the cost function,

212
00:11:06,300 --> 00:11:09,900
which is the sum of the 
squared differences.

213
00:11:10,200 --> 00:11:11,866
And then we updated the weights.

214
00:11:11,866 --> 00:11:15,166
And now from here
that was just one iteration.

215
00:11:15,166 --> 00:11:16,400
Next we're going to

216
00:11:17,533 --> 00:11:18,933
run this whole thing again.

217
00:11:18,933 --> 00:11:23,433
We're going to, feed every single row
into the neural network,

218
00:11:23,600 --> 00:11:26,300
find out our cost function
and do this whole process again.

219
00:11:26,300 --> 00:11:30,566
So just as we saw previously,
where we had just one row

220
00:11:30,566 --> 00:11:33,533
and we were doing everything again
and again, again, again, same thing here.

221
00:11:33,533 --> 00:11:37,500
But now we're going to be doing it
for eight rows, or 800 rows, or a thousand rows,

222
00:11:37,500 --> 00:11:40,500
however many rows
you have in your data set.

223
00:11:40,666 --> 00:11:43,666
You do this process
and then you calculate the cost function.

224
00:11:44,100 --> 00:11:46,700
And the goal here is to minimize

225
00:11:46,700 --> 00:11:50,766
the cost function,
and as soon as you've found

226
00:11:50,766 --> 00:11:54,300
the minimum of the cost function,
that is your final neural network.

227
00:11:54,300 --> 00:11:57,833
That means your weights have been adjusted
and you have

228
00:11:58,433 --> 00:12:01,800
found the optimal

229
00:12:02,766 --> 00:12:04,400
weights for

230
00:12:04,400 --> 00:12:07,566
this data set that you're
training on, and you're ready

231
00:12:07,566 --> 00:12:10,566
to proceed to the testing phase
or to the application phase.

232
00:12:11,400 --> 00:12:14,400
And this whole process is called
backpropagation.
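
Putting the pieces together, here is a rough Python sketch of training over several epochs with one shared set of weights. As before, the update rule (gradient descent on the half sum of squared errors), the identity activation, the learning rate, and the eight rows of data are all assumptions made for illustration; the real data set from the slides is not reproduced here.

```python
def predict(weights, row):
    """Forward pass with an identity activation (assumption)."""
    return sum(w * x for w, x in zip(weights, row))


def train(rows, targets, weights, learning_rate=0.05, epochs=500):
    """Each epoch: feed every row, compute one cost, update the shared weights."""
    history = []
    for _ in range(epochs):
        errors = [predict(weights, row) - y for row, y in zip(rows, targets)]
        history.append(0.5 * sum(e * e for e in errors))
        # Every row contributes to the gradient of the same shared weights.
        gradients = [
            sum(e * row[i] for e, row in zip(errors, rows))
            for i in range(len(weights))
        ]
        weights = [w - learning_rate * g for w, g in zip(weights, gradients)]
    return weights, history


# Hypothetical scaled rows: study hours, sleep hours, quiz result.
rows = [
    [0.50, 0.70, 0.60], [0.80, 0.60, 0.75], [0.30, 0.90, 0.40], [0.90, 0.50, 0.85],
    [0.60, 0.80, 0.70], [0.40, 0.60, 0.55], [0.70, 0.70, 0.80], [1.00, 0.40, 0.90],
]
targets = [0.65, 0.82, 0.48, 0.93, 0.76, 0.55, 0.84, 0.95]

weights, history = train(rows, targets, [0.0, 0.0, 0.0])
print(history[0], history[-1])
```

There is only one list of weights in this sketch, and all eight rows are used to update it, which is the point being made about the eight copies of the perceptron on the slide.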

233
00:12:14,833 --> 00:12:20,366
So some additional reading that you might
want to do for the cost function.

234
00:12:20,366 --> 00:12:24,766
And I know we just talked about one
and there are many different ones.

235
00:12:24,766 --> 00:12:28,200
A good article is located
on Cross Validated.

236
00:12:28,666 --> 00:12:29,766
It's called "A list of cost

237
00:12:29,766 --> 00:12:32,766
functions used in neural networks,
alongside applications".

238
00:12:32,933 --> 00:12:35,700
So the URL is there,
but you can just Google

239
00:12:35,700 --> 00:12:38,933
for that exact search term
or a search phrase.

240
00:12:38,933 --> 00:12:41,933
And this one will be
the first one that pops up.

241
00:12:42,000 --> 00:12:45,133
It's actually got some good
examples and applications

242
00:12:45,666 --> 00:12:48,300
or use cases for different cost functions.

243
00:12:48,300 --> 00:12:50,200
So if you'd like
to learn more about cost functions

244
00:12:50,200 --> 00:12:51,866
check out this article.

245
00:12:51,866 --> 00:12:54,266
And on that note,
I hope you enjoyed today's tutorial.

246
00:12:54,266 --> 00:12:55,933
I look forward to seeing you next time.

247
00:12:55,933 --> 00:12:57,966
Until then, enjoy deep learning.