1
00:00:00,366 --> 00:00:02,700
Hello and welcome back to the course
on Deep Learning.

2
00:00:02,700 --> 00:00:06,800
Today we're kicking off convolutional
neural networks. It's going to be exciting.

3
00:00:06,800 --> 00:00:08,466
Let's dive straight into it.

4
00:00:08,466 --> 00:00:10,733
We're going to start off with an image.

5
00:00:10,733 --> 00:00:13,466
What do you see
when you look at this image?

6
00:00:13,466 --> 00:00:17,366
Do you see a person looking at you or
do you see a person looking to the right?

7
00:00:18,033 --> 00:00:21,566
You can see that your brain is
struggling,

8
00:00:21,566 --> 00:00:25,700
is struggling to adjust if you look
to the right side of the image.

9
00:00:25,733 --> 00:00:27,333
Just look at the right
border of the image.

10
00:00:27,333 --> 00:00:30,200
You'll see a person looking to the right.
If you look at the left

11
00:00:30,200 --> 00:00:33,200
border of the image,
you'll see a person looking at you.

12
00:00:33,600 --> 00:00:36,666
And this just proves that

13
00:00:37,133 --> 00:00:42,066
what our brain is looking for
when we see things is features.

14
00:00:42,066 --> 00:00:44,500
Depending on the features that it sees,
depending

15
00:00:44,500 --> 00:00:48,333
on the features that you process,
you categorize things in certain ways.

16
00:00:48,566 --> 00:00:51,566
So when you look on the right side
of the image,

17
00:00:51,666 --> 00:00:53,933
you see certain features of a person
looking to the right

18
00:00:53,933 --> 00:00:56,933
because they're closer
to your center of focus,

19
00:00:57,100 --> 00:01:00,500
and therefore your brain classifies
that as a person looking to the right.

20
00:01:00,800 --> 00:01:04,200
When you look to the left
side of the image, you see more features

21
00:01:04,200 --> 00:01:09,000
of a person looking at you and therefore
your brain classifies it as such.

22
00:01:09,400 --> 00:01:11,100
So let's have a look at another one.

23
00:01:11,100 --> 00:01:12,733
This is a very famous image.

24
00:01:12,733 --> 00:01:15,600
You probably have already seen
it, but what do you see here?

25
00:01:16,666 --> 00:01:19,666
So some people will say that they see a

26
00:01:19,766 --> 00:01:23,366
young lady wearing a dress looking away.

27
00:01:23,700 --> 00:01:26,833
Some people will say they see an old lady

28
00:01:27,100 --> 00:01:30,100
wearing a scarf on her head, looking down.

29
00:01:30,100 --> 00:01:33,433
So I'm going to point these features out
and you'll see that it'll become

30
00:01:33,566 --> 00:01:34,200
very obvious.

31
00:01:34,200 --> 00:01:37,400
So this is the face of the young lady
looking away.

32
00:01:37,400 --> 00:01:40,300
She's looking into the distance.
That's her coat.

33
00:01:40,300 --> 00:01:41,166
That's her hair.

34
00:01:41,166 --> 00:01:43,500
That's her little feather in her hair.

35
00:01:43,500 --> 00:01:48,900
And on the other hand, this is the head
of the old lady looking down.

36
00:01:48,900 --> 00:01:49,766
That's her nose.

37
00:01:49,766 --> 00:01:52,200
That's her mouth, that's her chin.

38
00:01:52,200 --> 00:01:53,433
That's the scarf on her head.

39
00:01:53,433 --> 00:01:55,600
And she's looking down.

40
00:01:55,600 --> 00:01:59,266
So, as you can see, two in one,
and depending on which features

41
00:01:59,266 --> 00:02:02,266
your brain picks up,
it will switch between

42
00:02:02,433 --> 00:02:05,866
classifying
the image as one or the other.

43
00:02:06,733 --> 00:02:09,666
The oldest one of these illusions,

44
00:02:09,666 --> 00:02:13,500
recorded in the printed work, is this one.

45
00:02:13,800 --> 00:02:15,133
It's the duck or the rabbit.

46
00:02:15,133 --> 00:02:16,866
So is this a duck or is this a rabbit?

47
00:02:16,866 --> 00:02:18,266
Another example.

48
00:02:18,266 --> 00:02:22,400
And now I'm going to show you an image.
Just for a second,

49
00:02:22,433 --> 00:02:25,533
just look at it and see what
emotions

50
00:02:25,533 --> 00:02:28,533
or what kind of
visual experience you go through.

51
00:02:28,966 --> 00:02:31,000
So what do you see?

52
00:02:31,000 --> 00:02:35,433
Do you feel a bit, not dizzy,
but a little bit dazzled?

53
00:02:35,566 --> 00:02:38,700
Like your brain is trying to
understand what it is.

54
00:02:39,000 --> 00:02:40,133
It's trying to.

55
00:02:40,133 --> 00:02:44,200
It's jumping
between her eyes, up and down. And

56
00:02:45,166 --> 00:02:46,166
this is a classic

57
00:02:46,166 --> 00:02:49,833
example of when there are certain features

58
00:02:49,833 --> 00:02:53,400
where it could be this, it could be that,
but your brain cannot decide.

59
00:02:54,000 --> 00:02:58,566
And because both seem plausible
and, yeah.

60
00:02:58,566 --> 00:03:03,066
So basically, all of these examples
illustrate to us how the brain works, that

61
00:03:03,266 --> 00:03:08,266
it processes certain features on an image
or on whatever you see in, in real life.

62
00:03:08,633 --> 00:03:10,733
And it classifies that as such.

63
00:03:10,733 --> 00:03:15,200
And you've probably been in situations
when you look over your shoulder quickly

64
00:03:15,200 --> 00:03:18,833
and you see something
and you think it's, I don't know, like a

65
00:03:19,966 --> 00:03:23,833
ball, but it turns out to be a cat,
or you think it's a car,

66
00:03:23,833 --> 00:03:25,433
but it turns out to be a shadow
and things like that.

67
00:03:25,433 --> 00:03:28,233
That's because you don't have enough time
to process those features,

68
00:03:28,233 --> 00:03:31,100
or you don't have enough features
to classify things as such.

69
00:03:31,100 --> 00:03:34,433
And this is for me, it's

70
00:03:34,466 --> 00:03:37,466
this is very interesting
because what we're going to be doing with,

71
00:03:37,700 --> 00:03:40,700
neural networks, with convolutional
neural networks is very similar.

72
00:03:40,700 --> 00:03:44,466
And you'll find that the way
that computers are going to be processing

73
00:03:44,466 --> 00:03:48,100
images is going to be extremely similar
to the way we are processing images.

74
00:03:48,100 --> 00:03:52,200
So it's very valuable to understand
and just kind of remember these things

75
00:03:52,200 --> 00:03:53,500
that this is how we do it.

76
00:03:53,500 --> 00:03:56,500
And I'm going to take this lady off
your screens because it's

77
00:03:56,533 --> 00:03:58,466
she's probably already
freaking you out by now.

78
00:03:58,466 --> 00:04:00,866
So here's something different.

79
00:04:00,866 --> 00:04:02,000
Here's an experiment.

80
00:04:02,000 --> 00:04:06,900
An experiment done on computers,
on convolutional neural networks.

81
00:04:06,900 --> 00:04:10,500
So we're slowly moving
now from humans to computers.

82
00:04:11,233 --> 00:04:14,233
And this slide
is from a talk by Geoffrey Hinton.

83
00:04:14,433 --> 00:04:17,233
and here you have

84
00:04:17,233 --> 00:04:19,866
basically it describes
an experiment that he had done

85
00:04:19,866 --> 00:04:23,600
on some convolutional neural networks
that he had trained up.

86
00:04:24,300 --> 00:04:27,333
So here you see three images,
and we're going to go through them

87
00:04:27,333 --> 00:04:30,000
left to right
and see how you would classify them,

88
00:04:30,000 --> 00:04:31,700
and then see how the computer
classified them.

89
00:04:31,700 --> 00:04:34,033
So on the left, what do you think this is?

90
00:04:35,333 --> 00:04:37,600
You probably
said cheetah, and you would be right.

91
00:04:37,600 --> 00:04:38,766
And this is what the computer said.

92
00:04:38,766 --> 00:04:42,600
So and right away, right off the bat we're
going to learn how to read these images.

93
00:04:42,600 --> 00:04:47,033
Because, if you're going to go deep
into convolutional

94
00:04:47,033 --> 00:04:50,466
neural networks,
no pun intended, if you're going to,

95
00:04:51,066 --> 00:04:53,900
start learning more and more about them
and using them, you'll see a lot of these.

96
00:04:53,900 --> 00:04:57,000
So and I've actually seen people
read them incorrectly.

97
00:04:57,000 --> 00:05:01,333
So here at the top, 
cheetah is what it actually is.

98
00:05:01,333 --> 00:05:03,900
So that's the actual correct label

99
00:05:03,900 --> 00:05:04,800
of the image.

100
00:05:04,800 --> 00:05:09,533
That's what the label of the image is,
regardless of any processing and

101
00:05:09,833 --> 00:05:11,133
the computer vision.

102
00:05:11,133 --> 00:05:13,800
and then here are the guesses.

103
00:05:13,800 --> 00:05:16,800
The top four, or sometimes five, guesses of the

104
00:05:16,933 --> 00:05:20,533
algorithm,
and they're given with a probability.

105
00:05:20,533 --> 00:05:23,900
So the computer said 
or the neural network said

106
00:05:23,900 --> 00:05:27,133
cheetah, leopard, snow leopard,
or Egyptian cat; it can be one of the four.

107
00:05:27,400 --> 00:05:29,033
And cheetah has the highest vote.

108
00:05:29,033 --> 00:05:32,766
And throughout this part of the course,
you will understand what these votes mean.

109
00:05:32,766 --> 00:05:34,666
And how they are derived.

110
00:05:34,666 --> 00:05:36,400
But for now, it's pretty intuitive. Right?

111
00:05:36,400 --> 00:05:38,233
So, it's a cheetah in reality.

112
00:05:38,233 --> 00:05:40,566
And the neural network guessed right.

113
00:05:40,566 --> 00:05:43,433
It said, with a high probability,
about 95 or 99%:

114
00:05:43,433 --> 00:05:44,033
It's a cheetah.

115
00:05:45,300 --> 00:05:47,366
Then the second one, what do you think?

116
00:05:47,366 --> 00:05:50,766
That is a bullet train,

117
00:05:51,133 --> 00:05:54,633
and the neural network
was able to distinguish between

118
00:05:54,633 --> 00:05:57,933
bullet train, passenger
car, subway train, electric locomotive.

119
00:05:57,933 --> 00:06:00,366
Those are the top choices. Of course,
it had many more options.

120
00:06:00,366 --> 00:06:03,600
These neural networks
learn to distinguish from

121
00:06:04,200 --> 00:06:08,666
not just four categories, but dozens,
even thousands of categories at the same time.

122
00:06:08,666 --> 00:06:10,800
So those are the four options
that it picked.

123
00:06:10,800 --> 00:06:12,700
And so that's a bullet train.

124
00:06:12,700 --> 00:06:15,700
So what do you think the last one is?

125
00:06:16,266 --> 00:06:18,466
Well, there are a couple of options there.

126
00:06:18,466 --> 00:06:21,433
It's not very clear
what it is. It could be a frying pan.

127
00:06:21,433 --> 00:06:22,733
It could be a magnifying glass.

128
00:06:22,733 --> 00:06:26,966
It could be
even maybe a pair of scissors.

129
00:06:26,966 --> 00:06:29,166
Some might say, well,
the neural network said

130
00:06:29,166 --> 00:06:32,333
it was a pair of scissors,
but you can see how you can go wrong here.

131
00:06:32,433 --> 00:06:35,366
First of all, it's not a very clear image.

132
00:06:35,366 --> 00:06:38,233
And also you can see that the,

133
00:06:38,233 --> 00:06:41,700
probabilities are not as clear here.

134
00:06:41,700 --> 00:06:46,200
So the neural network was a bit confused,
a bit indecisive, just as we are.

135
00:06:46,200 --> 00:06:49,600
So, it said scissors with the highest
probability, but then it had hand glass,

136
00:06:49,600 --> 00:06:53,700
which it actually was,
not so far behind in second place.

137
00:06:53,700 --> 00:06:55,766
And frying pan, stethoscope.

138
00:06:55,766 --> 00:06:58,500
So basically, here you can see that

139
00:06:58,500 --> 00:07:01,533
scissors was its first guess,
but the correct option was number two.

140
00:07:01,533 --> 00:07:03,133
And that's why it's highlighted in red.

141
00:07:03,133 --> 00:07:03,900
So there we go.

142
00:07:03,900 --> 00:07:06,933
That's what neural networks
are already capable of.

143
00:07:06,933 --> 00:07:08,800
And this is actually quite an old slide.

144
00:07:08,800 --> 00:07:10,500
This was several years ago.

145
00:07:10,500 --> 00:07:11,733
Now they're even better.

146
00:07:11,733 --> 00:07:13,333
And you will see that from

147
00:07:13,333 --> 00:07:16,500
the practical application that you will
be coding together with Hadelin.

148
00:07:16,800 --> 00:07:20,100
But now let's try to understand a bit
better what Convnets or convolutional

149
00:07:20,100 --> 00:07:23,533
neural networks actually are and
why they are gaining so much popularity.

150
00:07:23,800 --> 00:07:25,666
And they actually are gaining popularity.

151
00:07:25,666 --> 00:07:30,833
So you can see here a Google
Trends comparison I did just yesterday.

152
00:07:31,666 --> 00:07:35,600
Here you can see that
convolutional neural networks

153
00:07:35,600 --> 00:07:39,333
are even taking over
artificial neural networks.

154
00:07:39,333 --> 00:07:43,100
So, a massive increase.

155
00:07:43,100 --> 00:07:47,700
And this is just going to keep going that way
because it is a very important field

156
00:07:47,966 --> 00:07:52,466
that is where all the things happen,
such as self-driving cars.

157
00:07:52,466 --> 00:07:54,066
How do they recognize

158
00:07:54,066 --> 00:07:57,833
people on the road, how do they recognize
stop signs and things like that?

159
00:07:57,833 --> 00:08:01,033
How is Facebook

160
00:08:01,066 --> 00:08:04,833
able to tag images or people in images?

161
00:08:04,833 --> 00:08:08,666
And not only that. Remember,
previously, years ago,

162
00:08:08,700 --> 00:08:12,600
you had to tag people yourself,
then it would recognize faces.

163
00:08:12,600 --> 00:08:14,166
You had to add them and add the names.

164
00:08:14,166 --> 00:08:18,000
And now it just recognizes the faces
and adds the names at the same time.

165
00:08:18,433 --> 00:08:23,400
Well, that is what convolutional
neural networks are capable of.

166
00:08:23,666 --> 00:08:29,433
And speaking of Facebook, 
if Geoffrey Hinton is the godfather of

167
00:08:30,266 --> 00:08:33,566
artificial neural networks
and deep learning, then Yann

168
00:08:33,566 --> 00:08:38,733
LeCun is the grandfather
of convolutional neural networks.

169
00:08:39,000 --> 00:08:42,033
Yann LeCun is a student of Geoffrey
Hinton's.

170
00:08:42,466 --> 00:08:45,466
And, in fact, here
you can see them together.

171
00:08:45,566 --> 00:08:48,566
And Geoffrey Hinton now is

172
00:08:48,566 --> 00:08:51,266
pioneering deep learning at Google.

173
00:08:51,266 --> 00:08:53,333
Yann LeCun is the director of Facebook

174
00:08:53,333 --> 00:08:56,500
Artificial Intelligence Research
and also a professor at NYU.

175
00:08:56,833 --> 00:09:00,000
So,
I love this part of the course.

176
00:09:00,000 --> 00:09:03,766
Slowly we're building up
these names, or this

177
00:09:04,266 --> 00:09:08,966
kind of picture of the profiles
of the people who are driving this field.

178
00:09:09,266 --> 00:09:14,300
And in the next couple of parts
we'll get to know a few more,

179
00:09:14,300 --> 00:09:17,366
and we'll have this whole mafia,
as they call themselves,

180
00:09:17,366 --> 00:09:21,000
or Yann LeCun calls them mafia
or conspiracy of deep learning.

181
00:09:21,000 --> 00:09:24,000
And you'll learn a bit more
about how this whole field develops.

182
00:09:24,400 --> 00:09:24,600
Yeah.

183
00:09:24,600 --> 00:09:27,300
These are just
some great, great people.

184
00:09:27,300 --> 00:09:31,766
And so Yann LeCun, back
in the 80s and 90s, made

185
00:09:32,000 --> 00:09:36,166
significant contributions to the field of
convolutional neural networks.

186
00:09:36,166 --> 00:09:40,900
And as we will see throughout this
course, he has been able

187
00:09:40,900 --> 00:09:46,100
to develop, or help the world
develop, something so extremely powerful.

188
00:09:46,466 --> 00:09:50,800
So moving on to
how convolutional neural networks work.

189
00:09:51,366 --> 00:09:53,233
You have an input. It's very simple.

190
00:09:53,233 --> 00:09:54,200
It's very straightforward.

191
00:09:54,200 --> 00:09:58,000
So you have an input image, it goes
through the convolutional neural network

192
00:09:58,233 --> 00:09:59,700
and you have an output label.

193
00:09:59,700 --> 00:10:02,700
So it classifies that image as something

194
00:10:03,233 --> 00:10:06,300
like as a cheetah
or a bullet train or something else.

195
00:10:06,600 --> 00:10:10,200
Now, going into a bit more
detail.

196
00:10:10,433 --> 00:10:14,400
For instance, after
a neural network has been trained up

197
00:10:14,900 --> 00:10:18,166
on certain images, on certain

198
00:10:18,166 --> 00:10:22,833
classified images or categorized images
that have been categorized prior,

199
00:10:23,100 --> 00:10:26,100
after that, you can give it,
let's say a neural network

200
00:10:26,100 --> 00:10:30,066
has been trained up to recognize
facial expressions and emotions.

201
00:10:30,366 --> 00:10:34,966
You can give it the face
of a smiling person, not just

202
00:10:35,133 --> 00:10:39,166
a drawing of a face like this,
but the actual face of a person smiling.

203
00:10:39,266 --> 00:10:41,466
And it'll tell you
that that person is happy.

204
00:10:41,466 --> 00:10:44,733
And you can give it the face of a person
that's frowning.

205
00:10:44,733 --> 00:10:47,166
It will tell you that the person is sad.

206
00:10:47,166 --> 00:10:48,466
It can recognize these emotions.

207
00:10:48,466 --> 00:10:48,966
And as you can see,

208
00:10:48,966 --> 00:10:53,200
that's already very powerful in terms
of so many different applications.

209
00:10:53,200 --> 00:10:57,500
Just this one, example
you can think of right away.

210
00:10:57,500 --> 00:11:00,433
And, and in both cases,
it'll give you a probability.

211
00:11:00,433 --> 00:11:04,866
So it won't say with 100% certainty that
the person's happy or sad;

212
00:11:04,866 --> 00:11:11,700
it'll be 99 or 98, or maybe 80%
when it's unclear what's going on.

213
00:11:11,700 --> 00:11:14,700
And just like we are, right?
Sometimes we can mistake

214
00:11:15,033 --> 00:11:16,500
things for what they're not.

215
00:11:16,500 --> 00:11:17,933
Or sometimes we can.

216
00:11:17,933 --> 00:11:22,200
Sometimes it's just not clear
if the person is smiling or frowning

217
00:11:22,200 --> 00:11:25,200
or if it's, 
if it's a dog or a cat or if it's,

218
00:11:25,600 --> 00:11:28,166
a train or a bullet train.

219
00:11:28,166 --> 00:11:28,433
Right.

220
00:11:28,433 --> 00:11:30,933
Sometimes
we haven't seen enough features,

221
00:11:30,933 --> 00:11:34,866
and it all comes down to features,
because that's how we process

222
00:11:34,866 --> 00:11:38,800
visual information, as we saw
at the start of this tutorial.

223
00:11:39,100 --> 00:11:40,966
But how does a neural network,

224
00:11:40,966 --> 00:11:44,033
how is a neural network able
to recognize these features?

225
00:11:44,033 --> 00:11:47,500
Well,
it all starts at a very basic level.

226
00:11:48,000 --> 00:11:50,733
Let's say you have an image,
you have two images.

227
00:11:50,733 --> 00:11:53,733
One is a black and white image of two
by two pixels,

228
00:11:53,900 --> 00:11:56,366
and one is a colored image of two
by two pixels.

229
00:11:56,366 --> 00:11:59,433
Well, neural networks
leverage the fact that,

230
00:11:59,900 --> 00:12:04,600
the black and white
image is a two-dimensional array.

231
00:12:04,600 --> 00:12:05,700
So the way we see it

232
00:12:05,700 --> 00:12:09,600
right now on the left
is just the visual representation, right?

233
00:12:09,600 --> 00:12:11,100
So it's some kind of picture.

234
00:12:11,100 --> 00:12:13,933
And for simplicity's sake, it's
just a two by two picture.

235
00:12:13,933 --> 00:12:16,866
But in computer terms it's
actually a two dimensional array

236
00:12:16,866 --> 00:12:21,866
with every single one of those
pixels having a value between 0 and 255.

237
00:12:22,200 --> 00:12:27,566
So that's eight bits of information:
two to the power of eight is 256.

238
00:12:27,566 --> 00:12:30,266
So therefore the values are from 0 to 255.

239
00:12:30,266 --> 00:12:32,100
And that's intensity of the color.

240
00:12:32,100 --> 00:12:33,433
And in this case the color white.

241
00:12:33,433 --> 00:12:38,533
So zero will be a completely black pixel, and
255 will be a completely white pixel.

242
00:12:38,533 --> 00:12:44,300
And between them you have the grayscale
range of possible options for this pixel.

243
00:12:44,466 --> 00:12:49,900
And based on that information, computers
are able to then work with the image.

244
00:12:49,900 --> 00:12:53,033
And that's kind of like the starting point:
any image

245
00:12:53,033 --> 00:12:56,266
actually has a digital
representation, has a digital form,

246
00:12:56,433 --> 00:12:59,300
and those are just basically ones
and zeros

247
00:12:59,300 --> 00:13:03,133
that form a number 0
to 255 for every single pixel.

248
00:13:03,133 --> 00:13:04,233
And that's what the computer works with.

249
00:13:04,233 --> 00:13:05,833
It doesn't actually work with,

250
00:13:05,833 --> 00:13:08,000
you know, the colors or anything. It works
with the ones and zeros.

251
00:13:08,000 --> 00:13:12,200
At the end of the day, that's
kind of like the foundation of it all.

252
00:13:12,766 --> 00:13:16,900
And a color image is
actually a three-dimensional array.

253
00:13:17,066 --> 00:13:21,633
You've got a blue
layer, a green layer, and a red layer.

254
00:13:21,900 --> 00:13:24,900
And that's what RGB stands for:
red, green, blue.

255
00:13:25,266 --> 00:13:29,700
And each one of those
colors has its own intensity.

256
00:13:29,700 --> 00:13:32,700
So basically a pixel has,

257
00:13:32,800 --> 00:13:36,700
three, three values assigned to it.

258
00:13:36,833 --> 00:13:40,400
Each one of them is between 0 and 255.

259
00:13:40,933 --> 00:13:45,666
And therefore you can find out,
in this image,

260
00:13:46,200 --> 00:13:50,233
what color exactly this pixel is by
combining those three values.

261
00:13:50,233 --> 00:13:53,233
And again, computers
are going to be working with that.

262
00:13:53,366 --> 00:13:55,700
So that's the foundation of it all.

263
00:13:55,700 --> 00:13:56,633
That's the red channel.

264
00:13:56,633 --> 00:13:58,966
The green channel, the blue channel.

265
00:13:58,966 --> 00:14:01,966
And finally, let's have a look at,

266
00:14:02,466 --> 00:14:05,733
for instance, an example,
a very trivial example of,

267
00:14:06,300 --> 00:14:09,533
a smiling face in, in computer terms,

268
00:14:09,533 --> 00:14:14,933
if we just really simplify things
instead of having from 0 to 255,

269
00:14:15,533 --> 00:14:17,066
instead of having those values

270
00:14:17,066 --> 00:14:20,700
just so that we can understand things
better and really grasp the concepts,

271
00:14:20,900 --> 00:14:26,700
we're going to say zero
is white and one is black, right?

272
00:14:26,700 --> 00:14:30,400
So we're just going to simplify
things to, to the extreme.

273
00:14:30,766 --> 00:14:33,766
And you will see that
that image can be represented like that.

274
00:14:33,833 --> 00:14:35,800
So the reason why we've brought this up

275
00:14:35,800 --> 00:14:38,800
is because in all of our intuition
tutorials,

276
00:14:38,800 --> 00:14:40,700
we're going to be working
on images like this,

277
00:14:40,700 --> 00:14:43,700
which are very simple,
but at the same time, then

278
00:14:43,700 --> 00:14:47,200
all those concepts can translate
back to the 0 to 255

279
00:14:47,200 --> 00:14:50,266
range of values,
and everything applies the same way there.

280
00:14:50,566 --> 00:14:52,300
And the steps
that we're going to be going through with

281
00:14:52,300 --> 00:14:54,800
these images are step number one
convolution.

282
00:14:54,800 --> 00:14:56,700
Step number two max pooling.

283
00:14:56,700 --> 00:14:58,366
Step number three flattening.

284
00:14:58,366 --> 00:15:00,433
And step number four full connection.

285
00:15:00,433 --> 00:15:03,300
And I can imagine that
probably none of these words

286
00:15:03,300 --> 00:15:05,600
mean much to you at the moment.

287
00:15:05,600 --> 00:15:09,833
But by the end of this section
of the course, you will understand

288
00:15:09,833 --> 00:15:13,833
them in great detail
and exactly what they're doing.

289
00:15:13,833 --> 00:15:15,900
So we'll get started in the next tutorial.

290
00:15:15,900 --> 00:15:21,866
For now, the additional reading that
you might want to look into is Yann LeCun's

291
00:15:22,033 --> 00:15:27,600
original paper that gave rise
to convolutional neural networks.

292
00:15:28,133 --> 00:15:31,166
It's called Gradient-Based Learning
Applied to Document Recognition.

293
00:15:31,633 --> 00:15:34,400
You may have seen this image
before floating around the internet.

294
00:15:34,400 --> 00:15:35,700
It is from that paper.

295
00:15:35,700 --> 00:15:40,000
So if you want to go back
to the very beginnings of how

296
00:15:40,000 --> 00:15:43,533
it all happened, where it all came from,
this is the paper to look into.

297
00:15:44,233 --> 00:15:46,266
And I look forward
to seeing you in the next tutorial.

298
00:15:46,266 --> 00:15:48,233
Until then, enjoy deep learning.