1
00:00:00,566 --> 00:00:03,000
Hello and welcome back to the course
on Deep Learning.

2
00:00:03,000 --> 00:00:05,700
In the previous tutorial,
we found out what Convolutional

3
00:00:05,700 --> 00:00:07,166
Neural Networks are all about.

4
00:00:07,166 --> 00:00:09,600
And today
we're going to dive into step one.

5
00:00:09,600 --> 00:00:10,933
Convolution.

6
00:00:10,933 --> 00:00:14,600
So this is the convolution function.

7
00:00:14,866 --> 00:00:18,433
I know we try to stay away from
mathematics and keep things intuitive,

8
00:00:18,433 --> 00:00:23,000
but I couldn't help but share this formula
with you because it is so simple.

9
00:00:23,066 --> 00:00:26,933
A convolution is basically a combined
integration of two functions.

10
00:00:27,200 --> 00:00:30,900
And it shows you
how one function modifies,

11
00:00:30,900 --> 00:00:32,633
the other, modifies
the shape of the other.

12
00:00:32,633 --> 00:00:36,300
And if you've done any signal processing
or electrical engineering

13
00:00:36,300 --> 00:00:39,300
or a profession
where signal processing is required,

14
00:00:39,466 --> 00:00:42,300
you would have inevitably
come across the convolution function.

15
00:00:42,300 --> 00:00:43,700
It is quite popular.

16
00:00:43,700 --> 00:00:49,400
Now once again, we're going to keep the
mathematics light, or keep it separate.

17
00:00:49,400 --> 00:00:54,833
And if you'd like to get into the math
behind the convolutional neural networks,

18
00:00:54,833 --> 00:00:59,433
a great additional read is Introduction
to Convolutional Neural Networks

19
00:00:59,666 --> 00:01:05,500
by Jianxin Wu, who is a professor
at Nanjing University in China.

20
00:01:05,666 --> 00:01:10,400
This paper was published
literally days ago, like 5 or 6 days ago.

21
00:01:10,500 --> 00:01:13,933
And it is oriented specifically at people
who are starting out,

22
00:01:13,933 --> 00:01:17,300
at beginners who are getting to know
convolutional neural networks.

23
00:01:17,300 --> 00:01:20,100
So, the mathematics
there should be accessible.

24
00:01:20,100 --> 00:01:24,000
I actually emailed
Professor Jianxin Wu, and,

25
00:01:24,266 --> 00:01:27,266
yeah, he said his whole goal is to

26
00:01:27,266 --> 00:01:30,500
break the complex things down

27
00:01:30,500 --> 00:01:33,566
so that people who are new to
this field can understand.

28
00:01:34,100 --> 00:01:38,866
Also, he mentioned that he's got some
materials available on his homepage.

29
00:01:38,866 --> 00:01:40,500
So, in the URL,

30
00:01:40,500 --> 00:01:44,533
if you just remove the last two parts
and you just go to, like

31
00:01:45,966 --> 00:01:49,233
to that part, that's his homepage
and you'll be able to find

32
00:01:49,233 --> 00:01:52,900
additional tutorials and materials
which haven't been published as papers.

33
00:01:53,233 --> 00:01:55,800
But he uses them in his tutorials.

34
00:01:55,800 --> 00:01:57,866
So, you might find those useful.

35
00:01:57,866 --> 00:02:01,433
So browse around there
if you'd like to get an introduction

36
00:02:01,433 --> 00:02:05,266
into the mathematics behind
convolutional neural networks and kind of,

37
00:02:05,766 --> 00:02:07,600
build a solid base around that

38
00:02:07,600 --> 00:02:12,400
area. But we're going to move on, and
we're going to talk about the convolution.

39
00:02:12,400 --> 00:02:15,800
So what is a convolution
in intuitive terms?

40
00:02:16,400 --> 00:02:18,900
Here on the left
we've got an input image. As we discussed,

41
00:02:18,900 --> 00:02:20,900
that's how we're going to look at images.

42
00:02:20,900 --> 00:02:22,800
Just ones and zeros to simplify things.

43
00:02:22,800 --> 00:02:24,900
And you can see the smiley face there.

44
00:02:24,900 --> 00:02:26,300
There we've got a feature detector.

45
00:02:26,300 --> 00:02:30,000
So a feature detector is a three-by-three
matrix. Does it have to be three by three?

46
00:02:30,000 --> 00:02:31,766
No, it doesn't.

47
00:02:31,766 --> 00:02:34,000
AlexNet, I think, uses seven

48
00:02:34,000 --> 00:02:34,666
by seven,

49
00:02:35,700 --> 00:02:37,933
and then one of those other

50
00:02:37,933 --> 00:02:40,933
famous ones
uses, like, five-by-five feature detectors.

51
00:02:41,266 --> 00:02:45,666
They can be different, but usually
you'll see that they are three by three.

52
00:02:45,900 --> 00:02:49,266
And there are, you know, reasons
to make them three by three.

53
00:02:49,266 --> 00:02:52,100
So we're going to stick
to the conventional way,

54
00:02:52,100 --> 00:02:54,066
having a three-by-three feature detector.

55
00:02:54,066 --> 00:02:56,333
Also, what the feature detector is called:

56
00:02:56,333 --> 00:02:58,633
these are important terms
because you might come across them.

57
00:02:58,633 --> 00:03:01,700
There are many different terms
for the feature detector.

58
00:03:01,700 --> 00:03:04,033
But the most common ones
are feature detector,

59
00:03:04,033 --> 00:03:09,166
or you might hear it being called kernel,
or you might hear it being called filter.

60
00:03:09,366 --> 00:03:13,066
So in this course we're going to be using
either filter or feature detector

61
00:03:13,100 --> 00:03:14,366
interchangeably.

62
00:03:14,366 --> 00:03:16,900
But just bear in mind
that it has those names.

63
00:03:16,900 --> 00:03:21,433
And a convolution operation is signified
by an X

64
00:03:21,433 --> 00:03:25,266
in a circle,
just as you saw in the formulas before.

65
00:03:25,633 --> 00:03:29,600
And here what happens
is, on an intuitive level,

66
00:03:29,666 --> 00:03:33,633
or just to think of it in terms of what
is actually happening in the background

67
00:03:33,633 --> 00:03:36,633
rather than the mathematics, well,
you take this feature detector,

68
00:03:36,933 --> 00:03:40,433
or filter and you put it on your image
like you see on the left.

69
00:03:40,533 --> 00:03:44,133
So you cover the, for instance,
in this case

70
00:03:44,366 --> 00:03:48,033
the top left corner, 
nine pixels in the top left corner.

71
00:03:48,300 --> 00:03:51,600
And you basically multiply

72
00:03:52,400 --> 00:03:54,900
each value by each value,
so respective values.

73
00:03:54,900 --> 00:03:59,733
So, the top left value
by the top left value.

74
00:03:59,733 --> 00:04:02,100
Then basically it's position number

75
00:04:02,100 --> 00:04:03,433
one-one by position number one-one,

76
00:04:03,433 --> 00:04:08,600
position zero-one by zero-one,
zero-two by zero-two, and so on.

77
00:04:08,600 --> 00:04:13,066
So it's just element-wise
multiplication of these matrices,

78
00:04:13,233 --> 00:04:14,400
and then you add up the results.

79
00:04:14,400 --> 00:04:16,600
So in this case nothing matches up.

80
00:04:16,600 --> 00:04:19,633
So it's always either
a zero by zero or a zero by one.

81
00:04:19,800 --> 00:04:21,600
So the result is zero.

82
00:04:21,600 --> 00:04:23,000
Here you can see that one

83
00:04:23,000 --> 00:04:25,800
of them matched up. The one on the left

84
00:04:25,800 --> 00:04:28,066
matched up,
and therefore we got a one here.

85
00:04:28,066 --> 00:04:30,733
Nothing matched up. Nothing matched up,
nothing matched up.

86
00:04:30,733 --> 00:04:32,066
Then we move on to the next row.

87
00:04:32,066 --> 00:04:35,500
And the step at which we're moving

88
00:04:35,500 --> 00:04:38,500
this whole filter is called the stride.

89
00:04:38,500 --> 00:04:40,433
So here we have a stride of one pixel.

90
00:04:40,433 --> 00:04:43,033
So here you can see again something
matched up. The bottom right corner

91
00:04:43,033 --> 00:04:44,033
matched up.

92
00:04:44,033 --> 00:04:45,366
Again, stride.

93
00:04:45,366 --> 00:04:48,200
The bottom one in the middle matched up.

94
00:04:48,200 --> 00:04:50,000
Here, the top right one matched up.

95
00:04:50,000 --> 00:04:52,133
Then nothing. As mentioned,
the stride is one.

96
00:04:52,133 --> 00:04:53,366
You can change the stride.

97
00:04:53,366 --> 00:04:55,933
You can make it one, two;

98
00:04:55,933 --> 00:04:58,566
you can make it three. Whatever you like.

99
00:04:58,566 --> 00:05:02,633
Conventionally, the one that works
well is usually a two.

100
00:05:02,666 --> 00:05:04,433
So that's what people stick to.

101
00:05:04,433 --> 00:05:08,733
And we'll talk about what the stride
is towards the end of this tutorial.

102
00:05:09,466 --> 00:05:11,700
So here we've got, we're matching up.

103
00:05:11,700 --> 00:05:12,600
So we just keep our eye here.

104
00:05:12,600 --> 00:05:15,600
You can see we've got a two
because two of them matched up.

105
00:05:15,633 --> 00:05:17,800
And so on, and so on, and so on.

106
00:05:17,800 --> 00:05:19,300
There we go. There's
another one that matched up.

107
00:05:21,300 --> 00:05:23,566
There we go.

108
00:05:23,566 --> 00:05:24,666
And there we're done.

109
00:05:24,666 --> 00:05:27,633
So what have we created?

110
00:05:27,633 --> 00:05:28,066
Right.

111
00:05:28,066 --> 00:05:31,066
A couple of important things here.

112
00:05:31,533 --> 00:05:34,666
The image on the right is called
a feature map.

113
00:05:35,200 --> 00:05:36,566
It also has several names.

114
00:05:36,566 --> 00:05:40,100
It can also sometimes be called
a convolved feature.

115
00:05:40,833 --> 00:05:43,833
So when you apply a convolution
operator to something,

116
00:05:44,100 --> 00:05:45,666
it doesn't become convoluted.

117
00:05:45,666 --> 00:05:46,866
It becomes convolved.

118
00:05:46,866 --> 00:05:49,933
And yeah, sometimes I

119
00:05:50,333 --> 00:05:51,233
think of it to myself the

120
00:05:51,233 --> 00:05:54,233
wrong way,
but the correct term is convolved.

121
00:05:54,633 --> 00:05:55,700
It's a convolved feature.

122
00:05:55,700 --> 00:05:57,900
Or it can also be called
the activation map,

123
00:05:57,900 --> 00:06:00,900
but we're going to be calling it
a feature map in this course.

124
00:06:01,000 --> 00:06:01,666
So it can be called

125
00:06:01,666 --> 00:06:03,333
any one of those things.

126
00:06:03,333 --> 00:06:06,200
And what have we done here?

127
00:06:06,200 --> 00:06:09,833
Well, as you can see,
we've reduced the size of the image.

128
00:06:09,833 --> 00:06:10,533
That's number one.

129
00:06:10,533 --> 00:06:13,533
And that's the important thing
I wanted to mention about

130
00:06:13,866 --> 00:06:16,800
your input image, the feature detector,
and the stride.

131
00:06:16,800 --> 00:06:17,266
Right.

132
00:06:17,266 --> 00:06:19,933
If you have a stride of one,
you can see the image reduced a bit.

133
00:06:19,933 --> 00:06:23,100
But if you have a stride of two,
the image is going to reduce more.

134
00:06:23,100 --> 00:06:25,466
So the feature map
is going to be even smaller.

135
00:06:25,466 --> 00:06:30,400
And that's a very important
function of the feature

136
00:06:30,400 --> 00:06:35,400
detector, of this whole convolution
step: to make the image smaller,

137
00:06:35,633 --> 00:06:38,900
because that way
it'll be easier to process,

138
00:06:39,966 --> 00:06:41,966
and it'll be just faster.

139
00:06:41,966 --> 00:06:44,966
It will.

140
00:06:45,966 --> 00:06:48,266
It'll be just faster,
because imagine: here

141
00:06:48,266 --> 00:06:51,766
we've got, what, a seven-by-seven image.

142
00:06:51,766 --> 00:06:54,933
But imagine
if you have a proper photo, right,

143
00:06:54,933 --> 00:06:59,200
or you have, like, a 256 by 256
pixel image.

144
00:06:59,200 --> 00:07:03,433
That's a huge number of pixels:
256 squared.

145
00:07:03,766 --> 00:07:06,766
Or, let's say,
you have 300 by 300 pixels,

146
00:07:06,800 --> 00:07:09,800
so we don't get confused
with the RGB 256.

147
00:07:09,800 --> 00:07:14,400
Let's just say we have a 300 by 300 image
in terms of size in pixels.

148
00:07:14,633 --> 00:07:17,300
Then you have 300 squared
pixels.

149
00:07:17,300 --> 00:07:19,066
That's a huge number.

150
00:07:19,066 --> 00:07:24,433
And therefore feature detectors
will reduce the size of the image.

151
00:07:24,433 --> 00:07:27,433
And therefore a stride of two
is actually beneficial.

152
00:07:27,600 --> 00:07:29,866
But then the question is
do we lose information

153
00:07:29,866 --> 00:07:34,233
or are we losing information
when we're applying the feature detector?

154
00:07:34,400 --> 00:07:36,800
Well, some information we are losing

155
00:07:36,800 --> 00:07:40,300
of course, because we have fewer values
in our resulting matrix.

156
00:07:40,566 --> 00:07:44,166
But at the same time, the purpose of
the feature detector is to detect certain

157
00:07:44,166 --> 00:07:47,733
features, certain parts of the image
that are integral.

158
00:07:48,466 --> 00:07:50,966
And so, for instance,
if you think about it this way,

159
00:07:50,966 --> 00:07:53,966
like the feature
detector has a certain pattern on it,

160
00:07:54,000 --> 00:07:57,866
the highest number in your feature map
is when that pattern matches up.

161
00:07:57,866 --> 00:08:00,833
In fact,
the highest number you can get, in

162
00:08:00,833 --> 00:08:05,500
our simplified example, is when
the feature matches exactly.

163
00:08:05,500 --> 00:08:09,166
And you can see that with the number four
we have in our feature map.

164
00:08:09,400 --> 00:08:10,466
That's exactly.

165
00:08:10,466 --> 00:08:15,666
So if you look over here, that's exactly
where this feature detector,

166
00:08:15,666 --> 00:08:19,000
because there are only four ones in
it, matched perfectly.

167
00:08:19,000 --> 00:08:21,300
So you can see this part over here.

168
00:08:21,300 --> 00:08:23,300
So the feature was detected here.

169
00:08:23,300 --> 00:08:27,066
And as we discussed
at the very start of this section

170
00:08:28,200 --> 00:08:29,966
features are

171
00:08:29,966 --> 00:08:33,000
how we see things,
how we recognize things.

172
00:08:33,000 --> 00:08:35,333
We don't look at every single

173
00:08:35,333 --> 00:08:37,300
pixel, so to speak,

174
00:08:37,300 --> 00:08:40,233
in what we see on an image
or in real life.

175
00:08:40,233 --> 00:08:41,766
We don't look at every single pixel.

176
00:08:41,766 --> 00:08:46,333
We look at features: we look at
the nose, the hat, the feather,

177
00:08:47,000 --> 00:08:49,766
the eyes,

178
00:08:49,766 --> 00:08:53,766
or the little black marks
under the cheetah's eyes to distinguish

179
00:08:53,766 --> 00:08:57,266
between a cheetah and a leopard,
or the shape of the train

180
00:08:57,300 --> 00:09:00,600
to distinguish between a bullet
train, a normal train, and so on.

181
00:09:00,600 --> 00:09:02,533
So we don't look at everything,

182
00:09:02,533 --> 00:09:04,500
we look at features,
and that's what we're preserving.

183
00:09:04,500 --> 00:09:08,033
And that's
what the feature map helps us preserve.

184
00:09:08,033 --> 00:09:12,500
Actually, that's what it
allows us to bring forward,

185
00:09:12,500 --> 00:09:16,366
and get rid of all of the unnecessary
things that, even as humans,

186
00:09:16,366 --> 00:09:19,366
we don't process. There's so much information

187
00:09:19,466 --> 00:09:23,966
going into your eyes at any
given time, like gigabytes of information,

188
00:09:24,133 --> 00:09:28,200
if you look at every single dot,
if not terabytes of information

189
00:09:28,200 --> 00:09:32,133
going into your eyes per second,
and still we're able

190
00:09:32,133 --> 00:09:35,233
to process that
because we get rid of what is unnecessary.

191
00:09:35,233 --> 00:09:38,233
We only focus on the important features,
the features that are important to us.

192
00:09:38,800 --> 00:09:41,866
And
that is exactly what the feature map does.

193
00:09:42,133 --> 00:09:44,533
So now moving on.

194
00:09:44,533 --> 00:09:46,166
This is our input image.

195
00:09:46,166 --> 00:09:49,400
And we create a feature map.

196
00:09:49,400 --> 00:09:52,400
So the front one, let's say,
is the one we just created.

197
00:09:52,600 --> 00:09:54,166
But then how come there are many of them?

198
00:09:54,166 --> 00:09:57,100
Well, we create multiple feature maps

199
00:09:57,100 --> 00:10:00,166
because we use different filters.

200
00:10:00,166 --> 00:10:00,500
Right.

201
00:10:00,500 --> 00:10:01,933
And that's another way

202
00:10:01,933 --> 00:10:03,766
that we preserve lots of the information.

203
00:10:03,766 --> 00:10:07,733
So we don't just have one feature map,
we look for certain features

204
00:10:07,733 --> 00:10:12,266
and then, or basically
the network decides through its training.

205
00:10:12,266 --> 00:10:14,333
And this is something we'll discuss
towards the end of this section.

206
00:10:14,333 --> 00:10:18,000
Through its training,
it decides which features

207
00:10:18,000 --> 00:10:21,733
are important for certain types
or certain categories, and

208
00:10:21,733 --> 00:10:23,166
it looks for them, and therefore

209
00:10:23,166 --> 00:10:26,033
we'll have different filters.
And we'll talk about filters just now.

210
00:10:26,033 --> 00:10:27,700
But basically it'll apply these filters.

211
00:10:27,700 --> 00:10:32,466
So to get this feature map
it applied a filter like the one we saw.

212
00:10:32,466 --> 00:10:34,633
But then to get this feature map
it applied a different filter;

213
00:10:34,633 --> 00:10:37,233
to get this feature map
it applied a different filter, and so on.

214
00:10:38,200 --> 00:10:40,166
And so

215
00:10:40,166 --> 00:10:43,166
basically
it just creates these feature maps.

216
00:10:43,500 --> 00:10:47,700
And actually that's why personally
I think the term feature detector

217
00:10:47,933 --> 00:10:49,500
is better than filter.

218
00:10:49,500 --> 00:10:50,400
So remember over here

219
00:10:50,400 --> 00:10:55,000
we have this filter
which we also can call a feature detector.

220
00:10:55,000 --> 00:10:59,366
Well actually the word feature detector
I think is better suited.

221
00:10:59,366 --> 00:11:03,300
And the reason for that
is that's what the purpose is, right?

222
00:11:03,300 --> 00:11:06,400
We don't want to
just filter our image.

223
00:11:06,400 --> 00:11:07,633
Even though that's

224
00:11:07,633 --> 00:11:10,133
the same thing,
just a question of terminology.

225
00:11:10,133 --> 00:11:12,166
But basically we want to detect features.
All right.

226
00:11:12,166 --> 00:11:15,666
In this layer,
in this

227
00:11:16,700 --> 00:11:20,200
feature map, we've detected
where certain features are in the image.

228
00:11:20,200 --> 00:11:22,133
In this feature map
we've detected where certain

229
00:11:22,133 --> 00:11:25,133
other features are, where
a certain specific feature is located.

230
00:11:25,266 --> 00:11:28,200
And in this feature map we've detected where

231
00:11:28,200 --> 00:11:31,200
a certain other feature
is located on the image.

232
00:11:31,266 --> 00:11:33,300
So that's what we were doing.

233
00:11:33,300 --> 00:11:34,500
And let's have a look
at a couple of examples.

234
00:11:34,500 --> 00:11:40,133
So here we're using,
and this is from gimp.org,

235
00:11:40,566 --> 00:11:41,666
their documentation.

236
00:11:41,666 --> 00:11:45,000
It's a free
kind of tool, like Paint.

237
00:11:45,200 --> 00:11:47,000
And you can

238
00:11:47,000 --> 00:11:50,300
use it to adjust your images or work
with your images. But basically they have

239
00:11:50,300 --> 00:11:53,466
some valuable examples
in their documentation.

240
00:11:53,466 --> 00:11:57,066
And here
they have a picture of the Taj Mahal.

241
00:11:57,066 --> 00:11:59,700
And you can choose
which filter you want to apply.

242
00:11:59,700 --> 00:12:02,700
So if you download this program
and you upload a photo into it,

243
00:12:02,700 --> 00:12:06,566
then you can actually
start a convolution matrix

244
00:12:06,566 --> 00:12:10,700
and apply filters,
and you will see that these things,

245
00:12:10,800 --> 00:12:13,800
these convolution matrices, are actually
applied in image processing and

246
00:12:14,100 --> 00:12:15,133
design and so on.

247
00:12:15,133 --> 00:12:16,700
So let's have a look at
what we get.

248
00:12:16,700 --> 00:12:19,133
So if we apply this filter, five in the

249
00:12:19,133 --> 00:12:21,000
middle and minus
one, minus one, minus one, minus

250
00:12:21,000 --> 00:12:23,700
one, you can see that it sharpens
the image.

251
00:12:23,700 --> 00:12:25,566
And yeah.

252
00:12:25,566 --> 00:12:28,733
So this is, it's quite intuitive
if you think of it.

253
00:12:28,800 --> 00:12:32,833
So five is the main pixel,
in the middle

254
00:12:32,833 --> 00:12:36,233
of the filter or the feature detector.

255
00:12:36,433 --> 00:12:37,966
And then minus one, minus one, and minus

256
00:12:37,966 --> 00:12:43,333
one just kind of reduce the
pixels around it, in an intuitive

257
00:12:44,300 --> 00:12:46,100
sense.

258
00:12:46,100 --> 00:12:46,866
Then blur.

259
00:12:46,866 --> 00:12:50,300
So basically it

260
00:12:50,300 --> 00:12:54,500
gives equal significance to all of the
pixels around the one in the center.

261
00:12:54,500 --> 00:12:58,966
And therefore it combines them together
and you get a blur. Edge enhance.

262
00:12:58,966 --> 00:13:00,666
So here you can see that

263
00:13:00,666 --> 00:13:02,833
you have minus one and one, and then

264
00:13:02,833 --> 00:13:03,800
you get zeros, right?

265
00:13:03,800 --> 00:13:09,566
So you remove the pixels
around the main one in the middle,

266
00:13:09,566 --> 00:13:12,000
and you only keep this one
and a minus one.

267
00:13:12,000 --> 00:13:12,733
And it gives you an edge.

268
00:13:12,733 --> 00:13:14,233
And this one's a bit harder to understand,

269
00:13:14,233 --> 00:13:16,200
how it works.

270
00:13:16,200 --> 00:13:19,200
Like probably 100,
just to think of it intuitively.

271
00:13:19,200 --> 00:13:20,833
Edge detect, right?

272
00:13:20,833 --> 00:13:23,500
So this one probably makes more sense.
Right?

273
00:13:23,500 --> 00:13:27,366
You take the middle one,
you reduce the middle one,

274
00:13:28,900 --> 00:13:32,433
probably, like, the strength
of the middle pixel.

275
00:13:32,433 --> 00:13:35,600
And then you look
for the ones, you look for

276
00:13:35,600 --> 00:13:37,333
these ones; you

277
00:13:38,400 --> 00:13:40,433
increase the strength

278
00:13:40,433 --> 00:13:42,000
of the ones around them.

279
00:13:42,000 --> 00:13:44,566
So you have the ones there.

280
00:13:44,566 --> 00:13:45,833
Yeah, so that

281
00:13:45,833 --> 00:13:49,100
gives you, like, an edge detection,
and you can see what you get there.

282
00:13:49,100 --> 00:13:51,700
Emboss, another one. So

283
00:13:51,700 --> 00:13:55,566
the key here
is that it's asymmetrical.

284
00:13:55,566 --> 00:13:58,066
And you can see the image becomes
asymmetrical as well.

285
00:13:58,066 --> 00:13:58,233
So you've

286
00:13:58,233 --> 00:14:02,566
got, like, that
kind of feeling that it's standing out

287
00:14:02,566 --> 00:14:03,633
towards you.

288
00:14:03,633 --> 00:14:06,466
And that's what you get
when you have, like, minuses here

289
00:14:06,466 --> 00:14:07,100
and pluses here.

290
00:14:07,100 --> 00:14:09,933
Again,
this is getting a bit technical now.

291
00:14:09,933 --> 00:14:12,700
But at least we can get some kind of
intuitive understanding.

292
00:14:12,700 --> 00:14:14,066
Let's just go quickly through them again.

293
00:14:14,066 --> 00:14:16,966
So there's sharpen. There's blur.

294
00:14:16,966 --> 00:14:19,733
There's edge enhance. There's edge detect.

295
00:14:19,733 --> 00:14:20,733
There's emboss.

296
00:14:20,733 --> 00:14:24,433
And so as you can see these are great
examples of the same image.

297
00:14:24,566 --> 00:14:27,233
But we're getting feature maps.

298
00:14:27,233 --> 00:14:28,066
So we use different

299
00:14:28,066 --> 00:14:31,500
feature detectors to get different feature
maps of the same image.

300
00:14:31,733 --> 00:14:36,266
And therefore now we have lots
of versions of this image,

301
00:14:37,666 --> 00:14:40,566
where
in each one we've tried to detect certain

302
00:14:40,566 --> 00:14:44,766
things. These terms,
they're not applicable to us.

303
00:14:44,766 --> 00:14:47,833
We can say, like, emboss is probably
not applicable to us

304
00:14:47,833 --> 00:14:51,533
in terms of convolutional neural networks,
but edge detect, that's important.

305
00:14:51,533 --> 00:14:53,166
We want to detect the edges.

306
00:14:53,166 --> 00:14:56,400
Edge enhance, probably not. Blur, sharpen.

307
00:14:56,400 --> 00:14:59,700
So certain things, like edge detect, are
probably the most important ones

308
00:15:00,000 --> 00:15:02,366
for our type of work.

309
00:15:02,366 --> 00:15:04,800
And in terms of understanding
like computers,

310
00:15:04,800 --> 00:15:06,266
they will decide for themselves.

311
00:15:06,266 --> 00:15:08,933
The neural network will decide for itself
what's important, what's not.

312
00:15:08,933 --> 00:15:12,833
And it probably won't even be
recognizable to the human eye.

313
00:15:12,833 --> 00:15:14,600
You won't be able to understand
what those features

314
00:15:14,600 --> 00:15:16,700
mean, but the computer will decide.

315
00:15:16,700 --> 00:15:19,733
And that's the beauty
of neural networks,

316
00:15:19,733 --> 00:15:22,766
that they can process so many different

317
00:15:22,766 --> 00:15:24,400
things and understand without

318
00:15:24,400 --> 00:15:27,566
even having that intuition,
without having that

319
00:15:28,066 --> 00:15:30,700
explanation why. They will understand
which features are

320
00:15:30,700 --> 00:15:34,266
important to them,
whether we have a name for them or not.

321
00:15:35,133 --> 00:15:38,666
That's an irrelevant
question for the artificial neural

322
00:15:38,666 --> 00:15:39,866
network.

323
00:15:39,866 --> 00:15:43,566
And my favorite one:
here's an image of Geoffrey Hinton,

324
00:15:43,566 --> 00:15:45,933
a photo of Geoffrey Hinton,

325
00:15:45,933 --> 00:15:50,400
passed through
one of these filters.

326
00:15:50,800 --> 00:15:52,933
All right, so that brings us
to the end of today's tutorial.

327
00:15:52,933 --> 00:15:55,300
I hope you enjoyed learning
about convolution.

328
00:15:55,300 --> 00:16:00,100
The key takeaway is that
the primary purpose of a convolution

329
00:16:00,300 --> 00:16:04,466
is to find features in your image
using the feature detector,

330
00:16:04,500 --> 00:16:08,200
put them into a feature map,
and by having them in a feature map,

331
00:16:08,200 --> 00:16:11,500
it still preserves
the spatial relationships,

332
00:16:12,000 --> 00:16:15,633
between pixels, which is very important
for us, you know,

333
00:16:15,633 --> 00:16:19,033
because if they're completely jumbled up,
then we've lost the pattern.

334
00:16:19,200 --> 00:16:23,333
And at the same time, it's important
to understand that most of the time,

335
00:16:23,333 --> 00:16:29,300
the features a neural network will detect
and use to recognize certain images

336
00:16:29,300 --> 00:16:32,766
and classes will mean nothing to humans,
but nevertheless, they work.

337
00:16:33,000 --> 00:16:34,300
And that's what convolution is.

338
00:16:34,300 --> 00:16:36,133
And I look forward
to seeing you in the next tutorial.

339
00:16:36,133 --> 00:16:37,933
Until then, enjoy deep learning.