1
00:00:00,566 --> 00:00:02,833
Hello and welcome back to the course
on Deep Learning.

2
00:00:02,833 --> 00:00:06,066
Today we're talking about ReLU
which stands for rectified linear unit.

3
00:00:06,066 --> 00:00:12,100
And this is an additional
step on top of our convolution step.

4
00:00:12,100 --> 00:00:14,500
So it's not a separate
big step. It's a small step.

5
00:00:14,500 --> 00:00:16,133
It's step 1B, basically.

6
00:00:16,133 --> 00:00:18,133
And what is going on here?

7
00:00:18,133 --> 00:00:20,333
Well we have our input image.

8
00:00:20,333 --> 00:00:22,800
We have our convolution layer
which we've discussed.

9
00:00:22,800 --> 00:00:27,066
And then on top of that
we're going to apply, wait for it,

10
00:00:27,066 --> 00:00:30,766
our favorite rectifier function.

11
00:00:31,000 --> 00:00:33,900
And you're already familiar
with the rectifier function from

12
00:00:33,900 --> 00:00:38,266
the previous section
on artificial neural networks.

13
00:00:38,666 --> 00:00:40,800
And in our.

14
00:00:40,800 --> 00:00:44,700
So sometimes authors or instructors,

15
00:00:45,500 --> 00:00:48,933
separate the convolution
and the rectifier as two separate steps.

16
00:00:48,933 --> 00:00:50,100
In our examples,

17
00:00:50,100 --> 00:00:53,100
we're just going to consider them

18
00:00:53,100 --> 00:00:56,800
just one big step: the convolution,
then the rectifier.

19
00:00:57,100 --> 00:01:00,200
And the reason
why we're applying the rectifier

20
00:01:00,200 --> 00:01:04,066
is because we want to increase
non-linearity in our image

21
00:01:04,233 --> 00:01:08,033
or in our network
in our convolutional neural network.

22
00:01:08,033 --> 00:01:10,700
And the rectifier acts as that

23
00:01:11,933 --> 00:01:12,533
filter

24
00:01:12,533 --> 00:01:15,600
or acts as that function
which breaks up linearity.

25
00:01:15,700 --> 00:01:19,900
And the reason why we want to increase
non-linearity in our network

26
00:01:19,900 --> 00:01:23,133
is because images themselves

27
00:01:23,133 --> 00:01:26,566
are highly nonlinear, especially if you're
recognizing different objects,

28
00:01:26,966 --> 00:01:31,200
next to each other,
or against backgrounds, and so on.

29
00:01:31,200 --> 00:01:35,066
The image is going to have lots
of nonlinear elements, and the transition

30
00:01:35,066 --> 00:01:37,966
between pixels, adjacent pixels
is often going to be non-linear.

31
00:01:37,966 --> 00:01:40,966
That's because there are borders,
there are different colors,

32
00:01:41,400 --> 00:01:43,600
there are different elements in your images.

33
00:01:43,600 --> 00:01:45,466
But at the same time,
when we're applying

34
00:01:45,466 --> 00:01:48,466
mathematical operations
such as convolution,

35
00:01:48,600 --> 00:01:49,600
and

36
00:01:49,600 --> 00:01:52,500
running this feature detection
to create our feature maps,

37
00:01:52,500 --> 00:01:56,800
we risk that
we might create something linear,

38
00:01:57,133 --> 00:01:59,866
and therefore
we need to break up the linearity.

39
00:01:59,866 --> 00:02:02,033
So let's have a look at an example.

40
00:02:02,033 --> 00:02:05,466
Here is an image, an original image.

41
00:02:05,833 --> 00:02:09,466
Now when we apply a feature detector

42
00:02:09,466 --> 00:02:12,933
to this image,
we get something like this.

43
00:02:13,133 --> 00:02:15,000
So you can see here
that black is negative values,

44
00:02:15,000 --> 00:02:15,900
white is positive values.

45
00:02:15,900 --> 00:02:20,833
Well, when you apply a feature detector
to a proper image,

46
00:02:21,000 --> 00:02:23,766
which has not just zeros and ones
but has lots of different values.

47
00:02:23,766 --> 00:02:25,133
and you apply it,

48
00:02:25,133 --> 00:02:28,900
as we saw previously, feature detectors
can have negative values in themselves.

49
00:02:28,900 --> 00:02:30,933
Sometimes you'll get negative values.

50
00:02:30,933 --> 00:02:34,500
And here the black ones are negative,
white ones are positive.

51
00:02:34,633 --> 00:02:38,533
And what a rectified linear unit

52
00:02:40,500 --> 00:02:43,900
function does is remove all the black.

53
00:02:43,900 --> 00:02:46,433
Right.
Anything below zero turns into zero.

54
00:02:46,433 --> 00:02:49,133
And so from this
it turns into this. Right.

55
00:02:49,133 --> 00:02:54,500
And so it's pretty
hard to see what exactly the benefit is

56
00:02:54,900 --> 00:02:58,433
in terms
of breaking up linearity.

57
00:02:58,833 --> 00:03:00,900
I'll try to explain.

58
00:03:00,900 --> 00:03:03,900
I'll try to show
an example on this image.

59
00:03:04,566 --> 00:03:08,133
But at the end of the day,
this is a very mathematical concept,

60
00:03:08,133 --> 00:03:12,366
and we would have to go into a lot of math
to really explain what is going on.

61
00:03:12,366 --> 00:03:13,733
But let's try. Let's have a look.

62
00:03:13,733 --> 00:03:17,500
So, for instance,
let's look at this building here.

63
00:03:17,500 --> 00:03:20,166
Right. So this is a building on its own.

64
00:03:20,166 --> 00:03:24,266
And then you can see this shadow,
this black part, this shadow over here.

65
00:03:24,466 --> 00:03:27,200
Well, you can see that it's white.

66
00:03:27,200 --> 00:03:30,066
The reflection of the light,
and then it's gray,

67
00:03:30,066 --> 00:03:32,933
and then it gets darker
and then it gets darker again.

68
00:03:32,933 --> 00:03:33,200
Right.

69
00:03:33,200 --> 00:03:35,766
So when we take it out,
we take out that black part.

70
00:03:35,766 --> 00:03:38,133
So think of it
in terms of linearity. Right.

71
00:03:38,133 --> 00:03:42,033
So it looks like
when you go from white to gray

72
00:03:42,033 --> 00:03:44,900
the next step would be black, right?
The next step would be black.

73
00:03:44,900 --> 00:03:49,233
It's a linear progression
from bright to dark.

74
00:03:49,533 --> 00:03:53,400
And therefore
this is kind of like a linear situation.

75
00:03:53,400 --> 00:03:56,000
When you take out the black
you break up the linearity.

76
00:03:56,000 --> 00:03:57,900
Let's try another one.

77
00:03:57,900 --> 00:03:59,033
Let's have a look here.

78
00:03:59,033 --> 00:04:01,900
And at the same time
it's still that same building, right?

79
00:04:01,900 --> 00:04:06,566
It's not like you're

80
00:04:06,666 --> 00:04:09,733
blending two buildings into each other,
but that is secondary.

81
00:04:09,733 --> 00:04:12,033
The main point is breaking up
the linearity.

82
00:04:12,033 --> 00:04:13,500
So let's have a look here. Same thing.

83
00:04:13,500 --> 00:04:19,033
So you see white, gray, black, gray, white.

84
00:04:19,466 --> 00:04:22,466
And when you break it up
you don't have that anymore, right?

85
00:04:22,466 --> 00:04:26,266
You don't have that progression,
the gradual progression; you just have

86
00:04:26,266 --> 00:04:29,266
like an abrupt change.

87
00:04:29,566 --> 00:04:33,266
And that helps introduce
non-linearity into your image.

88
00:04:33,366 --> 00:04:39,100
So it's a very rough explanation, a very
kind of on-the-fingers

89
00:04:39,400 --> 00:04:43,466
explanation, rather than a technical one,
but hopefully

90
00:04:43,466 --> 00:04:47,266
it kind of helps you understand a bit
better what we're talking about here.

91
00:04:47,266 --> 00:04:50,433
So here again, you can see white,
gray. This is a better example,

92
00:04:50,433 --> 00:04:55,400
even: bright,
darker, darker, darker, darker, darker, darker.

93
00:04:55,500 --> 00:04:58,100
So this part looks like it's linear.

94
00:04:58,100 --> 00:04:59,400
Then you break it up like that.

95
00:04:59,400 --> 00:05:00,933
Again.

96
00:05:00,933 --> 00:05:05,566
So this is a very rough
explanation. It's not absolutely perfect,

97
00:05:05,566 --> 00:05:08,566
but at least it gives you
some idea of what's going on.

98
00:05:08,700 --> 00:05:11,700
But if you'd like to learn more,
there's a good paper.

99
00:05:12,400 --> 00:05:14,066
As always, there's always a paper.

100
00:05:14,066 --> 00:05:15,700
This one is by C.-C. Jay Kuo

101
00:05:15,700 --> 00:05:17,800
from the University of Southern California.

102
00:05:17,800 --> 00:05:19,533
And it's called Understanding

103
00:05:19,533 --> 00:05:22,600
Convolutional Neural Networks
with a Mathematical Model.

104
00:05:23,033 --> 00:05:26,833
And basically there
he answers two questions, and

105
00:05:27,233 --> 00:05:28,766
you need to just look at the first one.

106
00:05:28,766 --> 00:05:31,766
And the question is: why
is a nonlinear activation function

107
00:05:31,800 --> 00:05:35,200
essential at the filter
output of all intermediate layers?

108
00:05:36,066 --> 00:05:39,266
So that kind of explains it
in a bit more detail,

109
00:05:39,900 --> 00:05:43,733
both in terms of intuition
and mostly in terms of mathematics.

110
00:05:44,166 --> 00:05:46,766
So that's an interesting paper
where you can get some additional

111
00:05:46,766 --> 00:05:47,933
information on this topic.

112
00:05:47,933 --> 00:05:51,400
And if you really want to dig in
and explore,

113
00:05:51,733 --> 00:05:53,233
some cool stuff here,

114
00:05:53,233 --> 00:05:55,500
then there's another paper
that you might be interested in.

115
00:05:55,500 --> 00:05:58,966
It's called
Delving Deep into Rectifiers: Surpassing

116
00:05:58,966 --> 00:06:02,400
Human-Level Performance on ImageNet
Classification.

117
00:06:02,766 --> 00:06:05,733
And here, the authors,

118
00:06:05,733 --> 00:06:09,000
Kaiming He and others
from Microsoft Research,

119
00:06:09,300 --> 00:06:12,300
they propose a

120
00:06:12,800 --> 00:06:16,133
different type of rectified linear unit

121
00:06:16,866 --> 00:06:17,566
function.

122
00:06:17,566 --> 00:06:18,600
They propose the

123
00:06:18,600 --> 00:06:21,600
parametric rectified linear unit function,
which you see here on the right.

124
00:06:21,900 --> 00:06:26,566
And they argue that it delivers better
results without sacrificing performance.

125
00:06:26,566 --> 00:06:29,966
So it's an interesting read if you'd like to get
a bit more into this topic.

126
00:06:30,333 --> 00:06:31,800
And that's all for today.

127
00:06:31,800 --> 00:06:35,033
The ReLU layer is pretty simple,
pretty straightforward.

128
00:06:35,033 --> 00:06:37,700
It's just applying the
rectifier function.

129
00:06:37,700 --> 00:06:39,100
And I look forward
to seeing you next time.

130
00:06:39,100 --> 00:06:40,700
Until then, enjoy deep learning.