1
00:00:01,733 --> 00:00:04,033
Hello and welcome back to the course
on Deep Learning.

2
00:00:04,033 --> 00:00:07,033
Today we're going to wrap up
with backpropagation.

3
00:00:07,200 --> 00:00:07,500
All right.

4
00:00:07,500 --> 00:00:08,666
So we already know

5
00:00:08,666 --> 00:00:11,800
pretty much everything we need to know
about what happens in the neural network.

6
00:00:12,000 --> 00:00:15,100
We know that there's a process
called forward propagation

7
00:00:15,100 --> 00:00:18,566
where information is entered
into the input layer

8
00:00:18,566 --> 00:00:23,533
and then it's propagated forward
to get our Y hats, our output values.

9
00:00:23,533 --> 00:00:28,666
And then we compare those to the actual
values that we have in our training set.

10
00:00:29,066 --> 00:00:31,866
And then we calculate the errors.

11
00:00:31,866 --> 00:00:35,366
Then the errors are
backpropagated through the network

12
00:00:35,366 --> 00:00:36,866
in the opposite direction.

13
00:00:36,866 --> 00:00:41,100
And that allows us to train the network
by adjusting the weights.

14
00:00:41,500 --> 00:00:45,000
So the one key important thing
to remember here

15
00:00:45,000 --> 00:00:50,933
is that backpropagation is an advanced
algorithm driven by very

16
00:00:51,300 --> 00:00:55,200
interesting
and sophisticated mathematics,

17
00:00:55,400 --> 00:01:00,200
which allows us to adjust the weights,
all of them, at the same time.

18
00:01:00,200 --> 00:01:02,400
All of the weights are adjusted
simultaneously.

19
00:01:02,400 --> 00:01:06,633
So if we were doing this manually,
or if we were coming up

20
00:01:06,633 --> 00:01:10,300
with a different type of algorithm,
then even if we calculated the error

21
00:01:10,300 --> 00:01:14,066
and then tried to understand
what effect each of the weights has

22
00:01:14,066 --> 00:01:17,300
on the error, we'd have to somehow

23
00:01:17,300 --> 00:01:20,700
adjust each of the weights
independently or individually.

24
00:01:21,900 --> 00:01:23,400
The huge advantage

25
00:01:23,400 --> 00:01:26,400
of backpropagation,
and this is a key thing to remember,

26
00:01:26,433 --> 00:01:29,866
is that during the process
of backpropagation,

27
00:01:30,133 --> 00:01:33,333
simply because of the way

28
00:01:33,333 --> 00:01:36,333
the algorithm is structured,

29
00:01:36,733 --> 00:01:40,500
you are able to adjust
all of the weights at the same time.

30
00:01:40,500 --> 00:01:43,500
So you basically know
which part of the error

31
00:01:43,500 --> 00:01:46,866
each of your weights in the neural network
is responsible for.

32
00:01:47,266 --> 00:01:50,266
Now that is the key fundamental

33
00:01:50,433 --> 00:01:54,133
underlying principle of backpropagation.

34
00:01:54,133 --> 00:02:00,566
And this was why it picked up
so rapidly in the 1980s.

35
00:02:00,566 --> 00:02:02,633
And this was the major breakthrough.

36
00:02:02,633 --> 00:02:06,300
And if you'd like to learn more about that
and how exactly the mathematics

37
00:02:06,900 --> 00:02:10,066
works in the background,
then a good article,

38
00:02:10,066 --> 00:02:11,366
which we've already mentioned,

39
00:02:11,366 --> 00:02:16,166
is Neural Networks and Deep Learning,
which is actually a book by Michael Nielsen.

40
00:02:16,400 --> 00:02:19,400
There
you'll find the mathematics written out

41
00:02:19,800 --> 00:02:23,533
and it'll help you understand
how exactly this is possible.

42
00:02:23,533 --> 00:02:28,066
But for now, for our purposes,
from an intuition point of view,

43
00:02:28,066 --> 00:02:33,200
the important part is to remember that
that's what backpropagation does.

44
00:02:33,200 --> 00:02:36,200
It adjusts
all of the weights at the same time.
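To make that idea concrete, here is a minimal Python sketch, not the course's own code. It assumes a toy network with one input, one sigmoid hidden neuron, a linear output and a squared-error cost; the names `backprop_gradients` and `perturbation_gradients` are made up for illustration. One backward pass gives the share of the error for every weight, while the manual alternative has to nudge each weight on its own and re-run the network each time.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(w1, w2, x, y):
    # Toy network (an assumption): y_hat = w2 * sigmoid(w1 * x)
    y_hat = w2 * sigmoid(w1 * x)
    return 0.5 * (y_hat - y) ** 2


def backprop_gradients(w1, w2, x, y):
    # One forward pass...
    h = sigmoid(w1 * x)
    y_hat = w2 * h
    # ...then one backward pass (chain rule) yields the gradient
    # of the cost with respect to every weight at once.
    error = y_hat - y
    grad_w2 = error * h
    grad_w1 = error * w2 * h * (1.0 - h) * x
    return grad_w1, grad_w2


def perturbation_gradients(w1, w2, x, y, eps=1e-6):
    # Manual alternative: perturb each weight individually,
    # which costs extra forward passes for every single weight.
    g1 = (cost(w1 + eps, w2, x, y) - cost(w1 - eps, w2, x, y)) / (2 * eps)
    g2 = (cost(w1, w2 + eps, x, y) - cost(w1, w2 - eps, x, y)) / (2 * eps)
    return g1, g2


bp = backprop_gradients(0.3, -0.2, 1.5, 1.0)
fd = perturbation_gradients(0.3, -0.2, 1.5, 1.0)
print("backprop:    ", bp)
print("perturbation:", fd)
```

Both approaches agree on the numbers. The difference is cost: with millions of weights, the one-at-a-time approach needs millions of extra forward passes, while backpropagation still needs a single backward pass.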

45
00:02:36,800 --> 00:02:40,366
And now we're going to just wrap
everything up with a step-by-step

46
00:02:40,366 --> 00:02:44,900
walkthrough of what happens
in the training of a neural network.

47
00:02:45,266 --> 00:02:45,566
All right.

48
00:02:45,566 --> 00:02:48,233
So step one,
we randomly initialize the weights

49
00:02:48,233 --> 00:02:50,966
to small numbers close to zero,
but not zero.

50
00:02:50,966 --> 00:02:53,600
We didn't really focus
on the initialization of weights

51
00:02:53,600 --> 00:02:58,033
during the intuition tutorials, but
the weights have to start somewhere,

52
00:02:58,200 --> 00:03:02,533
and they are initialized
with random values near zero.

53
00:03:02,533 --> 00:03:06,600
And from there, through the process
of forward propagation and backpropagation,

54
00:03:06,600 --> 00:03:10,866
these weights are adjusted
until the error is minimized,

55
00:03:11,833 --> 00:03:13,766
until the cost function is minimized.

56
00:03:13,766 --> 00:03:17,566
Then step two: input
the first observation of your dataset.

57
00:03:17,566 --> 00:03:19,300
So the first row into the input layer.

58
00:03:19,300 --> 00:03:21,366
Each feature is one input node.

59
00:03:21,366 --> 00:03:24,700
So basically take the columns
and put them into the input nodes.

60
00:03:25,300 --> 00:03:27,800
Step three: forward propagation
from left to right.

61
00:03:27,800 --> 00:03:28,966
The neurons are activated

62
00:03:28,966 --> 00:03:32,800
in a way that the impact of each neuron
activation is limited by the weights.

63
00:03:32,800 --> 00:03:37,800
The weights basically determine
how important each neuron's activation is,

64
00:03:38,033 --> 00:03:41,733
then propagate the activations
until getting the predicted result

65
00:03:41,966 --> 00:03:43,800
Y hat in this case.

66
00:03:43,800 --> 00:03:46,633
So basically you propagate
from left to right.

67
00:03:46,633 --> 00:03:48,833
You go all the way
until you get to the end.

68
00:03:48,833 --> 00:03:50,166
You get your Y hat.

69
00:03:50,166 --> 00:03:52,600
Then compare the predicted
result to the actual result.

70
00:03:52,600 --> 00:03:55,266
Measure the generated error.

71
00:03:55,266 --> 00:03:57,333
And then you do the backpropagation
from right to left.

72
00:03:57,333 --> 00:03:58,000
The error is

73
00:03:58,000 --> 00:03:58,500
backpropagated.

74
00:03:58,500 --> 00:04:01,800
Update the weights according to how much
they're responsible for the error.

75
00:04:02,100 --> 00:04:06,200
Again, you are able to calculate that
because of the way that the

76
00:04:06,500 --> 00:04:09,366
backpropagation algorithm is structured.

77
00:04:09,366 --> 00:04:12,566
The learning rate decides
by how much we update the weights.

78
00:04:12,566 --> 00:04:16,900
The learning rate is a parameter
you can control in your neural network.

79
00:04:17,666 --> 00:04:18,600
Step six.

80
00:04:18,600 --> 00:04:22,600
Repeat steps 1 to 5 and update the weights
after each observation.

81
00:04:22,933 --> 00:04:24,766
That is called online learning.

82
00:04:24,766 --> 00:04:30,600
And in our case that was stochastic
gradient descent. Or repeat

83
00:04:30,600 --> 00:04:33,833
steps 1 to 5, but update weights
only after a batch of observations.

84
00:04:33,833 --> 00:04:37,800
So batch learning: it's
either full gradient descent

85
00:04:37,800 --> 00:04:40,800
or batch gradient descent
or mini-batch gradient descent.

86
00:04:40,833 --> 00:04:44,266
And step seven: when the whole training
set has passed through the artificial

87
00:04:44,400 --> 00:04:48,933
neural network,
that makes an epoch. Redo more epochs.

88
00:04:48,933 --> 00:04:51,933
So basically you just keep doing that
and doing that and doing that,

89
00:04:52,200 --> 00:04:54,800
allowing your neural network

90
00:04:54,800 --> 00:04:58,366
to train better and better and better
and constantly adjust itself,

91
00:04:59,633 --> 00:05:02,600
as you minimize the cost function.
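The seven steps above can be sketched as a short training loop in plain Python. This is an illustration under stated assumptions, not the code from the practical tutorials: it assumes one hidden layer of sigmoid neurons, a sigmoid output, a squared-error cost, and a constant 1.0 appended to every observation to act as a bias term. The names `train`, `forward`, `total_cost`, `batch_size` and the tiny OR dataset are all made up for this example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    # Step 3: propagate from left to right; each weight scales
    # how much one activation matters to the next neuron.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y_hat = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, y_hat


def total_cost(data, w_hidden, w_out):
    return sum(0.5 * (forward(x, w_hidden, w_out)[1] - y) ** 2 for x, y in data)


def train(data, n_hidden=3, learning_rate=0.5, epochs=2000, batch_size=1, seed=0):
    rng = random.Random(seed)
    n_inputs = len(data[0][0])
    # Step 1: small random weights close to zero, but not zero.
    w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    start_cost = total_cost(data, w_hidden, w_out)

    for _ in range(epochs):  # Step 7: one full pass over the data is an epoch.
        for start in range(0, len(data), batch_size):
            grad_hidden = [[0.0] * n_inputs for _ in range(n_hidden)]
            grad_out = [0.0] * n_hidden
            for x, y in data[start:start + batch_size]:  # Step 2: one row in.
                hidden, y_hat = forward(x, w_hidden, w_out)
                error = y_hat - y  # Step 4: compare prediction with actual.
                # Step 5: backpropagate from right to left (chain rule).
                delta_out = error * y_hat * (1.0 - y_hat)
                for j in range(n_hidden):
                    grad_out[j] += delta_out * hidden[j]
                    delta_h = delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                    for i in range(n_inputs):
                        grad_hidden[j][i] += delta_h * x[i]
            # Step 6: update all weights together, scaled by the learning rate.
            # batch_size=1 updates after each observation; larger values
            # update only after a batch of observations.
            for j in range(n_hidden):
                w_out[j] -= learning_rate * grad_out[j]
                for i in range(n_inputs):
                    w_hidden[j][i] -= learning_rate * grad_hidden[j][i]

    return w_hidden, w_out, start_cost, total_cost(data, w_hidden, w_out)


# Toy dataset (logical OR); the trailing 1.0 in each row is the assumed bias input.
data = [([0.0, 0.0, 1.0], 0.0), ([0.0, 1.0, 1.0], 1.0),
        ([1.0, 0.0, 1.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]
w_hidden, w_out, start_cost, end_cost = train(data)
print("cost before training:", start_cost)
print("cost after training: ", end_cost)
```

With the default `batch_size=1` the weights are updated after every observation; setting `batch_size=len(data)` would update them once per epoch instead, and anything in between corresponds to mini-batches.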

92
00:05:02,600 --> 00:05:04,300
So there we go.

93
00:05:04,300 --> 00:05:06,266
Those are the steps you need to take

94
00:05:06,266 --> 00:05:09,466
to build your artificial neural network
and train it.

95
00:05:09,900 --> 00:05:13,533
And these are the steps
that you will be taking together

96
00:05:13,600 --> 00:05:15,933
in the practical tutorials.

97
00:05:15,933 --> 00:05:19,200
Wish you the best of luck, and I
look forward to seeing you next time.

98
00:05:19,400 --> 00:05:22,400
Until then, enjoy deep learning.