In the last lesson, we spoke about the inspiration and the general structure of a neural network, and how the learning happens. We discussed the importance of the connection weights, and we also spoke about how neural networks might be good at things that traditional computers are not very good at. In this lesson, we'll talk more about layers in a neural network, feature generation, and learning.

Artificial neural networks are such a hot topic these days; they are receiving a lot of hype. When you come across a news piece or an article on machine learning, on, say, Hacker News or another outlet, they're almost always talking about deep learning and artificial neural networks: like the time that AlphaGo beat the 9-dan Go champion Lee Sedol across several games of Go, or Google's self-driving cars, or Google Translate, which can sing, by the way. [sings along with the Google Translate music video] I'll put the link to that Google Translate music video into the course resources, so if you're curious, check it out.

But now let's go back to the more important question at hand: why are neural networks so special? What makes them so different? Well, let's think back to what we were doing in our regression tutorial, when we were estimating the house prices in Boston. We were sifting through our data, we were choosing our features, and we were specifying our model, and then we were using that model to estimate our house prices, or make our predictions. In this model, we had eleven different features, and by features I mean things like the number of rooms, or the amount of crime in the area, or the amount of pollution. The point is that we, as the programmers, chose to include certain features in our model. And similarly, we chose what to do with those features: should we add them up, should we multiply them together, should we combine them? These were all decisions that we took when specifying our model.

Now similarly, when we were working with our naive Bayes spam classifier, our features were the tokens for the words in our emails. We extracted these tokens, we extracted these features, and then we used them to make our predictions. Both of these approaches are examples of what would be called shallow learning.
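To make that concrete, here's a minimal sketch of that shallow workflow, assuming scikit-learn; the numbers and the three columns (standing in for hand-picked features like rooms, crime, and pollution) are made up for illustration.

    # "Shallow" learning: we pick the features by hand, and the
    # algorithm only learns the parameters (the theta values).
    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Each row is a house; each column is a feature WE chose to include:
    # [number of rooms, crime rate, pollution level]
    X = np.array([
        [6.0, 0.02, 0.40],
        [4.5, 0.15, 0.70],
        [7.2, 0.01, 0.35],
    ])
    y = np.array([30.1, 14.8, 38.2])  # house prices, in thousands

    model = LinearRegression()
    model.fit(X, y)  # "learning" here is just finding the theta values
    print(model.coef_, model.intercept_)

Notice that everything interesting, which features to include and in what form, happened before the algorithm ever saw the data.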
We, as the data scientists, as the machine learning experts, were choosing which features would help us make sense of our data and make better predictions. All that our algorithm had to do was learn the parameters for the model. In the case of our regression, we were training our model to learn the theta values. In the case of naive Bayes, we had to learn the probabilities that help classify our emails as spam or not spam.

The thing is, deciding which features to use in a model, and how to use those features, is actually a very, very challenging thing. When we were working with our regression model, we chose to exclude some features and transform others, and we did that to get a better model and a more reliable estimate. And that's the crux of it, right? To get better results, we needed to know something about the relationships in our data. Certain relationships are linear, and that's pretty straightforward. But other relationships are non-linear, like the distance from employment centers and pollution in our Boston house price dataset. That is a relationship that would best be represented by a curve.

In the case of classification, we can also find non-linear relationships. For example, say you have some data points and they're distributed like this; the line that best separates them would be a squiggly one that looks like this. In this case, you've got a non-linear decision boundary. But what's the mathematical formula for this line that best fits the data? Also, will this formula play nicely with the model that we're using, or do we need to choose a different model? Is there some kind of transformation that we can make to our features to get a better estimate? Maybe a feature needs to be squared, or a feature needs to be combined with one or more other features to get a better prediction. These are all things that you, as the model architect, you, as the programmer, have to figure out when you're using shallow learning algorithms.

Now, in contrast, with deep learning the neural network will do the job of feature selection for the programmer. The neural network will learn the features from the data by itself. Regardless of whether you've got a linear relationship or a non-linear relationship in the data, the neural network will learn to combine features in the most effective way, and this is the reason that we don't have to go around programming neural networks explicitly.
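To see what that manual work looks like, here's a sketch of the kind of feature transformation we'd otherwise do by hand, assuming scikit-learn's PolynomialFeatures; the choice of degree-2 terms is our guess, not something the algorithm discovers for us.

    # Manually engineering non-linear features: squares and combinations.
    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures

    X = np.array([[2.0, 3.0]])  # two original features
    poly = PolynomialFeatures(degree=2, include_bias=False)
    X_expanded = poly.fit_transform(X)  # adds x0^2, x0*x1, and x1^2

    print(poly.get_feature_names_out())  # ['x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']
    print(X_expanded)                    # [[2. 3. 4. 6. 9.]]

A neural network, in effect, learns which of these combinations matter on its own.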
This is why we can teach a car to drive itself without having to write the code to make the self-driving car stop at a red light.

Let's talk about this whole process with a bit more of a concrete example: let's talk about image recognition. With image recognition, a neural network can learn to identify what is in an image. Say that we want to identify if an image contains a cat or if it doesn't contain a cat. The way we would do this is by feeding labeled images into the network. So, a bunch of training data: images containing cats labeled as 'cat', images not containing cats labeled as 'not a cat'. As these images are fed through the network, and the network extracts the features that matter, the weights between the neurons in the network are updated, and by the end of training you can use this network to identify cats in images that the network has never seen before. And at no point in this whole process do you have to sit there and laboriously program the neural network to look for fur, or tails, or whiskers, or cat-like faces. You, as the programmer, don't have to supply the features. Instead, the neural network automatically generates the features from the training dataset.

And this ability, the ability of the neural network to learn the underlying identifying characteristics of a cat from a set of images, is one of the reasons why deep learning is both so powerful and also so magical. And I do say magical, because it really feels that way. Deep learning removes the whole feature selection process; it moves the learning one step away from the programmer. And this stands in contrast to simpler algorithms, where the machine learning algorithm is close to the programmer and feels a lot less like artificial intelligence in comparison to something like deep learning.

But of course, there is no magic. So how does the neural network learn to generate its own features? Let's stick with this cat example in our thought experiment. The cat detection neural network is going to answer one of life's most important questions: do we have a kitty or do we not have a kitty? That's the output that the network will generate for us: yea or nay, thumbs up or thumbs down.

Now, neural networks are often represented in a chart like this. The chart flows from left to right: on the left, we have the inputs to the neural network, and on the right, we have the output.
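As a rough sketch of that supervised setup, assuming a library like Keras (the lesson doesn't prescribe one) and stand-in random data: the labeled images go in, the weights get updated during training, and nowhere do we code up fur or whiskers.

    # Hypothetical cat/not-cat training setup with made-up data.
    import numpy as np
    from tensorflow import keras

    X_train = np.random.rand(100, 900)      # 100 fake flattened images
    y_train = np.random.randint(0, 2, 100)  # label: 1 = cat, 0 = not a cat

    model = keras.Sequential([
        keras.layers.Input(shape=(900,)),
        keras.layers.Dense(16, activation="sigmoid"),  # hidden layer
        keras.layers.Dense(1, activation="sigmoid"),   # cat probability
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(X_train, y_train, epochs=5, verbose=0)  # weights get updated

    new_image = np.random.rand(1, 900)
    print(model.predict(new_image))  # a probability, for an unseen image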
Now, we've already talked a little bit about the individual neurons, right? We've talked about how they might fire or not fire. Well, this was kind of the model from the 1940s; these were some of the earliest models: the neuron would activate or not activate, one or zero. But we could also make our neuron a little bit more nuanced. In the case where we just had a one or a zero, the function that would determine whether a neuron would activate would probably look something like this: it would be a step function; for certain values it would output a one, for other values it would output a zero. But what if, instead, our neuron actually gave us a probability between 0 and 1? That way it could say things like, 'well, there's a 10 percent chance of this image containing a cat, and therefore it is not a cat', or it could say things like, 'well, I'm 90 percent sure that this is a cat'. In this case you wouldn't have a step function; you'd have something a little bit more smooth, something that maybe looks like this. This is a sigmoid function, and you can see that it's continuous. So if each individual neuron in the network were using this function, then it could fire with a weak signal or with a slightly stronger signal, depending on where it is on this function.

These functions, the ones that determine if and how strongly these neurons actually fire, have a name: they are called activation functions, and it's the value of these activation functions that determines the output of a neuron.

Now, if we take a look at this chart again, what we can see is that these neurons are grouped into layers, and in this case we've got three layers. This is a very, very simple network. This first layer here is called the input layer. Each node in the input layer represents a feature, so in this example we've got six features. The second layer here is called the hidden layer. Very mysterious, right? Six nodes in the hidden layer. Notice how each input feature is actually connected to each and every node in the hidden layer. And that last layer is called the output layer, and here again, each neuron in that second layer connects to each neuron in the output layer. In this case we've got one output neuron, but we could have more, right? We could have two, or three, or ten, or what have you.
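By the way, those two activation functions are easy to sketch in NumPy; the sample inputs here are arbitrary.

    # The 1940s-style step function versus the smooth sigmoid.
    import numpy as np

    def step(z):
        # all or nothing: fire (1) or don't fire (0)
        return np.where(z >= 0, 1.0, 0.0)

    def sigmoid(z):
        # continuous: a value between 0 and 1, like a probability
        return 1.0 / (1.0 + np.exp(-z))

    z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(step(z))     # [0. 0. 1. 1. 1.]
    print(sigmoid(z))  # approx [0.12 0.38 0.5 0.62 0.88]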
The point I'm trying to make is that the neurons are grouped; they're grouped into layers. You can have a different number of neurons in each layer, and between two adjacent layers, all the neurons are connected to each other. So one thing that we can do is change the architecture of our network: we can change the number of layers that it has. The deeper the network, the more layers it has; that's where the 'deep' in deep learning comes from. So in this case we've got two hidden layers. If we want an even deeper network, then we could add a third layer yet again. So in this case we have three hidden layers: five layers total, three of them hidden.

But here's the million dollar question: what goes on in the hidden layers? Well, let's tackle that first hidden layer, shall we? In that first hidden layer, you'll see that every input is connected to every single node, and this is important, because we were saying how the overall goal of the neural network will be to discover the optimal combination of features. So the fact that they're all connected means it gets to try out every single combination. This is what this first hidden layer will do: it will combine each of the input features every which way, and it will try to learn the best way to combine these features.

Now, in our thought experiment, what we would do is show this neural network an image, and we would ask the neural network to tell us if this image contains a cat or not a cat. Each pixel in the image will be an input to the neural network. Now, I've drawn six inputs in the previous chart, but if we had a 30 by 30 pixel image, then that would actually be 900 different input nodes, because we would convert the image into a matrix where each number represents the value of a particular pixel. When I say value, it would be something like the color, right? Is it black, is it white, is it red, green, what have you? Now, for the sake of argument, this picture of a giraffe is what we're going to be sending over to our neural network. We're going to convert it into numbers, and then those numbers get passed on to the first hidden layer.

So what happens next? That first hidden layer will combine all the pixels of this image, and it will start to generate features from those pixels. Now, what kind of features will it generate?
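Before we answer that, here's how the input side we just described might look in code, again assuming Keras; the layer sizes are chosen purely for illustration, matching the 'five layers total, three hidden' picture.

    # A 30x30 image becomes a matrix of pixel values, flattened into
    # 900 input nodes, feeding a network with three hidden layers.
    import numpy as np
    from tensorflow import keras

    image = np.random.rand(30, 30)  # pixel values, e.g. grayscale 0-1
    inputs = image.reshape(1, 900)  # 900 input nodes

    deep_model = keras.Sequential([
        keras.layers.Input(shape=(900,)),
        keras.layers.Dense(64, activation="sigmoid"),  # hidden layer 1
        keras.layers.Dense(32, activation="sigmoid"),  # hidden layer 2
        keras.layers.Dense(16, activation="sigmoid"),  # hidden layer 3
        keras.layers.Dense(1, activation="sigmoid"),   # output layer
    ])
    print(deep_model.predict(inputs))  # untrained guess: cat or not?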
Well, the neural network will probably start to detect simple patterns, like lines, or edges, or textures, and those patterns are going to be the features that the first hidden layer will generate. The second hidden layer will then use those features that the first hidden layer outputs, and it will try to work with them. So that second hidden layer is no longer confronted with the individual pixels; it's confronted with the features that the first hidden layer generated. So what will the second layer do with those things? What it might do is try to combine these edges and lines and textures into something that's of a higher level of complexity. So it might start to detect things like shapes: rectangles, circles, shadows, something that's a little bit more complex than lines and edges and textures. And those shapes, in turn, are going to feed through to that third hidden layer. The third hidden layer gets these shapes as an input, and it will take these shapes and make its own features, say something like eyes, or ears, or a tail, or legs, and these are the features that the output layer will then use to identify if this image contains a cat or not a cat.

So this brings us to the output layer. If a neural network were a company, then this output layer would be the CEO. The output layer will look at what that last hidden layer is sending over and make its decision. So, say that top neuron in that third hidden layer fires, and it says it detected an eye. Then that second neuron in that last hidden layer also fires and says it detected some ears. The next one down reports that it detected a tail, but the fourth one is silent: 'no legs', says the fourth neuron to the CEO. Now the output layer has to make a decision: was this a picture of a cat? Well, the CEO will take a good hard look at his managers, and then he'll take a weighted average of their outputs. So the output layer comes back to us and says, 'Yes, I am 75 percent certain that we have a cat in this image.' But we know the true answer, right? We fed it this image, and we say, 'Well, sorry, Mark, but that's not good enough. A cat it is not. You have made the wrong call, and I'm afraid we still have a long way to go.'

Well, what happens now? Our network just made a prediction, but it got it wrong. So our CEO is angry, and he has to figure out how far off he was with his prediction. So he looks at his loss, and he adjusts his weights.
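As an aside, the CEO's weighted average can be sketched as a single sigmoid neuron; the reports, weights, and bias below are invented numbers, picked so the result lands near the 75 percent from the story.

    # The output layer's decision: a weighted sum of the last hidden
    # layer's reports, squashed through the sigmoid.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    reports = np.array([1.0, 1.0, 1.0, 0.0])  # eye, ears, tail, no legs
    weights = np.array([0.8, 0.9, 0.7, 1.2])  # how much the CEO trusts each
    bias = -1.3

    confidence = sigmoid(np.dot(weights, reports) + bias)
    print(f"{confidence:.0%} certain this is a cat")  # roughly 75%

With our giraffe, the true label is 'not a cat', so this 75 percent is a miss, and those weights are exactly what has to change.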
And maybe next time he'll be more suspicious of his first two managers, and he'll listen more closely to his fourth manager. Maybe that will result in a better prediction. So he runs back inside his company and calls a meeting, and, well, he starts yelling at his managers, and they're all shifting around uncomfortably in their three-thousand-dollar ergonomic chairs and looking at the ground. This is embarrassing, right? But the managers know exactly who's to blame for this fiasco: it's the associates reporting to them. So the managers in that third hidden layer adjust their weights, call a meeting, and they start yelling at their associates. Now the associates are vexed, so the associates adjust their weights and start giving the juniors and the interns a hard time. But the juniors, they only have the inputs to blame, right? Stupid pixels. But pixels can't talk back, so the juniors just adjust their weights and try to generate slightly different features the next time round, for the next image.

And this whole process is called backpropagation. The error is passed down through the network from the output node, so that each node in each layer adjusts its weights. So at this point the question is: well, how are the weights adjusted? Are they adjusted down, or are they adjusted up? Well, this depends on the loss function. The slope, or the gradient, of this loss function will determine the adjustment, and we actually cover this in detail in our separate module that is dedicated to gradient descent. The point I'm trying to make here is that, through trial and error, and lots of yelling from the higher-ups in the company, the network is able to start to generate its own features and detect patterns in the data.

Now, even though I spoke of lines and of shapes at one end and higher-level features like eyes and tails at the other end of the network, the truth is that we don't know exactly what features the neural network actually generates. What we do know is that the neural network breaks down the input data into chunks and creates a hierarchy, but we don't know exactly what goes on under the hood. And therein also lies a problem, because this makes neural networks a bit like a black box: we don't know exactly what's going on inside them. So a neural network isn't exactly what you would call a tractable model.
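Even treating the network as a black box, though, the arithmetic of a single weight update is simple. Here's a heavily simplified sketch for one sigmoid neuron, assuming a squared-error loss (an assumption on my part; the details live in the gradient descent module).

    # One round of "yelling": nudge the weights against the gradient.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([1.0, 1.0, 1.0, 0.0])  # inputs to the neuron
    w = np.array([0.8, 0.9, 0.7, 1.2])  # current weights
    target = 0.0                        # the truth: not a cat
    learning_rate = 0.5

    prediction = sigmoid(np.dot(w, x))                     # forward pass
    error = prediction - target                           # how far off were we?
    gradient = error * prediction * (1 - prediction) * x  # slope of the loss
    w -= learning_rate * gradient                         # adjust the weights
    print(w)  # nudged so the same mistake shrinks next time

Multiply this by thousands of weights across many layers, and it's easy to see why the features a network learns resist easy interpretation.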
And this is a big problem when you're not just interested in the accuracy of a prediction, but you need to understand why a neural network has given a particular output. There is, in fact, a whole subfield of machine learning dedicated to analyzing neural networks to better understand why they do what they do.

So, I know this is a very, very dense lesson on the theory, so here's the executive summary. Each neuron in a neural network will be activated based on a mathematical formula called the activation function. The activation function determines how strongly the neuron will fire. Then, through trial and error, the neural network is able to generate its own features from the input data. This allows the neural network to solve both linear problems and non-linear problems, because it tries all these combinations. The deeper the network, the more complex and the more high-level the features are that are generated at each layer.

One piece of good news is that the pattern of learning for a neural network is very similar to our other machine learning algorithms: it makes a prediction, it figures out how far off the prediction was by looking at the loss, and then it adjusts its parameters; in this case, it adjusts the weights between the neurons. And the process by which the error gets sent back down through the network, so that each node can adjust its weights, is called backpropagation.

So, in summary, neural networks are very powerful, and a reasonable question is: well, if they're so powerful, can we use neural networks for everything? And the answer is: yes, you can. You can solve almost every machine learning problem with a neural network. But would you want to use a neural network to solve every problem? In this case, the answer is no. Why? Well, let's talk about that in the next lesson. I'll see you there. Take care.