In the last lesson, we spoke about the inspiration and the general structure of a neural network, and how the learning happens. We discussed the importance of the connection weights, and we also spoke about how neural networks might be good at things that traditional computers are not very good at. In this lesson, we'll talk more about layers in a neural network, feature generation, and learning.

Artificial neural networks are such a hot topic these days; they are receiving a lot of hype. When you come across a news piece or an article on machine learning, on, say, Hacker News or another outlet, they're almost always talking about deep learning and artificial neural networks: like the time that AlphaGo beat the 9-dan Go champion Lee Sedol across several games of Go, or Google's self-driving cars, or Google Translate, which can sing, by the way. [sings along with the Google Translate music video] I'll put the link to that Google Translate music video into the course resources, so if you're curious, check it out.

But now let's go back to the more important question at hand: why are neural networks so special? What makes them so different? Well, let's think back to what we were doing in our regression tutorial, when we were estimating the house prices in Boston. We were sifting through our data, we were choosing our features, and we were specifying our model, and then we were using that model to estimate our house prices, or make our predictions. In this model, we had eleven different features, and by features I mean things like the number of rooms, or the amount of crime in the area, or the amount of pollution. The point is that we, as the programmers, chose to include certain features in our model. And similarly, we chose what to do with those features: should we add them up, should we multiply them together, should we combine them? These were all decisions that we took when specifying our model.

Now similarly, when we were working with our naive Bayes spam classifier, our features were the tokens for the words in our emails. We extracted these tokens, we extracted these features, and then we used them to make our predictions. Both of these approaches are examples of what would be called shallow learning.
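To make that concrete, here's a minimal sketch of that shallow workflow, assuming scikit-learn; the numbers and the three columns (standing in for hand-picked features like rooms, crime, and pollution) are made up for illustration.

    # "Shallow" learning: we pick the features by hand, and the
    # algorithm only learns the parameters (the theta values).
    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Each row is a house; each column is a feature WE chose to include:
    # [number of rooms, crime rate, pollution level]
    X = np.array([
        [6.0, 0.02, 0.40],
        [4.5, 0.15, 0.70],
        [7.2, 0.01, 0.35],
    ])
    y = np.array([30.1, 14.8, 38.2])  # house prices, in thousands

    model = LinearRegression()
    model.fit(X, y)  # "learning" here is just finding the theta values
    print(model.coef_, model.intercept_)

Notice that everything interesting, which features to include and in what form, happened before the algorithm ever saw the data.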
We, as the data scientists, as the machine learning experts, were choosing which features would help us make sense of our data and make better predictions. All that our algorithm had to do was learn the parameters for the model. In the case of our regression, we were training our model to learn the theta values. In the case of naive Bayes, we had to learn the probabilities that help classify our emails as spam or not spam.

The thing is, deciding which features to use in a model, and how to use those features, is actually a very, very challenging thing. When we were working with our regression model, we chose to exclude some features and transform others, and we did that to get a better model and a more reliable estimate. And that's the crux of it, right? To get better results, we needed to know something about the relationships in our data. Certain relationships are linear, and that's pretty straightforward. But other relationships are non-linear, like the distance from employment centers and pollution in our Boston house price dataset. That is a relationship that would best be represented by a curve.

In the case of classification, we can also find non-linear relationships. For example, say you have some data points and they're distributed like this; the line that best separates them would be a squiggly one that looks like this. In this case, you've got a non-linear decision boundary. But what's the mathematical formula for this line that best fits the data? Also, will this formula play nicely with the model that we're using, or do we need to choose a different model? Is there some kind of transformation that we can make to our features to get a better estimate? Maybe a feature needs to be squared, or a feature needs to be combined with one or more other features to get a better prediction. These are all things that you, as the model architect, you, as the programmer, have to figure out when you're using shallow learning algorithms.

Now, in contrast, with deep learning the neural network will do the job of feature selection for the programmer. The neural network will learn the features from the data by itself. Regardless of whether you've got a linear relationship or a non-linear relationship in the data, the neural network will learn to combine features in the most effective way, and this is the reason that we don't have to go around programming neural networks explicitly.
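To see what that manual work looks like, here's a sketch of the kind of feature transformation we'd otherwise do by hand, assuming scikit-learn's PolynomialFeatures; the choice of degree-2 terms is our guess, not something the algorithm discovers for us.

    # Manually engineering non-linear features: squares and combinations.
    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures

    X = np.array([[2.0, 3.0]])  # two original features
    poly = PolynomialFeatures(degree=2, include_bias=False)
    X_expanded = poly.fit_transform(X)  # adds x0^2, x0*x1, and x1^2

    print(poly.get_feature_names_out())  # ['x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']
    print(X_expanded)                    # [[2. 3. 4. 6. 9.]]

A neural network, in effect, learns which of these combinations matter on its own.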
This is why we can teach a car to drive itself without having to write the code to make the self-driving car stop at a red light.

Let's talk about this whole process with a bit more of a concrete example: let's talk about image recognition. With image recognition, a neural network can learn to identify what is in an image. Say that we want to identify if an image contains a cat or if it doesn't contain a cat. The way we would do this is by feeding labeled images into the network. So, a bunch of training data: images containing cats labeled as 'cat', images not containing cats labeled as 'not a cat'. As these images are fed through the network, and the network extracts the features that matter, the weights between the neurons in the network are updated, and by the end of training you can use this network to identify cats in images that the network has never seen before. And at no point in this whole process do you have to sit there and laboriously program the neural network to look for fur, or tails, or whiskers, or cat-like faces. You, as the programmer, don't have to supply the features. Instead, the neural network automatically generates the features from the training dataset.

And this ability, the ability of the neural network to learn the underlying identifying characteristics of a cat from a set of images, is one of the reasons why deep learning is both so powerful and also so magical. And I do say magical, because it really feels that way. Deep learning removes the whole feature selection process; it moves the learning one step away from the programmer. And this stands in contrast to simpler algorithms, where the machine learning algorithm is close to the programmer and feels a lot less like artificial intelligence in comparison to something like deep learning.

But of course, there is no magic. So how does the neural network learn to generate its own features? Let's stick with this cat example in our thought experiment. The cat detection neural network is going to answer one of life's most important questions: do we have a kitty or do we not have a kitty? That's the output that the network will generate for us: yea or nay, thumbs up or thumbs down.

Now, neural networks are often represented in a chart like this. The chart flows from left to right: on the left, we have the inputs to the neural network, and on the right, we have the output.
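As a rough sketch of that supervised setup, assuming a library like Keras (the lesson doesn't prescribe one) and stand-in random data: the labeled images go in, the weights get updated during training, and nowhere do we code up fur or whiskers.

    # Hypothetical cat/not-cat training setup with made-up data.
    import numpy as np
    from tensorflow import keras

    X_train = np.random.rand(100, 900)      # 100 fake flattened images
    y_train = np.random.randint(0, 2, 100)  # label: 1 = cat, 0 = not a cat

    model = keras.Sequential([
        keras.layers.Input(shape=(900,)),
        keras.layers.Dense(16, activation="sigmoid"),  # hidden layer
        keras.layers.Dense(1, activation="sigmoid"),   # cat probability
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(X_train, y_train, epochs=5, verbose=0)  # weights get updated

    new_image = np.random.rand(1, 900)
    print(model.predict(new_image))  # a probability, for an unseen image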
Now, we've already talked a little bit about the individual neurons, right? We've talked about how they might fire or not fire. Well, this was kind of the model from the 1940s; these were some of the earliest models: the neuron would activate or not activate, one or zero. But we could also make our neuron a little bit more nuanced. In the case where we just had a one or a zero, the function that would determine whether a neuron would activate would probably look something like this: it would be a step function; for certain values it would output a one, for other values it would output a zero. But what if, instead, our neuron actually gave us a probability between 0 and 1? That way it could say things like, 'well, there's a 10 percent chance of this image containing a cat, and therefore it is not a cat', or it could say things like, 'well, I'm 90 percent sure that this is a cat'. In this case you wouldn't have a step function; you'd have something a little bit more smooth, something that maybe looks like this. This is a sigmoid function, and you can see that it's continuous. So if each individual neuron in the network were using this function, then it could fire with a weak signal or with a slightly stronger signal, depending on where it is on this function.

These functions, the ones that determine if and how strongly these neurons actually fire, have a name: they are called activation functions, and it's the value of these activation functions that determines the output of a neuron.

Now, if we take a look at this chart again, what we can see is that these neurons are grouped into layers, and in this case we've got three layers. This is a very, very simple network. This first layer here is called the input layer. Each node in the input layer represents a feature, so in this example we've got six features. The second layer here is called the hidden layer. Very mysterious, right? Six nodes in the hidden layer. Notice how each input feature is actually connected to each and every node in the hidden layer. And that last layer is called the output layer, and here again, each neuron in that second layer connects to each neuron in the output layer. In this case we've got one output neuron, but we could have more, right? We could have two, or three, or ten, or what have you.
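By the way, those two activation functions are easy to sketch in NumPy; the sample inputs here are arbitrary.

    # The 1940s-style step function versus the smooth sigmoid.
    import numpy as np

    def step(z):
        # all or nothing: fire (1) or don't fire (0)
        return np.where(z >= 0, 1.0, 0.0)

    def sigmoid(z):
        # continuous: a value between 0 and 1, like a probability
        return 1.0 / (1.0 + np.exp(-z))

    z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(step(z))     # [0. 0. 1. 1. 1.]
    print(sigmoid(z))  # approx [0.12 0.38 0.5 0.62 0.88]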
The point I'm trying to make is that the neurons are grouped; they're grouped into layers. You can have a different number of neurons in each layer, and between two adjacent layers, all the neurons are connected to each other. So one thing that we can do is change the architecture of our network: we can change the number of layers that it has. The deeper the network, the more layers it has; that's where the 'deep' in deep learning comes from. So in this case we've got two hidden layers. If we want an even deeper network, then we could add a third layer yet again. So in this case we have three hidden layers: five layers total, three of them hidden.

But here's the million dollar question: what goes on in the hidden layers? Well, let's tackle that first hidden layer, shall we? In that first hidden layer, you'll see that every input is connected to every single node, and this is important, because we were saying how the overall goal of the neural network will be to discover the optimal combination of features. So the fact that they're all connected means it gets to try out every single combination. This is what this first hidden layer will do: it will combine each of the input features every which way, and it will try to learn the best way to combine these features.

Now, in our thought experiment, what we would do is show this neural network an image, and we would ask the neural network to tell us if this image contains a cat or not a cat. Each pixel in the image will be an input to the neural network. Now, I've drawn six inputs in the previous chart, but if we had a 30 by 30 pixel image, then that would actually be 900 different input nodes, because we would convert the image into a matrix where each number represents the value of a particular pixel. When I say value, it would be something like the color, right? Is it black, is it white, is it red, green, what have you? Now, for the sake of argument, this picture of a giraffe is what we're going to be sending over to our neural network. We're going to convert it into numbers, and then those numbers get passed on to the first hidden layer.

So what happens next? That first hidden layer will combine all the pixels of this image, and it will start to generate features from those pixels. Now, what kind of features will it generate?
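Before we answer that, here's how the input side we just described might look in code, again assuming Keras; the layer sizes are chosen purely for illustration, matching the 'five layers total, three hidden' picture.

    # A 30x30 image becomes a matrix of pixel values, flattened into
    # 900 input nodes, feeding a network with three hidden layers.
    import numpy as np
    from tensorflow import keras

    image = np.random.rand(30, 30)  # pixel values, e.g. grayscale 0-1
    inputs = image.reshape(1, 900)  # 900 input nodes

    deep_model = keras.Sequential([
        keras.layers.Input(shape=(900,)),
        keras.layers.Dense(64, activation="sigmoid"),  # hidden layer 1
        keras.layers.Dense(32, activation="sigmoid"),  # hidden layer 2
        keras.layers.Dense(16, activation="sigmoid"),  # hidden layer 3
        keras.layers.Dense(1, activation="sigmoid"),   # output layer
    ])
    print(deep_model.predict(inputs))  # untrained guess: cat or not?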
Well, the neural network will probably start to detect simple patterns, like lines, or edges, or textures, and those patterns are going to be the features that the first hidden layer will generate. The second hidden layer will then use those features that the first hidden layer outputs, and it will try to work with them. So that second hidden layer is no longer confronted with the individual pixels; it's confronted with the features that the first hidden layer generated. So what will the second layer do with those things? What it might do is try to combine these edges and lines and textures into something that's of a higher level of complexity. So it might start to detect things like shapes: rectangles, circles, shadows, something that's a little bit more complex than lines and edges and textures. And those shapes, in turn, are going to feed through to that third hidden layer. The third hidden layer gets these shapes as an input, and it will take these shapes and make its own features, say something like eyes, or ears, or a tail, or legs, and these are the features that the output layer will then use to identify if this image contains a cat or not a cat.

So this brings us to the output layer. If a neural network were a company, then this output layer would be the CEO. The output layer will look at what that last hidden layer is sending over and make its decision. So, say that top neuron in that third hidden layer fires, and it says it detected an eye. Then that second neuron in that last hidden layer also fires and says it detected some ears. The next one down reports that it detected a tail, but the fourth one is silent: 'no legs', says the fourth neuron to the CEO. Now the output layer has to make a decision: was this a picture of a cat? Well, the CEO will take a good hard look at his managers, and then he'll take a weighted average of their outputs. So the output layer comes back to us and says, 'Yes, I am 75 percent certain that we have a cat in this image.' But we know the true answer, right? We fed it this image, and we say, 'Well, sorry, Mark, but that's not good enough. A cat it is not. You have made the wrong call, and I'm afraid we still have a long way to go.'

Well, what happens now? Our network just made a prediction, but it got it wrong. So our CEO is angry, and he has to figure out how far off he was with his prediction. So he looks at his loss, and he adjusts his weights.
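As an aside, the CEO's weighted average can be sketched as a single sigmoid neuron; the reports, weights, and bias below are invented numbers, picked so the result lands near the 75 percent from the story.

    # The output layer's decision: a weighted sum of the last hidden
    # layer's reports, squashed through the sigmoid.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    reports = np.array([1.0, 1.0, 1.0, 0.0])  # eye, ears, tail, no legs
    weights = np.array([0.8, 0.9, 0.7, 1.2])  # how much the CEO trusts each
    bias = -1.3

    confidence = sigmoid(np.dot(weights, reports) + bias)
    print(f"{confidence:.0%} certain this is a cat")  # roughly 75%

With our giraffe, the true label is 'not a cat', so this 75 percent is a miss, and those weights are exactly what has to change.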
And maybe next time he'll be more suspicious of his first two managers, and he'll listen more closely to his fourth manager. Maybe that will result in a better prediction. So he runs back inside his company and calls a meeting, and, well, he starts yelling at his managers, and they're all shifting around uncomfortably in their three-thousand-dollar ergonomic chairs and looking at the ground. This is embarrassing, right? But the managers know exactly who's to blame for this fiasco: it's the associates reporting to them. So the managers in that third hidden layer adjust their weights, call a meeting, and they start yelling at their associates. Now the associates are vexed, so the associates adjust their weights and start giving the juniors and the interns a hard time. But the juniors, they only have the inputs to blame, right? Stupid pixels. But pixels can't talk back, so the juniors just adjust their weights and try to generate slightly different features the next time round, for the next image.

And this whole process is called backpropagation. The error is passed down through the network from the output node, so that each node in each layer adjusts its weights. So at this point the question is: well, how are the weights adjusted? Are they adjusted down, or are they adjusted up? Well, this depends on the loss function. The slope, or the gradient, of this loss function will determine the adjustment, and we actually cover this in detail in our separate module that is dedicated to gradient descent. The point I'm trying to make here is that, through trial and error, and lots of yelling from the higher-ups in the company, the network is able to start to generate its own features and detect patterns in the data.

Now, even though I spoke of lines and of shapes at one end and higher-level features like eyes and tails at the other end of the network, the truth is that we don't know exactly what features the neural network actually generates. What we do know is that the neural network breaks down the input data into chunks and creates a hierarchy, but we don't know exactly what goes on under the hood. And therein also lies a problem, because this makes neural networks a bit like a black box: we don't know exactly what's going on inside them. So a neural network isn't exactly what you would call a tractable model.
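Even treating the network as a black box, though, the arithmetic of a single weight update is simple. Here's a heavily simplified sketch for one sigmoid neuron, assuming a squared-error loss (an assumption on my part; the details live in the gradient descent module).

    # One round of "yelling": nudge the weights against the gradient.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([1.0, 1.0, 1.0, 0.0])  # inputs to the neuron
    w = np.array([0.8, 0.9, 0.7, 1.2])  # current weights
    target = 0.0                        # the truth: not a cat
    learning_rate = 0.5

    prediction = sigmoid(np.dot(w, x))                     # forward pass
    error = prediction - target                           # how far off were we?
    gradient = error * prediction * (1 - prediction) * x  # slope of the loss
    w -= learning_rate * gradient                         # adjust the weights
    print(w)  # nudged so the same mistake shrinks next time

Multiply this by thousands of weights across many layers, and it's easy to see why the features a network learns resist easy interpretation.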
And this is a big problem when you're not just interested in the accuracy of a prediction, but you need to understand why a neural network has given a particular output. There is, in fact, a whole subfield of machine learning dedicated to analyzing neural networks to better understand why they do what they do.

So, I know this is a very, very dense lesson on the theory, so here's the executive summary. Each neuron in a neural network will be activated based on a mathematical formula called the activation function. The activation function determines how strongly the neuron will fire. Then, through trial and error, the neural network is able to generate its own features from the input data. This allows the neural network to solve both linear problems and non-linear problems, because it tries all these combinations. The deeper the network, the more complex and the more high-level the features are that are generated at each layer.

One piece of good news is that the pattern of learning for a neural network is very similar to our other machine learning algorithms: it makes a prediction, it figures out how far off the prediction was by looking at the loss, and then it adjusts its parameters; in this case, it adjusts the weights between the neurons. And the process by which the error gets sent back down through the network, so that each node can adjust its weights, is called backpropagation.

So, in summary, neural networks are very powerful, and a reasonable question is: well, if they're so powerful, can we use neural networks for everything? And the answer is: yes, you can. You can solve almost every machine learning problem with a neural network. But would you want to use a neural network to solve every problem? In this case, the answer is no. Why? Well, let's talk about that in the next lesson. I'll see you there. Take care.