1 00:00:00,766 --> 00:00:03,133 Hello and welcome back to the course on Deep Learning. 2 00:00:03,133 --> 00:00:06,633 Now that we've seen neural networks in action, it's time for us to find out 3 00:00:06,633 --> 00:00:08,333 how they learn. 4 00:00:08,333 --> 00:00:10,333 So let's dive straight into it. 5 00:00:10,333 --> 00:00:12,966 There are two fundamentally different 6 00:00:12,966 --> 00:00:15,966 approaches to getting a program to do what you want it to do. 7 00:00:16,066 --> 00:00:20,533 One is hard coding, where you actually tell 8 00:00:20,533 --> 00:00:25,000 the program specific rules and what outcomes you want, 9 00:00:25,000 --> 00:00:28,166 and you just guide it throughout the whole way, and you account 10 00:00:28,166 --> 00:00:32,700 for all the possible options that the program has to deal with. 11 00:00:33,166 --> 00:00:39,333 On the other hand, you have neural networks, where you create a facility 12 00:00:39,333 --> 00:00:43,433 for the program to be able to understand what it needs to do on its own. 13 00:00:43,433 --> 00:00:48,166 You basically create this neural network where you provide inputs, 14 00:00:48,400 --> 00:00:52,033 you tell it what you want as outputs, and then you let it figure everything 15 00:00:52,033 --> 00:00:53,300 out on its own. 16 00:00:53,300 --> 00:00:55,866 Two fundamentally different approaches. 17 00:00:55,866 --> 00:01:00,233 And that is something to keep in mind as we go through these tutorials. 18 00:01:00,666 --> 00:01:05,900 Our goal is to create this network which then learns on its own. 19 00:01:06,200 --> 00:01:10,933 We are going to avoid trying to put in the rules. 20 00:01:10,933 --> 00:01:14,400 And a good example that I can give you right now, 21 00:01:14,400 --> 00:01:18,000 and this will come further in the course, but it's just a very visual example: 22 00:01:18,000 --> 00:01:21,366 for instance, how do you distinguish between a dog and a cat? 
23 00:01:21,666 --> 00:01:25,900 With the approach that's depicted on the left, you would 24 00:01:26,400 --> 00:01:30,466 program things in, like: the cat's ears have to be like this. 25 00:01:30,466 --> 00:01:32,533 Look out for whiskers. 26 00:01:32,533 --> 00:01:34,066 Look out for this type of nose. 27 00:01:34,066 --> 00:01:37,466 Look out for this shape of face. 28 00:01:37,766 --> 00:01:38,800 Look out for these colors. 29 00:01:38,800 --> 00:01:39,166 You kind of 30 00:01:39,166 --> 00:01:41,433 describe all of these things, and you'd have conditions 31 00:01:41,433 --> 00:01:45,200 like: if the ears are pointy, then cat; if the ears are 32 00:01:46,600 --> 00:01:49,466 sloping down, then possibly dog, and so on. 33 00:01:49,466 --> 00:01:53,100 On the other hand, for a neural network, you'd just code the neural network, 34 00:01:53,100 --> 00:01:56,666 you code the architecture, and then you point the neural network 35 00:01:56,666 --> 00:01:59,666 at a folder 36 00:01:59,666 --> 00:02:02,566 with images of cats and dogs, which are already categorized. 37 00:02:02,566 --> 00:02:04,500 And you tell it: okay, 38 00:02:04,500 --> 00:02:06,700 I've got some images of cats and dogs. 39 00:02:06,700 --> 00:02:08,766 Go and learn what a cat is. 40 00:02:08,766 --> 00:02:10,466 Go and learn what a dog is. 41 00:02:10,466 --> 00:02:13,600 And the neural network will on its own understand 42 00:02:13,600 --> 00:02:15,133 everything it needs to understand, 43 00:02:15,133 --> 00:02:18,133 and then further down, once it's trained up, when you give it 44 00:02:18,133 --> 00:02:21,366 a new image of a cat or a dog, it'll be able to understand what it is. 45 00:02:21,433 --> 00:02:23,100 So there they are. 46 00:02:23,100 --> 00:02:25,533 Those are the two fundamentally different approaches. 
47 00:02:25,533 --> 00:02:27,633 And today we're going to slowly start 48 00:02:27,633 --> 00:02:30,700 getting into how that second approach works. 49 00:02:30,966 --> 00:02:33,266 All right. So let's get straight to it. 50 00:02:33,266 --> 00:02:37,066 Here we have a very basic neural network with one layer. 51 00:02:37,066 --> 00:02:40,200 This is called a single-layer feedforward neural network. 52 00:02:40,500 --> 00:02:42,633 And it is also called a perceptron. 53 00:02:42,633 --> 00:02:46,966 Now before we proceed, one thing that we do need to adjust is that output value. 54 00:02:47,233 --> 00:02:50,900 Right now you can see that it's just a y. We need to put a y hat in there. 55 00:02:51,000 --> 00:02:55,366 And the reason for that is usually y stands for the actual value. 56 00:02:55,366 --> 00:02:56,400 And that's what we're going to be using. 57 00:02:56,400 --> 00:03:00,600 So y is going to be the actual value, which we see in reality. 58 00:03:01,100 --> 00:03:04,900 The output value is the value predicted by the algorithm, 59 00:03:04,900 --> 00:03:09,133 by the neural network. Y hat is the output value. 60 00:03:09,133 --> 00:03:11,600 Basically that's the notation for the output value. 61 00:03:11,600 --> 00:03:17,100 And the perceptron was first invented in 1957 by Frank Rosenblatt. 62 00:03:17,333 --> 00:03:21,800 And his whole idea was to create something that can actually learn 63 00:03:22,066 --> 00:03:25,066 and adjust itself. 64 00:03:25,066 --> 00:03:27,866 And this is what we're going to be looking at now. 65 00:03:27,866 --> 00:03:30,133 So, we've got a perceptron. 66 00:03:30,133 --> 00:03:31,900 Let's see how a perceptron learns. 67 00:03:31,900 --> 00:03:35,700 So let's say we have some input values that have been supplied 68 00:03:35,700 --> 00:03:39,900 to the perceptron, or basically to our neural network. 
69 00:03:40,200 --> 00:03:40,666 Then, 70 00:03:41,700 --> 00:03:45,033 the activation function is applied, we have an output, 71 00:03:45,400 --> 00:03:48,933 and now we're going to plot the output on a chart. 72 00:03:49,033 --> 00:03:51,666 So there it is, our output y hat. 73 00:03:51,666 --> 00:03:53,033 Now, what we need 74 00:03:53,033 --> 00:03:56,800 to do, in order to be able to learn, is compare the output value 75 00:03:56,800 --> 00:04:01,000 to the actual value that we want the neural network to get right. 76 00:04:01,466 --> 00:04:04,466 And that is the value y. 77 00:04:04,666 --> 00:04:07,666 And so if we plot it here, you'll see that there's a bit of a difference. 78 00:04:08,166 --> 00:04:10,733 Now we're going to calculate a function called the cost 79 00:04:10,733 --> 00:04:13,433 function, which is calculated as one half 80 00:04:13,433 --> 00:04:16,733 of the squared difference between the actual value and the output value. 81 00:04:17,066 --> 00:04:20,400 Now there are many ways you can come up with a cost function. 82 00:04:20,400 --> 00:04:23,266 There are many different cost functions that you can use. 83 00:04:23,266 --> 00:04:25,666 This is probably the most commonly used cost function. 84 00:04:25,666 --> 00:04:30,466 And why it is specifically this function that we use, 85 00:04:30,466 --> 00:04:34,200 we'll find out further down when we're talking about gradient descent. 86 00:04:34,200 --> 00:04:37,666 But for now we're just going to agree that this is the cost function. 87 00:04:37,666 --> 00:04:40,400 And basically what the cost function is telling us is: 88 00:04:40,400 --> 00:04:43,900 what is the error that you have in your prediction? 89 00:04:44,133 --> 00:04:47,833 And our goal is to minimize the cost function, because 90 00:04:47,833 --> 00:04:51,300 the lower the cost function, the closer y hat is to y. 91 00:04:52,033 --> 00:04:52,300 Okay. 
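As a small illustration of the cost function just described (one half of the squared difference between y hat and y), here is a minimal Python sketch. The function name `half_squared_error` and the numbers are assumptions made up for this example; they are not taken from the lecture slides.

```python
def half_squared_error(y_hat: float, y: float) -> float:
    """Cost for a single prediction: one half of the squared difference."""
    return 0.5 * (y_hat - y) ** 2


# Hypothetical numbers: the network outputs 0.80, the actual value is 0.93.
print(half_squared_error(0.80, 0.93))

# When y hat equals y exactly, the cost is zero.
print(half_squared_error(0.93, 0.93))
```

The further y hat is from y in either direction, the larger the cost, which is why minimizing it pulls the prediction toward the actual value.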
92 00:04:52,300 --> 00:04:54,333 So as long as we agree on that, let's proceed. 93 00:04:54,333 --> 00:04:58,300 So basically, there's our cost function. 94 00:04:58,300 --> 00:05:02,833 And from here, once we've compared, 95 00:05:02,966 --> 00:05:08,666 we're going to feed this information back into the neural network. 96 00:05:08,833 --> 00:05:09,600 So there we go. 97 00:05:09,600 --> 00:05:12,600 There's the information going back into the neural network, 98 00:05:12,866 --> 00:05:15,600 and it goes to the weights, and the weights get updated. 99 00:05:15,600 --> 00:05:17,966 Basically, the only thing that we have control of 100 00:05:17,966 --> 00:05:23,033 in this very simple neural network are the weights w1, w2, all the way to wm. 101 00:05:23,866 --> 00:05:26,700 And our goal is to minimize the cost function. 102 00:05:26,700 --> 00:05:29,366 So all we can do is update the weights. 103 00:05:29,366 --> 00:05:30,900 So we update the weights. 104 00:05:30,900 --> 00:05:36,133 We tweak them a little bit, and how exactly, we'll find out further down. 105 00:05:36,133 --> 00:05:40,000 But for now we agree that we update the weights and then we continue. 106 00:05:40,000 --> 00:05:44,700 But here I've put up this screenshot of the data 107 00:05:44,700 --> 00:05:49,766 just to make one point very clear: right now, throughout this whole 108 00:05:49,766 --> 00:05:53,900 experiment, in everything we're doing, we're dealing with just the one row. 109 00:05:53,900 --> 00:05:57,800 So we have a data set of one row where, 110 00:05:58,166 --> 00:06:03,733 for instance, the variable 111 00:06:03,733 --> 00:06:08,033 that we're predicting is what result you're going to get on an exam. 
112 00:06:08,266 --> 00:06:11,433 And the independent variables that we have are: how many hours 113 00:06:11,433 --> 00:06:13,833 did you study, how many hours did you sleep, 114 00:06:13,833 --> 00:06:16,700 and what did you get on the quiz in the mid semester. 115 00:06:16,700 --> 00:06:18,866 So in the middle of the semester you had a quiz. 116 00:06:18,866 --> 00:06:19,800 What percentage did you get there? 117 00:06:19,800 --> 00:06:23,933 So based on those variables we're trying to predict what score 118 00:06:23,933 --> 00:06:24,600 you'll get for the exam. 119 00:06:24,600 --> 00:06:26,700 And in the exam you got 93%. 120 00:06:26,700 --> 00:06:29,466 That's the actual value. So that's y. 121 00:06:29,466 --> 00:06:32,200 So we feed these three values 122 00:06:32,200 --> 00:06:35,200 into our neural network again, for the second time now, 123 00:06:35,533 --> 00:06:38,733 and then we're going to be comparing the result to y. 124 00:06:39,000 --> 00:06:40,600 So let's see how this works. 125 00:06:40,600 --> 00:06:42,666 We feed these values into the neural network. 126 00:06:43,700 --> 00:06:46,600 Everything gets adjusted, the weights get adjusted. 127 00:06:46,600 --> 00:06:50,433 So as you can see, we're going to feed the values in again. 128 00:06:50,433 --> 00:06:53,100 The point here is that we're feeding in these same values. 129 00:06:53,100 --> 00:06:54,400 So we only have one row. 130 00:06:54,400 --> 00:06:56,300 We're training on one row. 131 00:06:56,300 --> 00:06:59,300 This is because this is just a very simple basic example. 132 00:06:59,466 --> 00:07:01,633 Then we'll see what happens when there are more rows. 133 00:07:01,633 --> 00:07:06,066 So again we feed these values in, our cost function gets adjusted. 134 00:07:06,066 --> 00:07:10,433 As you can see, everything happens along those lines again. 
135 00:07:10,433 --> 00:07:13,600 So as you can see, every time our y hat is changing 136 00:07:13,600 --> 00:07:16,366 because we've tweaked the weights, our y hat is changing. 137 00:07:16,366 --> 00:07:18,266 Our cost function is changing. Let's have a look again. 138 00:07:18,266 --> 00:07:21,366 So we feed those in. Y hat is changing. 139 00:07:21,366 --> 00:07:22,733 Cost function is changing. 140 00:07:22,733 --> 00:07:25,266 We get information back, fed back to the weights 141 00:07:25,266 --> 00:07:26,933 so that the weights get adjusted. Again, 142 00:07:26,933 --> 00:07:28,566 we feed in the same values. 143 00:07:28,566 --> 00:07:32,700 Every time, everything gets adjusted, goes back to the weights, and one more time 144 00:07:33,033 --> 00:07:34,333 we feed in. Okay, 145 00:07:35,600 --> 00:07:36,600 and another time. 146 00:07:36,600 --> 00:07:39,600 So we've adjusted the weights, we feed in the information. 147 00:07:40,066 --> 00:07:41,300 And there we go. 148 00:07:41,300 --> 00:07:45,633 So now, this time y hat is equal to y, and the cost function is zero. 149 00:07:45,833 --> 00:07:48,333 Usually we won't get a cost function equal to zero. 150 00:07:48,333 --> 00:07:50,700 But this is a very simple example. 151 00:07:50,700 --> 00:07:54,600 So hopefully all that made sense. Every time we feed in exactly 152 00:07:54,600 --> 00:07:56,166 that same row, 153 00:07:56,166 --> 00:07:59,166 because in this case we're just dealing with that one row, 154 00:07:59,700 --> 00:08:04,400 into our neural network, where then the values get 155 00:08:04,433 --> 00:08:06,900 multiplied by the weights, the activation function is applied, 156 00:08:06,900 --> 00:08:09,900 we get y hat, and y hat is compared to y. 157 00:08:10,200 --> 00:08:12,233 Then we see how the cost function has changed. 
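The one-row loop walked through above (feed in the row, get y hat, compare it to y, adjust the weights, repeat) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lecture defers the actual update rule to a later tutorial, so a plain gradient-descent step is assumed here, the activation is taken to be the identity, and the input values are hypothetical (rescaled to the 0 to 1 range) because the transcript only gives the actual exam result of 93%.

```python
def predict(weights, row):
    """Weighted sum of the inputs; identity activation (an assumption)."""
    return sum(w * x for w, x in zip(weights, row))


def train_on_one_row(row, y, weights, learning_rate=0.1, steps=200):
    """Repeatedly feed the same row, compare y_hat with y, and nudge the weights."""
    weights = list(weights)
    for _ in range(steps):
        y_hat = predict(weights, row)
        error = y_hat - y  # derivative of 0.5 * (y_hat - y) ** 2 with respect to y_hat
        # Assumed update rule (gradient descent): dC/dw_i = error * x_i
        weights = [w - learning_rate * error * x for w, x in zip(weights, row)]
    return weights


# Hypothetical inputs: study hours, sleep hours, quiz score (all rescaled).
row = [0.6, 0.8, 0.75]
y = 0.93  # actual exam result, 93%
weights = train_on_one_row(row, y, weights=[0.1, 0.1, 0.1])
y_hat = predict(weights, row)
print(y_hat, 0.5 * (y_hat - y) ** 2)
```

With a single row and a linear output, the cost can be driven essentially to zero, which matches the remark that this only happens because the example is so simple.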
158 00:08:12,233 --> 00:08:13,566 We feed that information 159 00:08:13,566 --> 00:08:16,800 back into the neural network and then adjust the weights again. 160 00:08:17,700 --> 00:08:21,066 And then we repeat the same process again with the same exact row. 161 00:08:21,266 --> 00:08:23,333 We're trying to minimize that cost function. 162 00:08:23,333 --> 00:08:26,566 So up until now we've been dealing with just that one row. 163 00:08:26,866 --> 00:08:29,366 Let's see what happens when you have multiple rows. 164 00:08:29,366 --> 00:08:31,200 So here's the full data set. 165 00:08:31,200 --> 00:08:35,266 We have eight rows. 166 00:08:35,266 --> 00:08:39,133 Maybe these are different students taking the same exam: 167 00:08:39,133 --> 00:08:43,166 how many hours they studied, how many hours they slept before the exam, 168 00:08:43,166 --> 00:08:47,033 what they got on the quiz, and their final result on the test. 169 00:08:47,366 --> 00:08:51,800 And as you can see here on the left, I've got eight of these perceptrons. 170 00:08:51,800 --> 00:08:54,666 Actually, they are all the same perceptron. 171 00:08:54,666 --> 00:08:55,900 So this is also important to understand. 172 00:08:55,900 --> 00:09:01,133 I just duplicated it eight times so that we can 173 00:09:01,733 --> 00:09:04,200 understand it conceptually. 174 00:09:04,200 --> 00:09:06,666 But the important thing here is that it's the same neural network. 175 00:09:06,666 --> 00:09:10,300 We're going to be feeding these into the one same neural network. 176 00:09:10,300 --> 00:09:11,566 So let's get started. 177 00:09:11,566 --> 00:09:14,566 So one epoch, 178 00:09:14,700 --> 00:09:18,166 as you'll hear me mentioning, one epoch is 179 00:09:18,166 --> 00:09:22,233 when we go through our whole data set and we train our 180 00:09:22,600 --> 00:09:26,233 neural network on all of these rows. 
181 00:09:26,233 --> 00:09:27,333 So let's get started. 182 00:09:27,333 --> 00:09:31,200 So there's our first row, and there's y hat for the first row. 183 00:09:32,400 --> 00:09:35,133 There's the second row, there's y hat for the second row. 184 00:09:35,133 --> 00:09:39,266 So again it's being fed into the same neural network every time. 185 00:09:39,300 --> 00:09:41,100 I've just copied them several times 186 00:09:41,100 --> 00:09:44,100 so we can visually see how this is happening. 187 00:09:44,933 --> 00:09:47,733 Then it's happening again. 188 00:09:47,733 --> 00:09:50,533 That's the third row. Fourth row. 189 00:09:50,533 --> 00:09:53,533 There's our y hat for the fourth row, and so on. 190 00:09:53,666 --> 00:09:56,500 Then we get the values for the remaining four rows in the same way. 191 00:09:56,500 --> 00:10:02,400 So every time we just feed a row into our neural network, we get a value. 192 00:10:02,800 --> 00:10:06,900 Then we compare to the actual values. 193 00:10:06,900 --> 00:10:08,600 So there are the actual values. 194 00:10:08,600 --> 00:10:11,500 So for every single row we have an actual value. 195 00:10:11,500 --> 00:10:14,666 And now based on all of these differences 196 00:10:14,666 --> 00:10:18,233 between y hat and y, we can calculate the cost function, 197 00:10:18,233 --> 00:10:22,200 which is the sum of all of those 198 00:10:22,200 --> 00:10:25,366 squared differences between y hat and y. 199 00:10:25,366 --> 00:10:27,066 And all of that is halved. 200 00:10:28,100 --> 00:10:30,200 And there's our cost function. 201 00:10:30,200 --> 00:10:33,833 And basically now what we do after we have the full cost function, 202 00:10:34,166 --> 00:10:39,433 we go back and we update the weights, we update w1, w2, w3. 203 00:10:39,433 --> 00:10:42,433 And the important thing to remember here is that all of these 204 00:10:42,600 --> 00:10:47,266 perceptrons, all of these neural networks, are actually one neural network. 
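To make the multi-row cost concrete, here is a minimal Python sketch of "the sum of all the squared differences between y hat and y, halved", computed with one shared set of weights for every row. The data values, the three-row size, and the identity activation are assumptions for illustration only; the lecture's data set has eight rows whose values are not given in the transcript.

```python
def predict(weights, row):
    """Weighted sum of one row's inputs (identity activation, an assumption)."""
    return sum(w * x for w, x in zip(weights, row))


def total_cost(weights, rows, targets):
    """Half the sum of squared differences between y_hat and y over all rows."""
    return 0.5 * sum((predict(weights, row) - y) ** 2
                     for row, y in zip(rows, targets))


# Hypothetical, rescaled data: [study hours, sleep hours, quiz score] per student.
rows = [[0.5, 0.7, 0.8], [0.9, 0.6, 0.9], [0.3, 0.8, 0.5]]
targets = [0.71, 0.81, 0.55]      # hypothetical actual exam results
shared_weights = [0.1, 0.1, 0.1]  # the same w1, w2, w3 are used for every row

print(total_cost(shared_weights, rows, targets))
```

Note that `shared_weights` is passed once and reused for each row, mirroring the point that the duplicated perceptrons in the picture are really a single network.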
205 00:10:47,266 --> 00:10:49,500 So there's not eight of them, there's just one. 206 00:10:49,500 --> 00:10:52,766 And when we update the weights we're going to update the weights 207 00:10:53,100 --> 00:10:54,400 in that one neural network. 208 00:10:54,400 --> 00:10:57,466 So basically the weights are going to be the same for all of the rows. 209 00:10:57,766 --> 00:11:00,433 So it's not the case that every row has its own weights. 210 00:11:00,433 --> 00:11:02,733 No, all the rows share the weights. 211 00:11:02,733 --> 00:11:06,300 And so that's why we looked at the cost function, 212 00:11:06,300 --> 00:11:09,900 which is the sum of the squared differences. 213 00:11:10,200 --> 00:11:11,866 And then we updated the weights. 214 00:11:11,866 --> 00:11:15,166 And now from here, that was just one iteration. 215 00:11:15,166 --> 00:11:16,400 Next we're going to 216 00:11:17,533 --> 00:11:18,933 run this whole thing again. 217 00:11:18,933 --> 00:11:23,433 We're going to feed every single row into the neural network, 218 00:11:23,600 --> 00:11:26,300 find out our cost function and do this whole process again. 219 00:11:26,300 --> 00:11:30,566 So just as we saw previously, where we had just one row 220 00:11:30,566 --> 00:11:33,533 and we were doing everything again and again, same thing here. 221 00:11:33,533 --> 00:11:37,500 But now we're going to be doing it for eight rows, or 800 rows, or a thousand rows, 222 00:11:37,500 --> 00:11:40,500 however many rows you have in your data set. 223 00:11:40,666 --> 00:11:43,666 You do this process and then you calculate the cost function. 224 00:11:44,100 --> 00:11:46,700 And the goal here is to minimize 225 00:11:46,700 --> 00:11:50,766 the cost function, and as soon as you've found 226 00:11:50,766 --> 00:11:54,300 the minimum of the cost function, that is your final neural network. 
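The repeated process described above (feed in every row, compute the summed cost, update the one shared set of weights, then run the whole thing again) can be sketched roughly as follows. This is a sketch under assumptions, not the course's implementation: the update rule is assumed to be plain gradient descent, the activation is the identity, and the four rows of data are hypothetical.

```python
def predict(weights, row):
    """Weighted sum of one row's inputs (identity activation, an assumption)."""
    return sum(w * x for w, x in zip(weights, row))


def total_cost(weights, rows, targets):
    """Half the sum of squared differences over all rows."""
    return 0.5 * sum((predict(weights, r) - y) ** 2 for r, y in zip(rows, targets))


def train(rows, targets, weights, learning_rate=0.1, epochs=100):
    """Each epoch sees every row, then updates the shared weights once."""
    weights = list(weights)
    history = [total_cost(weights, rows, targets)]
    for _ in range(epochs):
        # Gradient of the summed cost for each weight, accumulated over all rows.
        gradient = [0.0] * len(weights)
        for row, y in zip(rows, targets):
            error = predict(weights, row) - y
            for i, x in enumerate(row):
                gradient[i] += error * x
        # Assumed update rule: gradient descent on the one shared weight vector.
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
        history.append(total_cost(weights, rows, targets))
    return weights, history


# Hypothetical, rescaled data: [study hours, sleep hours, quiz score] and exam results.
rows = [[0.5, 0.7, 0.8], [0.9, 0.6, 0.9], [0.3, 0.8, 0.5], [0.7, 0.5, 0.7]]
targets = [0.71, 0.81, 0.55, 0.64]
weights, history = train(rows, targets, weights=[0.0, 0.0, 0.0])
print(history[0], history[-1])  # cost before training and after the last epoch
```

The `history` list records the cost after each pass, so the cost can be seen shrinking from one epoch to the next.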
227 00:11:54,300 --> 00:11:57,833 That means your weights have been adjusted and you have 228 00:11:58,433 --> 00:12:01,800 found the optimal 229 00:12:02,766 --> 00:12:04,400 weights for 230 00:12:04,400 --> 00:12:07,566 this data set that you're training on, and you're ready 231 00:12:07,566 --> 00:12:10,566 to proceed to the testing phase or to the application phase. 232 00:12:11,400 --> 00:12:14,400 And this whole process is called backpropagation. 233 00:12:14,833 --> 00:12:20,366 So, some additional reading that you might want to do for the cost function. 234 00:12:20,366 --> 00:12:24,766 I know we just talked about one, and there are many different ones. 235 00:12:24,766 --> 00:12:28,200 A good article is located on Cross Validated. 236 00:12:28,666 --> 00:12:29,766 It's called "A list of cost 237 00:12:29,766 --> 00:12:32,766 functions used in neural networks, alongside applications". 238 00:12:32,933 --> 00:12:35,700 So the URL is there, but you can just Google 239 00:12:35,700 --> 00:12:38,933 for that exact search term or search phrase. 240 00:12:38,933 --> 00:12:41,933 And this one will be the first one that pops up. 241 00:12:42,000 --> 00:12:45,133 It's actually got some good examples and applications, 242 00:12:45,666 --> 00:12:48,300 or use cases, for different cost functions. 243 00:12:48,300 --> 00:12:50,200 So if you're interested to learn more about cost functions, 244 00:12:50,200 --> 00:12:51,866 check out this article. 245 00:12:51,866 --> 00:12:54,266 And on that note, I hope you enjoyed today's tutorial. 246 00:12:54,266 --> 00:12:55,933 I look forward to seeing you next time. 247 00:12:55,933 --> 00:12:57,966 Until then, enjoy deep learning.