0 1 00:00:00,650 --> 00:00:04,520 All right, now it's finally time to do the hard part. 1 2 00:00:04,530 --> 00:00:09,780 We're going to write our own algorithm that will find the lowest cost. 2 3 00:00:09,930 --> 00:00:17,670 And this is the famous gradient descent algorithm, and gradient is, well, you guessed it, just another word 3 4 00:00:17,790 --> 00:00:19,520 for slope. 4 5 00:00:19,520 --> 00:00:26,740 So I'm going to give you a very, very Austrian perspective to think about the gradient descent algorithm. 5 6 00:00:26,740 --> 00:00:32,520 You know, we've got a lot of mountains back in Austria and they're very, very beautiful and you can go 6 7 00:00:32,520 --> 00:00:37,340 ski down them, but a mountain is a force of nature. 7 8 00:00:37,350 --> 00:00:47,670 You have to respect the mountains. You see, the weather can change very, very quickly and this is especially 8 9 00:00:47,730 --> 00:00:49,370 unpredictable in winter 9 10 00:00:49,560 --> 00:00:51,400 and at high altitude. 10 11 00:00:51,480 --> 00:00:57,430 So, you know, imagine that you've wandered off the beaten track and the fog comes rolling in 11 12 00:00:57,540 --> 00:01:03,870 and at this point you find yourself in a survival situation. 12 13 00:01:03,870 --> 00:01:09,850 This is when you can't see very far and you can only feel the ground beneath your feet. 13 14 00:01:09,870 --> 00:01:15,420 The cold is going to be creeping in through your jacket and you find yourself thinking: How do I get 14 15 00:01:15,420 --> 00:01:15,800 down? 15 16 00:01:15,810 --> 00:01:23,650 How do I get back down? Well, to figure out which way is down and towards that hot cup of tea waiting 16 17 00:01:23,650 --> 00:01:25,460 for you at the end of your journey, 17 18 00:01:25,570 --> 00:01:28,240 you've got to feel, like, 18 19 00:01:28,240 --> 00:01:29,020 which way is down, 19 20 00:01:29,020 --> 00:01:30,920 what is the slope, right?
20 21 00:01:30,970 --> 00:01:38,520 You're going to look at your feet and you're going to figure out that the fastest way down is in the 21 22 00:01:38,520 --> 00:01:42,010 direction where the slope is steepest. 22 23 00:01:42,180 --> 00:01:45,990 Right? Where the descent is the most steep. 23 24 00:01:46,530 --> 00:01:53,700 And if you take a step downwards in that direction and then kind of get a feel for it again, like which 24 25 00:01:53,700 --> 00:01:59,670 way, which way is the slope and then take another step where the slope is steepest, you'll be down in 25 26 00:01:59,670 --> 00:02:01,010 that valley in no time. 26 27 00:02:01,020 --> 00:02:04,930 And you can sip on that hot cup of tea, right? 27 28 00:02:05,100 --> 00:02:11,910 And this is how you can think about gradient descent, except, instead of a mountain, yeah, gradient descent 28 29 00:02:12,350 --> 00:02:18,060 is going to take place on a cost function and the cost function actually doesn't tend to look like this - 29 30 00:02:18,060 --> 00:02:21,060 it doesn't kind of have a peak, if you will. 30 31 00:02:21,120 --> 00:02:22,100 Right? 31 32 00:02:22,170 --> 00:02:27,450 Because if a function had a peak then it would be called concave, 32 33 00:02:27,450 --> 00:02:29,510 it has a maximum. 33 34 00:02:29,760 --> 00:02:33,240 But with our cost functions, we're going to be looking for minimums. 34 35 00:02:33,240 --> 00:02:33,420 Right? 35 36 00:02:33,420 --> 00:02:37,640 So if you imagine that mountain flipped upside down and all you've got is a valley, 36 37 00:02:37,760 --> 00:02:37,980 right, 37 38 00:02:37,980 --> 00:02:45,070 that you have to kind of find then your cost function is going to look more like this. 38 39 00:02:45,090 --> 00:02:49,390 This is a kind of function that's called convex. 39 40 00:02:49,400 --> 00:02:55,800 It will have a minimum and our job is to get to the bottom of it because that's where the cost is 40 41 00:02:55,980 --> 00:02:58,040 lowest.
41 42 00:02:58,470 --> 00:03:01,140 Now, gradient descent isn't always called gradient descent, 42 43 00:03:01,140 --> 00:03:04,840 there's another word for it that you might see as well in the literature. 43 44 00:03:04,920 --> 00:03:13,260 Sometimes it's referred to as Steepest Descent and yes it's an optimization algorithm for finding the 44 45 00:03:13,380 --> 00:03:15,390 minimum of a function. 45 46 00:03:15,390 --> 00:03:21,140 So, you know, think about our mountain example - to find the minimum the function takes these little steps, 46 47 00:03:21,150 --> 00:03:21,420 right? 47 48 00:03:21,420 --> 00:03:28,140 It takes a step in that direction where the slope is steepest, in the direction of the negative of the 48 49 00:03:28,140 --> 00:03:35,240 gradient and bit by bit ends up in the bottom of the valley. 49 50 00:03:35,260 --> 00:03:35,620 All right. 50 51 00:03:35,650 --> 00:03:38,560 So let's implement this in Jupyter notebook. 51 52 00:03:38,560 --> 00:03:41,650 Let's add another markdown cell here. 52 53 00:03:41,650 --> 00:03:51,760 And I'm going to go to "Cell" > "Cell Type" > "Markdown" and put two hash tags there for a section heading and that section 53 54 00:03:51,760 --> 00:04:00,340 heading is gonna be "Python Loops and Gradient Descent". 54 55 00:04:00,340 --> 00:04:06,610 Now, if you're a seasoned programmer you're gonna be familiar with writing loops but if you're new to 55 56 00:04:06,610 --> 00:04:13,840 Python or you're new to programming, then the next couple of minutes are going to be the introduction to this 56 57 00:04:14,080 --> 00:04:20,240 topic of loops. Loops are little bits of code that are executed over and over again. 57 58 00:04:20,290 --> 00:04:22,520 We're going to walk down that mountain. 58 59 00:04:22,540 --> 00:04:26,140 We're going to walk down into the valley with our gradient descent algorithm. 
59 60 00:04:26,140 --> 00:04:33,970 So this is going to be a very, very useful tool for accomplishing that because our algorithm has to complete 60 61 00:04:33,970 --> 00:04:35,590 that famous three step process. 61 62 00:04:35,590 --> 00:04:35,990 Right? 62 63 00:04:36,040 --> 00:04:41,080 Predict, calculate error and learn, and repeat. 63 64 00:04:41,080 --> 00:04:47,560 So instead of writing the same Python instructions over and over again we're going to be using loops 64 65 00:04:47,830 --> 00:04:51,360 to simplify that for us. 65 66 00:04:51,820 --> 00:04:55,730 Speaking of "for", this is the first loop I'm going to introduce to you, guys. 66 67 00:04:55,750 --> 00:05:04,850 So this is gonna be the for loop in Python. So I'll just write a little comment here "Python for loop" and 67 68 00:05:04,870 --> 00:05:09,170 this is what the syntax looks like. We're going to have the keyword "for". 68 69 00:05:09,220 --> 00:05:15,460 And then there's gonna be a variable, in this case I'm gonna call it "n", and then another keyword 69 70 00:05:16,300 --> 00:05:28,070 "in" and then I'm going to say "range(5):", new line and now we're inside the loop here 70 71 00:05:28,070 --> 00:05:33,900 we're gonna print famous first words "Hello World". 71 72 00:05:34,160 --> 00:05:36,290 OK, let's hit Shift + Enter. 72 73 00:05:36,290 --> 00:05:38,010 See what happens. 73 74 00:05:38,050 --> 00:05:38,330 All right. 74 75 00:05:38,350 --> 00:05:41,600 So we've printed "Hello World" five times. 75 76 00:05:41,600 --> 00:05:41,800 Right? 76 77 00:05:41,810 --> 00:05:43,210 One, two, three, four, five. 77 78 00:05:43,640 --> 00:05:50,780 If I change this range to 3, it'll print it three times. If I change it to a thousand, it'll print it a thousand 78 79 00:05:50,780 --> 00:05:58,250 times, but let's stick to five for the time being, and let's take a closer look at this value 79 80 00:05:58,340 --> 00:05:59,130 n here. 80 81 00:05:59,150 --> 00:05:59,420 Right.
81 82 00:05:59,450 --> 00:06:06,580 n is just a variable and it's going to keep track of how often our for loop has run. 82 83 00:06:06,590 --> 00:06:15,440 So if I say "'Hello World', n", then I can see what the value is of the variable n each time the 83 84 00:06:15,440 --> 00:06:16,050 loop is run. 84 85 00:06:16,070 --> 00:06:17,200 So it starts at zero. 85 86 00:06:17,270 --> 00:06:20,830 Programmers like to start counting from zero. 86 87 00:06:21,050 --> 00:06:23,900 And that's the very first time the loop runs. 87 88 00:06:23,900 --> 00:06:28,400 Then this print statement is executed another time. 88 89 00:06:28,490 --> 00:06:33,410 So second time, third time, fourth time, fifth time. 89 90 00:06:33,590 --> 00:06:37,010 And at this point the loop stops. 90 91 00:06:37,010 --> 00:06:38,890 Right. 91 92 00:06:39,080 --> 00:06:43,620 Show you that - "print( 92 93 00:06:43,840 --> 00:06:46,240 'End of Loop')". 93 94 00:06:46,770 --> 00:06:49,170 So the Python program will come in here 94 95 00:06:49,240 --> 00:06:53,460 and execute whatever's inside the loop, and you can tell what's inside 95 96 00:06:53,460 --> 00:06:59,850 by the spacing, a predefined number of times, in this case five times. 96 97 00:06:59,850 --> 00:07:00,320 Right. 97 98 00:07:00,330 --> 00:07:03,630 0, 1, 2, 3 ,4 . 98 99 00:07:03,630 --> 00:07:07,380 Now we can call a little counter variable here and we can call it "i", 99 100 00:07:07,380 --> 00:07:09,960 this is another one that is often used. 100 101 00:07:09,960 --> 00:07:13,240 So if I call it "i", I get exactly the same result. 101 102 00:07:13,290 --> 00:07:15,180 It really doesn't matter what you call it. 102 103 00:07:15,180 --> 00:07:19,380 You can call it counter. As long as you're consistent, 103 104 00:07:19,380 --> 00:07:28,840 you can access the variable, the looping counter, inside of the loop by its name. All right, 104 105 00:07:28,840 --> 00:07:30,710 so that's the that's the for loop. 
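Here's a minimal sketch of the for loop cell as described so far, in case it helps to see it written down. The strings are the ones mentioned in the lesson; nothing else is assumed.

```python
# Python for loop
# range(5) hands the loop the values 0, 1, 2, 3, 4, one per pass.
for n in range(5):
    # This line is indented, so it belongs to the loop body.
    print('Hello World', n)

# This line is not indented, so it runs once, after the loop has stopped.
print('End of Loop')
```

Renaming n to i or counter changes nothing, as long as the same name is used inside the loop body.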
105 106 00:07:30,880 --> 00:07:38,590 It executes a predefined number of times and it's got this very, very simple syntax "for n in range", 106 107 00:07:38,650 --> 00:07:40,460 And then some number here. 107 108 00:07:40,480 --> 00:07:45,490 So this is how often times you want to execute the loop. With that out of the way, 108 109 00:07:45,520 --> 00:07:53,620 let me show you another type of loop. This type of loop is also very, very common. 109 110 00:07:53,800 --> 00:07:57,510 And this is the so-called "while" loop. 110 111 00:08:00,360 --> 00:08:03,350 A while loop works a little differently. 111 112 00:08:03,480 --> 00:08:04,130 Right. 112 113 00:08:04,140 --> 00:08:08,210 It has a condition that it checks every time it runs, right. 113 114 00:08:08,220 --> 00:08:14,610 So it will check the condition and then if that condition holds, it's going to run the code inside the 114 115 00:08:14,610 --> 00:08:19,890 loop and it's going to continue doing that until the condition fails. 115 116 00:08:19,890 --> 00:08:27,870 So if I have a counter and I say it's equal to zero and then I can write my while loop like this I can 116 117 00:08:27,870 --> 00:08:39,170 say "while", which is a key word and say, counter is smaller than, I don't know what, 7, colon print 117 118 00:08:41,690 --> 00:08:42,350 Counting 118 119 00:08:47,580 --> 00:08:48,320 counter. 119 120 00:08:48,830 --> 00:08:57,910 So print the value of my counter inside my loop and then I'll say "counter = counter + 1". 120 121 00:08:57,910 --> 00:09:04,190 So I'm going to increment my counter variable by 1 every time the loop runs. 121 122 00:09:06,280 --> 00:09:12,850 And then when I finish with the loop I'll print something else. 122 123 00:09:12,850 --> 00:09:14,190 Yeah. 123 124 00:09:14,200 --> 00:09:17,260 "Ready or not, here I come"!. 124 125 00:09:18,590 --> 00:09:23,260 Yeah let's make that loop a little bit more menacing than the last one. 
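A minimal sketch of the while loop being typed here could look like this. The exact wording of the 'Counting' string is an assumption, since only the spoken version is available.

```python
# Python while loop
counter = 0

# The condition after the while keyword is checked before every pass.
while counter < 7:
    print('Counting ...', counter)
    # Move the counter towards 7; without this line the condition never fails.
    counter = counter + 1

# Runs once, after the condition has become false.
print('Ready or not, here I come!')
```

The body runs for counter values 0 through 6, so seven times in total.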
125 126 00:09:23,260 --> 00:09:33,640 So if I hit Shift+Enter now, I can see my print statement inside my loop executed seven times, 126 127 00:09:33,640 --> 00:09:39,410 right? Starts at zero and executes it until our condition fails. 127 128 00:09:39,420 --> 00:09:46,630 This is the condition, so whatever follows the while keyword is the condition that's checked. 128 129 00:09:47,220 --> 00:09:50,080 And this fails when counter is equal to seven. 129 130 00:09:50,200 --> 00:09:53,280 Seven is equal to seven, it's not smaller than seven. 130 131 00:09:53,340 --> 00:09:56,720 So this will be false at this point. 131 132 00:09:57,000 --> 00:10:02,770 The loop terminates and the code inside is not executed anymore. 132 133 00:10:03,060 --> 00:10:06,810 And we jump to our print statement below. 133 134 00:10:06,810 --> 00:10:09,430 Right, this one "Ready or not, here I come!". 134 135 00:10:09,780 --> 00:10:11,780 And this is what we're seeing here. 135 136 00:10:13,140 --> 00:10:21,020 Again, I can accomplish the very, very same thing as with the for loop so I can execute it five times. 136 137 00:10:21,090 --> 00:10:27,530 So if you want you can also execute a while loop a predefined number of times. 137 138 00:10:27,610 --> 00:10:31,510 Yeah there's a small catch, there's a small gotcha 138 139 00:10:31,650 --> 00:10:37,250 that can happen with while loops that you won't get with for loops. 139 140 00:10:37,250 --> 00:10:44,100 Any guess what this gotcha is? Any guess what it is that can trip you up and where you can shoot yourself 140 141 00:10:44,100 --> 00:10:44,580 in the foot? 141 142 00:10:47,440 --> 00:10:52,920 So with while loops you can get into a situation where they don't stop, where they don't terminate. 
142 143 00:10:52,920 --> 00:11:01,980 So, for example, if I had made a typo here and instead of that plus I had hit minus, then my loop would 143 144 00:11:01,980 --> 00:11:05,770 actually run forever, right, because it would start at a zero, 144 145 00:11:05,770 --> 00:11:14,270 then when it reaches this line my counter would go to negative 1, then would come here and go to negative 145 146 00:11:14,270 --> 00:11:17,420 2 and then here negative 3. 146 147 00:11:17,420 --> 00:11:20,830 And this thing would just continue going, right? 147 148 00:11:20,960 --> 00:11:28,370 Which is clearly not my intention, right? It would continue going and cause a lot of problems. 148 149 00:11:28,400 --> 00:11:37,430 So with while loops you have to be careful that you don't accidentally write an infinite loop. 149 150 00:11:37,940 --> 00:11:45,080 So, for loops by their very nature run a predefined number of times, while loops run while a certain condition 150 151 00:11:45,080 --> 00:11:46,700 holds true. 151 152 00:11:46,910 --> 00:11:53,480 And this is where you gotta be, gotta be careful. So with your while loops, you've got to make sure they terminate 152 153 00:11:54,430 --> 00:11:59,940 and the easiest way to remember this is with an old programming joke. 153 154 00:11:59,980 --> 00:12:01,810 Yeah, that goes something like this. 154 155 00:12:02,060 --> 00:12:09,440 A programmer once said to his wife "Honey I'm heading to the supermarket to buy some groceries", to which 155 156 00:12:09,440 --> 00:12:13,710 his wife responded "While you're there, buy some milk". 156 157 00:12:13,710 --> 00:12:16,460 And alas he never returned home again. 157 158 00:12:17,510 --> 00:12:17,900 Oh. 158 159 00:12:17,900 --> 00:12:19,130 Crickets. 159 160 00:12:19,460 --> 00:12:21,220 Back to gradient descent. 160 161 00:12:21,440 --> 00:12:26,330 Let's tackle that in the new cell here at the bottom. 
161 162 00:12:26,330 --> 00:12:30,850 The thing with gradient descent is that we need a couple of ingredients right. 162 163 00:12:30,860 --> 00:12:34,170 We need a starting point. 163 164 00:12:34,190 --> 00:12:41,990 Then we need a learning rate, and we're gonna need some maybe temporary value to hold onto something 164 165 00:12:42,200 --> 00:12:45,120 while our program is executing. 165 166 00:12:45,230 --> 00:12:47,720 So I'm gonna create these three things here. 166 167 00:12:47,750 --> 00:12:54,600 I'm going to say "new_x" which is gonna be our starting point, I'm going to set it equal to 3, I'm going to start with 167 168 00:12:54,680 --> 00:12:57,050 3 as the starting point. 168 169 00:12:57,090 --> 00:13:03,440 I'm going to say "previous_x" and this is gonna be my temp value if you will. 169 170 00:13:03,530 --> 00:13:06,300 That only matters inside of the loop. 170 171 00:13:07,180 --> 00:13:11,700 And then I'm going to also specify a learning rate. 171 172 00:13:11,720 --> 00:13:11,990 Yeah. 172 173 00:13:12,010 --> 00:13:14,630 Or gamma or whatever you call it. 173 174 00:13:14,680 --> 00:13:18,280 So I'll call it a step multiplier 174 175 00:13:22,310 --> 00:13:27,900 and I'll set it equal to 0.1. Now it's time to write that loop. 175 176 00:13:27,910 --> 00:13:36,930 It's gonna be a for loop for us. So I'm going to say "for n in range" and maybe start at 30, 176 177 00:13:37,480 --> 00:13:39,640 colon, 177 178 00:13:39,640 --> 00:13:40,880 and now for the first step - 178 179 00:13:40,930 --> 00:13:43,040 What's the first thing that we have to do? 179 180 00:13:43,510 --> 00:13:46,020 Well, we have to make a guess, right? 180 181 00:13:46,030 --> 00:13:47,770 We have to make some prediction. 181 182 00:13:47,770 --> 00:13:50,930 This is step one of the machine learning process. 182 183 00:13:50,980 --> 00:13:59,280 So I'm going to take our temp value, previous_x and I'm going to set it equal to our random guess.
183 184 00:13:59,310 --> 00:13:59,760 Yeah. 184 185 00:14:00,000 --> 00:14:01,990 new_x = 3 185 186 00:14:02,220 --> 00:14:06,970 Three was a random guess, just our starting point for our gradient descent. 186 187 00:14:06,970 --> 00:14:08,570 I'm going to set them equal to each other. 187 188 00:14:09,860 --> 00:14:16,130 Now we get to step two. Step two is calculating the error because we need to know how far off we were. 188 189 00:14:16,650 --> 00:14:18,010 From the previous lesson, 189 190 00:14:18,050 --> 00:14:28,220 you will know that the steepness of the slope tells us how far off we are, right, from the minimum, because 190 191 00:14:28,220 --> 00:14:30,310 at the minimum the slope is equal to zero. 191 192 00:14:31,100 --> 00:14:35,210 And everywhere else it's equal to some number that isn't zero. 192 193 00:14:35,250 --> 00:14:47,470 So our gradient is gonna be equal to df of the previous_x. 193 194 00:14:47,510 --> 00:14:47,850 Yeah. 194 195 00:14:48,280 --> 00:14:56,140 So we're gonna call our derivative function. I'm going to pass in the temp value. 195 196 00:14:56,170 --> 00:15:05,570 So at the point where we are, in our function, I'm going to store the the slope, yeah, at this point in a variable 196 197 00:15:05,570 --> 00:15:12,380 called gradient. So one thing you might ask at this point is - Why is calculating the gradient 197 198 00:15:12,420 --> 00:15:14,330 step two or calculating the error? 198 199 00:15:14,340 --> 00:15:16,770 What's the link between those two things? 199 200 00:15:17,100 --> 00:15:25,430 And the way to think about it is that the further away we are from our minimum, the steeper our slope. 200 201 00:15:25,440 --> 00:15:33,300 So if the slope is very, very steep then it's indicative of being very, very far away from where we want 201 202 00:15:33,300 --> 00:15:37,380 to be. A steep slope means that we've got a high error. 
202 203 00:15:37,440 --> 00:15:42,590 And if the slope is zero or close to it then our error is small. 203 204 00:15:44,390 --> 00:15:47,420 And now it's time for that adjustment step, for that learning step. 204 205 00:15:48,380 --> 00:16:00,770 So the new value of x is gonna be equal to the previous value of x minus, because we get to go down the 205 206 00:16:00,770 --> 00:16:04,310 hill, minus our step multiplier 206 207 00:16:07,750 --> 00:16:17,960 times the slope, times the gradient and remember this is the value of the slope at the previous value 207 208 00:16:17,960 --> 00:16:19,400 of x. 208 209 00:16:19,450 --> 00:16:24,820 So what we're doing here is we're taking a step that's proportional to the negative of the gradient 209 210 00:16:24,820 --> 00:16:27,660 of the function at the point that we're at. 210 211 00:16:28,600 --> 00:16:35,680 And then we're subtracting from the previous x value because we want to move against the gradient towards 211 212 00:16:35,680 --> 00:16:41,130 the minimum and this is where the learning in machine learning takes place. 212 213 00:16:42,260 --> 00:16:49,470 So this loop is going to run 30 times and after it's finished let's print out our results. 213 214 00:16:49,470 --> 00:16:55,350 So I'm going to say "Local minimum occurs at", 214 215 00:16:58,550 --> 00:17:01,350 at what? Well the new value of x, right? 215 216 00:17:01,350 --> 00:17:08,290 Because that's what we're updating in our for loop, and we're gonna print out the slope. 216 217 00:17:08,420 --> 00:17:08,660 Yeah. 217 218 00:17:08,820 --> 00:17:11,370 So we just have to make sure our slope is close to zero. 218 219 00:17:11,370 --> 00:17:12,140 Right? 219 220 00:17:12,270 --> 00:17:16,410 Or the value of df(x) 220 221 00:17:16,590 --> 00:17:19,290 Yeah. 221 222 00:17:19,490 --> 00:17:24,820 yeah, "at this point". 222 223 00:17:24,990 --> 00:17:36,450 So this is gonna be our derivative function and as an input it's gonna get the latest value of x. 
Finally 223 224 00:17:37,190 --> 00:17:40,650 we're going to print out what the cost is at this point. 224 225 00:17:40,680 --> 00:17:49,290 So this is the f(x) value or cost at this point is, 225 226 00:17:52,030 --> 00:18:02,710 and this is gonna be our cost function at the point where the cost is lowest. Now, before I run this, 226 227 00:18:02,730 --> 00:18:12,120 make sure you've got a plus sign here in the while loop above because if you ever have to go to "Restart and Run All" or "Run All 227 228 00:18:12,120 --> 00:18:12,890 Above", 228 229 00:18:12,930 --> 00:18:13,480 yeah, 229 230 00:18:13,710 --> 00:18:18,150 then you may want to make sure that this loop doesn't continue going. 230 231 00:18:18,210 --> 00:18:19,820 I just caught myself out there. 231 232 00:18:19,980 --> 00:18:30,800 So I'm going to hit Shift+Enter now here and I get my print statements shooting off the results of our gradient descent. 232 233 00:18:30,960 --> 00:18:32,060 So, what can we learn from this? 233 234 00:18:32,070 --> 00:18:36,890 What can we deduce from the values that we're seeing here? 234 235 00:18:38,330 --> 00:18:43,340 Well, the first thing is that we can see that they're approximations, right? 235 236 00:18:43,340 --> 00:18:45,700 This isn't an exact value. 236 237 00:18:45,710 --> 00:18:47,920 We're not getting a very clean answer here. 237 238 00:18:49,360 --> 00:18:55,930 But that might be the case because maybe we haven't run our loop often enough. 238 239 00:18:55,930 --> 00:19:06,490 So if I increase the value here from say 30 to 50 let's see what happens with our values that we get 239 240 00:19:06,490 --> 00:19:14,570 printed out. So one thing that we're seeing is that our slope is getting a lot closer to zero here than 240 241 00:19:14,570 --> 00:19:15,610 before. 241 242 00:19:15,620 --> 00:19:23,960 The second thing is that this value here on f(x) is also getting a lot more precise and so is our 242 243 00:19:23,960 --> 00:19:24,730 new value of x.
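Put together, the gradient descent cell described above might look like the sketch below. The functions f and df come from an earlier lesson and aren't shown in this one, so the definitions here are an assumption: they were picked because they give a first step of 0.7 and a minimum at x = -0.5 with a cost of 0.75, which matches the numbers quoted in this lesson.

```python
# ASSUMPTION: stand-ins for the cost function and its derivative
# from the earlier lesson.
def f(x):
    return x**2 + x + 1

def df(x):
    return 2*x + 1

# Gradient descent
new_x = 3               # starting point
previous_x = 0          # temp value, only matters inside the loop
step_multiplier = 0.1   # learning rate

for n in range(50):
    previous_x = new_x                                # step 1: make a guess
    gradient = df(previous_x)                         # step 2: slope as the error
    new_x = previous_x - step_multiplier * gradient   # step 3: learn

print('Local minimum occurs at:', new_x)
print('Slope or df(x) value at this point is:', df(new_x))
print('f(x) value or cost at this point is:', f(new_x))
```

Changing the 50 in range() to 30 or 500 shows how the printed values tighten up as the loop runs more often.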
243 244 00:19:24,740 --> 00:19:31,890 So it's getting much, much closer to a -0.5. 244 245 00:19:31,920 --> 00:19:36,960 Yeah if I run this 500 times, let's see what happens. 245 246 00:19:38,590 --> 00:19:45,060 So, as you can see, we're converging on this local minimum by brute force, right? 246 247 00:19:45,070 --> 00:19:48,790 We didn't solve our cost function here analytically. 247 248 00:19:48,790 --> 00:19:56,010 What we're doing is we're iterating and going down that valley, that cost function 248 249 00:19:56,380 --> 00:20:01,970 until we reach the minimum point and at the minimum our slope is equal to zero, 249 250 00:20:02,050 --> 00:20:05,250 our cost is equal to 0.75. 250 251 00:20:05,470 --> 00:20:11,960 And this is when the x is equal to -0.5. 251 252 00:20:12,010 --> 00:20:15,880 So obviously you can run this thing a thousand times or what have you. 252 253 00:20:15,880 --> 00:20:16,460 Right? 253 254 00:20:16,600 --> 00:20:23,200 But often times you actually know ahead of time how precise a calculation you need, right. from the resource 254 255 00:20:23,200 --> 00:20:25,120 management point of view. 255 256 00:20:25,130 --> 00:20:32,590 What you can actually do is you can tell the loop to stop running once a certain level of precision 256 257 00:20:32,590 --> 00:20:33,820 is met. 257 258 00:20:33,820 --> 00:20:35,920 And I'm sure you're looking up. 258 259 00:20:35,950 --> 00:20:36,180 Yeah. 259 260 00:20:36,190 --> 00:20:41,500 You scrolling up and you looking at this while loop here and you're thinking ah yeah the while loop seems 260 261 00:20:41,500 --> 00:20:42,680 ideal for this, right. 261 262 00:20:42,700 --> 00:20:49,930 We can run the while loop as long as our calculation is within a certain level of precision - and you'd 262 263 00:20:49,930 --> 00:20:50,500 be right. 263 264 00:20:50,500 --> 00:20:57,430 That's exactly something you could implement if you wanted to, with the structure of the while loop. 
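For anyone who wants to try the while loop version mentioned here, this is one way it could be sketched. It isn't the code from the lesson: the names tolerance, max_steps and steps are made up for illustration, the derivative df is an assumed stand-in for the one from the earlier lesson, and the cap on the number of steps is an extra safety net against the infinite loop problem.

```python
# ASSUMPTION: stand-in for the derivative from the earlier lesson.
def df(x):
    return 2*x + 1

new_x = 3
step_multiplier = 0.1
tolerance = 0.0001          # hypothetical name: stop once steps get this small
max_steps = 10000           # hypothetical safety cap so the loop must terminate
steps = 0
step_size = tolerance + 1   # anything above tolerance, so the loop starts

# Keep stepping while the last step was still bigger than the tolerance.
while step_size > tolerance and steps < max_steps:
    previous_x = new_x
    new_x = previous_x - step_multiplier * df(previous_x)
    step_size = abs(new_x - previous_x)
    steps = steps + 1

print('Stopped after', steps, 'steps at x =', new_x)
```

The second condition costs almost nothing and guarantees the cell finishes even if the step size never shrinks, for example with a learning rate that is too large.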
264 265 00:20:57,430 --> 00:21:02,500 Let me show you how to do this with the for loop as well. 265 266 00:21:02,620 --> 00:21:09,190 We're going to modify our code here a little bit to include a cutoff point for a certain level of precision 266 267 00:21:10,300 --> 00:21:17,480 and the way I'm going to do this is by adding another variable up top and say "precision" 267 268 00:21:17,840 --> 00:21:24,820 is gonna be equal to 0.0001. 268 269 00:21:24,880 --> 00:21:25,160 Yeah. 269 270 00:21:25,190 --> 00:21:30,700 So this is how precise I want my answer to be. 270 271 00:21:30,770 --> 00:21:33,670 Now, where does this come into play? 271 272 00:21:33,680 --> 00:21:41,030 Well, what we're interested in with our precision estimate is what's the difference between the new 272 273 00:21:41,030 --> 00:21:42,450 and the old x, right? 273 274 00:21:42,800 --> 00:21:49,280 So if those two are getting closer and closer and closer together then our calculation is getting much 274 275 00:21:49,280 --> 00:21:51,350 more precise. 275 276 00:21:51,380 --> 00:22:01,130 So what we can do is we can say well the step size is gonna be the difference between our new x and 276 277 00:22:01,220 --> 00:22:03,110 our previous x. 277 278 00:22:03,140 --> 00:22:05,430 Yeah that's gonna be the step size. 278 279 00:22:05,540 --> 00:22:12,110 And just to make sure that step size is always a positive number, we're going to say well what we care 279 280 00:22:12,110 --> 00:22:22,250 about is actually the absolute value of our step size and, uh, now, change the number of times this loop 280 281 00:22:22,250 --> 00:22:25,250 runs to maybe 10. 281 282 00:22:25,260 --> 00:22:25,650 Yeah. 282 283 00:22:25,730 --> 00:22:32,690 And I'm going to print out the step size, just so we can see how it evolves over time as the loop 283 284 00:22:32,690 --> 00:22:34,180 runs. 284 285 00:22:34,200 --> 00:22:34,960 So let me run this.
285 286 00:22:34,970 --> 00:22:36,530 Let me press Shift+Enter here. 286 287 00:22:37,920 --> 00:22:42,870 And we can see here our step size initially starts out with 0.7. 287 288 00:22:42,870 --> 00:22:43,730 And then it decreases. 288 289 00:22:43,740 --> 00:22:44,040 Right? 289 290 00:22:44,040 --> 00:22:48,490 The new x and the old x are getting closer and closer together. 290 291 00:22:48,690 --> 00:22:56,160 So we can see here our step size is decreasing. 291 292 00:22:56,330 --> 00:22:59,570 Commenting out this print statement so it doesn't execute anymore. 292 293 00:23:00,070 --> 00:23:04,280 And I'm going to add the condition for terminating this for loop. 293 294 00:23:04,360 --> 00:23:13,360 Yeah I'm going to say well if the step size is smaller than the precision, 294 295 00:23:13,360 --> 00:23:20,500 so in other words - if the difference between the new x and the previous x is smaller than 295 296 00:23:20,500 --> 00:23:27,280 0.0001, then we can terminate our loop, then we can stop with our calculations. 296 297 00:23:27,630 --> 00:23:33,670 So, I'm going to put a colon there and then the Python keyword for stopping this loop 297 298 00:23:33,810 --> 00:23:35,780 it's called break. 298 299 00:23:35,970 --> 00:23:37,530 We'll leave it at that. 299 300 00:23:38,220 --> 00:23:43,540 And uh I'm going to say, well, run 500 times. 300 301 00:23:43,540 --> 00:23:55,790 Yeah, for loop run from 0 to 500, but if the step size is smaller than our predetermined precision, 301 302 00:23:56,930 --> 00:24:04,050 then stop running. Let's see how often our loop runs according to this logic. 302 303 00:24:04,670 --> 00:24:06,940 Well, we don't know, right? 303 304 00:24:06,970 --> 00:24:20,310 Could have run any number of times. We probably have to print the value of n - so, print("Loop ran this 304 305 00:24:20,790 --> 00:24:21,900 many times:", n) 305 306 00:24:30,230 --> 00:24:32,830 Forgot the s. 
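The for loop with the precision cutoff could be sketched like this. Two things are assumptions here: the derivative df is a stand-in for the one from the earlier lesson, and the print is placed just before the break, since the lesson doesn't say exactly where it goes.

```python
# ASSUMPTION: stand-in for the derivative from the earlier lesson.
def df(x):
    return 2*x + 1

new_x = 3
previous_x = 0
step_multiplier = 0.1
precision = 0.0001

for n in range(500):
    previous_x = new_x
    gradient = df(previous_x)
    new_x = previous_x - step_multiplier * gradient

    step_size = abs(new_x - previous_x)
    # print(step_size)

    if step_size < precision:
        # n counts from zero, so this is the index of the final pass.
        print('Loop ran this many times:', n)
        break
```

With these assumed functions the step size is 0.7 on the first pass and shrinks by a factor of 0.8 each time, which is why the loop breaks out long before it reaches 500.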
306 307 00:24:33,150 --> 00:24:37,890 So, given these constraints, our loop ran 40 times. 307 308 00:24:37,980 --> 00:24:40,140 It's actually not that much. 308 309 00:24:40,170 --> 00:24:41,450 Not that many times. 309 310 00:24:41,640 --> 00:24:48,230 If we add an extra zero here on the precision that we're looking for and press Shift+Enter again, we 310 311 00:24:48,240 --> 00:24:52,600 can see that it ran 50 times, so it actually never gets up to 500. 311 312 00:24:52,600 --> 00:24:53,190 Yeah. 312 313 00:24:53,310 --> 00:25:00,340 Doesn't, doesn't go up all that way and that's because it reaches that terminating condition, 313 314 00:25:00,450 --> 00:25:06,840 this break statement, a lot sooner, but still the way we wrote this code we have two conditions where 314 315 00:25:06,840 --> 00:25:07,880 it can stop. 315 316 00:25:08,100 --> 00:25:16,320 It can either reach 500 and it will stop there or when it reaches the minimum and that step size becomes 316 317 00:25:16,320 --> 00:25:17,800 very, very, very small 317 318 00:25:17,910 --> 00:25:25,200 then it can also terminate. Now running the Python loop and calculating the minimum is very well and 318 319 00:25:25,200 --> 00:25:30,920 good, but I'm a very, very visual person and I'm sure you might be too. 319 320 00:25:30,920 --> 00:25:37,140 So I find graphing things very, very helpful. The way we're gonna go about graphing it is 320 321 00:25:37,140 --> 00:25:46,520 first off we have to kind of keep track of all the values that we've calculated inside of our loop. 321 322 00:25:46,680 --> 00:25:49,080 So we're going to create two lists. 322 323 00:25:49,140 --> 00:25:49,380 Yeah. 323 324 00:25:49,380 --> 00:25:53,890 Two Python lists - one of them is going to hold onto our x values, 324 325 00:25:54,030 --> 00:26:00,720 so it's gonna be a list and it's going to contain the new x values. And the other thing is I'm going 325 326 00:26:00,720 --> 00:26:04,560 to also create a list for all the slopes. 
326 327 00:26:04,620 --> 00:26:12,300 So I'm going to call this "slope_list" and it's going to contain whatever value our derivative has at 327 328 00:26:12,630 --> 00:26:14,210 this x position. 328 329 00:26:14,220 --> 00:26:14,460 Yeah. 329 330 00:26:17,260 --> 00:26:21,370 Now, within our loop we're actually doing these calculations anyhow. 330 331 00:26:21,370 --> 00:26:30,440 So all we need to do is we need to append the x values and the slope values to our list. 331 332 00:26:30,490 --> 00:26:38,650 So I'm going to say "x_list.append()" to add a new value to it of the new x value. 332 333 00:26:38,650 --> 00:26:46,700 And this is the x value that we've updated after we've taken our step down the cost function. 333 334 00:26:46,810 --> 00:26:56,590 So I'm going to append this value to our list and also for our slope list we're going to append 334 335 00:27:00,620 --> 00:27:06,110 the output from our derivative function at the new x value. 335 336 00:27:06,110 --> 00:27:06,940 Yeah. 336 337 00:27:07,220 --> 00:27:08,510 And that's it. 337 338 00:27:08,510 --> 00:27:11,040 That gives us the basis for plotting out charts. 338 339 00:27:11,120 --> 00:27:12,040 So let's do that now. 339 340 00:27:12,590 --> 00:27:23,890 I'm gonna go up here and I'm actually going to copy this cell here, I'm going to go "Edit" > "Copy Cell" and I'm going to reuse 340 341 00:27:23,890 --> 00:27:25,730 it down here a lot of this code. 341 342 00:27:25,750 --> 00:27:36,730 So I'm going to place the cell above. I'm going to edit my comment to, uh, say that we're gonna superimpose the 342 343 00:27:38,890 --> 00:27:40,540 gradient descent calculations. 343 344 00:27:40,570 --> 00:27:40,760 Yeah 344 345 00:27:48,910 --> 00:27:57,520 So this is the goal. Now we've got two charts and what we're gonna do is we're gonna add a scatter plot on 345 346 00:27:57,520 --> 00:28:01,090 top of these with the data that we've captured from our loop. 
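Here's a rough sketch of how the two lists could be filled inside the loop. Starting each list off with the value at the starting point is an assumption, and so is the stand-in derivative df, which comes from an earlier lesson.

```python
# ASSUMPTION: stand-in for the derivative from the earlier lesson.
def df(x):
    return 2*x + 1

new_x = 3
previous_x = 0
step_multiplier = 0.1
precision = 0.0001

# Two Python lists that remember every point visited on the way down.
x_list = [new_x]
slope_list = [df(new_x)]

for n in range(500):
    previous_x = new_x
    gradient = df(previous_x)
    new_x = previous_x - step_multiplier * gradient

    step_size = abs(new_x - previous_x)

    # Record the updated x value and the slope at that x value.
    x_list.append(new_x)
    slope_list.append(df(new_x))

    if step_size < precision:
        break

print('Number of points recorded:', len(x_list))
```

Both lists grow by one entry per pass, so they always have the same length and can be plotted against each other.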
346 347 00:28:01,630 --> 00:28:02,520 Here's how we're gonna do it. 347 348 00:28:08,040 --> 00:28:09,370 For our first chart, 348 349 00:28:09,390 --> 00:28:15,930 we're gonna say "plt.scatter()" and then we have to supply some arguments. 349 350 00:28:16,350 --> 00:28:27,520 So on the x axis, it's gonna be our list of x values and for our y axis we want to feed our list of values 350 351 00:28:28,060 --> 00:28:31,100 into our cost function. 351 352 00:28:31,160 --> 00:28:31,350 Right? 352 353 00:28:31,360 --> 00:28:33,620 So this is our f(x). 353 354 00:28:33,760 --> 00:28:40,870 Now you might think I can actually just put the x_list in here and press Shift+Enter but this isn't 354 355 00:28:40,870 --> 00:28:42,810 going to work. I'm going to get an error. 355 356 00:28:43,240 --> 00:28:51,340 Yeah, and this is because our function, the way that we've written it cannot process a list. 356 357 00:28:51,340 --> 00:28:55,630 It's unable to process this list as it is. 357 358 00:28:55,630 --> 00:29:01,960 So I'm gonna have to do a little type conversion first. So I'm going to create a variable called values 358 359 00:29:02,050 --> 00:29:11,170 and set it equal to a numpy array which is gonna take as an argument our list of x values. 359 360 00:29:11,320 --> 00:29:22,360 So, our function can work with an array but it can't work with a list and when I press Shift+Enter we should 360 361 00:29:22,360 --> 00:29:29,500 see that now we have a little scatter plot on top of our graph. 361 362 00:29:29,500 --> 00:29:33,640 But in terms of data visualization that was very, very poor. 362 363 00:29:33,760 --> 00:29:34,030 Right? 363 364 00:29:34,150 --> 00:29:43,550 So I'm going to say the color of these dots should be red. 364 365 00:29:43,660 --> 00:29:52,000 They should be a lot larger so that the size equal to maybe 100 and give them a little bit of transparency. 365 366 00:29:52,000 --> 00:29:55,870 So I'm going to say the alpha should be equal to maybe 0.6. 
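Here is a small sketch of why the type conversion is needed. The cost function `f` below is an assumed quadratic (the real one is defined earlier in the lesson), and the three x values are just illustrative numbers.

```python
import numpy as np


def f(x):
    # Assumed cost function; works element-wise on numpy arrays
    return x**2 + x + 1


x_list = [3, 2.3, 1.74]  # a few illustrative x values

# A plain Python list does not support ** with a number, so this fails
try:
    f(x_list)
except TypeError as err:
    print("List input fails:", err)

# Converting to a numpy array lets the arithmetic apply to every element
values = np.array(x_list)
costs = f(values)
print(costs)
```

With the array in hand, the scatter call from the lesson would take `x_list` for the x axis and `f(values)` for the y axis, plus the styling arguments `color='red'`, `s=100`, and `alpha=0.6`.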
366 367 00:29:55,870 --> 00:29:58,690 See how that looks. 367 368 00:29:58,820 --> 00:30:00,190 It's looking a lot better. 368 369 00:30:00,340 --> 00:30:00,510 Yeah. 369 370 00:30:00,520 --> 00:30:10,740 So we can see here how our algorithm is going closer and closer to this minimum as it runs. But we can also show 370 371 00:30:10,740 --> 00:30:13,940 this on our second chart as well. 371 372 00:30:13,980 --> 00:30:14,250 Right. 372 373 00:30:14,280 --> 00:30:24,270 So we can see how we're inching closer to where the slope is zero on this right hand chart and we can 373 374 00:30:24,270 --> 00:30:29,880 do that by making use of the other list that we've captured. 374 375 00:30:29,880 --> 00:30:39,930 So, in this case it is a little bit simpler because we just have to write "plt.scatter( 375 376 00:30:40,740 --> 00:30:47,670 x_list)", x values still the same, but for the y values we've done a bit of a calculation already, so 376 377 00:30:47,670 --> 00:30:52,800 we can say "slope_list" 377 378 00:30:57,000 --> 00:30:59,540 and let's also make it a red color, 378 379 00:31:02,920 --> 00:31:13,290 make the dots big, size 100, and alpha 0.5 or something. 379 380 00:31:13,290 --> 00:31:17,490 Let's see how it goes. 380 381 00:31:17,930 --> 00:31:26,030 It's looking not bad, but I do wonder if there maybe should be some transparency on the line itself. 381 382 00:31:26,030 --> 00:31:36,450 So if this thing had an alpha of say 0.6, would it look a bit better? Yeah. 382 383 00:31:36,780 --> 00:31:45,960 Yeah this looks this looks better. Let's do the same thing with our plot at the top as well. 383 384 00:31:46,030 --> 00:31:52,110 Let's give this an alpha of maybe 0.6 as well. 384 385 00:31:52,110 --> 00:31:55,520 See or 0.7 perhaps. 385 386 00:31:58,320 --> 00:31:59,420 0.8 386 387 00:31:59,580 --> 00:32:00,050 Let's try. 387 388 00:32:00,870 --> 00:32:01,270 Yeah. 388 389 00:32:01,350 --> 00:32:03,720 This is looking pretty good.
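Put together, superimposing the scatter plots on both charts could look something like the sketch below. The functions `f` and `df`, the step multiplier, the plotting range `x_1`, the panel titles, and the line colours and widths are assumptions; the red dots, the size of 100, and the alpha values follow what is said in the lesson.

```python
import matplotlib.pyplot as plt
import numpy as np


def f(x):
    return x**2 + x + 1   # assumed cost function


def df(x):
    return 2 * x + 1      # its derivative


# Rerun the loop and capture the path of the descent
new_x, x_list, slope_list = 3, [3], [df(3)]
for n in range(500):
    previous_x = new_x
    new_x = previous_x - 0.1 * df(previous_x)   # 0.1 is an assumed step multiplier
    x_list.append(new_x)
    slope_list.append(df(new_x))
    if abs(new_x - previous_x) < 0.0001:
        break

x_1 = np.linspace(-3, 3, 100)   # assumed range for drawing the curves
values = np.array(x_list)       # array version so f() can process it

fig = plt.figure(figsize=[15, 5])

# Chart 1: cost function with the descent path on top
plt.subplot(1, 2, 1)
plt.title("Cost function")
plt.plot(x_1, f(x_1), color="blue", linewidth=3, alpha=0.8)
plt.scatter(x_list, f(values), color="red", s=100, alpha=0.6)

# Chart 2: slope of the cost function; the y values are already in slope_list
plt.subplot(1, 2, 2)
plt.title("Slope of the cost function")
plt.grid()
plt.plot(x_1, df(x_1), color="skyblue", linewidth=5, alpha=0.6)
plt.scatter(x_list, slope_list, color="red", s=100, alpha=0.5)

plt.show()
```

Making the lines slightly transparent, as tried out in the lesson, keeps the dots readable where they sit directly on top of the curve.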
389 390 00:32:03,720 --> 00:32:11,700 So you can see here that now we have our scatter plot superimposed on our derivative function and it 390 391 00:32:11,700 --> 00:32:15,930 stops when the slope is equal to zero. 391 392 00:32:15,930 --> 00:32:21,900 And on the regular cost function we're moving down and down and down and down into the minimum at the 392 393 00:32:21,900 --> 00:32:24,900 bottom of this parabola. 393 394 00:32:24,940 --> 00:32:30,420 You know the cool thing is that we can even zoom in a little bit and we can even do a little close up 394 395 00:32:30,960 --> 00:32:33,300 of our slope. 395 396 00:32:33,300 --> 00:32:34,290 Let me show you what I mean. 396 397 00:32:34,710 --> 00:32:46,920 So if I take this bit of code here, copy it and paste it and say chart number three, and call this 397 398 00:32:46,920 --> 00:32:47,280 "Derivative 398 399 00:32:50,020 --> 00:33:03,190 (Close up)" and then I change the title and say "Gradient Descent (Close up)". Might get rid of the y label, 399 400 00:33:03,580 --> 00:33:11,380 don't need that. I'm going to keep the grid but I'm going to change what's on the axes and go from say 400 401 00:33:12,400 --> 00:33:18,550 0.55 to -0.2. 401 402 00:33:18,550 --> 00:33:24,200 So, zooming in here on the x axis and on the y axis I'm going to do the same, 402 403 00:33:24,200 --> 00:33:33,660 I'm going to zoom in from 0.3 to 0.8. I'm still gonna leave it sky blue. 403 404 00:33:33,660 --> 00:33:41,710 Change the linewidth to 6, alpha is 0.8. 404 405 00:33:41,920 --> 00:33:48,550 Change these values around a little bit to make it a bit more distinct, make the dots a little bigger. 405 406 00:33:48,700 --> 00:34:00,670 And if I press Shift+Enter now, then nothing will happen because I need to adjust my subplot. 406 407 00:34:00,680 --> 00:34:00,850 Right. 407 408 00:34:00,860 --> 00:34:02,300 I'm adding a third plot here.
408 409 00:34:02,330 --> 00:34:09,090 So I have to make sure that I have in this case, what, three columns, right, I've got three charts. 409 410 00:34:09,140 --> 00:34:12,390 This is chart number three of the lot. 410 411 00:34:12,440 --> 00:34:18,710 And this is gonna be also edited to chart number two, right? 411 412 00:34:19,750 --> 00:34:26,840 On the three column subplot, and same with this. This chart number one on the three column subplot. 412 413 00:34:27,030 --> 00:34:29,890 And it's now that I can run this. 413 414 00:34:29,890 --> 00:34:32,270 See what happens. 414 415 00:34:32,570 --> 00:34:33,100 Huh. 415 416 00:34:33,160 --> 00:34:34,750 So I'd say this is pretty good right. 416 417 00:34:34,780 --> 00:34:42,190 We've got a close up here where we can actually watch the gradient descent converge upon that zero value. 417 418 00:34:42,190 --> 00:34:47,980 And you can see those steps getting smaller and smaller and smaller and smaller as we're getting closer 418 419 00:34:47,980 --> 00:34:49,050 to our goal. 419 420 00:34:49,060 --> 00:34:51,530 I think this is incredibly cool. 420 421 00:34:51,700 --> 00:34:54,300 The charts look a little bit squished. 421 422 00:34:54,640 --> 00:35:01,150 Maybe what I'll do is I'll change this from 15 to I don't know 20 on the width. 422 423 00:35:01,150 --> 00:35:09,040 See if that helps. Yeah that definitely looks a little better. 423 424 00:35:09,090 --> 00:35:10,350 Okay brilliant. 424 425 00:35:10,380 --> 00:35:12,990 We've done quite a lot of work in this lesson. 425 426 00:35:12,990 --> 00:35:21,360 This has been a long and difficult lesson but writing the code definitely helps us play around with 426 427 00:35:21,360 --> 00:35:22,800 the gradient descent. 427 428 00:35:22,800 --> 00:35:31,050 Yeah, because what we can do now is we can change a couple of these values and see how it behaves differently. 
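The three-column layout with the close-up as chart number three could be sketched like this. The functions, the step multiplier, the curve range, and in particular the axis limits of the close-up are assumptions picked to frame the minimum of the assumed function; the width of 20, the sky blue line, the linewidth of 6, and the alpha of 0.8 come from the lesson.

```python
import matplotlib.pyplot as plt
import numpy as np


def f(x):
    return x**2 + x + 1   # assumed cost function


def df(x):
    return 2 * x + 1      # its derivative


# Capture the descent path (0.1 is an assumed step multiplier)
new_x, x_list, slope_list = 3, [3], [df(3)]
for n in range(500):
    previous_x = new_x
    new_x = previous_x - 0.1 * df(previous_x)
    x_list.append(new_x)
    slope_list.append(df(new_x))
    if abs(new_x - previous_x) < 0.0001:
        break

x_1 = np.linspace(-3, 3, 100)
values = np.array(x_list)

# Wider figure so three charts are not squished
fig = plt.figure(figsize=[20, 5])

# Chart 1 of 3: cost function
plt.subplot(1, 3, 1)
plt.title("Cost function")
plt.plot(x_1, f(x_1), color="blue", linewidth=3, alpha=0.8)
plt.scatter(x_list, f(values), color="red", s=100, alpha=0.6)

# Chart 2 of 3: derivative
plt.subplot(1, 3, 2)
plt.title("Slope of the cost function")
plt.grid()
plt.plot(x_1, df(x_1), color="skyblue", linewidth=5, alpha=0.6)
plt.scatter(x_list, slope_list, color="red", s=100, alpha=0.5)

# Chart 3 of 3: derivative, zoomed in around the point where the slope is zero
plt.subplot(1, 3, 3)
plt.title("Gradient Descent (Close up)")
plt.grid()
plt.xlim(-0.55, -0.2)   # assumed limits for the assumed function
plt.ylim(-0.3, 0.8)     # assumed limits for the assumed function
plt.plot(x_1, df(x_1), color="skyblue", linewidth=6, alpha=0.8)
plt.scatter(x_list, slope_list, color="red", s=300, alpha=0.6)

plt.show()
```

The three arguments to `plt.subplot` are rows, columns, and the position of the chart, so every chart has to be told it now lives on a grid with three columns.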
428 429 00:35:31,380 --> 00:35:41,010 So for example if instead of at 3, we start at -3 with our gradient descent. 429 430 00:35:41,010 --> 00:35:41,940 Let's take a look here. 430 431 00:35:41,950 --> 00:35:50,040 If my starting value is -3 and I rerun the loop and rerun all the calculations and rerun the 431 432 00:35:50,040 --> 00:35:56,040 graphs then we can see how the gradient descent comes in from the other side. 432 433 00:35:56,040 --> 00:36:01,250 So in this case it's from the bottom here instead of from the top. 433 434 00:36:01,530 --> 00:36:08,330 This is really, really cool in being able to actually play with the algorithm. 434 435 00:36:08,330 --> 00:36:16,140 And this is the advantage of writing all the code out and actually running it and rerunning it to see 435 436 00:36:16,140 --> 00:36:23,400 how differently it behaves. Because not only can we change the starting point but we can also change 436 437 00:36:23,660 --> 00:36:25,280 say how many steps we're taking, right? 437 438 00:36:25,290 --> 00:36:33,030 So if we rerun our algorithm to only run about 10 times instead of the usual amount then we can see 438 439 00:36:33,030 --> 00:36:36,380 how we're not getting that close to the minimum. 439 440 00:36:36,380 --> 00:36:36,660 Right? 440 441 00:36:36,690 --> 00:36:41,780 So we should be getting about here, but we're actually not reaching it. 441 442 00:36:41,960 --> 00:36:44,360 So yeah, I think this is really, really cool. 442 443 00:36:44,450 --> 00:36:50,400 And in the next couple of lessons we're going to be exploring a couple more of the idiosyncrasies and 443 444 00:36:50,400 --> 00:36:53,810 the strengths and weaknesses of this algorithm 444 445 00:36:53,810 --> 00:36:58,720 now that we've written it and graphed it. I'll see you there.
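The experiments described above can be tried without any charts at all. In this sketch the derivative `df`, the step multiplier, and the helper name `run` are assumptions for illustration; the starting points of 3 and -3 and the cap of 10 iterations are the ones mentioned in the lesson.

```python
def df(x):
    # Assumed derivative of an assumed cost function f(x) = x**2 + x + 1
    return 2 * x + 1


def run(start, max_iter=500, step_multiplier=0.1, precision=0.0001):
    """Hypothetical helper: returns every x visited during the descent."""
    x_list = [start]
    new_x = start
    for n in range(max_iter):
        previous_x = new_x
        new_x = previous_x - step_multiplier * df(previous_x)
        x_list.append(new_x)
        if abs(new_x - previous_x) < precision:
            break
    return x_list


from_right = run(start=3)              # the usual starting point
from_left = run(start=-3)              # comes in from the other side
short_run = run(start=3, max_iter=10)  # stops well before the minimum

print("Start at 3, final x:", from_right[-1])
print("Start at -3, final x:", from_left[-1])
print("Only 10 steps, final x:", short_run[-1])
```

For this assumed function both full runs end up close to the same x value, just from opposite directions, while the run that is cut off after 10 steps is still noticeably short of it.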