0 1 00:00:01,110 --> 00:00:06,710 Okay, so now it's time to run out gradient descent. I'm going to scroll down a little bit, 1 2 00:00:06,710 --> 00:00:13,730 hit the plus sign, add a few cells and this cell I'm going to convert to Markdown and I'm going to call 2 3 00:00:13,730 --> 00:00:26,540 this cell the subheading "Batch Gradient Descent with SymPy". Hit Shift+Enter. Now with gradient descent, 3 4 00:00:26,690 --> 00:00:34,850 the first thing is we're going to do a bit of setup, we're going to provide the initial values for the parameters and we're 4 5 00:00:34,850 --> 00:00:36,610 going to specify these up top. 5 6 00:00:36,740 --> 00:00:42,310 And these were things like the multiplier and the maximum number of iterations. 6 7 00:00:42,500 --> 00:00:46,880 So I'm going to say multiplier = 0.1. 7 8 00:00:50,380 --> 00:00:56,460 I'm going to say the max_iter = 200. 8 9 00:00:56,750 --> 00:01:00,980 And now I'm gonna supply my initial values. 9 10 00:01:01,250 --> 00:01:09,830 So previously, we've had a single variable and we've just said you know "new_x =" 10 11 00:01:09,830 --> 00:01:12,530 and then there was our guess, 1.8 something. 11 12 00:01:12,710 --> 00:01:14,500 But this time we have two values. 12 13 00:01:14,500 --> 00:01:17,000 So how are we gonna do this this time? 13 14 00:01:17,510 --> 00:01:23,900 If you have two or more values, the best data structure to work with is a numpy array. 14 15 00:01:23,900 --> 00:01:30,390 So I'm going to "params = np.array([])" 15 16 00:01:30,440 --> 00:01:35,060 and then I'm going to supply two values - 16 17 00:01:35,060 --> 00:01:41,390 our initial guess for X which is gonna be 1.8 and comma and then our initial guess for our 17 18 00:01:41,390 --> 00:01:44,550 y which is going to be 1.0. 18 19 00:01:44,570 --> 00:01:50,060 And this is our initial guess. 19 20 00:01:50,490 --> 00:01:55,390 So I'm going to put that in a comment in the same line after the Python code. 20 21 00:01:55,400 --> 00:02:01,130 Now one thing that you'll see when we write our loop is just how convenient this is because the params 21 22 00:02:01,130 --> 00:02:05,750 is going to hold the values that we actually care about optimizing. 22 23 00:02:05,930 --> 00:02:12,910 It has our x and our y values and these need to be changed to get to the lowest cost. 23 24 00:02:13,190 --> 00:02:19,220 Previously, we only had to worry about one variable and we were using previous_x and 24 25 00:02:19,220 --> 00:02:25,040 new_x inside our loop, but this time we're gonna switch it up and write it a little differently 25 26 00:02:25,490 --> 00:02:31,830 because we're working with a collection instead. Our loop is going to have a very, very similar syntax as 26 27 00:02:31,830 --> 00:02:39,960 before, we're gonna say "for n in range" and then provide the maximum number of iterations, 27 28 00:02:40,200 --> 00:02:43,890 colon do something. 28 29 00:02:43,890 --> 00:02:44,660 Step 1 29 30 00:02:44,710 --> 00:02:56,920 is gonna be working out the slope. Now, if you remember, the slope that we had up here was "diff(f(a,b), 30 31 00:02:56,920 --> 00:03:04,690 a).evalf", and then substituting a particular value. 31 32 00:03:04,720 --> 00:03:08,140 So this is actually the format we're gonna be using. 32 33 00:03:08,170 --> 00:03:11,160 So what's our gradient gonna be in the x direction? 33 34 00:03:11,320 --> 00:03:19,870 I'm going to create a variable called gradient_x and set it equal to, you guessed it, it's 34 35 00:03:20,470 --> 00:03:23,410 "diff(f(a,b), 35 36 00:03:26,130 --> 00:03:28,050 a)". 36 37 00:03:28,110 --> 00:03:37,470 This is gonna be our cost function differentiated with respect to x. ".evalf", open parentheses and 37 38 00:03:37,470 --> 00:03:42,450 then we're gonna substitute some value. I'm going to say "subs = " and then we gonna provide that 38 39 00:03:42,450 --> 00:03:43,100 dictionary. 39 40 00:03:43,170 --> 00:03:50,370 So curly braces and we're gonna say a is gonna be equal to what, it's gonna be equal to the first element 40 41 00:03:50,490 --> 00:03:51,660 in our parameter array, 41 42 00:03:51,690 --> 00:04:06,120 It's gonna be "params[]", first element is at 0, and then for our b, we're gonna say "b: 42 43 00:04:06,960 --> 00:04:12,200 params[1]". 43 44 00:04:12,210 --> 00:04:18,000 So there's a lot going on in this line here, and this is why we've practiced in the cell above a little 44 45 00:04:18,000 --> 00:04:24,540 bit to make sure we can get our head around all of this nesting of the code, but we're differentiating 45 46 00:04:24,690 --> 00:04:33,780 our cost function with respect to one of the parameters, with respect to a and we're evaluating the slope 46 47 00:04:33,880 --> 00:04:40,980 where we're asking Python to figure out what the value is of the slope at a particular point. 47 48 00:04:41,760 --> 00:04:45,780 Which point is that? It's going to be the values in our parameter array. 48 49 00:04:45,780 --> 00:04:52,830 So params[0] and params[1], the very very first time this loop runs those are gonna be equal to our 49 50 00:04:52,890 --> 00:04:55,370 initial guess. 50 51 00:04:55,500 --> 00:05:05,940 So as a challenge, can you create a variable called "gradient_y" and then complete the rest 51 52 00:05:05,940 --> 00:05:07,140 of the code? 52 53 00:05:07,350 --> 00:05:13,530 Can you write the Python code that calculates the value of our slope in the y direction? 53 54 00:05:13,590 --> 00:05:17,100 I'll give you a few seconds to pause the video and have a go at this. 54 55 00:05:19,870 --> 00:05:27,510 And here's the solution. Again we're using "diff(f(a,b))" 55 56 00:05:28,300 --> 00:05:34,120 and this time we're not gonna be differentiating our cost function with respect to a, we're gonna be 56 57 00:05:34,120 --> 00:05:42,670 differentiating it with respect to b because b is our second parameter and I'm going to put a dot after 57 58 00:05:42,670 --> 00:05:43,300 it, 58 59 00:05:43,300 --> 00:05:54,600 I'm going to "evalf" and I'm going to a substitute the same dictionary, I'm gonna say "a: params 59 60 00:05:54,820 --> 00:06:07,350 [0], b: params[1]". I made a typo in params, 60 61 00:06:07,370 --> 00:06:15,070 so I'm gonna fix that. And this is my finished Python code for our gradients. 61 62 00:06:15,230 --> 00:06:20,790 Now I've split this up a little bit to make it a bit easier to read, but I said earlier what we actually 62 63 00:06:20,790 --> 00:06:25,290 want to do is we want to work with arrays primarily because if you're working with more than one value 63 64 00:06:25,620 --> 00:06:28,510 it's handy to work with numpy arrays. 64 65 00:06:28,590 --> 00:06:33,220 So let's combine these two values into a single numpy array. 65 66 00:06:33,230 --> 00:06:49,640 So I'm going to say "gradients = np.array([gradient_x, 66 67 00:06:50,660 --> 00:06:59,360 gradient_y])". 67 68 00:06:59,530 --> 00:07:03,240 So in these three lines of code we have calculated our cost. 68 69 00:07:03,280 --> 00:07:12,370 We've calculated how far away we are from the minimum based on the steepness of the slope. 69 70 00:07:12,370 --> 00:07:16,270 So the next step is updating our parameters. 70 71 00:07:16,270 --> 00:07:22,090 This is the learning step, the most important part of our loop, right? 71 72 00:07:22,300 --> 00:07:31,690 And because we've written our Python code and used numpy arrays, we can actually work with this in a 72 73 00:07:31,690 --> 00:07:35,030 very, very nice and easy to read manner. 73 74 00:07:35,170 --> 00:07:43,750 We can say our new value of our parameter array, so "params =", it should be equal to our old 74 75 00:07:43,750 --> 00:07:44,710 values, 75 76 00:07:44,710 --> 00:07:56,110 so "params - multiplier * gradients". 76 77 00:07:56,110 --> 00:08:00,700 So we're multiplying our gradients array with the learning rate. 77 78 00:08:00,760 --> 00:08:04,370 So the learning rate times all the slopes 78 79 00:08:04,500 --> 00:08:11,400 is gonna be subtracted from the current values of our parameters. 79 80 00:08:11,480 --> 00:08:18,280 So this is updating them and then we're storing those values inside the parameter array. 80 81 00:08:18,310 --> 00:08:25,740 So we're replacing the values inside the parameters, the old ones with the new ones. So these are the 81 82 00:08:25,740 --> 00:08:30,330 old ones on the right hand side of the equation, 82 83 00:08:30,360 --> 00:08:36,570 we're doing some calculation with them and then we're overwriting those values in our parameter array 83 84 00:08:36,870 --> 00:08:38,190 by updating them. 84 85 00:08:38,190 --> 00:08:44,880 This is what this line of code does. And that's our for loop done, just for lines of code. 85 86 00:08:44,880 --> 00:08:48,140 Now it's time to maybe print out the results, so 86 87 00:08:48,170 --> 00:08:56,940 I'm going to say "# Results", and then we're gonna print out maybe the values in our gradient 87 88 00:08:56,940 --> 00:08:59,760 array, so we can see the values of our slopes. 88 89 00:08:59,800 --> 00:09:10,170 So I'm going to say "'Values in a gradient array', gradients". 89 90 00:09:10,170 --> 00:09:17,230 I'm going to print out the final value of our x, so after the optimization has run, 90 91 00:09:17,250 --> 00:09:25,740 so I'm going to say "Minimum occurs at x value of:" 91 92 00:09:26,020 --> 00:09:30,220 and then where is the final optimized value gonna be stored? 92 93 00:09:30,860 --> 00:09:37,310 Well it's gonna be stored in our parameter array at value 0, first value in our parameter 93 94 00:09:37,310 --> 00:09:38,560 array. 94 95 00:09:38,770 --> 00:09:48,460 And I'm also going to print out the "Minimum occurs at y value of ", and this is gonna be the second value 95 96 00:09:48,520 --> 00:09:49,720 in our parameter array. 96 97 00:09:49,750 --> 00:09:59,560 It's gonna be at index 1. And finally I'm going to print out what the cost is at these x and y values. 97 98 00:09:59,580 --> 00:10:09,930 So I'm going to say "The cost is: " as a string then a comma and then how do I calculate the cost at the final 98 99 00:10:09,930 --> 00:10:11,780 values of our X and Y, 99 100 00:10:11,790 --> 00:10:18,900 the optimized values? I'm going to have to substitute them into our cost function which is f, and then the 100 101 00:10:18,900 --> 00:10:30,160 first parameter is gonna be "params[0]" and then a comma and then "params[ 101 102 00:10:30,160 --> 00:10:33,950 1]". And that's it. 102 103 00:10:33,970 --> 00:10:41,450 Now all I have to do is a check if I've made any typos and then run this cell. 103 104 00:10:41,630 --> 00:10:44,520 See what we get. 104 105 00:10:44,550 --> 00:10:45,920 Here we go. 105 106 00:10:46,020 --> 00:10:54,860 So we can see here that the minimum cost is around 0.5 and that our x and y values are actually 106 107 00:10:54,870 --> 00:10:55,900 very, very small. 107 108 00:10:56,470 --> 00:11:03,720 If we look at our chart up here then we can see that this minimum is probably going to occur where X 108 109 00:11:03,930 --> 00:11:07,290 and Y are equal to zero. 109 110 00:11:07,380 --> 00:11:11,820 And on the z axis we know that this is probably going to be around 1/2. 110 111 00:11:12,030 --> 00:11:16,290 Scrolling back down to our cell here, let's see if this is true. 111 112 00:11:16,320 --> 00:11:22,110 Let's see what happens if I run this cell instead of 200 times maybe 500 times and 112 113 00:11:22,230 --> 00:11:31,770 hit Shift+Enter. I've got wait a little longer this code takes some time to execute and I can see here indeed, 113 114 00:11:32,580 --> 00:11:41,040 our minimum cost is definitely 0.5 and and our x and y values have gotten very, very, 114 115 00:11:41,100 --> 00:11:44,540 very small. So that's really, really cool right? 115 116 00:11:44,550 --> 00:11:52,410 We've been able to keep our mathematics that we had to do by hand to a minimum, having a conceptual understanding 116 117 00:11:52,410 --> 00:12:01,170 of what's going on got us very, very far; and we also saw the power of working with arrays and being able 117 118 00:12:01,170 --> 00:12:09,630 to update a whole bunch of values at once. And, in addition, we learned a little bit about a new data structure, 118 119 00:12:10,020 --> 00:12:14,810 namely dictionaries which associate values with a key. 119 120 00:12:14,970 --> 00:12:20,760 So you have a key value pair. And on that note I think I'm gonna get some more coffee or something. 120 121 00:12:20,940 --> 00:12:22,830 I'll see you in the next lesson. 121 122 00:12:22,890 --> 00:12:23,370 Take care.