In this lesson, we're going to play around with our gradient descent algorithm a little bit and see how it is affected by our initial guess, by our starting value. We're also going to build on our Python programming skills by covering some of the more advanced features of functions in Python.

The first thing we're going to do is generate a new cost function and move into a second example. So I'm going to change the cell at the bottom here to Markdown again, so that we have a nice, clean section heading, and we're going to call it "Example 2 - Multiple Minima vs. Initial Guess & Advanced Functions".

The cost function that we're going to be working with looks like this: two dollar signs, g(x) = x^4 - 4x^2 + 5, and then two dollar signs at the end. So again, we're using LaTeX markdown to write our function in mathematical notation. As we've talked about before, LaTeX uses tags to mark text for special formatting, and there are two tags here: an opening tag of two dollar signs and a closing tag of two dollar signs.

Now let's get stuck into the Python code. The first thing we're going to do is make some data. We'll create a variable called x_2, since this is our second example, and again we'll use numpy's linspace to generate our values. I'm going to have the values start at -2, go to 2, and be spaced out over about 1000 points.

Now, as a challenge: can you write the g(x) function and its derivative, the dg(x) function, in Python? Remember, you're going to be writing two functions with the def keyword and applying the power rule that we covered in the previous lesson. I'll give you a few seconds to pause the video and write these two functions.

Ready? Here's the solution. It's the def keyword, g(x), colon, return x**4 - 4*x**2 + 5. That's our first function. The derivative of this function, applying the power rule, is def dg(x): and then return 4*x**3 - 8*x, so the four gets multiplied by the two and becomes eight, and the constant drops out. And that's it.
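Collected in one place, here is the code dictated above for the data and the two functions (the import is shown just to keep the snippet self-contained; the notebook imports numpy earlier):

```python
import numpy as np

# Make some data for Example 2
x_2 = np.linspace(start=-2, stop=2, num=1000)

# Cost function: g(x) = x^4 - 4x^2 + 5
def g(x):
    return x**4 - 4*x**2 + 5

# Derivative of g(x), via the power rule
def dg(x):
    return 4*x**3 - 8*x
```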
I'm going to hit Shift+Enter, and if you get the error that I've gotten, you've got to scroll all the way up to where you're importing numpy and hit Shift+Enter on that cell again. np was not recognized by Python because I haven't come back to this notebook in a while. Now I can click back into the cell, hit Shift+Enter, and it runs just fine.

Now it's time to plot the cost function for this example. I'm going to scroll back up for a second, grab the plotting cell from the first example, and copy it, because we're going to be reusing some of this code. So I copy the cell, come back down here, and go to "Edit" > "Paste Cell Above". But I have to make a couple of changes. First, I'm going to change the x axis to go from -2 to 2 and the y axis to go from 0.5 to 5.5, and I'm going to change the label on the y axis to g(x). Of course, on my plot I'm going to use x_2 and g(x_2). Similarly, for my derivative the y label is going to be dg(x), the x axis will go from -2 to 2 as well, and the y axis will go from -6 to 8. And when it comes to plotting, I'm going to plot x_2 and dg(x_2). I hit Shift+Enter and see what I get. Voila! These two plots help us visualize our second example's cost function. (A sketch of the edited plotting cell follows.)
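Here is a sketch of what that plotting cell plausibly looks like after the edits. The axis limits, labels, and data are as dictated; the figure size, font sizes, and side-by-side subplot layout are assumptions carried over from the first example's cell:

```python
import matplotlib.pyplot as plt

plt.figure(figsize=[15, 5])

# Chart 1: the cost function g(x)
plt.subplot(1, 2, 1)
plt.xlim(-2, 2)
plt.ylim(0.5, 5.5)
plt.xlabel('x', fontsize=16)
plt.ylabel('g(x)', fontsize=16)
plt.plot(x_2, g(x_2))

# Chart 2: the derivative dg(x)
plt.subplot(1, 2, 2)
plt.xlim(-2, 2)
plt.ylim(-6, 8)
plt.xlabel('x', fontsize=16)
plt.ylabel('dg(x)', fontsize=16)
plt.plot(x_2, dg(x_2))

plt.show()
```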
So what can we observe here? What can we see? Well, there are a couple of things of note. If we look at the chart on the left, we can see that there are two minima, two places where the cost is very low: one here and the other one here. Also, looking at the right-hand chart, we see that the derivative intersects the x axis at three points: one here, one at x equals zero, and one here. So there are three points where the slope is equal to zero, and those three points correspond to this minimum, this minimum, but also this maximum here.

Now, it's going to be very interesting to see how this affects our gradient descent algorithm. But before we start playing around with the starting values, we're going to make some modifications to our Python code, because this is a really, really good time to talk about some of the advanced features of functions in Python programming.

We've already written our code for the gradient descent algorithm, so what we're going to do is scroll up, take that cell, copy it, and then insert the copied cell above. Now we're in a good position to start modifying this little bit of code. I really can't wait to show you some of the more advanced Python programming features when it comes to functions, because Python functions are actually incredibly powerful and versatile things. In this lesson we're going to cover how to pass a function as an argument, how to make an argument optional by specifying a default value, and how to have a function return multiple values. I'm going to show you all of these things by turning our gradient descent algorithm into our very own function.

In fact, let's add some markdown to show this in our notebook. I'm going to move this cell at the bottom up with the up arrow, convert it to markdown, and use two hashtags to put down "Gradient Descent as a Python Function". There we go. Okay, let's get started.

The first thing we do is write our function header, as always. It's going to be def, then we give our function a name, let's call it gradient_descent, followed by two parentheses and a colon. Our gradient descent function is going to take four arguments: the derivative function itself, a value for an initial guess, the learning rate, and the precision. Let's put these in as parameters between the two parentheses. The first, we said, is the derivative function; I'm going to call it derivative_func. Then the initial guess, comma, then a multiplier (our learning rate), and then the precision.

Now, if you're looking at this, you might be wondering about my intentions. What do I mean by putting this derivative function in as a parameter? See, the thing about Python is that a function is actually a full-blown object. Functions are stored in a piece of the computer's memory all on their own, just like other objects are. And this means that you can stick a function in a variable and pass functions around our program.
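As a minimal illustration of that point (this snippet is mine, not from the lesson's notebook): referring to a function without parentheses gives you the function object itself, which can be assigned to a variable and passed around like any other value.

```python
# A function is an object: without parentheses we handle the object itself
f = dg           # store the dg function in a variable
print(f(1.0))    # call it through the new name: prints -4.0, since dg(1) = 4 - 8
```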
So in our gradient descent function, this derivative_func parameter will be our placeholder for the actual derivative function, and you'll see this when we call the function.

Now let's fill in our function body. In order to make all of these lines part of the function body, we have to indent them, because indentation is how Python knows that these lines belong to our function. To indent a whole group of lines, you can select them and use a keyboard shortcut: on Windows you press Control and the square bracket key to indent the whole group, and on Mac you press Command and the square bracket key instead. Let me show you. I select all of these lines up to the break statement, press Control and the square bracket key, and they all move over by one level. Now they're all part of our function body. That's pretty neat, right?

Next, I'm going to modify these lines. Our new_x will be equal to the initial guess that we're making here. The initial guess is our placeholder: when our function gets called, we'll supply an initial guess, and we set new_x equal to it. I don't need this line anymore, and I don't need these two lines either, because these variables will get their values when the function is called. I'm going to keep these lines around, but I have to make a modification: our derivative function isn't called df anymore, it gets the name of our placeholder, so it's going to be called derivative_func. And the same is the case down here, the other place where we refer to our derivative function; this is also going to be called derivative_func. I'm going to delete this comment here, we don't need it anymore. Then, when it comes to graphing, there's another reference to our previous example, which I'm going to replace with derivative_func as well. I'm also going to take away this print statement, and delete these other print statements here too.
We don't need these anymore either. But there is one additional thing that I do want to add. I'm just going to delete a couple of these lines to tidy things up a little, so we can actually tell what's going on, and now I can add that last thing to our gradient descent function: my return statement. This is the keyword return followed by whatever the function spits out.

Now, what do we want this function to return? What are the important things that we want out of our gradient descent function? We want three separate values: the new_x value, the list of x values, and the list of slopes that we calculated. The list of x values and the list of slope values are what we'll be using for graphing, and our minimum is the new_x value that we spit out. One of the easiest ways of having a function return more than one value is simply to separate the return values with commas. So "return new_x, x_list, slope_list" will return three values. Let's press Shift+Enter now and see if we get any errors. Okay, so far so good.
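Putting all of those edits together, here is a sketch of what the finished function plausibly looks like. The parameter names, body, and return statement are as dictated; the loop bound of 500 iterations is an assumption carried over from the first example, since the lesson doesn't restate it here:

```python
def gradient_descent(derivative_func, initial_guess, multiplier, precision):
    new_x = initial_guess
    x_list = [new_x]
    slope_list = [derivative_func(new_x)]

    for n in range(500):  # assumption: same iteration cap as in Example 1
        previous_x = new_x
        gradient = derivative_func(previous_x)
        new_x = previous_x - multiplier * gradient  # step against the gradient

        step_size = abs(new_x - previous_x)
        x_list.append(new_x)
        slope_list.append(derivative_func(new_x))

        if step_size < precision:  # stop once the steps become tiny
            break

    return new_x, x_list, slope_list  # multiple values, separated by commas
```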
Now it's time to call this function. This is where the rubber meets the road, as they say. Since we have multiple return values, we can store them in three separate variables, which I'm going to call local_min, list_x, and deriv_list. My function returns three values, and they get stored, in this order, in those three variables. Okay, time to call the function: gradient_descent, open parentheses, and now we supply those four arguments. What are they going to be? Let's supply these arguments by their position. The first thing we pass into our function is going to be, well, another function: our derivative function, which is kind of crazy, right? We're passing a function to another function like it's no big deal. But I already mentioned that functions are just objects in Python, like pretty much anything else. Our derivative function is an object, and it's got the name dg. That's it. If we want to get technical, what's in fact happening here is that we're giving our gradient descent function a pointer to the dg function that we defined in the cell above. If you've got a programming background, you might be interested to know that we're not copying the dg object, we're simply pointing to it.

Now let's supply the other three arguments: let's have our starting position be 0.5, our learning rate be 0.02, and our precision be 0.001. And let's add some print statements for good measure: print('Local min occurs at: ', local_min). Let's print out that first value, and let's also print out the number of steps. The number of steps is going to be the length of our list, so I can use len, the length function, on list_x. This includes the initial guess plus the number of times the loop ran; that's the number of values stored in this list right here.

Now, as a challenge: can you figure out what the problem is if I try to run this as it is, and what I would have to fix for our function to run properly? Because there's something I've missed in our gradient descent function that I haven't taken into account yet.

Here's the solution: this variable here, step_multiplier, exists locally within our function, so it only exists within the function itself, but the problem is that it hasn't been defined anywhere. This means Python does not know what this name refers to. In short, we've got to be consistent with our naming: multiplier is the name of the parameter, which is what we actually want to use; we cannot use the name step_multiplier that we had defined earlier. And that's the fix. Let me press Shift+Enter to rerun the cell. Now I can press Shift+Enter again to run the cell below, and here's our answer: our local minimum occurs at 1.4, and the number of steps it took to get there was 23. So our function is working.

Now, looking at how we're calling this function, gradient_descent(dg, 0.5, 0.02, 0.001) is not very readable. This is something I really, really dislike when writing code, because these numbers just appear like magic numbers.
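For reference, here is the working call from this part of the lesson, with the arguments supplied by position:

```python
local_min, list_x, deriv_list = gradient_descent(dg, 0.5, 0.02, 0.001)
print('Local min occurs at: ', local_min)
print('Number of steps: ', len(list_x))  # initial guess + loop iterations
```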
Looking at this call, we don't know what the numbers mean; we'd have to go and look at the function definition. It's much nicer to add keywords to these arguments, and we can do that by modifying our function call. So I'm going to take this code, copy it, paste it down here for reference, and then fill in the names of the arguments. This first one was our derivative_func, and it's going to be equal to dg; our initial_guess is going to be equal to 0.5 for now. I can actually hit enter here and go to a new line; this doesn't affect the function call at all, but it makes our code a lot more readable. Then this was our multiplier, and this was our precision. Now let's see what happens when our initial guess has a starting value of -0.5, and run it.

We can already see a difference: the first time, the local minimum occurs at 1.4, and the second time around, the local minimum occurs at -1.4. But before we investigate this, let's talk a little bit more about arguments. We already know that arguments are how a function gets its inputs, how objects are sent to a function. And I promised to show you how we can give our arguments a default value and thereby make some of them optional. To do that, we have to modify the header of our gradient descent function, because that is where we can specify default values. Let's specify a default value for the multiplier by setting it equal to 0.02 in the header, and a default value for the precision by setting it equal to 0.001. Now our gradient descent function has two required arguments, the derivative function and the initial guess, and two optional arguments; they're optional because they have default values.

Let's call this function again, this time specifying only the required arguments. I copy this code, paste it here, and delete the last part of my function call, so I'm only specifying the derivative function and the initial guess. I'm also going to change this guess to -0.1. Let's see where we end up.
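Here is a consolidated sketch of the modified header and the two styles of call described above (the function body is unchanged from the earlier sketch and elided here):

```python
# Header with default values: multiplier and precision become optional
def gradient_descent(derivative_func, initial_guess,
                     multiplier=0.02, precision=0.001):
    ...  # body unchanged

# Keyword arguments make the call self-documenting
local_min, list_x, deriv_list = gradient_descent(derivative_func=dg,
                                                 initial_guess=-0.5,
                                                 multiplier=0.02,
                                                 precision=0.001)

# With the defaults in place, only the required arguments are needed
local_min, list_x, deriv_list = gradient_descent(derivative_func=dg,
                                                 initial_guess=-0.1)
```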
If you're getting the error "gradient_descent() missing 2 required positional arguments" despite having added this code, it's because you haven't rerun the cell that defines the function. Remember to press Shift+Enter on that cell to rerun the code and make sure the Jupyter notebook is updated, then come down here and run this one. You should see that we end up at the same minimum as before, but this time it takes us 34 steps instead of 23.

Now, one thing you might try is rerunning this earlier cell here. The question is: will it still work? And the answer is yes, it will. Even though we don't have to specify values for the multiplier and the precision, we still can. So I can add an extra zero to the precision, and make our step size even smaller by changing the multiplier from 0.02 to 0.01, overriding the default values that this function usually has. Having made the step size smaller and our cutoff point even more precise, the number of steps increases to 56.
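As a sketch, that override experiment looks like this:

```python
# Defaults can still be overridden explicitly in the call
local_min, list_x, deriv_list = gradient_descent(derivative_func=dg,
                                                 initial_guess=-0.1,
                                                 multiplier=0.01,
                                                 precision=0.0001)
print('Number of steps: ', len(list_x))  # smaller steps, tighter cutoff: 56 steps
```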
So this is really, really cool, right? We've got really powerful capabilities with functions: a very descriptive way to call them, multiple outputs that we can store in comma-separated variables, and optional arguments, where we give default values to some of the parameters when we create the function.

But all this stuff with arguments and optional arguments is kind of hidden from view. I mean, how would you know what the arguments are for a function you've never seen before? Wouldn't it be nice to pull up this information quickly and easily inside the Jupyter notebook, without having to guess? Well, I've got you covered. Let me show you a neat little trick in Jupyter notebook. If I put my cursor over my gradient_descent function and hit Shift and Tab on my keyboard, so holding down the Shift key and pressing Tab, then Jupyter notebook pulls up a little bit of documentation on this function. I can see that it takes four arguments, I can see what the arguments are called (derivative_func, initial_guess, multiplier, precision), and I can even see the default values for the other two arguments. Isn't that really, really cool? Jupyter notebook is actually smart enough to give us information on our function right there and then.

And it doesn't just work with our own functions. Scroll up, pick out the matplotlib scatter function in this notebook, and give it a try: Shift and then Tab on your keyboard. Go on, I'll wait for you right here.

Did you try it in a couple of places? You might have noticed something: sometimes it's really, really informative, and other times it's not. Let me show you what I mean. I've got my Python code here from our previous example. If I go to scatter and press Shift+Tab, I get a wonderful description with very, very detailed information on the signature, and I can click the little plus sign to take a look at all the documentation for the scatter function. Similarly, when I go up to plt.figure and press Shift+Tab, I get a wonderful signature, all the things in the header, and when I click the little plus sign I get really descriptive information: "facecolor: the background color. If not provided, defaults to rcParams["figure.facecolor"]." Fair enough, right? Cool.

Now let's go to the plot function. If I press Shift+Tab on this, all I get is "plt.plot" with *args and **kwargs, so this isn't really informative, and I have to go digging through the documentation here to really figure out what it is. It's not all that readable; you can be scrolling around in there for a long time trying to make sense of it all. At this point, you're probably much better off going to the website with the official documentation, where you can read up on this in a much nicer format and search the page. In other words, if you want to know how something like plot or subplot works, you're probably still better off pulling up the documentation for these things in your browser.

But speaking of plots, it's time to chart our gradient descent. Let me copy the cell that generates these charts: "Edit" > "Copy Cells", scroll all the way down, and then "Paste Cells Above".
This is our chance to play around with the starting values in our algorithm. So I'm going to add a comment here, "Calling gradient descent function", and edit this other comment to say "Plotting function and derivative and scatter plot side by side". Then I'm going to take the function call from above, copy it, paste it in here, and change our initial guess from -0.1 to 0.1.

Now let's add a couple of lines of code to put a scatter plot on here as well. This is going to be plt.scatter, with our list of x values and then our cost function, g(list_x). But we can't leave it like that, because we're doing some calculations inside g, remember? And the power function doesn't play nice with the list type. So I'm going to convert the list to an np.array and put that inside our g function; the array is the input, and I'm nesting the two function calls here instead of splitting them up. Then we add a color; I'm going to say the color is red, the size of the dots will be 100 as before, and we'll give it some transparency as well, alpha=0.6, and then a closing parenthesis for the scatter plot. Transparency looks pretty good on the line as well, so I'm going to add alpha=0.8 on our cost function chart too.

Let's do something similar for our derivative chart below. It's also going to be a scatter plot, plt.scatter, with the list of x values again on the x axis, and then what was previously our slope list, which we've stored in a variable called deriv_list. We'll go again with the red color, size 100, and alpha 0.5, and this sky-blue line also gets some transparency, with alpha equal to 0.6.

Now let's run the cell and see what happens. Voila! This is what we get: our initial starting value up here at 0.1, and we descend to the minimum here. And on our derivative it looks like this: we go down, down, down, down until our slope is equal to zero.
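Here is a sketch of what the charting cell plausibly looks like after all of these edits. The scatter calls, colors, sizes, and alpha values are as dictated; the figure size, font sizes, line widths, and the sky-blue line color carried over from the copied cell are assumptions:

```python
# Calling gradient descent function
local_min, list_x, deriv_list = gradient_descent(derivative_func=dg,
                                                 initial_guess=0.1)

# Plotting function and derivative and scatter plot side by side
plt.figure(figsize=[15, 5])

# Chart 1: cost function with the steps taken
plt.subplot(1, 2, 1)
plt.xlim(-2, 2)
plt.ylim(0.5, 5.5)
plt.xlabel('x', fontsize=16)
plt.ylabel('g(x)', fontsize=16)
plt.plot(x_2, g(x_2), color='skyblue', linewidth=3, alpha=0.8)
plt.scatter(list_x, g(np.array(list_x)), color='red', s=100, alpha=0.6)

# Chart 2: derivative with the slope at each step
plt.subplot(1, 2, 2)
plt.xlim(-2, 2)
plt.ylim(-6, 8)
plt.xlabel('x', fontsize=16)
plt.ylabel('dg(x)', fontsize=16)
plt.plot(x_2, dg(x_2), color='skyblue', linewidth=5, alpha=0.6)
plt.scatter(list_x, deriv_list, color='red', s=100, alpha=0.5)

plt.show()
```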
Okay, so let's try out a couple of different starting values for our gradient descent function in this example. Here we've started at 0.1 and we've converged to this right-hand minimum. Let's see what happens when we start out with the value 2. I'm going to set my initial guess up here to 2 and hit Shift+Enter. In this case, our gradient descent algorithm goes down here and we end up at the very same minimum. And if we start out somewhere else, say at -1.8, then we end up at this left-hand minimum instead. We'll end up in the left-hand minimum as well if we start out at -0.1.

So what we can learn from this example is that our algorithm isn't perfect; it has some weaknesses. Conceptually it's a little bit disturbing, right, that we end up at completely different minima when we start out at 0.1 versus -0.1. If we're unlucky in our choice of initial starting position, we can end up in very, very different places. We can see in this example that the path of the descent can be very much influenced by that initial guess in certain situations.

Now, would you like to venture a guess at what happens when we have an initial starting value of 0? Have a think about what would happen with our gradient descent if we feed the value zero in as our initial guess, before you run the algorithm. Let's try it out. Instead of 0.1, I'm going to start at 0 and press Shift+Enter. What ends up happening is that we don't descend to either of the two minima; instead, we end up sitting right here on the maximum. And that's because the slope at this very point is also equal to 0, which we can see on the right-hand chart, on the slope of the cost function. Remember, our gradient descent algorithm stops running once the slope is equal to 0. So this problem is also related to the sensitivity to the starting position: the sensitivity of the path of the gradient descent algorithm to that initial guess. (The experiments are summarized in the sketch below.)
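A quick summary of these experiments as calls (a sketch; the results in the comments are the ones observed in the lesson):

```python
# Starting points on either side of x = 0 converge to different minima
gradient_descent(derivative_func=dg, initial_guess=0.1)   # local min near  1.4
gradient_descent(derivative_func=dg, initial_guess=2)     # local min near  1.4
gradient_descent(derivative_func=dg, initial_guess=-1.8)  # local min near -1.4
gradient_descent(derivative_func=dg, initial_guess=-0.1)  # local min near -1.4

# At exactly 0 the slope is 0, so the algorithm stops immediately on the maximum
gradient_descent(derivative_func=dg, initial_guess=0)     # stuck at 0
```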
Now, in this case, both of the two minima have the same cost; the cost is equal at both of them. But we can also imagine a very different situation. If our cost function looked something like this, then we would have two minima, one of which has a much lower cost than the other. One of these is a global minimum, and the other one is a local minimum. The local minimum has a higher cost than the global minimum, but our gradient descent would not discover the global minimum if the initial starting point was on the right-hand side of that local maximum, that small hump right there in the middle of the chart.

Okay, so we've outlined the problem. What's the solution? How do we get around this weakness? Well, the easiest thing to do would be to simply try out multiple different starting values and see if they end up in the same place. Think of this as injecting a little bit of randomness into our gradient descent: we could choose a whole host of random starting values and then run our gradient descent over and over again to see where it ends up (there's a small sketch of this idea at the end of the lesson). This might be an approach you could take if you don't actually know what the cost function looks like, if you don't know where the minimum is.

Another thing we could do is try a completely different algorithm to find the minimum, because, let's face it, this particular version of gradient descent isn't our only option. Similar to how we can try a bunch of different random starting points, other algorithms have randomness baked into them. One version of gradient descent with more randomness baked in is called Stochastic Gradient Descent, and this is in contrast to what we're currently doing, which is called Batch Gradient Descent. The thing to note about any of these approaches is that none of them is perfect. You'll find that no matter which approach you choose, it has certain strengths and certain weaknesses, and it's important to understand what the pros and cons of each approach are.

So on that note, we're not done yet examining this particular algorithm. Our Batch Gradient Descent algorithm might actually face another problem, and that's what we're going to be looking at next. I'll see you in the next lesson. Take care.
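Here is that minimal sketch of the random-restart idea mentioned above. It's an illustration, not code from the lesson's notebook; the restart range of -2 to 2 and the count of 10 restarts are arbitrary assumptions:

```python
# Random restarts: run gradient descent from several random starting points
# and keep whichever result has the lowest cost.
best_x = None
for start in np.random.uniform(low=-2, high=2, size=10):
    candidate, _, _ = gradient_descent(derivative_func=dg, initial_guess=start)
    if best_x is None or g(candidate) < g(best_x):
        best_x = candidate  # keep the candidate with the lowest cost so far

print('Best minimum found at: ', best_x)
```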