0 1 00:00:01,220 --> 00:00:06,940 Okay so we've generated the data and we've plotted a 3D chart. 1 2 00:00:06,980 --> 00:00:14,030 Now before we run our gradient descent algorithm, we're going to have to calculate the slope or the gradient 2 3 00:00:14,510 --> 00:00:17,150 at all the different points on our chart. 3 4 00:00:17,150 --> 00:00:22,880 We're going to have that down, because remember the gradient was what told us how far away we were from 4 5 00:00:22,880 --> 00:00:29,600 the minimum, the steeper the gradient the further away we were and the higher our cost in our wonderful 5 6 00:00:29,780 --> 00:00:38,410 three step machine learning process. Now previously our cost functions were a bit more simple, right? 6 7 00:00:38,410 --> 00:00:44,260 All we had to do was apply our power rule and work out our derivative function. 7 8 00:00:44,260 --> 00:00:49,820 But for our cost function in example 4m that's not going to be enough. 8 9 00:00:49,930 --> 00:00:56,590 And the reason for this is that this function is a little bit more complex. Before we only had a slope 9 10 00:00:56,590 --> 00:00:59,530 in one dimension - the x dimension. 10 11 00:00:59,530 --> 00:01:03,730 But this time we've got a slope in two dimensions because we've got two parameters - 11 12 00:01:03,730 --> 00:01:11,180 we've got the X and the Y. And not only that, our functional form is a little bit more complex as well. 12 13 00:01:11,260 --> 00:01:17,920 We've got a fraction, we've got exponents, it's getting a little bit more complex to do that derivative 13 14 00:01:17,920 --> 00:01:18,330 here, it's 14 15 00:01:18,340 --> 00:01:21,030 not such a simple process. 15 16 00:01:21,400 --> 00:01:29,590 And this is a little bit of preparation for the kind of pretty hairy cost functions that you might 16 17 00:01:29,590 --> 00:01:30,460 encounter 17 18 00:01:30,490 --> 00:01:38,210 going forward. A lot of those come from statistical theory and they are definitely a lot more involved.
18 19 00:01:38,230 --> 00:01:44,980 So what I want to show you in this lesson is how you can tackle quite complex cost functions without 19 20 00:01:44,980 --> 00:01:47,960 having to work everything out by hand. 20 21 00:01:48,970 --> 00:01:54,340 So as I said before, this time around we've got a Y and we've got an X, and we're optimizing our cost 21 22 00:01:54,730 --> 00:02:02,530 across two parameters. Just because the Y is low doesn't mean that the cost is low. 22 23 00:02:02,530 --> 00:02:03,560 Right? 23 24 00:02:03,710 --> 00:02:09,340 And this makes sense if you think about it. Looking at this graph just because we've got a low Y doesn't 24 25 00:02:09,340 --> 00:02:11,880 mean that the X is low as well. 25 26 00:02:12,070 --> 00:02:15,250 We have to pick both an optimal X and an optimal Y. 26 27 00:02:15,970 --> 00:02:24,460 And this means working out two slopes - a slope for each parameter. And to calculate the slope we are going to calculate 27 28 00:02:24,550 --> 00:02:27,180 the partial derivative. 28 29 00:02:27,220 --> 00:02:29,010 Now, what's a partial derivative? 29 30 00:02:29,290 --> 00:02:36,160 It's taking a derivative of a function with respect to one of the variables and holding the other 30 31 00:02:36,160 --> 00:02:37,530 one constant. 31 32 00:02:37,540 --> 00:02:43,690 So if we're taking the partial derivative with respect to X then we're holding Y constant. 32 33 00:02:43,690 --> 00:02:50,560 That way we get the gradient in the X direction and the way we're going to go about this is I'm going 33 34 00:02:50,560 --> 00:02:57,190 to show you how to calculate a partial derivative using symbolic computation in Python. 34 35 00:02:57,940 --> 00:03:05,320 So back in our Python notebook at the bottom let's add a couple of cells and then the very first one 35 36 00:03:05,560 --> 00:03:10,430 we're going to change to Markdown and we're gonna add a section heading here. 
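To make "holding the other one constant" concrete, here is a minimal numerical sketch. It assumes the cost function from this example is f(x, y) = 1 / (3^(-x^2 - y^2) + 1); the helper names `partial_x` and `partial_y`, the step size `h`, and the sample point are illustrative choices, not part of the lesson's notebook.

```python
# Assumed cost function for this example: f(x, y) = 1 / (3^(-x^2 - y^2) + 1)
def f(x, y):
    return 1 / (3 ** (-x**2 - y**2) + 1)

# Hypothetical helper: approximate the slope in the x direction.
# Only x is nudged; y is held constant.
def partial_x(func, x, y, h=1e-6):
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

# Hypothetical helper: approximate the slope in the y direction.
# Only y is nudged; x is held constant.
def partial_y(func, x, y, h=1e-6):
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

# Illustrative point, chosen arbitrarily
slope_x = partial_x(f, 1.8, 1.0)
slope_y = partial_y(f, 1.8, 1.0)
print('Approximate slope in x direction:', slope_x)
print('Approximate slope in y direction:', slope_y)
```

This is only an approximation by central differences; symbolic computation gives the exact derivative expression instead.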
36 37 00:03:10,600 --> 00:03:24,370 This section heading is going to read "Partial Derivatives & Symbolic Computation". 37 38 00:03:24,570 --> 00:03:31,770 Now the question is what does a partial derivative actually look like and how do you work it out? Well, 38 39 00:03:32,030 --> 00:03:36,790 if you remember your high school maths then this isn't gonna be a problem for you. 39 40 00:03:37,070 --> 00:03:42,350 You're gonna be looking at this function right here and you're already going to be writing out the partial 40 41 00:03:42,350 --> 00:03:44,030 derivatives for it. 41 42 00:03:44,230 --> 00:03:49,450 But on the other hand if you're a bit rusty with your calculus or you simply want to check your work, 42 43 00:03:49,610 --> 00:03:55,500 let me show you some alternative ways that you can solve this mathematical problem. 43 44 00:03:55,550 --> 00:04:02,870 The first approach is being a little sneaky and typing the words "derivative calculator" into Google to 44 45 00:04:02,870 --> 00:04:08,750 find an online derivative calculator that will solve this for you. I have to say this is actually a pretty 45 46 00:04:08,750 --> 00:04:09,780 good idea. 46 47 00:04:09,920 --> 00:04:13,590 So there's a couple of really good ones out there. 47 48 00:04:13,740 --> 00:04:16,630 You know, you'll get a lot of results. 48 49 00:04:16,670 --> 00:04:21,540 I checked a couple of them out, the one I quite liked was Symbolab. 49 50 00:04:21,560 --> 00:04:24,570 So this is a pretty decent one. 50 51 00:04:24,590 --> 00:04:30,470 So on every single one of these Web sites, you're gonna get some text box here where you have to enter your 51 52 00:04:30,470 --> 00:04:31,610 equation.
52 53 00:04:31,640 --> 00:04:44,630 So ours was 1 divided by 3 and then it was to the power of "-x^2 - 53 54 00:04:44,630 --> 00:04:47,400 y^2" 54 55 00:04:47,420 --> 00:04:53,390 So you're gonna be typing this in and it's going to be "1/(3^(-x^2 55 56 00:04:53,390 --> 00:05:01,460 -y^2))" and then you got to make sure on this particular website that you go back down 56 57 00:05:02,440 --> 00:05:07,280 outside of the exponent and then put "+1" here. 57 58 00:05:07,580 --> 00:05:15,890 Otherwise you'll get a very, very different answer than if you have it up top. When you hit "Go", it calculates 58 59 00:05:15,890 --> 00:05:17,560 the derivative for you. 59 60 00:05:17,570 --> 00:05:19,830 So we've got our solution. 60 61 00:05:19,910 --> 00:05:26,240 So right here you're looking at the partial derivative with respect to x. And the really cool thing about 61 62 00:05:26,240 --> 00:05:34,460 a lot of these Web sites is that you can actually see the steps on how they got the solution. 62 63 00:05:34,460 --> 00:05:40,510 So you can see them applying the exponent rule, applying the chain rule and then down here 63 64 00:05:40,520 --> 00:05:47,610 they're applying our old friend the power rule and they're also showing you how to simplify the solution 64 65 00:05:47,910 --> 00:05:51,820 in the end. Now checking out a Web site like this, 65 66 00:05:51,820 --> 00:05:59,140 and you know kind of, checking the solution or revising your calculus is all very well and good but you 66 67 00:05:59,130 --> 00:06:01,180 know, you and I we're doing a Python course together. 67 68 00:06:01,180 --> 00:06:06,400 So how do you do this in Python and in Jupyter notebook? 68 69 00:06:06,640 --> 00:06:12,040 And this is what I'm going to show you right now, because we're going to be doing symbolic mathematics 69 70 00:06:12,400 --> 00:06:18,880 directly in Python using a module called "SymPy". 
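For anyone who wants to check the calculator's answer by hand, the chain-rule steps can be sketched as below. This assumes the cost function is exactly the one typed into the calculator, with the "+1" outside the exponent; the shorthand r is introduced here only for illustration.

```latex
\begin{align*}
f(x, y) &= \frac{1}{3^{-x^2 - y^2} + 1}, \qquad r = 3^{-x^2 - y^2} \\
\frac{\partial r}{\partial x} &= -2x \ln(3)\, 3^{-x^2 - y^2} \\
\frac{\partial f}{\partial x} &= -\frac{1}{(r + 1)^2} \cdot \frac{\partial r}{\partial x}
  = \frac{2x \ln(3)\, 3^{-x^2 - y^2}}{\left(3^{-x^2 - y^2} + 1\right)^2}
\end{align*}
```

Here y is treated as a constant throughout, which is why only the x term of the exponent contributes a factor of -2x.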
70 71 00:06:19,240 --> 00:06:25,000 Sympy actually has their own Web site as well so you can check out the documentation here if you like. 71 72 00:06:25,000 --> 00:06:27,280 You can see who's developing it. 72 73 00:06:27,280 --> 00:06:29,220 You can even donate money to them. 73 74 00:06:29,260 --> 00:06:30,270 So, 74 75 00:06:31,480 --> 00:06:35,380 so you know these guys are doing good work, it's a good project. 75 76 00:06:35,740 --> 00:06:43,120 I think the design is a little bit dated, needs a bit of a facelift but hey they're doing symbolic mathematics, 76 77 00:06:43,150 --> 00:06:47,450 not designing for Apple or what have you. 77 78 00:06:47,500 --> 00:06:53,770 Now as always when we want to use an external module in our Jupyter notebook, then we're going to have 78 79 00:06:53,770 --> 00:07:00,210 to import it and we're going to import it at the top where we're importing all the rest of our modules. 79 80 00:07:00,220 --> 00:07:05,830 So keeping all our imports together nice and tidy. 80 81 00:07:05,830 --> 00:07:09,070 So from SymPy we're interested in two things, 81 82 00:07:09,280 --> 00:07:19,510 we're gonna say "from sympy", all lowercase, "import symbols, diff". 82 83 00:07:20,140 --> 00:07:23,440 So we're interested in importing two things. 83 84 00:07:23,440 --> 00:07:29,930 One of them is the symbols functionality which will help us write mathematical notation. 84 85 00:07:30,520 --> 00:07:37,690 And the other is diff, which allows us to differentiate a mathematical function. 85 86 00:07:37,690 --> 00:07:37,980 All right. 86 87 00:07:37,990 --> 00:07:43,480 So let me scroll back down and show you how SymPy works. 87 88 00:07:43,570 --> 00:07:48,920 Also let me fix this typo here. 88 89 00:07:48,940 --> 00:07:54,060 First off, we need to tell Python which mathematical symbols we're gonna use. 
89 90 00:07:54,070 --> 00:08:02,380 So instead of X and Y I'm going to use a and b to show the difference between our Python code a bit more 90 91 00:08:02,380 --> 00:08:03,270 clearly. 91 92 00:08:03,300 --> 00:08:11,000 So I'm going to say "a, b = symbols( 92 93 00:08:13,580 --> 00:08:21,210 'x, y')" 93 94 00:08:21,400 --> 00:08:29,950 By writing this line of code we're telling Python that a stands for x and b should stand for y in the code 94 95 00:08:30,220 --> 00:08:32,350 that we're gonna be writing next. 95 96 00:08:32,350 --> 00:08:33,190 Mm hmm. 96 97 00:08:33,310 --> 00:08:36,130 Now I realize this sounds a little bit confusing. 97 98 00:08:36,250 --> 00:08:43,840 So I'm actually going to print out our function as it is below our cell using this notation. So I'm gonna 98 99 00:08:43,840 --> 00:08:53,180 say "f(a,b)" and hit Shift+Enter. Now if you see the error "symbols is not defined", 99 100 00:08:53,180 --> 00:09:00,580 then you've got to hit Shift+Enter on our import cell. What you should see instead is this, 100 101 00:09:00,920 --> 00:09:09,530 you should see that Python now recognizes our function as a mathematical function and it's substituted 101 102 00:09:10,040 --> 00:09:12,620 x and y for our a and b 102 103 00:09:12,650 --> 00:09:20,950 arguments. So we can see here that thanks to SymPy, a and b are now treated as the x and y in our function 103 104 00:09:21,610 --> 00:09:29,800 and Jupyter notebook now understands how to work with this function as we would do with a mathematical 104 105 00:09:29,800 --> 00:09:31,320 equation. 105 106 00:09:31,390 --> 00:09:38,910 So let's work out the partial derivatives using the diff function.
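Put together, the cell described here might look like the sketch below. The body of `f` is an assumption based on the formula entered into the derivative calculator earlier (the notebook's own definition comes from a previous lesson and is not shown in this transcript). Note that SymPy's `symbols` takes the names as a single string, here `'x, y'`.

```python
from sympy import symbols

# Assumed definition of the cost function from earlier in the notebook
def f(x, y):
    r = 3 ** (-x**2 - y**2)
    return 1 / (r + 1)

# a and b are Python variable names; x and y are the mathematical symbols they stand for
a, b = symbols('x, y')

# Calling f with symbols instead of numbers returns a symbolic expression
expression = f(a, b)
print(expression)
```

Because `a` and `b` are SymPy symbols rather than numbers, the ordinary Python function returns an expression in x and y instead of a single value.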
So let me wrap this in a print statement. 106 107 00:09:39,480 --> 00:09:43,270 I'm going to say print and then single quotes 107 108 00:09:43,680 --> 00:09:48,310 and in those single quotes I'm gonna say "Our cost function 108 109 00:09:51,920 --> 00:10:04,850 f(x, y) is: ", and then that's gonna be the end of the string comma, "f(a,b)". 109 110 00:10:04,860 --> 00:10:11,670 So I'm going to wrap this in a print statement and I'm going to add some more print statements below because 110 111 00:10:12,330 --> 00:10:21,190 we can now work out the partial derivatives with this diff functionality from SymPy. Because we've imported 111 112 00:10:21,190 --> 00:10:23,530 it above, it's very, very simple to call. 112 113 00:10:23,570 --> 00:10:33,650 It's "diff(f(a,b))" and then you have to provide a second argument to this. 113 114 00:10:33,940 --> 00:10:37,170 You have to say what you're differentiating with respect to. 114 115 00:10:37,480 --> 00:10:42,430 So if we wanted to differentiate with respect to our first parameter, then we would differentiate with 115 116 00:10:42,430 --> 00:10:46,670 respect to a. Hitting Shift+Enter, 116 117 00:10:46,870 --> 00:10:57,580 we see that this here is the same partial derivative that we got from using one of the online derivative 117 118 00:10:57,910 --> 00:11:00,080 calculators. 118 119 00:11:00,200 --> 00:11:02,330 Let me put that in a print statement as well. 119 120 00:11:02,360 --> 00:11:14,750 So I'm going to say print, single quotes, comma and in those single quotes I'm going to say "Partial derivative with 120 121 00:11:14,750 --> 00:11:24,740 respect to x is: " and then have the output that we just had here and then close the parentheses at the 121 122 00:11:24,740 --> 00:11:26,540 end and hit Shift+Enter. 122 123 00:11:27,410 --> 00:11:34,510 So now we know how to work with our cost function and how to work with our partial derivatives and this 123 124 00:11:34,510 --> 00:11:35,880 brings us to our next point.
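The print statements dictated above can be sketched as one self-contained cell. As before, the body of `f` is assumed from the formula used with the online calculator; the variable name `partial_x` is only for illustration.

```python
from sympy import symbols, diff

# Assumed definition of the cost function from earlier in the notebook
def f(x, y):
    r = 3 ** (-x**2 - y**2)
    return 1 / (r + 1)

a, b = symbols('x, y')

print('Our cost function f(x, y) is: ', f(a, b))

# The second argument to diff says which symbol to differentiate with respect to
partial_x = diff(f(a, b), a)
print('Partial derivative with respect to x is: ', partial_x)
```

Passing `b` instead of `a` as the second argument would give the partial derivative with respect to y.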
124 125 00:11:36,070 --> 00:11:45,880 How do we calculate the cost at a particular point in our function? For that we need to evaluate our 125 126 00:11:45,880 --> 00:11:47,230 function. 126 127 00:11:47,350 --> 00:11:54,970 So we need to substitute the particular x and y values, right, like 1 and 1.5 or what have 127 128 00:11:54,970 --> 00:11:58,000 you into our function. 128 129 00:11:58,090 --> 00:12:06,060 The SymPy module gives us this ability through another function, namely evalf, which evaluates an expression to a floating-point number. 129 130 00:12:06,460 --> 00:12:09,820 So here's how you would use it on our cost function. 130 131 00:12:09,820 --> 00:12:21,980 I would have "f(a,b).evalf", and then parentheses and in those parentheses I can substitute 131 132 00:12:22,460 --> 00:12:26,280 the values at which I want to evaluate the cost. 132 133 00:12:26,360 --> 00:12:34,650 So I would say "subs = {a: 133 134 00:12:34,700 --> 00:12:43,450 1.8, b:1.0}". 134 135 00:12:43,760 --> 00:12:52,730 Then close the curly braces, close the parentheses for the evaluation function and then hit Shift+Enter. 135 136 00:12:54,260 --> 00:13:03,490 And what you should see is that the cost at this particular point is approximately 0.99. 136 137 00:13:03,510 --> 00:13:12,350 If I take a look at our graph, I take a look at where x=1.8 and y=1 is, it's around here 137 138 00:13:12,840 --> 00:13:19,180 and going up I'm at around 0.99 on the z axis. 138 139 00:13:19,200 --> 00:13:22,010 Okay so that's pretty useful. 139 140 00:13:22,050 --> 00:13:27,800 The only thing that's a little bit new is this notation right here. 140 141 00:13:27,840 --> 00:13:34,820 This is Python code that we haven't quite seen before, because we've got these curly braces 141 142 00:13:34,890 --> 00:13:43,630 and then inside we have two entries - a and b, separated by a comma and then we set the values with colons. 142 143 00:13:43,740 --> 00:13:50,540 So a gets the value 1.8 and b gets the value 1.0.
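As a self-contained sketch, the evaluation step could be written like this. The body of `f` is again assumed from the formula given earlier, and the variable name `cost` is only for illustration.

```python
from sympy import symbols

# Assumed definition of the cost function from earlier in the notebook
def f(x, y):
    r = 3 ** (-x**2 - y**2)
    return 1 / (r + 1)

a, b = symbols('x, y')

# subs takes a dictionary mapping each symbol to the number to substitute for it
cost = f(a, b).evalf(subs={a: 1.8, b: 1.0})
print('Value of f(x,y) at x=1.8 y=1.0 is: ', cost)
```

The result should be close to the 0.99 read off the z axis of the 3D chart.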
143 144 00:13:51,420 --> 00:13:53,520 So this notation with the curly braces 144 145 00:13:53,520 --> 00:14:00,980 together with the colons is the Python notation for giving both a and b a value at the same time. 145 146 00:14:03,960 --> 00:14:08,350 The specific term for this is a Python dictionary. 146 147 00:14:08,520 --> 00:14:11,660 We've got a key value pair. 147 148 00:14:11,880 --> 00:14:16,040 A is the key and 1.8 is the value. 148 149 00:14:16,350 --> 00:14:23,250 And then we've got another key value pair, b is the key and 1.0 is the value. 149 150 00:14:23,330 --> 00:14:29,890 Now if you're coming from another programming language, a dictionary might have been referred to as an 150 151 00:14:30,010 --> 00:14:39,250 associative array and it's very similar to an array really but instead of kind of grabbing a particular 151 152 00:14:39,250 --> 00:14:46,600 value by an index, by a number, as we have done with our array, say getting the first one or getting the 152 153 00:14:46,600 --> 00:14:47,600 second one, 153 154 00:14:47,740 --> 00:14:52,790 in this case we are working with a key, 154 155 00:14:52,860 --> 00:14:55,190 so it's like our index has a name. 155 156 00:14:55,390 --> 00:15:00,970 And the reason they're called dictionaries is because they pretty much work like real dictionaries. 156 157 00:15:00,970 --> 00:15:05,460 If you have a word say "bug", then that's the key. 157 158 00:15:05,770 --> 00:15:12,630 And if you have the definition of this word - "unintended behavior or defect in a program that causes 158 159 00:15:12,630 --> 00:15:18,060 it to crash or malfunction", then that's the value and that's it. 159 160 00:15:18,060 --> 00:15:22,470 That's the dictionary data structure in a nutshell. 160 161 00:15:22,470 --> 00:15:32,190 You can always spot it with the curly braces and the keys separated from the values by a colon. 161 162 00:15:32,210 --> 00:15:32,540 All right. 
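A small illustration of the dictionary idea, separate from SymPy. The variable names `point` and `glossary` are made up for this sketch, and plain strings are used as keys here, whereas the notebook uses the SymPy symbols a and b as keys.

```python
# A list (array) is indexed by position
values = [1.8, 1.0]
first_value = values[0]

# A dictionary is indexed by key, so the "index" has a name
point = {'x': 1.8, 'y': 1.0}
x_value = point['x']

# The real-dictionary analogy: the word is the key, the definition is the value
glossary = {
    'bug': 'unintended behavior or defect in a program that causes it to crash or malfunction'
}

print(first_value, x_value)
print(glossary['bug'])
```

Both lookups return 1.8; the difference is that the dictionary lookup says which quantity is being asked for.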
162 163 00:15:32,560 --> 00:15:38,520 So now that I've printed out the cost for our cost function at a particular point, I'm going to wrap this in 163 164 00:15:38,520 --> 00:15:45,810 a print statement again, I'm going to say "print('',)", and in the single quotes I'm going 164 165 00:15:45,810 --> 00:16:00,850 to write "Value of f(x,y) at x=1.8, y=1.0 is: " and then I'll have 165 166 00:16:01,540 --> 00:16:08,950 the rest of our print statement like this and I'll finish it all off with a parentheses at the end. I'm going to 166 167 00:16:09,010 --> 00:16:11,080 split it out over two lines like this, 167 168 00:16:11,080 --> 00:16:13,090 so it's a bit more easy to read. 168 169 00:16:13,090 --> 00:16:20,080 I'm going to hit Shift+Enter and evaluate this cell again. Now as a really, really quick practice, 169 170 00:16:20,140 --> 00:16:29,170 I want to give you a challenge, namely can you evaluate the value of the slope with respect to x at this 170 171 00:16:29,170 --> 00:16:33,430 very, very same point, at 1.8 and 1? 171 172 00:16:33,900 --> 00:16:41,560 And can you print out this value, the value of the partial derivative with respect to a at 1.8 172 173 00:16:41,590 --> 00:16:44,800 and 1 in the cell? 173 174 00:16:45,340 --> 00:16:52,190 I'll give you a few seconds to pause the video and figure this out. 174 175 00:16:52,310 --> 00:16:52,640 All right. 175 176 00:16:52,660 --> 00:16:55,060 So here's the solution. 176 177 00:16:55,610 --> 00:17:01,040 Once again you're going to be working with the evalf functionality, but in this case we're not gonna 177 178 00:17:01,040 --> 00:17:08,540 be evaluating the cost function, we're gonna be evaluating the partial derivative, so we can say 178 179 00:17:08,540 --> 00:17:18,740 "diff(f(a,b),)" and then we have to say if we want to evaluate this function 179 180 00:17:18,740 --> 00:17:25,900 with respect to a or to b. I'm going to say a and then I'm going to close the parentheses for our diff. 
180 181 00:17:26,140 --> 00:17:28,340 So this is our partial derivative. 181 182 00:17:28,510 --> 00:17:37,480 We'll put a dot after it and then again we'll say "evalf(subs=)" and now we'll provide 182 183 00:17:37,570 --> 00:17:49,700 our dictionary, open curly braces "a:1.8, b:1.0" and let's print 183 184 00:17:49,700 --> 00:17:57,640 this out with Shift+Enter and here we can see that the slope at this particular point in the x direction 184 185 00:17:58,090 --> 00:18:09,340 is 0.037. Once again, I'm going to wrap this in a print statement and I'm going to say "Value of partial 185 186 00:18:10,300 --> 00:18:19,050 derivative with respect to x: ", and I'm going to close the parentheses at the end and hit Shift+Enter.
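The challenge solution, written out as one runnable cell. The body of `f` is assumed from the formula given earlier in the lesson, and the variable name `slope_x` is only for illustration.

```python
from sympy import symbols, diff

# Assumed definition of the cost function from earlier in the notebook
def f(x, y):
    r = 3 ** (-x**2 - y**2)
    return 1 / (r + 1)

a, b = symbols('x, y')

# Differentiate with respect to a (the x symbol) first, then substitute the point
slope_x = diff(f(a, b), a).evalf(subs={a: 1.8, b: 1.0})
print('Value of partial derivative with respect to x: ', slope_x)
```

The order matters: the derivative is taken symbolically first, and only then are the numbers substituted in.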