0 1 00:00:00,440 --> 00:00:00,850 All right. 1 2 00:00:00,870 --> 00:00:03,210 So welcome back. 2 3 00:00:03,210 --> 00:00:09,870 In this lesson, we're going to start taking a look at another quirk of the gradient descent algorithm. 3 4 00:00:09,870 --> 00:00:16,110 We're going to be taking a look at an example where our gradient descent algorithm actually diverges 4 5 00:00:16,410 --> 00:00:19,380 and spirals more and more out of control. 5 6 00:00:19,380 --> 00:00:26,880 And also we're gonna be working through some more hardcore Python programming concepts. So I'm going to click 6 7 00:00:26,880 --> 00:00:34,660 in my last cell and then I'm going to go change the cell from Code to Markdown here. I'm going to add a hashtag 7 8 00:00:35,480 --> 00:00:36,160 and say 8 9 00:00:36,200 --> 00:00:37,130 "Example 3 - 9 10 00:00:40,340 --> 00:00:52,400 Divergence, Overflow and Python Tuples". The function that we're gonna be looking at in this example is 10 11 00:00:52,400 --> 00:01:04,190 gonna be this one - it's gonna be h(x) = x^5 - 2x^4 + 2 11 12 00:01:04,670 --> 00:01:12,680 Forgot my closing tags in the end. 12 13 00:01:12,710 --> 00:01:13,970 There we go. 13 14 00:01:14,000 --> 00:01:19,470 Maybe add another cell below actually, one, two, three, so that we're not at the bottom of the screen. 14 15 00:01:19,910 --> 00:01:24,650 And then I'm going to generate some data, as always that's gonna be my first thing. 15 16 00:01:24,730 --> 00:01:31,880 So I'm going to make data and since this is example 3, I'm going to use x_3 as my variable 16 17 00:01:31,880 --> 00:01:48,800 and I'm going to use np.linspace starting at -2.5 going to 2.5 and with the num argument set to 17 18 00:01:49,220 --> 00:01:50,340 1000. 18 19 00:01:50,870 --> 00:01:51,420 Okay. 19 20 00:01:51,440 --> 00:01:55,600 So now it's time to write that equation above in Python code.
20 21 00:01:55,690 --> 00:02:14,470 It's gonna be "def h(x):", "return x**5 - 2*x**4 + 2" 21 22 00:02:14,600 --> 00:02:23,950 And then our derivative of this function is gonna be "def dh(x):", "return" 22 23 00:02:24,340 --> 00:02:28,340 and then you've probably already worked this out by applying the power rule. 23 24 00:02:28,400 --> 00:02:40,790 It's gonna be 5*x**4 - 8*x**3 24 25 00:02:40,790 --> 00:02:41,290 Okay. 25 26 00:02:41,300 --> 00:02:43,860 So let's plot this function. 26 27 00:02:44,000 --> 00:02:47,370 I'm gonna scroll up and I'm gonna take this cell here. 27 28 00:02:47,370 --> 00:02:53,620 I'm going to say "Copy Cell" and then I'm going to paste it above. 28 29 00:02:53,670 --> 00:02:56,770 Now I'm going to have to make some changes to this code. Over here 29 30 00:02:56,780 --> 00:02:57,640 for the gradient descent, 30 31 00:02:57,650 --> 00:03:00,330 we're gonna be calling dh. 31 32 00:03:00,590 --> 00:03:06,340 And I'm going to say my initial guess should be equal to 0.2. 32 33 00:03:06,440 --> 00:03:12,680 I'm going to leave a lot of the other stuff as it is but I'm going to change my x axis and y axis on my graph. 33 34 00:03:12,680 --> 00:03:23,120 So this graph is gonna go from -1.2 to 2.5 and the y axis is gonna go from 34 35 00:03:24,170 --> 00:03:27,570 -1 to 4. 35 36 00:03:27,800 --> 00:03:34,140 So that's our cost function, I'm gonna change the y label to read h(x). 36 37 00:03:34,270 --> 00:03:43,590 I'm going to plot x_3 against h(x_3) and on the scatter plot 37 38 00:03:43,630 --> 00:03:52,950 I'm also going to be plotting h(x) not g(x). Similarly, for my derivative I'm going to change the label 38 39 00:03:53,120 --> 00:04:00,980 and on the axes, we're gonna go from -1 to 2 and then from -4 to 5. 39 40 00:04:01,370 --> 00:04:04,720 And then on the plotting, I'm going to change this to x_3, 40 41 00:04:07,700 --> 00:04:09,490 dh(x_3) 41 42 00:04:09,950 --> 00:04:19,680 and then I'm gonna hit Shift+Enter. Voila!
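Here's a minimal sketch of the function and its derivative as described above, written out in one piece. The numerical_slope helper is not part of the lesson - it's a hypothetical check I'm adding so we can compare the power-rule derivative against a finite-difference estimate.

```python
def h(x):
    """Example 3 cost function: h(x) = x^5 - 2x^4 + 2."""
    return x**5 - 2*x**4 + 2


def dh(x):
    """Derivative of h from the power rule: 5x^4 - 8x^3."""
    return 5*x**4 - 8*x**3


def numerical_slope(func, x, eps=1e-6):
    # Hypothetical helper (not from the lesson): central-difference
    # estimate of the slope, used only to sanity-check dh.
    return (func(x + eps) - func(x - eps)) / (2 * eps)


# Compare the analytic derivative with the numerical estimate
for x in (-0.2, 0.2, 1.6):
    print(x, dh(x), numerical_slope(h, x))
```

If the two columns agree to several decimal places, the derivative has been typed in correctly.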
So we can see here on our graph, we start at positive 42 43 00:04:19,680 --> 00:04:27,630 0.2 and then our gradient descent slowly, slowly, slowly makes its way down into this local minimum 43 44 00:04:27,720 --> 00:04:29,370 right here. 44 45 00:04:29,370 --> 00:04:34,110 Let's modify the cell here to print out what the values actually are. 45 46 00:04:34,320 --> 00:04:47,290 So I'm going to say "print('Local min occurs at ', local_min)" and then I'm going to print 46 47 00:04:48,340 --> 00:05:00,310 "The cost at the minimum is", and then I'm going to say h(local_min), right. 47 48 00:05:01,230 --> 00:05:05,970 So local_min remember is the last value calculated by our gradient descent. 48 49 00:05:06,240 --> 00:05:13,350 We're going to work out what the y value is on our chart here at this particular point. 49 50 00:05:13,690 --> 00:05:23,940 And finally, let's print out the number of steps that our algorithm has taken including the initial guess. 50 51 00:05:24,040 --> 00:05:33,940 So that's gonna be the length - len of our variable called list_x. When I run this 51 52 00:05:33,940 --> 00:05:37,650 now, I can see the output here at the bottom 52 53 00:05:37,650 --> 00:05:45,130 below our charts. The local minimum occurs at around 1.6, the cost of this minimum is about 53 54 00:05:45,310 --> 00:05:51,030 -0.62 and the number of steps is 117. 54 55 00:05:51,070 --> 00:06:00,070 So our gradient descent algorithm has run over 100 times to get to this point. Okay, 55 56 00:06:00,080 --> 00:06:01,890 so we're converging to this minimum here 56 57 00:06:01,910 --> 00:06:04,010 and so far nothing new. 57 58 00:06:04,010 --> 00:06:11,990 We've seen all of this before. But let's see what happens if instead of at 0.2 we start at 58 59 00:06:12,080 --> 00:06:13,460 -0.2. 59 60 00:06:13,460 --> 00:06:18,260 Let's see what happens if we start a little bit on the other side of this chart. 
60 61 00:06:18,290 --> 00:06:27,020 So I'm going to scroll back up and here where my initial guess is 0.2 I want to change this to 61 62 00:06:27,230 --> 00:06:29,930 -0.2. 62 63 00:06:29,930 --> 00:06:37,110 And I'm going to rerun the cell and see what happens. Scrolling down we see an error. 63 64 00:06:37,250 --> 00:06:43,470 In fact we see an overflow error, where it says the result is too large. 64 65 00:06:43,490 --> 00:06:45,160 What does that mean? 65 66 00:06:45,320 --> 00:06:52,220 I'm going to change the initial guess back to 0.2, hit Shift+Enter, reload the graph so we can 66 67 00:06:52,220 --> 00:06:57,640 think about this. If our initial guess was -0.2 67 68 00:06:57,640 --> 00:07:05,200 then we would be on the left hand side of this hump and this means that the algorithm would be starting 68 69 00:07:05,200 --> 00:07:11,380 to move down and down and down and down this line here. We'd continue going all the way down to 69 70 00:07:11,380 --> 00:07:14,190 like negative infinity. 70 71 00:07:14,200 --> 00:07:15,490 Now I know what you're thinking. 71 72 00:07:15,520 --> 00:07:17,110 Negative infinity. 72 73 00:07:17,560 --> 00:07:25,170 That makes h(x) a very unrealistic cost function but it does illustrate several things. 73 74 00:07:25,210 --> 00:07:31,870 First, we can see how our gradient descent algorithm behaves when the algorithm diverges. 74 75 00:07:31,870 --> 00:07:37,600 And second, it shows us how our Python program behaves when this happens. 75 76 00:07:37,600 --> 00:07:43,510 So conceptually we've explained the problem, but I do think that seeing this overflow error gives us 76 77 00:07:43,630 --> 00:07:48,920 an opportunity for understanding something a little deeper about our Python code. 77 78 00:07:49,060 --> 00:07:55,660 So let's run our code in slow motion if you will and examine what's actually happening. And to do that, 78 79 00:07:55,720 --> 00:08:02,710 I'm going to modify our gradient descent function.
So I'm going to scroll back up where we've defined our gradient 79 80 00:08:02,740 --> 00:08:12,560 descent and in our function header, I'm going to tack on another parameter. This parameter is gonna be 80 81 00:08:12,560 --> 00:08:23,070 called max_iter for max iterations and give it a default value of say 300 and instead 81 82 00:08:23,070 --> 00:08:31,110 of this hardcoded 500 value here in our range, I'm going to substitute our argument I'm going to substitute 82 83 00:08:31,110 --> 00:08:40,710 in max_iter from max iterations. So this way we can specify the maximum number of times our loop 83 84 00:08:40,830 --> 00:08:45,000 will run when we are calling our function. 84 85 00:08:45,000 --> 00:08:48,460 So I'm going to press Shift+Enter to update the Python code now. 85 86 00:08:48,570 --> 00:08:54,750 So I'm going to update the cell and then down here where we're generating this graph with our third example, 86 87 00:08:55,440 --> 00:08:59,970 I'm going to change our function call as follows - 87 88 00:09:01,130 --> 00:09:11,270 for our initial guess I am going to use -0.2, but then I'm going to give a max iteration 88 89 00:09:11,270 --> 00:09:19,220 value of 10 and see where this leaves us. So I'm going to hit Shift+Enter, 89 90 00:09:19,710 --> 00:09:22,390 take a look at the graphs. OK, 90 91 00:09:22,400 --> 00:09:31,030 so in 10 iterations we're still pretty much on this hump. If instead the max iterations of our loop are 91 92 00:09:31,030 --> 00:09:35,560 set to 40, then we can see we start moving a little further down. 92 93 00:09:38,590 --> 00:09:46,800 And if I change it to say 60 I can see we're moving down even more. 93 94 00:09:46,920 --> 00:09:52,560 Now, let me update this to 70 and rerun this function. When we examine the chart, 94 95 00:09:52,560 --> 00:09:55,290 now, what do we notice? 
95 96 00:09:55,290 --> 00:10:03,540 Well, yes, it's moving down to the left but very interesting here is that the step size gets bigger and 96 97 00:10:03,540 --> 00:10:09,300 bigger with each step as this slope starts getting steeper and steeper and steeper. 97 98 00:10:09,680 --> 00:10:13,170 Our steps start getting larger and larger and larger. 98 99 00:10:14,220 --> 00:10:18,980 So what's the last x value that the algorithm calculates? 99 100 00:10:19,230 --> 00:10:26,250 We're printing out the last x value with this print statement. So we can see below our graphs are the 100 101 00:10:26,310 --> 00:10:29,360 printouts from our three print statements. 101 102 00:10:29,580 --> 00:10:36,330 We can see that the last x value that's printed out is negative 2 million. 102 103 00:10:36,470 --> 00:10:38,650 That's the first print statement here. 103 104 00:10:39,050 --> 00:10:45,290 And this is definitely not a local minimum in this case, but when we feed our negative 2 million back 104 105 00:10:45,410 --> 00:10:49,620 into our function then we can see that the cost, 105 106 00:10:49,620 --> 00:10:49,890 yeah, 106 107 00:10:49,940 --> 00:10:53,720 what's on the y axis at this point 107 108 00:10:53,870 --> 00:10:59,940 is equal to -3.8*10^31. 108 109 00:10:59,960 --> 00:11:06,290 So in our print statement here, Python is giving us this number in scientific notation but it's actually 109 110 00:11:06,290 --> 00:11:07,400 an enormous number. 110 111 00:11:07,550 --> 00:11:13,880 It's around 3 8 0 0 0 0 0 0 0 0 0, 111 112 00:11:13,900 --> 00:11:15,790 it actually continues going, right. 112 113 00:11:15,830 --> 00:11:19,460 I'd have to copy this, paste it three times. 113 114 00:11:19,460 --> 00:11:21,960 This is how large this number is 114 115 00:11:22,010 --> 00:11:29,260 that's being printed here in scientific notation - It's 3.8 with a lot of zeros after it. 
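To make the max_iter change and the growing step size concrete, here's a sketch of what the gradient descent function might look like. The function body was written in an earlier lesson and isn't shown here, so everything inside it is an assumption: the parameter names multiplier and precision, their defaults of 0.02 and 0.001, and the internal variable names are all my reconstruction. Only max_iter with a default of 300 and the three returned values come from this lesson.

```python
def h(x):
    return x**5 - 2*x**4 + 2


def dh(x):
    return 5*x**4 - 8*x**3


def gradient_descent(derivative_func, initial_guess, multiplier=0.02,
                     precision=0.001, max_iter=300):
    # Assumed reconstruction of the earlier lesson's function;
    # multiplier and precision are guesses, max_iter is from this lesson.
    new_x = initial_guess
    x_list = [new_x]
    slope_list = [derivative_func(new_x)]

    for n in range(max_iter):          # max_iter caps the number of loop runs
        previous_x = new_x
        gradient = derivative_func(previous_x)
        new_x = previous_x - multiplier * gradient
        step_size = abs(new_x - previous_x)

        x_list.append(new_x)
        slope_list.append(derivative_func(new_x))

        if step_size < precision:      # stop once the steps become tiny
            break

    return new_x, x_list, slope_list


# Starting on the right of the hump: the algorithm converges
local_min, list_x, deriv_list = gradient_descent(dh, 0.2)
print('Local min occurs at:', local_min)
print('Cost at this minimum is:', h(local_min))
print('Number of steps:', len(list_x))

# Starting on the left of the hump: cap the loop and watch x run away
last_x_by_cap = {}
for cap in (10, 40, 60):
    last_x, xs, slopes = gradient_descent(dh, -0.2, max_iter=cap)
    last_x_by_cap[cap] = last_x
    print('max_iter =', cap, 'last x =', last_x)

# Step sizes from the 60-iteration run: each one is bigger than the last
steps = [abs(b - a) for a, b in zip(xs, xs[1:])]
print('First step:', steps[0], 'Last step:', steps[-1])
```

Capping the loop like this is what lets us look at the divergence before Python gives up with the overflow error.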
115 116 00:11:29,370 --> 00:11:33,290 Yeah I mean you're going to be looking at this number and you're like "Well it's a computer, right? 116 117 00:11:33,290 --> 00:11:40,490 So what, we sent a man to the moon over like 40 years ago - surely my computer can handle calculating large 117 118 00:11:40,490 --> 00:11:41,240 numbers, right? 118 119 00:11:41,240 --> 00:11:42,810 What's the big deal?" 119 120 00:11:43,010 --> 00:11:44,320 And you're not wrong. 120 121 00:11:44,330 --> 00:11:49,240 We can and we should be able to do math with very large numbers. 121 122 00:11:49,340 --> 00:11:54,800 But the thing is your computer and Python don't do this straight out of the box. If you want to work 122 123 00:11:54,800 --> 00:11:56,850 with numbers of this sort of magnitude, 123 124 00:11:56,870 --> 00:12:01,310 if you're, I don't know, calculating the number of atoms in the universe or what have you, 124 125 00:12:01,310 --> 00:12:04,260 then you have to employ a couple of tricks. 125 126 00:12:04,280 --> 00:12:12,650 The thing is, you can actually see what the maximum is that you can reach on your particular machine 126 127 00:12:12,680 --> 00:12:15,180 at home right now in Python 127 128 00:12:15,200 --> 00:12:21,920 straight out of the box and that's without importing any extra libraries and using Python as 128 129 00:12:21,920 --> 00:12:24,450 it is that you've got installed right now. 129 130 00:12:24,590 --> 00:12:30,050 So if you're curious and you wanted to pull up this sort of system-specific information you can actually 130 131 00:12:30,050 --> 00:12:33,450 do so with a module called "sys". 131 132 00:12:33,470 --> 00:12:35,220 So "import sys". 132 133 00:12:35,260 --> 00:12:41,820 This is the module where the system-specific information resides and there you can pull up a number 133 134 00:12:41,820 --> 00:12:45,050 of different things.
To see the kind of things that I'm talking about, 134 135 00:12:45,060 --> 00:12:52,110 you can write something like "help", and then put sys in there and then you'll get some documentation on 135 136 00:12:52,440 --> 00:12:58,290 the system module. So you can read this. 136 137 00:12:58,430 --> 00:13:05,660 This is by the way very, very similar to what you can pull up by pressing Shift and then Tab and then 137 138 00:13:05,660 --> 00:13:07,700 hitting that little plus sign. 138 139 00:13:07,700 --> 00:13:12,820 You'll see that this also pulls up the same documentation. 139 140 00:13:12,920 --> 00:13:15,680 But let me show you two things that might be quite useful. 140 141 00:13:15,680 --> 00:13:18,110 I'm going to comment out the help here. 141 142 00:13:18,170 --> 00:13:19,820 Don't need this. 142 143 00:13:19,820 --> 00:13:25,340 So for example one thing that you might be interested in looking up is what version of Anaconda you're 143 144 00:13:25,340 --> 00:13:27,530 using or what version of Python. 144 145 00:13:27,710 --> 00:13:31,750 And you can pull this up by writing "sys.version". 145 146 00:13:31,790 --> 00:13:35,390 So version is an attribute of this system module. 146 147 00:13:35,390 --> 00:13:43,430 So right now you can see that I'm using Python 3 and I've got a 64 bit system and you can also see that 147 148 00:13:43,430 --> 00:13:47,180 I'm running this on a Mac. 148 149 00:13:47,240 --> 00:13:52,220 Let me comment this out again and let's look at something else. 149 150 00:13:52,280 --> 00:14:00,870 Let's pull up what the largest floating point number is that I can calculate in my Python program now. 150 151 00:14:00,910 --> 00:14:05,870 Now you might ask: "Why am I interested in floating point numbers? Why do I say floating point numbers?"
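As a quick sketch of the system lookups mentioned above: sys.version is the attribute from the lesson, while the sys.maxsize comparison and the platform module are additions of mine, shown as one possible way to confirm the 64 bit and operating system details without reading them off the version string.

```python
import platform
import sys

# help(sys)  # uncomment to print the module documentation

# Version string of the running interpreter (from the lesson)
print(sys.version)

# Addition, not from the lesson: sys.maxsize is larger than 2**32
# on a 64 bit build of Python
is_64_bit = sys.maxsize > 2**32
print('64 bit Python:', is_64_bit)

# Addition, not from the lesson: name of the operating system,
# for example 'Darwin' on a Mac
print(platform.system())
```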
151 152 00:14:05,870 --> 00:14:12,230 Well, if you check the type of the thing that we're looking up, right, the type of the thing that we're calculating 152 153 00:14:12,800 --> 00:14:13,940 h of, 153 154 00:14:14,000 --> 00:14:15,040 take a look, 154 155 00:14:15,050 --> 00:14:22,590 in this case it's h(local_min). 155 156 00:14:22,630 --> 00:14:24,250 So this is the thing that gave us the problem. 156 157 00:14:25,060 --> 00:14:33,070 So this is a float and this is what we're looking up. The largest float that we can use is "sys.float_info.max" 157 158 00:14:33,550 --> 00:14:39,600 and here's our answer. 158 159 00:14:39,600 --> 00:14:40,330 Right. 159 160 00:14:40,500 --> 00:14:43,940 And this number that you see printed out here is specific to my machine. 160 161 00:14:44,220 --> 00:14:50,550 If you're using a different type of machine with a different architecture, say 32 bit, then you may see 161 162 00:14:50,700 --> 00:14:57,630 something else printed below the cell right now, but this is my maximum floating point number that I 162 163 00:14:57,630 --> 00:14:59,550 can use on my architecture. 163 164 00:14:59,560 --> 00:15:04,990 Yeah, it's 1.79 times 10^308. 164 165 00:15:05,030 --> 00:15:05,340 Yeah. 165 166 00:15:05,370 --> 00:15:15,570 This is huge, and it's well beyond the 10^31 that we had just a moment ago. 166 167 00:15:15,620 --> 00:15:16,380 Right. 167 168 00:15:16,400 --> 00:15:19,860 It's many, many orders of magnitude larger. 168 169 00:15:20,240 --> 00:15:22,400 So you might ask why are we running into this problem? 169 170 00:15:22,910 --> 00:15:29,120 Well, looking at our chart we can see that our step size increases dramatically with each step. 170 171 00:15:29,120 --> 00:15:36,740 And if I go up here and I change this from max iterations, the number of times I run my loop, from 70 171 172 00:15:36,830 --> 00:15:45,470 to say 71 and I look at my cost, then I get -2.1* 172 173 00:15:45,470 --> 00:15:47,910 10^121.
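Here's a small sketch of the float limit discussed above and of how Python reacts when a calculation goes past it. The two cost values are the ones quoted in the lesson; the x value of -1e96 is a hypothetical number I've picked purely to trigger the error, it is not a value read off the notebook.

```python
import sys

# Largest float Python can represent on this machine
largest = sys.float_info.max
print(largest)

# Costs quoted in the lesson after 70 and 71 iterations
cost_after_70 = -3.8e31
cost_after_71 = -2.1e121
print(abs(cost_after_70) < largest)   # still comfortably inside the limit
print(abs(cost_after_71) < largest)   # also inside, but climbing fast

# Hypothetical x value chosen for illustration: raising it to the 4th
# power (as dh does) would need roughly 1e384, which a float cannot hold
x = -1e96
try:
    result = x**4
    overflowed = False
except OverflowError as err:
    overflowed = True
    print('OverflowError:', err)

# Multiplication behaves differently: it quietly returns infinity
print(largest * 10)
```

Note that it's the power operator that raises the exception here, which is why the crash shows up inside our function rather than as an inf on the chart.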
173 174 00:15:47,910 --> 00:15:54,710 So this is the crux of the problem - I'm going to blow through my limit at the very next iteration, on 174 175 00:15:54,710 --> 00:15:56,990 iteration 73. 175 176 00:15:57,020 --> 00:15:59,740 This is when I get the overflow error. 176 177 00:15:59,840 --> 00:16:05,080 Now on your machine at home, if the number that you're seeing printed here, the number that's spat out by 177 178 00:16:05,230 --> 00:16:11,660 sys.float_info.max is smaller than this then you might actually get that 178 179 00:16:11,660 --> 00:16:14,370 overflow error much sooner than I do, right. 179 180 00:16:14,390 --> 00:16:16,660 You might not get it at iteration 73, 180 181 00:16:16,670 --> 00:16:20,520 you might actually be getting that error far earlier, that overflow error. 181 182 00:16:20,570 --> 00:16:26,540 Now one thing that you might like to know about Python lingo is that errors like this are also referred 182 183 00:16:26,540 --> 00:16:34,830 to as exceptions, but no matter if you call it an exception or an error, we still crash and burn. 183 184 00:16:35,230 --> 00:16:43,820 So yeah, I hope you enjoyed that little detour into the low-level representation of numbers 184 185 00:16:43,940 --> 00:16:50,810 inside your Python computer program. But I think that while we're on the topic of Python programming, 185 186 00:16:51,260 --> 00:17:00,570 we should revisit a piece of code that we've written in a previous lesson, namely the code up here, the 186 187 00:17:00,570 --> 00:17:06,870 code for our gradient descent algorithm, because I have to confess something - I've been a little cheeky 187 188 00:17:06,960 --> 00:17:14,130 in having our Python gradient descent function return multiple values without actually explaining how 188 189 00:17:14,130 --> 00:17:15,600 this works.
189 190 00:17:15,810 --> 00:17:19,680 And this is a good point to cover the Python code 190 191 00:17:19,860 --> 00:17:27,900 before we go back to actually analyzing our algorithm. So let's add a new section heading at the bottom. 191 192 00:17:30,490 --> 00:17:33,690 I'm going to click this little plus sign here to insert some cells below. 192 193 00:17:34,620 --> 00:17:40,730 And this cell here I'm going to convert from Code to Markdown and the section heading I'm going to give 193 194 00:17:40,740 --> 00:17:44,760 this is "Python tuples". 194 195 00:17:47,880 --> 00:17:57,950 So, what's a tuple? A tuple is a data structure that's very, very similar to a list - a tuple is just a sequence 195 196 00:17:57,950 --> 00:18:01,160 of values that are separated by a comma. 196 197 00:18:01,160 --> 00:18:04,800 And this is what we've used in our gradient descent function. 197 198 00:18:04,820 --> 00:18:07,280 Let me show you how you can create a tuple. I'm going to click 198 199 00:18:07,280 --> 00:18:12,580 Plus again here and let's do this. Let's, 199 200 00:18:12,860 --> 00:18:15,330 let's insert a quick comment here 200 201 00:18:15,680 --> 00:18:24,700 "Creating a tuple". My first tuple is gonna be called "breakfast" and it's going to contain three values 201 202 00:18:25,360 --> 00:18:33,720 bacon, eggs and avocado. 202 203 00:18:34,000 --> 00:18:37,190 This, by the way, is a fantastic way to start your day. 203 204 00:18:37,240 --> 00:18:41,070 It also illustrates the general format for tuples. 204 205 00:18:41,200 --> 00:18:49,090 You have a sequence of values that are separated by a comma. I'm going to create another tuple here call it 205 206 00:18:49,520 --> 00:18:50,300 unlucky_ 206 207 00:18:50,390 --> 00:19:03,720 numbers. I'm going to give it 13, 4 for China, 9 for Japan, 26 for India and 17 for Italy. 207 208 00:19:03,810 --> 00:19:06,710 So it's the same pattern as above. 
208 209 00:19:06,710 --> 00:19:09,310 And this way of creating tuples actually has a name. 209 210 00:19:09,340 --> 00:19:17,810 This is called tuple packing, because we're packing multiple values into a single tuple. 210 211 00:19:18,040 --> 00:19:21,280 So now that we've got our tuples, how do we access them? 211 212 00:19:21,280 --> 00:19:31,660 Well, I'm going to add some print statements here like "I love", comma breakfast 212 213 00:19:32,040 --> 00:19:34,970 [0]. 213 214 00:19:35,100 --> 00:19:42,270 I'm going to hit Shift+Enter. My lack of spelling ability has foiled me once again, I'm going to take out the superfluous 214 215 00:19:42,390 --> 00:19:49,590 e here and then hit Shift +Enter again and then we can see that the syntax here with the square brackets 215 216 00:19:50,040 --> 00:19:55,200 for working with tuples is actually very, very similar to working with a list. 216 217 00:19:55,560 --> 00:19:58,460 So you've got a tuple that has a name, 217 218 00:19:58,650 --> 00:20:05,270 in this case breakfast and you're accessing the values inside the tuple through the index. 218 219 00:20:05,310 --> 00:20:12,840 So zero is the first item in the tuple. And to show you a second example, 219 220 00:20:12,840 --> 00:20:15,870 I'm going to print out the string 220 221 00:20:15,870 --> 00:20:24,830 "My hotel has no ", and then I'm gonna have two plus signs, another string at the end "th floor". 221 222 00:20:25,080 --> 00:20:35,910 Now in between here, I could put unlucky_numbers, and then square brackets and say provide 222 223 00:20:36,330 --> 00:20:38,400 the index 1, 223 224 00:20:43,710 --> 00:20:50,450 and if I try to run this right now, I'll get an error because the string concatenation with the pluses 224 225 00:20:50,810 --> 00:21:00,020 does not convert the integers here to strings, so I have to actually wrap this in a function called str, 225 226 00:21:01,970 --> 00:21:10,420 and only now can I press Shift+Enter and run this. 
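Putting the tuple steps above together, a minimal sketch might look like this. The variable name message is mine, added so the concatenated string can be reused; the exact wording of the TypeError depends on your Python version, so the sketch just prints whatever comes back.

```python
# Creating tuples (tuple packing): values separated by commas
breakfast = 'bacon', 'eggs', 'avocado'
unlucky_numbers = 13, 4, 9, 26, 17

# Accessing a value by index, same square bracket syntax as a list
print('I love', breakfast[0])

# Concatenating with + needs an explicit str() around the int
message = 'My hotel has no ' + str(unlucky_numbers[1]) + 'th floor'
print(message)

# Without str() the plus operator refuses to mix str and int
try:
    broken = 'My hotel has no ' + unlucky_numbers[1] + 'th floor'
except TypeError as err:
    print('TypeError:', err)
```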
If I try to do this without wrapping it then we'll 226 227 00:21:10,420 --> 00:21:15,260 get an error like this - must be string not int. 227 228 00:21:15,670 --> 00:21:23,350 And that's because my tuple here contains ints and those are not converted to strings by the plus operator. 228 229 00:21:23,500 --> 00:21:27,910 So I'm going to wrap this in a string function and press Enter. 229 230 00:21:28,270 --> 00:21:36,630 So we've covered how to access a value in a tuple. Brilliant! And now we can try something else, 230 231 00:21:36,640 --> 00:21:42,040 because I'm sure you're looking at this and you're saying "Well how are tuples different from lists? 231 232 00:21:42,040 --> 00:21:43,750 How are tuples used? 232 233 00:21:43,750 --> 00:21:49,030 Why do we have something that's so similar and yet different?" 233 234 00:21:49,030 --> 00:21:59,170 Well, in contrast to lists, tuples are often used when the data they contain is heterogeneous. 234 235 00:21:59,170 --> 00:22:05,030 Now, what do I mean by that? Tuples often contain a mix of data in contrast to lists. 235 236 00:22:05,200 --> 00:22:13,630 So lists often contain the same kind of data, like all strings, all integers, but take a tuple like, say, "not_my_address" 236 237 00:22:13,630 --> 00:22:25,350 equals 1, comma and then the string "Infinite Loop", and then a comma and then 237 238 00:22:25,350 --> 00:22:28,310 another string "Cupertino", 238 239 00:22:28,310 --> 00:22:33,440 and then comma, "95014" for our postcode. 239 240 00:22:34,440 --> 00:22:41,900 And we've just created a tuple with a mix of data, a mix of different data types if you will and this 240 241 00:22:41,900 --> 00:22:46,010 is something that you don't usually see in practice with lists. 241 242 00:22:46,010 --> 00:22:53,700 Lists are usually homogeneous, meaning people don't tend to mix and match the different types of data. 242 243 00:22:53,890 --> 00:23:01,590 Now, another difference with lists is that tuples are immutable.
243 244 00:23:01,630 --> 00:23:02,960 What does that mean? 244 245 00:23:02,980 --> 00:23:09,110 It means that we can't change the tuple after we've made it. 245 246 00:23:09,130 --> 00:23:20,590 So for example, if I had, say breakfast, and I wanted to change bacon which is at index 0, and set that equal 246 247 00:23:20,590 --> 00:23:31,300 to a, say, sausage and just you know innocently swap out the value then Python will actually yell at us, 247 248 00:23:31,470 --> 00:23:38,640 it's gonna give us a type error it's gonna say the "tuple object does not support item assignment" and 248 249 00:23:38,640 --> 00:23:46,800 this basically means that once we've created a tuple like this, we cannot change the values here and 249 250 00:23:46,800 --> 00:23:52,300 we also can't append a new value say we can't stick this at say index 3 right. 250 251 00:23:52,470 --> 00:23:59,600 We get the same error in other words the immutability of tuples means that once you've created a tuple 251 252 00:23:59,930 --> 00:24:01,280 you can't mess around with it. 252 253 00:24:01,310 --> 00:24:03,570 You can't change it up. 253 254 00:24:03,660 --> 00:24:05,410 This is quite different from a list right. 254 255 00:24:05,430 --> 00:24:11,790 Because if you remember in our gradient descent function we were running our loop and we were appending 255 256 00:24:12,090 --> 00:24:19,200 items to our lists. Every time the loop ran, our list grew in length because we're appending new items 256 257 00:24:20,160 --> 00:24:27,680 and this is something we couldn't do with tuples. Now one more thing I want to show you on the topic 257 258 00:24:27,680 --> 00:24:29,910 of tuples is a little gotcha. 258 259 00:24:29,960 --> 00:24:33,930 Say we want to create a tuple with a single value. 259 260 00:24:34,090 --> 00:24:37,000 So just one value inside our tuple. 
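To see the mixed data and the immutability side by side, here's a short sketch. Writing the postcode as an int is a guess on my part, and the breakfast_list variable is something I've added to show the contrast with a list.

```python
# A tuple holding a mix of data types
# (postcode written as an int here, which is an assumption)
not_my_address = 1, 'Infinite Loop', 'Cupertino', 95014
print([type(item).__name__ for item in not_my_address])

breakfast = 'bacon', 'eggs', 'avocado'

# Tuples are immutable: swapping out a value raises a TypeError
try:
    breakfast[0] = 'sausage'
    error_message = ''
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

# A list built from the same values can be changed and extended
breakfast_list = list(breakfast)
breakfast_list[0] = 'sausage'
breakfast_list.append('toast')
print(breakfast_list)
print(breakfast)   # the original tuple is untouched
```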
260 261 00:24:37,100 --> 00:24:44,240 I know it's strange but, for the sake of argument, have a think about how you would create a tuple with just 261 262 00:24:44,300 --> 00:24:45,260 one value. 262 263 00:24:45,260 --> 00:24:49,520 What would the Python syntax look like to store a single value inside this tuple? 263 264 00:24:53,380 --> 00:24:54,510 Here's the solution. 264 265 00:24:54,530 --> 00:25:02,070 So if I put a single value in here, say 42, then I would have to put a trailing comma after it. 265 266 00:25:02,080 --> 00:25:09,100 Now I've got a tuple with a single value, so if I print it out, print(tuple_with_single_value), 266 267 00:25:12,080 --> 00:25:13,360 then I can see it looks like this. 267 268 00:25:13,370 --> 00:25:16,370 It's got a single value and a comma. 268 269 00:25:16,550 --> 00:25:23,720 And if I substitute the print for type and check, then I can see that indeed tuple with single value 269 270 00:25:23,930 --> 00:25:26,540 is indeed a tuple. 270 271 00:25:26,540 --> 00:25:34,890 Now the very first time I saw this I found this syntax like super weird and confusing - trailing comma 271 272 00:25:34,900 --> 00:25:35,790 right? 272 273 00:25:35,840 --> 00:25:37,350 My goodness. 273 274 00:25:37,520 --> 00:25:38,970 So here it is. Now, 274 275 00:25:39,590 --> 00:25:44,820 you too have shared this experience and the weird syntax. 275 276 00:25:44,910 --> 00:25:45,970 You're welcome. 276 277 00:25:45,990 --> 00:25:48,190 Now it's time to come full circle. 277 278 00:25:48,210 --> 00:25:55,000 We've packed a bunch of values into a tuple, but we can also do the very opposite. 278 279 00:25:55,080 --> 00:25:59,280 So we can unpack these values as well. 
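The trailing comma gotcha is easy to check for yourself. In this sketch tuple_with_single_value is the name from the lesson, while not_a_tuple is a name I've added to show what happens when the comma is left out.

```python
# The trailing comma is what makes this a tuple
tuple_with_single_value = 42,
print(tuple_with_single_value)
print(type(tuple_with_single_value))

# Parentheses on their own are just grouping, so this is a plain int
not_a_tuple = (42)
print(type(not_a_tuple))
```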
279 280 00:25:59,280 --> 00:26:06,750 So if I take my tuple - breakfast, and I want to grab the values that are stored inside this tuple and 280 281 00:26:06,750 --> 00:26:17,190 put them into some separate variables, I can do that by writing, say, "main, side, greens" is equal 281 282 00:26:17,190 --> 00:26:18,770 to breakfast. 282 283 00:26:19,050 --> 00:26:21,760 And this is called sequence unpacking. 283 284 00:26:21,780 --> 00:26:22,020 Yeah. 284 285 00:26:22,050 --> 00:26:33,100 So if I print out "Main course is ", and then comma, "main", then I get "Main course is bacon". 285 286 00:26:33,100 --> 00:26:36,620 So here's the reason I mentioned this and why I say we've come full circle. 286 287 00:26:36,700 --> 00:26:43,960 If I scroll back up to where we had our gradient descent function, we can see here in our return statement 287 288 00:26:44,260 --> 00:26:51,370 what we're doing is we're returning three separate values, but in fact we're packing all these values 288 289 00:26:51,640 --> 00:26:54,040 into a single tuple. 289 290 00:26:54,530 --> 00:26:55,470 Yeah. 290 291 00:26:55,780 --> 00:27:04,800 And when we're calling our gradient descent function, say here, then we are unpacking this sequence and 291 292 00:27:04,800 --> 00:27:10,830 storing the results in three separate variables - local_min, 292 293 00:27:10,830 --> 00:27:13,590 list_x, deriv_list. 293 294 00:27:14,250 --> 00:27:20,820 So we've actually been using tuples, but we've never had to access any of the values from the tuple by 294 295 00:27:20,880 --> 00:27:22,050 index. 295 296 00:27:22,050 --> 00:27:23,530 But we can do that. 296 297 00:27:23,580 --> 00:27:25,840 Let me show you how it would work. 
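Here's a compact sketch of sequence unpacking, first with the breakfast tuple and then with a function that returns several values. The function smallest_largest_count is hypothetical - a tiny stand-in for our gradient descent function so the packing in the return statement is easy to see.

```python
breakfast = 'bacon', 'eggs', 'avocado'

# Sequence unpacking: one variable per value in the tuple
main, side, greens = breakfast
print('Main course is', main)


def smallest_largest_count(values):
    # Hypothetical stand-in: the three comma-separated values in the
    # return statement get packed into a single tuple
    return min(values), max(values), len(values)


# Calling the function and unpacking the returned tuple in one go
smallest, largest, count = smallest_largest_count([13, 4, 9, 26, 17])
print(smallest, largest, count)

# Without unpacking, the result is visibly a tuple
print(type(smallest_largest_count([13, 4, 9, 26, 17])))
```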
297 298 00:27:25,900 --> 00:27:36,240 So if I created a variable called data_tuple and set that equal to gradient_descent 298 299 00:27:36,750 --> 00:27:49,530 and for my derivative function I supply dh and for my initial guess I supply 299 300 00:27:49,530 --> 00:28:04,220 0.2, then I can print out the local_min at data_tuple[0] because that's 300 301 00:28:04,220 --> 00:28:15,050 the very, very first thing that is stored inside our tuple. I can print out the cost at the last x value 301 302 00:28:15,800 --> 00:28:17,230 which would be equal to, 302 303 00:28:17,720 --> 00:28:27,520 well in this case it would be h(data_tuple[0]) and then I could also 303 304 00:28:27,520 --> 00:28:39,070 print out the number of steps, which in this case would be the length of, so it would be "data_ 304 305 00:28:39,580 --> 00:28:47,620 tuple[1]" and we run this. Then you can see that it works exactly the same way 305 306 00:28:47,950 --> 00:28:53,910 as before, but instead of unpacking the sequence we're using the tuple now explicitly. 306 307 00:28:54,820 --> 00:29:01,480 Okay, so we've paused a little bit on analysing our algorithm and we've talked a little more about Python 307 308 00:29:01,480 --> 00:29:08,020 and Python programming, so now it's time to change tracks and go back to our gradient descent algorithm. 308 309 00:29:09,430 --> 00:29:17,790 Now a reasonable question to ask is why did I show you this h(x) example function? 
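The index-based version described above could be sketched like this. As before, the body of gradient_descent is my assumed reconstruction of the earlier lesson's function (the multiplier and precision defaults in particular are guesses); the data_tuple lines are the part this lesson walks through.

```python
def h(x):
    return x**5 - 2*x**4 + 2


def dh(x):
    return 5*x**4 - 8*x**3


def gradient_descent(derivative_func, initial_guess, multiplier=0.02,
                     precision=0.001, max_iter=300):
    # Assumed reconstruction; only max_iter and the three returned
    # values are confirmed by this lesson
    new_x = initial_guess
    x_list = [new_x]
    slope_list = [derivative_func(new_x)]
    for n in range(max_iter):
        previous_x = new_x
        new_x = previous_x - multiplier * derivative_func(previous_x)
        x_list.append(new_x)
        slope_list.append(derivative_func(new_x))
        if abs(new_x - previous_x) < precision:
            break
    return new_x, x_list, slope_list


# Keep the returned tuple in one variable instead of unpacking it
data_tuple = gradient_descent(dh, 0.2)

print('Local min occurs at:', data_tuple[0])              # last x value
print('Cost at the last x value is:', h(data_tuple[0]))
print('Number of steps:', len(data_tuple[1]))             # length of the x list
```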
I mean this function, 309 310 00:29:17,820 --> 00:29:25,110 by my own admission, seems a little bit contrived and I already confessed that a non-convex cost function 310 311 00:29:25,470 --> 00:29:34,830 is not very realistic, but truth be told, we can get this divergence and the very, very same overflow 311 312 00:29:34,890 --> 00:29:43,260 error in another way too and we can see this divergence and see the same error even if we are working 312 313 00:29:43,260 --> 00:29:50,040 with a very nice clean cost function where we know for a fact that we should be able to reach a minimum. 313 314 00:29:51,030 --> 00:29:56,490 And this is what we're gonna be examining in the next lesson. In the next lesson we're gonna be looking 314 315 00:29:56,490 --> 00:30:02,600 at the elephant in the room - the gradient descent learning rate. I'll see you there. 315 316 00:30:02,850 --> 00:30:03,480 Take care.