0
1
00:00:00,440 --> 00:00:00,850
All right.
1

2
00:00:00,870 --> 00:00:03,210
So welcome back.
2

3
00:00:03,210 --> 00:00:09,870
In this lesson, we're going to start taking a look at another quirk of the gradient descent algorithm.
3

4
00:00:09,870 --> 00:00:16,110
We're going to be taking a look at an example where our gradient descent algorithm actually diverges
4

5
00:00:16,410 --> 00:00:19,380
and spirals more and more out of control.
5

6
00:00:19,380 --> 00:00:26,880
And also we're gonna be working through some more hardcore Python programming concepts. So I'm going to click
6

7
00:00:26,880 --> 00:00:34,660
in my last cell and then I'm going to go change the cell from Code to Markdown here. I'm going to add a hashtag
7

8
00:00:35,480 --> 00:00:36,160
and say
8

9
00:00:36,200 --> 00:00:37,130
"Example 3 -
9

10
00:00:40,340 --> 00:00:52,400
Divergence, Overflow and Python Tuples". The function that we're gonna be looking at in this example is
10

11
00:00:52,400 --> 00:01:04,190
gonna be this one - it's gonna be h(x) = x^5 - 2x^4 +2
11

12
00:01:04,670 --> 00:01:12,680
Forgot my closing tags in the end.
12

13
00:01:12,710 --> 00:01:13,970
There we go.
13

14
00:01:14,000 --> 00:01:19,470
Maybe add another cell below actually, one, two, three, so that we're not at the bottom of the screen.
14

15
00:01:19,910 --> 00:01:24,650
And then I'm going to generate some data, as always that's gonna be my first thing.
15

16
00:01:24,730 --> 00:01:31,880
So I'm going to make data and since this is example 3, I'm going to use x_3 as my variable
16

17
00:01:31,880 --> 00:01:48,800
and I'm going to use np.linspace starting at -2.5 going to 2.5 and with the num argument set to
17

18
00:01:49,220 --> 00:01:50,340
1000.
18

19
00:01:50,870 --> 00:01:51,420
Okay.
19

20
00:01:51,440 --> 00:01:55,600
So now it's time to write that equation above in Python code.
20

21
00:01:55,690 --> 00:02:14,470
It's gonna be "def h(x):", "return x**5 - 2*x**4 + 2"
21

22
00:02:14,600 --> 00:02:23,950
And then our derivative of this function is gonna be "def dh(x):", "return"
22

23
00:02:24,340 --> 00:02:28,340
and then you've probably already worked this out by applying the power rule.
23

24
00:02:28,400 --> 00:02:40,790
It's gonna be 5*x**4 - 8*x**3 
24
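Written out as one cell, the data and the two functions described so far might look like the sketch below. The names x_3, h and dh come from the lesson; the docstrings and comments are added for illustration.

```python
import numpy as np

# Data for example 3: 1000 evenly spaced points between -2.5 and 2.5
x_3 = np.linspace(start=-2.5, stop=2.5, num=1000)


def h(x):
    """Example function h(x) = x^5 - 2x^4 + 2."""
    return x**5 - 2*x**4 + 2


def dh(x):
    """Derivative of h(x) via the power rule: 5x^4 - 8x^3."""
    return 5*x**4 - 8*x**3
```

A quick sanity check: the derivative is zero at x = 1.6, which is where the local minimum shows up later in the lesson.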

25
00:02:40,790 --> 00:02:41,290
Okay.
25

26
00:02:41,300 --> 00:02:43,860
So let's plot this function.
26

27
00:02:44,000 --> 00:02:47,370
I'm gonna scroll up and I'm gonna take this cell here.
27

28
00:02:47,370 --> 00:02:53,620
I'm going to say "Copy Cell" and then I'm going to paste it above.
28

29
00:02:53,670 --> 00:02:56,770
Now I'm going to have to make some changes to this code. Over here
29

30
00:02:56,780 --> 00:02:57,640
for the gradient descent,
30

31
00:02:57,650 --> 00:03:00,330
we're gonna be calling dh.
31

32
00:03:00,590 --> 00:03:06,340
And I'm going to say my initial guess should be equal to 0.2.
32

33
00:03:06,440 --> 00:03:12,680
I'm going to leave a lot of the other stuff as it is but I'm going to change my x axis and y axis on my graph.
33

34
00:03:12,680 --> 00:03:23,120
So this graph is gonna go from -1.2 to 2.5 and the y axis is gonna go from
34

35
00:03:24,170 --> 00:03:27,570
-1 to 4.
35

36
00:03:27,800 --> 00:03:34,140
So that's our cost function, I'm gonna change the y label to read h(x).
36

37
00:03:34,270 --> 00:03:43,590
I'm going to plot x_3 on h(x_3) and on the scatter plot
37

38
00:03:43,630 --> 00:03:52,950
I'm also going to be plotting h(x) not g(x). Similarly, for my derivative I'm going to change the label
38

39
00:03:53,120 --> 00:04:00,980
and on the axes, we're gonna go from -1 to 2 and then from -4 to 5.
39

40
00:04:01,370 --> 00:04:04,720
And then on the plotting, I'm going to change this to x_3,
40

41
00:04:07,700 --> 00:04:09,490
dh(x_3)
41

42
00:04:09,950 --> 00:04:19,680
and then I'm gonna hit Shift+Enter. Voila! So we can see here on our graph, we start at positive 
42

43
00:04:19,680 --> 00:04:27,630
0.2 and then our gradient descent slowly, slowly, slowly makes its way down into this local minimum
43

44
00:04:27,720 --> 00:04:29,370
right here.
44

45
00:04:29,370 --> 00:04:34,110
Let's modify the cell here to print out what the values actually are.
45

46
00:04:34,320 --> 00:04:47,290
So I'm going to say "print('Local min occurs at ', local_min)" and then I'm going to print
46

47
00:04:48,340 --> 00:05:00,310
"The cost at the minimum is", and then I'm going to say h(local_min), right.
47

48
00:05:01,230 --> 00:05:05,970
So local_min remember is the last value calculated by our gradient descent.
48

49
00:05:06,240 --> 00:05:13,350
We're going to work out what the y value is on our chart here at this particular point.
49

50
00:05:13,690 --> 00:05:23,940
And finally, let's print out the number of steps that our algorithm has taken including the initial guess.
50

51
00:05:24,040 --> 00:05:33,940
So that's gonna be the length - len of our variable called list_x. When I run this
51

52
00:05:33,940 --> 00:05:37,650
now, I can see the output here at the bottom
52

53
00:05:37,650 --> 00:05:45,130
below our charts. The local minimum occurs at around 1.6, the cost of this minimum is about
53

54
00:05:45,310 --> 00:05:51,030
-0.62 and the number of steps is 117.
54

55
00:05:51,070 --> 00:06:00,070
So our gradient descent algorithm has run over 100 times to get to this point. Okay,
55

56
00:06:00,080 --> 00:06:01,890
so we're converging to this minimum here
56

57
00:06:01,910 --> 00:06:04,010
and so far nothing new.
57

58
00:06:04,010 --> 00:06:11,990
We've seen all of this before. But let's see what happens if instead of at 0.2 we start at
58

59
00:06:12,080 --> 00:06:13,460
-0.2.
59

60
00:06:13,460 --> 00:06:18,260
Let's see what happens if we start a little bit on the other side of this chart.
60

61
00:06:18,290 --> 00:06:27,020
So I'm going to scroll back up and here where my initial guess is 0.2 I want to change this to
61

62
00:06:27,230 --> 00:06:29,930
-0.2.
62

63
00:06:29,930 --> 00:06:37,110
And I'm going to rerun the cell and see what happens. Scrolling down we see an error.
63

64
00:06:37,250 --> 00:06:43,470
In fact we see an overflow error, where it says the result is too large.
64

65
00:06:43,490 --> 00:06:45,160
What does that mean?
65

66
00:06:45,320 --> 00:06:52,220
I'm going to change the initial guess back to 0.2, hit Shift+Enter, reload the graph so we can
66

67
00:06:52,220 --> 00:06:57,640
think about this. If our initial guess was -0.2
67

68
00:06:57,640 --> 00:07:05,200
then we would be on the left hand side of this hump and this means that the algorithm would be starting
68

69
00:07:05,200 --> 00:07:11,380
to move down and down and down and down this line here. Now we continue going all the way down to
69

70
00:07:11,380 --> 00:07:14,190
like negative infinity.
70

71
00:07:14,200 --> 00:07:15,490
Now I know what you're thinking.
71

72
00:07:15,520 --> 00:07:17,110
Negative infinity.
72

73
00:07:17,560 --> 00:07:25,170
That makes h(x) a very, very unrealistic cost function but it does illustrate several things.
73

74
00:07:25,210 --> 00:07:31,870
First, we can see how our gradient descent algorithm behaves when the algorithm diverges.
74

75
00:07:31,870 --> 00:07:37,600
And second, it shows us how our Python program behaves when this happens.
75

76
00:07:37,600 --> 00:07:43,510
So conceptually we've explained the problem, but I do think that seeing this overflow error gives us
76

77
00:07:43,630 --> 00:07:48,920
an opportunity for understanding something a little deeper about our Python code.
77

78
00:07:49,060 --> 00:07:55,660
So let's run our code in slow motion if you will and examine what's actually happening. And to do that,
78

79
00:07:55,720 --> 00:08:02,710
I'm going to modify our gradient descent function. So I'm going to scroll back up where we've defined our gradient
79

80
00:08:02,740 --> 00:08:12,560
descent and in our function header, I'm going to tack on another parameter. This parameter is gonna be
80

81
00:08:12,560 --> 00:08:23,070
called max_iter for max iterations and give it a default value of say 300 and instead
81

82
00:08:23,070 --> 00:08:31,110
of this hardcoded 500 value here in our range, I'm going to substitute our argument, I'm going to substitute
82

83
00:08:31,110 --> 00:08:40,710
in max_iter for max iterations. So this way we can specify the maximum number of times our loop
83

84
00:08:40,830 --> 00:08:45,000
will run when we are calling our function.
84
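The gradient descent function itself comes from an earlier lesson and isn't shown here, so the following is only a sketch of what it might look like after adding max_iter. The multiplier and precision defaults (0.02 and 0.001) and the loop body are assumptions; the max_iter default of 300 and the three returned values match what's described in this lesson.

```python
def dh(x):
    """Derivative of h(x) = x^5 - 2x^4 + 2."""
    return 5*x**4 - 8*x**3


def gradient_descent(derivative_func, initial_guess,
                     multiplier=0.02, precision=0.001, max_iter=300):
    # multiplier and precision defaults are assumed, not taken from this lesson
    new_x = initial_guess
    x_list = [new_x]                      # includes the initial guess
    slope_list = [derivative_func(new_x)]

    for _ in range(max_iter):             # previously a hardcoded number
        previous_x = new_x
        gradient = derivative_func(previous_x)
        new_x = previous_x - multiplier * gradient

        x_list.append(new_x)
        slope_list.append(derivative_func(new_x))

        # Stop early once the steps become tiny
        if abs(new_x - previous_x) < precision:
            break

    return new_x, x_list, slope_list


# Converging case: start to the right of the hump
local_min, list_x, deriv_list = gradient_descent(dh, 0.2)

# Diverging case, capped at 10 iterations so it cannot run away
left_min, left_x, left_slopes = gradient_descent(dh, -0.2, max_iter=10)
```

Capping the loop this way is what lets us replay the divergence a few steps at a time instead of crashing straight away.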

85
00:08:45,000 --> 00:08:48,460
So I'm going to press Shift+Enter to update the Python code now.
85

86
00:08:48,570 --> 00:08:54,750
So I'm going to update the cell and then down here where we're generating this graph with our third example,
86

87
00:08:55,440 --> 00:08:59,970
I'm going to change our function call as follows - 
87

88
00:09:01,130 --> 00:09:11,270
for our initial guess I am going to use -0.2, but then I'm going to give a max iteration
88

89
00:09:11,270 --> 00:09:19,220
value of 10 and see where this leaves us. So I'm going to hit Shift+Enter,
89

90
00:09:19,710 --> 00:09:22,390
take a look at the graphs. OK,
90

91
00:09:22,400 --> 00:09:31,030
so in 10 iterations we're still pretty much on this hump. If instead the max iterations of our loop are
91

92
00:09:31,030 --> 00:09:35,560
set to 40, then we can see we start moving a little further down.
92

93
00:09:38,590 --> 00:09:46,800
And if I change it to say 60 I can see we're moving down even more.
93

94
00:09:46,920 --> 00:09:52,560
Now, let me update this to 70 and rerun this function. When we examine the chart,
94

95
00:09:52,560 --> 00:09:55,290
now, what do we notice?
95

96
00:09:55,290 --> 00:10:03,540
Well, yes, it's moving down to the left but very interesting here is that the step size gets bigger and
96

97
00:10:03,540 --> 00:10:09,300
bigger with each step as this slope starts getting steeper and steeper and steeper.
97

98
00:10:09,680 --> 00:10:13,170
Our steps start getting larger and larger and larger.
98

99
00:10:14,220 --> 00:10:18,980
So what's the last x value that the algorithm calculates?
99

100
00:10:19,230 --> 00:10:26,250
We're printing out the last x value with this print statement. So we can see below our graphs are the
100

101
00:10:26,310 --> 00:10:29,360
printouts from our three print statements.
101

102
00:10:29,580 --> 00:10:36,330
We can see that the last x value that's printed out is negative 2 million.
102

103
00:10:36,470 --> 00:10:38,650
That's the first print statement here.
103

104
00:10:39,050 --> 00:10:45,290
And this is definitely not a local minimum in this case, but when we feed our negative 2 million back
104

105
00:10:45,410 --> 00:10:49,620
into our function then we can see that the cost,
105

106
00:10:49,620 --> 00:10:49,890
yeah,
106

107
00:10:49,940 --> 00:10:53,720
what's on the y axis at this point
107

108
00:10:53,870 --> 00:10:59,940
is equal to -3.8*10^31.
108

109
00:10:59,960 --> 00:11:06,290
So in our print statement here, Python is giving us this number in scientific notation but it's actually
109

110
00:11:06,290 --> 00:11:07,400
an enormous number.
110

111
00:11:07,550 --> 00:11:13,880
It's around 3 8 0 0 0 0 0 0 0 0 0,
111

112
00:11:13,900 --> 00:11:15,790
it actually continues going, right.
112

113
00:11:15,830 --> 00:11:19,460
I'd have to copy this, paste it three times.
113

114
00:11:19,460 --> 00:11:21,960
This is how large this number is
114

115
00:11:22,010 --> 00:11:29,260
that's being printed here in scientific notation - It's 3.8 with a lot of zeros after it.
115

116
00:11:29,370 --> 00:11:33,290
Yeah I mean you're going to be looking at this number and you're like "Well it's a computer, right?
116

117
00:11:33,290 --> 00:11:40,490
So what, we sent a man to the moon over like 40 years ago - surely my computer can handle calculating large
117

118
00:11:40,490 --> 00:11:41,240
numbers, right?
118

119
00:11:41,240 --> 00:11:42,810
What's the big deal?"
119

120
00:11:43,010 --> 00:11:44,320
And you're not wrong.
120

121
00:11:44,330 --> 00:11:49,240
We can and we should be able to do math with very large numbers.
121

122
00:11:49,340 --> 00:11:54,800
But the thing is your computer and Python don't do this straight out of the box. If you want to work
122

123
00:11:54,800 --> 00:11:56,850
with numbers of this sort of magnitude,
123

124
00:11:56,870 --> 00:12:01,310
if you're, I don't know, calculating the number of atoms in the universe or what have you,
124

125
00:12:01,310 --> 00:12:04,260
then you have to employ a couple of tricks.
125

126
00:12:04,280 --> 00:12:12,650
The thing is, you can actually see what the maximum is that you can reach on your particular machine
126

127
00:12:12,680 --> 00:12:15,180
at home right now in Python
127

128
00:12:15,200 --> 00:12:21,920
straight out of the box and that's without importing any libraries or any modules and using Python as
128

129
00:12:21,920 --> 00:12:24,450
it is that you've got installed right now.
129

130
00:12:24,590 --> 00:12:30,050
So if you're curious and you wanted to pull up this sort of system specific information you can actually
130

131
00:12:30,050 --> 00:12:33,450
do so with a module called "sys".
131

132
00:12:33,470 --> 00:12:35,220
So "import sys".
132

133
00:12:35,260 --> 00:12:41,820
This is the module where the system's specific information resides and there you can pull up a number
133

134
00:12:41,820 --> 00:12:45,050
of different things. To see the kind of things that I'm talking about,
134

135
00:12:45,060 --> 00:12:52,110
you can write something like "help", and then put sys in there and then you'll get some documentation on
135

136
00:12:52,440 --> 00:12:58,290
the system module. So you can read this.
136

137
00:12:58,430 --> 00:13:05,660
This is by the way very, very similar to what you can pull up by pressing Shift and then Tab and then
137

138
00:13:05,660 --> 00:13:07,700
hitting that little plus sign.
138

139
00:13:07,700 --> 00:13:12,820
You'll see that this also pulls up the same documentation.
139

140
00:13:12,920 --> 00:13:15,680
But let me show you two things that might be quite useful.
140

141
00:13:15,680 --> 00:13:18,110
I'm going to comment out the help here.
141

142
00:13:18,170 --> 00:13:19,820
Don't need this.
142

143
00:13:19,820 --> 00:13:25,340
So for example one thing that you might be interested in looking up is what version of Anaconda you're
143

144
00:13:25,340 --> 00:13:27,530
using or what version of Python.
144

145
00:13:27,710 --> 00:13:31,750
And you can pull this up by writing "sys.version".
145

146
00:13:31,790 --> 00:13:35,390
So version is an attribute of this system module.
146

147
00:13:35,390 --> 00:13:43,430
So right now you can see that I'm using Python 3 and I've got a 64 bit system and you can also see that
147

148
00:13:43,430 --> 00:13:47,180
I'm running this on a Mac.
148

149
00:13:47,240 --> 00:13:52,220
Let me comment this out again and let's look at something else.
149

150
00:13:52,280 --> 00:14:00,870
Let's pull up what the largest floating point number is that I can calculate in my Python program now.
150

151
00:14:00,910 --> 00:14:05,870
Now you might ask: "Why am I interested in floating point numbers? Why do I say floating point numbers?"
151

152
00:14:05,870 --> 00:14:12,230
Well, if you look at the type of the thing that we're looking up, right, the type of thing that we're calculating
152

153
00:14:12,800 --> 00:14:13,940
h of,
153

154
00:14:14,000 --> 00:14:15,040
take a look,
154

155
00:14:15,050 --> 00:14:22,590
in this case it's h(local_min).
155

156
00:14:22,630 --> 00:14:24,250
So this is the thing that gave us the problem.
156

157
00:14:25,060 --> 00:14:33,070
So this is a float and this is what we're looking up. The largest float that we can use is "sys.float_
157

158
00:14:33,550 --> 00:14:39,600
info.max" and here's our answer.
158

159
00:14:39,600 --> 00:14:40,330
Right.
159

160
00:14:40,500 --> 00:14:43,940
And this number that you see printed out here is specific to my machine.
160

161
00:14:44,220 --> 00:14:50,550
If you're using a different type of machine with a different architecture, say 32 bit, then you may see
161

162
00:14:50,700 --> 00:14:57,630
something else printed below the cell right now, but this is my maximum floating point number that I
162

163
00:14:57,630 --> 00:14:59,550
can use on my architecture.
163

164
00:14:59,560 --> 00:15:04,990
Yeah, it's 1.79 times 10^308.
164
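Here's a small sketch of those sys lookups, together with a way to reproduce the overflow error in isolation. The value of x below is hypothetical, picked only because it's large enough that x**5 no longer fits in a float. Also worth knowing: on most platforms Python's float is a 64-bit IEEE 754 double, so the maximum you see is usually the same from machine to machine.

```python
import sys

print(sys.version)          # interpreter version and build details
print(sys.float_info.max)   # largest representable float

# Hypothetical x value, chosen only so that x**5 exceeds the float maximum
x = -1e96

try:
    cost = x**5 - 2*x**4 + 2
except OverflowError as err:
    cost = None
    print('Overflow:', err)
```

Catching the exception like this is optional; without the try block the cell simply stops with the overflow error, which is what happens in the lesson.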

165
00:15:05,030 --> 00:15:05,340
Yeah.
165

166
00:15:05,370 --> 00:15:15,570
This is huge, and it's far beyond the 10^31 that we had just a moment ago.
166

167
00:15:15,620 --> 00:15:16,380
Right.
167

168
00:15:16,400 --> 00:15:19,860
It's many, many orders of magnitude larger.
168

169
00:15:20,240 --> 00:15:22,400
So you might ask why are we running into this problem?
169

170
00:15:22,910 --> 00:15:29,120
Well, looking at our chart we can see that our step size increases dramatically with each step.
170

171
00:15:29,120 --> 00:15:36,740
And if I go up here and I change this from max iterations, the number of times I run my loop, from 70
171

172
00:15:36,830 --> 00:15:45,470
to say 71 and I look at my cost, then I get -2.1*
172

173
00:15:45,470 --> 00:15:47,910
10^121.
173

174
00:15:47,910 --> 00:15:54,710
So this is the crux of the problem - I'm going to blow through my limit at the very next iteration, on
174

175
00:15:54,710 --> 00:15:56,990
iteration 73.
175

176
00:15:57,020 --> 00:15:59,740
This is when I get the overflow error.
176

177
00:15:59,840 --> 00:16:05,080
Now on your machine at home, if the number that you're seeing printed here, the number that's spat out by
177

178
00:16:05,230 --> 00:16:11,660
sys.float_info.max is smaller than this then you might actually get that
178

179
00:16:11,660 --> 00:16:14,370
overflow error much sooner than I do, right.
179

180
00:16:14,390 --> 00:16:16,660
You might not get it at iteration 73,
180

181
00:16:16,670 --> 00:16:20,520
you might actually be getting that error far earlier, that overflow error.
181

182
00:16:20,570 --> 00:16:26,540
Now one thing that you might like to know about Python lingo is that errors like this are also referred
182

183
00:16:26,540 --> 00:16:34,830
to as exceptions, but no matter if you call it an exception or an error, we still crash and burn.
183

184
00:16:35,230 --> 00:16:43,820
So yeah, I hope you enjoyed that little detour into the low-level representation of numbers
184

185
00:16:43,940 --> 00:16:50,810
inside your Python computer program. But I think that while we're on the topic of Python programming,
185

186
00:16:51,260 --> 00:17:00,570
we should revisit a piece of code that we've written in a previous lesson, namely the code up here, the
186

187
00:17:00,570 --> 00:17:06,870
code for our gradient descent algorithm, because I have to confess something - I've been a little cheeky
187

188
00:17:06,960 --> 00:17:14,130
in having our Python gradient descent function return multiple values without actually explaining how
188

189
00:17:14,130 --> 00:17:15,600
this works.
189

190
00:17:15,810 --> 00:17:19,680
And this is a good point to cover the Python code
190

191
00:17:19,860 --> 00:17:27,900
before we go back to actually analyzing our algorithm. So let's add a new section heading at the bottom.
191

192
00:17:30,490 --> 00:17:33,690
I'm going to click this little plus sign here to insert some cells below.
192

193
00:17:34,620 --> 00:17:40,730
And this cell here I'm going to convert from Code to Markdown and the section heading I'm going to give
193

194
00:17:40,740 --> 00:17:44,760
this is "Python tuples".
194

195
00:17:47,880 --> 00:17:57,950
So, what's a tuple? A tuple is a data structure that's very, very similar to a list - a tuple is just a sequence
195

196
00:17:57,950 --> 00:18:01,160
of values that are separated by a comma.
196

197
00:18:01,160 --> 00:18:04,800
And this is what we've used in our gradient descent function.
197

198
00:18:04,820 --> 00:18:07,280
Let me show you how you can create a tuple. I'm going to click
198

199
00:18:07,280 --> 00:18:12,580
Plus again here and let's do this. Let's,
199

200
00:18:12,860 --> 00:18:15,330
let's insert a quick comment here
200

201
00:18:15,680 --> 00:18:24,700
"Creating a tuple". My first tuple is gonna be called "breakfast" and it's going to contain three values
201

202
00:18:25,360 --> 00:18:33,720
bacon, eggs and avocado.
202

203
00:18:34,000 --> 00:18:37,190
This, by the way, is a fantastic way to start your day.
203

204
00:18:37,240 --> 00:18:41,070
It also illustrates the general format for tuples.
204

205
00:18:41,200 --> 00:18:49,090
You have a sequence of values that are separated by a comma. I'm going to create another tuple here call it 
205

206
00:18:49,520 --> 00:18:50,300
unlucky_
206

207
00:18:50,390 --> 00:19:03,720
numbers. I'm going to give it 13, 4 for China, 9 for Japan, 26 for India and 17 for Italy.
207

208
00:19:03,810 --> 00:19:06,710
So it's the same pattern as above.
208

209
00:19:06,710 --> 00:19:09,310
And this way of creating tuples actually has a name.
209

210
00:19:09,340 --> 00:19:17,810
This is called tuple packing, because we're packing multiple values into a single tuple.
210

211
00:19:18,040 --> 00:19:21,280
So now that we've got our tuples, how do we access them?
211

212
00:19:21,280 --> 00:19:31,660
Well, I'm going to add some print statements here like "I love", comma breakfast
212

213
00:19:32,040 --> 00:19:34,970
[0].
213

214
00:19:35,100 --> 00:19:42,270
I'm going to hit Shift+Enter. My lack of spelling ability has foiled me once again, I'm going to take out the superfluous
214

215
00:19:42,390 --> 00:19:49,590
e here and then hit Shift+Enter again and then we can see that the syntax here with the square brackets
215

216
00:19:50,040 --> 00:19:55,200
for working with tuples is actually very, very similar to working with a list.
216

217
00:19:55,560 --> 00:19:58,460
So you've got a tuple that has a name,
217

218
00:19:58,650 --> 00:20:05,270
in this case breakfast and you're accessing the values inside the tuple through the index.
218

219
00:20:05,310 --> 00:20:12,840
So zero is the first item in the tuple. And to show you a second example,
219

220
00:20:12,840 --> 00:20:15,870
I'm going to print out the string
220

221
00:20:15,870 --> 00:20:24,830
"My hotel has no ", and then I'm gonna have two plus signs, another string at the end "th floor".
221

222
00:20:25,080 --> 00:20:35,910
Now in between here, I could put unlucky_numbers, and then square brackets and say provide
222

223
00:20:36,330 --> 00:20:38,400
the index 1,
223

224
00:20:43,710 --> 00:20:50,450
and if I try to run this right now, I'll get an error because the string concatenation with the pluses
224

225
00:20:50,810 --> 00:21:00,020
does not convert the integers here to strings, so I have to actually wrap this in a function called str,
225

226
00:21:01,970 --> 00:21:10,420
and only now can I press Shift+Enter and run this. If I try to do this without wrapping it then we'll
226

227
00:21:10,420 --> 00:21:15,260
get an error like this - must be string not int.
227

228
00:21:15,670 --> 00:21:23,350
And that's because my tuple here contains ints and those are not converted to strings by the plus operator.
228

229
00:21:23,500 --> 00:21:27,910
So I'm going to wrap this in a string function and press Enter.
229
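Put together, the tuple examples so far might look like this. The variable names and values are the ones used in the lesson; the comments are added for illustration.

```python
# Creating a tuple (tuple packing): values separated by commas
breakfast = 'bacon', 'eggs', 'avocado'
unlucky_numbers = 13, 4, 9, 26, 17

# Accessing values by index, just like with a list
print('I love', breakfast[0])

# The + operator does not convert ints to strings, so wrap the int in str()
print('My hotel has no ' + str(unlucky_numbers[1]) + 'th floor')
```

Leaving out str() in the last line raises a TypeError, because a string and an int can't be concatenated with +.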

230
00:21:28,270 --> 00:21:36,630
So we've covered how to access a value in a tuple. Brilliant! And now we can try something else,
230

231
00:21:36,640 --> 00:21:42,040
because I'm sure you're looking at this and you're saying "Well how are tuples different from lists?
231

232
00:21:42,040 --> 00:21:43,750
How are tuples used?
232

233
00:21:43,750 --> 00:21:49,030
Why do we have something that's so similar and yet different?"
233

234
00:21:49,030 --> 00:21:59,170
Well, in contrast to lists, tuples are often used when the data they contain is heterogeneous.
234

235
00:21:59,170 --> 00:22:05,030
Now, what do I mean by that? Tuples often contain a mix of data in contrast to lists.
235

236
00:22:05,200 --> 00:22:13,630
So lists often contain the same kind of data, like all strings, all integers, but a tuple like, say, "not_
236

237
00:22:13,630 --> 00:22:25,350
my_address" equals 1, comma and then the string "Infinite Loop", and then a comma and then
237

238
00:22:25,350 --> 00:22:28,310
another string "Cupertino",
238

239
00:22:28,310 --> 00:22:33,440
and then comma, "95014" for our postcode.
239

240
00:22:34,440 --> 00:22:41,900
And we've just created a tuple with a mix of data, a mix of different data types if you will and this
240

241
00:22:41,900 --> 00:22:46,010
is something that you don't usually see in practice with lists.
241

242
00:22:46,010 --> 00:22:53,700
Lists are usually homogeneous, meaning people don't tend to mix and match the different types of data.
242

243
00:22:53,890 --> 00:23:01,590
Now, another difference with lists is that tuples are immutable.
243

244
00:23:01,630 --> 00:23:02,960
What does that mean?
244

245
00:23:02,980 --> 00:23:09,110
It means that we can't change the tuple after we've made it.
245

246
00:23:09,130 --> 00:23:20,590
So for example, if I had, say breakfast, and I wanted to change bacon which is at index 0, and set that equal
246

247
00:23:20,590 --> 00:23:31,300
to, say, sausage and just, you know, innocently swap out the value, then Python will actually yell at us,
247

248
00:23:31,470 --> 00:23:38,640
it's gonna give us a type error, it's gonna say the "tuple object does not support item assignment" and
248

249
00:23:38,640 --> 00:23:46,800
this basically means that once we've created a tuple like this, we cannot change the values here and
249

250
00:23:46,800 --> 00:23:52,300
we also can't append a new value; we can't stick this at, say, index 3, right.
250

251
00:23:52,470 --> 00:23:59,600
We get the same error. In other words, the immutability of tuples means that once you've created a tuple
251

252
00:23:59,930 --> 00:24:01,280
you can't mess around with it.
252

253
00:24:01,310 --> 00:24:03,570
You can't change it up.
253
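A short sketch of that immutability, with the error caught so the cell keeps running. The steps list at the end is a made-up example, there only to show the contrast with a list.

```python
breakfast = 'bacon', 'eggs', 'avocado'

try:
    breakfast[0] = 'sausage'      # tuples don't allow item assignment
except TypeError as err:
    message = str(err)
    print(message)

# A list, by contrast, can grow after it has been created
steps = [0.2]                     # hypothetical list, for contrast only
steps.append(0.25)
print(steps)
```

The tuple is unchanged after the failed assignment, while the list happily takes the new item.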

254
00:24:03,660 --> 00:24:05,410
This is quite different from a list right.
254

255
00:24:05,430 --> 00:24:11,790
Because if you remember in our gradient descent function we were running our loop and we were appending
255

256
00:24:12,090 --> 00:24:19,200
items to our lists. Every time the loop ran, our list grew in length because we're appending new items
256

257
00:24:20,160 --> 00:24:27,680
and this is something we couldn't do with tuples. Now one more thing I want to show you on the topic
257

258
00:24:27,680 --> 00:24:29,910
of tuples is a little gotcha.
258

259
00:24:29,960 --> 00:24:33,930
Say we want to create a tuple with a single value.
259

260
00:24:34,090 --> 00:24:37,000
So just one value inside our tuple.
260

261
00:24:37,100 --> 00:24:44,240
I know it's strange but, for the sake of argument, have a think about how you would create a tuple with just
261

262
00:24:44,300 --> 00:24:45,260
one value.
262

263
00:24:45,260 --> 00:24:49,520
What would the Python syntax look like to store a single value inside this tuple?
263

264
00:24:53,380 --> 00:24:54,510
Here's the solution.
264

265
00:24:54,530 --> 00:25:02,070
So if I put a single value in here, say 42, then I would have to put a trailing comma after it.
265

266
00:25:02,080 --> 00:25:09,100
Now I've got a tuple with a single value, so if I print it out, print(tuple_with_single_value),
266

267
00:25:12,080 --> 00:25:13,360
then I can see it looks like this.
267

268
00:25:13,370 --> 00:25:16,370
It's got a single value and a comma.
268

269
00:25:16,550 --> 00:25:23,720
And if I substitute the print for type and check, then I can see that tuple with single value
269

270
00:25:23,930 --> 00:25:26,540
is indeed a tuple.
270
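As a sketch, the trailing-comma gotcha looks like this. The name not_a_tuple is added here purely for comparison and isn't part of the lesson.

```python
tuple_with_single_value = 42,     # the trailing comma makes this a tuple
not_a_tuple = (42)                # parentheses alone just give an int

print(tuple_with_single_value)
print(type(tuple_with_single_value))
print(type(not_a_tuple))
```

So it's the comma, not the parentheses, that makes the tuple.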

271
00:25:26,540 --> 00:25:34,890
Now the very first time I saw this I found this syntax like super weird and confusing - trailing comma
271

272
00:25:34,900 --> 00:25:35,790
right?
272

273
00:25:35,840 --> 00:25:37,350
My goodness.
273

274
00:25:37,520 --> 00:25:38,970
So here it is. Now,
274

275
00:25:39,590 --> 00:25:44,820
you too have shared this experience and the weird syntax.
275

276
00:25:44,910 --> 00:25:45,970
You're welcome.
276

277
00:25:45,990 --> 00:25:48,190
Now it's time to come full circle.
277

278
00:25:48,210 --> 00:25:55,000
We've packed a bunch of values into a tuple, but we can also do the very opposite.
278

279
00:25:55,080 --> 00:25:59,280
So we can unpack these values as well.
279

280
00:25:59,280 --> 00:26:06,750
So if I take my tuple - breakfast, and I want to grab the values that are stored inside this tuple and
280

281
00:26:06,750 --> 00:26:17,190
put them into some separate variables, I can do that by writing, say, "main, side, greens" is equal
281

282
00:26:17,190 --> 00:26:18,770
to breakfast.
282

283
00:26:19,050 --> 00:26:21,760
And this is called sequence unpacking.
283

284
00:26:21,780 --> 00:26:22,020
Yeah.
284

285
00:26:22,050 --> 00:26:33,100
So if I print out "Main course is ", and then comma, "main", then I get "Main course is bacon".
285
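Sequence unpacking, as described above, can be sketched like this, using the same variable names as the lesson.

```python
breakfast = 'bacon', 'eggs', 'avocado'

# Sequence unpacking: one variable per value in the tuple
main, side, greens = breakfast

print('Main course is', main)
```

The number of names on the left has to match the number of values in the tuple, otherwise Python raises a ValueError.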

286
00:26:33,100 --> 00:26:36,620
So here's the reason I mentioned this and why I say we've come full circle.
286

287
00:26:36,700 --> 00:26:43,960
If I scroll back up to where we had our gradient descent function, we can see here in our return statement
287

288
00:26:44,260 --> 00:26:51,370
what we're doing is we're returning three separate values, but in fact we're packing all these values
288

289
00:26:51,640 --> 00:26:54,040
into a single tuple.
289

290
00:26:54,530 --> 00:26:55,470
Yeah.
290

291
00:26:55,780 --> 00:27:04,800
And when we're calling our gradient descent function, say here, then we are unpacking this sequence and
291

292
00:27:04,800 --> 00:27:10,830
storing the results in three separate variables - local_min,
292

293
00:27:10,830 --> 00:27:13,590
list_x, deriv_list.
293

294
00:27:14,250 --> 00:27:20,820
So we've actually been using tuples, but we've never had to access any of the values from the tuple by
294

295
00:27:20,880 --> 00:27:22,050
index.
295

296
00:27:22,050 --> 00:27:23,530
But we can do that.
296

297
00:27:23,580 --> 00:27:25,840
Let me show you how it would work.
297

298
00:27:25,900 --> 00:27:36,240
So if I created a variable called data_tuple and set that equal to gradient_descent
298

299
00:27:36,750 --> 00:27:49,530
and for my derivative function I supply dh and for my initial guess I supply 
299

300
00:27:49,530 --> 00:28:04,220
0.2, then I can print out the local_min at data_tuple[0] because that's
300

301
00:28:04,220 --> 00:28:15,050
the very, very first thing that is stored inside our tuple. I can print out the cost at the last x value
301

302
00:28:15,800 --> 00:28:17,230
which would be equal to,
302

303
00:28:17,720 --> 00:28:27,520
well in this case it would be h(data_tuple[0]) and then I could also
303

304
00:28:27,520 --> 00:28:39,070
print out the number of steps, which in this case would be the length of, so it would be "data_
304

305
00:28:39,580 --> 00:28:47,620
tuple[1]" and we run this. Then you can see that it works exactly the same way
305

306
00:28:47,950 --> 00:28:53,910
as before, but instead of unpacking the sequence we're using the tuple now explicitly.
306
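Here's a sketch of using the returned tuple explicitly instead of unpacking it. The gradient_descent body below is a compact reconstruction with assumed multiplier and precision defaults (0.02 and 0.001), since the original function comes from an earlier lesson and isn't shown here.

```python
def h(x):
    return x**5 - 2*x**4 + 2


def dh(x):
    return 5*x**4 - 8*x**3


def gradient_descent(derivative_func, initial_guess,
                     multiplier=0.02, precision=0.001, max_iter=300):
    # Assumed reconstruction of the function from the earlier lesson
    new_x = initial_guess
    x_list = [new_x]
    slope_list = [derivative_func(new_x)]
    for _ in range(max_iter):
        previous_x = new_x
        new_x = previous_x - multiplier * derivative_func(previous_x)
        x_list.append(new_x)
        slope_list.append(derivative_func(new_x))
        if abs(new_x - previous_x) < precision:
            break
    return new_x, x_list, slope_list   # packed into a single tuple


# Keep the whole tuple instead of unpacking it into three variables
data_tuple = gradient_descent(dh, 0.2)

print('Local min occurs at', data_tuple[0])
print('Cost at the last x value is', h(data_tuple[0]))
print('Number of steps is', len(data_tuple[1]))
```

Unpacking into local_min, list_x and deriv_list usually reads better, but indexing works exactly the same way under the hood.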

307
00:28:54,820 --> 00:29:01,480
Okay, so we've paused a little bit on analysing our algorithm and we've talked a little more about Python
307

308
00:29:01,480 --> 00:29:08,020
and Python programming, so now it's time to change tracks and go back to our gradient descent algorithm.
308

309
00:29:09,430 --> 00:29:17,790
Now a reasonable question to ask is why did I show you this h(x) example function? I mean this function,
309

310
00:29:17,820 --> 00:29:25,110
by my own admission, seems a little bit contrived and I already confessed that a non convex cost function
310

311
00:29:25,470 --> 00:29:34,830
is not very realistic, but truth be told, we can get this divergence and the very, very same overflow
311

312
00:29:34,890 --> 00:29:43,260
error in another way too and we can see this divergence and see the same error even if we are working
312

313
00:29:43,260 --> 00:29:50,040
with a very nice clean cost function where we know for a fact that we should be able to reach a minimum.
313

314
00:29:51,030 --> 00:29:56,490
And this is what we're gonna be examining in the next lesson. In the next lesson we're gonna be looking
314

315
00:29:56,490 --> 00:30:02,600
at the elephant in the room - the gradient descent learning rate. I'll see you there.
315

316
00:30:02,850 --> 00:30:03,480
Take care.