0
1
00:00:00,650 --> 00:00:04,520
All right, now it's finally time to do the hard part.
1

2
00:00:04,530 --> 00:00:09,780
We're going to write our own algorithm that will find the lowest cost.
2

3
00:00:09,930 --> 00:00:17,670
And this is the famous gradient descent algorithm, and gradient is, well, you guessed it, just another word
3

4
00:00:17,790 --> 00:00:19,520
for slope.
4

5
00:00:19,520 --> 00:00:26,740
So I'm going to give you a very, very Austrian perspective to think about the gradient descent algorithm.
5

6
00:00:26,740 --> 00:00:32,520
You know, we've got a lot of mountains back in Austria and they're very, very beautiful and you can go
6

7
00:00:32,520 --> 00:00:37,340
ski down them, but a mountain is a force of nature.
7

8
00:00:37,350 --> 00:00:47,670
You have to respect the mountains. You see, the weather can change very, very quickly and this is especially
8

9
00:00:47,730 --> 00:00:49,370
unpredictable in winter
9

10
00:00:49,560 --> 00:00:51,400
and at high altitude.
10

11
00:00:51,480 --> 00:00:57,430
So, you know, imagine that you've wandered off the beaten track and the fog comes rolling in
11

12
00:00:57,540 --> 00:01:03,870
and at this point you find yourself in a survival situation.
12

13
00:01:03,870 --> 00:01:09,850
This is when you can't see very far and you can only feel the ground beneath your feet.
13

14
00:01:09,870 --> 00:01:15,420
The cold is going to be creeping in through your jacket and you find yourself thinking: How do I get
14

15
00:01:15,420 --> 00:01:15,800
down?
15

16
00:01:15,810 --> 00:01:23,650
How do I get back down? Well, to figure out which way is down and towards that hot cup of tea waiting
16

17
00:01:23,650 --> 00:01:25,460
for you at the end of your journey,
17

18
00:01:25,570 --> 00:01:28,240
you've got to feel, like,
18

19
00:01:28,240 --> 00:01:29,020
which way is down,
19

20
00:01:29,020 --> 00:01:30,920
what is the slope, right?
20

21
00:01:30,970 --> 00:01:38,520
You're going to look at your feet and you're going to figure out that the fastest way down is in the
21

22
00:01:38,520 --> 00:01:42,010
direction where the slope is steepest.
22

23
00:01:42,180 --> 00:01:45,990
Right? Where the descent is the most steep.
23

24
00:01:46,530 --> 00:01:53,700
And if you take a step downwards in that direction and then kind of get a feel for it again, like which
24

25
00:01:53,700 --> 00:01:59,670
way, which way is the slope and then take another step where the slope is steepest, you'll be down in
25

26
00:01:59,670 --> 00:02:01,010
that valley in no time.
26

27
00:02:01,020 --> 00:02:04,930
And you can sip on that hot cup of tea, right?
27

28
00:02:05,100 --> 00:02:11,910
And this is how you can think about gradient descent, except, instead of a mountain, yeah, gradient descent
28

29
00:02:12,350 --> 00:02:18,060
is going to take place on a cost function and the cost function actually doesn't tend to look like this -
29

30
00:02:18,060 --> 00:02:21,060
it doesn't kind of have a peak, if you will.
30

31
00:02:21,120 --> 00:02:22,100
Right?
31

32
00:02:22,170 --> 00:02:27,450
Because if a function had a peak then it would be called concave,
32

33
00:02:27,450 --> 00:02:29,510
it has a maximum.
33

34
00:02:29,760 --> 00:02:33,240
But with our cost functions, we're going to be looking for minimums.
34

35
00:02:33,240 --> 00:02:33,420
Right?
35

36
00:02:33,420 --> 00:02:37,640
So if you imagine that mountain flipped upside down and all you've got is a valley,
36

37
00:02:37,760 --> 00:02:37,980
right,
37

38
00:02:37,980 --> 00:02:45,070
that you have to kind of find then your cost function is going to look more like this.
38

39
00:02:45,090 --> 00:02:49,390
This is a kind of function that's called convex.
39

40
00:02:49,400 --> 00:02:55,800
It will have a minimum, and our job is to get to the bottom of it because that's where the cost is
40

41
00:02:55,980 --> 00:02:58,040
lowest.
41

42
00:02:58,470 --> 00:03:01,140
Now, gradient descent isn't always called gradient descent,
42

43
00:03:01,140 --> 00:03:04,840
there's another word for it that you might see as well in the literature.
43

44
00:03:04,920 --> 00:03:13,260
Sometimes it's referred to as Steepest Descent and yes it's an optimization algorithm for finding the
44

45
00:03:13,380 --> 00:03:15,390
minimum of a function.
45

46
00:03:15,390 --> 00:03:21,140
So, you know, think about our mountain example - to find the minimum, the algorithm takes these little steps,
46

47
00:03:21,150 --> 00:03:21,420
right?
47

48
00:03:21,420 --> 00:03:28,140
It takes a step in that direction where the slope is steepest, in the direction of the negative of the
48

49
00:03:28,140 --> 00:03:35,240
gradient and bit by bit ends up in the bottom of the valley.
49

50
00:03:35,260 --> 00:03:35,620
All right.
50

51
00:03:35,650 --> 00:03:38,560
So let's implement this in Jupyter notebook.
51

52
00:03:38,560 --> 00:03:41,650
Let's add another markdown cell here.
52

53
00:03:41,650 --> 00:03:51,760
And I'm going to go to "Cell" > "Cell Type" > "Markdown" and put two hash tags there for a section heading and that section
53

54
00:03:51,760 --> 00:04:00,340
heading is gonna be "Python Loops and Gradient Descent".
54

55
00:04:00,340 --> 00:04:06,610
Now, if you're a seasoned programmer you're gonna be familiar with writing loops but if you're new to
55

56
00:04:06,610 --> 00:04:13,840
Python or you're new to programming, then the next couple of minutes are going to be the introduction to this
56

57
00:04:14,080 --> 00:04:20,240
topic of loops. Loops are little bits of code that are executed over and over again.
57

58
00:04:20,290 --> 00:04:22,520
We're going to walk down that mountain.
58

59
00:04:22,540 --> 00:04:26,140
We're going to walk down into the valley with our gradient descent algorithm.
59

60
00:04:26,140 --> 00:04:33,970
So this is going to be a very, very useful tool for accomplishing that because our algorithm has to complete
60

61
00:04:33,970 --> 00:04:35,590
that famous three step process.
61

62
00:04:35,590 --> 00:04:35,990
Right?
62

63
00:04:36,040 --> 00:04:41,080
Predict, calculate error and learn, and repeat.
63

64
00:04:41,080 --> 00:04:47,560
So instead of writing the same Python instructions over and over again we're going to be using loops
64

65
00:04:47,830 --> 00:04:51,360
to simplify that for us.
65

66
00:04:51,820 --> 00:04:55,730
Speaking of "for", this is the first loop I'm going to introduce to you, guys.
66

67
00:04:55,750 --> 00:05:04,850
So this is gonna be the for loop in Python. So I'll just write a little comment here "Python for loop" and
67

68
00:05:04,870 --> 00:05:09,170
this is what the syntax looks like. We're going to have the keyword "for".
68

69
00:05:09,220 --> 00:05:15,460
And then there's gonna be a variable, in this case I'm gonna call it "n", and then the other keyword
69

70
00:05:16,300 --> 00:05:28,070
"in" and then I'm going to say "range(5):", new line and now we're inside the loop here
70

71
00:05:28,070 --> 00:05:33,900
we're gonna print the famous first words "Hello World".
71

72
00:05:34,160 --> 00:05:36,290
OK, let's hit Shift + Enter.
72

73
00:05:36,290 --> 00:05:38,010
See what happens.
73

74
00:05:38,050 --> 00:05:38,330
All right.
74

75
00:05:38,350 --> 00:05:41,600
So we've printed "Hello World" five times.
75

76
00:05:41,600 --> 00:05:41,800
Right?
76

77
00:05:41,810 --> 00:05:43,210
One, two, three, four, five.
77

78
00:05:43,640 --> 00:05:50,780
If I change this range to 3, it'll print it three times. If I change it to a thousand, it'll print it a thousand
78

79
00:05:50,780 --> 00:05:58,250
times, but let's stick to five for the time being, and let's take a closer look at this value
79

80
00:05:58,340 --> 00:05:59,130
n here.
80

81
00:05:59,150 --> 00:05:59,420
Right.
81

82
00:05:59,450 --> 00:06:06,580
n is just a variable and it's going to keep track of how often our for loop has run.
82

83
00:06:06,590 --> 00:06:15,440
So if I say "'Hello World', n", then I can see what the value is of the variable n each time the
83

84
00:06:15,440 --> 00:06:16,050
loop is run.
84

85
00:06:16,070 --> 00:06:17,200
So it starts at zero.
85

86
00:06:17,270 --> 00:06:20,830
Programmers like to start counting from zero.
86

87
00:06:21,050 --> 00:06:23,900
And that's the very first time the loop runs.
87

88
00:06:23,900 --> 00:06:28,400
Then this print statement is executed another time.
88

89
00:06:28,490 --> 00:06:33,410
So second time, third time, fourth time, fifth time.
89

90
00:06:33,590 --> 00:06:37,010
And at this point the loop stops.
90

91
00:06:37,010 --> 00:06:38,890
Right.
91

92
00:06:39,080 --> 00:06:43,620
Show you that - "print(
92

93
00:06:43,840 --> 00:06:46,240
'End of Loop')".
93

94
00:06:46,770 --> 00:06:49,170
So the Python program will come in here
94

95
00:06:49,240 --> 00:06:53,460
and execute whatever's inside the loop, and you can tell what's inside
95

96
00:06:53,460 --> 00:06:59,850
by the spacing, a predefined number of times, in this case five times.
96

97
00:06:59,850 --> 00:07:00,320
Right.
97

98
00:07:00,330 --> 00:07:03,630
0, 1, 2, 3, 4.
98
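A minimal sketch of the cell described so far, assuming the two print calls are typed exactly as spoken. The indented line is the loop body; the unindented line runs once, after the loop.

```python
# The body (the indented line) runs once for each value produced by range(5).
for n in range(5):
    print('Hello World', n)  # n takes the values 0, 1, 2, 3, 4 in turn

# Not indented, so this is outside the loop and runs once when the loop is done.
print('End of Loop')
```

Changing `range(5)` to `range(3)` would run the body three times instead.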

99
00:07:03,630 --> 00:07:07,380
Now we can call a little counter variable here and we can call it "i",
99

100
00:07:07,380 --> 00:07:09,960
this is another one that is often used.
100

101
00:07:09,960 --> 00:07:13,240
So if I call it "i", I get exactly the same result.
101

102
00:07:13,290 --> 00:07:15,180
It really doesn't matter what you call it.
102

103
00:07:15,180 --> 00:07:19,380
You can call it counter. As long as you're consistent,
103

104
00:07:19,380 --> 00:07:28,840
you can access the variable, the looping counter, inside of the loop by its name. All right,
104

105
00:07:28,840 --> 00:07:30,710
so that's the for loop.
105

106
00:07:30,880 --> 00:07:38,590
It executes a predefined number of times and it's got this very, very simple syntax "for n in range",
106

107
00:07:38,650 --> 00:07:40,460
And then some number here.
107

108
00:07:40,480 --> 00:07:45,490
So this is how many times you want to execute the loop. With that out of the way,
108

109
00:07:45,520 --> 00:07:53,620
let me show you another type of loop. This type of loop is also very, very common.
109

110
00:07:53,800 --> 00:07:57,510
And this is the so-called "while" loop.
110

111
00:08:00,360 --> 00:08:03,350
A while loop works a little differently.
111

112
00:08:03,480 --> 00:08:04,130
Right.
112

113
00:08:04,140 --> 00:08:08,210
It has a condition that it checks every time it runs, right.
113

114
00:08:08,220 --> 00:08:14,610
So it will check the condition and then if that condition holds, it's going to run the code inside the
114

115
00:08:14,610 --> 00:08:19,890
loop and it's going to continue doing that until the condition fails.
115

116
00:08:19,890 --> 00:08:27,870
So if I have a counter and I say it's equal to zero and then I can write my while loop like this I can
116

117
00:08:27,870 --> 00:08:39,170
say "while", which is a keyword, and say, counter is smaller than, I don't know what, 7, colon print
117

118
00:08:41,690 --> 00:08:42,350
Counting
118

119
00:08:47,580 --> 00:08:48,320
counter.
119

120
00:08:48,830 --> 00:08:57,910
So print the value of my counter inside my loop and then I'll say "counter = counter + 1".
120

121
00:08:57,910 --> 00:09:04,190
So I'm going to increment my counter variable by 1 every time the loop runs.
121

122
00:09:06,280 --> 00:09:12,850
And then when I finish with the loop I'll print something else.
122

123
00:09:12,850 --> 00:09:14,190
Yeah.
123

124
00:09:14,200 --> 00:09:17,260
"Ready or not, here I come!".
124

125
00:09:18,590 --> 00:09:23,260
Yeah let's make that loop a little bit more menacing than the last one.
125

126
00:09:23,260 --> 00:09:33,640
So if I hit Shift+Enter now, I can see my print statement inside my loop executed seven times,
126

127
00:09:33,640 --> 00:09:39,410
right? Starts at zero and executes it until our condition fails.
127

128
00:09:39,420 --> 00:09:46,630
This is the condition, so whatever follows the while keyword is the condition that's checked.
128

129
00:09:47,220 --> 00:09:50,080
And this fails when counter is equal to seven.
129

130
00:09:50,200 --> 00:09:53,280
Seven is equal to seven, it's not smaller than seven.
130

131
00:09:53,340 --> 00:09:56,720
So this will be false at this point.
131

132
00:09:57,000 --> 00:10:02,770
The loop terminates and the code inside is not executed anymore.
132

133
00:10:03,060 --> 00:10:06,810
And we jump to our print statement below.
133

134
00:10:06,810 --> 00:10:09,430
Right, this one "Ready or not, here I come!".
134

135
00:10:09,780 --> 00:10:11,780
And this is what we're seeing here.
135
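A minimal sketch of the while loop cell as described, assuming the strings are typed as spoken. The condition after `while` is checked before every pass.

```python
counter = 0

# The condition `counter < 7` is checked before each pass through the loop.
while counter < 7:
    print('Counting', counter)
    counter = counter + 1  # increment, so the condition eventually fails

# Reached once the condition is False, that is, when counter equals 7.
print('Ready or not, here I come!')
```

Replacing 7 with 5 makes the body run five times, which matches the earlier for loop.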

136
00:10:13,140 --> 00:10:21,020
Again, I can accomplish the very, very same thing as with the for loop so I can execute it five times.
136

137
00:10:21,090 --> 00:10:27,530
So if you want you can also execute a while loop a predefined number of times.
137

138
00:10:27,610 --> 00:10:31,510
Yeah there's a small catch, there's a small gotcha
138

139
00:10:31,650 --> 00:10:37,250
that can happen with while loops that you won't get with for loops.
139

140
00:10:37,250 --> 00:10:44,100
Any guess what this gotcha is? Any guess what it is that can trip you up and where you can shoot yourself
140

141
00:10:44,100 --> 00:10:44,580
in the foot?
141

142
00:10:47,440 --> 00:10:52,920
So with while loops you can get into a situation where they don't stop, where they don't terminate.
142

143
00:10:52,920 --> 00:11:01,980
So, for example, if I had made a typo here and instead of that plus I had hit minus, then my loop would
143

144
00:11:01,980 --> 00:11:05,770
actually run forever, right, because it would start at zero,
144

145
00:11:05,770 --> 00:11:14,270
then when it reaches this line my counter would go to negative 1, then would come here and go to negative
145

146
00:11:14,270 --> 00:11:17,420
2 and then here negative 3.
146

147
00:11:17,420 --> 00:11:20,830
And this thing would just continue going, right?
147

148
00:11:20,960 --> 00:11:28,370
Which is clearly not my intention, right? It would continue going and cause a lot of problems.
148

149
00:11:28,400 --> 00:11:37,430
So with while loops you have to be careful that you don't accidentally write an infinite loop.
149
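The typo can be reproduced safely with a guard. This is only a sketch: the names `passes` and `max_passes` are hypothetical additions so that the demo stops instead of running forever.

```python
counter = 0
passes = 0
max_passes = 10  # hypothetical safety cap, not part of the original cell

while counter < 7:
    counter = counter - 1  # the typo: minus instead of plus
    passes += 1
    # counter only moves further below 7, so the condition can never fail.
    if passes >= max_passes:
        print('Still running after', passes, 'passes; counter is', counter)
        break
```

Without the cap, this loop would never terminate and the notebook kernel would have to be interrupted.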

150
00:11:37,940 --> 00:11:45,080
So, for loops by their very nature run a predefined number of times, while loops run while a certain condition
150

151
00:11:45,080 --> 00:11:46,700
holds true.
151

152
00:11:46,910 --> 00:11:53,480
And this is where you gotta be, gotta be careful. So with your while loops, you've got to make sure they terminate
152

153
00:11:54,430 --> 00:11:59,940
and the easiest way to remember this is with an old programming joke.
153

154
00:11:59,980 --> 00:12:01,810
Yeah, that goes something like this.
154

155
00:12:02,060 --> 00:12:09,440
A programmer once said to his wife "Honey I'm heading to the supermarket to buy some groceries", to which
155

156
00:12:09,440 --> 00:12:13,710
his wife responded "While you're there, buy some milk".
156

157
00:12:13,710 --> 00:12:16,460
And alas he never returned home again.
157

158
00:12:17,510 --> 00:12:17,900
Oh.
158

159
00:12:17,900 --> 00:12:19,130
Crickets.
159

160
00:12:19,460 --> 00:12:21,220
Back to gradient descent.
160

161
00:12:21,440 --> 00:12:26,330
Let's tackle that in the new cell here at the bottom.
161

162
00:12:26,330 --> 00:12:30,850
The thing with gradient descent is that we need a couple of ingredients, right?
162

163
00:12:30,860 --> 00:12:34,170
We need a starting point.
163

164
00:12:34,190 --> 00:12:41,990
Then we need a learning rate, and we're gonna need some maybe temporary value to hold onto something
164

165
00:12:42,200 --> 00:12:45,120
while our program is executing.
165

166
00:12:45,230 --> 00:12:47,720
So I'm gonna create these three things here.
166

167
00:12:47,750 --> 00:12:54,600
I'm going to say "new_x" which is gonna be our starting point, I'm going to set it equal to 3, I'm going to start with
167

168
00:12:54,680 --> 00:12:57,050
3 as the starting point.
168

169
00:12:57,090 --> 00:13:03,440
I'm going to say "previous_x" and this is gonna be my temp value if you will.
169

170
00:13:03,530 --> 00:13:06,300
That only matters inside of the loop.
170

171
00:13:07,180 --> 00:13:11,700
And then I'm going to also specify a learning rate.
171

172
00:13:11,720 --> 00:13:11,990
Yeah.
172

173
00:13:12,010 --> 00:13:14,630
Or gamma or whatever you call it.
173

174
00:13:14,680 --> 00:13:18,280
So I'll call it a step multiplier
174

175
00:13:22,310 --> 00:13:27,900
and I'll set it equal to 0.1. Now it's time to write that loop.
175

176
00:13:27,910 --> 00:13:36,930
It's gonna be a for loop for us. So I'm going to say "for n in range", and maybe start at 30,
176

177
00:13:37,480 --> 00:13:39,640
colon,
177

178
00:13:39,640 --> 00:13:40,880
and now for the first step -
178

179
00:13:40,930 --> 00:13:43,040
What's the first thing that we have to do?
179

180
00:13:43,510 --> 00:13:46,020
Well, we have to make a guess, right?
180

181
00:13:46,030 --> 00:13:47,770
We have to make some prediction.
181

182
00:13:47,770 --> 00:13:50,930
This is step one of the machine learning process.
182

183
00:13:50,980 --> 00:13:59,280
So I'm going to take our temp value, previous_x and I'm going to set it equal to our random guess.
183

184
00:13:59,310 --> 00:13:59,760
Yeah.
184

185
00:14:00,000 --> 00:14:01,990
new_x = 3
185

186
00:14:02,220 --> 00:14:06,970
Three was a random guess, just our starting point for our gradient descent.
186

187
00:14:06,970 --> 00:14:08,570
I'm going to set them equal to each other.
187

188
00:14:09,860 --> 00:14:16,130
Now we get to step two. Step two is calculating the error because we need to know how far off we were.
188

189
00:14:16,650 --> 00:14:18,010
From the previous lesson,
189

190
00:14:18,050 --> 00:14:28,220
you will know that the steepness of the slope tells us how far off we are, right, from the minimum, because
190

191
00:14:28,220 --> 00:14:30,310
at the minimum the slope is equal to zero.
191

192
00:14:31,100 --> 00:14:35,210
And everywhere else it's equal to some number that isn't zero.
192

193
00:14:35,250 --> 00:14:47,470
So our gradient is gonna be equal to df of the previous_x.
193

194
00:14:47,510 --> 00:14:47,850
Yeah.
194

195
00:14:48,280 --> 00:14:56,140
So we're gonna call our derivative function. I'm going to pass in the temp value.
195

196
00:14:56,170 --> 00:15:05,570
So at the point where we are, in our function, I'm going to store the slope, yeah, at this point in a variable
196

197
00:15:05,570 --> 00:15:12,380
called gradient. So one thing you might ask at this point is - Why is calculating the gradient
197

198
00:15:12,420 --> 00:15:14,330
step two or calculating the error?
198

199
00:15:14,340 --> 00:15:16,770
What's the link between those two things?
199

200
00:15:17,100 --> 00:15:25,430
And the way to think about it is that the further away we are from our minimum, the steeper our slope.
200

201
00:15:25,440 --> 00:15:33,300
So if the slope is very, very steep then it's indicative of being very, very far away from where we want
201

202
00:15:33,300 --> 00:15:37,380
to be. A steep slope means that we've got a high error.
202

203
00:15:37,440 --> 00:15:42,590
And if the slope is zero or close to it then our error is small.
203

204
00:15:44,390 --> 00:15:47,420
And now it's time for that adjustment step, for that learning step.
204

205
00:15:48,380 --> 00:16:00,770
So the new value of x is gonna be equal to the previous value of x minus, because we've got to go down the
205

206
00:16:00,770 --> 00:16:04,310
hill, minus our step multiplier
206

207
00:16:07,750 --> 00:16:17,960
times the slope, times the gradient and remember this is the value of the slope at the previous value
207

208
00:16:17,960 --> 00:16:19,400
of x.
208

209
00:16:19,450 --> 00:16:24,820
So what we're doing here is we're taking a step that's proportional to the negative of the gradient
209

210
00:16:24,820 --> 00:16:27,660
of the function at the point that we're at.
210

211
00:16:28,600 --> 00:16:35,680
And then we're subtracting from the previous x value because we want to move against the gradient towards
211

212
00:16:35,680 --> 00:16:41,130
the minimum and this is where the learning in machine learning takes place.
212
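A single update step, worked through as a sketch. The derivative `df` here is an assumption: it is the slope of an example cost f(x) = x**2 + x + 1, which is not defined in this part of the lesson.

```python
def df(x):
    # Assumed derivative of an example cost function f(x) = x**2 + x + 1.
    return 2 * x + 1

new_x = 3              # starting point
step_multiplier = 0.1  # learning rate

previous_x = new_x                                # 1. predict: keep the current guess
gradient = df(previous_x)                         # 2. error: slope at that guess (7 here)
new_x = previous_x - step_multiplier * gradient   # 3. learn: step against the slope

print(previous_x, gradient, new_x)  # new_x is roughly 2.3 after this one step
```

The minus sign is what moves x downhill: a positive slope pushes x to the left, a negative slope pushes it to the right.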

213
00:16:42,260 --> 00:16:49,470
So this loop is going to run 30 times and after it's finished let's print out our results.
213

214
00:16:49,470 --> 00:16:55,350
So I'm going to say "Local minimum occurs at",
214

215
00:16:58,550 --> 00:17:01,350
at what? Well the new value of x, right?
215

216
00:17:01,350 --> 00:17:08,290
Because that's what we're updating in our for loop, and we're gonna print out the slope.
216

217
00:17:08,420 --> 00:17:08,660
Yeah.
217

218
00:17:08,820 --> 00:17:11,370
So we just have to make sure our slope is close to zero.
218

219
00:17:11,370 --> 00:17:12,140
Right?
219

220
00:17:12,270 --> 00:17:16,410
Or the value of df(x)
220

221
00:17:16,590 --> 00:17:19,290
Yeah.
221

222
00:17:19,490 --> 00:17:24,820
yeah, "at this point".
222

223
00:17:24,990 --> 00:17:36,450
So this is gonna be our derivative function and as an input it's gonna get the latest value of x. Finally
223

224
00:17:37,190 --> 00:17:40,650
we're going to print out what the cost is at this point.
224

225
00:17:40,680 --> 00:17:49,290
So this is the f(x) value or cost at this point is,
225

226
00:17:52,030 --> 00:18:02,710
and this is gonna be our cost function at the point where the cost is lowest. Now, before I run this,
226

227
00:18:02,730 --> 00:18:12,120
make sure you've got a plus sign here because if you ever have to go to "Restart and Run All" or "Run All
227

228
00:18:12,120 --> 00:18:12,890
Above",
228

229
00:18:12,930 --> 00:18:13,480
yeah,
229

230
00:18:13,710 --> 00:18:18,150
then you may want to make sure that this loop doesn't continue going.
230

231
00:18:18,210 --> 00:18:19,820
I just caught myself out there.
231

232
00:18:19,980 --> 00:18:30,800
So I'm going to hit Shift+Enter now here and I get my print statements shooting off the results of our gradient descent.
232
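Put together, the cell might look like the sketch below. The functions `f` and `df` are assumptions (an example cost x**2 + x + 1 and its derivative), since they come from an earlier part of the lesson.

```python
def f(x):
    # Assumed example cost function.
    return x**2 + x + 1

def df(x):
    # Derivative (slope) of the assumed cost function.
    return 2 * x + 1

new_x = 3               # starting point
previous_x = 0          # temporary value, only matters inside the loop
step_multiplier = 0.1   # learning rate

for n in range(30):
    previous_x = new_x                               # step 1: predict
    gradient = df(previous_x)                        # step 2: calculate the error
    new_x = previous_x - step_multiplier * gradient  # step 3: learn

print('Local minimum occurs at:', new_x)
print('Slope or df(x) value at this point is:', df(new_x))
print('f(x) value or cost at this point is:', f(new_x))
```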

233
00:18:30,960 --> 00:18:32,060
So, what can we learn from this?
233

234
00:18:32,070 --> 00:18:36,890
What can we deduce from the values that we're seeing here?
234

235
00:18:38,330 --> 00:18:43,340
Well, the first thing is that we can see that they're approximations, right?
235

236
00:18:43,340 --> 00:18:45,700
This isn't an exact value.
236

237
00:18:45,710 --> 00:18:47,920
We're not getting a very clean answer here.
237

238
00:18:49,360 --> 00:18:55,930
But that might be the case because maybe we haven't run our loop often enough.
238

239
00:18:55,930 --> 00:19:06,490
So if I increase the value here from say 30 to 50 let's see what happens with our values that we get
239

240
00:19:06,490 --> 00:19:14,570
printed out. So one thing that we're seeing is that our slope is getting a lot closer to zero here than
240

241
00:19:14,570 --> 00:19:15,610
before.
241

242
00:19:15,620 --> 00:19:23,960
The second thing is that this value here on f(x) is also getting a lot more precise and so is our
242

243
00:19:23,960 --> 00:19:24,730
new value of x.
243

244
00:19:24,740 --> 00:19:31,890
So it's getting much, much closer to -0.5.
244

245
00:19:31,920 --> 00:19:36,960
Yeah if I run this 500 times, let's see what happens.
245

246
00:19:38,590 --> 00:19:45,060
So, as you can see, we're converging on this local minimum by brute force, right?
246

247
00:19:45,070 --> 00:19:48,790
We didn't solve our cost function here analytically.
247

248
00:19:48,790 --> 00:19:56,010
What we're doing is we're iterating and going down that valley, that cost function
248

249
00:19:56,380 --> 00:20:01,970
until we reach the minimum point and at the minimum our slope is equal to zero,
249

250
00:20:02,050 --> 00:20:05,250
our cost is equal to 0.75.
250

251
00:20:05,470 --> 00:20:11,960
And this is when the x is equal to -0.5.
251
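To compare different iteration counts side by side, the loop can be wrapped in a function. This is a sketch: `run_gradient_descent` is a hypothetical helper, and `f` and `df` are an assumed example cost and its derivative.

```python
def f(x):
    # Assumed example cost function.
    return x**2 + x + 1

def df(x):
    # Derivative of the assumed cost function.
    return 2 * x + 1

def run_gradient_descent(num_steps, start=3, step_multiplier=0.1):
    # Hypothetical helper: repeats the update rule num_steps times.
    new_x = start
    for n in range(num_steps):
        previous_x = new_x
        gradient = df(previous_x)
        new_x = previous_x - step_multiplier * gradient
    return new_x

for num_steps in (30, 50, 500):
    x_min = run_gradient_descent(num_steps)
    print(num_steps, 'steps -> x:', x_min, 'slope:', df(x_min), 'cost:', f(x_min))
```

More steps bring x closer to -0.5, the slope closer to 0 and the cost closer to 0.75.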

252
00:20:12,010 --> 00:20:15,880
So obviously you can run this thing a thousand times or what have you.
252

253
00:20:15,880 --> 00:20:16,460
Right?
253

254
00:20:16,600 --> 00:20:23,200
But often times you actually know ahead of time how precise a calculation you need, right? From the resource
254

255
00:20:23,200 --> 00:20:25,120
management point of view.
255

256
00:20:25,130 --> 00:20:32,590
What you can actually do is you can tell the loop to stop running once a certain level of precision
256

257
00:20:32,590 --> 00:20:33,820
is met.
257

258
00:20:33,820 --> 00:20:35,920
And I'm sure you're looking up.
258

259
00:20:35,950 --> 00:20:36,180
Yeah.
259

260
00:20:36,190 --> 00:20:41,500
You're scrolling up and you're looking at this while loop here and you're thinking ah yeah the while loop seems
260

261
00:20:41,500 --> 00:20:42,680
ideal for this, right.
261

262
00:20:42,700 --> 00:20:49,930
We can run the while loop as long as our calculation hasn't yet reached a certain level of precision - and you'd
262

263
00:20:49,930 --> 00:20:50,500
be right.
263

264
00:20:50,500 --> 00:20:57,430
That's exactly something you could implement if you wanted to, with the structure of the while loop.
264
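A while loop version could look like the sketch below. It is not the code written in the lesson: the threshold name `precision`, its value, and the `iterations` counter are assumptions, and `df` is the derivative of an assumed example cost x**2 + x + 1.

```python
def df(x):
    # Assumed derivative of an example cost function f(x) = x**2 + x + 1.
    return 2 * x + 1

new_x = 3
step_multiplier = 0.1
precision = 0.0001         # assumed threshold for "precise enough"
step_size = float('inf')   # start above the threshold so the loop runs at least once
iterations = 0             # hypothetical counter, only for inspection

# Keep stepping while the last step was still larger than the threshold.
while step_size > precision:
    previous_x = new_x
    gradient = df(previous_x)
    new_x = previous_x - step_multiplier * gradient
    step_size = abs(new_x - previous_x)
    iterations += 1

print('Stopped after', iterations, 'passes at x =', new_x)
```

This loop terminates only because the step size shrinks on every pass; with a learning rate that is too large it might not, which is the infinite loop risk mentioned earlier.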

265
00:20:57,430 --> 00:21:02,500
Let me show you how to do this with the for loop as well.
265

266
00:21:02,620 --> 00:21:09,190
We're going to modify our code here a little bit to include a cutoff point for a certain level of precision
266

267
00:21:10,300 --> 00:21:17,480
and the way I'm going to do this is by adding another variable up top and say "precision"
267

268
00:21:17,840 --> 00:21:24,820
is gonna be equal to 0.0001.
268

269
00:21:24,880 --> 00:21:25,160
Yeah.
269

270
00:21:25,190 --> 00:21:30,700
So this is how precise I want my answer to be.
270

271
00:21:30,770 --> 00:21:33,670
Now, where does this come into play?
271

272
00:21:33,680 --> 00:21:41,030
Well, what we're interested in with our precision estimate is what's the difference between the new
272

273
00:21:41,030 --> 00:21:42,450
and the old x, right?
273

274
00:21:42,800 --> 00:21:49,280
So if those two are getting closer and closer and closer together then our calculation is getting much
274

275
00:21:49,280 --> 00:21:51,350
more precise.
275

276
00:21:51,380 --> 00:22:01,130
So what we can do is we can say well the step size is gonna be the difference between our new x minus
276

277
00:22:01,220 --> 00:22:03,110
our previous x.
277

278
00:22:03,140 --> 00:22:05,430
Yeah that's gonna be the step size.
278

279
00:22:05,540 --> 00:22:12,110
And just to make sure that step size is always a positive number, we're going to say well what we care
279

280
00:22:12,110 --> 00:22:22,250
about is actually the absolute value of our step size and, uh, now, change the number of times this loop
280

281
00:22:22,250 --> 00:22:25,250
runs to maybe 10.
281

282
00:22:25,260 --> 00:22:25,650
Yeah.
282

283
00:22:25,730 --> 00:22:32,690
And I'm going to print out the step size, just so we can see how it evolves over time as the loop
283

284
00:22:32,690 --> 00:22:34,180
runs.
284

285
00:22:34,200 --> 00:22:34,960
So let me run this.
285

286
00:22:34,970 --> 00:22:36,530
Let me press Shift+Enter here.
286

287
00:22:37,920 --> 00:22:42,870
And we can see here our step size initially starts out with 0.7.
287

288
00:22:42,870 --> 00:22:43,730
And then it decreases.
288

289
00:22:43,740 --> 00:22:44,040
Right?
289

290
00:22:44,040 --> 00:22:48,490
The new x and the old x are getting closer and closer together.
290

291
00:22:48,690 --> 00:22:56,160
So we can see here our step size is decreasing.
291
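A sketch of the loop with the step size printed on each pass, assuming `df` is the derivative of an example cost x**2 + x + 1 (an assumption, as the function is defined earlier in the lesson).

```python
def df(x):
    # Assumed derivative of an example cost function f(x) = x**2 + x + 1.
    return 2 * x + 1

new_x = 3
previous_x = 0
step_multiplier = 0.1

for n in range(10):
    previous_x = new_x
    gradient = df(previous_x)
    new_x = previous_x - step_multiplier * gradient

    # Distance between the new and the old x, always non-negative.
    step_size = abs(new_x - previous_x)
    print(step_size)
```

With these assumed values, the first printed step is about 0.7 and each later step is smaller than the one before.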

292
00:22:56,330 --> 00:22:59,570
Commenting out this print statement so it doesn't execute anymore.
292

293
00:23:00,070 --> 00:23:04,280
And I'm going to add the condition for terminating this for loop.
293

294
00:23:04,360 --> 00:23:13,360
Yeah I'm going to say well if the step size is smaller than the precision,
294

295
00:23:13,360 --> 00:23:20,500
so in other words - if the difference between the new x and the previous x is smaller than 
295

296
00:23:20,500 --> 00:23:27,280
0.0001, then we can terminate our loop, then we can stop with our calculations.
296

297
00:23:27,630 --> 00:23:33,670
So, I'm going to put a colon there and then the Python keyword for stopping this loop
297

298
00:23:33,810 --> 00:23:35,780
it's called break.
298

299
00:23:35,970 --> 00:23:37,530
We'll leave it at that.
299

300
00:23:38,220 --> 00:23:43,540
And uh I'm going to say, well, run 500 times.
300

301
00:23:43,540 --> 00:23:55,790
Yeah, for loop run from 0 to 500, but if the step size is smaller than our predetermined precision,
301

302
00:23:56,930 --> 00:24:04,050
then stop running. Let's see how often our loop runs according to this logic.
302

303
00:24:04,670 --> 00:24:06,940
Well, we don't know, right?
303

304
00:24:06,970 --> 00:24:20,310
Could have run any number of times. We probably have to print the value of n - so, print("Loop ran this
304

305
00:24:20,790 --> 00:24:21,900
many times:", n)
305

306
00:24:30,230 --> 00:24:32,830
Forgot the s.
306

307
00:24:33,150 --> 00:24:37,890
So, given these constraints, our loop ran 40 times.
307

308
00:24:37,980 --> 00:24:40,140
It's actually not that much.
308

309
00:24:40,170 --> 00:24:41,450
Not that many times.
309

310
00:24:41,640 --> 00:24:48,230
If we add an extra zero here on the precision that we're looking for and press Shift+Enter again, we
310

311
00:24:48,240 --> 00:24:52,600
can see that it ran 50 times, so it actually never gets up to 500.
311
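A sketch of the for loop with the early exit, again assuming `df` is the derivative of an example cost x**2 + x + 1. The placement of the print call inside the `if` is also an assumption.

```python
def df(x):
    # Assumed derivative of an example cost function f(x) = x**2 + x + 1.
    return 2 * x + 1

new_x = 3
previous_x = 0
step_multiplier = 0.1
precision = 0.0001

for n in range(500):
    previous_x = new_x
    gradient = df(previous_x)
    new_x = previous_x - step_multiplier * gradient

    step_size = abs(new_x - previous_x)
    if step_size < precision:
        # n is the zero-based loop counter at the moment of the break.
        print('Loop ran this many times:', n)
        break

print('Local minimum occurs at:', new_x)
```

The loop now has two ways to stop: running out of the 500 values from `range`, or hitting `break` once the step is small enough.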

312
00:24:52,600 --> 00:24:53,190
Yeah.
312

313
00:24:53,310 --> 00:25:00,340
Doesn't, doesn't go up all that way and that's because it reaches that terminating condition,
313

314
00:25:00,450 --> 00:25:06,840
this break statement, a lot sooner, but still the way we wrote this code we have two conditions where
314

315
00:25:06,840 --> 00:25:07,880
it can stop.
315

316
00:25:08,100 --> 00:25:16,320
It can either reach 500 and it will stop there or when it reaches the minimum and that step size becomes
316

317
00:25:16,320 --> 00:25:17,800
very, very, very small
317

318
00:25:17,910 --> 00:25:25,200
then it can also terminate. Now running the Python loop and calculating the minimum is very well and
318

319
00:25:25,200 --> 00:25:30,920
good, but I'm a very, very visual person and I'm sure you might be too.
319

320
00:25:30,920 --> 00:25:37,140
So I find graphing things very, very helpful. The way we're gonna go about graphing it is
320

321
00:25:37,140 --> 00:25:46,520
first off we have to kind of keep track of all the values that we've calculated inside of our loop.
321

322
00:25:46,680 --> 00:25:49,080
So we're going to create two lists.
322

323
00:25:49,140 --> 00:25:49,380
Yeah.
323

324
00:25:49,380 --> 00:25:53,890
Two Python lists - one of them is going to hold onto our x values,
324

325
00:25:54,030 --> 00:26:00,720
so it's gonna be a list and it's going to contain the new x values. And the other thing is I'm going
325

326
00:26:00,720 --> 00:26:04,560
to also create a list for all the slopes.
326

327
00:26:04,620 --> 00:26:12,300
So I'm going to call this "slope_list" and it's going to contain whatever value our derivative has at
327

328
00:26:12,630 --> 00:26:14,210
this x position.
328

329
00:26:14,220 --> 00:26:14,460
Yeah.
329

330
00:26:17,260 --> 00:26:21,370
Now, within our loop we're actually doing these calculations anyhow.
330

331
00:26:21,370 --> 00:26:30,440
So all we need to do is we need to append the x values and the slope values to our list.
331

332
00:26:30,490 --> 00:26:38,650
So I'm going to say "x_list.append()" to add a new value to it of the new x value.
332

333
00:26:38,650 --> 00:26:46,700
And this is the x value that we've updated after we've taken our step down the cost function.
333

334
00:26:46,810 --> 00:26:56,590
So I'm going to append this value to our list and also for our slope list we're going to append
334

335
00:27:00,620 --> 00:27:06,110
the output from our derivative function at the new x value.
335

336
00:27:06,110 --> 00:27:06,940
Yeah.
336

337
00:27:07,220 --> 00:27:08,510
And that's it.
337
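A minimal sketch of the bookkeeping described above: two Python lists that grow by one entry on every pass through the loop. The names `x_list` and `slope_list` come from the lesson. The cost function, `step_multiplier` and `precision` are assumed for illustration.

```python
# Assumed derivative of the assumed cost function f(x) = x**2 + x + 1.
def df(x):
    return 2*x + 1

new_x = 3
step_multiplier = 0.1   # assumed learning rate
precision = 0.0001      # assumed stopping threshold

x_list = [new_x]            # every x value visited, starting point included
slope_list = [df(new_x)]    # the derivative's value at each of those x values

for n in range(500):
    previous_x = new_x
    gradient = df(previous_x)
    new_x = previous_x - step_multiplier * gradient

    x_list.append(new_x)            # the updated x after taking the step
    slope_list.append(df(new_x))    # the slope at that new x

    if abs(new_x - previous_x) < precision:
        break

print(len(x_list), "x values and", len(slope_list), "slopes recorded")
```

Both lists always have the same length, which is what lets them be plotted against each other later.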

338
00:27:08,510 --> 00:27:11,040
That gives us the basis for plotting our charts.
338

339
00:27:11,120 --> 00:27:12,040
So let's do that now.
339

340
00:27:12,590 --> 00:27:23,890
I'm gonna go up here and I'm actually going to copy this cell here, I'm going to go "Edit" > "Copy Cell" and I'm going to reuse
340

341
00:27:23,890 --> 00:27:25,730
a lot of this code down here.
341

342
00:27:25,750 --> 00:27:36,730
So I'm going to place the cell above. I'm going to edit my comment to, uh, say that we're gonna superimpose the
342

343
00:27:38,890 --> 00:27:40,540
gradient descent calculations.
343

344
00:27:40,570 --> 00:27:40,760
Yeah
344

345
00:27:48,910 --> 00:27:57,520
So this is the goal. Now we've got two charts and what we're gonna do is we're gonna add a scatter plot on
345

346
00:27:57,520 --> 00:28:01,090
top of these with the data that we've captured from our loop.
346

347
00:28:01,630 --> 00:28:02,520
Here's how we're gonna do it.
347

348
00:28:08,040 --> 00:28:09,370
For our first chart,
348

349
00:28:09,390 --> 00:28:15,930
we're gonna say "plt.scatter()" and then we have to supply some arguments.
349

350
00:28:16,350 --> 00:28:27,520
So on the x axis, it's gonna be our list of x values and for our y axis we want to feed our list of values
350

351
00:28:28,060 --> 00:28:31,100
into our cost function.
351

352
00:28:31,160 --> 00:28:31,350
Right?
352

353
00:28:31,360 --> 00:28:33,620
So this is our f(x).
353

354
00:28:33,760 --> 00:28:40,870
Now you might think I can actually just put the x_list in here and press Shift+Enter but this isn't
354

355
00:28:40,870 --> 00:28:42,810
going to work. I'm going to get an error.
355

356
00:28:43,240 --> 00:28:51,340
Yeah, and this is because our function, the way that we've written it, cannot process a list.
356

357
00:28:51,340 --> 00:28:55,630
It's unable to process this list as it is.
357

358
00:28:55,630 --> 00:29:01,960
So I'm gonna have to do a little type conversion first. So I'm going to create a variable called values
358

359
00:29:02,050 --> 00:29:11,170
and set it equal to a numpy array which is gonna take as an argument our list of x values.
359

360
00:29:11,320 --> 00:29:22,360
So, our function can work with an array but it can't work with a list and when I press Shift+Enter we should
360

361
00:29:22,360 --> 00:29:29,500
see that now we have a little scatter plot on top of our graph.
361
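Here is a small sketch of why the type conversion is needed, assuming the cost function is written with plain arithmetic such as `x**2 + x + 1` (an assumption, since the function itself is defined earlier in the lesson). The three sample x values are made up for the demonstration.

```python
import numpy as np

# Assumed cost function, written with ordinary arithmetic operators.
def f(x):
    return x**2 + x + 1

x_list = [3, 2.3, 1.74]     # sample values standing in for the recorded list

try:
    f(x_list)               # a plain list does not support ** with a number
except TypeError as error:
    print("The list failed:", error)

values = np.array(x_list)   # type conversion: Python list -> NumPy array
print(f(values))            # the arithmetic is now applied element by element
```

A NumPy array applies `**` and `+` to each element, so the same function body works on a whole array of x values at once.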

362
00:29:29,500 --> 00:29:33,640
But in terms of data visualization that was very, very poor.
362

363
00:29:33,760 --> 00:29:34,030
Right?
363

364
00:29:34,150 --> 00:29:43,550
So I'm going to say the color of these dots should be red.
364

365
00:29:43,660 --> 00:29:52,000
They should be a lot larger, so set the size equal to maybe 100, and give them a little bit of transparency.
365

366
00:29:52,000 --> 00:29:55,870
So I'm going to say the alpha should be equal to maybe 0.6.
366

367
00:29:55,870 --> 00:29:58,690
See how that looks.
367

368
00:29:58,820 --> 00:30:00,190
It's looking a lot better.
368
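Put together, the styled scatter plot on top of the cost function might look something like this. The `color`, `s` and `alpha` values are the ones mentioned in the lesson. The cost function, the plotting range `x_1`, and the loop settings are assumptions for the sake of a runnable sketch.

```python
import matplotlib.pyplot as plt
import numpy as np

# Assumed cost function and derivative.
def f(x):
    return x**2 + x + 1

def df(x):
    return 2*x + 1

# Record the x values visited by gradient descent (assumed settings).
new_x = 3
x_list = [new_x]
for n in range(500):
    previous_x = new_x
    new_x = previous_x - 0.1 * df(previous_x)
    x_list.append(new_x)
    if abs(new_x - previous_x) < 0.0001:
        break

x_1 = np.linspace(start=-3, stop=3, num=100)   # assumed x range for the curve
values = np.array(x_list)                      # list -> array so f() can use it

plt.plot(x_1, f(x_1), color="blue", linewidth=3)
# Red, larger (s=100) and slightly transparent (alpha=0.6) dots.
plt.scatter(x_list, f(values), color="red", s=100, alpha=0.6)
```

In a notebook the figure renders when the cell runs. In a plain script you would add `plt.show()` at the end.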

369
00:30:00,340 --> 00:30:00,510
Yeah.
369

370
00:30:00,520 --> 00:30:10,740
So we can see here how, as our algorithm runs, it's going closer and closer to this minimum. But we can also show
370

371
00:30:10,740 --> 00:30:13,940
this on our second chart as well.
371

372
00:30:13,980 --> 00:30:14,250
Right.
372

373
00:30:14,280 --> 00:30:24,270
So we can see how we're inching closer to where the slope is zero on this right hand chart and we can
373

374
00:30:24,270 --> 00:30:29,880
do that by making use of the other list that we've captured.
374

375
00:30:29,880 --> 00:30:39,930
So, in this case it is a little bit simpler because we just have to write "plt.scatter(
375

376
00:30:40,740 --> 00:30:47,670
x_list)", x values still the same, but for the y values we've done a bit of a calculation already, so
376

377
00:30:47,670 --> 00:30:52,800
we can say "slope_list"
377

378
00:30:57,000 --> 00:30:59,540
and let's also make it a red color,
378

379
00:31:02,920 --> 00:31:13,290
make the dots big, size 100, and alpha 0.5 or something.
379

380
00:31:13,290 --> 00:31:17,490
Let's see how it goes.
380

381
00:31:17,930 --> 00:31:26,030
It's looking not bad, but I do wonder if there maybe should be some transparency on the line itself.
381

382
00:31:26,030 --> 00:31:36,450
So if this thing had an alpha of say 0.6, would it look a bit better? Yeah.
382

383
00:31:36,780 --> 00:31:45,960
Yeah, this looks better. Let's do the same thing with our plot at the top as well.
383

384
00:31:46,030 --> 00:31:52,110
Let's give this an alpha of maybe 0.6 as well.
384

385
00:31:52,110 --> 00:31:55,520
See or 0.7 perhaps.
385

386
00:31:58,320 --> 00:31:59,420
0.8
386

387
00:31:59,580 --> 00:32:00,050
Let's try.
387

388
00:32:00,870 --> 00:32:01,270
Yeah.
388

389
00:32:01,350 --> 00:32:03,720
This is looking pretty good.
389

390
00:32:03,720 --> 00:32:11,700
So you can see here that now we have our scatter plot superimposed on our derivative function and it
390

391
00:32:11,700 --> 00:32:15,930
stops when the slope is equal to zero.
391

392
00:32:15,930 --> 00:32:21,900
And on the regular cost function we're moving down and down and down and down into the minimum at the
392

393
00:32:21,900 --> 00:32:24,900
bottom of this parabola.
393
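The two charts with both scatter plots superimposed could be sketched as below. The dot and line styling follows the values tried in the lesson. The cost function, the chart titles, the curve colors and the loop settings are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

# Assumed cost function and derivative.
def f(x):
    return x**2 + x + 1

def df(x):
    return 2*x + 1

# Gradient descent with recording (assumed settings).
new_x = 3
x_list, slope_list = [new_x], [df(new_x)]
for n in range(500):
    previous_x = new_x
    new_x = previous_x - 0.1 * df(previous_x)
    x_list.append(new_x)
    slope_list.append(df(new_x))
    if abs(new_x - previous_x) < 0.0001:
        break

x_1 = np.linspace(start=-3, stop=3, num=100)

plt.figure(figsize=[15, 5])

# Chart 1: cost function with the visited points on top.
plt.subplot(1, 2, 1)
plt.title("Cost function")
plt.plot(x_1, f(x_1), color="blue", linewidth=3, alpha=0.8)
plt.scatter(x_list, f(np.array(x_list)), color="red", s=100, alpha=0.6)

# Chart 2: derivative with the recorded slopes on top.
plt.subplot(1, 2, 2)
plt.title("Slope of the cost function")
plt.grid()
plt.plot(x_1, df(x_1), color="skyblue", linewidth=5, alpha=0.6)
# slope_list already holds the y values, so no conversion is needed here.
plt.scatter(x_list, slope_list, color="red", s=100, alpha=0.5)
```

On the right-hand chart the dots march along the derivative line toward the point where the slope is zero.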

394
00:32:24,940 --> 00:32:30,420
You know the cool thing is that we can even zoom in a little bit and we can even do a little close up
394

395
00:32:30,960 --> 00:32:33,300
of our slope.
395

396
00:32:33,300 --> 00:32:34,290
Let me show you what I mean.
396

397
00:32:34,710 --> 00:32:46,920
So if I take this bit of code here, copy it and paste it and say chart number three, and call this
397

398
00:32:46,920 --> 00:32:47,280
"Derivative
398

399
00:32:50,020 --> 00:33:03,190
(Close up)", and then I change the title and say "Gradient Descent (Close up)". I might get rid of the y label.
399

400
00:33:03,580 --> 00:33:11,380
Don't need that. I'm going to keep the grid, but I'm going to change what's on the axes and go from, say,
400

401
00:33:12,400 --> 00:33:18,550
0.55 to -0.2.
401

402
00:33:18,550 --> 00:33:24,200
So, zooming in here on the x axis and on the y axis I'm going to do the same,
402

403
00:33:24,200 --> 00:33:33,660
I'm going to zoom in from 0.3 to 0.8. I'm still gonna leave it sky blue.
403

404
00:33:33,660 --> 00:33:41,710
Change the linewidth to 6, alpha is 0.8.
404

405
00:33:41,920 --> 00:33:48,550
Change these values around a little bit to make it a bit more distinct, make the dots a little bigger.
405

406
00:33:48,700 --> 00:34:00,670
And if I press Shift+Enter now, then nothing will happen because I need to adjust my subplot.
406

407
00:34:00,680 --> 00:34:00,850
Right.
407

408
00:34:00,860 --> 00:34:02,300
I'm adding a third plot here.
408

409
00:34:02,330 --> 00:34:09,090
So I have to make sure that I have in this case, what, three columns, right, I've got three charts.
409

410
00:34:09,140 --> 00:34:12,390
This is chart number three of the lot.
410

411
00:34:12,440 --> 00:34:18,710
And this is gonna be also edited to chart number two, right?
411

412
00:34:19,750 --> 00:34:26,840
On the three column subplot, and same with this. This is chart number one on the three column subplot.
412

413
00:34:27,030 --> 00:34:29,890
And it's now that I can run this.
413

414
00:34:29,890 --> 00:34:32,270
See what happens.
414

415
00:34:32,570 --> 00:34:33,100
Huh.
415

416
00:34:33,160 --> 00:34:34,750
So I'd say this is pretty good, right?
416

417
00:34:34,780 --> 00:34:42,190
We've got a close up here where we can actually watch the gradient descent converge upon that zero value.
417

418
00:34:42,190 --> 00:34:47,980
And you can see those steps getting smaller and smaller and smaller and smaller as we're getting closer
418

419
00:34:47,980 --> 00:34:49,050
to our goal.
419

420
00:34:49,060 --> 00:34:51,530
I think this is incredibly cool.
420

421
00:34:51,700 --> 00:34:54,300
The charts look a little bit squished.
421

422
00:34:54,640 --> 00:35:01,150
Maybe what I'll do is I'll change this from 15 to, I don't know, 20 on the width.
422

423
00:35:01,150 --> 00:35:09,040
See if that helps. Yeah that definitely looks a little better.
423
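A sketch of the three-column layout with the close-up as chart number three. The figure width of 20, the sky blue line, the linewidth of 6 and the alpha of 0.8 follow the lesson. The cost function is assumed, and the zoom limits below were chosen for that assumed parabola (minimum at x = -0.5), so they may differ from the ones used on screen.

```python
import matplotlib.pyplot as plt
import numpy as np

# Assumed cost function and derivative.
def f(x):
    return x**2 + x + 1

def df(x):
    return 2*x + 1

# Gradient descent with recording (assumed settings).
new_x = 3
x_list, slope_list = [new_x], [df(new_x)]
for n in range(500):
    previous_x = new_x
    new_x = previous_x - 0.1 * df(previous_x)
    x_list.append(new_x)
    slope_list.append(df(new_x))
    if abs(new_x - previous_x) < 0.0001:
        break

x_1 = np.linspace(start=-3, stop=3, num=100)

plt.figure(figsize=[20, 5])       # widened from 15 so three charts fit

plt.subplot(1, 3, 1)              # chart 1 of 3: cost function
plt.plot(x_1, f(x_1), color="blue", linewidth=3, alpha=0.8)
plt.scatter(x_list, f(np.array(x_list)), color="red", s=100, alpha=0.6)

plt.subplot(1, 3, 2)              # chart 2 of 3: derivative
plt.grid()
plt.plot(x_1, df(x_1), color="skyblue", linewidth=5, alpha=0.6)
plt.scatter(x_list, slope_list, color="red", s=100, alpha=0.5)

plt.subplot(1, 3, 3)              # chart 3 of 3: derivative, zoomed in
plt.title("Gradient Descent (Close up)")
plt.grid()
plt.xlim(-0.55, -0.2)             # assumed zoom window on the x axis
plt.ylim(-0.3, 0.8)               # assumed zoom window on the y axis
plt.plot(x_1, df(x_1), color="skyblue", linewidth=6, alpha=0.8)
plt.scatter(x_list, slope_list, color="red", s=300, alpha=0.6)
```

All three `plt.subplot` calls have to agree on the grid (1 row, 3 columns), otherwise the charts overwrite or misplace each other.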

424
00:35:09,090 --> 00:35:10,350
Okay brilliant.
424

425
00:35:10,380 --> 00:35:12,990
We've done quite a lot of work in this lesson.
425

426
00:35:12,990 --> 00:35:21,360
This has been a long and difficult lesson but writing the code definitely helps us play around with
426

427
00:35:21,360 --> 00:35:22,800
the gradient descent.
427

428
00:35:22,800 --> 00:35:31,050
Yeah, because what we can do now is we can change a couple of these values and see how it behaves differently.
428

429
00:35:31,380 --> 00:35:41,010
So, for example, instead of at 3, we could start at -3 with our gradient descent.
429

430
00:35:41,010 --> 00:35:41,940
Let's take a look here.
430

431
00:35:41,950 --> 00:35:50,040
If my starting value is -3 and I rerun the loop and rerun all the calculations and rerun the
431

432
00:35:50,040 --> 00:35:56,040
graphs then we can see how the gradient descent comes in from the other side.
432

433
00:35:56,040 --> 00:36:01,250
So in this case it's from the bottom here instead of from the top.
433

434
00:36:01,530 --> 00:36:08,330
It's really, really cool being able to actually play with the algorithm.
434

435
00:36:08,330 --> 00:36:16,140
And this is the advantage of writing all the code out and actually running it and rerunning it to see
435

436
00:36:16,140 --> 00:36:23,400
how differently it behaves. Because not only can we change the starting point but we can also change
436

437
00:36:23,660 --> 00:36:25,280
say how many steps we're taking, right?
437

438
00:36:25,290 --> 00:36:33,030
So if we rerun our algorithm to only run about 10 times instead of the usual amount then we can see
438

439
00:36:33,030 --> 00:36:36,380
how we're not getting that close to the minimum.
439

440
00:36:36,380 --> 00:36:36,660
Right?
440

441
00:36:36,690 --> 00:36:41,780
So we should be getting about here, but we're actually not reaching it.
441
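One way to make these experiments easy to repeat is to wrap the loop in a small function. The helper `run_descent` below is hypothetical and not something written in this lesson. The cost function's derivative and the default settings are assumptions as well.

```python
# Assumed derivative of the assumed cost function f(x) = x**2 + x + 1.
def df(x):
    return 2*x + 1

def run_descent(start, max_iter=500, step_multiplier=0.1, precision=0.0001):
    """Hypothetical helper: run the lesson's loop and return every x visited."""
    new_x = start
    x_list = [new_x]
    for n in range(max_iter):
        previous_x = new_x
        new_x = previous_x - step_multiplier * df(previous_x)
        x_list.append(new_x)
        if abs(new_x - previous_x) < precision:
            break
    return x_list

from_right = run_descent(start=3)              # approaches from the right
from_left = run_descent(start=-3)              # approaches from the other side
short_run = run_descent(start=3, max_iter=10)  # cut off after 10 steps

print("Final x starting at  3:", from_right[-1])
print("Final x starting at -3:", from_left[-1])
print("Final x after only 10 steps:", short_run[-1])
```

For the assumed parabola, both full runs end up close to the same minimum, while the 10-step run stops noticeably short of it.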

442
00:36:41,960 --> 00:36:44,360
So yeah, I think this is really really cool.
442

443
00:36:44,450 --> 00:36:50,400
And in the next couple of lessons we're going to be exploring a couple more of the idiosyncrasies and
443

444
00:36:50,400 --> 00:36:53,810
the strengths and weaknesses of this algorithm
444

445
00:36:53,810 --> 00:36:58,720
now that we've written it and graphed it. I'll see you there.