1
00:00:00,300 --> 00:00:03,033
Hello and welcome back to the course
on Machine Learning.

2
00:00:03,033 --> 00:00:05,900
Today we're talking about the upper
confidence bound.

3
00:00:05,900 --> 00:00:09,433
And the intuition
behind this algorithm of the

4
00:00:09,433 --> 00:00:12,266
reinforcement learning branch of machine learning.

5
00:00:12,266 --> 00:00:13,700
So let's get started.

6
00:00:13,700 --> 00:00:16,400
As we discussed previously,
the problem we are solving

7
00:00:16,400 --> 00:00:19,766
is a multi-armed bandit problem
where you've got

8
00:00:20,166 --> 00:00:23,633
five or more
or any number of slot machines, and

9
00:00:23,833 --> 00:00:27,833
you can bet your money in any one of them,
and you need to find out

10
00:00:28,200 --> 00:00:31,266
how to bet to maximize

11
00:00:31,266 --> 00:00:33,133
your returns.

12
00:00:33,133 --> 00:00:36,466
And basically,
we agreed that behind every machine

13
00:00:36,466 --> 00:00:39,166
there is a certain distribution.

14
00:00:39,166 --> 00:00:42,733
And because you don't know
which of these distributions is optimal,

15
00:00:42,966 --> 00:00:43,366
you need

16
00:00:43,366 --> 00:00:47,400
to combine exploration of these machines
with their

17
00:00:47,400 --> 00:00:51,000
exploitation in order to find out which

18
00:00:51,000 --> 00:00:54,900
one of these machines is the best, and
then you can start exploiting that one.

19
00:00:55,566 --> 00:00:59,766
The modern application of this problem
is, of course, advertising.

20
00:00:59,766 --> 00:01:03,233
So if you have 5 or 10 or 50 or 500
different ads,

21
00:01:03,600 --> 00:01:07,166
how do you find out
which one is the best one?

22
00:01:07,400 --> 00:01:12,100
Of course, you can just run an A/B test
and then use the results of that.

23
00:01:12,100 --> 00:01:14,400
But that means
you're doing the exploration

24
00:01:14,400 --> 00:01:16,133
and then you're doing the exploitation
separately.

25
00:01:16,133 --> 00:01:17,466
You're going to incur lots of costs.

26
00:01:17,466 --> 00:01:20,333
You're going to incur,
you're going to waste a lot of time.

27
00:01:20,333 --> 00:01:23,466
We want to combine
exploration and exploitation and get to

28
00:01:23,466 --> 00:01:29,733
the optimal result as soon as we can
and maximize the output of our efforts.

29
00:01:30,166 --> 00:01:30,533
All right.

30
00:01:30,533 --> 00:01:35,666
So, this is a quick
summary of the multi-armed bandit problem.

31
00:01:35,900 --> 00:01:38,100
So let's go through this very quickly
so we can get to

32
00:01:38,100 --> 00:01:39,066
the fun stuff.

33
00:01:39,066 --> 00:01:40,466
So we have the arms.

34
00:01:40,466 --> 00:01:45,333
For example, the arms are ads that we display
each time a user comes to a web page.

35
00:01:45,600 --> 00:01:48,366
Each time an ad is displayed,
or a user visits this page,

36
00:01:48,366 --> 00:01:51,500
that's a round. For each round,

37
00:01:52,200 --> 00:01:53,800
we choose which ad to display.

38
00:01:53,800 --> 00:01:57,966
So you can only display one ad,
like with one-armed bandits.

39
00:01:57,966 --> 00:01:59,700
You can only pull one of those arms.

40
00:01:59,700 --> 00:02:02,666
You can only choose one machine
to bet on, at each round.

41
00:02:02,666 --> 00:02:06,466
And ad i gives a reward,
which is either 0 or 1.

42
00:02:06,800 --> 00:02:12,400
And basically, the reward
of ad i at round n is equal to one

43
00:02:12,400 --> 00:02:15,666
if the user clicks on the ad and zero
if the user didn't.

44
00:02:15,866 --> 00:02:18,266
And our goal is to maximize
the total reward

45
00:02:18,266 --> 00:02:19,633
we get over many rounds.
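
[Editor's sketch] The setup described above can be written down in a few lines of Python. This is only an illustration under stated assumptions: the click rates below are made-up numbers (in the real problem they are unknown to us), and the function name is hypothetical. It shows one ad per round, collects a reward of 0 or 1, and adds up the total reward for a policy that simply picks ads at random.

```python
import random

def simulate_random_policy(click_rates, n_rounds, seed=0):
    """Show one randomly chosen ad per round and return the total reward."""
    rng = random.Random(seed)
    total_reward = 0
    for _ in range(n_rounds):
        ad = rng.randrange(len(click_rates))  # only one arm can be pulled per round
        # Reward is 1 if the user clicks, 0 otherwise (Bernoulli draw).
        reward = 1 if rng.random() < click_rates[ad] else 0
        total_reward += reward
    return total_reward

# Hypothetical click rates for five ads; the agent never sees these values.
click_rates = [0.05, 0.13, 0.09, 0.16, 0.22]
total = simulate_random_policy(click_rates, n_rounds=10_000)
```

Random selection is the baseline that a smarter strategy should beat, because it never learns which ad is best.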

46
00:02:19,633 --> 00:02:22,366
So that's basically what we're doing.

47
00:02:22,366 --> 00:02:26,400
And this is how the upper confidence
bound algorithm works.

48
00:02:26,700 --> 00:02:27,333
And I

49
00:02:27,333 --> 00:02:29,933
won't go into too much detail on this
because

50
00:02:29,933 --> 00:02:32,966
actually, Hadelin is going
to run you through this, and

51
00:02:33,200 --> 00:02:34,433
you're going to code

52
00:02:34,433 --> 00:02:39,200
this from scratch in R,
and you can code this also in Python

53
00:02:39,600 --> 00:02:41,100
in the following lectures, of course.

54
00:02:41,100 --> 00:02:42,533
So we're not going to

55
00:02:42,533 --> 00:02:44,100
spend time on this.
We're actually going to get

56
00:02:44,100 --> 00:02:46,033
to the essence of the algorithm. Right.

57
00:02:46,033 --> 00:02:50,833
So, let's get to the intuition part,
which is, how does it work?

58
00:02:50,833 --> 00:02:54,100
What's actually happening in the
background when this algorithm is running?

59
00:02:54,600 --> 00:02:56,000
All right, so let's have a look.

60
00:02:56,000 --> 00:02:59,633
These are our slot machines,
or one-armed bandits.

61
00:02:59,900 --> 00:03:01,500
And each

62
00:03:01,500 --> 00:03:05,400
one of them has a distribution behind it.
We want to find the best one.

63
00:03:05,400 --> 00:03:06,200
Right. Looking at them,

64
00:03:06,200 --> 00:03:08,433
we can't tell which one it is,
but let's say

65
00:03:08,433 --> 00:03:09,866
we do know. Let's say

66
00:03:09,866 --> 00:03:11,533
we know the end result.

67
00:03:11,533 --> 00:03:14,333
Just for argument's sake,
what would it look like?

68
00:03:14,333 --> 00:03:15,833
Well, this is.

69
00:03:15,833 --> 00:03:17,800
For instance, in this case, the
distributions.

70
00:03:17,800 --> 00:03:20,100
These are the distributions behind
those machines.

71
00:03:20,100 --> 00:03:22,933
You've got, you know, the machine.

72
00:03:22,933 --> 00:03:25,933
This is how they're spitting out
the results with these distributions.

73
00:03:25,933 --> 00:03:27,700
And just by looking at this you can

74
00:03:27,700 --> 00:03:30,700
you can tell right away
which one is the best machine.

75
00:03:30,733 --> 00:03:33,100
Which one would you bet your money on?

76
00:03:33,100 --> 00:03:36,300
constantly
if you were playing around with this one.

77
00:03:36,300 --> 00:03:36,533
Right.

78
00:03:36,533 --> 00:03:41,166
So right away here you can see that
this one has the best return.

79
00:03:41,166 --> 00:03:43,033
And you would want to just,

80
00:03:43,033 --> 00:03:46,700
well, all the time, just bet on this one,
and your outcome would be the best.

81
00:03:47,433 --> 00:03:48,633
But we don't know that, right?

82
00:03:48,633 --> 00:03:49,933
We don't know that.

83
00:03:49,933 --> 00:03:53,566
And we want to find that out
in the process of playing

84
00:03:53,566 --> 00:03:57,266
these machines,
or using those ads that we're running,

85
00:03:57,533 --> 00:04:00,533
and find out, you know,
which one is getting the most clicks?

86
00:04:00,833 --> 00:04:04,166
We don't have
the time and money

87
00:04:04,333 --> 00:04:09,033
to do that exploration before
the actual campaign is running.

88
00:04:09,033 --> 00:04:10,266
We want to do that in the process.

89
00:04:10,266 --> 00:04:13,700
We want to maximize our return
already from the very start.

90
00:04:13,833 --> 00:04:14,900
So how do we do that?

91
00:04:14,900 --> 00:04:19,200
Well, let's transfer these distributions
or the actual expected return

92
00:04:19,433 --> 00:04:21,533
from these distributions onto
a vertical axis.

93
00:04:21,533 --> 00:04:23,800
So we're going to take these values

94
00:04:23,800 --> 00:04:26,800
and we're going to put them
onto a vertical axis over here.

95
00:04:26,800 --> 00:04:28,233
So there's our vertical axis.

96
00:04:28,233 --> 00:04:31,100
So for distribution one, let's say
that's the value there. For distribution

97
00:04:31,100 --> 00:04:33,133
two, there was a value, if you remember.

98
00:04:33,133 --> 00:04:35,700
It was lower. Distribution
three, even lower. Distribution

99
00:04:35,700 --> 00:04:39,066
four, higher. And five, the best, right?

100
00:04:39,066 --> 00:04:42,300
So those are the expected

101
00:04:42,300 --> 00:04:45,800
values or returns
for each of those distributions,

102
00:04:45,800 --> 00:04:48,066
for each of those machines.
That's on our y-axis.

103
00:04:48,066 --> 00:04:49,600
But again we don't know that.

104
00:04:49,600 --> 00:04:51,900
So how does this algorithm work?

105
00:04:51,900 --> 00:04:53,266
Well, it assumes

106
00:04:53,266 --> 00:04:55,366
some starting point
for every distribution.

107
00:04:55,366 --> 00:04:56,300
It just assumes that

108
00:04:56,300 --> 00:04:59,266
there is a certain starting value.

109
00:04:59,266 --> 00:05:02,633
Okay, let's just assume that,
because we can't distinguish, we can't

110
00:05:02,733 --> 00:05:04,966
discriminate between these machines
in any way.

111
00:05:04,966 --> 00:05:06,000
They all look the same.

112
00:05:06,000 --> 00:05:08,633
Let's assume
that they all have the same return.

113
00:05:08,633 --> 00:05:10,366
And let's put it on that level.

114
00:05:10,366 --> 00:05:14,100
Now then what the algorithm does is,

115
00:05:14,100 --> 00:05:17,466
those formulas
that are behind the algorithm, they

116
00:05:17,966 --> 00:05:19,833
create a

117
00:05:19,833 --> 00:05:22,166
confidence band. And it's,

118
00:05:23,233 --> 00:05:24,066
it is

119
00:05:24,066 --> 00:05:27,466
designed
in such a way that, with a very high level

120
00:05:27,466 --> 00:05:28,466
of certainty, that

121
00:05:28,466 --> 00:05:31,866
confidence band will include the actual,

122
00:05:32,333 --> 00:05:37,633
will include the actual return
or the actual expected return.

123
00:05:37,633 --> 00:05:40,633
So basically, the first couple of

124
00:05:40,866 --> 00:05:43,300
rounds are going to be trial runs.

125
00:05:43,300 --> 00:05:45,600
So we're going to intentionally just
try out

126
00:05:45,600 --> 00:05:51,166
the machines at least one time each
in order for us to be able to place this

127
00:05:51,166 --> 00:05:54,066
value here and come up with a confidence
band,

128
00:05:54,066 --> 00:05:55,500
which is going to be very large.

129
00:05:55,500 --> 00:05:58,800
So at the very start, it's very large,
but it is designed specifically in a way

130
00:05:58,800 --> 00:06:04,100
that, the expected value,
which is this one over here

131
00:06:04,866 --> 00:06:09,100
with a very high level of confidence,
falls inside this confidence

132
00:06:09,466 --> 00:06:10,200
with a very high,

133
00:06:10,200 --> 00:06:13,133
with a very high degree of certainty,
falls inside this confidence bound,

134
00:06:13,133 --> 00:06:16,133
which is built around this,

135
00:06:16,133 --> 00:06:19,133
red empirical value which we have derived.

136
00:06:19,300 --> 00:06:21,000
And at the very start, it's
all the same, right?

137
00:06:21,000 --> 00:06:22,800
So then, how does this algorithm work?

138
00:06:22,800 --> 00:06:25,066
Well, out of all of them, we pick the,

139
00:06:25,066 --> 00:06:27,766
the machine with the highest confidence
bound.

140
00:06:27,766 --> 00:06:29,900
Right now,
it can be any of these machines, right?

141
00:06:29,900 --> 00:06:32,400
They all have the same confidence bound,
and what we're talking about is

142
00:06:32,400 --> 00:06:34,133
the upper confidence bound.

143
00:06:34,133 --> 00:06:37,133
That's why the algorithm is called
upper confidence bound.

144
00:06:37,500 --> 00:06:39,333
And so we're just going to pick
any one of them

145
00:06:39,333 --> 00:06:40,800
because it doesn't matter
which one we pick.

146
00:06:40,800 --> 00:06:43,400
Again, we don't know these blue,
these colored lines.

147
00:06:43,400 --> 00:06:44,933
We don't know about them. All we see,

148
00:06:46,333 --> 00:06:47,900
as the

149
00:06:47,900 --> 00:06:51,300
agent or as the person analyzing this,

150
00:06:51,300 --> 00:06:53,700
we only see these boxes, and
they're all identical to us.

151
00:06:53,700 --> 00:06:55,900
So we just pick any one of them.
Let's say we pick this one.

152
00:06:55,900 --> 00:06:56,600
So what happens

153
00:06:56,600 --> 00:06:57,466
next is we actually

154
00:06:57,466 --> 00:07:00,866
pull the lever of that machine
and something happens,

155
00:07:00,866 --> 00:07:02,400
or we place that ad, right.

156
00:07:02,400 --> 00:07:05,700
So we display that ad next
and we want to see

157
00:07:05,733 --> 00:07:08,200
did the person click on it,
or did the person not click on it?

158
00:07:08,200 --> 00:07:12,800
And in this case, 
the person didn't click on it.

159
00:07:12,800 --> 00:07:13,000
Right.

160
00:07:13,000 --> 00:07:18,466
So this red value goes down,
because

161
00:07:18,600 --> 00:07:21,700
now we have another observation
just for this machine that is added

162
00:07:21,700 --> 00:07:25,566
to the whole
sample of observations for this machine.

163
00:07:25,566 --> 00:07:28,533
And the value goes down because, well.

164
00:07:28,533 --> 00:07:29,833
this red

165
00:07:29,833 --> 00:07:33,733
value is like the observed average.

166
00:07:33,733 --> 00:07:35,900
The observed average,
according

167
00:07:35,900 --> 00:07:37,266
to the law of large numbers, is always

168
00:07:37,266 --> 00:07:39,333
going to, in the long run, is

169
00:07:39,333 --> 00:07:41,100
going to converge to the

170
00:07:41,100 --> 00:07:46,800
expected,
expected return or expected average

171
00:07:46,800 --> 00:07:50,366
or expected value for this distribution.

172
00:07:51,133 --> 00:07:53,600
So, therefore

173
00:07:53,600 --> 00:07:56,266
it is very likely
that this value is going to go down.

174
00:07:56,266 --> 00:07:57,833
And now, because we

175
00:07:57,833 --> 00:08:01,900
have an extra observation,
the second thing that happens is the confidence

176
00:08:01,900 --> 00:08:03,866
bound, the confidence interval:

177
00:08:03,866 --> 00:08:06,766
you see that the confidence interval
becomes smaller.

178
00:08:06,766 --> 00:08:08,100
simply because we have

179
00:08:08,100 --> 00:08:10,566
an additional observation. Of course, it
doesn't become that much smaller.

180
00:08:10,566 --> 00:08:14,100
But this is just to illustrate a point:
because

181
00:08:14,100 --> 00:08:17,533
we have an additional observation,
we are more confident in our predictions.

182
00:08:17,533 --> 00:08:19,633
We are more confident in everything
that's going on.

183
00:08:19,633 --> 00:08:23,533
So the confidence interval
slowly starts to shrink.
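
[Editor's sketch] To make the shrinking interval concrete, here is a small Python illustration. The lecture does not state the formula, so the radius used here, sqrt(1.5 * ln(n) / N_i), is an assumption: it is one common UCB1-style choice, where n is the current round and N_i is the number of times machine i has been played. The function names are hypothetical.

```python
import math

def confidence_radius(n_round, n_selections):
    """Half-width of the confidence interval (assumed UCB1-style form)."""
    return math.sqrt(1.5 * math.log(n_round) / n_selections)

def upper_confidence_bound(sum_rewards, n_selections, n_round):
    """Observed average reward plus the confidence radius."""
    average = sum_rewards / n_selections
    return average + confidence_radius(n_round, n_selections)

# At a fixed round, the radius shrinks as a machine collects more observations.
radii = [confidence_radius(1000, k) for k in (1, 10, 100)]
```

With this form, each extra observation of a machine makes its own interval narrower, which is the shrinking described above.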

184
00:08:23,933 --> 00:08:24,200
All right.

185
00:08:24,200 --> 00:08:27,833
So the next step is now we find the next
one with the highest confidence bound.

186
00:08:28,033 --> 00:08:29,066
So this is not this one.

187
00:08:29,066 --> 00:08:31,333
It's one of these
four. Just picking a random one.

188
00:08:31,333 --> 00:08:34,433
There we go, this one. We do the same thing.

189
00:08:34,433 --> 00:08:40,433
So again, the ad is displayed,
a person either clicks or doesn't click.

190
00:08:40,566 --> 00:08:44,466
And that affects the average that we've
measured so far, the empirical average or,

191
00:08:44,733 --> 00:08:47,466
if you've pulled the lever,
you've got a certain,

192
00:08:47,466 --> 00:08:49,000
you know, you either won or you lost.

193
00:08:49,000 --> 00:08:52,100
And that affects your
empirical average, this red line.

194
00:08:52,366 --> 00:08:57,700
And as expected, it slowly starts
to converge. Over, like, lots of iterations,

195
00:08:57,700 --> 00:09:01,400
it will start to converge to the
expected value.

196
00:09:02,100 --> 00:09:03,566
So it comes closer, and right

197
00:09:03,566 --> 00:09:06,066
away,
you can see now this machine is all of

198
00:09:06,066 --> 00:09:08,366
a sudden above
all of the other machines. Right.

199
00:09:08,366 --> 00:09:10,633
So if this was the end of this

200
00:09:10,633 --> 00:09:11,900
iteration, that's it.

201
00:09:11,900 --> 00:09:14,533
We would assume from here
that this is the best machine.

202
00:09:14,533 --> 00:09:15,666
And we'd start exploiting it.

203
00:09:15,666 --> 00:09:18,666
And therefore, this algorithm
would be completely useless.

204
00:09:18,800 --> 00:09:21,700
But we shouldn't forget about
the second thing that happens.

205
00:09:21,700 --> 00:09:24,366
The second thing that happens is that,

206
00:09:24,366 --> 00:09:25,033
because we

207
00:09:25,033 --> 00:09:29,066
got an additional observation
in our sample now,

208
00:09:29,833 --> 00:09:32,100
we are more confident in this interval.

209
00:09:32,100 --> 00:09:35,433
And these confidence bounds,
they're designed, their

210
00:09:35,433 --> 00:09:38,300
only purpose is to include

211
00:09:38,300 --> 00:09:42,100
the actual expected value,
wherever it is.

212
00:09:42,100 --> 00:09:47,133
We don't know where it is, but they are,
they are telling us that this value,

213
00:09:47,133 --> 00:09:48,733
this green value is somewhere
inside this box.

214
00:09:48,733 --> 00:09:50,633
But because we've got
an additional observation, we're

215
00:09:50,633 --> 00:09:52,966
more confident. Our sample size is larger.

216
00:09:52,966 --> 00:09:56,833
So we're more confident
in the overall picture for this machine.

217
00:09:56,833 --> 00:09:59,433
So the confidence bounds decrease.

218
00:09:59,433 --> 00:10:02,166
And now as you can see it's
no longer the top machine

219
00:10:02,166 --> 00:10:05,066
because even though it went up
the confidence bounds went down.

220
00:10:05,066 --> 00:10:08,166
So now we're going to look
for the next highest confidence bound.

221
00:10:08,400 --> 00:10:10,566
It can be any one of these three machines.

222
00:10:10,566 --> 00:10:14,466
And let's just look at
any one randomly. For now, this one.

223
00:10:14,766 --> 00:10:19,666
And here, the
red line is above the blue line. So,

224
00:10:19,933 --> 00:10:23,633
according to the law of large numbers,
you'd expect this to converge to that.

225
00:10:23,733 --> 00:10:27,800
But sometimes it can randomly occur that
it can go the other way.

226
00:10:27,800 --> 00:10:29,533
Right? Things can happen like this.

227
00:10:29,533 --> 00:10:32,533
It's all probabilities.
So, basically,

228
00:10:32,733 --> 00:10:34,166
it might even go up.

229
00:10:34,166 --> 00:10:34,800
So there we go.

230
00:10:34,800 --> 00:10:37,800
It went up even though the blue line
was below the red line.

231
00:10:38,200 --> 00:10:41,233
It can happen, you know, like,

232
00:10:41,700 --> 00:10:44,500
just by chance.

233
00:10:44,500 --> 00:10:44,700
Right.

234
00:10:44,700 --> 00:10:48,133
In the long run, it will converge,
but on a random occasion it can go up.

235
00:10:48,300 --> 00:10:51,666
It can go either way. And again,
we got another,

236
00:10:51,833 --> 00:10:53,800
another element in the sample.

237
00:10:53,800 --> 00:10:57,300
So the confidence
bounds converge. Okay.

238
00:10:57,300 --> 00:11:00,033
So we can kind of get the picture
of what's going on here.

239
00:11:00,033 --> 00:11:03,266
So now we're going to pick the next one
with the highest upper bound.

240
00:11:03,633 --> 00:11:05,233
Let's say this one.

241
00:11:05,233 --> 00:11:07,233
Then we do the trial.

242
00:11:07,233 --> 00:11:08,700
We do the round.

243
00:11:08,700 --> 00:11:12,166
What happens? Does the person click on the ad?
Do we win money from the slot machine?

244
00:11:13,166 --> 00:11:14,600
And it goes down.

245
00:11:14,600 --> 00:11:16,033
Probably not.

246
00:11:16,033 --> 00:11:18,100
They didn't click on the ad.

247
00:11:18,100 --> 00:11:20,100
We didn't
win money from the slot machine.

248
00:11:20,100 --> 00:11:24,066
So the average of our observation
goes down, comes closer to the,

249
00:11:24,666 --> 00:11:26,166
expected value.

250
00:11:26,166 --> 00:11:28,866
And the confidence bounds also decrease.
Okay.

251
00:11:28,866 --> 00:11:31,800
Now we're kind of in business;
we can see

252
00:11:31,800 --> 00:11:35,233
all of them are kind of starting to play.
The next one is this one. Okay.

253
00:11:35,233 --> 00:11:36,666
Now,

254
00:11:36,666 --> 00:11:38,333
because we know the end result,

255
00:11:38,333 --> 00:11:39,766
we know that this is the best one, right?

256
00:11:39,766 --> 00:11:43,166
We know that this is the best
ad or this is the best slot machine

257
00:11:43,166 --> 00:11:44,700
we should be using. But that's

258
00:11:44,700 --> 00:11:45,600
just because we

259
00:11:45,600 --> 00:11:50,033
were kind of, like, given this insight,
just for argument's sake

260
00:11:50,033 --> 00:11:52,966
or for the purpose of this exercise,
but the person that's,

261
00:11:52,966 --> 00:11:55,966
performing this algorithm
or the algorithm itself doesn't know that.

262
00:11:55,966 --> 00:12:00,766
So unknowingly, it's actually starting
to exploit the best option right now.

263
00:12:01,100 --> 00:12:04,166
So again, okay, it goes up. Good.

264
00:12:04,366 --> 00:12:07,833
This band goes down, and
as you can see, it's still the best one.

265
00:12:07,833 --> 00:12:09,933
All right.
So now we're going to do it again.

266
00:12:09,933 --> 00:12:11,433
We're going to use this one again.

267
00:12:11,433 --> 00:12:15,300
And it comes closer,
but the confidence bound goes down again.

268
00:12:15,366 --> 00:12:18,233
This is all just for illustration
purposes.

269
00:12:18,233 --> 00:12:21,833
Of course, it's not going to go down
by that much just because of one observation.

270
00:12:21,833 --> 00:12:24,566
But we don't want to be sitting here
through a thousand iterations.

271
00:12:24,566 --> 00:12:26,700
This is just to demonstrate
the overall picture.

272
00:12:26,700 --> 00:12:29,933
So even though we exploited
the best option

273
00:12:29,933 --> 00:12:33,266
by exploiting the best option,
we're decreasing the confidence bound.

274
00:12:33,266 --> 00:12:35,300
which gives an opportunity
to the other options.

275
00:12:35,300 --> 00:12:36,300
Any option.

276
00:12:36,300 --> 00:12:40,100
If it goes, if it keeps going up,
kind of keeps being good.

277
00:12:40,433 --> 00:12:41,700
Because we're building

278
00:12:41,700 --> 00:12:48,133
up the sample size, this gives an option
to the others. It gives an opportunity

279
00:12:48,133 --> 00:12:52,800
to the other options, or machines
or ads, to have a chance in the play.

280
00:12:52,833 --> 00:12:55,900
So that we are not just,
you know, we're not biased towards

281
00:12:55,900 --> 00:12:59,700
which one we think is the best
or optimal outcome, optimal machine.

282
00:12:59,866 --> 00:13:02,800
So now we move on to this one.
Same thing:

283
00:13:02,800 --> 00:13:06,900
it comes closer, bounds decrease.
We move on to this one,

284
00:13:07,133 --> 00:13:12,366
bounds decrease, and we move on to this one,
and the bounds decrease.

285
00:13:12,366 --> 00:13:15,600
And then again this one bounds decrease.

286
00:13:15,600 --> 00:13:18,933
And again, this one might jump
closer. Bounds decrease.

287
00:13:19,200 --> 00:13:21,533
And even though we were very close
to, you

288
00:13:21,533 --> 00:13:23,900
know, finding the solution
that is that one, the

289
00:13:23,900 --> 00:13:26,100
bounds decreased so much.
And you'll actually see this

290
00:13:26,100 --> 00:13:27,533
in the practical application,

291
00:13:27,533 --> 00:13:31,366
the practical tutorials that are following:
sometimes,

292
00:13:31,766 --> 00:13:36,300
after using the optimal option
for some time, we'll switch.

293
00:13:36,566 --> 00:13:37,600
The algorithm will still switch

294
00:13:37,600 --> 00:13:40,866
to a suboptimal option just because
the bounds are decreasing all the time.

295
00:13:41,366 --> 00:13:43,533
And then we'll use
this one. Bounds will decrease.

296
00:13:43,533 --> 00:13:46,166
And now we're back to the best
one. Bounds decrease.

297
00:13:46,166 --> 00:13:49,900
And then we're just going to be exploiting
this one and exploiting this one

298
00:13:50,133 --> 00:13:53,133
and exploiting this one because
we found out that it's the best one.

299
00:13:53,400 --> 00:13:56,400
So that is in essence

300
00:13:56,400 --> 00:14:01,500
the whole concept behind
this upper confidence bound algorithm.

301
00:14:01,500 --> 00:14:06,033
And that's how it solves the
multi-armed bandit problem.
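
[Editor's sketch] The whole procedure walked through above can be sketched in Python as follows. This is not the course's own code, only an illustration under stated assumptions: the click rates are made-up numbers, the confidence radius sqrt(1.5 * ln(n) / N_i) is an assumed UCB1-style form that the lecture does not spell out, and an untried machine is given an infinite bound so that every machine is tried at least once. The name `run_ucb` is hypothetical.

```python
import math
import random

def run_ucb(click_rates, n_rounds, seed=0):
    """Play n_rounds with a UCB rule; return (selection counts, total reward)."""
    rng = random.Random(seed)
    n_arms = len(click_rates)
    counts = [0] * n_arms  # N_i: how many times each machine was played
    sums = [0] * n_arms    # total reward collected from each machine
    for n in range(1, n_rounds + 1):
        best_arm, best_bound = 0, -1.0
        for i in range(n_arms):
            if counts[i] == 0:
                bound = float("inf")  # trial runs: untried machines go first
            else:
                average = sums[i] / counts[i]
                radius = math.sqrt(1.5 * math.log(n) / counts[i])
                bound = average + radius
            if bound > best_bound:
                best_arm, best_bound = i, bound
        # Pull the chosen lever: reward is 1 on a click, 0 otherwise.
        reward = 1 if rng.random() < click_rates[best_arm] else 0
        counts[best_arm] += 1
        sums[best_arm] += reward
    return counts, sum(sums)

# Hypothetical click rates for five ads; the algorithm never reads them directly.
counts, total_reward = run_ucb([0.05, 0.13, 0.09, 0.16, 0.22], n_rounds=10_000)
```

Playing a machine raises its count and so lowers its upper bound, which is what lets the other machines get their turn before the algorithm settles on the best one.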

302
00:14:06,600 --> 00:14:10,066
It's a very interesting solution,
much more sophisticated

303
00:14:10,100 --> 00:14:13,833
than just selecting randomly
or running an A/B test and then

304
00:14:14,133 --> 00:14:17,400
selecting the option,
you know, that won.

305
00:14:18,133 --> 00:14:21,666
So, you know, if you're in advertising
or if you've got campaigns

306
00:14:21,666 --> 00:14:25,700
or if you come across problems
that are similar to this, always

307
00:14:25,700 --> 00:14:27,600
just remember about the upper confidence
bound algorithm.

308
00:14:27,600 --> 00:14:30,900
And you can apply this in your work
as well.

309
00:14:30,900 --> 00:14:32,466
Very powerful algorithm.

310
00:14:32,466 --> 00:14:33,766
And on that note.

311
00:14:33,766 --> 00:14:35,600
I hope you enjoyed today's tutorial.

312
00:14:35,600 --> 00:14:39,566
In the next couple of videos,
Hadelin will take you through the

313
00:14:39,566 --> 00:14:43,200
programming of this algorithm
both in R and in Python, and you'll

314
00:14:43,200 --> 00:14:45,200
get your takeaway templates.

315
00:14:45,200 --> 00:14:46,600
And I can't wait to see you

316
00:14:46,600 --> 00:14:50,800
next time, when we'll be talking
about the Thompson sampling algorithm.

317
00:14:50,800 --> 00:14:52,866
And until then, enjoy machine learning.