1
00:00:01,266 --> 00:00:04,233
Hello and welcome back to the course
on Machine Learning.

2
00:00:04,233 --> 00:00:08,000
I hope you enjoyed the previous tutorials,
and now you're quite confident

3
00:00:08,000 --> 00:00:09,600
with the upper confidence bound

4
00:00:09,600 --> 00:00:12,933
and the Thompson sampling algorithm,
or at least the intuition behind them.

5
00:00:13,366 --> 00:00:13,800
And today

6
00:00:13,800 --> 00:00:17,633
we're going to quickly compare the two
because they do solve the same problem.

7
00:00:17,633 --> 00:00:20,900
They solve
the problem of the multi armed bandit.

8
00:00:20,933 --> 00:00:25,433
And let's have a look at some of the pros
and cons of each of the algorithms.

9
00:00:25,700 --> 00:00:28,700
So of course
there's lots and lots of different

10
00:00:28,800 --> 00:00:31,633
advantages and disadvantages
and differences between the two.

11
00:00:31,633 --> 00:00:34,500
But we'll just highlight the main ones.

12
00:00:34,500 --> 00:00:36,400
So here
we've got the two algorithms on the left.

13
00:00:36,400 --> 00:00:40,266
We've got the UCB
and the image from the intuition tutorial

14
00:00:40,266 --> 00:00:43,733
that will help us to remember what it's
what it's about.

15
00:00:43,733 --> 00:00:46,866
And the Thompson sampling algorithm
and same the image behind it.

16
00:00:47,666 --> 00:00:51,066
So, the first characteristic is probably

17
00:00:51,600 --> 00:00:55,533
that is different is that, that
the UCB is a deterministic algorithm.

18
00:00:55,533 --> 00:00:59,800
And actually there's lots of different,

19
00:00:59,800 --> 00:01:03,833
there's lots of different, modifications
to the UCB algorithm.

20
00:01:03,866 --> 00:01:05,100
And you can find them online.

21
00:01:05,100 --> 00:01:06,733
There's lots of different white papers

22
00:01:06,733 --> 00:01:10,200
on how the YouTube algorithm
can be modified to improve it and to,

23
00:01:10,533 --> 00:01:14,733
make it better at,
one thing or like ads, one advantage made.

24
00:01:14,733 --> 00:01:19,300
It makes this, a bit better a result, but

25
00:01:19,300 --> 00:01:22,466
it makes it more computationally intensive
or the other way around and so on.

26
00:01:22,500 --> 00:01:27,633
So, but all of that, all of those
algorithms belong to a family, of UCB

27
00:01:27,633 --> 00:01:30,700
algorithms or upper confidence bound
algorithms, which are all deterministic.

28
00:01:30,700 --> 00:01:34,500
And basically it what that means is,
is that it is very straightforward.

29
00:01:34,500 --> 00:01:36,733
Right? So, once,

30
00:01:38,066 --> 00:01:38,800
you have

31
00:01:38,800 --> 00:01:43,800
a certain round, it is very,
it's very straightforward.

32
00:01:43,800 --> 00:01:44,633
What's going to happen.

33
00:01:44,633 --> 00:01:47,866
You just look at the upper confidence
boundary, whichever one has the highest.

34
00:01:48,133 --> 00:01:49,500
That's the one you pick.

35
00:01:49,500 --> 00:01:51,966
you pull the algorithm. Yes.

36
00:01:51,966 --> 00:01:53,500
you pull, you pull the lever. Yes.

37
00:01:53,500 --> 00:01:56,933
Then you do get, like a random value
from the machine, but that's,

38
00:01:57,266 --> 00:02:01,200
that's that's the it's,
that's on the side of the machine.

39
00:02:01,200 --> 00:02:03,633
So that's randomness
is on the side of machine.

40
00:02:03,633 --> 00:02:04,433
On the machine.

41
00:02:04,433 --> 00:02:08,766
And then, but then whatever value you get,
it is very determined.

42
00:02:08,766 --> 00:02:12,333
It's very deterministic what the UCB
is going to do with that value.

43
00:02:12,933 --> 00:02:16,500
and so all of the steps
that the UCB actually takes,

44
00:02:16,500 --> 00:02:17,700
they're very deterministic.

45
00:02:17,700 --> 00:02:20,766
They there's no randomness
in the algorithm itself.

46
00:02:21,466 --> 00:02:25,333
Whereas on the other hand, the Thompson
sampling algorithm is a probabilistic

47
00:02:25,333 --> 00:02:29,366
algorithm because in the algorithm itself
it has these distributions

48
00:02:30,033 --> 00:02:33,300
which represent
our perception of the world and where

49
00:02:33,300 --> 00:02:39,333
we think the actual expected returns
of each of those, machines might lie.

50
00:02:39,533 --> 00:02:43,533
And therefore, every time we are,

51
00:02:43,800 --> 00:02:48,200
implementing or, iterating
in the Thompson sampling algorithm,

52
00:02:48,200 --> 00:02:52,566
we actually generate random values
from those distributions. So

53
00:02:53,866 --> 00:02:56,866
if you rerun around in the UCB algorithm,

54
00:02:56,900 --> 00:03:00,200
just one given round,
after you've received the value from

55
00:03:00,466 --> 00:03:04,000
the previous value from, the machine,
and then you rerun

56
00:03:04,000 --> 00:03:06,033
the round, it's
always going to be the same result.

57
00:03:06,033 --> 00:03:08,933
Whereas in the Thompson
sampling algorithm, after you've received

58
00:03:08,933 --> 00:03:11,933
the previous value from the machine
and you rerun the current round,

59
00:03:12,400 --> 00:03:15,400
it's all it's always going
to be different because you're always,

60
00:03:15,600 --> 00:03:19,333
sampling from your distributions,

61
00:03:19,500 --> 00:03:22,533
which, characterize
your perception of the world.

62
00:03:22,933 --> 00:03:27,200
And, that is a whole different type
of algorithm.

63
00:03:27,200 --> 00:03:28,833
It's a probabilistic algorithm.

64
00:03:28,833 --> 00:03:32,566
And those two things,
they actually have, a few different,

65
00:03:32,933 --> 00:03:35,866
implications. Right.

66
00:03:35,866 --> 00:03:39,200
So, for instance,
one important one is that

67
00:03:39,200 --> 00:03:42,533
the UCB requires an update at every round.

68
00:03:42,533 --> 00:03:42,800
Right?

69
00:03:42,800 --> 00:03:47,933
So, basically the value that you get back
from the you, from the machine.

70
00:03:47,933 --> 00:03:50,133
So once you've pulled the lever
and you get a value back

71
00:03:50,133 --> 00:03:53,333
from that machine, that value,
you have to incorporate it right away

72
00:03:53,533 --> 00:03:55,166
in order to proceed to the next round.

73
00:03:55,166 --> 00:04:00,633
You cannot proceed to the next round
until you have, incorporated that value

74
00:04:00,633 --> 00:04:05,300
until you have made an adjustment,
to the algorithm based on that value.

75
00:04:06,400 --> 00:04:07,300
because if you don't make the

76
00:04:07,300 --> 00:04:10,866
adjustment, then nothing changes
and you're going to be stuck.

77
00:04:11,233 --> 00:04:15,133
Whereas in the Thompson sampling,
it can accommodate delayed feedback.

78
00:04:15,133 --> 00:04:16,166
And this is very important.

79
00:04:16,166 --> 00:04:19,566
This basically means that if you
pull the lever and you only will get,

80
00:04:19,866 --> 00:04:22,966
the result,
or you will only know the result

81
00:04:22,966 --> 00:04:26,700
of pulling the lever,
500 rounds down the track, right?

82
00:04:26,733 --> 00:04:27,366
Not right away.

83
00:04:27,366 --> 00:04:30,466
You'll only get to know the result
500 rounds later.

84
00:04:30,866 --> 00:04:32,733
The Thompson
sampling algorithm will still work.

85
00:04:32,733 --> 00:04:33,533
Why will it still work?

86
00:04:33,533 --> 00:04:36,533
Because, if you now run,

87
00:04:36,666 --> 00:04:39,533
the algorithm without even updating
your perception of the world,

88
00:04:39,533 --> 00:04:42,666
you still going to get a new,

89
00:04:42,666 --> 00:04:45,666
set of hypothetical,

90
00:04:45,733 --> 00:04:46,633
bandits, right?

91
00:04:46,633 --> 00:04:50,466
You're going to generate a new
a new expected return for every band

92
00:04:50,500 --> 00:04:53,566
because you are generating them
in a probabilistic manner.

93
00:04:53,800 --> 00:04:57,900
And this is very important to understand
because, this gives the Thompson sampling

94
00:04:57,900 --> 00:04:58,700
that advantage,

95
00:04:58,700 --> 00:05:02,433
that you don't have to update
the algorithm with the result every time.

96
00:05:02,633 --> 00:05:06,300
And in terms of, I mean, in
terms of vendors, of course, it doesn't,

97
00:05:07,000 --> 00:05:10,400
really matter that much because
if you're playing in the casino and,

98
00:05:10,966 --> 00:05:14,033
or if some hypothetical person
is playing in the casino

99
00:05:14,033 --> 00:05:15,733
and they're pulling these levers,

100
00:05:15,733 --> 00:05:18,566
they get see the results right away
so they could update the algorithm.

101
00:05:18,566 --> 00:05:23,100
But in terms of websites and ads,
that is a big deal, right?

102
00:05:23,100 --> 00:05:26,300
So not even just, not even just,

103
00:05:27,133 --> 00:05:29,833
displaying ads, on, on a website.

104
00:05:29,833 --> 00:05:34,300
Or you could use this for like AB testing
different instead of AB testing.

105
00:05:34,600 --> 00:05:38,100
the different, layouts of your website.

106
00:05:38,100 --> 00:05:38,500
Right.

107
00:05:38,500 --> 00:05:41,633
You could, you could use, a Thompson
sampling algorithm to,

108
00:05:41,900 --> 00:05:46,500
have that power, balance
between exploitation, exploration.

109
00:05:46,500 --> 00:05:47,100
Right away.

110
00:05:47,100 --> 00:05:50,766
So basically anything
you're doing on the web with Thompson

111
00:05:50,766 --> 00:05:55,233
sampling or like,
solving a, multi-armed bandit problem

112
00:05:55,633 --> 00:05:58,633
for your business
or for a business on on the web.

113
00:05:59,300 --> 00:06:04,366
you you're getting all these thousands
and tens of thousands of clicks, right?

114
00:06:04,366 --> 00:06:08,166
And to update the algorithm right away,
that would be very

115
00:06:08,200 --> 00:06:12,033
highly computationally,
or computationally costly.

116
00:06:12,666 --> 00:06:16,166
or it might require additional resources
and, complex process.

117
00:06:16,166 --> 00:06:17,833
Whereas if you,

118
00:06:17,833 --> 00:06:20,400
what is what the Thompson
sampling algorithm allows you to do is to

119
00:06:20,400 --> 00:06:24,600
update your data set or your information
algorithm in a batch manner.

120
00:06:24,600 --> 00:06:28,566
So you you wait until you get 500 clicks,
or you wait until you get 5000 clicks,

121
00:06:28,733 --> 00:06:30,800
you update the algorithm,
then you let it run

122
00:06:30,800 --> 00:06:34,033
and then it runs, runs, runs, and
then you get another 5000 clicks and then,

123
00:06:34,633 --> 00:06:37,266
you update the algorithm again
and it will still work.

124
00:06:37,266 --> 00:06:39,833
And that's a very, important thing.

125
00:06:39,833 --> 00:06:44,033
That's that flexibility that the Thompson
sampling algorithm creates.

126
00:06:44,866 --> 00:06:48,166
And, finally, again,
we're not going to go into too much detail

127
00:06:48,166 --> 00:06:51,300
on the pros and cons, but, the,

128
00:06:51,900 --> 00:06:54,300
Thompson sampling algorithm is actually

129
00:06:54,300 --> 00:06:56,966
it has better empirical evidence
than they used to be.

130
00:06:56,966 --> 00:07:00,766
And, they you will find this phrase
better empirical evidence.

131
00:07:00,766 --> 00:07:05,366
And that's because up until recently,
the,

132
00:07:05,533 --> 00:07:08,933
the theory behind the Thompson
sampling algorithm or,

133
00:07:09,166 --> 00:07:12,000
you know, the whole research wasn't
it wasn't complete.

134
00:07:12,000 --> 00:07:17,100
It's only been researched very detail
in a lot of detail just a few years ago.

135
00:07:17,400 --> 00:07:20,966
And, you know, now you can find, like,
a lot of information on the Thompson

136
00:07:20,966 --> 00:07:23,966
sampling algorithm,
but previously people just see that,

137
00:07:24,066 --> 00:07:26,800
from experimental evidence,

138
00:07:26,800 --> 00:07:29,900
the Thompson sampling algorithm does work
better than the UCB.

139
00:07:29,900 --> 00:07:32,000
And that's exactly
what we're going to see.

140
00:07:32,000 --> 00:07:34,033
spoiler alert.

141
00:07:34,033 --> 00:07:35,666
That's exactly what you're going to see

142
00:07:35,666 --> 00:07:39,700
in the, 
practical tutorials for this section.

143
00:07:39,700 --> 00:07:40,700
So we're going to be now

144
00:07:40,700 --> 00:07:44,200
coding the Or Hudlin will walk you
through coding the same,

145
00:07:44,666 --> 00:07:48,300
exercise, the same problem
that we had previously with the UCB.

146
00:07:48,300 --> 00:07:49,200
Now we're going to be

147
00:07:49,200 --> 00:07:52,500
or you're going to be solving it
with, the Thompson sampling algorithm.

148
00:07:52,700 --> 00:07:56,100
And you will see actually
some very interesting results.

149
00:07:56,333 --> 00:07:57,733
So we'll leave it at that.

150
00:07:57,733 --> 00:08:00,266
And, I hope you enjoyed these intuition
tutorials.

151
00:08:00,266 --> 00:08:03,433
And off
we go to the practical side of things.

152
00:08:03,600 --> 00:08:04,800
Can't wait for you to get started.

153
00:08:04,800 --> 00:08:08,900
Had lunch will show you all around
and I'll see you back here next time.

154
00:08:08,933 --> 00:08:10,733
And until then, enjoy machine learning.