1
00:00:00,233 --> 00:00:02,433
Hello and welcome to this R tutorial.

2
00:00:02,433 --> 00:00:05,633
So in the previous tutorial we implemented
Thompson sampling from scratch.

3
00:00:05,866 --> 00:00:08,833
And now it's time for the moment
we've all been waiting for.

4
00:00:08,833 --> 00:00:12,000
Let's
see if Thompson sampling can beat UCB.

5
00:00:12,500 --> 00:00:15,266
So in fact we are ready to execute

6
00:00:15,266 --> 00:00:18,933
this code section here
and find out about the final result.

7
00:00:19,200 --> 00:00:24,833
So let's remember the random selection
gave us a total reward of 1200 on average.

8
00:00:25,133 --> 00:00:30,033
The UCB algorithm gave us
a total reward of 2178.

9
00:00:30,433 --> 00:00:34,733
And now let's see
if Thompson sampling can beat that.

10
00:00:35,166 --> 00:00:39,700
Now let's select everything
from here to the top.

11
00:00:39,866 --> 00:00:41,900
Because, you know,
we haven't imported the data set.

12
00:00:41,900 --> 00:00:45,233
So we will execute everything
all at once to immediately

13
00:00:45,233 --> 00:00:48,300
get this final result
that we were so excited to find out about.

14
00:00:48,566 --> 00:00:51,833
So, ready? I'm going to press Command
plus Enter to execute.

15
00:00:52,200 --> 00:00:55,533
And let's see who is the big winner.

16
00:00:56,233 --> 00:00:56,766
Here we go.

17
00:00:56,766 --> 00:01:00,700
And it turns out to be Thompson sampling

18
00:01:00,700 --> 00:01:05,100
because we got a total reward of 2602.

19
00:01:05,933 --> 00:01:07,733
So we have some random factor.

20
00:01:07,733 --> 00:01:09,633
So let's not scream victory yet.

21
00:01:09,633 --> 00:01:13,700
We are going to execute that again
to see the new total reward.

22
00:01:13,700 --> 00:01:16,533
Will we get 2600? Almost.

23
00:01:16,533 --> 00:01:17,433
We can do that again.

24
00:01:17,433 --> 00:01:22,800
And basically it's averaging around
2600.

25
00:01:23,133 --> 00:01:27,633
So yes definitely it's beating the upper
confidence bound algorithm.

26
00:01:27,966 --> 00:01:30,600
And by the way
remember that with the UCB algorithm

27
00:01:30,600 --> 00:01:34,233
we almost doubled the total reward
of the random selection algorithm.

28
00:01:34,300 --> 00:01:37,833
But now with Thompson sampling
we're not only beating the UCB algorithm,

29
00:01:38,033 --> 00:01:41,966
but also we are doing better than doubling
the random selection total reward

30
00:01:42,266 --> 00:01:46,466
because we get this 2600
total reward on average,

31
00:01:46,633 --> 00:01:49,466
which is more than the double of 1200.

32
00:01:49,466 --> 00:01:52,866
That was the total reward of the random
selection algorithm on average.

33
00:01:53,366 --> 00:01:54,100
So great.

34
00:01:54,100 --> 00:01:56,600
Definitely, Thompson
sampling is the big winner.

35
00:01:56,600 --> 00:01:59,266
And now we have one last thing to check.

36
00:01:59,266 --> 00:02:01,866
You know remember
we need to check that Thompson sampling

37
00:02:01,866 --> 00:02:05,333
also gives us the best
ad that has the highest conversion rate.

38
00:02:05,633 --> 00:02:08,633
You know, on which the users of
the social network would click the most.

39
00:02:09,033 --> 00:02:12,766
And so we need to make sure
that it's also the ad version

40
00:02:12,766 --> 00:02:16,233
number five, which was the ad version
found by the UCB algorithm.

41
00:02:16,600 --> 00:02:21,366
And to check that out very efficiently,
we can select this code section here

42
00:02:21,766 --> 00:02:25,000
and execute to look at the histogram.

43
00:02:25,033 --> 00:02:26,200
And here we go.

44
00:02:26,200 --> 00:02:30,833
We also get that the ad version that was
most selected is ad version number five.

45
00:02:31,233 --> 00:02:35,400
And by the way, in UCB we had some higher
bars here, if I remember correctly.

46
00:02:35,700 --> 00:02:38,700
But here with Thompson sampling
we can clearly see that

47
00:02:38,733 --> 00:02:42,800
it was this ad version number five here
that was most selected.

48
00:02:43,100 --> 00:02:45,966
You know, this bar here
corresponding to the ad version number

49
00:02:45,966 --> 00:02:48,966
five is clearly dominating the other bars.

50
00:02:49,233 --> 00:02:50,000
And that's because

51
00:02:50,000 --> 00:02:53,866
Thompson sampling quickly figured out
which ad is the best to select.

52
00:02:53,900 --> 00:02:54,466
That is, it

53
00:02:54,466 --> 00:02:57,533
quickly figured out
which ad has the best click-through rate.

54
00:02:57,666 --> 00:03:01,033
And so now we can congratulate ourselves
because we clearly solved

55
00:03:01,033 --> 00:03:04,500
very efficiently this click
through rate optimization problem.

56
00:03:04,900 --> 00:03:08,733
And the best algorithm that we found for
this is Thompson sampling.

57
00:03:09,466 --> 00:03:09,866
All right.

58
00:03:09,866 --> 00:03:13,200
So congratulations on having implemented
these two

59
00:03:13,200 --> 00:03:16,200
algorithms, UCB and Thompson sampling.

60
00:03:16,300 --> 00:03:17,733
That's the end of this section.

61
00:03:17,733 --> 00:03:20,733
And that's also the end of this part,
reinforcement learning.

62
00:03:20,766 --> 00:03:23,066
So I look forward to seeing you
in the next part.

63
00:03:23,066 --> 00:03:25,366
Natural language processing.

64
00:03:25,366 --> 00:03:26,900
Until then, enjoy machine learning.