1
00:00:00,233 --> 00:00:01,200
Hello, my friends.

2
00:00:01,200 --> 00:00:06,300
All right, let's start the implementation
of the upper confidence bound (UCB) algorithm.

3
00:00:06,733 --> 00:00:09,533
So we're going to do it step by step.

4
00:00:09,533 --> 00:00:10,833
And you're going to implement each

5
00:00:10,833 --> 00:00:13,733
of the steps
yourself first, before we do it together.

6
00:00:13,733 --> 00:00:15,300
And for that first step, you know, I've

7
00:00:15,300 --> 00:00:16,633
prepared the slide here.

8
00:00:16,633 --> 00:00:18,833
We're going to
have a look at it many times.

9
00:00:18,833 --> 00:00:21,933
The first step is that at each round,
you know, for each

10
00:00:21,933 --> 00:00:22,766
user, because

11
00:00:22,766 --> 00:00:27,866
a round corresponds to a user,
we consider two numbers for each ad i,

12
00:00:27,900 --> 00:00:29,333
you know, from 1 to 10.

13
00:00:29,333 --> 00:00:32,866
The first number, N_i(n),
which is the number of times ad

14
00:00:32,866 --> 00:00:35,233
i was selected up to round n.

15
00:00:35,233 --> 00:00:38,700
And so make sure to understand
the indexes here and the variables.

16
00:00:39,066 --> 00:00:42,566
And then R_i(n),
which is the sum of rewards

17
00:00:42,566 --> 00:00:45,866
of ad number i up to round n. Okay.

18
00:00:46,200 --> 00:00:48,366
So the first step
that I would like you to do,

19
00:00:48,366 --> 00:00:51,066
You know, and I'm going to ask you
to press pause on this video.

20
00:00:51,066 --> 00:00:53,433
The first step I'm going to ask
you to do is to

21
00:00:53,433 --> 00:00:54,700
make these two

22
00:00:54,700 --> 00:00:55,800
variables, you know, create

23
00:00:55,800 --> 00:00:58,700
two variables for these numbers:

24
00:00:58,700 --> 00:01:00,766
the number of times
that ad i was selected up to round

25
00:01:00,766 --> 00:01:04,200
n, and the sum of rewards of
ad i up to round n.

26
00:01:04,566 --> 00:01:05,766
So create these two variables.

27
00:01:05,766 --> 00:01:08,166
And then, also in step one,

28
00:01:08,166 --> 00:01:10,666
I would like you to create
other variables.

29
00:01:10,666 --> 00:01:13,066
The first one is the total number of
users

30
00:01:13,066 --> 00:01:16,666
to whom we will show one of the ads,
and that's 10,000.

31
00:01:16,666 --> 00:01:17,966
So I would like you to put this

32
00:01:17,966 --> 00:01:21,433
10,000 value
in a variable that you can call capital N.

33
00:01:21,900 --> 00:01:23,533
Then I would like you to create

34
00:01:23,533 --> 00:01:27,066
another variable for the number of ads
we have, meaning ten.

35
00:01:27,300 --> 00:01:30,300
And you can call this
variable lowercase d.

36
00:01:30,500 --> 00:01:32,600
After this, please create a variable

37
00:01:32,600 --> 00:01:35,633
that will contain the
list of the selected

38
00:01:35,633 --> 00:01:36,766
ads over the rounds.

39
00:01:36,766 --> 00:01:41,100
So, you know, it will start as an empty list
and will become a list

40
00:01:41,100 --> 00:01:44,866
of 10,000 elements
corresponding to the 10,000 ads

41
00:01:44,900 --> 00:01:48,200
that were selected for the 10,000 users
successively.

42
00:01:48,700 --> 00:01:50,900
Then please create these two variables.

43
00:01:50,900 --> 00:01:51,600
So this first

44
00:01:51,600 --> 00:01:55,100
one, N_i(n),
you can call it numbers_of_selections.

45
00:01:55,333 --> 00:01:57,300
And of course you have to initialize it

46
00:01:57,300 --> 00:01:58,500
as a list of

47
00:01:58,500 --> 00:02:01,400
ten elements containing only zeros.

48
00:02:01,400 --> 00:02:03,966
And I will show you a trick
to do that easily.

49
00:02:03,966 --> 00:02:08,533
And for the second variable, R_i(n),
you can call it sums_of_rewards, and same here:

50
00:02:08,666 --> 00:02:13,933
you have to initialize it as a list of ten
elements, all zeros.

51
00:02:14,133 --> 00:02:16,666
And we will populate these two lists
over the rounds.

52
00:02:16,666 --> 00:02:18,333
Okay. And finally,

53
00:02:18,333 --> 00:02:22,133
I'd like you to create one last variable,
which is the total reward.

54
00:02:22,133 --> 00:02:23,500
which will simply be

55
00:02:23,500 --> 00:02:24,733
the sum of

56
00:02:24,733 --> 00:02:26,500
all the rewards received

57
00:02:26,500 --> 00:02:27,300
at each round.

58
00:02:27,300 --> 00:02:29,933
It's important
to remember that the

59
00:02:29,933 --> 00:02:31,566
zeros and ones in the

60
00:02:31,566 --> 00:02:33,933
dataset are in fact the rewards,

61
00:02:33,933 --> 00:02:35,100
you know, the individual

62
00:02:35,100 --> 00:02:37,600
rewards received at each round.

63
00:02:37,600 --> 00:02:39,300
If the user clicks the ad,

64
00:02:39,300 --> 00:02:41,933
then we get a reward of one
at that particular round.

65
00:02:41,933 --> 00:02:43,700
And if the user doesn't click it,

66
00:02:43,700 --> 00:02:46,000
we get a reward of zero.

67
00:02:46,000 --> 00:02:47,500
We get no reward, basically.

68
00:02:47,500 --> 00:02:48,133
Okay.

69
00:02:48,133 --> 00:02:50,700
And the total reward here
that I would like you to create as

70
00:02:50,700 --> 00:02:51,900
a final variable

71
00:02:51,900 --> 00:02:54,233
will be the accumulated reward,

72
00:02:54,233 --> 00:02:57,066
meaning the sum of
all the rewards collected

73
00:02:57,066 --> 00:02:58,100
over the rounds.

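As a rough illustration of how the 0/1 rewards add up to the total reward, here is a minimal Python sketch. The `rewards_received` list is made-up data used only for illustration. It is not taken from the course dataset.

```python
# Hypothetical rewards, one per round: 1 if the user clicked the shown ad, 0 otherwise.
rewards_received = [0, 1, 0, 0, 1, 1, 0]

total_reward = 0
for reward in rewards_received:
    total_reward += reward  # accumulate the reward collected at this round

print(total_reward)  # 3
```

Because each reward is 0 or 1, the total reward is simply the number of clicks collected over the rounds.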
74
00:02:58,100 --> 00:02:58,800
All right.

75
00:02:58,800 --> 00:03:01,300
So let's do this. Please
press pause on the video.

76
00:03:01,300 --> 00:03:05,066
And in a second
we will implement the solution together.

77
00:03:06,566 --> 00:03:07,033
All right.

78
00:03:07,033 --> 00:03:09,266
Welcome back. Let's do this.

79
00:03:09,266 --> 00:03:10,500
So first, let's create

80
00:03:10,500 --> 00:03:12,333
a new code cell, and let's

81
00:03:12,333 --> 00:03:15,066
create
each of these variables one by one.

82
00:03:15,066 --> 00:03:16,433
So first, we said that we

83
00:03:16,433 --> 00:03:18,900
wanted to have a variable
for the total number of

84
00:03:18,900 --> 00:03:19,800
users, or the

85
00:03:19,800 --> 00:03:22,500
total number of rounds over
which we're going to show ads

86
00:03:22,500 --> 00:03:23,466
to the users.

87
00:03:23,466 --> 00:03:24,266
So there we go.

88
00:03:24,266 --> 00:03:27,200
We want to call it N, capital N, equals

89
00:03:27,200 --> 00:03:29,466
10,000.

90
00:03:29,466 --> 00:03:31,500
All right. 10,000. Yes.

91
00:03:31,500 --> 00:03:33,066
Then we said we wanted to have a

92
00:03:33,066 --> 00:03:36,233
variable for the number of ads,
meaning ten.

93
00:03:36,433 --> 00:03:37,133
And we want to call

94
00:03:37,133 --> 00:03:39,900
this variable lowercase d, so d equals

95
00:03:39,900 --> 00:03:41,700
Ten. Perfect.

96
00:03:41,700 --> 00:03:43,333
Then, as we said, we want to have

97
00:03:43,333 --> 00:03:46,200
the full list of the ads
that are selected

98
00:03:46,200 --> 00:03:46,900
over the rounds.

99
00:03:46,900 --> 00:03:49,233
So, you know, at first
this will be an empty list.

100
00:03:49,233 --> 00:03:51,833
And over the rounds
it will get bigger and bigger.

101
00:03:51,833 --> 00:03:52,833
until, at the

102
00:03:52,833 --> 00:03:54,333
end, it will be a list of

103
00:03:54,333 --> 00:03:59,933
10,000 elements, and the nth element
will be the ad selected at round n.

104
00:03:59,933 --> 00:04:00,566
All right.

105
00:04:00,566 --> 00:04:04,000
So we're going to call this variable
ads_selected.

106
00:04:04,566 --> 00:04:07,566
And this will be initialized
as an empty list.

107
00:04:07,566 --> 00:04:09,766
Just like that: ads_selected.

108
00:04:09,766 --> 00:04:11,733
All right, then the next one.

109
00:04:11,733 --> 00:04:14,000
Well, the next two are these two:

110
00:04:14,000 --> 00:04:18,300
you know, N_i(n), the number of times
ad i was selected up to round n,

111
00:04:18,500 --> 00:04:22,200
and R_i(n), the sum of rewards of
ad i up to round n.

112
00:04:22,500 --> 00:04:23,700
So for the first

113
00:04:23,700 --> 00:04:28,200
one, we will call it numbers_of_selections.

114
00:04:28,600 --> 00:04:32,366
And since we want to have these numbers
of selections, you know, these numbers

115
00:04:32,366 --> 00:04:33,100
of times

116
00:04:33,100 --> 00:04:35,800
each ad was selected, for all the ads,

117
00:04:35,800 --> 00:04:39,066
well, this will be initialized as a list,

118
00:04:39,066 --> 00:04:42,733
not an empty list,
but a list of ten zeros.

119
00:04:42,966 --> 00:04:45,800
And the trick to initialize
this list of ten zeros

120
00:04:45,800 --> 00:04:49,400
efficiently is to just write [0] * d.

121
00:04:50,033 --> 00:04:52,500
All right, just like that,
this will initialize

122
00:04:52,500 --> 00:04:54,933
this list as a list of ten zeros.

123
00:04:54,933 --> 00:04:58,233
And then each time we select an ad,
for example, ad number three,

124
00:04:58,233 --> 00:05:00,000
well, the third element of this

125
00:05:00,000 --> 00:05:02,066
list will be incremented by one.

126
00:05:02,066 --> 00:05:03,800
All right. So at first it will be zero.

127
00:05:03,800 --> 00:05:06,600
Then let's say ad number
three is selected. It will become one.

128
00:05:06,600 --> 00:05:09,000
Then let's say ad number
five is selected.

129
00:05:09,000 --> 00:05:10,533
Its zero will be replaced by one.

130
00:05:10,533 --> 00:05:11,600
And then, you know, at each round,

131
00:05:11,600 --> 00:05:14,066
the count of the selected ad
is incremented by one.

132
00:05:14,066 --> 00:05:14,733
Okay.

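Here is a small Python sketch of the `[0] * d` trick and the increments just described. It assumes ads are numbered from 1 in speech, so "ad number three" sits at index 2 because Python indexes start at 0.

```python
d = 10  # number of ads

# [0] * d builds a list of d zeros.
numbers_of_selections = [0] * d

# Ad number three (index 2) is selected.
numbers_of_selections[2] += 1
# Then ad number five (index 4) is selected.
numbers_of_selections[4] += 1

print(numbers_of_selections)  # [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
```

Repeating `[0]` is safe because integers are immutable. The same trick with a mutable element, such as `[[]] * d`, would make all d entries share one list.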
133
00:05:14,733 --> 00:05:17,100
And at the end
we hope to see one ad

134
00:05:17,100 --> 00:05:18,700
that is selected way

135
00:05:18,700 --> 00:05:20,166
more than the others, and

136
00:05:20,166 --> 00:05:22,000
UCB will figure it out.

137
00:05:22,000 --> 00:05:24,400
Okay, then the next variable, you know, this one:

138
00:05:24,400 --> 00:05:27,866
the sum of rewards of ad i up to round
n. Well, same here.

139
00:05:27,866 --> 00:05:31,900
We want to have these sums of rewards
for each of the ads up to round n.

140
00:05:31,900 --> 00:05:34,166
And therefore
we're going to create another list

141
00:05:34,166 --> 00:05:38,000
which we're going to call sums_of_rewards.

142
00:05:38,433 --> 00:05:39,833
Right. And same here,

143
00:05:39,833 --> 00:05:43,600
this will be initialized
as a list of ten zeros.

144
00:05:43,600 --> 00:05:46,166
So I'm just copying and pasting this.

145
00:05:46,166 --> 00:05:47,100
All right. That's the same.

146
00:05:47,100 --> 00:05:51,266
And of course, at the first round,
well, each ad has a sum of

147
00:05:51,266 --> 00:05:52,766
rewards equal to zero, because

148
00:05:52,766 --> 00:05:55,033
at the beginning
no ad has been selected, and therefore no

149
00:05:55,033 --> 00:05:57,000
reward has been collected.

150
00:05:57,000 --> 00:05:59,366
Then, as we said, we want to have a final

151
00:05:59,366 --> 00:06:01,200
variable, which is the total

152
00:06:01,200 --> 00:06:03,533
reward accumulated over the rounds,

153
00:06:03,533 --> 00:06:06,300
you know, with the different ads
we select at each round.

154
00:06:06,300 --> 00:06:08,533
And let's call this variable total

155
00:06:08,533 --> 00:06:10,333
underscore reward.

156
00:06:10,333 --> 00:06:12,766
And of course we have to initialize it as

157
00:06:12,766 --> 00:06:13,633
zero, because

158
00:06:13,633 --> 00:06:18,800
in the first round, well, no ad has been selected
yet, and therefore no reward has been collected.

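Put together, the initialization cell described in this part might look like the Python sketch below. The variable names follow what is said in the video. The exact spelling `ads_selected` is an assumption about the on-screen code.

```python
N = 10000  # total number of users, i.e. rounds
d = 10  # number of ads
ads_selected = []  # ad shown at each round; grows to N elements
numbers_of_selections = [0] * d  # N_i(n): times ad i was selected up to round n
sums_of_rewards = [0] * d  # R_i(n): sum of rewards of ad i up to round n
total_reward = 0  # rewards accumulated over all rounds
```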
159
00:06:19,466 --> 00:06:20,466
Okay. Good.

160
00:06:20,466 --> 00:06:21,400
So we have all

161
00:06:21,400 --> 00:06:24,233
the parameters initialized correctly.

162
00:06:24,233 --> 00:06:26,933
And now what do you think
will be the next step?

163
00:06:26,933 --> 00:06:27,900
Well, of course, the next

164
00:06:27,900 --> 00:06:33,033
step will be to start a for loop
which will iterate through all

165
00:06:33,033 --> 00:06:35,866
the different rounds,
you know, starting from round zero,

166
00:06:35,866 --> 00:06:40,133
because, you know, in Python indexes
start from zero, up to (but not including) round 10,000,

167
00:06:40,500 --> 00:06:42,133
and at each round, well,

168
00:06:42,133 --> 00:06:45,000
we will follow these two steps.

169
00:06:45,000 --> 00:06:48,400
You know, we will compute
the average reward of each ad up to round n,

170
00:06:48,400 --> 00:06:50,400
then we will get the confidence interval,

171
00:06:50,400 --> 00:06:51,900
and in step three we will select the

172
00:06:51,900 --> 00:06:55,000
ad that has the maximum upper
confidence bound,

173
00:06:55,033 --> 00:06:58,033
you know, the highest upper confidence
bound.

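For orientation before the next part, here is one possible Python sketch of the loop outlined above. It is not the course's solution and makes several assumptions:

* The real dataset is replaced by synthetic 0/1 clicks drawn from made-up click-through rates, with ad index 7 deliberately the best.
* The confidence-interval half-width is assumed to be Δ_i(n) = sqrt(3/2 · ln(n + 1) / N_i(n)), using n + 1 because rounds start at 0.
* An ad that has never been selected gets an infinite upper bound, so every ad is tried once first.

```python
import math
import random

random.seed(0)
N = 10000  # rounds
d = 10  # ads
# Hypothetical click-through rates (assumption); ad index 7 is made the best on purpose.
true_ctr = [0.05, 0.10, 0.08, 0.12, 0.03, 0.07, 0.09, 0.50, 0.06, 0.11]
# Synthetic stand-in for the dataset: dataset[n][i] is 1 if user n would click ad i.
dataset = [[1 if random.random() < p else 0 for p in true_ctr] for _ in range(N)]

ads_selected = []
numbers_of_selections = [0] * d
sums_of_rewards = [0] * d
total_reward = 0

for n in range(N):  # rounds 0 .. N-1
    ad = 0
    max_upper_bound = 0
    for i in range(d):
        if numbers_of_selections[i] > 0:
            # Step 2: average reward and confidence-interval half-width of ad i
            average_reward = sums_of_rewards[i] / numbers_of_selections[i]
            delta_i = math.sqrt(3 / 2 * math.log(n + 1) / numbers_of_selections[i])
            upper_bound = average_reward + delta_i
        else:
            # Never-selected ads get an infinite bound so each is tried once first
            upper_bound = math.inf
        # Step 3: keep the ad with the highest upper confidence bound
        if upper_bound > max_upper_bound:
            max_upper_bound = upper_bound
            ad = i
    ads_selected.append(ad)
    numbers_of_selections[ad] += 1
    reward = dataset[n][ad]  # 1 if this user clicks the chosen ad, else 0
    sums_of_rewards[ad] += reward
    total_reward += reward
```

With these made-up rates, `numbers_of_selections` should end up heavily concentrated on index 7. The other ads keep being tried only while their confidence bounds are still wide.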
174
00:06:58,100 --> 00:06:59,800
All right.
So you will see, this will be very easy.

175
00:06:59,800 --> 00:07:01,433
We will just follow these steps.

176
00:07:01,433 --> 00:07:03,200
And I will ask you
to implement them first.

177
00:07:03,200 --> 00:07:05,266
But no worries I will guide you.

178
00:07:05,266 --> 00:07:06,733
And so now let's take a

179
00:07:06,733 --> 00:07:08,166
little break, because this for

180
00:07:08,166 --> 00:07:10,866
loop will, you know,
actually take a few lines of code.

181
00:07:10,866 --> 00:07:12,933
So make sure to have good energy for this.

182
00:07:12,933 --> 00:07:14,700
And then let's smash this together.

183
00:07:14,700 --> 00:07:16,466
Until then enjoy machine learning.