1 00:00:00,233 --> 00:00:01,200 Hello, my friends. 2 00:00:01,200 --> 00:00:06,300 All right, let's start the implementation of the Upper Confidence Bound algorithm. 3 00:00:06,733 --> 00:00:09,533 So we're going to make it step by step, 4 00:00:09,533 --> 00:00:10,833 and you're going to implement each 5 00:00:10,833 --> 00:00:13,733 of the steps first before we do it together. 6 00:00:13,733 --> 00:00:15,300 And for that first step, you know, I've 7 00:00:15,300 --> 00:00:16,633 prepared the slide here. 8 00:00:16,633 --> 00:00:18,833 We're going to have a look at it many times. 9 00:00:18,833 --> 00:00:21,933 The first step is that at each round n, you know, for each 10 00:00:21,933 --> 00:00:22,766 user, because 11 00:00:22,766 --> 00:00:27,866 a round corresponds to a user, we consider two numbers for each ad i, 12 00:00:27,900 --> 00:00:29,333 you know, from 1 to 10: 13 00:00:29,333 --> 00:00:32,866 this first number, N_i(n), which is the number of times the ad 14 00:00:32,866 --> 00:00:35,233 i was selected up to round n. 15 00:00:35,233 --> 00:00:38,700 And so make sure to understand the indexes here and the variables. 16 00:00:39,066 --> 00:00:42,566 And then R_i(n), which is the sum of rewards 17 00:00:42,566 --> 00:00:45,866 of the ad number i up to round n. Okay. 18 00:00:46,200 --> 00:00:48,366 So the first step that I would like you to do, 19 00:00:48,366 --> 00:00:51,066 you know, and I'm going to ask you to press pause on this video, 20 00:00:51,066 --> 00:00:53,433 the first step I'm going to ask you to do is to 21 00:00:53,433 --> 00:00:54,700 make these two 22 00:00:54,700 --> 00:00:55,800 variables, you know, create 23 00:00:55,800 --> 00:00:58,700 two variables for these numbers: 24 00:00:58,700 --> 00:01:00,766 the number of times the ad i was selected up to round 25 00:01:00,766 --> 00:01:04,200 n, and the sum of rewards of the ad i up to round n. 26 00:01:04,566 --> 00:01:05,766 So create these two variables.
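To make the two indexed quantities concrete, here is a minimal sketch that computes N_i(n) and R_i(n) from a short, made-up history of selections. The history, the helper name `counts_and_sums`, and the 0-based ad indexes are assumptions for illustration only; none of them come from the video.

```python
def counts_and_sums(history, d):
    """Replay `history` and return (N_i, R_i) for every ad i.

    history: list of (ad_index, reward) pairs, one pair per round so far.
    d: number of ads.
    """
    numbers_of_selections = [0] * d  # N_i(n): times ad i was selected so far
    sums_of_rewards = [0] * d        # R_i(n): rewards collected by ad i so far
    for ad, reward in history:
        numbers_of_selections[ad] += 1
        sums_of_rewards[ad] += reward
    return numbers_of_selections, sums_of_rewards


# Hypothetical first four rounds with only 3 ads (0-based indexes).
history = [(0, 1), (2, 0), (0, 0), (1, 1)]
n_i, r_i = counts_and_sums(history, d=3)
print(n_i)  # [2, 1, 1]
print(r_i)  # [1, 1, 0]
```

Ad 0 was shown twice and clicked once, so its count is 2 and its sum of rewards is 1; both lists are indexed by ad, not by round.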
27 00:01:05,766 --> 00:01:08,166 And then also in the same step one, 28 00:01:08,166 --> 00:01:10,666 I would like you to create other variables. 29 00:01:10,666 --> 00:01:13,066 The first one is the total number of users 30 00:01:13,066 --> 00:01:16,666 to whom we will show one of the ads, and that's 10,000. 31 00:01:16,666 --> 00:01:17,966 So I would like you to put this 32 00:01:17,966 --> 00:01:21,433 10,000 value in a variable that you can call capital N. 33 00:01:21,900 --> 00:01:23,533 Then I would like you to create 34 00:01:23,533 --> 00:01:27,066 another variable for the number of ads we have, meaning ten, 35 00:01:27,300 --> 00:01:30,300 and you can call this variable lowercase d. 36 00:01:30,500 --> 00:01:32,600 After this, please create a variable 37 00:01:32,600 --> 00:01:35,633 that will contain the list of the selected 38 00:01:35,633 --> 00:01:36,766 ads over the rounds. 39 00:01:36,766 --> 00:01:41,100 So you know, it will start as an empty list and will become a list 40 00:01:41,100 --> 00:01:44,866 of 10,000 elements corresponding to the 10,000 ads 41 00:01:44,900 --> 00:01:48,200 that were selected for the 10,000 users successively. 42 00:01:48,700 --> 00:01:50,900 Then please create these two variables. 43 00:01:50,900 --> 00:01:51,600 So this first 44 00:01:51,600 --> 00:01:55,100 one, N_i(n), you can call it numbers_of_selections. 45 00:01:55,333 --> 00:01:57,300 And you have to, of course, initialize it 46 00:01:57,300 --> 00:01:58,500 as a list of 47 00:01:58,500 --> 00:02:01,400 ten elements containing only zeros. 48 00:02:01,400 --> 00:02:03,966 And then I will show you a trick to do that easily. 49 00:02:03,966 --> 00:02:08,533 And for the second variable, R_i(n), you can call it sums_of_rewards, and same: 50 00:02:08,666 --> 00:02:13,933 you have to initialize it as a list of ten elements, initialized with ten zeros. 51 00:02:14,133 --> 00:02:16,666 And we will populate these two lists over the rounds.
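The variables described so far could be set up roughly as in the sketch below. `N`, `d`, `numbers_of_selections` and `sums_of_rewards` are the names suggested in the video; `ads_selected` is an assumed name for the list of selected ads, and `[0] * d` is one common way to build a list of zeros, offered here as a guess at the promised trick.

```python
N = 10000                        # total number of users, one user per round
d = 10                           # number of ads
ads_selected = []                # assumed name; grows to N entries, one ad per round
numbers_of_selections = [0] * d  # N_i(n) for each ad i, all zero before round one
sums_of_rewards = [0] * d        # R_i(n) for each ad i, all zero before round one

print(numbers_of_selections)     # a list of ten zeros
```

List repetition is safe here because the elements are integers, which are immutable. The same trick applied to a nested list, such as `[[0]] * d`, would repeat references to one inner list, so it should not be used for lists of lists.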
52 00:02:16,666 --> 00:02:18,333 Okay. And finally, 53 00:02:18,333 --> 00:02:22,133 I'd like you to create one last variable, which is the total reward, 54 00:02:22,133 --> 00:02:23,500 and which will simply be 55 00:02:23,500 --> 00:02:24,733 the sum of 56 00:02:24,733 --> 00:02:26,500 all the rewards received 57 00:02:26,500 --> 00:02:27,300 at each round. 58 00:02:27,300 --> 00:02:29,933 Because it's important to remember that the 59 00:02:29,933 --> 00:02:31,566 zeros and ones in the 60 00:02:31,566 --> 00:02:33,933 dataset are in fact the rewards, 61 00:02:33,933 --> 00:02:35,100 you know, the single 62 00:02:35,100 --> 00:02:37,600 rewards received at each round. 63 00:02:37,600 --> 00:02:39,300 If the user clicks the ad, 64 00:02:39,300 --> 00:02:41,933 then we get a reward of one at that particular round. 65 00:02:41,933 --> 00:02:43,700 And if the user doesn't click 66 00:02:43,700 --> 00:02:46,000 the ad, we get a reward of zero. 67 00:02:46,000 --> 00:02:47,500 We get no reward, basically. 68 00:02:47,500 --> 00:02:48,133 Okay. 69 00:02:48,133 --> 00:02:50,700 And the total reward here that I would like you to create as 70 00:02:50,700 --> 00:02:51,900 a final variable 71 00:02:51,900 --> 00:02:54,233 will be the accumulated reward, 72 00:02:54,233 --> 00:02:57,066 meaning the sum of all the rewards collected 73 00:02:57,066 --> 00:02:58,100 over the rounds. 74 00:02:58,100 --> 00:02:58,800 All right. 75 00:02:58,800 --> 00:03:01,300 So let's do this. Please press pause on the video, 76 00:03:01,300 --> 00:03:05,066 and in a second we will implement the solution together. 77 00:03:06,566 --> 00:03:07,033 All right. 78 00:03:07,033 --> 00:03:09,266 Welcome back. Let's do this. 79 00:03:09,266 --> 00:03:10,500 So first let's create 80 00:03:10,500 --> 00:03:12,333 a new code cell, and let's 81 00:03:12,333 --> 00:03:15,066 create each of these variables one by one. 82 00:03:15,066 --> 00:03:16,433 So at first we said that we
83 00:03:16,433 --> 00:03:18,900 wanted to have a variable for the total number of 84 00:03:18,900 --> 00:03:19,800 users, or the 85 00:03:19,800 --> 00:03:22,500 total number of rounds over which we're going to show ads 86 00:03:22,500 --> 00:03:23,466 to the users. 87 00:03:23,466 --> 00:03:24,266 So there we go. 88 00:03:24,266 --> 00:03:27,200 We want to call it N, capital N, equals, 89 00:03:27,200 --> 00:03:29,466 and that's 10,000. 90 00:03:29,466 --> 00:03:31,500 All right. 10,000. Yes. 91 00:03:31,500 --> 00:03:33,066 Then we said we wanted to have a 92 00:03:33,066 --> 00:03:36,233 variable for the number of ads, meaning ten, 93 00:03:36,433 --> 00:03:37,133 and we want to call 94 00:03:37,133 --> 00:03:39,900 this variable lowercase d, equals 95 00:03:39,900 --> 00:03:41,700 ten. Perfect. 96 00:03:41,700 --> 00:03:43,333 Then, as we said, we want to have 97 00:03:43,333 --> 00:03:46,200 the full list of the ads that are selected 98 00:03:46,200 --> 00:03:46,900 over the rounds. 99 00:03:46,900 --> 00:03:49,233 So you know, at first this will be an empty list, 100 00:03:49,233 --> 00:03:51,833 and over the rounds it will get bigger and bigger, 101 00:03:51,833 --> 00:03:52,833 up to, at the 102 00:03:52,833 --> 00:03:54,333 end, it will be a list of 103 00:03:54,333 --> 00:03:59,933 10,000 elements, and the nth element will be the ad selected at round n. 104 00:03:59,933 --> 00:04:00,566 All right. 105 00:04:00,566 --> 00:04:04,000 So we're going to call this variable ads, underscore, selected, 106 00:04:04,566 --> 00:04:07,566 and this will be initialized as an empty list. 107 00:04:07,566 --> 00:04:09,766 Just like that: ads_selected. 108 00:04:09,766 --> 00:04:11,733 All right, then next one. 109 00:04:11,733 --> 00:04:14,000 Well, the next two ones are these two, 110 00:04:14,000 --> 00:04:18,300 you know, N_i(n), the number of times the ad i was selected up to round n, 111 00:04:18,500 --> 00:04:22,200 and R_i(n), the sum of rewards of the ad i up to round n.
112 00:04:22,500 --> 00:04:23,700 So for the first 113 00:04:23,700 --> 00:04:28,200 one, we will call it numbers_of_selections. 114 00:04:28,600 --> 00:04:32,366 And since we want to have these numbers of selections, you know, these numbers 115 00:04:32,366 --> 00:04:33,100 of times 116 00:04:33,100 --> 00:04:35,800 each ad was selected, for all the ads, 117 00:04:35,800 --> 00:04:39,066 well, this will be initialized as a list, 118 00:04:39,066 --> 00:04:42,733 not an empty list, but a list of ten zeros. 119 00:04:42,966 --> 00:04:45,800 And the trick to initialize this list of ten zeros 120 00:04:45,800 --> 00:04:49,400 efficiently is to just add here times d. 121 00:04:50,033 --> 00:04:52,500 All right, just like that, this will initialize 122 00:04:52,500 --> 00:04:54,933 this list as a list of ten zeros. 123 00:04:54,933 --> 00:04:58,233 And then each time we select an ad, for example, ad number three, 124 00:04:58,233 --> 00:05:00,000 well, the third element of this 125 00:05:00,000 --> 00:05:02,066 list will be incremented by one. 126 00:05:02,066 --> 00:05:03,800 All right. So at first it will be zero. 127 00:05:03,800 --> 00:05:06,600 Then let's say ad number three is selected. It will become one. 128 00:05:06,600 --> 00:05:09,000 Then let's say ad number five is selected. 129 00:05:09,000 --> 00:05:10,533 We'll replace its zero by one. 130 00:05:10,533 --> 00:05:11,600 And then, you know, at each round 131 00:05:11,600 --> 00:05:14,066 the element of whichever ad is selected is incremented. 132 00:05:14,066 --> 00:05:14,733 Okay. 133 00:05:14,733 --> 00:05:17,100 And at the end we hopefully want to see one ad 134 00:05:17,100 --> 00:05:18,700 that is selected way 135 00:05:18,700 --> 00:05:20,166 more than the others, and 136 00:05:20,166 --> 00:05:22,000 UCB will figure it out. 137 00:05:22,000 --> 00:05:24,400 Okay, then next variable, you know, this one, 138 00:05:24,400 --> 00:05:27,866 the sum of rewards of the ad i up to round n. Well, same here.
139 00:05:27,866 --> 00:05:31,900 We want to have these sums of rewards for each of the ads up to round n. 140 00:05:31,900 --> 00:05:34,166 And therefore we're going to create another list, 141 00:05:34,166 --> 00:05:38,000 which we're going to call sums_of_rewards. 142 00:05:38,433 --> 00:05:39,833 Right. And same, 143 00:05:39,833 --> 00:05:43,600 this will be initialized as a list of ten zeros. 144 00:05:43,600 --> 00:05:46,166 So I'm just copying and pasting this. 145 00:05:46,166 --> 00:05:47,100 All right. That's the same. 146 00:05:47,100 --> 00:05:51,266 And of course, at the first round, well, each ad has a sum of 147 00:05:51,266 --> 00:05:52,766 rewards equal to zero, because 148 00:05:52,766 --> 00:05:55,033 at the beginning no ad is selected and therefore no 149 00:05:55,033 --> 00:05:57,000 reward is collected. 150 00:05:57,000 --> 00:05:59,366 Then, as we said, we want to have a final 151 00:05:59,366 --> 00:06:01,200 variable, which is the total 152 00:06:01,200 --> 00:06:03,533 reward accumulated over the rounds, 153 00:06:03,533 --> 00:06:06,300 you know, with the different ads we select at each round. 154 00:06:06,300 --> 00:06:08,533 And let's call this variable total, 155 00:06:08,533 --> 00:06:10,333 underscore, reward: total_reward. 156 00:06:10,333 --> 00:06:12,766 And of course we have to initialize it as 157 00:06:12,766 --> 00:06:13,633 zero, because 158 00:06:13,633 --> 00:06:18,800 in the first round, well, no ad is selected yet and therefore no reward is collected. 159 00:06:19,466 --> 00:06:20,466 Okay. Good. 160 00:06:20,466 --> 00:06:21,400 So we have all 161 00:06:21,400 --> 00:06:24,233 the parameters all initialized correctly. 162 00:06:24,233 --> 00:06:26,933 And now what do you think will be the next step? 163 00:06:26,933 --> 00:06:27,900 Well, of course the next 164 00:06:27,900 --> 00:06:33,033 step will be to start a for loop which will iterate through all
165 00:06:33,033 --> 00:06:35,866 the different rounds, you know, starting from round zero, 166 00:06:35,866 --> 00:06:40,133 because, you know, in Python indexes start from zero, up to round 10,000. 167 00:06:40,500 --> 00:06:42,133 And at each round, well, 168 00:06:42,133 --> 00:06:45,000 we will follow these two steps. 169 00:06:45,000 --> 00:06:48,400 You know, we will compute the average reward of ad i up to round n. 170 00:06:48,400 --> 00:06:50,400 Then we will get the confidence interval. 171 00:06:50,400 --> 00:06:51,900 And in step three we will select the 172 00:06:51,900 --> 00:06:55,000 ad that has the maximum upper confidence bound, 173 00:06:55,033 --> 00:06:58,033 you know, the highest upper confidence bound. 174 00:06:58,100 --> 00:06:59,800 All right. So you will see this will be very easy. 175 00:06:59,800 --> 00:07:01,433 We will just follow these steps. 176 00:07:01,433 --> 00:07:03,200 And I will ask you to implement them first. 177 00:07:03,200 --> 00:07:05,266 But no worries, I will guide you. 178 00:07:05,266 --> 00:07:06,733 And so now let's take a 179 00:07:06,733 --> 00:07:08,166 little break, because this for 180 00:07:08,166 --> 00:07:10,866 loop will, you know, actually take a few lines of code. 181 00:07:10,866 --> 00:07:12,933 So make sure to have good energy for this, 182 00:07:12,933 --> 00:07:14,700 and then let's smash this together. 183 00:07:14,700 --> 00:07:16,466 Until then, enjoy machine learning.
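As a preview of where the for loop is heading, here is one possible sketch of the steps just described: average reward, confidence interval, then selection of the ad with the highest upper confidence bound. Several parts are assumptions rather than content from this video. The slide's exact confidence-interval formula is not spoken aloud, so the sketch uses a commonly taught form, sqrt(3/2 * log(n + 1) / N_i(n)). The course's dataset is replaced by simulated clicks with hypothetical click probabilities. The name `ads_selected` is also assumed.

```python
import math
import random

random.seed(0)  # fixed seed so the simulated clicks are repeatable

N = 10000  # total number of rounds (users)
d = 10     # number of ads

# Hypothetical click probabilities, standing in for the real dataset.
click_probs = [0.05, 0.12, 0.08, 0.10, 0.27, 0.02, 0.11, 0.20, 0.09, 0.04]
# dataset[n][i] is 1 if user n would click ad i, else 0 (simulated rewards).
dataset = [[1 if random.random() < p else 0 for p in click_probs]
           for _ in range(N)]

ads_selected = []
numbers_of_selections = [0] * d
sums_of_rewards = [0] * d
total_reward = 0

for n in range(N):  # rounds are indexed 0 .. N - 1
    ad = 0
    max_upper_bound = 0
    for i in range(d):
        if numbers_of_selections[i] > 0:
            # Average reward of ad i so far.
            average_reward = sums_of_rewards[i] / numbers_of_selections[i]
            # Assumed confidence-interval half-width (see lead-in).
            delta_i = math.sqrt(1.5 * math.log(n + 1) / numbers_of_selections[i])
            upper_bound = average_reward + delta_i
        else:
            # An ad never shown gets an infinite bound, so it is tried once.
            upper_bound = float("inf")
        if upper_bound > max_upper_bound:
            max_upper_bound = upper_bound
            ad = i
    # Record the choice and collect the reward for the selected ad.
    ads_selected.append(ad)
    numbers_of_selections[ad] += 1
    reward = dataset[n][ad]
    sums_of_rewards[ad] += reward
    total_reward += reward

print(numbers_of_selections)
print(total_reward)
```

Giving unselected ads an infinite bound is one simple way to make sure every ad is shown at least once before the averages are compared; other initialization schemes are possible.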