1 00:00:00,233 --> 00:00:01,300 All right, let's do this. 2 00:00:01,300 --> 00:00:03,700 Let's implement Thompson sampling. 3 00:00:03,700 --> 00:00:06,600 So we already initialize, you know, the main parameters 4 00:00:06,600 --> 00:00:10,766 which are part of the common framework between Thompson sampling and UCB. 5 00:00:10,833 --> 00:00:14,666 So this is n the total number of rounds or the total number of users 6 00:00:14,666 --> 00:00:17,700 to whom we're going to show successively the ads. 7 00:00:17,966 --> 00:00:20,000 And then we have the number of ads which is ten. 8 00:00:20,000 --> 00:00:22,566 Here we're dealing with ten different designs of ads, 9 00:00:22,566 --> 00:00:26,733 which we are experimenting to figure out which one has the highest conversion rate. 10 00:00:27,100 --> 00:00:30,600 And then we have this list which is initialized as an empty list, but 11 00:00:30,600 --> 00:00:34,733 which will be populated over the rounds with the different ads that we select. 12 00:00:34,733 --> 00:00:35,300 So at the end 13 00:00:35,300 --> 00:00:39,100 it will contain all the different ads that were selected at each round. 14 00:00:39,466 --> 00:00:44,166 And now now what we're about to code is specific to Thompson sampling. 15 00:00:44,300 --> 00:00:45,300 So there we go. 16 00:00:45,300 --> 00:00:47,366 We're going to start with the first step. 17 00:00:47,366 --> 00:00:50,900 Step one which says that at each round n 18 00:00:51,066 --> 00:00:55,966 we actually consider two numbers for each ad I and I1N, 19 00:00:55,966 --> 00:00:58,966 which is the number of times two ad I get reward one 20 00:00:59,033 --> 00:01:02,033 up to round n and an I0N 21 00:01:02,066 --> 00:01:05,600 the number of times that I got reward zero up to round them. 22 00:01:05,833 --> 00:01:08,966 So basically what we have to do in this step one is create 23 00:01:09,100 --> 00:01:12,066 two variables here for these two numbers. 24 00:01:12,066 --> 00:01:16,400 And we have to initialize them of course as zero you know zero for both. 25 00:01:16,400 --> 00:01:19,400 Because of course before we begin, you know the first round. 26 00:01:19,400 --> 00:01:23,133 Well these two numbers are equal to zero because the ad didn't 27 00:01:23,133 --> 00:01:24,766 get any reward at all. 28 00:01:24,766 --> 00:01:26,966 So that's what I would like you to do. 29 00:01:26,966 --> 00:01:30,766 And since you know these numbers are for each ad, I. 30 00:01:30,900 --> 00:01:34,800 Well, what we're going to do is we will create a list of ten elements 31 00:01:35,000 --> 00:01:38,800 where each element is indeed this number for each of the ads. 32 00:01:38,833 --> 00:01:40,500 Right. And same for this one. 33 00:01:40,500 --> 00:01:42,333 So now your turn. 34 00:01:42,333 --> 00:01:45,900 Please press pause on the video and create two variables 35 00:01:45,900 --> 00:01:50,233 which will initialize these two lists of numbers. 36 00:01:50,400 --> 00:01:52,500 And you can call these two lists first. 37 00:01:52,500 --> 00:01:55,800 For this one, you can call it numbers of rewards, one 38 00:01:55,800 --> 00:02:00,300 you know with underscores and for this one you can call it numbers of rewards. 39 00:02:00,333 --> 00:02:01,566 Zero okay. 40 00:02:01,566 --> 00:02:04,566 So please press pause and I'll give you the solution in a second. 41 00:02:05,466 --> 00:02:07,766 Okay. Good. Let's do this together. 42 00:02:07,766 --> 00:02:11,933 So as we said we want to introduce two new variables which will be two lists. 43 00:02:12,200 --> 00:02:17,400 First, for the numbers of times each ad gets the reward one. 44 00:02:17,400 --> 00:02:21,900 And we're going to call that numbers of rewards 45 00:02:23,033 --> 00:02:24,366 one. All right. 46 00:02:24,366 --> 00:02:28,933 And as we said we want to initialize it as a list of ten elements. 47 00:02:29,300 --> 00:02:32,100 And all these elements should be initialized to zeros 48 00:02:32,100 --> 00:02:35,266 because at the beginning no AD got any reward. 49 00:02:35,266 --> 00:02:35,933 Right. 50 00:02:35,933 --> 00:02:39,366 And remember the trick to initialize a list of ten zeros. 51 00:02:39,566 --> 00:02:42,400 It is to put a zero here in square brackets, 52 00:02:42,400 --> 00:02:46,166 and then multiply this by the okay. 53 00:02:46,533 --> 00:02:53,066 And then very simply for the other list you know corresponding to these numbers. 54 00:02:53,266 --> 00:02:54,400 Well same. 55 00:02:54,400 --> 00:02:57,400 We're going to call that numbers of rewards 56 00:02:57,500 --> 00:03:00,433 not one but zero and same. 57 00:03:00,433 --> 00:03:03,500 We initialize it as a list of ten zeros. 58 00:03:03,500 --> 00:03:06,900 And over the rounds the elements that these lists will be populated 59 00:03:06,966 --> 00:03:07,733 for this one. 60 00:03:07,733 --> 00:03:12,600 As soon as an ad receives a reward, one in which case we increment 61 00:03:12,833 --> 00:03:16,466 the element corresponding to the add by one and same for this one. 62 00:03:16,466 --> 00:03:20,733 Each time an ad gets a reward zero, we increment the element 63 00:03:20,733 --> 00:03:22,733 corresponding to this ad in the list. 64 00:03:22,733 --> 00:03:24,300 Okay, so very simply. 65 00:03:24,300 --> 00:03:28,066 And that's the two parameters specific to Thompson sampling. 66 00:03:28,200 --> 00:03:29,366 So if you got this two 67 00:03:29,366 --> 00:03:32,566 well congratulations you just implemented step one. 68 00:03:33,100 --> 00:03:33,666 All right. 69 00:03:33,666 --> 00:03:36,233 So now according to you what do we have to do. 70 00:03:36,233 --> 00:03:38,900 Well not yet the big four loop. 71 00:03:38,900 --> 00:03:41,700 Remember that we also want to keep the accumulated 72 00:03:41,700 --> 00:03:44,700 reward that we, well, you know, accumulate over time. 73 00:03:44,766 --> 00:03:48,233 And we're going to introduce the same variable as in UCB. 74 00:03:48,266 --> 00:03:50,633 You know we could have even kept it in the framework. 75 00:03:50,633 --> 00:03:53,866 But remember that variable is total reward. 76 00:03:53,866 --> 00:03:56,100 So here we introduce the same new variable 77 00:03:56,100 --> 00:03:59,133 which of course we initialize still to zero. 78 00:03:59,400 --> 00:04:01,733 And now now we can start the for loop. 79 00:04:01,733 --> 00:04:04,733 You know that main for loop which will iterate 80 00:04:04,933 --> 00:04:07,800 through all the 10,000 rounds in each. 81 00:04:07,800 --> 00:04:12,733 We will show an ad to a user, which will decide to click yes or no on the add. 82 00:04:12,933 --> 00:04:14,933 And so there we go. Let's stop this loop. 83 00:04:14,933 --> 00:04:18,600 So it's exactly the same as in the UCB we still was for. 84 00:04:18,600 --> 00:04:22,900 Then we choose n for the iterated variable because same and represents the rounds. 85 00:04:23,133 --> 00:04:28,500 And then and will go in the range from same zero. 86 00:04:28,500 --> 00:04:32,833 Because indexes in Python start from zero up to n, right. 87 00:04:32,833 --> 00:04:36,366 The total number of rounds, or the total number of customers of users 88 00:04:36,566 --> 00:04:40,466 to whom we show successively the yhat and then colon. 89 00:04:40,633 --> 00:04:41,600 And there we go. 90 00:04:41,600 --> 00:04:44,600 Here we are inside this big first for loop. 91 00:04:44,700 --> 00:04:45,033 All right. 92 00:04:45,033 --> 00:04:48,033 So according to you, what will be the first step here? 93 00:04:48,100 --> 00:04:50,566 You know, there is a second full loop coming up. 94 00:04:50,566 --> 00:04:53,766 It is the full loop that will iterate through the different ads. 95 00:04:54,000 --> 00:04:57,333 But before this we need to introduce a new variable, 96 00:04:57,333 --> 00:05:00,433 which we already did in the UCB implementation. 97 00:05:00,633 --> 00:05:03,100 Remember we need to introduce that variable, 98 00:05:03,100 --> 00:05:06,700 which will be the ad that we select at each front end. 99 00:05:06,966 --> 00:05:12,266 And as in the UCB implementation we will introduce that variable as ad right 100 00:05:12,466 --> 00:05:17,600 ad which will be the index of the ad that will be selected at each round end. 101 00:05:17,966 --> 00:05:20,700 And we initialize this to zero 102 00:05:20,700 --> 00:05:23,733 because you know we're going to have that second for loop that will iterate through 103 00:05:23,733 --> 00:05:27,266 different ads starting from zero up to the last one of index nine. 104 00:05:27,366 --> 00:05:28,200 Okay. 105 00:05:28,200 --> 00:05:31,800 Then let's have a look at the algorithm and let's see 106 00:05:31,800 --> 00:05:33,866 if we need to introduce a new variable. 107 00:05:33,866 --> 00:05:36,366 You know, before we start this second for loop, 108 00:05:36,366 --> 00:05:41,233 I remind that, you know, for UCB we indeed had to introduce a new variable here, 109 00:05:41,233 --> 00:05:44,400 which was max up about, you know, we introduced Max upper bound 110 00:05:44,400 --> 00:05:48,300 and initialized it to zero before we started the second loop. 111 00:05:48,433 --> 00:05:50,366 Okay. So here that's kind of the same. 112 00:05:50,366 --> 00:05:53,700 Let's see if we have to introduce a new variable other than add one 113 00:05:53,900 --> 00:05:56,800 before starting this second for loop. 114 00:05:56,800 --> 00:06:00,266 And well you know, when we have a look at step three here, 115 00:06:00,466 --> 00:06:03,600 we see that indeed we select the ad that has the highest 116 00:06:03,833 --> 00:06:07,500 of these random draws, you know, taken from the beta distributions. 117 00:06:07,766 --> 00:06:11,366 And therefore there is still a maximum value to consider here. 118 00:06:11,700 --> 00:06:15,233 And that's exactly what we will introduce as a new variable 119 00:06:15,233 --> 00:06:17,666 just before starting this second for loop. 120 00:06:17,666 --> 00:06:21,933 That new variable will be exactly that maximum of the random draws. 121 00:06:21,933 --> 00:06:25,500 And we're going to call it max underscore random 122 00:06:25,766 --> 00:06:30,466 okay max random which we will initialize of course to zero. 123 00:06:30,700 --> 00:06:33,700 And now now we can start the second loop. 124 00:06:33,700 --> 00:06:36,000 All right. So let's do this. Try to do it faster than me. 125 00:06:36,000 --> 00:06:38,966 You know exactly how to do it. We start with four. 126 00:06:38,966 --> 00:06:42,000 Then you know the iterated variable will still be I, 127 00:06:42,000 --> 00:06:47,000 which will iterate through the different ads from 0 to 9, because are the indexes. 128 00:06:47,233 --> 00:06:51,833 So for I in the range from indeed zero to 129 00:06:51,900 --> 00:06:57,800 well d, because we had to divide both for the number of ads and then colon. 130 00:06:57,833 --> 00:06:59,066 And there we go. 131 00:06:59,066 --> 00:07:04,466 Now is the time we implement the next step in the Thompson sampling algorithm. 132 00:07:04,766 --> 00:07:08,833 I'm talking of course about step two where for each ad I. 133 00:07:08,866 --> 00:07:12,433 Well, that's exactly what this second loop is about. 134 00:07:12,433 --> 00:07:15,666 You know, it will implement step two for each of the ad. 135 00:07:15,666 --> 00:07:17,733 I from 0 to 9. 136 00:07:17,733 --> 00:07:20,733 And so for each of these ads, we take a random draw 137 00:07:20,800 --> 00:07:25,166 from the data distribution given below of parameters. 138 00:07:25,166 --> 00:07:29,300 And I n plus one where an n is of course number of times 139 00:07:29,500 --> 00:07:33,466 that I got reward one up to run n and the second parameter 140 00:07:33,466 --> 00:07:36,700 and i0 and plus one where NI0N is 141 00:07:36,700 --> 00:07:40,333 of course a number of times I got reward zero up to running. 142 00:07:40,700 --> 00:07:45,633 So that means you can totally implement this step to on your own, 143 00:07:45,633 --> 00:07:47,700 because you have all the parameters, right. 144 00:07:47,700 --> 00:07:50,300 We already introduce these two new variables. 145 00:07:50,300 --> 00:07:51,966 We initialize them the right way. 146 00:07:51,966 --> 00:07:55,400 And so now you can totally implement this step two on your own 147 00:07:55,400 --> 00:07:58,033 before we do it together, I will just give you a hint. 148 00:07:58,033 --> 00:07:59,566 Unless you don't want to listen to it 149 00:07:59,566 --> 00:08:03,366 and look into the documentation online, which is even better. 150 00:08:03,566 --> 00:08:08,033 But I'll give you that hint anyway, the way to get the random draw 151 00:08:08,033 --> 00:08:10,666 from a beta distribution of these parameters 152 00:08:10,666 --> 00:08:15,700 is given by a certain function of well, this random library, which we, by the way, 153 00:08:15,700 --> 00:08:19,166 already imported in our implementation, that random library. 154 00:08:19,333 --> 00:08:22,200 Well, this random library contains a certain function 155 00:08:22,200 --> 00:08:25,766 which can give you exactly what you want here, meaning a random 156 00:08:25,766 --> 00:08:28,966 draw from the data distribution of any parameters. 157 00:08:29,133 --> 00:08:32,700 And that function is called beta var it in one word 158 00:08:32,733 --> 00:08:35,733 betta var iat. 159 00:08:36,200 --> 00:08:37,500 Okay, so that's the function. 160 00:08:37,500 --> 00:08:38,900 That's all I'm going to tell you. 161 00:08:38,900 --> 00:08:41,833 And so now you can really implement this step to on your own. 162 00:08:41,833 --> 00:08:43,966 So please press pause on this video 163 00:08:43,966 --> 00:08:47,133 and will implement this solution together in a few seconds. 164 00:08:47,600 --> 00:08:48,233 All right. 165 00:08:48,233 --> 00:08:48,966 Are you ready? 166 00:08:48,966 --> 00:08:50,433 I hope you got it correct. 167 00:08:50,433 --> 00:08:53,366 Really there is no trap so it should be totally fine. 168 00:08:53,366 --> 00:08:54,733 But let's do this. 169 00:08:54,733 --> 00:08:55,533 So here we are 170 00:08:55,533 --> 00:08:58,800 at the beginning of the second full loop, iterating through the different ads. 171 00:08:59,000 --> 00:09:02,433 And so what we have to do is collect for each of these ads. 172 00:09:02,700 --> 00:09:05,733 This particular number taken as a random draw of this 173 00:09:05,733 --> 00:09:09,200 beta distribution called with these parameters. 174 00:09:09,200 --> 00:09:10,333 So let's do this. 175 00:09:10,333 --> 00:09:11,000 First we're going 176 00:09:11,000 --> 00:09:14,833 to introduce a new variable which will be exactly that random draw. 177 00:09:14,833 --> 00:09:18,600 And we're going to call it random underscore data. 178 00:09:18,866 --> 00:09:20,433 So that's exactly the random draw 179 00:09:20,433 --> 00:09:23,766 taken from the beta distribution of these two parameters. 180 00:09:24,000 --> 00:09:25,033 So random beta. 181 00:09:25,033 --> 00:09:29,033 And then as I told you, the way to get the random draw from a 182 00:09:29,066 --> 00:09:31,266 beta distribution of certain parameters. 183 00:09:31,266 --> 00:09:34,933 Well we first need to call the random library 184 00:09:34,933 --> 00:09:39,733 from which we're going to call the beta variate function. 185 00:09:39,933 --> 00:09:41,733 All right so that's the function. 186 00:09:41,733 --> 00:09:45,666 And inside this function well of course we input these two parameters. 187 00:09:45,666 --> 00:09:49,300 And I'm plus one and I and zero plus one. 188 00:09:49,666 --> 00:09:50,733 And since right now 189 00:09:50,733 --> 00:09:55,233 we're dealing with a specific ad you know the ad I in this second for loop. 190 00:09:55,233 --> 00:09:58,400 So it starts from zero and then goes to 1 to 3 etc. 191 00:09:58,400 --> 00:09:59,400 up to nine. 192 00:09:59,400 --> 00:10:02,266 Well these two parameters that we have to enter are 193 00:10:02,266 --> 00:10:06,400 for this specific ad I meaning for the specific index 194 00:10:06,400 --> 00:10:10,600 I in these two numbers, and I want n and i0 n okay. 195 00:10:10,900 --> 00:10:14,033 And therefore the way to get these two numbers 196 00:10:14,033 --> 00:10:19,200 is by calling our list of numbers of rewards one. 197 00:10:19,200 --> 00:10:22,366 You know, the number of times each ad, and especially our ad, 198 00:10:22,366 --> 00:10:25,633 I we're dealing with right now got the reward one. 199 00:10:25,833 --> 00:10:27,366 So we take this list. 200 00:10:27,366 --> 00:10:32,500 We paste it inside this beta void function as the first parameter. 201 00:10:32,766 --> 00:10:35,500 And then of course we take the index I. 202 00:10:35,500 --> 00:10:35,833 Right. 203 00:10:35,833 --> 00:10:40,200 Because we're dealing right now with this specific ad I in this second for loop. 204 00:10:40,533 --> 00:10:43,566 And then it's going to forget the plus one. 205 00:10:44,266 --> 00:10:47,133 And then same for the second parameter. 206 00:10:47,133 --> 00:10:53,033 That's going to be the number of times this particular ad I we're dealing with 207 00:10:53,033 --> 00:10:57,166 right now in the second for loop has received the reward zero. 208 00:10:57,500 --> 00:10:59,966 So here I'm basing that. 209 00:10:59,966 --> 00:11:02,866 But I don't forget to take the index 210 00:11:02,866 --> 00:11:07,833 I of this ad I and then of course I ad plus one okay. 211 00:11:07,833 --> 00:11:08,633 And that's it. 212 00:11:08,633 --> 00:11:11,533 That's all you had to do for this. Step two. 213 00:11:11,533 --> 00:11:15,100 So you see it's actually a bit simpler than the UCB algorithm. 214 00:11:15,300 --> 00:11:19,933 But now I trust that you're going to use the same right trick to implement 215 00:11:19,933 --> 00:11:20,500 step three. 216 00:11:20,500 --> 00:11:22,100 Because indeed in this step three 217 00:11:22,100 --> 00:11:26,400 we have to select the ad that has the highest of these random draws. 218 00:11:26,400 --> 00:11:30,366 You know, we took ten successive random draws for each of these ad, 219 00:11:30,400 --> 00:11:34,366 but inside the same for loop that is iterating through the ad. 220 00:11:34,466 --> 00:11:39,466 We need to use a trick to keep in memory the highest of these random draws. 221 00:11:39,466 --> 00:11:43,233 And at the end of the full loop, we'll get the final highest of these random draws. 222 00:11:43,633 --> 00:11:48,100 So the trick that you have to use is exactly the same as in UCB. 223 00:11:48,366 --> 00:11:53,200 You're going to use, of course, this variable max random, which we introduced 224 00:11:53,433 --> 00:11:56,433 and initialized to zero right before this second full loop. 225 00:11:56,600 --> 00:12:01,100 And you're going to use it to keep in memory over this second for loop. 226 00:12:01,300 --> 00:12:03,633 Well the maximum of these random draws. 227 00:12:03,633 --> 00:12:06,000 So that's the new exercise for you. 228 00:12:06,000 --> 00:12:10,200 Your new exercise is basically to implement step three with that same trick. 229 00:12:10,200 --> 00:12:13,200 We already learned in the UCB implementation. 230 00:12:13,233 --> 00:12:17,100 And now I'll see you in next tutorial to implement this solution with you. 231 00:12:17,466 --> 00:12:19,366 And until then enjoy machine learning.