1 00:00:00,200 --> 00:00:01,033 Hello, my friends. 2 00:00:01,033 --> 00:00:04,133 Welcome back to this implementation of Thompson Sampling. 3 00:00:04,166 --> 00:00:07,333 So at the end of the previous tutorial I asked you to 4 00:00:07,333 --> 00:00:08,733 implement by yourself, 5 00:00:08,733 --> 00:00:11,766 well, step three, now that step one and step 6 00:00:11,766 --> 00:00:13,800 two are well implemented. 7 00:00:13,800 --> 00:00:14,300 And so your 8 00:00:14,300 --> 00:00:17,566 exercise in step three was indeed to figure out 9 00:00:17,566 --> 00:00:20,666 the trick to select the ad that has the highest 10 00:00:21,000 --> 00:00:25,366 random draw among all these random draws taken from the different beta 11 00:00:25,366 --> 00:00:27,533 distributions for each of the ads. 12 00:00:27,533 --> 00:00:29,133 So I hope that you tried. 13 00:00:29,133 --> 00:00:30,800 Congratulations if you succeeded, 14 00:00:30,800 --> 00:00:33,266 and even congratulations if you only tried. 15 00:00:33,266 --> 00:00:36,000 What matters is that, you know, you practice. 16 00:00:36,000 --> 00:00:36,866 And no worries. 17 00:00:36,866 --> 00:00:39,100 Now we're going to implement this solution together. 18 00:00:39,100 --> 00:00:40,233 So let's do this. 19 00:00:40,233 --> 00:00:43,166 So the first question I have for you is: do we have 20 00:00:43,166 --> 00:00:45,066 to stay in this second for loop? 21 00:00:45,066 --> 00:00:46,500 Well, yes, of course, 22 00:00:46,500 --> 00:00:48,633 because that trick to 23 00:00:48,633 --> 00:00:51,333 keep in memory the maximum of the random draws 24 00:00:51,333 --> 00:00:52,566 has to be done through 25 00:00:52,566 --> 00:00:53,466 all this 26 00:00:53,466 --> 00:00:56,100 second for loop, you know, iterating through the ads. 27 00:00:56,100 --> 00:00:56,966 And it's only 28 00:00:56,966 --> 00:01:01,500 once we get the final random draw that we can know which one is the maximum.
29 00:01:01,766 --> 00:01:02,966 So of course we stay 30 00:01:02,966 --> 00:01:04,133 inside this for loop. 31 00:01:04,133 --> 00:01:06,800 And, well, very simply, since, you know, 32 00:01:06,800 --> 00:01:08,400 we have this random beta 33 00:01:08,400 --> 00:01:09,000 here, which 34 00:01:09,000 --> 00:01:12,866 is indeed the random draw from the beta distribution of the 35 00:01:12,866 --> 00:01:15,600 ad i we are dealing with right now, 36 00:01:15,600 --> 00:01:16,366 well, what 37 00:01:16,366 --> 00:01:19,466 we have to do now is naturally to compare 38 00:01:19,633 --> 00:01:21,533 this random draw to 39 00:01:21,533 --> 00:01:24,000 max random, the maximum random draw, 40 00:01:24,000 --> 00:01:26,500 which, okay, so far is initialized to zero, 41 00:01:26,500 --> 00:01:28,200 you know, at the beginning of the for loop. 42 00:01:28,200 --> 00:01:31,466 But then of course you guessed that we will update that value 43 00:01:31,466 --> 00:01:32,533 of max random 44 00:01:32,533 --> 00:01:34,833 to the new random beta 45 00:01:34,833 --> 00:01:36,366 if the new random beta 46 00:01:36,366 --> 00:01:38,533 is higher than the maximum, and 47 00:01:38,533 --> 00:01:39,566 therefore, you know, 48 00:01:39,566 --> 00:01:44,100 over the iterations of this second loop, iterating through the ads, well, 49 00:01:44,100 --> 00:01:47,100 max random will be updated each time we 50 00:01:47,100 --> 00:01:48,033 get a random 51 00:01:48,033 --> 00:01:50,066 beta that is higher than the 52 00:01:50,066 --> 00:01:51,666 previous random betas, meaning 53 00:01:51,666 --> 00:01:54,533 that is higher than this max random parameter. 54 00:01:54,533 --> 00:01:57,433 So that's exactly the same trick as before, 55 00:01:57,433 --> 00:01:59,433 you know, with the UCB algorithm, 56 00:01:59,433 --> 00:02:03,633 when we were updating that maximum of the upper confidence bounds at 57 00:02:03,633 --> 00:02:05,700 each iteration of that second for loop.
58 00:02:05,700 --> 00:02:07,433 So here it's exactly the same. 59 00:02:07,433 --> 00:02:09,600 And therefore, you know what the next step is. 60 00:02:09,600 --> 00:02:12,466 Now the next step, I'm going to scroll down a bit, 61 00:02:12,466 --> 00:02:14,533 the next step now is to start 62 00:02:14,533 --> 00:02:16,566 with an if condition, 63 00:02:16,566 --> 00:02:18,266 an if condition which 64 00:02:18,266 --> 00:02:21,533 will check if, well, that 65 00:02:21,533 --> 00:02:22,800 random draw, 66 00:02:22,800 --> 00:02:23,500 you know, this random 67 00:02:23,500 --> 00:02:26,333 beta variable here, is higher 68 00:02:26,333 --> 00:02:28,500 than this maximum random draw. 69 00:02:28,500 --> 00:02:29,966 And if it is the case, 70 00:02:29,966 --> 00:02:32,300 then inside this if condition, we will say 71 00:02:32,300 --> 00:02:33,666 that this maximum random 72 00:02:33,666 --> 00:02:35,366 draw has to be updated to 73 00:02:35,366 --> 00:02:37,566 become the new random beta 74 00:02:37,566 --> 00:02:39,433 that was just higher. Okay. 75 00:02:39,433 --> 00:02:40,633 And at the same time 76 00:02:40,633 --> 00:02:42,900 we will select the ad of index 77 00:02:42,900 --> 00:02:45,333 i inside the same if condition, 78 00:02:45,333 --> 00:02:48,900 because anyway, if we get a new higher random beta, 79 00:02:49,166 --> 00:02:51,300 well, this ad will be updated as well. 80 00:02:51,300 --> 00:02:52,633 Okay, so I'll show you. 81 00:02:52,633 --> 00:02:55,800 Let's first write this if condition. So if 82 00:02:55,800 --> 00:02:57,766 random underscore 83 00:02:57,766 --> 00:02:59,866 beta is larger 84 00:02:59,866 --> 00:03:02,400 than max random, 85 00:03:02,400 --> 00:03:05,233 all right, then colon, and inside, 86 00:03:05,233 --> 00:03:06,066 well, what do we do? 87 00:03:06,066 --> 00:03:08,200 We of course update the value 88 00:03:08,200 --> 00:03:09,366 of max random, 89 00:03:09,366 --> 00:03:11,633 you know, the maximum of the random draws,
90 00:03:11,633 --> 00:03:12,600 to become 91 00:03:12,600 --> 00:03:13,800 that new 92 00:03:13,800 --> 00:03:14,866 random 93 00:03:14,866 --> 00:03:17,400 beta, which is indeed higher 94 00:03:17,400 --> 00:03:19,400 than the maximum random draw 95 00:03:19,400 --> 00:03:20,666 collected so far. 96 00:03:20,666 --> 00:03:21,500 And then, as we said, 97 00:03:21,500 --> 00:03:25,000 well, if that's the case, you know, if the ad i we're dealing 98 00:03:25,000 --> 00:03:27,000 with right now has a random beta 99 00:03:27,000 --> 00:03:28,033 that is higher 100 00:03:28,033 --> 00:03:30,033 than the max random draw, well, 101 00:03:30,033 --> 00:03:32,433 we can select this ad 102 00:03:32,433 --> 00:03:35,433 to become this index i. All right. 103 00:03:35,466 --> 00:03:36,733 So let's simulate, 104 00:03:36,733 --> 00:03:37,200 you know, the 105 00:03:37,200 --> 00:03:38,833 first iterations of this loop. 106 00:03:38,833 --> 00:03:41,166 First, max random is equal to zero. 107 00:03:41,166 --> 00:03:43,266 Then i begins at zero. 108 00:03:43,266 --> 00:03:43,766 So first 109 00:03:43,766 --> 00:03:44,600 we're dealing indeed 110 00:03:44,600 --> 00:03:46,800 with the first ad of index zero. 111 00:03:46,800 --> 00:03:49,100 We compute the random draw from the 112 00:03:49,100 --> 00:03:50,266 beta distribution of this 113 00:03:50,266 --> 00:03:52,133 first ad of index zero. 114 00:03:52,133 --> 00:03:55,000 Then of course random beta will be larger than max random, 115 00:03:55,000 --> 00:03:57,133 so max random will be updated to 116 00:03:57,133 --> 00:03:59,600 become the value of that random draw of the 117 00:03:59,600 --> 00:04:02,300 first ad, and we will select that first ad. 118 00:04:02,300 --> 00:04:04,366 Then i will be equal to one, 119 00:04:04,366 --> 00:04:07,500 you know, the index one, meaning the index of the second ad.
120 00:04:07,833 --> 00:04:09,700 Then we will take the random draw 121 00:04:09,700 --> 00:04:13,033 from the beta distribution of this second ad of index one. 122 00:04:13,400 --> 00:04:14,166 If this 123 00:04:14,166 --> 00:04:15,100 random draw is 124 00:04:15,100 --> 00:04:15,733 higher 125 00:04:15,733 --> 00:04:18,500 than max random, which is equal to the random draw 126 00:04:18,500 --> 00:04:19,733 of the previous ad, 127 00:04:19,733 --> 00:04:21,200 then we will update 128 00:04:21,200 --> 00:04:24,133 max random to become this new random draw 129 00:04:24,133 --> 00:04:25,500 of this new ad, and 130 00:04:25,500 --> 00:04:27,400 we will select this new ad to 131 00:04:27,400 --> 00:04:29,000 become the ad selected. 132 00:04:29,000 --> 00:04:30,200 And otherwise, if, 133 00:04:30,200 --> 00:04:33,200 you know, this new random draw from this new ad 134 00:04:33,266 --> 00:04:37,066 is not larger than max random, then we will do nothing and we 135 00:04:37,066 --> 00:04:38,566 will just keep the previous 136 00:04:38,566 --> 00:04:40,466 ad and the previous max random. 137 00:04:40,466 --> 00:04:42,000 And we repeat this, you know, 138 00:04:42,000 --> 00:04:45,833 through the iterations of this second for loop up to the last ad, 139 00:04:45,833 --> 00:04:47,766 you know, the ad of index nine, and 140 00:04:47,766 --> 00:04:50,300 successively, well, max random will be updated 141 00:04:50,300 --> 00:04:51,133 if needed, 142 00:04:51,133 --> 00:04:54,133 if the random draw is larger than the maximum random draw, 143 00:04:54,300 --> 00:04:57,000 and we will select the right ad accordingly. 144 00:04:57,000 --> 00:04:58,900 Okay, so very easy trick. 145 00:04:58,900 --> 00:05:00,733 And you know, that's a classic trick in Python 146 00:05:00,733 --> 00:05:02,400 to get the maximum of 147 00:05:02,400 --> 00:05:04,433 a list through a for loop. 148 00:05:04,433 --> 00:05:06,066 So it's really good that you know this trick.
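The selection step described above can be sketched as follows. This is a minimal illustration, not necessarily the exact code typed in the video: the function name `select_ad` and its `rng` parameter are hypothetical, the underscored variable names are assumed spellings of the names spoken in the tutorial, and the `+ 1` offsets in the beta parameters are assumed from the earlier steps.

```python
import random

def select_ad(numbers_of_rewards_1, numbers_of_rewards_0, rng=random):
    """Return the index of the ad whose beta draw is the highest (hypothetical helper)."""
    ad = 0
    max_random = 0  # maximum random draw seen so far
    for i in range(len(numbers_of_rewards_1)):
        # Random draw from the beta distribution of ad i
        # (the "+ 1" offsets are an assumption carried over from step 2).
        random_beta = rng.betavariate(numbers_of_rewards_1[i] + 1,
                                      numbers_of_rewards_0[i] + 1)
        # Keep the draw and the ad index only when the draw beats the current maximum.
        if random_beta > max_random:
            max_random = random_beta
            ad = i
    return ad

# With no rewards observed yet, every ad has the same distribution,
# so any index from 0 to 9 can come out.
ad = select_ad([0] * 10, [0] * 10)
```

No `else` branch is needed: when the draw is not larger, `max_random` and `ad` simply keep their previous values.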
149 00:05:06,066 --> 00:05:08,000 And it's really good that you did it twice, 150 00:05:08,000 --> 00:05:09,600 you know, once with UCB 151 00:05:09,600 --> 00:05:12,300 to compute the maximum of the upper confidence bounds, 152 00:05:12,300 --> 00:05:16,200 and with Thompson sampling to compute the maximum of the random draws. 153 00:05:16,433 --> 00:05:18,033 So now you're all set with 154 00:05:18,033 --> 00:05:19,666 getting-maximum techniques. 155 00:05:19,666 --> 00:05:21,133 And therefore, 156 00:05:21,133 --> 00:05:22,566 time to move on to the next step. 157 00:05:22,566 --> 00:05:23,266 Now, by the way, 158 00:05:23,266 --> 00:05:25,933 congratulations if you got this right. 159 00:05:25,933 --> 00:05:28,700 I know it's not that easy the first time, but you know, by 160 00:05:28,700 --> 00:05:29,500 just repeating this 161 00:05:29,500 --> 00:05:31,700 over time you will become so good at this. 162 00:05:31,700 --> 00:05:33,933 And now what, according to you, 163 00:05:33,933 --> 00:05:35,366 will be the next step here? 164 00:05:35,366 --> 00:05:39,166 Well, I imagine some of you are wondering if we need to do an else here. 165 00:05:39,400 --> 00:05:40,866 And the answer to that is 166 00:05:40,866 --> 00:05:42,633 no, we don't need to do an else, 167 00:05:42,633 --> 00:05:44,700 because if this condition is not 168 00:05:44,700 --> 00:05:47,800 true, well, we'll just keep the last ad that 169 00:05:47,800 --> 00:05:49,800 had this maximum random draw. 170 00:05:49,800 --> 00:05:51,833 So no need to do an else here. 171 00:05:51,833 --> 00:05:53,700 And in fact, well, the second for 172 00:05:53,700 --> 00:05:54,800 loop is done, 173 00:05:54,800 --> 00:05:57,000 because it gives us exactly what we want. 174 00:05:57,000 --> 00:05:59,400 It gives us the ad that has 175 00:05:59,400 --> 00:06:01,933 the maximum random draw among all the ads 176 00:06:01,933 --> 00:06:03,600 from 0 to 9.
177 00:06:03,600 --> 00:06:07,600 And therefore that's why I get out of this second for loop 178 00:06:07,766 --> 00:06:10,766 and I go back into this first for loop, 179 00:06:11,033 --> 00:06:12,300 you know, right here. 180 00:06:12,300 --> 00:06:13,833 I'm at this level now. 181 00:06:13,833 --> 00:06:16,300 Right? Okay. So there we go. 182 00:06:16,300 --> 00:06:19,233 Now we just need to finish this implementation because 183 00:06:19,233 --> 00:06:20,066 basically, 184 00:06:20,066 --> 00:06:22,200 you know, step one, step two and step 185 00:06:22,200 --> 00:06:23,366 three are done. 186 00:06:23,366 --> 00:06:24,200 I know it's a little 187 00:06:24,200 --> 00:06:25,800 bit easier than the UCB 188 00:06:25,800 --> 00:06:28,200 algorithm, but you're going to see that in the end 189 00:06:28,200 --> 00:06:30,266 the result might be even better. 190 00:06:30,266 --> 00:06:32,433 I won't tell you exactly if it's going to be the case, 191 00:06:32,433 --> 00:06:34,133 but you're going to see we're going 192 00:06:34,133 --> 00:06:36,166 to have a good surprise at the end. 193 00:06:36,166 --> 00:06:36,633 Okay. 194 00:06:36,633 --> 00:06:38,833 So now let's finish this implementation. 195 00:06:38,833 --> 00:06:39,700 Basically we just need 196 00:06:39,700 --> 00:06:42,666 to, you know, update the different variables we have here. 197 00:06:42,666 --> 00:06:46,800 You know, we need to update exactly these four variables: 198 00:06:46,800 --> 00:06:47,700 ads selected, 199 00:06:47,700 --> 00:06:50,266 numbers of rewards one, numbers of rewards zero, 200 00:06:50,266 --> 00:06:51,333 and total reward. 201 00:06:51,333 --> 00:06:54,000 So I suggest we start by updating 202 00:06:54,000 --> 00:06:54,900 ads selected. 203 00:06:54,900 --> 00:06:56,166 So I'm going to copy this. 204 00:06:56,166 --> 00:06:58,500 Actually this will be simpler. 205 00:06:58,500 --> 00:07:00,633 And then right here, you know, make
206 00:07:00,633 --> 00:07:02,400 sure to be at the right level, 207 00:07:02,400 --> 00:07:05,133 you know, at the level of this first for loop, 208 00:07:05,133 --> 00:07:07,266 you know, for n in range zero to N. 209 00:07:07,266 --> 00:07:08,566 And now I'm going to paste 210 00:07:08,566 --> 00:07:10,300 that ads selected variable. 211 00:07:10,300 --> 00:07:11,133 And so now over 212 00:07:11,133 --> 00:07:13,433 to you: according to you, what do we need to do here? 213 00:07:13,433 --> 00:07:16,433 How do we need to update that ads selected variable? 214 00:07:16,633 --> 00:07:17,700 Well, exactly 215 00:07:17,700 --> 00:07:18,400 the same as in 216 00:07:18,400 --> 00:07:20,566 UCB of course, since this variable 217 00:07:20,566 --> 00:07:21,666 corresponds to 218 00:07:21,666 --> 00:07:25,433 the full list of the ads that are selected 219 00:07:25,433 --> 00:07:27,033 over time. You know, it contains 220 00:07:27,033 --> 00:07:29,633 all the ads selected over time up to 221 00:07:29,633 --> 00:07:31,366 the 10,000 rounds, and, 222 00:07:31,366 --> 00:07:32,400 well, each time we 223 00:07:32,400 --> 00:07:34,900 select a new ad, we need, of course, to 224 00:07:34,900 --> 00:07:35,966 append 225 00:07:35,966 --> 00:07:39,666 this new ad that was just selected, which has index ad, 226 00:07:39,666 --> 00:07:43,166 of course, into this ads selected list. 227 00:07:43,433 --> 00:07:44,900 And therefore in the append function 228 00:07:44,900 --> 00:07:47,200 we of course need to input ad. All 229 00:07:47,200 --> 00:07:48,366 right. Perfect. 230 00:07:48,366 --> 00:07:53,333 So that appends the ad to this full list of the ads selected, which will be, 231 00:07:53,333 --> 00:07:56,266 remember, the input of the histogram. 232 00:07:56,266 --> 00:07:57,400 Then, as we said, 233 00:07:57,400 --> 00:07:59,333 we need to update these two 234 00:07:59,333 --> 00:08:00,800 variables, numbers of rewards
235 00:08:00,800 --> 00:08:02,600 one and numbers of rewards zero. 236 00:08:02,600 --> 00:08:04,500 But let's think this through. 237 00:08:04,500 --> 00:08:07,266 How do we need to update these two variables? 238 00:08:07,266 --> 00:08:08,300 Well, that actually 239 00:08:08,300 --> 00:08:11,133 depends on something, something specific. 240 00:08:11,133 --> 00:08:12,300 It is whether, 241 00:08:12,300 --> 00:08:13,166 when, you know, we 242 00:08:13,166 --> 00:08:16,000 selected that ad at this particular 243 00:08:16,000 --> 00:08:18,433 round for a particular customer, 244 00:08:18,433 --> 00:08:19,500 it depends on whether the 245 00:08:19,500 --> 00:08:23,400 ad that was selected got a reward one or a reward zero. 246 00:08:23,733 --> 00:08:24,833 Because indeed, 247 00:08:24,833 --> 00:08:27,600 if it got a reward one, well, this 248 00:08:27,600 --> 00:08:30,500 needs to be incremented by one, and if 249 00:08:30,500 --> 00:08:33,200 it got a reward zero, well, this 250 00:08:33,200 --> 00:08:36,266 needs to be incremented by one, right? 251 00:08:36,366 --> 00:08:38,400 We can actually see that clearly in 252 00:08:38,400 --> 00:08:42,100 step one: N_i^1(n) is the number of times the ad i 253 00:08:42,100 --> 00:08:42,633 got reward 254 00:08:42,633 --> 00:08:45,233 one up to round n. So if we add a new round and 255 00:08:45,233 --> 00:08:46,366 we get a reward one, 256 00:08:46,366 --> 00:08:48,666 well, this has to be incremented by one. 257 00:08:48,666 --> 00:08:51,133 And this one, this is the number of times the ad 258 00:08:51,133 --> 00:08:53,233 i got reward zero up to round n. 259 00:08:53,233 --> 00:08:54,466 So if at our round n 260 00:08:54,466 --> 00:08:56,500 we got reward zero, well, this needs to 261 00:08:56,500 --> 00:08:58,533 be incremented by one. 262 00:08:58,533 --> 00:08:59,400 And therefore, 263 00:08:59,400 --> 00:09:01,200 what we need to do here, since
264 00:09:01,200 --> 00:09:03,100 we have, you know, two different conditions, 265 00:09:03,100 --> 00:09:04,200 well, we need, 266 00:09:04,200 --> 00:09:05,233 of course, naturally, 267 00:09:05,233 --> 00:09:06,300 an if 268 00:09:06,300 --> 00:09:10,166 condition once again, but a simple one, that is, an if condition that 269 00:09:10,166 --> 00:09:13,800 will just check if, well, the reward is equal to one. 270 00:09:14,033 --> 00:09:15,566 So, so far I'm just going to write 271 00:09:15,566 --> 00:09:16,800 this: reward, 272 00:09:17,833 --> 00:09:18,300 and then be 273 00:09:18,300 --> 00:09:19,666 careful, double equals, 274 00:09:19,666 --> 00:09:21,300 one, instead of just one 275 00:09:21,300 --> 00:09:22,966 equals, because otherwise this would be 276 00:09:22,966 --> 00:09:25,233 an assignment, then colon. 277 00:09:25,233 --> 00:09:27,200 But now you're going to say: wait, we don't 278 00:09:27,200 --> 00:09:28,900 have any reward variable. 279 00:09:28,900 --> 00:09:29,933 Well, that's because, 280 00:09:29,933 --> 00:09:32,800 same as in the UCB implementation, I just want to 281 00:09:32,800 --> 00:09:35,900 highlight what this reward is. Remember? 282 00:09:36,200 --> 00:09:38,666 And therefore just above 283 00:09:38,666 --> 00:09:39,833 I'm going to specifically 284 00:09:39,833 --> 00:09:41,533 say what this reward is. 285 00:09:41,533 --> 00:09:42,533 And that's, you know, 286 00:09:42,533 --> 00:09:43,133 that's exactly 287 00:09:43,133 --> 00:09:45,633 the same as in the UCB implementation. 288 00:09:45,633 --> 00:09:48,200 It is the value in the data set 289 00:09:48,200 --> 00:09:50,200 corresponding to the row 290 00:09:50,200 --> 00:09:53,100 we are dealing with right now in this first for loop, you know, with 291 00:09:53,100 --> 00:09:53,766 this particular 292 00:09:53,766 --> 00:09:56,533 customer, and the column of 293 00:09:56,533 --> 00:09:59,500 the ad that was just selected. Right.
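To make the row-and-column lookup concrete, here is a small sketch. It assumes a toy stand-in for the course data set (a plain list of lists named `rows`, which is hypothetical) so that it runs without any extra library; with a pandas DataFrame called `dataset`, the same lookup is written `dataset.values[n, ad]`.

```python
# Toy stand-in for the data set: one row per user (round), one column per ad.
# A 1 means that user would click on that ad, a 0 means no click.
rows = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
]

def get_reward(rows, n, ad):
    """Reward (0 or 1) observed when ad `ad` is shown to user `n` (hypothetical helper)."""
    return rows[n][ad]

# Round n = 1, selected ad of index 2.
reward = get_reward(rows, n=1, ad=2)

# Note the double equals: "reward == 1" is a comparison,
# whereas "reward = 1" would be an assignment.
if reward == 1:
    outcome = "click"
else:
    outcome = "no click"
```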
294 00:09:59,500 --> 00:10:04,733 Because the reward is the value we get after selecting this 295 00:10:04,733 --> 00:10:08,266 ad here to show to this specific user we are 296 00:10:08,266 --> 00:10:09,933 dealing with right now in this first 297 00:10:09,933 --> 00:10:15,400 for loop, and therefore the reward here is the value in our data set. 298 00:10:15,966 --> 00:10:16,300 Right. 299 00:10:16,300 --> 00:10:18,000 So I'm taking my data set and then 300 00:10:18,000 --> 00:10:20,300 the values attribute, 301 00:10:20,300 --> 00:10:22,200 and then in square brackets 302 00:10:22,200 --> 00:10:23,933 I enter the row of the user 303 00:10:23,933 --> 00:10:26,666 we're dealing with right now, which is n. So n, 304 00:10:26,666 --> 00:10:27,900 and then the column of 305 00:10:27,900 --> 00:10:30,900 the ad that was just selected, which is ad, right? 306 00:10:30,966 --> 00:10:31,766 That's exactly 307 00:10:31,766 --> 00:10:33,300 the same as in the UCB. 308 00:10:33,300 --> 00:10:37,100 But I clearly wanted to highlight what the reward is here, so 309 00:10:37,100 --> 00:10:40,200 that then we can check in an if condition 310 00:10:40,500 --> 00:10:42,266 if the reward is equal to one. 311 00:10:42,266 --> 00:10:45,000 And if that's the case, well, what are we going to do? 312 00:10:45,000 --> 00:10:46,066 Well, as we said, 313 00:10:46,066 --> 00:10:48,900 we are going to increment this by one, 314 00:10:48,900 --> 00:10:50,066 because if the reward is 315 00:10:50,066 --> 00:10:52,466 one, that means that the number of times 316 00:10:52,466 --> 00:10:54,066 this particular ad got 317 00:10:54,066 --> 00:10:56,733 reward one just got incremented by one. 318 00:10:56,733 --> 00:10:59,266 Then of course I'm taking the right index of 319 00:10:59,266 --> 00:11:00,166 this list 320 00:11:00,166 --> 00:11:02,133 of the different numbers of times each 321 00:11:02,133 --> 00:11:03,633 ad got reward one. 322 00:11:03,633 --> 00:11:04,266 But the ad
323 00:11:04,266 --> 00:11:06,466 we're dealing with right now has of course index 324 00:11:06,466 --> 00:11:09,100 ad, because that's the ad that was just selected. 325 00:11:09,100 --> 00:11:10,133 And therefore 326 00:11:10,133 --> 00:11:12,566 I just need to copy this, 327 00:11:12,566 --> 00:11:15,433 and then ad, equals, paste that again, 328 00:11:15,433 --> 00:11:18,433 and then ad, plus one. All right. 329 00:11:18,566 --> 00:11:19,233 So all good. 330 00:11:19,233 --> 00:11:22,233 This particular value is updated the right way. 331 00:11:22,300 --> 00:11:24,733 And now back in the else 332 00:11:24,733 --> 00:11:27,333 condition, meaning the other condition where 333 00:11:27,333 --> 00:11:29,700 the reward we got by selecting this 334 00:11:29,700 --> 00:11:31,766 ad at this particular round is 335 00:11:31,766 --> 00:11:32,833 equal to zero, 336 00:11:32,833 --> 00:11:34,566 which can be specified through an else, 337 00:11:34,566 --> 00:11:36,533 because reward can either be equal to 338 00:11:36,533 --> 00:11:37,666 1 or 0. 339 00:11:37,666 --> 00:11:39,033 So else is fine here. 340 00:11:39,033 --> 00:11:41,133 And then, well, what we need to do 341 00:11:41,133 --> 00:11:42,900 is of course take that, 342 00:11:42,900 --> 00:11:45,100 you know, I can just copy all this 343 00:11:45,100 --> 00:11:49,166 and paste that and replace, of course, numbers of rewards one by 344 00:11:49,800 --> 00:11:51,166 zero here, 345 00:11:51,166 --> 00:11:54,433 and same, one by zero here as well, 346 00:11:54,433 --> 00:11:57,000 because in this condition we are in the condition 347 00:11:57,000 --> 00:12:00,000 that the reward we collected is zero. 348 00:12:00,133 --> 00:12:00,700 And therefore 349 00:12:00,700 --> 00:12:02,133 that variable 350 00:12:02,133 --> 00:12:03,233 of the index ad 351 00:12:03,233 --> 00:12:03,666 here, which 352 00:12:03,666 --> 00:12:05,666 represents the number of times this
353 00:12:05,666 --> 00:12:08,166 ad here got reward zero, needs to be 354 00:12:08,166 --> 00:12:09,833 incremented by one. 355 00:12:09,833 --> 00:12:10,333 All right. 356 00:12:10,333 --> 00:12:11,766 Perfect. And now 357 00:12:11,766 --> 00:12:13,933 we just have one final thing to do. 358 00:12:13,933 --> 00:12:16,766 You know exactly what it is. Let's not forget about this. 359 00:12:16,766 --> 00:12:21,166 It is to update that particular variable which gives the total 360 00:12:21,166 --> 00:12:24,733 reward, you know, the total accumulated reward over the rounds. 361 00:12:25,033 --> 00:12:28,233 And since now we are, you know, in a new round, you know, with this 362 00:12:28,500 --> 00:12:29,766 first for loop here, 363 00:12:29,766 --> 00:12:30,533 well, of course, 364 00:12:30,533 --> 00:12:34,300 as soon as we get this new reward, we need to update this total reward 365 00:12:34,300 --> 00:12:35,466 by incrementing it 366 00:12:35,466 --> 00:12:37,333 by the reward we just got, whether it is 367 00:12:37,333 --> 00:12:38,666 zero, in which case 368 00:12:38,666 --> 00:12:43,000 there is no increment, or one, in which case we increment it by one. 369 00:12:43,000 --> 00:12:45,100 So let's do this efficiently. 370 00:12:45,100 --> 00:12:50,166 Total reward is equal to total reward plus reward. 371 00:12:50,566 --> 00:12:51,100 All right. 372 00:12:51,100 --> 00:12:54,733 And that way we update our total reward variable. 373 00:12:55,000 --> 00:12:56,300 And that's it, my friends. 374 00:12:56,300 --> 00:12:59,400 Now the implementation of Thompson sampling is over. 375 00:12:59,400 --> 00:13:01,166 So congratulations, you 376 00:13:01,166 --> 00:13:04,233 just implemented your second reinforcement learning model. 377 00:13:04,233 --> 00:13:05,266 And by the way, 378 00:13:05,266 --> 00:13:08,833 now the full implementation is done, because indeed 379 00:13:08,833 --> 00:13:10,800 here we don't have anything to change.
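Putting the steps together, the whole loop might look like the sketch below. Several things here are assumptions rather than the tutorial's exact code: the wrapper function `thompson_sampling`, the `rng` parameter, the underscored variable names, the `+ 1` offsets in the beta parameters, and the synthetic `rows` data with made-up click rates that stands in for the real data set (in the tutorial the reward is read with `dataset.values[n, ad]`).

```python
import random

def thompson_sampling(rows, rng):
    """Run the loop described in the tutorial on a 0/1 reward table (hypothetical wrapper)."""
    N = len(rows)      # number of rounds (users)
    d = len(rows[0])   # number of ads
    ads_selected = []
    numbers_of_rewards_1 = [0] * d
    numbers_of_rewards_0 = [0] * d
    total_reward = 0
    for n in range(0, N):
        ad = 0
        max_random = 0
        # Second for loop: keep the ad with the highest beta draw.
        for i in range(0, d):
            random_beta = rng.betavariate(numbers_of_rewards_1[i] + 1,
                                          numbers_of_rewards_0[i] + 1)
            if random_beta > max_random:
                max_random = random_beta
                ad = i
        # Back at the level of the first for loop: update the four variables.
        ads_selected.append(ad)
        reward = rows[n][ad]
        if reward == 1:
            numbers_of_rewards_1[ad] = numbers_of_rewards_1[ad] + 1
        else:
            numbers_of_rewards_0[ad] = numbers_of_rewards_0[ad] + 1
        total_reward = total_reward + reward
    return ads_selected, numbers_of_rewards_1, numbers_of_rewards_0, total_reward

# Synthetic data: three ads with made-up click rates, 2000 simulated users.
rng = random.Random(42)
click_rates = [0.05, 0.30, 0.10]
rows = [[1 if rng.random() < p else 0 for p in click_rates] for _ in range(2000)]

ads_selected, ones, zeros, total_reward = thompson_sampling(rows, rng)
```

The returned `ads_selected` list is what would be passed to the histogram, and on this synthetic data the ad of index 1 should dominate it.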
380 00:13:10,800 --> 00:13:13,100 We can just plot the histogram because we have the 381 00:13:13,100 --> 00:13:15,000 same variable names. 382 00:13:15,000 --> 00:13:17,433 So now it's showtime. 383 00:13:17,433 --> 00:13:18,066 In the next 384 00:13:18,066 --> 00:13:19,700 tutorial I will show you 385 00:13:19,700 --> 00:13:22,000 the demo of, you know, Thompson sampling. 386 00:13:22,000 --> 00:13:23,000 We will of course 387 00:13:23,000 --> 00:13:23,966 run all our 388 00:13:23,966 --> 00:13:26,766 cells and mostly we will compare the two 389 00:13:26,766 --> 00:13:29,166 performances between UCB 390 00:13:29,166 --> 00:13:30,633 and Thompson sampling. 391 00:13:30,633 --> 00:13:33,700 I remind you that UCB was perfectly able to 392 00:13:33,700 --> 00:13:35,300 find that best 393 00:13:35,300 --> 00:13:37,400 ad, you know, with the highest click-through rate, 394 00:13:37,400 --> 00:13:41,866 in 1000 rounds, but was not able to do it in 500 rounds. 395 00:13:42,133 --> 00:13:44,600 And so now I'm super excited to check 396 00:13:44,600 --> 00:13:47,100 if Thompson sampling will do better than this, 397 00:13:47,100 --> 00:13:48,666 meaning if it will not only 398 00:13:48,666 --> 00:13:51,500 be able to figure out the best ad in 1000 rounds, 399 00:13:51,500 --> 00:13:52,800 but also if it will be able 400 00:13:52,800 --> 00:13:54,900 to find it in 500 rounds, 401 00:13:54,900 --> 00:13:56,533 which UCB couldn't do. 402 00:13:56,533 --> 00:13:57,800 All right, so let's 403 00:13:57,800 --> 00:13:59,900 find out about this in the next tutorial. 404 00:13:59,900 --> 00:14:01,500 I can't wait to show you all this. 405 00:14:01,500 --> 00:14:03,533 And until then, enjoy machine learning.