1 00:00:00,133 --> 00:00:00,833 Hello my friends. 2 00:00:00,833 --> 00:00:02,600 Welcome back to this implementation. 3 00:00:02,600 --> 00:00:07,266 Are you ready to start that big for loop which will iterate through all 4 00:00:07,266 --> 00:00:11,133 the rounds, all the 10,000 rounds, meaning the 10,000 customers 5 00:00:11,333 --> 00:00:13,100 to which we're going to show the ad? 6 00:00:13,100 --> 00:00:13,833 And therefore, 7 00:00:13,833 --> 00:00:15,700 in this loop, at each step, well, 8 00:00:15,700 --> 00:00:17,166 we have to implement the 9 00:00:17,166 --> 00:00:18,866 three steps here. Well, we already 10 00:00:18,866 --> 00:00:21,733 implemented step one, but we have to implement step two and step 11 00:00:21,733 --> 00:00:24,600 three at each iteration of the loop. 12 00:00:24,600 --> 00:00:25,500 All right, let's do this. 13 00:00:25,500 --> 00:00:28,500 Let's start this for loop, starting with for. 14 00:00:28,666 --> 00:00:29,666 Then we need to choose 15 00:00:29,666 --> 00:00:32,433 a variable name for, you know, that iterated variable. 16 00:00:32,433 --> 00:00:36,266 And this time, instead of calling it i, you know, the classic name, we 17 00:00:36,266 --> 00:00:37,866 will call it n, because 18 00:00:37,866 --> 00:00:40,566 clearly what we are iterating through are 19 00:00:40,566 --> 00:00:43,266 the rounds, you know, the rounds from round one, 20 00:00:43,266 --> 00:00:46,266 the first user, to round 10,000, the last 21 00:00:46,266 --> 00:00:47,933 user to whom we show the ad. 22 00:00:47,933 --> 00:00:48,200 Okay. 23 00:00:48,200 --> 00:00:48,500 So for 24 00:00:48,500 --> 00:00:51,900 n in, and then of course range, from 25 00:00:52,100 --> 00:00:52,666 zero, 26 00:00:52,666 --> 00:00:55,533 because, you know, indexes in Python start from zero, 27 00:00:55,533 --> 00:00:56,333 up to... 28 00:00:56,333 --> 00:01:00,233 And here you can either put 10,000, but we already put 10,000 in a variable.
29 00:01:00,333 --> 00:01:01,200 So let's put 30 00:01:01,200 --> 00:01:02,433 N, so that, you know, 31 00:01:02,433 --> 00:01:03,100 then if you want to 32 00:01:03,100 --> 00:01:06,100 try another number of rounds you can just change the value of N here. 33 00:01:06,200 --> 00:01:06,566 Okay. 34 00:01:06,566 --> 00:01:10,000 So for n in range from zero to N, then colon. 35 00:01:10,166 --> 00:01:12,866 And there we go. Let's start that for loop. 36 00:01:12,866 --> 00:01:13,133 All right. 37 00:01:13,133 --> 00:01:16,300 So I'm going to scroll down a bit, you know, from here. Perfect. 38 00:01:16,766 --> 00:01:18,700 And there we go. Let's do this. All right. 39 00:01:18,700 --> 00:01:21,100 So what do we need to start with? 40 00:01:21,100 --> 00:01:23,833 Well, you know, right now what we want, 41 00:01:23,833 --> 00:01:26,133 you know, what we ultimately want, you know, within 42 00:01:26,133 --> 00:01:27,033 this particular 43 00:01:27,033 --> 00:01:31,066 iteration of number n, what we want is to select an ad, right? 44 00:01:31,066 --> 00:01:32,033 We want to select an ad. 45 00:01:32,033 --> 00:01:34,900 And we're going to select it according to these steps. 46 00:01:34,900 --> 00:01:36,000 You know, we will select the ad 47 00:01:36,000 --> 00:01:38,000 that has the maximum UCB, 48 00:01:38,000 --> 00:01:39,433 upper confidence bound. 49 00:01:39,433 --> 00:01:41,833 So the first thing that we'll do is 50 00:01:41,833 --> 00:01:44,800 start from the first ad, you know, ad number one. 51 00:01:44,800 --> 00:01:47,966 But since we're going to work with indexes, the index 52 00:01:47,966 --> 00:01:49,266 of this ad will be zero. 53 00:01:49,266 --> 00:01:50,366 So I'm going to start from 54 00:01:50,366 --> 00:01:53,233 this: ad equals zero. 55 00:01:53,233 --> 00:01:54,566 Here I'm introducing, 56 00:01:54,566 --> 00:01:56,300 of course, a new variable, ad, 57 00:01:56,300 --> 00:01:58,866 which is initialized as zero. But then you're going to
58 00:01:58,866 --> 00:02:00,766 see that we will do a second 59 00:02:00,766 --> 00:02:03,900 for loop which will loop over all 60 00:02:03,900 --> 00:02:05,266 the different ads one by one. 61 00:02:05,266 --> 00:02:08,366 You know, starting from this one, ad number one of index zero, then 62 00:02:08,366 --> 00:02:11,500 going to this one, ad number two of index one, et cetera, 63 00:02:11,500 --> 00:02:12,500 up to ad 64 00:02:12,500 --> 00:02:15,966 ten, because indeed, in order to get the maximum upper 65 00:02:15,966 --> 00:02:19,300 confidence bound, we will need to compare the upper confidence bound 66 00:02:19,333 --> 00:02:20,633 of each of these ads. 67 00:02:20,633 --> 00:02:23,833 And therefore we'll have to iterate through each of these ads to 68 00:02:23,866 --> 00:02:26,033 find the maximum upper confidence bound. 69 00:02:26,033 --> 00:02:28,766 And that's why right now I'm initializing this 70 00:02:28,766 --> 00:02:30,333 new ad variable at zero, 71 00:02:30,333 --> 00:02:31,800 to start with ad number one. 72 00:02:31,800 --> 00:02:34,100 And then we'll compute the UCB of this ad number one. 73 00:02:34,100 --> 00:02:37,000 And then through the for loop, you know, we will get the other upper 74 00:02:37,000 --> 00:02:39,733 confidence bounds and we will get the maximum one. 75 00:02:39,733 --> 00:02:41,566 You see the idea? Very simple. 76 00:02:41,566 --> 00:02:43,933 We just have to follow, you know, a certain logic 77 00:02:43,933 --> 00:02:45,400 where we know that the ultimate 78 00:02:45,400 --> 00:02:47,866 goal is to select the ad that has the highest 79 00:02:47,866 --> 00:02:49,366 upper confidence bound. 80 00:02:49,366 --> 00:02:49,866 All right. 81 00:02:49,866 --> 00:02:51,400 So then do you think we have 82 00:02:51,400 --> 00:02:53,933 to already start that second for loop? 83 00:02:53,933 --> 00:02:55,133 Well, almost. 84 00:02:55,133 --> 00:02:57,433 But we need to do something else.
85 00:02:57,433 --> 00:02:57,933 We need to 86 00:02:57,933 --> 00:03:00,066 introduce a new variable which 87 00:03:00,066 --> 00:03:03,366 will mean, you know, that maximum of the upper confidence bounds. 88 00:03:03,366 --> 00:03:06,533 Because for each of the ads, through that second for loop, we 89 00:03:06,533 --> 00:03:08,366 will compute the upper confidence bound. 90 00:03:08,366 --> 00:03:12,400 But in order to compare them to the maximum upper confidence bound, 91 00:03:12,600 --> 00:03:13,533 well, a smart thing 92 00:03:13,533 --> 00:03:17,000 to do would be to introduce a new variable here that will 93 00:03:17,000 --> 00:03:20,000 be the maximum upper confidence bound, and which will be compared 94 00:03:20,000 --> 00:03:22,266 to each of the UCBs of the ads. 95 00:03:22,266 --> 00:03:26,666 And therefore here I'm introducing a new variable which I'm going to call max, 96 00:03:27,300 --> 00:03:30,000 let's say, upper bound. 97 00:03:30,000 --> 00:03:30,666 All right. 98 00:03:30,666 --> 00:03:33,600 And which I'm initializing to zero because 99 00:03:33,600 --> 00:03:36,166 indeed at the beginning, well, let's say that the 100 00:03:36,166 --> 00:03:38,966 maximum, you know, the highest upper confidence bound is 101 00:03:38,966 --> 00:03:41,300 zero. And then of course, once we get 102 00:03:41,300 --> 00:03:43,033 an ad that has a higher 103 00:03:43,033 --> 00:03:44,100 upper confidence bound than 104 00:03:44,100 --> 00:03:46,200 zero, we will update this value 105 00:03:46,200 --> 00:03:49,166 of max upper bound to this new, higher 106 00:03:49,166 --> 00:03:50,333 upper confidence bound. 107 00:03:50,333 --> 00:03:50,633 All right. 108 00:03:50,633 --> 00:03:53,066 So that's our logic here. Then 109 00:03:53,066 --> 00:03:55,200 now we can start the for loop, you know, the second for 110 00:03:55,200 --> 00:03:57,200 loop that will iterate through the 111 00:03:57,200 --> 00:03:59,466 different ads from 1 to 10.
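The setup described so far can be sketched roughly as follows. This is only a minimal sketch: it assumes that N (number of rounds) and d (number of ads) were defined earlier in the implementation, and the spelling max_upper_bound is inferred from the narration. The counter rounds_seen is hypothetical and exists only so the sketch has a loop body and runs on its own.

```python
N = 10000  # number of rounds, i.e. customers (value stated in the narration)
d = 10     # number of ads (value stated in the narration)

rounds_seen = 0  # hypothetical counter, not part of the tutorial code
for n in range(0, N):
    ad = 0               # start from ad number one, whose index is 0
    max_upper_bound = 0  # highest upper confidence bound found so far in this round
    rounds_seen += 1     # placeholder body; the second for loop over the ads goes here
```

Both ad and max_upper_bound are reset at the start of every round, because the comparison between the ads starts over each time a new customer arrives.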
112 00:03:59,466 --> 00:04:02,100 So there we go. We start the second for loop: for, 113 00:04:02,100 --> 00:04:04,800 then a new looping variable, you know, iterated 114 00:04:04,800 --> 00:04:07,000 variable, which we're going to call this time, of course, 115 00:04:07,000 --> 00:04:09,433 well, either ad, but ad is already taken, 116 00:04:09,433 --> 00:04:11,233 you know, that's already an existing variable. 117 00:04:11,233 --> 00:04:13,500 So let's just choose i. For 118 00:04:13,500 --> 00:04:16,666 i in range from zero, 119 00:04:16,666 --> 00:04:19,533 because indexes in Python start at zero, up to 120 00:04:19,533 --> 00:04:21,200 either ten here or, you know, 121 00:04:21,200 --> 00:04:25,233 d, because we already introduced that d variable which is equal 122 00:04:25,233 --> 00:04:26,533 to ten. And if you want to 123 00:04:26,533 --> 00:04:27,033 do that 124 00:04:27,033 --> 00:04:30,200 same UCB implementation on a different number 125 00:04:30,200 --> 00:04:33,300 of ads, imagine you have, you know, five ads or even 50 126 00:04:33,300 --> 00:04:34,000 ads, well, 127 00:04:34,000 --> 00:04:36,600 you simply have to change that value here 128 00:04:36,600 --> 00:04:37,500 and nowhere else. 129 00:04:37,500 --> 00:04:37,800 All right. 130 00:04:37,800 --> 00:04:40,766 So that's the purpose of introducing these variables here. 131 00:04:40,766 --> 00:04:42,833 So for i in range from zero to d. 132 00:04:42,833 --> 00:04:45,600 And there we go. We can start the second for loop. 133 00:04:45,600 --> 00:04:47,333 And it is now in the second for 134 00:04:47,333 --> 00:04:48,866 loop that we will really 135 00:04:48,866 --> 00:04:51,133 implement each of these steps. 136 00:04:51,133 --> 00:04:53,866 So now, now is the time you're going to press pause 137 00:04:53,866 --> 00:04:57,033 on the video to indeed implement that 138 00:04:57,033 --> 00:04:57,600 step two. 139 00:04:57,600 --> 00:04:59,133 First, I would like you to
140 00:04:59,133 --> 00:05:00,633 implement it from scratch. 141 00:05:00,633 --> 00:05:03,233 You know, you can totally do this because you have everything here. 142 00:05:03,233 --> 00:05:04,833 I would like you to implement that 143 00:05:04,833 --> 00:05:08,900 step two. You also have all the variables, so it should be all fine. 144 00:05:09,133 --> 00:05:11,300 And then don't implement step three, because that 145 00:05:11,300 --> 00:05:13,266 will be done for the next exercise. 146 00:05:13,266 --> 00:05:15,300 And besides, it is not that direct. You know, we will 147 00:05:15,300 --> 00:05:17,700 have to use a certain trick in order to implement that 148 00:05:17,700 --> 00:05:18,966 step three. All right. 149 00:05:18,966 --> 00:05:20,700 So please implement step two. 150 00:05:20,700 --> 00:05:23,300 Now please press pause on the video and implement step two. 151 00:05:23,300 --> 00:05:25,466 But let me just give you two hints, 152 00:05:25,466 --> 00:05:29,166 you know, because maybe you will encounter issues with something in this part. 153 00:05:29,166 --> 00:05:30,900 As you can see, you will have to compute 154 00:05:30,900 --> 00:05:33,900 the square root of that value here. 155 00:05:34,100 --> 00:05:36,566 And you have to know that in order to use the square 156 00:05:36,566 --> 00:05:38,866 root in Python, you have to import a certain 157 00:05:38,866 --> 00:05:40,466 library, which is simply called 158 00:05:40,466 --> 00:05:41,866 the math library, 159 00:05:41,866 --> 00:05:44,533 and from which you can use a lot of mathematical tools 160 00:05:44,533 --> 00:05:45,766 like the square root. 161 00:05:45,766 --> 00:05:48,300 So actually let's do it right now so that it's done. 162 00:05:48,300 --> 00:05:50,100 We can either import it here 163 00:05:50,100 --> 00:05:51,400 or, you know, since I 164 00:05:51,400 --> 00:05:54,133 prefer to keep the essential libraries here, let's just 165 00:05:54,133 --> 00:05:55,333 import it at the
166 00:05:55,333 --> 00:05:57,300 beginning of this implementation. 167 00:05:57,300 --> 00:05:58,833 All right. Right here. 168 00:05:58,833 --> 00:05:59,933 So that it's done, you know, so 169 00:05:59,933 --> 00:06:02,400 that you can be ready to use that square 170 00:06:02,400 --> 00:06:03,133 root function. 171 00:06:03,133 --> 00:06:03,866 And then I 172 00:06:03,866 --> 00:06:06,600 will let you, of course, look in the documentation online 173 00:06:06,600 --> 00:06:08,433 for the exact function name. 174 00:06:08,433 --> 00:06:10,100 So, math. There we go. 175 00:06:10,100 --> 00:06:12,000 And then the second hint I wanted to give 176 00:06:12,000 --> 00:06:15,200 you was the fact that you have to start step two 177 00:06:15,233 --> 00:06:17,033 with an if condition. 178 00:06:17,033 --> 00:06:17,900 Right? Because at the 179 00:06:17,900 --> 00:06:20,100 beginning no ad was selected, 180 00:06:20,100 --> 00:06:21,433 and therefore this, 181 00:06:21,433 --> 00:06:22,366 you know, this quantity 182 00:06:22,366 --> 00:06:24,500 is equal to zero for all the ads. 183 00:06:24,500 --> 00:06:27,066 And if this is equal to zero, this doesn't make sense. 184 00:06:27,066 --> 00:06:29,433 You know, it will be equal to plus infinity. 185 00:06:29,433 --> 00:06:30,600 So you have to start with this 186 00:06:30,600 --> 00:06:32,933 if condition to make sure that the ad 187 00:06:32,933 --> 00:06:34,400 we're dealing with in the second 188 00:06:34,400 --> 00:06:36,433 for loop was already selected, 189 00:06:36,433 --> 00:06:37,500 so that this 190 00:06:37,500 --> 00:06:39,633 is different than zero and this exists, 191 00:06:39,633 --> 00:06:42,733 and so that then you can indeed compute the confidence interval. 192 00:06:43,000 --> 00:06:45,266 All right. So that was the last hint. And now, 193 00:06:45,266 --> 00:06:46,066 your turn. 194 00:06:46,066 --> 00:06:47,600 Please press pause on the video.
195 00:06:47,600 --> 00:06:49,600 And please implement this step two. 196 00:06:50,633 --> 00:06:51,033 All right. 197 00:06:51,033 --> 00:06:51,566 Perfect. 198 00:06:51,566 --> 00:06:53,900 So now let's implement the solution together, 199 00:06:53,900 --> 00:06:55,200 of the step two. 200 00:06:55,200 --> 00:06:55,533 All right. 201 00:06:55,533 --> 00:06:59,700 So as we said, we have to actually start here with 202 00:06:59,700 --> 00:07:01,566 an if condition 203 00:07:01,566 --> 00:07:04,433 that will check if the ad we are 204 00:07:04,433 --> 00:07:06,433 dealing with right now, you know, the ad of index 205 00:07:06,433 --> 00:07:08,600 i, has already been selected. 206 00:07:08,600 --> 00:07:09,866 It's not the case at the beginning, 207 00:07:09,866 --> 00:07:12,266 you know, but over the rounds it will be the case. You know, the 208 00:07:12,266 --> 00:07:12,900 ad will 209 00:07:12,900 --> 00:07:14,866 already have been selected, but at the beginning, 210 00:07:14,866 --> 00:07:16,300 for the first rounds, we need to check that 211 00:07:16,300 --> 00:07:18,700 indeed the ad has been selected. 212 00:07:18,700 --> 00:07:20,866 So in order to check this, well, that's very simple. 213 00:07:20,866 --> 00:07:21,300 You know, we have 214 00:07:21,300 --> 00:07:24,200 this variable that gives, for each of the ads, 215 00:07:24,200 --> 00:07:26,666 how many times they have been selected so far, 216 00:07:26,666 --> 00:07:29,466 you know, at round n, up to round n. 217 00:07:29,466 --> 00:07:32,333 And therefore we're going to take this now, you know. 218 00:07:32,333 --> 00:07:33,000 And then 219 00:07:33,000 --> 00:07:35,900 we will check that the index 220 00:07:35,900 --> 00:07:38,900 i, which is the index of the ad we're dealing with right now, 221 00:07:39,066 --> 00:07:41,400 in this number of selections list, 222 00:07:41,400 --> 00:07:43,133 well, we're going to check that the
223 00:07:43,133 --> 00:07:45,400 element of index i in this number of 224 00:07:45,400 --> 00:07:48,400 selections list is indeed larger 225 00:07:48,633 --> 00:07:49,733 than zero. 226 00:07:49,733 --> 00:07:50,433 Because if it is 227 00:07:50,433 --> 00:07:53,100 larger than zero, then that means that the ad was 228 00:07:53,100 --> 00:07:55,933 indeed already selected at least once. Okay. 229 00:07:55,933 --> 00:07:58,200 And now if you want, but that's totally optional, 230 00:07:58,200 --> 00:08:01,266 you can put this condition in parentheses, right? 231 00:08:01,600 --> 00:08:03,366 And then add just a colon. 232 00:08:03,366 --> 00:08:07,400 And now, now you're going to say to Python what should happen and what must happen 233 00:08:07,666 --> 00:08:08,633 if indeed 234 00:08:08,633 --> 00:08:09,700 that ad i, which 235 00:08:09,700 --> 00:08:11,866 we're dealing with right now in the second for loop, 236 00:08:11,866 --> 00:08:13,666 was already selected. 237 00:08:13,666 --> 00:08:14,033 All right. 238 00:08:14,033 --> 00:08:16,800 So now, now is the time we can implement all 239 00:08:16,800 --> 00:08:18,366 this, because indeed this 240 00:08:18,366 --> 00:08:20,800 is different than zero. This is larger than zero. 241 00:08:20,800 --> 00:08:23,800 So very simply, let's just follow this step by step. 242 00:08:23,833 --> 00:08:24,966 First let's, 243 00:08:24,966 --> 00:08:25,966 you know, create a new 244 00:08:25,966 --> 00:08:28,566 variable for this, which we're going to call average reward, 245 00:08:28,566 --> 00:08:30,300 you know, because this corresponds to an 246 00:08:30,300 --> 00:08:34,433 average of the reward, because it is the accumulated reward here of this ad 247 00:08:34,433 --> 00:08:37,366 i up to round n divided by the number of times it was 248 00:08:37,366 --> 00:08:39,866 selected. So that's nothing else than an average. 249 00:08:39,866 --> 00:08:40,700 And therefore
250 00:08:40,700 --> 00:08:43,800 let's create a new variable here which we're going to call average, 251 00:08:44,300 --> 00:08:45,933 underscore, reward. 252 00:08:47,000 --> 00:08:48,133 And it is simply 253 00:08:48,133 --> 00:08:53,766 going to be equal to, well, the accumulated reward of that specific ad i up to 254 00:08:53,766 --> 00:08:55,800 round n, which is given exactly 255 00:08:55,800 --> 00:08:58,266 thanks to this variable here, of 256 00:08:58,266 --> 00:09:00,100 which we're going to have to take the index 257 00:09:00,100 --> 00:09:02,566 i, right, that ad i which 258 00:09:02,566 --> 00:09:05,400 we're dealing with right now. So I'm pasting that here. 259 00:09:05,400 --> 00:09:06,633 And I'm taking, of course, 260 00:09:06,633 --> 00:09:09,000 in square brackets, the index 261 00:09:09,000 --> 00:09:10,200 i, because this, 262 00:09:10,200 --> 00:09:10,500 you know, 263 00:09:10,500 --> 00:09:13,333 all of this corresponds exactly to the 264 00:09:13,333 --> 00:09:14,100 accumulated 265 00:09:14,100 --> 00:09:16,700 reward of that specific ad i up to 266 00:09:16,700 --> 00:09:18,066 round n. Okay. 267 00:09:18,066 --> 00:09:19,866 So, sum of rewards, index i. 268 00:09:19,866 --> 00:09:22,200 And then of course we need to divide that by 269 00:09:22,200 --> 00:09:25,266 the number of times that same ad 270 00:09:25,266 --> 00:09:26,566 i was selected. 271 00:09:26,566 --> 00:09:28,133 And this is given, of course, 272 00:09:28,133 --> 00:09:30,100 by this again, right? 273 00:09:30,100 --> 00:09:30,933 The number of 274 00:09:30,933 --> 00:09:36,000 selections of index i, which corresponds to the number of times the ad of index 275 00:09:36,000 --> 00:09:37,500 i was selected. 276 00:09:37,500 --> 00:09:38,666 And that gives you exactly 277 00:09:38,666 --> 00:09:41,666 the average reward, that gives you exactly that value here. 278 00:09:41,800 --> 00:09:43,966 Okay. So very easy so far.
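The if condition and the average reward can be sketched like this on a small made-up state. The list names numbers_of_selections and sums_of_rewards are assumed spellings based on the narration, the counts are invented for illustration, only three ads are used instead of ten, and the average_rewards dictionary is a hypothetical helper that just collects the results for inspection.

```python
# Made-up state after a few rounds, with 3 ads instead of 10 (illustration only)
numbers_of_selections = [4, 0, 2]  # how many times each ad was selected so far
sums_of_rewards = [1, 0, 2]        # accumulated reward of each ad so far

average_rewards = {}  # hypothetical helper: ad index -> average reward
for i in range(0, len(numbers_of_selections)):
    if numbers_of_selections[i] > 0:  # the ad was already selected at least once
        average_reward = sums_of_rewards[i] / numbers_of_selections[i]
        average_rewards[i] = average_reward

print(average_rewards)  # {0: 0.25, 2: 1.0}
```

The ad of index 1 is skipped, because dividing by its zero selection count would raise a ZeroDivisionError.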
279 00:09:43,966 --> 00:09:46,166 Then next up, the next step 280 00:09:46,166 --> 00:09:47,700 is to get the 281 00:09:47,700 --> 00:09:49,133 confidence interval. 282 00:09:49,133 --> 00:09:51,633 And more specifically we're going to get that value. 283 00:09:51,633 --> 00:09:53,400 So right now we're going to compute 284 00:09:53,400 --> 00:09:55,166 this value here. 285 00:09:55,166 --> 00:09:55,600 All right. 286 00:09:55,600 --> 00:10:00,100 So let's simply call it delta i, you know, it's a new variable, delta, 287 00:10:00,600 --> 00:10:03,300 underscore, i, equals. 288 00:10:03,300 --> 00:10:04,333 And now there you go. 289 00:10:04,333 --> 00:10:06,666 That's where you needed to take that square root 290 00:10:06,666 --> 00:10:09,366 function from the math library. 291 00:10:09,366 --> 00:10:10,000 So first we 292 00:10:10,000 --> 00:10:12,433 have to call indeed that math library, 293 00:10:12,433 --> 00:10:13,400 and from which, 294 00:10:13,400 --> 00:10:16,500 well, we need to call this function that allows to compute the square root. 295 00:10:16,733 --> 00:10:20,366 And I'm sure you've found very easily online that this function is 296 00:10:20,366 --> 00:10:22,433 called sqrt, 297 00:10:22,433 --> 00:10:24,233 and then parentheses. 298 00:10:24,233 --> 00:10:24,633 Right. 299 00:10:24,633 --> 00:10:26,833 This computes the square root of a value. 300 00:10:26,833 --> 00:10:28,400 And the value we have to put inside 301 00:10:28,400 --> 00:10:30,733 this function is of course exactly 302 00:10:30,733 --> 00:10:31,633 this, you know, 303 00:10:31,633 --> 00:10:34,133 three divided by two multiplied by the 304 00:10:34,133 --> 00:10:35,833 logarithm of the round n, 305 00:10:35,833 --> 00:10:37,800 divided by the number of times that ad 306 00:10:37,800 --> 00:10:39,700 i was selected up to round n. 307 00:10:39,700 --> 00:10:41,300 Okay, so let's do this. 308 00:10:41,300 --> 00:10:44,133 First we start with this three divided by two.
309 00:10:44,133 --> 00:10:46,700 Then multiplied by... 310 00:10:46,700 --> 00:10:47,866 Then there you go. 311 00:10:47,866 --> 00:10:48,900 You need to call the 312 00:10:48,900 --> 00:10:51,900 log function, which is another function of this math library. 313 00:10:51,900 --> 00:10:54,300 So actually this math library is used twice. 314 00:10:54,300 --> 00:10:55,833 I'm calling it now to indeed be 315 00:10:55,833 --> 00:10:58,666 able to call that log function. 316 00:10:58,666 --> 00:11:01,666 And now inside this log function, well, be careful. 317 00:11:01,666 --> 00:11:02,633 Maybe I should have 318 00:11:02,633 --> 00:11:05,166 given you another hint here. But be careful. 319 00:11:05,166 --> 00:11:08,300 We cannot input n here. Why is that? 320 00:11:08,633 --> 00:11:09,766 That's because actually 321 00:11:09,766 --> 00:11:12,900 n, you know, in this range here, this first range, 322 00:11:13,133 --> 00:11:14,366 starts from zero. 323 00:11:14,366 --> 00:11:15,766 So the first value of n 324 00:11:15,766 --> 00:11:16,666 is zero. 325 00:11:16,666 --> 00:11:19,500 And that's, you know, because of the indexes in Python. 326 00:11:19,500 --> 00:11:21,566 And you have to know that the logarithm of 327 00:11:21,566 --> 00:11:24,066 zero is actually minus infinity. 328 00:11:24,066 --> 00:11:25,166 Therefore it would be very 329 00:11:25,166 --> 00:11:27,733 dangerous here to only put n. In order to 330 00:11:27,733 --> 00:11:28,833 protect us from this, 331 00:11:28,833 --> 00:11:29,566 we're just going to 332 00:11:29,566 --> 00:11:32,300 add one to n, so that it starts from one. Okay. 333 00:11:32,300 --> 00:11:34,533 That's one way to do this. The other way is, of 334 00:11:34,533 --> 00:11:35,900 course, to, you know, make that 335 00:11:35,900 --> 00:11:36,400 first for 336 00:11:36,400 --> 00:11:39,433 loop go from one to N plus one, 337 00:11:39,433 --> 00:11:42,466 so that we can go indeed from round one to round N.
338 00:11:42,466 --> 00:11:44,533 But you know, with Python we always work with 339 00:11:44,533 --> 00:11:46,433 the same indexes starting from zero. 340 00:11:46,433 --> 00:11:49,300 So that's why we go with the first option here. 341 00:11:49,300 --> 00:11:49,633 All right. 342 00:11:49,633 --> 00:11:51,000 So all good so far. 343 00:11:51,000 --> 00:11:53,633 And then of course we have to divide that by, 344 00:11:53,633 --> 00:11:54,700 well, let's look at it 345 00:11:54,700 --> 00:11:57,800 again, the number of times that ad 346 00:11:57,800 --> 00:12:00,233 i was selected up to round n. 347 00:12:00,233 --> 00:12:03,366 And that's of course exactly this value. 348 00:12:03,666 --> 00:12:06,600 So I'm copying this again and pasting that here. 349 00:12:06,600 --> 00:12:07,533 And there we go. 350 00:12:07,533 --> 00:12:12,600 Very quickly we got our delta i, we got exactly this value. 351 00:12:13,100 --> 00:12:15,633 And now we have a final value to compute, 352 00:12:15,633 --> 00:12:18,000 which is of course, well, you know, that long 353 00:12:18,000 --> 00:12:20,300 awaited upper confidence bound. 354 00:12:20,300 --> 00:12:22,500 That's exactly what we have to compute right now. 355 00:12:22,500 --> 00:12:23,200 And it is going 356 00:12:23,200 --> 00:12:27,600 to be simply the sum of that average reward plus that delta here. 357 00:12:27,900 --> 00:12:31,666 So let's here add a new line of code 358 00:12:31,666 --> 00:12:32,733 and then introduce 359 00:12:32,733 --> 00:12:35,100 a new variable which we're going to call upper, 360 00:12:35,100 --> 00:12:39,900 underscore, bound, not max upper bound but upper bound, and which will simply 361 00:12:39,900 --> 00:12:42,766 be equal to the sum of the average 362 00:12:43,966 --> 00:12:44,600 reward, 363 00:12:44,600 --> 00:12:45,633 there we go, 364 00:12:45,633 --> 00:12:48,433 plus the delta 365 00:12:48,433 --> 00:12:50,000 i. And perfect.
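Put together, step two as narrated above might look like the following sketch for one fixed round. The list names, the made-up counts, the choice of three ads, and the upper_bounds dictionary used to collect results are all assumptions for illustration; the formula itself (average reward plus the square root of 3/2 times log(n + 1) over the number of selections) follows the narration.

```python
import math

d = 3                              # 3 ads here instead of 10, to keep the example small
n = 9                              # zero-based round index, so this is the tenth round
numbers_of_selections = [4, 0, 5]  # made-up selection counts per ad
sums_of_rewards = [1, 0, 3]        # made-up accumulated rewards per ad

upper_bounds = {}  # hypothetical helper: ad index -> upper confidence bound
for i in range(0, d):
    if numbers_of_selections[i] > 0:
        average_reward = sums_of_rewards[i] / numbers_of_selections[i]
        # n + 1 rather than n, because n starts at 0 and log(0) is undefined
        delta_i = math.sqrt(3 / 2 * math.log(n + 1) / numbers_of_selections[i])
        upper_bound = average_reward + delta_i
        upper_bounds[i] = upper_bound
```

One practical detail: in Python, math.log(0) does not return minus infinity but raises a ValueError, so writing n instead of n + 1 would stop the program in the very first round.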
366 00:12:50,000 --> 00:12:52,500 And now, you know, thanks to this for loop, the 367 00:12:52,500 --> 00:12:54,300 second for loop iterating through all the 368 00:12:54,300 --> 00:12:57,766 ads from 0 to 9, that is, from ad number one to ad number ten, 369 00:12:58,000 --> 00:13:00,400 well, we have the upper bound, we have the upper 370 00:13:00,400 --> 00:13:01,966 bound of each of these ads, 371 00:13:01,966 --> 00:13:02,766 and the upper bound 372 00:13:02,766 --> 00:13:06,900 at, you know, that specific round n we're dealing with right now inside this 373 00:13:06,900 --> 00:13:07,866 first for loop. 374 00:13:07,866 --> 00:13:08,200 All right. 375 00:13:08,200 --> 00:13:09,333 So basically 376 00:13:09,333 --> 00:13:12,166 we are done implementing this step two. 377 00:13:12,166 --> 00:13:13,600 We implemented this step two, 378 00:13:13,600 --> 00:13:16,366 and we even got that value here, you know, the upper bound. 379 00:13:16,366 --> 00:13:17,733 But now step three 380 00:13:17,733 --> 00:13:19,600 is not implemented yet, because indeed step 381 00:13:19,600 --> 00:13:20,733 three consists 382 00:13:20,733 --> 00:13:22,233 of selecting the ad that 383 00:13:22,233 --> 00:13:24,266 has the maximum upper confidence bound. 384 00:13:24,266 --> 00:13:26,133 So now we need to add 385 00:13:26,133 --> 00:13:29,133 one trick, you know, which is kind of classic in Python, 386 00:13:29,366 --> 00:13:31,800 but indeed we need to add one trick to select the 387 00:13:31,800 --> 00:13:34,466 maximum upper confidence bound of these 388 00:13:34,466 --> 00:13:35,133 ads. 389 00:13:35,133 --> 00:13:40,266 So going back to here, well, we will surely take a little break here. 390 00:13:40,500 --> 00:13:41,433 And actually 391 00:13:41,433 --> 00:13:43,566 I will directly ask you to try to implement that 392 00:13:43,566 --> 00:13:46,500 step three, you know, between this tutorial and the next one. 393 00:13:46,500 --> 00:13:47,600 Before you
start the next 394 00:13:47,600 --> 00:13:48,666 tutorial, please 395 00:13:48,666 --> 00:13:50,766 try to implement step three. 396 00:13:50,766 --> 00:13:52,700 It's not that direct. It's not that easy. 397 00:13:52,700 --> 00:13:55,800 But you know, you have to use some kind of algorithmic logic. 398 00:13:55,833 --> 00:13:57,100 Okay, but I'll give you, of 399 00:13:57,100 --> 00:14:00,133 course, some hints, unless you don't want the hints, and therefore you can 400 00:14:00,133 --> 00:14:03,033 directly press pause or quit this video now. 401 00:14:03,033 --> 00:14:05,033 But I'm going to give you some hints if you want. 402 00:14:05,033 --> 00:14:09,166 The first hint is that right now you're done with this if condition. 403 00:14:09,166 --> 00:14:10,800 So you can go back here, 404 00:14:10,800 --> 00:14:13,333 you know, inside this second for loop. 405 00:14:13,333 --> 00:14:14,533 And then you're going to have to start 406 00:14:14,533 --> 00:14:16,633 with, of course, an else, right? 407 00:14:16,633 --> 00:14:18,466 Which is the condition where, you know, 408 00:14:18,466 --> 00:14:19,566 the ad that we're 409 00:14:19,566 --> 00:14:22,466 dealing with right now has not been selected yet. 410 00:14:22,466 --> 00:14:25,033 So what you'll have to do, and that's my last hint, 411 00:14:25,033 --> 00:14:28,966 will be to use a trick to select the ads 412 00:14:29,000 --> 00:14:30,900 that have not been selected yet. 413 00:14:30,900 --> 00:14:32,700 Why do you need to select these ads? 414 00:14:32,700 --> 00:14:34,133 Well, the answer is here. 415 00:14:34,133 --> 00:14:35,900 That's exactly because of this, 416 00:14:35,900 --> 00:14:37,900 you know, this denominator, which shouldn't 417 00:14:37,900 --> 00:14:39,466 be equal to zero. 418 00:14:39,466 --> 00:14:43,233 And since this is exactly the number of times that ad 419 00:14:43,233 --> 00:14:46,466 i was selected, well, we need this to be
420 00:14:46,466 --> 00:14:48,966 different than zero in order to compute 421 00:14:48,966 --> 00:14:49,966 this average reward, 422 00:14:49,966 --> 00:14:52,333 and therefore in order to compute then the 423 00:14:52,333 --> 00:14:54,900 confidence interval with the upper 424 00:14:54,900 --> 00:14:56,100 confidence bound. 425 00:14:56,100 --> 00:14:57,266 And that's why in the 426 00:14:57,266 --> 00:14:59,100 UCB algorithm it's 427 00:14:59,100 --> 00:15:00,800 compulsory to 428 00:15:00,800 --> 00:15:03,266 make sure that during the first rounds all 429 00:15:03,266 --> 00:15:05,400 the ads are selected. So actually, 430 00:15:05,400 --> 00:15:06,800 during the first ten rounds, 431 00:15:06,800 --> 00:15:09,900 we have to use a trick so that we make sure 432 00:15:09,900 --> 00:15:13,100 to select all the ads, so that this way, well, 433 00:15:13,100 --> 00:15:15,666 all the N_i(n) here, you know, for all the different 434 00:15:15,666 --> 00:15:18,666 ads, will be different than zero. Okay. 435 00:15:18,733 --> 00:15:20,766 So basically the exercise 436 00:15:20,766 --> 00:15:23,333 for next time, you know, the next tutorial, is to 437 00:15:23,333 --> 00:15:24,766 indeed implement that 438 00:15:24,766 --> 00:15:25,966 step three, to 439 00:15:25,966 --> 00:15:28,633 compute the maximum of the upper confidence bounds 440 00:15:28,633 --> 00:15:31,300 while, at the same time, implementing a 441 00:15:31,300 --> 00:15:32,233 trick to 442 00:15:32,233 --> 00:15:35,233 make sure that all the ads are selected 443 00:15:35,333 --> 00:15:36,666 in the first ten rounds. 444 00:15:36,666 --> 00:15:38,700 So, very challenging, but 445 00:15:38,700 --> 00:15:40,766 at least try, at least try as hard 446 00:15:40,766 --> 00:15:41,466 as you can. 447 00:15:41,466 --> 00:15:44,300 And I promise you that you will progress and improve 448 00:15:44,300 --> 00:15:45,800 your skills in machine learning.