1 00:00:00,233 --> 00:00:02,400 Hello and welcome to this art tutorial. 2 00:00:02,400 --> 00:00:04,266 So in the previous tutorial we imported the. 3 00:00:04,266 --> 00:00:06,400 Data set. And today in this. Tutorial. 4 00:00:06,400 --> 00:00:07,800 We're going to implement the. 5 00:00:07,800 --> 00:00:10,800 UCB Upper. Confidence Bound algorithm. 6 00:00:10,966 --> 00:00:14,533 So I'm not sure if it's bad news or good. News, but there is. 7 00:00:14,533 --> 00:00:17,100 Actually no. Package that we can use easily to. 8 00:00:17,100 --> 00:00:19,833 Implement this. UCB algorithm. 9 00:00:19,833 --> 00:00:21,300 So that's the bad news. 10 00:00:21,300 --> 00:00:22,200 But the good news is. 11 00:00:22,200 --> 00:00:24,300 That we. Are going to implement the. 12 00:00:24,300 --> 00:00:26,100 UCB from. Scratch. 13 00:00:26,100 --> 00:00:27,433 And that's good news because that. 14 00:00:27,433 --> 00:00:31,066 Will give you the opportunity to really improve your skills in AR. 15 00:00:31,733 --> 00:00:33,166 So be ready. 16 00:00:33,166 --> 00:00:36,666 It's not going to take 3 or 4 lines like we used to have before. 17 00:00:36,933 --> 00:00:37,533 We are. 18 00:00:37,533 --> 00:00:40,000 Going to implement the whole algorithm from. 19 00:00:40,000 --> 00:00:42,266 Scratch without. Using any package. 20 00:00:42,266 --> 00:00:44,433 And we are going to do. It step by step. 21 00:00:44,433 --> 00:00:46,166 And speaking of steps, let's. 22 00:00:46,166 --> 00:00:47,566 Now actually jump to. 23 00:00:47,566 --> 00:00:48,466 The slide of the. 24 00:00:48,466 --> 00:00:50,900 Upper confidence bound algorithm. 25 00:00:50,900 --> 00:00:53,833 So this algorithm takes three steps. 26 00:00:53,833 --> 00:00:56,366 The first step is at each round end. 27 00:00:56,366 --> 00:00:58,200 We consider two numbers. 28 00:00:58,200 --> 00:00:59,733 For each ad ai. 29 00:00:59,733 --> 00:01:02,200 That is, for each version of the ad. 30 00:01:02,200 --> 00:01:04,300 These two numbers are an AI. 31 00:01:04,300 --> 00:01:05,300 The number of times. 32 00:01:05,300 --> 00:01:06,900 The ad I was selected up. 33 00:01:06,900 --> 00:01:09,700 To round N, and the sum of. Rewards of the ad. 34 00:01:09,700 --> 00:01:10,633 Version AI. 35 00:01:10,633 --> 00:01:13,200 Up to. Round n. Okay, so the. 36 00:01:13,200 --> 00:01:14,233 First thing that we need to do. 37 00:01:14,233 --> 00:01:18,633 Here is to declare these two variables, because we will need them afterwards. 38 00:01:18,900 --> 00:01:19,200 Okay. 39 00:01:19,200 --> 00:01:19,700 So the first. 40 00:01:19,700 --> 00:01:21,233 One, the number of times the ad. 41 00:01:21,233 --> 00:01:24,233 I was selected up. To round n. 42 00:01:24,500 --> 00:01:27,300 Let's actually call this variable. Numbers. 43 00:01:28,233 --> 00:01:31,233 Of selections. 44 00:01:31,533 --> 00:01:32,366 All right. 45 00:01:32,366 --> 00:01:34,566 So we need. To consider this number for. 46 00:01:34,566 --> 00:01:35,633 Each ad AI. 47 00:01:35,633 --> 00:01:36,500 So what we're. Going to do. 48 00:01:36,500 --> 00:01:39,733 Is create a vector that. Will contain each. 49 00:01:39,733 --> 00:01:40,166 Of those. 50 00:01:40,166 --> 00:01:42,900 Numbers of selections. Of each ad I. 51 00:01:42,900 --> 00:01:46,366 So we will set this variable equal to a vector of size d. 52 00:01:46,666 --> 00:01:47,100 And we will. 53 00:01:47,100 --> 00:01:49,000 Initialize all the. D components of. 54 00:01:49,000 --> 00:01:51,000 This vector to zero. 55 00:01:51,000 --> 00:01:53,100 And so how can we do that in R. 56 00:01:53,100 --> 00:01:56,100 Well we simply need to type integer. 57 00:01:56,200 --> 00:01:58,700 And in. Parentheses d. 58 00:01:58,700 --> 00:02:01,600 And this will create a vector of sized. 59 00:02:01,600 --> 00:02:03,566 Containing only. Zeros. 60 00:02:03,566 --> 00:02:05,700 And we are doing. This because of course at the. 61 00:02:05,700 --> 00:02:09,200 First round each version of the ad hasn't been selected. Yet. 62 00:02:09,233 --> 00:02:11,733 So the number of times each version of the. 63 00:02:11,733 --> 00:02:14,033 Ad was selected is of course, zero. 64 00:02:14,033 --> 00:02:15,900 Okay, so done. For the first number. 65 00:02:15,900 --> 00:02:16,300 All right. 66 00:02:16,300 --> 00:02:17,133 And the second number. 67 00:02:17,133 --> 00:02:20,700 Is the sum of rewards of the ad ai up to. 68 00:02:20,700 --> 00:02:23,866 Round n. Okay. So let's call this variable. 69 00:02:24,200 --> 00:02:24,900 Sums. 70 00:02:26,033 --> 00:02:29,033 Of rewards this way. 71 00:02:29,100 --> 00:02:30,166 And same. 72 00:02:30,166 --> 00:02:33,000 You know we need to take the sums of rewards of each. 73 00:02:33,000 --> 00:02:33,666 Version of the. 74 00:02:33,666 --> 00:02:35,466 Ad at each. Round. And so. 75 00:02:35,466 --> 00:02:36,766 We're going to set it as a vector. 76 00:02:36,766 --> 00:02:38,733 Of D components as. Well, exactly. 77 00:02:38,733 --> 00:02:40,566 Like we did for numbers of selections. 78 00:02:40,566 --> 00:02:41,800 And same. We. 79 00:02:41,800 --> 00:02:44,200 Will initialize. It to zero because of course at the. 80 00:02:44,200 --> 00:02:45,733 First round the sum. 81 00:02:45,733 --> 00:02:49,333 Of rewards of each version of the ad is of course zero. 82 00:02:49,566 --> 00:02:52,566 So we will just copy this copy. 83 00:02:52,766 --> 00:02:54,566 And. Paste it here. 84 00:02:54,566 --> 00:02:55,033 All right. 85 00:02:55,033 --> 00:02:57,900 So basically step one done. 86 00:02:57,900 --> 00:02:59,933 Now let's move on to step two. 87 00:02:59,933 --> 00:03:01,200 Step two is. 88 00:03:01,200 --> 00:03:04,200 From these two numbers we compute. 89 00:03:04,333 --> 00:03:07,533 First the average reward of ad ai up. 90 00:03:07,533 --> 00:03:08,666 To round n. 91 00:03:08,666 --> 00:03:12,900 And second the confidence interval at round n. 92 00:03:13,200 --> 00:03:15,066 So basically we need to compute these. 93 00:03:15,066 --> 00:03:16,066 Two numbers one. 94 00:03:16,066 --> 00:03:17,766 At each. Round n. 95 00:03:17,766 --> 00:03:20,033 So let's implement this. Step two. 96 00:03:20,033 --> 00:03:23,433 Since we need to compute these two numbers at each round n. 97 00:03:23,766 --> 00:03:26,833 Of course what we need to do is to make a for loop. 98 00:03:27,266 --> 00:03:29,366 All right. So that's what we're going to do. 99 00:03:29,366 --> 00:03:31,933 We're going to go through all the rounds from. 100 00:03:31,933 --> 00:03:35,033 Zero to the. 10,000 round. 101 00:03:35,433 --> 00:03:36,733 So for each. 102 00:03:36,733 --> 00:03:40,200 Round n so we're calling the rounds n n. 103 00:03:40,700 --> 00:03:41,766 So now we need to input. 104 00:03:41,766 --> 00:03:44,266 The lower bound which. Is one. 105 00:03:44,266 --> 00:03:45,766 So that's the first round. 106 00:03:45,766 --> 00:03:48,666 And then the upper bound which is n. 107 00:03:48,666 --> 00:03:52,366 So n is the total number of rounds that is 10,000 rounds. 108 00:03:52,366 --> 00:03:56,266 So in case you have a problem with more rounds 109 00:03:56,266 --> 00:04:00,700 or more users we will, you know, declare this variable here. 110 00:04:00,900 --> 00:04:03,333 N. Equals well here for our. 111 00:04:03,333 --> 00:04:05,800 Problem it's 10,000. 112 00:04:05,800 --> 00:04:08,800 All right. 10,000. So right now in the loop. 113 00:04:08,900 --> 00:04:10,700 And so what. Do we need to do. 114 00:04:10,700 --> 00:04:13,666 We need to compute for each version of the add. 115 00:04:13,666 --> 00:04:16,400 The average. Reward and the confidence interval. 116 00:04:16,400 --> 00:04:18,233 So that's exactly. What we're going to do. 117 00:04:18,233 --> 00:04:19,300 And since we're doing. 118 00:04:19,300 --> 00:04:21,866 It for each version of the add AI, well. 119 00:04:21,866 --> 00:04:22,966 What we need to do now. 120 00:04:22,966 --> 00:04:25,033 Is. Again another. For loop. 121 00:04:25,033 --> 00:04:27,000 And this time we're going to loop. 122 00:04:27,000 --> 00:04:29,400 Over all. The ten different versions of. 123 00:04:29,400 --> 00:04:30,066 The ad. 124 00:04:30,066 --> 00:04:32,300 So the. Ads are indexed. By AI. 125 00:04:32,300 --> 00:04:34,200 So for I here. 126 00:04:34,200 --> 00:04:37,833 In one and then again we're going to input. 127 00:04:37,833 --> 00:04:41,700 Here D in case you have more versions of your Add 128 00:04:41,700 --> 00:04:44,733 or more arms for your specific problem. 129 00:04:45,100 --> 00:04:49,500 So we are going to declare a new variable here which is D. 130 00:04:49,833 --> 00:04:51,400 And since it's the number of. 131 00:04:51,400 --> 00:04:54,333 Ads we will set it equal to ten. 132 00:04:54,333 --> 00:04:55,266 All right. 133 00:04:55,266 --> 00:04:56,800 And now we're entering into. 134 00:04:56,800 --> 00:04:57,833 The second loop. 135 00:04:57,833 --> 00:04:59,333 So right now this. 136 00:04:59,333 --> 00:05:01,400 Level we are at a specific. 137 00:05:01,400 --> 00:05:03,066 Round and dealing. 138 00:05:03,066 --> 00:05:06,066 With a specific version of the ad. 139 00:05:06,100 --> 00:05:07,100 And now we can. 140 00:05:07,100 --> 00:05:09,000 Compute our. Two numbers. 141 00:05:09,000 --> 00:05:11,100 The average reward of the ad AI. 142 00:05:11,100 --> 00:05:13,066 And the confidence interval. 143 00:05:13,066 --> 00:05:14,700 So let's start. With the first number. 144 00:05:14,700 --> 00:05:16,633 The first number is the average reward. 145 00:05:16,633 --> 00:05:18,300 So let's call it average 146 00:05:19,400 --> 00:05:21,800 reward and equals. 147 00:05:21,800 --> 00:05:23,400 All right. So what does it say. 148 00:05:23,400 --> 00:05:26,666 It says that it's the sum of rewards of the. 149 00:05:26,666 --> 00:05:32,333 Ad version AI up to round n divided by the number of times this ad version I. 150 00:05:32,333 --> 00:05:34,100 Was selected up to round n. 151 00:05:34,100 --> 00:05:36,333 So simply let's write this formula. 152 00:05:36,333 --> 00:05:38,866 We already have these two variables. 153 00:05:38,866 --> 00:05:42,300 Well, we have the vectors, but of course we will take the i'th. 154 00:05:42,600 --> 00:05:43,200 Element. 155 00:05:43,200 --> 00:05:48,133 Of these two vectors because they correspond to our AD version ie of the ad. 156 00:05:48,333 --> 00:05:49,400 So let's do it. 157 00:05:49,400 --> 00:05:52,400 Sums of rewards. 158 00:05:52,933 --> 00:05:54,600 And we take the i'th. 159 00:05:54,600 --> 00:05:56,066 Element of this vector. 160 00:05:56,066 --> 00:05:56,566 Divided. 161 00:05:56,566 --> 00:05:59,566 By numbers. Of. 162 00:05:59,833 --> 00:06:02,133 Selections and same. 163 00:06:02,133 --> 00:06:04,333 We're taking the i'th. Element of. 164 00:06:04,333 --> 00:06:05,633 This vector. 165 00:06:05,633 --> 00:06:06,166 Great. 166 00:06:06,166 --> 00:06:08,700 So we have our first. Number computed. 167 00:06:08,700 --> 00:06:10,200 The average reward. 168 00:06:10,200 --> 00:06:11,233 Now let's take care of the. 169 00:06:11,233 --> 00:06:13,566 Second number the confidence interval. 170 00:06:13,566 --> 00:06:15,200 Well we. Will not. Build. 171 00:06:15,200 --> 00:06:16,800 The whole confidence interval. 172 00:06:16,800 --> 00:06:19,733 What we will compute right now is the upper. 173 00:06:19,733 --> 00:06:21,500 Bound of the confidence interval. 174 00:06:21,500 --> 00:06:23,700 Because that's what we need for step three. 175 00:06:23,700 --> 00:06:24,600 As you can see. 176 00:06:24,600 --> 00:06:26,766 Step three is we select the ad that. 177 00:06:26,766 --> 00:06:28,966 Has the maximum upper confidence. Bounds. 178 00:06:28,966 --> 00:06:30,433 So we just need the upper. 179 00:06:30,433 --> 00:06:33,200 Confidence bounds. Of this. Confidence interval. 180 00:06:33,200 --> 00:06:35,266 And so what is this upper confidence bound. 181 00:06:35,266 --> 00:06:38,800 Well it's the average reward plus delta 182 00:06:38,800 --> 00:06:42,300 I of n and delta I of n is given by this formula. 183 00:06:42,300 --> 00:06:44,666 It's the square root of. 184 00:06:44,666 --> 00:06:46,433 1.5. Log. 185 00:06:46,433 --> 00:06:50,533 N divided by nine n, which is the number of times. 186 00:06:50,533 --> 00:06:53,333 The AD. Version I. Was selected up to. Round n. 187 00:06:53,333 --> 00:06:55,700 So let's compute. This delta first. 188 00:06:55,700 --> 00:06:58,566 And then we'll compute the upper confidence. Bound. 189 00:06:58,566 --> 00:07:03,000 So delta will call it's actually delta underscore AI. 190 00:07:03,300 --> 00:07:06,300 And so it's equal to square root 191 00:07:06,300 --> 00:07:10,033 which I call sqrt parenthesis here okay. 192 00:07:10,033 --> 00:07:10,666 So what do we have. 193 00:07:10,666 --> 00:07:13,666 First we have this three divided by two. 194 00:07:13,666 --> 00:07:15,200 Then times. 195 00:07:15,200 --> 00:07:17,200 And then we take the log. 196 00:07:17,200 --> 00:07:19,800 So log here of. N. 197 00:07:19,800 --> 00:07:22,100 That we divide by. 198 00:07:22,100 --> 00:07:24,433 The numbers. Of. 199 00:07:24,433 --> 00:07:27,466 Selections of the ad version ie. 200 00:07:27,466 --> 00:07:29,733 That is the number of times the ad version I. 201 00:07:29,733 --> 00:07:31,466 Was selected up to. 202 00:07:31,466 --> 00:07:33,066 Round n. 203 00:07:33,066 --> 00:07:35,500 Okay. So great for Delta. Delta is. Ready. 204 00:07:35,500 --> 00:07:36,633 And therefore now we are. 205 00:07:36,633 --> 00:07:39,133 Ready to compute the upper. 206 00:07:39,133 --> 00:07:40,433 Confidence bound. 207 00:07:40,433 --> 00:07:44,233 Which is all the essence of this UCB algorithm. 208 00:07:44,333 --> 00:07:45,466 So let's do it. 209 00:07:45,466 --> 00:07:49,366 Let's compute the UCB and let's call it upper bound. 210 00:07:50,033 --> 00:07:51,033 This way. 211 00:07:51,033 --> 00:07:53,700 Upper bound equals. Average reward. 212 00:07:56,700 --> 00:07:58,233 Plus delta 213 00:07:58,233 --> 00:08:02,066 I exactly like in this slide okay. 214 00:08:02,066 --> 00:08:02,700 So great. 215 00:08:02,700 --> 00:08:04,866 We just computed the average reward. 216 00:08:04,866 --> 00:08:06,533 And the. Upper bound. 217 00:08:06,533 --> 00:08:09,066 And therefore we are done with step two. 218 00:08:09,066 --> 00:08:11,300 So now let's move on to step three. 219 00:08:11,300 --> 00:08:14,633 Step three is to select the ad the. 220 00:08:14,633 --> 00:08:17,500 Ad version I that has the maximum upper. 221 00:08:17,500 --> 00:08:18,200 Bound. 222 00:08:18,200 --> 00:08:18,800 So now. 223 00:08:18,800 --> 00:08:22,833 Things get a little more complicated because we need to actually 224 00:08:22,833 --> 00:08:25,966 create a vector, a huge vector, like a huge list. 225 00:08:26,366 --> 00:08:27,933 That will contain the different. 226 00:08:27,933 --> 00:08:31,400 Versions of the ad that were selected at each round. 227 00:08:31,533 --> 00:08:33,033 So let's do this. 228 00:08:33,033 --> 00:08:34,366 We're going to declare here a new. 229 00:08:34,366 --> 00:08:37,033 Variable that we're going to call ad. 230 00:08:37,033 --> 00:08:38,966 Underscore. Select id. 231 00:08:38,966 --> 00:08:44,133 And this variable is going to be the huge vector that will give us the list of all. 232 00:08:44,133 --> 00:08:45,033 The different versions. 233 00:08:45,033 --> 00:08:47,366 Of the ad that are selected at each round. 234 00:08:47,366 --> 00:08:49,033 That is, you know, at the end of the. 235 00:08:49,033 --> 00:08:50,533 Algorithm when we run it. 236 00:08:50,533 --> 00:08:52,200 The ad selected will be a vector. 237 00:08:52,200 --> 00:08:53,866 Of 10,000 elements. 238 00:08:53,866 --> 00:08:57,866 And each of these elements will be the ad that was selected at each round. 239 00:08:58,100 --> 00:09:01,600 So we will clearly see the results of the strategy used by the algorithm. 240 00:09:02,100 --> 00:09:02,700 Okay. 241 00:09:02,700 --> 00:09:05,700 So as usual we. Need to. Initialize this. 242 00:09:05,866 --> 00:09:07,900 And we are simply going to initialize. 243 00:09:07,900 --> 00:09:08,466 It as. 244 00:09:08,466 --> 00:09:10,200 An. Empty vector. 245 00:09:10,200 --> 00:09:12,466 Because what we'll do next is to. 246 00:09:12,466 --> 00:09:14,866 Append the different ads one by one. 247 00:09:14,866 --> 00:09:18,066 Up to the. Last round. Round 10,000. Okay. 248 00:09:18,366 --> 00:09:21,700 So now the question is how are we going to append the different. 249 00:09:21,700 --> 00:09:22,900 Versions of the ad. 250 00:09:22,900 --> 00:09:25,566 In. This ad selected vector. 251 00:09:25,566 --> 00:09:27,866 Well let's go back to the slide. 252 00:09:27,866 --> 00:09:30,600 Step three is we select the ad I. 253 00:09:30,600 --> 00:09:33,500 That has the maximum upper confidence bounds. 254 00:09:33,500 --> 00:09:34,866 So we already computed. 255 00:09:34,866 --> 00:09:37,200 The. Upper confidence bound and now we need. 256 00:09:37,200 --> 00:09:37,733 To create. 257 00:09:37,733 --> 00:09:40,333 Another. Variable that will be the highest. 258 00:09:40,333 --> 00:09:41,566 Upper confidence bounds. 259 00:09:41,566 --> 00:09:42,333 Because right now. 260 00:09:42,333 --> 00:09:44,933 This upper bound variable here is just the upper. 261 00:09:44,933 --> 00:09:45,733 Bound for. 262 00:09:45,733 --> 00:09:48,000 Each of the diverse versions of the ad at. 263 00:09:48,000 --> 00:09:49,200 Round N. 264 00:09:49,200 --> 00:09:51,166 And so that's why we need to create another. 265 00:09:51,166 --> 00:09:53,900 Variable that will take. The maximum. 266 00:09:53,900 --> 00:09:55,133 Of these upper bounds. 267 00:09:55,133 --> 00:09:57,466 Here of the ten ads at round N. 268 00:09:57,466 --> 00:09:59,666 So let's create this new variable. 269 00:09:59,666 --> 00:10:02,500 And we're going to call it max upper bound. 270 00:10:02,500 --> 00:10:06,900 So since this max upper bound variable is going to be different at each round. 271 00:10:07,233 --> 00:10:10,466 Well we need to initialize it at each new round. 272 00:10:10,500 --> 00:10:12,500 So therefore this variable. 273 00:10:12,500 --> 00:10:14,133 Max upper bound will be. 274 00:10:14,133 --> 00:10:15,866 Initialized right here. 275 00:10:15,866 --> 00:10:19,200 So we will initialize this. Max upper bound. 276 00:10:19,466 --> 00:10:22,366 Variable here to zero. 277 00:10:22,366 --> 00:10:23,466 And then what happens is. 278 00:10:23,466 --> 00:10:24,866 That we will compute the. 279 00:10:24,866 --> 00:10:27,433 Different upper bounds of each of the ten ads. 280 00:10:27,433 --> 00:10:28,866 And then we will compare. These. 281 00:10:28,866 --> 00:10:31,133 Upper bounds to this max upper bound. 282 00:10:31,133 --> 00:10:33,100 And each time the upper bound of the. 283 00:10:33,100 --> 00:10:34,333 I here will be. 284 00:10:34,333 --> 00:10:37,300 Higher than max upper bound, then we will set max upper. 285 00:10:37,300 --> 00:10:39,400 Bound equals to upper bound. 286 00:10:39,400 --> 00:10:40,566 So that's the idea. 287 00:10:40,566 --> 00:10:42,733 So let's do that right now. 288 00:10:42,733 --> 00:10:44,866 So basically what we have to do here is. 289 00:10:44,866 --> 00:10:45,433 In this. 290 00:10:45,433 --> 00:10:48,833 For loop we need to add. A new if condition. 291 00:10:49,133 --> 00:10:51,433 And this condition. Will be if. 292 00:10:51,433 --> 00:10:55,200 Upper bound larger than max upper bound. 293 00:10:55,600 --> 00:10:59,633 And then what happens if upper bound is larger than the max upper bound. 294 00:10:59,666 --> 00:11:02,666 Then we need to set max upper bound. 295 00:11:02,733 --> 00:11:04,833 Equal to. Upper bound. 296 00:11:04,833 --> 00:11:06,933 This way you know, what will happen is. That. 297 00:11:06,933 --> 00:11:11,366 We will compute the different upper bounds of each of the ten ads at round n. 298 00:11:12,000 --> 00:11:13,533 At first, this max upper bound is. 299 00:11:13,533 --> 00:11:14,433 Equal to zero. 300 00:11:14,433 --> 00:11:16,200 Then we compute the first upper bound. 301 00:11:16,200 --> 00:11:18,266 And of. Course it will be larger. 302 00:11:18,266 --> 00:11:21,033 Than max upper bound because it's. Equal to zero. 303 00:11:21,033 --> 00:11:22,200 So then max upper bound. 304 00:11:22,200 --> 00:11:25,133 Will be equal to upper. Bound. So that's for the first add. 305 00:11:25,133 --> 00:11:26,233 And then we will compute. 306 00:11:26,233 --> 00:11:28,733 The other upper bounds of the other ads. 307 00:11:28,733 --> 00:11:31,366 And each time we find an upper bound that. Is larger. 308 00:11:31,366 --> 00:11:34,066 The max upper bound then max upper bound will be. 309 00:11:34,066 --> 00:11:37,133 Equal to this new upper bound. All right. 310 00:11:37,600 --> 00:11:39,133 So this way we will. 311 00:11:39,133 --> 00:11:42,966 Get the maximum of the different upper bounds of the ten ads at the 312 00:11:43,000 --> 00:11:44,800 specific round n. 313 00:11:44,800 --> 00:11:45,400 All right. 314 00:11:45,400 --> 00:11:47,566 And now we need to do one more thing. 315 00:11:47,566 --> 00:11:47,933 You know, we. 316 00:11:47,933 --> 00:11:50,433 Need to select the ad that has the highest upper. 317 00:11:50,433 --> 00:11:51,000 Bound. 318 00:11:51,000 --> 00:11:54,000 And therefore each time we find this upper 319 00:11:54,000 --> 00:11:57,000 bound to be larger than the max upper bound, not only. 320 00:11:57,000 --> 00:11:59,266 We need to do this to. Keep the max upper bound. 321 00:11:59,266 --> 00:12:01,500 But also we need to keep track. 322 00:12:01,500 --> 00:12:03,700 Of the index that has the max upper bound. 323 00:12:03,700 --> 00:12:05,966 And to. Keep track of. This index, what we need. 324 00:12:05,966 --> 00:12:10,300 To do here is to create a new variable here, which we'll call ad. 325 00:12:10,900 --> 00:12:12,133 And we will make. 326 00:12:12,133 --> 00:12:14,700 It equal to I, because right. 327 00:12:14,700 --> 00:12:16,266 Now we are dealing with a specific. 328 00:12:16,266 --> 00:12:20,400 Ad because we are at a specific I of this for I loop here. 329 00:12:20,666 --> 00:12:24,533 And so I here has a specific value that corresponds to a specific ad. 330 00:12:24,933 --> 00:12:25,666 And so we need. 331 00:12:25,666 --> 00:12:27,533 To keep track of this specific ad. 332 00:12:27,533 --> 00:12:29,766 Each time we find an upper bound that. 333 00:12:29,766 --> 00:12:30,300 Is larger. 334 00:12:30,300 --> 00:12:34,066 Than the max upper bound to become this new max upper bound. 335 00:12:34,566 --> 00:12:36,033 All right. So that's great. 336 00:12:36,033 --> 00:12:37,200 But you know, when. 337 00:12:37,200 --> 00:12:39,900 We use a new variable here, it's always. 338 00:12:39,900 --> 00:12:41,800 Important to. Initialize it. 339 00:12:41,800 --> 00:12:43,200 So that's what we'll do. 340 00:12:43,200 --> 00:12:45,033 Right now we will initialize. 341 00:12:45,033 --> 00:12:46,466 This ad variable. 342 00:12:46,466 --> 00:12:47,700 We will initialize it. 343 00:12:47,700 --> 00:12:50,100 To zero okay. 344 00:12:50,100 --> 00:12:51,466 So we are getting close. 345 00:12:51,466 --> 00:12:53,933 Now what we need to do is deal with the. 346 00:12:53,933 --> 00:12:55,166 Initial conditions. 347 00:12:55,166 --> 00:12:58,166 Because this is what happens. At round N. 348 00:12:58,466 --> 00:13:00,033 You know this is the strategy. 349 00:13:00,033 --> 00:13:01,500 That happens at round N. 350 00:13:01,500 --> 00:13:03,900 But this is actually not what happens at the beginning. 351 00:13:03,900 --> 00:13:05,266 Because at the beginning, 352 00:13:05,266 --> 00:13:09,200 you know, during the ten first rounds, we don't have much information of the ads. 353 00:13:09,233 --> 00:13:12,033 We don't have much informations. About their reward. 354 00:13:12,033 --> 00:13:15,033 Whether they earned reward equals one or reward equals. 355 00:13:15,033 --> 00:13:17,766 Zero when we selected them. Because simply we. 356 00:13:17,766 --> 00:13:19,433 Haven't selected them yet. 357 00:13:19,433 --> 00:13:21,866 So that's why we need to deal with. 358 00:13:21,866 --> 00:13:23,133 The initial conditions, that. 359 00:13:23,133 --> 00:13:25,500 Is, choose which ads we will select. 360 00:13:25,500 --> 00:13:27,100 During the ten first rounds. 361 00:13:27,100 --> 00:13:30,833 And so according to you, what will be the strategy to select the ads during the. 362 00:13:30,833 --> 00:13:31,833 Ten first rounds? 363 00:13:31,833 --> 00:13:34,400 Given the fact that we don't have any information? 364 00:13:34,400 --> 00:13:36,966 Well, there's. Actually no. Strategy. 365 00:13:36,966 --> 00:13:38,666 We will simply select the ten first. 366 00:13:38,666 --> 00:13:40,966 Ads without using the strategy here. 367 00:13:40,966 --> 00:13:43,133 We will use this strategy as soon. 368 00:13:43,133 --> 00:13:44,533 As we have some informations of the. 369 00:13:44,533 --> 00:13:46,966 Reward of each of the. Ten ads. 370 00:13:46,966 --> 00:13:48,466 So basically what we'll do. 371 00:13:48,466 --> 00:13:49,866 During the first ten rounds. 372 00:13:49,866 --> 00:13:52,800 Is to simply select the ten first ads. 373 00:13:52,800 --> 00:13:55,733 That is round. One. We'll select add. One. 374 00:13:55,733 --> 00:13:58,500 Round two we'll select ad to round three we'll select 375 00:13:58,500 --> 00:14:01,600 at three, up to round ten we'll select at ten. 376 00:14:01,900 --> 00:14:02,566 And then 377 00:14:02,566 --> 00:14:06,800 that will give us some information about, you know, the number of selections. 378 00:14:06,800 --> 00:14:08,233 Of each. Of the ten. Ads. 379 00:14:08,233 --> 00:14:10,900 Well at around 11 the number of selections will. 380 00:14:10,900 --> 00:14:13,200 Be one. For each of the ten ads. 381 00:14:13,200 --> 00:14:15,700 And we'll also get some information about the sums of. 382 00:14:15,700 --> 00:14:16,433 Rewards. 383 00:14:16,433 --> 00:14:18,500 The sums of rewards will contain either. 384 00:14:18,500 --> 00:14:21,600 Zeros corresponding to the ads that go to zero reward. 385 00:14:21,600 --> 00:14:25,800 When they were selected during the first ten rounds, or once corresponding. 386 00:14:25,833 --> 00:14:26,833 To the ads that go to. 387 00:14:26,833 --> 00:14:29,833 One reward when they were selected during the first ten rounds. 388 00:14:30,166 --> 00:14:32,933 So let's do it now. Let's simply select add. 389 00:14:32,933 --> 00:14:34,300 One at. Two at three, up to. 390 00:14:34,300 --> 00:14:37,266 At ten during the first ten rounds, and then let's. 391 00:14:37,266 --> 00:14:38,633 Use this strategy. 392 00:14:38,633 --> 00:14:41,233 So to do this we will add. 393 00:14:41,233 --> 00:14:43,900 A if condition. Here which will be. 394 00:14:43,900 --> 00:14:45,366 If numbers. 395 00:14:45,366 --> 00:14:47,400 Of. Selections. 396 00:14:47,400 --> 00:14:48,833 I is. 397 00:14:48,833 --> 00:14:50,866 Larger than. Zero. 398 00:14:50,866 --> 00:14:51,733 So that means. 399 00:14:51,733 --> 00:14:52,933 If. The ad version. 400 00:14:52,933 --> 00:14:56,000 I was selected at least. Once, then we. 401 00:14:56,000 --> 00:14:57,033 Will use this. Strategy. 402 00:14:57,033 --> 00:14:59,866 And actually we need to align this. 403 00:14:59,866 --> 00:15:01,500 And so now thanks to this condition. 404 00:15:01,500 --> 00:15:03,866 Here, this strategy will be applied. 405 00:15:03,866 --> 00:15:06,366 After the ten first rounds. Okay. 406 00:15:06,366 --> 00:15:08,000 And now we. Just need to add a little. 407 00:15:08,000 --> 00:15:10,633 Something to make. Sure that the. Algorithm. 408 00:15:10,633 --> 00:15:12,833 Selects at one at two, at three of. 409 00:15:12,833 --> 00:15:14,933 Two, at ten during the ten first rounds. 410 00:15:14,933 --> 00:15:15,366 And to. 411 00:15:15,366 --> 00:15:19,100 Do this the trick is to add. 412 00:15:19,300 --> 00:15:22,033 An else. Here else. 413 00:15:22,033 --> 00:15:24,800 And then. We will set the upper. Bounds. 414 00:15:24,800 --> 00:15:29,600 To a very large value, like ten at the power of 400. 415 00:15:30,266 --> 00:15:34,500 So to get this we can use one 400. 416 00:15:34,733 --> 00:15:37,666 And that's ten to the power. Of 400. 417 00:15:37,666 --> 00:15:38,700 So now I'd like to give you. 418 00:15:38,700 --> 00:15:40,033 A little enigma. 419 00:15:40,033 --> 00:15:42,800 I would like you to figure out why we're using this 420 00:15:42,800 --> 00:15:46,433 very large value ten to the power of 400 for the upper. 421 00:15:46,433 --> 00:15:48,466 Bound here in this else condition. 422 00:15:48,466 --> 00:15:49,666 Try to figure out why. 423 00:15:49,666 --> 00:15:52,866 Try to figure out how it will be useful for what we want. 424 00:15:53,366 --> 00:15:56,700 And now I'll give you the answer in the explanation in the next tutorial. 425 00:15:57,133 --> 00:15:58,800 Until then, enjoy machine learning.