1 00:00:00,233 --> 00:00:02,566 Hello and welcome to this R tutorial. 2 00:00:02,566 --> 00:00:04,633 So let's pick up where we left off. 3 00:00:04,633 --> 00:00:08,533 That is, let's try to figure out why we're setting this upper bound equal to this 4 00:00:08,700 --> 00:00:11,766 very large number, ten to the power of 400, given 5 00:00:11,766 --> 00:00:15,066 to this upper bound variable here in this else condition. 6 00:00:15,400 --> 00:00:16,833 Well, let's see what happens. 7 00:00:16,833 --> 00:00:19,833 You know, let's see what happens at the first round. 8 00:00:19,866 --> 00:00:23,333 Well, at the first round we will go through the ten versions of the ad 9 00:00:23,400 --> 00:00:25,500 thanks to this for i loop here. 10 00:00:25,500 --> 00:00:28,566 And since at the first round no ad was selected, 11 00:00:28,966 --> 00:00:32,700 well, this condition here, if numbers of selections 12 00:00:32,700 --> 00:00:36,000 i is higher than zero, will never be true. 13 00:00:36,566 --> 00:00:40,066 And therefore we will directly go to this else here. 14 00:00:40,566 --> 00:00:43,366 And accordingly, the upper bound will be set 15 00:00:43,366 --> 00:00:46,366 equal to ten to the power of 400. 16 00:00:46,466 --> 00:00:50,233 Then we move on to this condition because this is the next step of this for 17 00:00:50,266 --> 00:00:51,000 loop here. 18 00:00:51,000 --> 00:00:54,533 And it says that if upper bound is larger than max upper bound, 19 00:00:54,666 --> 00:00:55,733 which is true, of course, 20 00:00:55,733 --> 00:00:59,500 because upper bound is ten to the power of 400 and max upper bound is zero. 21 00:00:59,533 --> 00:01:01,066 So this condition is true. 22 00:01:01,066 --> 00:01:05,066 So what happens next is that max upper bound is set equal to upper bound. 23 00:01:05,066 --> 00:01:08,233 So max upper bound will be equal to ten to the power of 400 24 00:01:08,700 --> 00:01:11,233 and ad equals i. 
25 00:01:11,233 --> 00:01:14,666 And so since we're at the beginning of the for loop here, i equals one. 26 00:01:14,700 --> 00:01:16,766 So ad will be equal to one. 27 00:01:16,766 --> 00:01:18,000 And then what happens? 28 00:01:18,000 --> 00:01:22,266 Then we go to the next step of this for i loop here, which is i equals 29 00:01:22,266 --> 00:01:25,633 two, and i equals two corresponds to the second ad. 30 00:01:26,033 --> 00:01:28,200 And the second ad hasn't been selected yet. 31 00:01:28,200 --> 00:01:32,700 So numbers of selections i here will not be larger than zero. 32 00:01:32,966 --> 00:01:34,900 So this condition will not be true here. 33 00:01:34,900 --> 00:01:37,566 So again we will get to this else here. 34 00:01:37,566 --> 00:01:40,600 And therefore upper bound will be equal to ten to the power of 400. 35 00:01:41,200 --> 00:01:43,200 And then we go to this if here. 36 00:01:43,200 --> 00:01:45,466 So now let's see if this condition is true. 37 00:01:45,466 --> 00:01:48,466 Upper bound is equal to ten to the power of 400. 38 00:01:48,533 --> 00:01:52,200 And remember, from the previous i, max upper bound was set equal 39 00:01:52,200 --> 00:01:56,400 to ten to the power of 400, because it was set equal to upper bound. 40 00:01:56,866 --> 00:01:59,866 So this condition, translated in real terms, is 41 00:02:00,133 --> 00:02:04,766 if ten to the power of 400 is larger than ten to the power of 400. 42 00:02:05,033 --> 00:02:06,266 And that is not true: 43 00:02:06,266 --> 00:02:10,133 ten to the power of 400 is not larger than ten to the power of 400. 44 00:02:10,400 --> 00:02:15,900 So that condition is not true, and therefore ad is not set equal to two, 45 00:02:16,033 --> 00:02:19,666 the current value of i, but remains equal to one. 46 00:02:20,066 --> 00:02:22,800 And that's why at the first round the ad 47 00:02:22,800 --> 00:02:25,866 that will be selected is ad equals one. 48 00:02:26,066 --> 00:02:28,200 That is the first ad. 
49 00:02:28,200 --> 00:02:30,533 You can try with the other values of i here. 50 00:02:30,533 --> 00:02:33,533 Upper bound will always be equal to ten to the power of 400. 51 00:02:33,700 --> 00:02:36,966 And max upper bound will stay equal to ten to the power of 400. 52 00:02:37,200 --> 00:02:41,333 So therefore this condition will never be verified for the nine remaining ads here. 53 00:02:41,366 --> 00:02:44,300 And therefore we keep this ad equals one. 54 00:02:44,300 --> 00:02:46,633 And that's the same principle for the next rounds, 55 00:02:46,633 --> 00:02:49,266 up to ten. At round n equals two, 56 00:02:49,266 --> 00:02:53,333 this numbers of selections i here will be larger than zero only 57 00:02:53,333 --> 00:02:56,800 for the first ad, because the first ad was selected at round one. 58 00:02:57,100 --> 00:03:00,200 And therefore this condition will be true only for the first ad. 59 00:03:00,533 --> 00:03:04,533 So upper bound will be equal to this average reward plus delta i 60 00:03:04,533 --> 00:03:05,233 here. 61 00:03:05,233 --> 00:03:09,600 But then we'll move on to i plus one, which will correspond to ad version two. 62 00:03:09,933 --> 00:03:12,800 And since ad version two wasn't selected yet, 63 00:03:12,800 --> 00:03:17,033 well, this numbers of selections i here will be equal to zero. 64 00:03:17,166 --> 00:03:19,733 And therefore this condition won't be verified. 65 00:03:19,733 --> 00:03:22,066 And therefore we will go to this else here. 66 00:03:22,066 --> 00:03:25,500 And upper bound will be equal to ten to the power of 400. 67 00:03:25,566 --> 00:03:30,400 We will forget this upper bound value here computed at the previous i here. 
68 00:03:30,633 --> 00:03:31,866 And therefore what happens next 69 00:03:31,866 --> 00:03:34,866 is the same: upper bound will be larger than max upper bound, 70 00:03:34,866 --> 00:03:38,633 because max upper bound was equal to the upper bound at the previous i, 71 00:03:38,700 --> 00:03:42,566 and the upper bound of the previous i was average reward plus delta i, 72 00:03:42,766 --> 00:03:46,200 which is of course lower than this ten to the power of 400. 73 00:03:46,300 --> 00:03:49,000 And that's why, by the way, we set it to a very large value. 74 00:03:49,000 --> 00:03:50,700 It's in order to have this 75 00:03:50,700 --> 00:03:53,966 upper bound here lower than this ten to the power of 400. 76 00:03:54,233 --> 00:03:57,233 And therefore max upper bound will be ten to the power of 400. 77 00:03:57,633 --> 00:04:01,533 And we will select ad equals i, that is ad number two. 78 00:04:01,866 --> 00:04:05,400 And then same principle as before: when we move on to the next 79 00:04:05,400 --> 00:04:08,500 i's, upper bound will be equal to ten to the power of 400. 80 00:04:08,766 --> 00:04:12,133 Max upper bound will also be equal to ten to the power of 400. 81 00:04:12,400 --> 00:04:16,900 So this condition won't be verified and therefore we will keep ad number two. 82 00:04:17,700 --> 00:04:20,066 So that's why this nice little trick here works perfectly 83 00:04:20,066 --> 00:04:23,066 well for us and gives us exactly what we want. 84 00:04:23,366 --> 00:04:26,366 All right, so the ten first rounds select the ten ads. 85 00:04:26,466 --> 00:04:30,100 And then after round ten we use this strategy to select the ad. 86 00:04:30,733 --> 00:04:31,100 All right. 87 00:04:31,100 --> 00:04:34,300 So now the only thing that we need to do is to append 88 00:04:34,666 --> 00:04:38,700 the selected ad here to this ads selected vector here. 89 00:04:38,866 --> 00:04:40,800 And that's exactly what we'll do now. 90 00:04:40,800 --> 00:04:42,200 So let's do it. 
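The walk-through of the ten to the power of 400 trick above can be checked with a small R sketch. The helper name select_ad and the snake_case variable names are assumptions made for illustration, and the formula used for delta_i is an assumed UCB1-style confidence term, since the narration above only refers to "average reward plus delta i" without stating it.

```r
# Hypothetical helper wrapping the inner "for i" loop described above.
# numbers_of_selections and sums_of_rewards hold one entry per ad;
# n is the current round, counted from 1.
select_ad <- function(numbers_of_selections, sums_of_rewards, n) {
  d <- length(numbers_of_selections)
  ad <- 0
  max_upper_bound <- 0
  for (i in 1:d) {
    if (numbers_of_selections[i] > 0) {
      average_reward <- sums_of_rewards[i] / numbers_of_selections[i]
      # Assumed confidence term; not quoted in this part of the transcript.
      delta_i <- sqrt(3 / 2 * log(n) / numbers_of_selections[i])
      upper_bound <- average_reward + delta_i
    } else {
      # Too large for a double, so R stores it as Inf.
      upper_bound <- 1e400
    }
    # Strict ">" means a tie never replaces the ad picked earlier.
    if (upper_bound > max_upper_bound) {
      max_upper_bound <- upper_bound
      ad <- i
    }
  }
  ad
}

# Round 1: no ad selected yet, every bound is 1e400, the first ad is kept.
first_pick <- select_ad(rep(0, 10), rep(0, 10), n = 1)
# Round 2: ad 1 has a finite bound, ad 2 is the first one still at 1e400.
second_pick <- select_ad(c(1, rep(0, 9)), rep(0, 10), n = 2)
print(c(first_pick, second_pick))
```

Because the comparison is strict, the first ad that reaches the huge value wins and later ties are ignored, which is what forces ads one to ten to be tried in order during the first ten rounds.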
91 00:04:42,200 --> 00:04:46,466 We just need to get out of the for i loop because we're done with this loop. 92 00:04:46,466 --> 00:04:48,166 We did exactly what we had to do. 93 00:04:48,166 --> 00:04:50,000 That is to select the right ad. 94 00:04:50,000 --> 00:04:52,733 And now we need to get out of this for i loop 95 00:04:52,733 --> 00:04:57,733 but remain in this for n loop, because we are still at a specific round n. 96 00:04:58,100 --> 00:05:00,766 And what we have to do now is to append 97 00:05:00,766 --> 00:05:03,866 the ad that was selected here to this huge vector, 98 00:05:03,866 --> 00:05:08,266 ads selected, that contains all the different ads selected at each round. 99 00:05:08,666 --> 00:05:11,166 Okay, so now things are easy. 100 00:05:11,166 --> 00:05:15,033 We just need to use the append function on this 101 00:05:15,033 --> 00:05:18,533 ads selected huge vector, and then append. 102 00:05:18,566 --> 00:05:19,733 Here it is. 103 00:05:19,733 --> 00:05:20,733 All right. 104 00:05:20,733 --> 00:05:24,000 And now in this append function we enter as input: 105 00:05:24,300 --> 00:05:27,600 first input ads selected, and second input ad, 106 00:05:28,000 --> 00:05:29,000 because it corresponds 107 00:05:29,000 --> 00:05:32,566 to the index of the ad that was selected in this for i loop here. 108 00:05:33,366 --> 00:05:34,266 All right. 109 00:05:34,266 --> 00:05:36,500 So done. Ad appended. 110 00:05:36,500 --> 00:05:39,666 And now since we've just selected a new ad here, 111 00:05:39,800 --> 00:05:44,133 what we need to do is to update this vector numbers of selections here. 112 00:05:44,433 --> 00:05:47,966 That is, you know, the vector telling for each ad the number of times 113 00:05:47,966 --> 00:05:49,166 it was selected. 
114 00:05:49,166 --> 00:05:52,833 So since here we know which index of the ad was just selected, 115 00:05:53,100 --> 00:05:56,900 what we need to do is to add a plus one at this particular 116 00:05:56,900 --> 00:06:00,933 index of this numbers of selections vector to update this vector. 117 00:06:01,166 --> 00:06:02,500 So let's do it right now. 118 00:06:02,500 --> 00:06:07,400 We will stay of course in this for n loop because this vector 119 00:06:07,400 --> 00:06:11,666 will contain the numbers of times each ad was selected at the specific round. 120 00:06:11,666 --> 00:06:13,466 And so we need to stay in the loop. 121 00:06:13,466 --> 00:06:17,566 And we will simply take this vector here. 122 00:06:18,300 --> 00:06:21,900 Copy and paste it here to update it. 123 00:06:22,133 --> 00:06:25,133 And so what we need to update is the ad 124 00:06:25,200 --> 00:06:28,500 index of this numbers of selections vector. 125 00:06:28,500 --> 00:06:32,166 Because this ad index corresponds to the index of the ad 126 00:06:32,166 --> 00:06:33,300 that was just selected. 127 00:06:33,300 --> 00:06:36,300 Here a selection that is based on all this strategy. 128 00:06:36,700 --> 00:06:42,000 And so what we simply need to do is increment by one this vector here. 129 00:06:42,000 --> 00:06:44,033 So we will copy that. 130 00:06:44,033 --> 00:06:47,800 Again copy, equals, paste. 131 00:06:47,800 --> 00:06:50,800 And then a little plus one here. All right. 132 00:06:50,833 --> 00:06:54,900 So now this numbers of selections vector is well updated. 133 00:06:55,133 --> 00:06:58,133 And that's all we need to do for this vector at this specific round. 134 00:06:58,133 --> 00:07:01,833 Because of course only one ad was selected. Okay. 
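The two bookkeeping steps just described, appending the selected ad and incrementing its selection count, might look like this in R. The snake_case names and the value ad <- 3 are assumptions chosen only to make the sketch runnable outside the loop.

```r
# One entry per round, filled in as the rounds go by.
ads_selected <- integer(0)
# One counter per ad; ten ads, all starting at zero.
numbers_of_selections <- integer(10)

# Hypothetical: pretend the inner loop has just picked ad 3.
ad <- 3

# append() returns a new vector, so the result has to be assigned back.
ads_selected <- append(ads_selected, ad)

# Only the counter of the ad that was just selected changes.
numbers_of_selections[ad] <- numbers_of_selections[ad] + 1

print(ads_selected)
print(numbers_of_selections)
```

Both statements sit inside the loop over rounds but outside the loop over ads, since they run once per round after the selection is final.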
135 00:07:01,833 --> 00:07:04,166 And now we need to take care of the rewards 136 00:07:04,166 --> 00:07:07,466 because indeed we have this sums of rewards vector here 137 00:07:07,766 --> 00:07:10,766 that we need to update, because this vector contains 138 00:07:10,866 --> 00:07:15,866 the different sums of rewards of each of the ten ads at each round. 139 00:07:15,966 --> 00:07:17,800 So we need to update it. 140 00:07:17,800 --> 00:07:21,133 And of course afterwards we would like to get the total reward 141 00:07:21,400 --> 00:07:24,233 that, you know, will be a variable containing the unique 142 00:07:24,233 --> 00:07:28,266 sum of rewards we accumulated over the N rounds. 143 00:07:28,333 --> 00:07:31,866 So let's first take care of this sums of rewards vector here. 144 00:07:32,033 --> 00:07:34,600 And then we'll take care of the total reward. 145 00:07:34,600 --> 00:07:38,300 So to update this sums of rewards vector here, what we need to get 146 00:07:38,300 --> 00:07:42,933 now is the reward we get at this specific round n, 147 00:07:43,333 --> 00:07:46,233 because, you know, we just selected this ad here. 148 00:07:46,233 --> 00:07:48,900 But we haven't got its reward yet. 149 00:07:48,900 --> 00:07:50,733 We just selected the ad so far. 150 00:07:50,733 --> 00:07:52,366 Now we need to get the reward. 151 00:07:52,366 --> 00:07:56,666 So in real life what happens is that, you know, we show the ad to the user 152 00:07:56,833 --> 00:07:59,833 and then the user clicks yes or no on the ad. 153 00:08:00,033 --> 00:08:02,333 But here we're not in real life. 154 00:08:02,333 --> 00:08:05,766 As much as I would have loved to show you a real experiment of this 155 00:08:05,966 --> 00:08:07,500 right now in front of your eyes, 156 00:08:07,500 --> 00:08:11,866 this would not be that simple, but we have this simulation data set. 
157 00:08:12,133 --> 00:08:15,033 You know, this data set here that, you know, 158 00:08:15,033 --> 00:08:18,033 contains the real results only known by God. 159 00:08:18,200 --> 00:08:22,000 You know, because we have no idea of which ad each user would click on. 160 00:08:22,166 --> 00:08:23,533 So as a reminder, 161 00:08:23,533 --> 00:08:28,033 the first user clicks on ad one, ad five and ad nine, not the rest of them. 162 00:08:28,366 --> 00:08:30,966 So that's just a simulation data set. 163 00:08:30,966 --> 00:08:34,500 And so what we have to do now is get the reward at each round, 164 00:08:34,500 --> 00:08:37,900 based on the ad that was selected, thanks to this data set. 165 00:08:38,166 --> 00:08:38,833 So let's do it. 166 00:08:38,833 --> 00:08:43,600 What we'll simply do is get the reward that is either 1 167 00:08:43,600 --> 00:08:47,833 or 0 at this specific round n where we are at. 168 00:08:48,166 --> 00:08:50,000 And so to get this it's really simple. 169 00:08:50,000 --> 00:08:54,866 What we need to do is, you know, take our data set and in the brackets 170 00:08:54,900 --> 00:08:59,400 we'll need to specify the index of the line where we are at, which is, 171 00:08:59,400 --> 00:09:03,000 you know, the round n corresponding to the round where we are right now. 172 00:09:03,200 --> 00:09:07,200 So since all the lines of this data set here are nothing else 173 00:09:07,200 --> 00:09:10,200 but the rounds, well, the first index will be the round. 174 00:09:10,200 --> 00:09:12,766 So for example, let's say we are at round nine. 175 00:09:12,766 --> 00:09:16,700 Well, we will need to take the line index nine of the data set. 176 00:09:17,066 --> 00:09:18,133 And then for the columns, 177 00:09:18,133 --> 00:09:22,133 since the columns here correspond to the rewards of the ten different ads, 178 00:09:22,400 --> 00:09:25,400 well, we will need to take the index of the ad that was selected. 
179 00:09:25,500 --> 00:09:28,400 That is nothing else than this ad index 180 00:09:28,400 --> 00:09:28,633 here. 181 00:09:30,033 --> 00:09:31,200 So that's the idea. 182 00:09:31,200 --> 00:09:32,400 That's what we have to do. 183 00:09:32,400 --> 00:09:36,266 But of course, in real life you'll get what really happens with the users. 184 00:09:36,300 --> 00:09:37,633 So I'm going to close this. 185 00:09:37,633 --> 00:09:42,266 And right now I'm going to get the real reward at round n 186 00:09:42,400 --> 00:09:45,933 that we get with the selection of this ad index here. 187 00:09:46,300 --> 00:09:47,466 So let's do it. 188 00:09:47,466 --> 00:09:49,500 We will call this real reward 189 00:09:49,500 --> 00:09:51,733 simply reward. All right. 190 00:09:51,733 --> 00:09:55,566 And now as I just explained we need to take the data set and then brackets. 191 00:09:56,000 --> 00:09:59,400 Then we need to take the index of the line that corresponds to round n. 192 00:09:59,666 --> 00:10:02,900 So n here, and then we need to take the index of the column 193 00:10:03,133 --> 00:10:06,566 that corresponds to the index of the ad that was just selected. 194 00:10:06,600 --> 00:10:08,400 So it's ad here. 195 00:10:08,400 --> 00:10:09,000 And that's all. 196 00:10:09,000 --> 00:10:13,133 We simply get the real reward at round n for this specific selection of this 197 00:10:13,133 --> 00:10:15,266 ad by using this. 198 00:10:15,266 --> 00:10:16,800 Okay. So great. 199 00:10:16,800 --> 00:10:18,100 We've just got the real reward. 200 00:10:18,100 --> 00:10:22,166 And now we can update this sums of rewards vector here 201 00:10:22,166 --> 00:10:23,466 that, as a reminder, gives 202 00:10:23,466 --> 00:10:27,300 the sums of rewards of each of the ten ads at each round n. 203 00:10:27,633 --> 00:10:30,700 So we will take this, copy. 204 00:10:31,333 --> 00:10:36,066 And then just below we will increment this vector. 
205 00:10:36,300 --> 00:10:39,900 And of course we need to take the ad index of this vector, 206 00:10:39,900 --> 00:10:43,133 because only the ad that was selected at this specific 207 00:10:43,133 --> 00:10:46,200 round n will have its sum of rewards changed. 208 00:10:46,400 --> 00:10:49,066 So that's the only sum of rewards we need to update. 209 00:10:49,066 --> 00:10:53,066 And therefore what we need to do is to increment it by the reward, 210 00:10:53,200 --> 00:10:58,366 not by one of course, only by the reward, because the reward is either 0 or 1. 211 00:10:58,766 --> 00:11:00,700 So equals here. 212 00:11:00,700 --> 00:11:03,000 And then we take that again. 213 00:11:03,000 --> 00:11:06,000 And then plus reward. 214 00:11:06,000 --> 00:11:10,866 So if we get a zero reward, the sum of rewards for this specific ad won't change. 215 00:11:10,966 --> 00:11:12,600 And if the reward equals one, 216 00:11:12,600 --> 00:11:16,500 this sum of rewards for this specific ad will be incremented by one. 217 00:11:17,033 --> 00:11:17,700 Okay. 218 00:11:17,700 --> 00:11:19,833 And now we just have to do one last thing. 219 00:11:19,833 --> 00:11:24,866 It's of course to get the total reward accumulated over the N rounds. 220 00:11:24,933 --> 00:11:29,066 Well, the total reward is not very interesting for us at any round 221 00:11:29,066 --> 00:11:33,400 n, but at the last round N, that is at the ten-thousandth round, 222 00:11:33,666 --> 00:11:36,466 because we will compare it, of course, to the total reward 223 00:11:36,466 --> 00:11:38,800 we got with this random selection algorithm, 224 00:11:38,800 --> 00:11:42,566 which, as a reminder, was 1,200 on average. 225 00:11:42,733 --> 00:11:46,566 So that's why we're very excited to find out about this total reward, 226 00:11:46,733 --> 00:11:49,733 but at the end of the 10,000 rounds. 
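The reward lookup and the sums of rewards update described above can be sketched on a tiny stand-in table. The three-row dataset here is hypothetical: its first row follows the reminder that the first user clicks on ads one, five and nine, while the other two rows are invented so the example has something to index.

```r
# Hypothetical 3-round, 10-ad click table; 1 means the user would click.
# Rows are rounds (users), columns are ads.
dataset <- data.frame(matrix(c(
  1, 0, 0, 0, 1, 0, 0, 0, 1, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0
), nrow = 3, byrow = TRUE))

# One running sum per ad, all starting at zero.
sums_of_rewards <- integer(10)

n <- 1   # current round
ad <- 5  # hypothetical: the ad selected at this round

# First index is the round (row), second index is the ad (column).
reward <- dataset[n, ad]

# Increment by the reward itself, so a 0 leaves the sum unchanged.
sums_of_rewards[ad] <- sums_of_rewards[ad] + reward

print(reward)
print(sums_of_rewards)
```

In a live experiment the value of reward would come from the user's actual click rather than from a lookup in a table.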
227 00:11:49,866 --> 00:11:54,066 So of course we need to initialize this total reward variable because, 228 00:11:54,700 --> 00:11:57,733 you know, we are updating it at each round. 229 00:11:57,900 --> 00:12:00,866 So we need to give it an initial value, like in physics. 230 00:12:00,866 --> 00:12:04,866 And the initial value we'll give it is of course zero, because at round 231 00:12:04,866 --> 00:12:09,166 zero, at the beginning of this experiment, the total reward is of course zero. 232 00:12:09,300 --> 00:12:12,133 We haven't accumulated any reward yet. 233 00:12:12,133 --> 00:12:16,000 So let's declare this new variable, total reward, 234 00:12:17,466 --> 00:12:20,900 right here and then set it equal to zero. 235 00:12:21,433 --> 00:12:24,166 And now very simply, what we need to do 236 00:12:24,166 --> 00:12:28,766 is to compute the total reward that we accumulate at each round. 237 00:12:29,066 --> 00:12:32,066 And very simply we need to do another increment, 238 00:12:32,066 --> 00:12:34,500 the same as the one we just did. 239 00:12:34,500 --> 00:12:37,200 So I'm copying total reward here, 240 00:12:37,200 --> 00:12:39,966 pasting it here and equal. 241 00:12:39,966 --> 00:12:43,300 And then pasting it again and adding a plus. 242 00:12:43,900 --> 00:12:46,000 And then according to you, what do we need to add? 243 00:12:46,000 --> 00:12:52,433 Of course we need to add the reward that we get at each round n, and that's done. 244 00:12:52,700 --> 00:12:55,866 The UCB algorithm is implemented. 245 00:12:55,866 --> 00:12:57,333 Congratulations. 246 00:12:57,333 --> 00:13:01,500 This is the first algorithm we implement from scratch in this course. 247 00:13:01,700 --> 00:13:02,933 That's very exciting. 248 00:13:02,933 --> 00:13:07,033 And as you notice we built and implemented this algorithm 249 00:13:07,200 --> 00:13:08,666 as if we would do it in real life. 250 00:13:08,666 --> 00:13:11,500 You know, we didn't add the lines one by one. 
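Putting the steps narrated so far together gives roughly the following loop. This is a sketch under stated assumptions rather than the exact code typed in the video: the click table is simulated with invented click probabilities because the course data set is not part of this transcript, the variable names are assumed, and the delta_i formula is an assumed UCB1-style term. Its total reward will therefore not match the figure obtained in the tutorial.

```r
set.seed(42)
N <- 10000  # rounds
d <- 10     # ads

# Invented click probabilities, used only to simulate a stand-in data set.
click_prob <- c(0.05, 0.10, 0.07, 0.12, 0.27, 0.01, 0.11, 0.20, 0.09, 0.05)
dataset <- as.data.frame(sapply(click_prob, function(p) rbinom(N, 1, p)))

ads_selected <- integer(0)
numbers_of_selections <- integer(d)
sums_of_rewards <- integer(d)
total_reward <- 0  # nothing accumulated before the first round

for (n in 1:N) {
  ad <- 0
  max_upper_bound <- 0
  for (i in 1:d) {
    if (numbers_of_selections[i] > 0) {
      average_reward <- sums_of_rewards[i] / numbers_of_selections[i]
      # Assumed confidence term; not stated in this part of the transcript.
      delta_i <- sqrt(3 / 2 * log(n) / numbers_of_selections[i])
      upper_bound <- average_reward + delta_i
    } else {
      upper_bound <- 1e400  # forces every ad to be tried once
    }
    if (upper_bound > max_upper_bound) {
      max_upper_bound <- upper_bound
      ad <- i
    }
  }
  # Per-round bookkeeping, outside the loop over ads.
  ads_selected <- append(ads_selected, ad)
  numbers_of_selections[ad] <- numbers_of_selections[ad] + 1
  reward <- dataset[n, ad]
  sums_of_rewards[ad] <- sums_of_rewards[ad] + reward
  total_reward <- total_reward + reward
}

print(total_reward)
print(numbers_of_selections)
```

Growing ads_selected with append on every round is fine at this size; for much longer experiments, preallocating the vector and assigning by index would avoid repeated copying.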
251 00:13:11,500 --> 00:13:14,600 We implemented it as a developer would do it, step 252 00:13:14,600 --> 00:13:17,633 by step, you know, with the same logical thinking process. 253 00:13:18,833 --> 00:13:20,100 So congratulations. 254 00:13:20,100 --> 00:13:24,300 And now I'm so excited to look at the results and find out by 255 00:13:24,300 --> 00:13:28,833 how much the UCB algorithm will beat the random selection algorithm. 256 00:13:29,133 --> 00:13:34,200 So as a reminder, the random selection algorithm gave us a reward of one thousand two 257 00:13:34,200 --> 00:13:34,833 hundred. 258 00:13:34,833 --> 00:13:39,166 Let's see how UCB beats that and let's hope we get a nice result. 259 00:13:39,266 --> 00:13:44,066 So I'm going to select everything from here to here. 260 00:13:44,566 --> 00:13:45,300 All right. 261 00:13:45,300 --> 00:13:48,100 And there we go. 262 00:13:48,100 --> 00:13:49,000 Here it is. 263 00:13:49,000 --> 00:13:51,500 Let's immediately have a look at the total reward. 264 00:13:51,500 --> 00:13:56,300 We can see that the total reward is 2178. 265 00:13:56,500 --> 00:14:02,100 We almost doubled the total reward earned by the random selection algorithm. 266 00:14:02,366 --> 00:14:03,200 So that's great. 267 00:14:03,200 --> 00:14:07,666 You know, if you are the casino and if the ads are not ads, but, 268 00:14:08,000 --> 00:14:11,500 you know, slot machines, that means that you would earn almost twice as much money. 269 00:14:11,733 --> 00:14:15,300 And that's not all, because this is just the total reward of the experiment. 270 00:14:15,566 --> 00:14:19,166 But what's very interesting to see now is the specific ad 271 00:14:19,300 --> 00:14:21,066 that has the highest conversion rate. 272 00:14:21,066 --> 00:14:24,300 You know, simply what is the best ad to show to the users. 273 00:14:24,666 --> 00:14:26,600 And how can we find out about this? 274 00:14:26,600 --> 00:14:32,466 Well, what we simply need to look at is this ads selected vector here. 
275 00:14:32,700 --> 00:14:34,566 So let's have a look first. 276 00:14:34,566 --> 00:14:39,066 What we can see is that as expected, you know, as the expected result 277 00:14:39,066 --> 00:14:40,500 of our implementation here, 278 00:14:40,500 --> 00:14:45,266 well, we can see that during the ten first rounds the ten ads are selected. 279 00:14:45,600 --> 00:14:50,500 You know, round one we select ad one, round two ad two, round three ad three, 280 00:14:50,700 --> 00:14:52,633 up to round ten, ad ten. 281 00:14:52,633 --> 00:14:56,100 So that's exactly the result of what we did here with this huge 282 00:14:56,100 --> 00:15:00,133 value trick to get these ten ads at the ten first rounds. 283 00:15:00,300 --> 00:15:04,500 And then the strategy starts, you know, since we get some information 284 00:15:04,500 --> 00:15:08,066 based on the selections of these ten ads here during the ten first rounds, 285 00:15:08,333 --> 00:15:12,766 then we get the sums of rewards info and the numbers of selections info. 286 00:15:12,766 --> 00:15:14,600 And that's when the strategy can start. 287 00:15:14,600 --> 00:15:16,200 And that's exactly what happens here. 288 00:15:16,200 --> 00:15:20,900 From here the strategy is running and different selections appear. 289 00:15:21,400 --> 00:15:24,166 And what's really interesting to do now 290 00:15:24,166 --> 00:15:28,466 is to look at the last rounds, that is, you know, the rounds close to 10,000. 291 00:15:28,833 --> 00:15:32,133 Because if the strategy works well, logically, 292 00:15:32,300 --> 00:15:36,533 this algorithm should always select the same ad during the last rounds 293 00:15:36,533 --> 00:15:39,533 because, you know, there is one ad that is the best, 294 00:15:39,600 --> 00:15:41,600 that has the highest conversion rate. 295 00:15:41,600 --> 00:15:44,466 You know, maybe it's the ad where the car is on the beautiful bridge. 
296 00:15:44,466 --> 00:15:48,600 So there is this winner ad that we don't know, of course, 297 00:15:48,600 --> 00:15:53,166 but that's what this ads selected vector will tell us if we look at the last rounds. 298 00:15:53,166 --> 00:15:54,200 So let's do it. 299 00:15:54,200 --> 00:15:56,366 We will go down and down and down. 300 00:15:56,366 --> 00:15:57,233 Okay. Here we go. 301 00:15:57,233 --> 00:16:01,100 So as you can see as I'm going down, it appears that there are more and more 302 00:16:01,233 --> 00:16:03,266 fives within the different rounds. 303 00:16:03,266 --> 00:16:07,300 As you can see, there are more and more fives that we're selecting at each round. 304 00:16:07,966 --> 00:16:13,133 And if I go down again and go down more and more, well, we get even more fives. 305 00:16:13,466 --> 00:16:16,266 And at the last rounds, that is during the last thousand 306 00:16:16,266 --> 00:16:19,266 rounds, well, we select only fives. 307 00:16:19,433 --> 00:16:25,900 As you can see right now we are at around round 9800 and we can only observe fives. 308 00:16:25,900 --> 00:16:29,666 So clearly the best ad that we should pick to show to the user, 309 00:16:29,666 --> 00:16:33,133 and that has the highest conversion rate, is ad number five. 310 00:16:33,533 --> 00:16:35,266 So great. 311 00:16:35,266 --> 00:16:41,033 Not only did we almost double the total reward with this 2178 total reward, 312 00:16:41,200 --> 00:16:45,166 but we also get to know what is the best ad to show to the users. 313 00:16:45,833 --> 00:16:47,400 And of course we're interested in both. 314 00:16:47,400 --> 00:16:49,400 We're interested in knowing what's the best 315 00:16:49,400 --> 00:16:53,000 ad, but also in optimizing this total reward here because, you know, 316 00:16:53,233 --> 00:16:57,200 experimenting with the placement of these ads on the social network costs money. 
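Instead of scrolling through the vector by eye, the check on the last rounds described above could be done with tail and table. The ads_selected vector built here is a hypothetical stand-in shaped like the one in the video (ten forced picks, a mixed stretch, then a long run of a single ad); it is not the real output of the experiment.

```r
# Hypothetical stand-in for the real ads_selected vector.
ads_selected <- c(1:10, rep(c(5, 8, 5, 2, 5), times = 20), rep(5, 1000))

# Keep only the final 1000 selections.
last_rounds <- tail(ads_selected, 1000)

# Count how often each ad appears among them.
counts <- table(last_rounds)

# The most frequent ad in the last rounds is the apparent winner.
best_ad <- as.integer(names(counts)[which.max(counts)])

print(counts)
print(best_ad)
```

If several ads still share the last rounds in a real run, that suggests the algorithm has not settled yet and more rounds may be needed.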
317 00:16:57,366 --> 00:17:00,666 And, you know, as I said at the beginning, we have a limited budget, and that's 318 00:17:00,666 --> 00:17:04,333 generally the case for the departments of marketing or any business. 319 00:17:04,633 --> 00:17:05,833 We have a limited budget. 320 00:17:05,833 --> 00:17:10,433 So we need to optimize this total reward here to compensate for the costs 321 00:17:10,600 --> 00:17:12,900 and hopefully already earn some money. 322 00:17:12,900 --> 00:17:15,500 So these two results are very important. 323 00:17:15,500 --> 00:17:18,833 And so thank you very much to this UCB algorithm. 324 00:17:19,200 --> 00:17:23,700 And now as usual, as promised, we will finish this UCB algorithm section 325 00:17:23,700 --> 00:17:27,533 with the last step, the exciting step about visualizing the results. 326 00:17:27,800 --> 00:17:30,766 And simply what we'll do is plot a histogram 327 00:17:30,766 --> 00:17:33,766 showing for each ad the number of times it was selected. 328 00:17:34,033 --> 00:17:35,900 So we'll do that in the next tutorial. 329 00:17:35,900 --> 00:17:37,633 And until then, enjoy machine learning.