Hello and welcome back to the course on Machine Learning. Today we're talking about the upper confidence bound (UCB) algorithm and the intuition behind it. It's an algorithm from the reinforcement learning branch of machine learning. So let's get started.

As we discussed previously, the problem we're solving is the multi-armed bandit problem: you've got five, or any number of, slot machines, you can bet your money on any one of them, and you need to work out how to bet so as to maximize your returns. We agreed that behind every machine there is a certain distribution of outcomes. Because you don't know which of these distributions is the best, you need to combine exploration of the machines with exploitation, so that you find out which machine is the best and can then start exploiting it.

A modern application of this problem is, of course, advertising. If you have 5, 10, 50 or 500 different ads, how do you find out which one is the best? You could just run an A/B test and use its results. But that means you're doing the exploration and then the exploitation separately.
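To make the setup concrete, here is a minimal sketch of the environment, assuming each machine's hidden distribution is a Bernoulli distribution (reward 1 with some probability, 0 otherwise), which matches the click/no-click ads example. The class name `BernoulliBandit` and the click rates are hypothetical.

```python
import random


class BernoulliBandit:
    """Toy multi-armed bandit: arm i pays 1 with hidden probability probs[i], else 0."""

    def __init__(self, probs, seed=None):
        self._probs = list(probs)         # hidden from the agent
        self._rng = random.Random(seed)   # seeded for reproducible runs

    @property
    def n_arms(self):
        return len(self._probs)

    def pull(self, arm):
        """Play one round on `arm` (display one ad); return 1 for a click/win, else 0."""
        return 1 if self._rng.random() < self._probs[arm] else 0


# Hypothetical click-through rates for five ads; the agent never sees these.
bandit = BernoulliBandit([0.10, 0.08, 0.05, 0.12, 0.20], seed=0)
rewards = [bandit.pull(4) for _ in range(10_000)]
print(sum(rewards) / len(rewards))  # close to 0.20 for a large number of pulls
```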
You're going to incur lots of costs and waste a lot of time. We want to combine exploration and exploitation, reach the optimal result as soon as we can, and maximize the output of our efforts.

All right, here is a quick summary of the multi-armed bandit problem. Let's go through it quickly so we can get to the fun stuff. We have the arms; here, the arms are the ads we display each time a user comes to a web page. Each time a user visits the page and an ad is displayed, that's a round. In each round $n$ we choose one ad to display: just as with one-armed bandits, where you can only pull one arm, or bet on one machine, at a time, you can only display one ad per round. Ad $i$ gives a reward $r_i(n) \in \{0, 1\}$: $r_i(n) = 1$ if the user clicks on the ad and $0$ if they don't. Our goal is to maximize the total reward we get over many rounds.

And this is how the upper confidence bound algorithm works. I won't go into too much detail on it here.
Hadelin is going to run you through it in the following practical lectures, where you'll code it from scratch in R, and also in Python. So we won't spend time on it here; instead, we'll get to the essence of the algorithm.

Let's get to the intuition: how does it work, and what's actually happening in the background while the algorithm runs? Let's have a look. These are our slot machines, or one-armed bandits. Each of them has a distribution behind it, and we want to find the best one. Just by looking at them, we can't tell which one it is. But let's suppose, just for argument's sake, that we know the end result. What would it look like? For instance, these are the distributions behind those machines; this is how each machine spits out its results. Just by looking at this, you can tell right away which one is the best machine. Which one would you bet your money on?
If you were playing these machines constantly, it would be this one. Right away you can see that this one has the best return, and you'd want to bet on it all the time; your outcome would be the best.

But we don't know that. We want to find it out in the process of playing these machines, or running these ads, and seeing which one gets the most clicks. We don't have the time and money to do that exploration before the actual campaign runs. We want to do it in the process, and we want to maximize our return from the very start.

So how do we do that? Let's transfer these distributions, or rather the expected return of each one, onto a vertical axis. So there's our vertical axis. Distribution 1 has its expected value here. Distribution 2 has a lower value, distribution 3 lower still, distribution 4 higher, and distribution 5 is the best.
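One standard way to quantify the cost described here (time and money spent on poor options) is regret: the expected reward lost compared with always playing the machine with the highest expected value. The sketch below assumes the true means are known, which only an outside observer has; the function name and numbers are hypothetical, and `explore_first` mimics an A/B test that explores for 100 rounds before committing.

```python
def cumulative_regret(true_means, choices):
    """Running total of expected reward lost versus always playing the best arm.

    true_means: the hidden expected return of each arm (the values on the
                vertical axis); choices: the arm picked in each round.
    """
    best = max(true_means)
    total = 0.0
    curve = []
    for arm in choices:
        total += best - true_means[arm]   # gap between best arm and chosen arm
        curve.append(total)
    return curve


means = [0.3, 0.2, 0.1, 0.4, 0.5]                   # hypothetical; arm 4 is the best
explore_first = [0, 1, 2, 3, 4] * 20 + [4] * 900    # 100 exploratory rounds, then commit
always_best = [4] * 1000                            # the oracle's choices

print(cumulative_regret(means, explore_first)[-1])  # regret paid for exploring
print(cumulative_regret(means, always_best)[-1])    # 0.0
```

Every exploratory round on a worse machine adds to the regret, which is why the goal is to keep exploration useful rather than run it as a separate phase.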
So those are the expected values, or returns, of each of those distributions, that is, of each of those machines. That's our y-axis. But again, we don't know those values.

So how does the algorithm work? First, it assumes some starting point for every distribution. Since we can't distinguish between these machines in any way (they all look the same), let's assume they all have the same return and put them all on the same level.

Then the formulas behind the algorithm create a confidence band. It is designed so that, with a very high level of certainty, the confidence band includes the actual expected return. The first few rounds are trial runs: we intentionally try out each machine at least once, so that we can place an initial observed value for it and build a confidence band around that value.
That band is going to be very large at the very start, but it is designed specifically so that the expected value, which is this one over here, falls inside it with a very high degree of certainty. The band is built around this red empirical value, which we have derived from our observations. At the very start, they're all the same.

So how does the algorithm proceed? Out of all the machines, we pick the one with the highest upper confidence bound; that's why the algorithm is called upper confidence bound. Right now it can be any of them, since they all have the same upper bound, so we just pick any one; it doesn't matter which. Again, we don't know about these colored lines (the true expected values). As the agent, or the person analyzing this, we only see these boxes, and they all look identical to us. So we just pick any one of them.
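The start-up phase and the selection rule can be sketched as follows. The lecture doesn't give the formula at this point; this sketch assumes the widely used UCB1 bound, where arm $i$'s upper confidence bound is $\bar{r}_i(n) + \sqrt{2 \ln n / N_i(n)}$, with $\bar{r}_i(n)$ the observed average reward and $N_i(n)$ the number of times arm $i$ has been played. Other UCB variants use a different constant (the hypothetical parameter `c` here), and the function names are illustrative only.

```python
import math
import random


def upper_bounds(counts, sums, n, c=2.0):
    """UCB index of every arm, where n = number of rounds played so far.

    counts[i]: times arm i was played; sums[i]: total reward from arm i.
    An arm that has never been played gets +inf, which forces the
    "try every machine once" start-up phase.
    """
    bounds = []
    for count, total in zip(counts, sums):
        if count == 0:
            bounds.append(math.inf)
        else:
            mean = total / count                        # the red empirical value
            bonus = math.sqrt(c * math.log(n) / count)  # half-width of the band
            bounds.append(mean + bonus)
    return bounds


def select_arm(counts, sums, n, rng=random):
    """Pick the arm with the highest upper bound, breaking ties at random."""
    bounds = upper_bounds(counts, sums, n)
    best = max(bounds)
    candidates = [i for i, b in enumerate(bounds) if b == best]
    return rng.choice(candidates)


print(select_arm([0, 0, 0, 0, 0], [0, 0, 0, 0, 0], n=0))  # any arm: all bounds are inf
print(select_arm([1, 1, 1, 1, 0], [0, 1, 0, 0, 0], n=4))  # 4: not tried yet
print(select_arm([2, 1, 1, 1, 1], [1, 1, 0, 0, 0], n=6))  # 1: highest mean + bonus
```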
Let's say we pick this one. What happens next is that we actually pull that machine's lever and something happens; or, in our case, we display that ad and see whether the person clicked on it or not. In this case, the person didn't click on it.

So this red value goes down, because we now have another observation for this machine, added to its sample of observations. The red value is always the observed average, and according to the law of large numbers, the observed average will, in the long run, converge to the expected return, or expected value, of this distribution. Therefore, it is very likely that this value will go down.

The second thing that happens, because we have an extra observation, is that the confidence interval becomes smaller, simply because we have more data.
Of course, in reality it doesn't shrink that much after a single observation; this is just to illustrate the point. Because we have an additional observation, we are more confident in our estimate, so the confidence interval slowly starts to shrink.

All right, the next step is to find the machine with the highest upper confidence bound. It's not this one any more; it's one of these four, so let's pick one at random. There we go, this one; we do the same thing. Again, the ad is displayed and the person either clicks or doesn't, and that affects the average we've measured so far, the empirical average. Or, if you pull the lever, you either win or lose, and that affects your empirical average, this red line. As expected, over lots of iterations it slowly converges to the expected value. So it comes closer, and right away you can see that this machine is suddenly above all of the other machines. If that were the end of this iteration, we would conclude from here that this is the best machine.
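Both effects described above, the red average moving with each new observation and the band narrowing, can be sketched in a few lines. This assumes a UCB1-style half-width $\sqrt{2 \ln n / N_i(n)}$; `update_mean` and `half_width` are hypothetical helper names, and the click history is made up.

```python
import math


def update_mean(mean, count, reward):
    """Fold one new reward into a running average of `count` observations.

    Returns the new (mean, count); equivalent to recomputing sum / count
    without storing every past reward.
    """
    count += 1
    mean += (reward - mean) / count
    return mean, count


def half_width(n, count, c=2.0):
    """Half-width of a UCB1-style band after `count` plays, at round n."""
    return math.sqrt(c * math.log(n) / count)


mean, count = 0.0, 0
for reward in [1, 0, 0, 1, 0]:   # hypothetical click / no-click history
    mean, count = update_mean(mean, count, reward)
print(mean)  # matches sum([1, 0, 0, 1, 0]) / 5 up to floating-point rounding

# Holding the round number fixed, each extra observation narrows the band.
widths = [half_width(100, k) for k in range(1, 6)]
print([round(w, 3) for w in widths])
```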
And we'd start exploiting it, which would make this algorithm completely useless. But we shouldn't forget about the second thing that happens: because we got an additional observation in our sample, we are more confident in this interval. These confidence bounds have only one purpose, which is to include the actual expected value, wherever it is. We don't know where it is, but the bounds tell us that this green value is somewhere inside this box. Because we've got an additional observation, our sample size is larger and we're more confident in the overall picture for this machine, so the confidence bounds shrink. And now, as you can see, it's no longer the top machine: even though its average went up, its upper bound went down.

So now we look for the next highest upper confidence bound. It can be any one of these three machines, so let's just pick one at random: this one. Here the red line is above the blue line, so according to the law of large numbers you'd expect the red line to converge down towards the blue one.
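A small worked example, assuming a UCB1-style bound, shows how a machine's average can rise while its upper bound falls. The helper name `ucb` and the numbers (1 win in 2 plays after 10 rounds, then a win in round 11) are hypothetical.

```python
import math


def ucb(successes, plays, n, c=2.0):
    """Return (observed average, upper bound) for one arm after n total rounds."""
    mean = successes / plays
    return mean, mean + math.sqrt(c * math.log(n) / plays)


mean_before, bound_before = ucb(successes=1, plays=2, n=10)
# The arm is chosen in round 11 and pays out, so its average rises...
mean_after, bound_after = ucb(successes=2, plays=3, n=11)

print(f"average: {mean_before:.3f} -> {mean_after:.3f}")        # average: 0.500 -> 0.667
print(f"upper bound: {bound_before:.3f} -> {bound_after:.3f}")  # upper bound: 2.017 -> 1.931
```

The extra play shrinks the bonus term by more than the average gained, so the upper bound drops and another machine can take the top spot.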
But sometimes, randomly, it can go the other way. It's all probabilities, so it might even go up. And there we go: it went up, even though the blue line was below the red line. That can happen purely by chance. In the long run it will converge, but on any single occasion it can move either way. And again, we've got another element in the sample, so the confidence band narrows.

So you can probably see the picture of what's going on here. Now we pick the next one with the highest upper bound; let's say this one. We do the trial, the round: does the person click on the ad? Do we win money from the slot machine? The value goes down, so probably not: the person didn't click on the ad, or we didn't win money from the slot machine. So the average of our observations goes down and comes closer to the expected value, and the confidence bounds also shrink.
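The point about randomness can be checked with a quick simulation, assuming a single Bernoulli machine with a hypothetical win rate of 0.3: the running average often steps away from the true value in the short run, yet ends up close to it after many pulls, as the law of large numbers predicts. The function name and seed are illustrative.

```python
import random


def running_means(p, pulls, seed=0):
    """Running average of `pulls` Bernoulli(p) rewards, one value per pull."""
    rng = random.Random(seed)
    total = 0
    means = []
    for k in range(1, pulls + 1):
        total += 1 if rng.random() < p else 0
        means.append(total / k)
    return means


p = 0.3                            # hypothetical true expected value
means = running_means(p, 10_000)

# Count the pulls after which the average moved *away* from p.
moves_away = sum(
    1 for prev, cur in zip(means, means[1:]) if abs(cur - p) > abs(prev - p)
)
print(f"moved away from p on {moves_away} of {len(means) - 1} steps")
print(f"average after 10,000 pulls: {means[-1]:.3f}")
```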
Now we're in business: all of them are starting to play. The next one is this one. Because we know the end result, we know that this is the best one, the best ad or the best slot machine we should be using. But we were only given this insight for argument's sake, for the purpose of this exercise; the person performing this algorithm, or the algorithm itself, doesn't know that. So, unknowingly, it's actually starting to exploit the best option right now.

Again, okay, the value goes up, good. The band shrinks, and as you can see, it's still the best one. So now we do it again: we use this one again. Its average comes closer to the true value, but the confidence bound goes down again. This is all just for illustration purposes; of course, the bound doesn't shrink that much because of one observation, but we don't want to sit here through a thousand iterations. This is just to demonstrate the overall picture. So by exploiting the best option, we're decreasing its confidence bound.
That, in turn, gives the other options a chance. Even if the best option keeps going up and keeps being good, we're building up its sample size, which gives the other machines, or ads, a chance to be played too. That way we aren't biased towards whichever one we currently think is the optimal machine.

So now we move on to this one: same thing, it comes closer and its bounds shrink. We go to this one: bounds shrink. We move on to this one: bounds shrink. Then this one again: bounds shrink. And again this one: it might jump closer, and its bounds shrink. Even though we were very close to settling on that one, its bounds shrank so much that another machine now has a higher upper bound. You'll actually see this in the practical tutorials that follow: sometimes, after using the optimal option for a while, the algorithm will still switch to a suboptimal option, simply because the optimal option's bounds keep shrinking. Then we use this one, and its bounds shrink. And now we're back to the best one, and its bounds shrink.
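Why the algorithm occasionally switches back to a suboptimal machine can be seen from the bonus term, assuming a UCB1-style bound $\sqrt{2 \ln n / N_i(n)}$: an arm that isn't being played keeps its $N_i(n)$ fixed while $\ln n$ grows, so its upper bound slowly rises, whereas the arm being exploited has a growing $N_i(n)$ that shrinks its bonus. Sooner or later the neglected arm's upper bound overtakes, and it gets another try. The numbers below are hypothetical.

```python
import math


def bonus(n, plays, c=2.0):
    """UCB1-style exploration bonus for an arm played `plays` times by round n."""
    return math.sqrt(c * math.log(n) / plays)


rounds = [10, 100, 1_000, 10_000, 100_000]

# A suboptimal arm that stopped being chosen after 5 plays: its bonus creeps up.
for n in rounds:
    print(f"round {n:>6}: bonus = {bonus(n, plays=5):.3f}")

# The exploited best arm keeps accumulating plays, so its bonus keeps shrinking.
for plays in [10, 100, 1_000]:
    print(f"best arm after {plays:>4} plays at round 100,000: "
          f"bonus = {bonus(100_000, plays):.3f}")
```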
And then we just keep exploiting this one, because we've found out that it's the best one.

So that, in essence, is the whole concept behind the upper confidence bound algorithm, and that's how it solves the multi-armed bandit problem. It's a very interesting solution, much more sophisticated than just selecting randomly, or running an A/B test and then committing to the winning option. So if you're in advertising, if you run campaigns, or if you come across problems similar to this one, remember the upper confidence bound algorithm; you can apply it in your work as well. It's a very powerful algorithm.

On that note, I hope you enjoyed today's tutorial. In the next couple of videos, Hadelin will take you through programming this algorithm in both R and Python, and you'll get your takeaway templates. I can't wait to see you next time, when we'll be talking about the Thompson sampling algorithm. Until then, enjoy machine learning!
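Putting the pieces together, here is a compact end-to-end sketch that compares the approach with random selection. It assumes Bernoulli machines with hypothetical win rates and the UCB1-style bound $\bar{r}_i + \sqrt{2 \ln n / N_i}$, and it breaks ties by lowest index for simplicity. It is an illustration under those assumptions, not the code from the practical tutorials.

```python
import math
import random


def run_ucb(probs, rounds, seed=0, c=2.0):
    """Play `rounds` rounds of UCB1-style selection on Bernoulli arms.

    Returns (total_reward, selections), where selections[i] counts how
    often arm i was chosen.
    """
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    sums = [0] * k
    total = 0
    for n in range(1, rounds + 1):
        if n <= k:
            arm = n - 1                       # start-up: try every arm once
        else:
            # Upper bound = observed average + bonus, using the n - 1 rounds so far.
            bounds = [sums[i] / counts[i] + math.sqrt(c * math.log(n - 1) / counts[i])
                      for i in range(k)]
            arm = bounds.index(max(bounds))   # ties -> lowest index
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts


def run_random(probs, rounds, seed=0):
    """Baseline: pick an arm uniformly at random every round."""
    rng = random.Random(seed)
    return sum(1 if rng.random() < probs[rng.randrange(len(probs))] else 0
               for _ in range(rounds))


probs = [0.30, 0.20, 0.15, 0.40, 0.55]   # hypothetical; arm 4 is the best
ucb_total, selections = run_ucb(probs, 10_000)
random_total = run_random(probs, 10_000)
print("UCB total reward:   ", ucb_total)
print("random total reward:", random_total)
print("selections per arm: ", selections)
```

With these settings UCB should collect noticeably more reward than random selection and pick arm 4 far more often than any other arm, while still giving each of the others occasional plays; the exact counts depend on the seed.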