1 00:00:01,600 --> 00:00:04,800 Hello and welcome back to the course on Machine Learning. 2 00:00:05,333 --> 00:00:08,533 Today we're talking about the multi-armed bandit problem. 3 00:00:08,666 --> 00:00:10,633 Don't you just love these names when they come up? 4 00:00:10,633 --> 00:00:15,000 Such cool names for machine learning algorithms and problems. 5 00:00:15,733 --> 00:00:19,033 Well, today we're indeed talking about this problem. 6 00:00:19,033 --> 00:00:22,633 And it is the example 7 00:00:22,633 --> 00:00:25,700 that we're going to be using in this whole section on reinforcement learning. 8 00:00:25,700 --> 00:00:29,100 We're going to be looking at different ways that we can solve 9 00:00:29,100 --> 00:00:32,100 the multi-armed bandit problem and comparing the results. 10 00:00:32,266 --> 00:00:34,900 But before we continue, I just wanted to mention that 11 00:00:34,900 --> 00:00:38,766 the multi-armed bandit problem is not the only problem 12 00:00:38,766 --> 00:00:41,100 that can be solved with reinforcement learning. 13 00:00:41,100 --> 00:00:44,066 Reinforcement learning is actually really, really cool. 14 00:00:44,066 --> 00:00:49,133 Reinforcement learning, for instance, is used to train robot dogs to walk. 15 00:00:49,133 --> 00:00:50,400 And I'll give you a quick example. 16 00:00:50,400 --> 00:00:55,500 For instance, you can once you've created a robot dog, you can implement 17 00:00:55,500 --> 00:00:59,200 an algorithm inside the robot dog, which will tell it how to walk. 18 00:00:59,200 --> 00:01:00,033 You can tell it, 19 00:01:00,033 --> 00:01:03,266 all right, move your front right foot and then move your left back foot 20 00:01:03,266 --> 00:01:05,933 and then front left foot, right back foot and so on. 21 00:01:05,933 --> 00:01:07,700 You can actually give the sequence of actions 22 00:01:07,700 --> 00:01:11,100 that it needs to take in order to accomplish a task which is walking. 
23 00:01:11,600 --> 00:01:14,866 Or you can implement a reinforcement learning algorithm 24 00:01:14,866 --> 00:01:20,366 which will train the dog to walk in a very, very interesting way. 25 00:01:20,366 --> 00:01:20,900 So basically 26 00:01:20,900 --> 00:01:24,933 what it will do is say, hey dog, here are all the actions you can take: 27 00:01:25,800 --> 00:01:28,966 you can move your legs like this, you can move your legs like that. 28 00:01:28,966 --> 00:01:32,700 And your goal is to make a step forward. 29 00:01:32,700 --> 00:01:36,066 Every time you make a step forward, you are given a reward. 30 00:01:36,066 --> 00:01:39,566 Every time you fall over, you're given a punishment. 31 00:01:39,566 --> 00:01:42,066 And a reward is basically a one in the algorithm. 32 00:01:42,066 --> 00:01:46,800 You don't actually have to give it a carrot or, you know, something to eat. 33 00:01:46,800 --> 00:01:48,166 You just give it a one 34 00:01:48,166 --> 00:01:50,733 in the algorithm, and a punishment is a zero. 35 00:01:50,733 --> 00:01:54,000 And basically every time it takes a step forward, it knows it's got a reward. 36 00:01:54,000 --> 00:01:56,166 And it knows, yes, that's good for it. 37 00:01:56,166 --> 00:01:59,233 So it basically will try all these random sets of actions 38 00:01:59,966 --> 00:02:02,133 and see what they lead to. 39 00:02:02,133 --> 00:02:05,666 Every time it takes a step forward, it'll remember that those were good actions, 40 00:02:05,666 --> 00:02:07,533 and it'll try to repeat them more and more. 41 00:02:07,533 --> 00:02:10,866 And actually, dogs like that can learn to walk. 42 00:02:11,200 --> 00:02:15,100 So you don't have to program an actual walking algorithm into it. 43 00:02:15,100 --> 00:02:18,300 It'll figure out the steps it needs to take on its own. 44 00:02:18,566 --> 00:02:21,566 I think that's really mind-blowing and really cool. 
45 00:02:21,800 --> 00:02:26,866 But unfortunately, that is a topic more 46 00:02:27,066 --> 00:02:30,666 on the side of artificial intelligence rather than just machine learning. 47 00:02:30,966 --> 00:02:34,900 And that, you know, can be a whole course on its own. 48 00:02:34,900 --> 00:02:39,600 We're not going to delve into training robot dogs to walk inside this section. 49 00:02:39,600 --> 00:02:40,733 Inside this section, 50 00:02:40,733 --> 00:02:44,033 we are going to talk about the multi-armed bandit problem, 51 00:02:44,266 --> 00:02:48,700 which is a bit of a different application of this machine 52 00:02:48,800 --> 00:02:51,800 learning branch of reinforcement learning. 53 00:02:51,933 --> 00:02:53,966 And plus, of course, there are 54 00:02:53,966 --> 00:02:56,733 lots of other applications of reinforcement learning as well. 55 00:02:56,733 --> 00:03:00,766 So moving on to our multi-armed bandit problem. 56 00:03:00,766 --> 00:03:05,400 So first of all, what on earth is a multi-armed bandit? 57 00:03:05,400 --> 00:03:05,666 Right. 58 00:03:05,666 --> 00:03:09,200 So the first thing that comes to mind is like a robber going into a bank 59 00:03:09,200 --> 00:03:13,800 and so on, or somebody with a gun, but actually 60 00:03:14,133 --> 00:03:17,833 a bandit, or a one-armed bandit... 61 00:03:17,833 --> 00:03:19,800 So let's simplify things. 62 00:03:19,800 --> 00:03:24,000 A one-armed bandit is a slot machine, right? 63 00:03:24,000 --> 00:03:24,966 It's one of these. 64 00:03:24,966 --> 00:03:27,966 And why is it called a one-armed bandit? 65 00:03:27,966 --> 00:03:30,000 Well, it's got a bit of a history there. 66 00:03:30,000 --> 00:03:33,766 Back in the day, they used to have this handle 67 00:03:33,766 --> 00:03:35,300 on the right, and you can still see that in movies. 
68 00:03:35,300 --> 00:03:39,233 And maybe in some places you can still find these slot machines 69 00:03:39,233 --> 00:03:40,900 where you actually have to pull the handle, 70 00:03:40,900 --> 00:03:43,900 because now they're all electronic, and you just press a button 71 00:03:43,900 --> 00:03:46,600 right there, with push-button slot machines. 72 00:03:46,600 --> 00:03:49,966 Whereas back in the day you had to pull the lever 73 00:03:50,366 --> 00:03:54,333 to make it work, to initiate 74 00:03:54,333 --> 00:03:59,733 the game. So hence the arm. 75 00:03:59,733 --> 00:04:00,933 Yeah. But why is it called the bandit? 76 00:04:00,933 --> 00:04:06,400 Well, because these machines, they would actually, 77 00:04:07,033 --> 00:04:07,700 you know, 78 00:04:07,700 --> 00:04:12,000 this is one of the quickest ways to lose your money in a casino. 79 00:04:12,666 --> 00:04:16,533 They would take, I think it was like 80 00:04:16,566 --> 00:04:20,633 a 50% chance that they would take away your money back in the day. 81 00:04:20,633 --> 00:04:21,766 So they would. 82 00:04:21,766 --> 00:04:24,766 Of course, you would earn less than your, 83 00:04:24,866 --> 00:04:27,433 you're actually winning. 84 00:04:27,433 --> 00:04:32,233 And it was about a, you know, a 50/50 chance whether 85 00:04:32,633 --> 00:04:37,666 you get a win or you lose money, but then they put a bug into them. 86 00:04:37,700 --> 00:04:39,133 I think I read up a little bit online. 87 00:04:39,133 --> 00:04:42,266 They put a bug into them so that people who were playing 88 00:04:42,266 --> 00:04:46,066 them were losing even faster, or even more frequently than 50%. 89 00:04:46,066 --> 00:04:50,033 So hence the name bandit, because it was basically robbing you of your money. 90 00:04:50,333 --> 00:04:54,033 And, you know, one of the quickest ways to lose your money, 91 00:04:54,766 --> 00:04:55,600 hence the multiple. 
92 00:04:55,600 --> 00:04:57,966 Oh, that's why it's called the one-armed bandit. 93 00:04:57,966 --> 00:05:00,466 And what is the multi-armed bandit? 94 00:05:00,466 --> 00:05:05,333 Well, the multi-armed bandit problem is kind of the challenge 95 00:05:05,333 --> 00:05:11,700 that a person is faced with when he comes up to a whole set of these machines. 96 00:05:11,800 --> 00:05:14,800 When he doesn't have just one, he has, like, 5 or 10. 97 00:05:14,966 --> 00:05:16,200 You know, in the programming examples 98 00:05:16,200 --> 00:05:20,233 we'll have an example of ten, but we won't be talking specifically 99 00:05:20,233 --> 00:05:21,300 about these machines. 100 00:05:21,300 --> 00:05:24,166 Of course, this is the historic problem. 101 00:05:24,166 --> 00:05:30,200 You'll see just now that there are many, many other applications; 102 00:05:31,233 --> 00:05:34,233 even though it's called the multi-armed bandit problem, it's actually 103 00:05:34,233 --> 00:05:37,233 used to solve many other problems as well. 104 00:05:37,933 --> 00:05:40,666 So, basically here you're faced with a challenge. 105 00:05:40,666 --> 00:05:42,466 You've got five of these machines, right? 106 00:05:42,466 --> 00:05:46,733 And how do you actually play them to maximize your return 107 00:05:47,066 --> 00:05:52,200 from the number of games that you can actually play? 108 00:05:52,200 --> 00:05:54,233 So, you know, you decided you're going to play, 109 00:05:54,233 --> 00:05:56,466 you know, 100 times or a thousand times, 110 00:05:56,466 --> 00:05:58,600 and you want to maximize return. 111 00:05:58,600 --> 00:06:01,233 How do you figure out which ones of them to play 112 00:06:01,233 --> 00:06:03,866 in order to maximize your returns? 
113 00:06:03,866 --> 00:06:07,900 Well, to describe the problem in more detail, 114 00:06:07,900 --> 00:06:11,933 we've got to mention that the assumption here 115 00:06:11,933 --> 00:06:16,666 is that each one of these machines has a distribution behind it. 116 00:06:16,666 --> 00:06:20,000 So there's a distribution of numbers, 117 00:06:20,100 --> 00:06:24,366 or outcomes, out of which the machine picks results. 118 00:06:24,366 --> 00:06:24,633 Right? 119 00:06:24,633 --> 00:06:27,766 So each one of these machines 120 00:06:27,766 --> 00:06:31,166 has its own distribution, and it picks out a result. 121 00:06:31,166 --> 00:06:34,666 You pull the trigger and it just picks out randomly, out of its distribution, 122 00:06:34,933 --> 00:06:37,866 a result, an outcome, you know, whether you win or whether you lose 123 00:06:37,866 --> 00:06:39,433 and how much you win and how much you lose. 124 00:06:40,600 --> 00:06:43,600 Or basically you lose the same amount, you just put in the coin. 125 00:06:43,700 --> 00:06:46,300 But basically it tells you whether you win or lose 126 00:06:46,300 --> 00:06:49,666 based on the distribution that's built into the machine. 127 00:06:49,800 --> 00:06:53,600 But the problem here is that you don't know these distributions, right? 128 00:06:53,633 --> 00:06:56,500 You don't know in advance what the distributions are. 129 00:06:56,500 --> 00:06:59,533 And they are assumed to be different for these machines. 130 00:06:59,666 --> 00:07:02,566 Sometimes they can be similar or the same in some of the machines. 131 00:07:02,566 --> 00:07:05,566 But by default they are different. 132 00:07:05,966 --> 00:07:09,566 And your goal is to figure out 133 00:07:09,966 --> 00:07:12,733 which of these distributions is the best one for you. 134 00:07:12,733 --> 00:07:14,533 So, let's have a look. 135 00:07:14,533 --> 00:07:16,500 So there are these distributions. Right. 
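The setup just described, several machines that each draw outcomes from a distribution the player cannot see, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SlotMachine` class, the `make_machines` helper, and the win probabilities are all invented here, and each machine is simplified to a win (1) or loss (0) rather than the variable payouts the lecture mentions.

```python
import random


class SlotMachine:
    """One arm of the bandit: pays 1 with a hidden probability, else 0."""

    def __init__(self, win_probability, rng):
        self._win_probability = win_probability  # hidden from the player
        self._rng = rng

    def pull(self):
        # Draw one outcome from this machine's (Bernoulli) distribution.
        return 1 if self._rng.random() < self._win_probability else 0


def make_machines(win_probabilities, seed=0):
    """Build one machine per probability, sharing a seeded random generator."""
    rng = random.Random(seed)
    return [SlotMachine(p, rng) for p in win_probabilities]


# Hypothetical win probabilities, made up for this example only.
HIDDEN_PROBABILITIES = [0.10, 0.25, 0.15, 0.30, 0.45]

machines = make_machines(HIDDEN_PROBABILITIES)
pulls_per_machine = 1000

# The player only sees outcomes, so the distributions must be estimated.
observed = [
    sum(machine.pull() for _ in range(pulls_per_machine)) / pulls_per_machine
    for machine in machines
]

for number, rate in enumerate(observed, start=1):
    print(f"machine {number}: observed win rate {rate:.3f}")
```

With enough pulls the observed rates approach the hidden ones, but every one of those pulls costs a coin, which is the tension the rest of the lecture is about.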
136 00:07:16,500 --> 00:07:20,700 So for example, we've got these five machines, the five distributions. 137 00:07:21,000 --> 00:07:22,533 And as you can see right away, 138 00:07:22,533 --> 00:07:25,900 just by looking at this, which is the best machine? 139 00:07:25,900 --> 00:07:29,533 Obviously the one on the right, the orange one, is the best machine 140 00:07:29,533 --> 00:07:34,500 because it's got the best, you know, it's the most left-skewed. 141 00:07:34,900 --> 00:07:36,733 Left-skewed because the tail's on the left. 142 00:07:36,733 --> 00:07:38,100 So it's got the most 143 00:07:38,100 --> 00:07:41,100 favorable outcomes, it's got the highest mean, median and mode. 144 00:07:41,200 --> 00:07:45,633 And if you knew these distributions, you would obviously 145 00:07:45,633 --> 00:07:49,666 just go to the fifth machine and you would bet on the fifth machine, 146 00:07:49,666 --> 00:07:52,466 just on the fifth machine, all the time because 147 00:07:52,466 --> 00:07:54,366 it's got the best distribution, right? 148 00:07:54,366 --> 00:07:56,866 So on average you would get the best results. 149 00:07:56,866 --> 00:07:58,833 But you don't know that. You don't know that in advance. 150 00:07:58,833 --> 00:08:01,833 And your goal is to figure out, 151 00:08:02,066 --> 00:08:05,200 you know, it's like a mind game. 152 00:08:05,200 --> 00:08:10,500 You know how there's all these movies about machine learning and really cool, 153 00:08:10,500 --> 00:08:13,600 or cool mathematics, and how they're using 154 00:08:13,600 --> 00:08:16,600 them. A really good movie was 155 00:08:16,600 --> 00:08:20,266 The Imitation Game, right, about Alan Turing 156 00:08:20,266 --> 00:08:23,700 and how he was solving the Enigma and so on. 157 00:08:23,700 --> 00:08:25,800 But similar kind of concept. 158 00:08:25,800 --> 00:08:28,566 You don't know which one of these is the best. 
159 00:08:28,566 --> 00:08:29,433 You've got to figure it out. 160 00:08:29,433 --> 00:08:33,666 But at the same time, you are already spending your money doing this, right? 161 00:08:33,666 --> 00:08:37,366 You can't just, you know, the longer you take to figure it out, 162 00:08:38,133 --> 00:08:39,133 there's a trade-off, right? 163 00:08:39,133 --> 00:08:40,866 The longer you take to figure it out, 164 00:08:40,866 --> 00:08:45,300 the more money you will probably spend on the wrong ones. 165 00:08:45,766 --> 00:08:48,933 And therefore, you have to figure it out very quickly. 166 00:08:49,333 --> 00:08:51,900 So there are these two factors that are in play, exploration 167 00:08:51,900 --> 00:08:53,000 and exploitation. 168 00:08:53,000 --> 00:08:56,266 So you need to explore the machines to find out 169 00:08:56,266 --> 00:08:58,766 which one of them is the best one. 170 00:08:58,766 --> 00:09:03,566 And at the same time, you need to, as soon as you can, already start exploiting 171 00:09:03,933 --> 00:09:08,666 these machines, exploiting your findings to make the maximum return. 172 00:09:09,200 --> 00:09:11,866 So basically, there's another mathematical concept 173 00:09:11,866 --> 00:09:14,866 behind all this, which is called regret. 174 00:09:14,933 --> 00:09:17,700 And regret is mathematically defined. 175 00:09:17,700 --> 00:09:19,633 And if you want to read more about this, 176 00:09:19,633 --> 00:09:21,066 there's a really good white paper on it. 177 00:09:21,066 --> 00:09:24,833 It's called Using Confidence Bounds for Exploitation- 178 00:09:24,833 --> 00:09:27,033 Exploration Trade-offs. 179 00:09:27,033 --> 00:09:30,000 And it is by Peter 180 00:09:31,066 --> 00:09:32,400 Auer 181 00:09:32,400 --> 00:09:35,566 from the University of Technology in Austria. 182 00:09:36,633 --> 00:09:38,100 I really like the white paper. 183 00:09:38,100 --> 00:09:40,066 It goes into a lot of detail. 
184 00:09:40,066 --> 00:09:41,233 Like, I didn't even read the whole thing, 185 00:09:41,233 --> 00:09:45,433 but the first couple of chapters are pretty good if you want to go into detail. 186 00:09:45,433 --> 00:09:50,700 But basically, regret is suffered 187 00:09:50,700 --> 00:09:54,466 when you're using a non-optimal method. 188 00:09:54,466 --> 00:09:54,666 Right. 189 00:09:54,666 --> 00:09:57,666 So the one on the right is the optimal, 190 00:09:57,900 --> 00:10:00,666 the one on the right is the optimal machine. 191 00:10:00,666 --> 00:10:04,433 Whenever you're using a non-optimal machine you have a regret, 192 00:10:04,500 --> 00:10:08,433 which can be quantified as the difference 193 00:10:08,433 --> 00:10:12,000 between the best outcome and the non-best outcome, 194 00:10:12,566 --> 00:10:15,600 you know, all of those sums of money that you put in, 195 00:10:15,766 --> 00:10:20,000 like your opportunity cost of actually exploring the other machines. 196 00:10:20,633 --> 00:10:25,400 And so the longer you explore the non-optimal machines, the higher the regret. 197 00:10:25,400 --> 00:10:29,600 But at the same time, if you don't explore for long enough, right, 198 00:10:29,600 --> 00:10:33,500 if you don't explore for long enough, then 199 00:10:33,500 --> 00:10:38,233 a suboptimal machine might appear as an optimal machine. 200 00:10:38,233 --> 00:10:41,166 So for instance, this machine over here. Right. 201 00:10:41,166 --> 00:10:45,666 So if we explore, explore, explore, but we don't spend enough time exploring, 202 00:10:46,000 --> 00:10:47,700 we might think that this is the best machine 203 00:10:47,700 --> 00:10:50,333 because it's got quite a good return, right, close to this one. 204 00:10:50,333 --> 00:10:53,333 And we might start exploiting this one for the rest of the time. 
205 00:10:53,633 --> 00:10:56,233 But in reality, this one was the best one. 206 00:10:56,233 --> 00:11:01,366 So the goal is to find the best one and exploit the best one, 207 00:11:02,166 --> 00:11:06,166 but spend the least amount of time exploring all of them. 208 00:11:06,166 --> 00:11:06,466 Right? 209 00:11:06,466 --> 00:11:08,600 And while you're exploring you're still earning money, 210 00:11:08,600 --> 00:11:10,600 but not from the optimal machine. Right. 211 00:11:10,600 --> 00:11:12,000 So that's the goal. 212 00:11:12,000 --> 00:11:14,100 That's the point of this whole exercise. 213 00:11:14,100 --> 00:11:19,766 And it's important to understand here that there is a best one, so that 214 00:11:20,133 --> 00:11:22,766 even though these machines, you know, they 215 00:11:22,766 --> 00:11:25,400 have like jackpots sometimes 216 00:11:25,400 --> 00:11:28,533 and so on, we are assuming that 217 00:11:28,533 --> 00:11:31,866 these distributions are finite. 218 00:11:31,866 --> 00:11:35,666 And out of them, there's a best one that you are looking for. That's kind of the 219 00:11:36,100 --> 00:11:40,400 premise, or the whole assumption, of this problem. 220 00:11:40,466 --> 00:11:43,900 There are more complex options and versions of this problem and, 221 00:11:44,333 --> 00:11:49,533 again, check out some additional reading on that topic. 222 00:11:49,533 --> 00:11:51,566 That's even more advanced. 223 00:11:51,566 --> 00:11:54,600 But for what we're going to be using this for, that's going to be sufficient. 224 00:11:54,600 --> 00:11:56,600 And why is it going to be sufficient for us? 225 00:11:56,600 --> 00:12:02,366 Because the most common modern application of this that we can think of, 226 00:12:02,366 --> 00:12:06,300 and the one that we are going to be exploring, is advertising. 227 00:12:06,600 --> 00:12:08,400 So let's have a look at some ads. 
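The regret idea from the machine example can be made concrete with a small calculation. The sketch below uses the usual textbook form of expected regret (for every pull, the gap between the best machine's average payout and the average payout of the machine actually played), which is one reading of the informal description in the lecture rather than a definition taken from it. The `expected_regret` function and the `TRUE_MEANS` values are hypothetical.

```python
def expected_regret(true_means, choices):
    """Sum, over all pulls, of (best average payout - chosen machine's average payout)."""
    best_mean = max(true_means)
    return sum(best_mean - true_means[arm] for arm in choices)


# Hypothetical average payouts per pull; index 4 is the best machine.
TRUE_MEANS = [0.10, 0.25, 0.15, 0.30, 0.45]
ROUNDS = 100

# Four made-up ways of spending 100 pulls.
always_best = [4] * ROUNDS                        # only possible with perfect knowledge
round_robin = [t % 5 for t in range(ROUNDS)]      # pure exploration for all 100 pulls
explore_then_commit = round_robin[:20] + [4] * 80 # explore briefly, then pick the right one
commit_too_early = round_robin[:5] + [3] * 95     # under-explore, settle on a near-best machine

for name, plan in [
    ("always best", always_best),
    ("round robin", round_robin),
    ("explore then commit", explore_then_commit),
    ("commit too early", commit_too_early),
]:
    print(f"{name}: expected regret {expected_regret(TRUE_MEANS, plan):.2f}")
```

Under these made-up numbers, exploring for the whole budget is the most expensive plan, and committing too early to the second-best machine costs far more than a short exploration phase followed by the right choice. In practice the true means are unknown, so regret is something an algorithm tries to keep small, not something the player can compute while playing.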
228 00:12:08,400 --> 00:12:09,766 This is going to be fun. 229 00:12:09,766 --> 00:12:13,433 So just a disclaimer: there's no affiliation with Coca-Cola. 230 00:12:13,433 --> 00:12:15,766 Examples are used just for educational purposes. 231 00:12:15,766 --> 00:12:16,966 All right. So let's have a look. 232 00:12:16,966 --> 00:12:20,133 Let's say Coca-Cola or 233 00:12:20,133 --> 00:12:23,300 some company wants to run a campaign, 234 00:12:23,633 --> 00:12:27,633 and it's going to be called the Welcome to the Coke Side of Life campaign. 235 00:12:28,066 --> 00:12:28,900 And if you search for this 236 00:12:28,900 --> 00:12:32,400 campaign online, you'll see that they had, you know, hundreds of different ads 237 00:12:32,666 --> 00:12:35,300 that they came up with for this campaign. 238 00:12:35,300 --> 00:12:38,633 And here are some examples of them. Well, these are just some images 239 00:12:38,633 --> 00:12:39,333 I pulled from Google. 240 00:12:39,333 --> 00:12:42,900 So maybe these are even drawn by people, but we're going to assume 241 00:12:42,900 --> 00:12:47,400 that these are legitimate ads that are going to go into the campaign. 242 00:12:47,700 --> 00:12:49,866 And so we want to find out which is the best ad, 243 00:12:49,866 --> 00:12:51,300 which is the ad that works best. 244 00:12:51,300 --> 00:12:53,100 So we've got options number one, 245 00:12:53,100 --> 00:12:57,000 number two, number three, number four, and number five. 246 00:12:57,333 --> 00:13:02,066 And so now our goal is to find out which ad works the best, 247 00:13:02,233 --> 00:13:03,666 to maximize our returns. 248 00:13:03,666 --> 00:13:05,800 But right now we don't know which ad works the best, right? 249 00:13:05,800 --> 00:13:10,833 So there is a distribution behind it, but that distribution 250 00:13:10,833 --> 00:13:15,300 will only become known after thousands and thousands and thousands of people 
251 00:13:15,733 --> 00:13:18,233 look at these ads and click or don't click on these ads. 252 00:13:18,233 --> 00:13:20,000 And this is actually going to be very similar 253 00:13:20,000 --> 00:13:21,800 to the example that we're going to be looking at, 254 00:13:21,800 --> 00:13:24,600 the example that Hadelin is going to be walking you through 255 00:13:24,600 --> 00:13:25,800 in the programming tutorials. 256 00:13:25,800 --> 00:13:27,700 And in that example we're going to have ten ads. 257 00:13:27,700 --> 00:13:29,100 So even more. 258 00:13:29,100 --> 00:13:31,100 And so what can you do here? 259 00:13:31,100 --> 00:13:34,200 Well, one way to approach the problem is to just run an A/B test. 260 00:13:34,200 --> 00:13:34,433 Right. 261 00:13:34,433 --> 00:13:37,700 So take your five or 50 or 500 ads 262 00:13:37,700 --> 00:13:43,400 and run a huge A/B test, or multiple A/B tests, and 263 00:13:43,400 --> 00:13:47,766 wait until you have a large enough sample, 264 00:13:48,533 --> 00:13:52,600 and then conclude which ad is the best, right, with a certain confidence. 265 00:13:53,133 --> 00:13:56,800 But the problem with that is that you would spend 266 00:13:56,800 --> 00:13:59,666 a lot of time and money doing that. Right. 267 00:13:59,666 --> 00:14:02,633 So an A/B test is pure exploration, right? 268 00:14:02,633 --> 00:14:04,433 You're not exploiting the best option. 269 00:14:04,433 --> 00:14:05,766 You are exploring the best option, 270 00:14:05,766 --> 00:14:10,100 but to the same extent as you're exploiting the non-optimal options. 271 00:14:10,100 --> 00:14:10,300 Right. 
272 00:14:10,300 --> 00:14:14,800 So if we go by our previous distributions, if this is the best one, 273 00:14:14,800 --> 00:14:18,000 if you just run an A/B test, you're uniformly distributing, 274 00:14:18,000 --> 00:14:21,466 or uniformly using, these five options, and therefore, 275 00:14:21,700 --> 00:14:25,333 as much as you're using this one, you're using all the other four of them. 276 00:14:25,333 --> 00:14:27,400 So basically all five of them. 277 00:14:27,400 --> 00:14:30,433 So basically you are exploiting it a bit, but 278 00:14:30,433 --> 00:14:33,433 unconsciously, right, in a random way. 279 00:14:34,166 --> 00:14:37,500 And therefore A/B tests are just for exploration. 280 00:14:38,300 --> 00:14:41,400 So the challenge is to find out which is the best one, 281 00:14:41,766 --> 00:14:45,700 but do it while you're exploring, 282 00:14:45,700 --> 00:14:51,400 you know, to exploit the best one while you're exploring for it. 283 00:14:51,400 --> 00:14:51,600 Right. 284 00:14:51,600 --> 00:14:55,366 So find out which of them is the best one in the process of, 285 00:14:55,900 --> 00:14:59,800 hold on, to find out 286 00:14:59,800 --> 00:15:03,033 which is the best one in the process of the actual launched campaign. 287 00:15:03,033 --> 00:15:05,033 So don't have two phases, where you 288 00:15:05,033 --> 00:15:08,033 do the A/B test and then use the best one, 289 00:15:08,166 --> 00:15:10,900 but actually find out the best one in the quickest way 290 00:15:10,900 --> 00:15:14,066 possible and start exploiting it along the way. 291 00:15:14,066 --> 00:15:15,433 So that's the challenge here. 292 00:15:15,433 --> 00:15:17,200 And that's what we're going to be solving. 293 00:15:17,200 --> 00:15:22,233 And that's the modern application of the multi-armed bandit problem. 294 00:15:22,600 --> 00:15:24,433 So hopefully you're excited about this. 295 00:15:24,433 --> 00:15:27,200 We've got two great algorithms coming up. 
296 00:15:27,200 --> 00:15:29,000 Can't wait to get started. 297 00:15:29,000 --> 00:15:30,833 I look forward to seeing you in the next tutorial. 298 00:15:30,833 --> 00:15:33,000 And until then, enjoy machine learning.
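To make the A/B test comparison from this lecture concrete, here is a small simulation sketch. It contrasts a strict rotation through five ads (pure exploration) with a simple epsilon-greedy rule that mostly shows the ad with the best click rate observed so far. Epsilon-greedy is used here only because it is short to write down; it is not presented as one of the two algorithms the course covers next. The click rates, function names, and the `epsilon` value are all assumptions made up for this example.

```python
import random

# Hypothetical click-through rates for five ads; unknown to both strategies.
CLICK_RATES = [0.05, 0.10, 0.08, 0.12, 0.30]


def show_ad(ad, rng):
    """Simulate one impression: 1 if the user clicks, 0 otherwise."""
    return 1 if rng.random() < CLICK_RATES[ad] else 0


def run_ab_test(rounds, rng):
    """Pure exploration: rotate through the ads evenly for the whole budget."""
    return sum(show_ad(t % len(CLICK_RATES), rng) for t in range(rounds))


def run_epsilon_greedy(rounds, rng, epsilon=0.1):
    """Explore a random ad with probability epsilon, else show the best so far."""
    n_ads = len(CLICK_RATES)
    shown = [0] * n_ads
    clicked = [0] * n_ads
    total_clicks = 0
    for _ in range(rounds):
        if 0 in shown:
            ad = shown.index(0)            # show every ad once before comparing
        elif rng.random() < epsilon:
            ad = rng.randrange(n_ads)      # explore
        else:
            # Exploit: pick the ad with the highest observed click rate.
            ad = max(range(n_ads), key=lambda i: clicked[i] / shown[i])
        reward = show_ad(ad, rng)
        shown[ad] += 1
        clicked[ad] += reward
        total_clicks += reward
    return total_clicks


ROUNDS = 10_000
ab_clicks = run_ab_test(ROUNDS, random.Random(1))
greedy_clicks = run_epsilon_greedy(ROUNDS, random.Random(1))

print(f"A/B rotation:   {ab_clicks} clicks")
print(f"epsilon-greedy: {greedy_clicks} clicks")
```

Under these assumed click rates the rotation spends four fifths of its impressions on weaker ads for the entire campaign, while the epsilon-greedy rule shifts most impressions to the strongest ad once its advantage shows up in the data. That is the "exploit while you explore" behaviour described above.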