Hello my friends, and welcome to this new practical activity, the first one of Part 6, Reinforcement Learning, where we will be implementing the UCB algorithm: Upper Confidence Bound. Reinforcement learning is one of the most exciting branches of machine learning because it is the one closest to artificial intelligence, in the sense that we are making programs that take actions, just like a robot. So this is very exciting. It is one of my favorite branches, if not the number one. And so I'm so excited right now to teach you the fundamentals of reinforcement learning, and especially to implement with you two of the best reinforcement learning models: UCB and Thompson Sampling.

So first, in this section, we will implement UCB, Upper Confidence Bound. And once again we will apply it to a business case study, which will be the next part of the story we had in Part 3, Classification. You remember that brand new luxury SUV? The car company was trying to optimize its targeting thanks to classification. Well, this time we are going to optimize the online advertising, meaning that we are going to find the best ad among different ad designs: the ad that will convince the most customers to click on it and, potentially, buy the product, buy the car. I will explain this story a bit more later.

But before that, let's make sure everyone here is on the same page. I gave you the link to this whole folder in the article right before this tutorial, so make sure to click it, and now we should all be on the same page. So let's do this. Let's go into Part 6, Reinforcement Learning, and we're going to start, as we said, with Upper Confidence Bound, UCB. This time you not only see the two folders, Python and R, but you also see the full slide of the UCB algorithm. And you will also see the full slide of the Thompson Sampling algorithm in the other folder.
So let's have a look at it. Make sure to download it, and if you want, you can print it and put it up on your wall. There you go: you have the three steps of the UCB algorithm, which we are going to implement together. I will actually give you a lot of exercises in this implementation: before we implement each of the steps together, I will ask you to implement it yourself. So first you will implement Step 1. Then, once Step 1 is implemented, you will implement Step 2 on your own before we do it together, and then Step 3. So you see, it will be very much a learn-by-doing process. All right, so that's the slide. Make sure to download it. And now let's go into Python first to implement the UCB algorithm.

All right. So as usual you have two files here. First, the dataset, Ads CTR Optimization, where CTR means click-through rate. That's what we are going to optimize thanks to Upper Confidence Bound first, and then Thompson Sampling. And then we have the implementation, of course, Upper Confidence Bound, in the .ipynb format, which you can open with either Google Colaboratory or Jupyter Notebook.
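The three steps on the UCB slide can be summarized roughly as follows. This is a sketch of the standard UCB variant commonly taught with this case study, not a copy of the slide: the 3/2 constant in the confidence term is an assumption. Here N_i(n) is the number of times ad i was selected up to round n, R_i(n) is the sum of the rewards ad i received up to round n, and rounds are counted from 1.

```latex
\begin{aligned}
&\text{Step 1: for every ad } i \text{ at round } n, \text{ track } N_i(n) \text{ and } R_i(n).\\
&\text{Step 2: } \bar{r}_i(n) = \frac{R_i(n)}{N_i(n)}, \qquad
  \Delta_i(n) = \sqrt{\frac{3}{2}\cdot\frac{\log n}{N_i(n)}}.\\
&\text{Step 3: select } \arg\max_{i}\bigl(\bar{r}_i(n) + \Delta_i(n)\bigr).
\end{aligned}
```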
All right. So as usual, let's start by explaining what this dataset is about. As I said, we are doing the next part of the story of this car dealership trying to sell that new SUV. We've already done the targeting: we already optimized the targeting thanks to classification in Part 3. And now we're going to optimize the click-through rate of some ads we're going to make for this car. So what happened exactly is that the advertising team prepared ten different ads with ten different designs. For example, on one ad we will see the SUV in beautiful mountains. On another ad, we will see the SUV in a futuristic city. On another ad, we'll see the SUV in a charming town, like a charming town in the south of France or Italy. On another ad, we will see the car on the moon, why not. On another ad, we'll see the car in a beautiful countryside cornfield, something like that.
So basically, all the ads have different designs, and the advertising team is wondering: which ad will convert the most? Which ad will attract the most people to click on it and then potentially buy the SUV? So we have these ten different ads, and what we're going to do, which is the process of online learning, is show these ads to different users online, once they connect to a website or to a search engine. It can be the ads that appear at the top of the page when you do a search on Google. We're going to show one of these ads each time a user connects to the web page, and then we're going to record the result: whether this user clicked on the ad, yes or no.

So, just to recap: a first user connects to, let's say, a web page. Our algorithm, which will first be UCB, selects an ad to show to this user, and then the user decides to click on the ad or not. If the user clicks on the ad, we record it as 1, and if the user doesn't click on the ad, we record it as 0. Then a new user connects to the web page and the same thing happens: the algorithm selects an ad to show to this new user; if this new user clicks the ad, it's a 1, and if not, it's a 0. And we're going to do this for lots of users, actually 10,000 users. That's what this dataset is about.

However, there is something you must absolutely understand, and it is very, very important. Make sure to understand it, and rewind if it is not clear. So I'm going to explain this; please listen carefully. In reality, users connect one by one to the web page, and we successively show each of them an ad. Everything happens in real time: it's a dynamic process.
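The real-time process just described (one user per round, one ad shown, a 1 or a 0 recorded) can be sketched as a loop. This is a minimal illustration, not code from the course notebook: `run_online_loop`, `select_ad`, and `would_click` are hypothetical names, and the toy callbacks in the usage example simply cycle through the ten ads and pretend every user clicks only ad index 4 (ad 5).

```python
def run_online_loop(n_users, select_ad, would_click):
    """Simulate the real-time process: one user per round, one ad shown per user.

    select_ad(round_index, history) -> ad index chosen by the algorithm (hypothetical callback)
    would_click(user_index, ad)     -> True if that user clicks that ad (hypothetical callback)
    """
    history = []  # (round, ad shown, reward) tuples, in arrival order
    for round_index in range(n_users):
        # The algorithm must decide before it sees whether this user clicks.
        ad = select_ad(round_index, history)
        reward = 1 if would_click(round_index, ad) else 0  # click -> 1, no click -> 0
        history.append((round_index, ad, reward))
    return history


# Usage with toy callbacks: cycle through 10 ads; every user clicks only ad index 4.
log = run_online_loop(
    n_users=20,
    select_ad=lambda round_index, history: round_index % 10,
    would_click=lambda user, ad: ad == 4,
)
print(log[:5])
print("total clicks:", sum(reward for _, _, reward in log))
```

A real algorithm such as UCB would replace the round-robin `select_ad` with a rule that uses `history` to favor ads that have been clicked more often.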
It's not a static process with a static dataset recorded over a certain period of time; it's a real-time process. Therefore, the only way to reproduce this for real would be for me to make ten real ads right now, generic ads of a car, open a Google AdWords account, and show the ads for real to real people connecting to the website. Of course, I'm not going to do this: first of all, it is costly, and second, it would deceive the users, since I would have to actually sell a car somehow. So this is not an option, and therefore I have to make a simulation. And this simulation is exactly what this dataset gives us. In this dataset, each row corresponds to a different user connecting to the web page, to whom we're going to show an ad, and each column corresponds to a different ad, from ad 1 to ad 10.
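To picture the layout of such a dataset, here is a minimal sketch that generates one with NumPy: 10,000 rows (users) by 10 columns (ads), where a 1 means that user would click that ad if it were shown. The click probabilities in `AD_CLICK_RATES` are made up for illustration and are not the values in the course's CSV file; `simulate_dataset` is a hypothetical helper.

```python
import numpy as np

# Hypothetical click probabilities for ad 1 ... ad 10 (NOT the course dataset's values).
AD_CLICK_RATES = np.array([0.05, 0.13, 0.09, 0.16, 0.30, 0.02, 0.11, 0.17, 0.08, 0.10])


def simulate_dataset(rates, n_users, seed=0):
    """Build an (n_users, n_ads) 0/1 matrix.

    Row u is one user (one round); column i is ad i + 1.
    Cell [u, i] == 1 means "user u would click ad i + 1 if it were shown".
    """
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates)
    # Each cell is an independent Bernoulli draw with that ad's click probability.
    return (rng.random((n_users, rates.size)) < rates).astype(int)


dataset = simulate_dataset(AD_CLICK_RATES, n_users=10_000)
print(dataset.shape)         # one row per user, one column per ad
print(dataset.mean(axis=0))  # empirical click-through rate of each ad
```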
This dataset is a simulation in the sense that, each time a user connects to the web page, it tells us, even though we wouldn't know it in reality, which ads the user of that row would click on. For example, the first row corresponds to the first user to whom we are going to show an ad. What the cells mean is that this user would click on ad 1 if we showed them ad 1; they wouldn't click on ad 2 if we showed ad 2, because there is a 0 here; they wouldn't click on ad 3 if we showed ad 3, nor on ad 4 if we showed ad 4; but they would click on ad 5 if we showed ad 5, and so on. In other words, thanks to this simulation, we know that this user would only click on ad 1, ad 5, and ad 9 if we showed these ads, and that if we showed any of the other ads (ad 2, ad 3, and so on up to ad 8, and ad 10), this user wouldn't click.

Now, I know we wouldn't know that in reality, but that's why I'm saying this dataset is a simulation, and it is the only way we can run the UCB or Thompson Sampling algorithm without doing it for real with an actual advertising campaign. Okay, so I hope it's clear; please rewind if it's not, because I think I've said all the key words. This dataset is a simulation; therefore, for all the users corresponding to these rows, we know which ads they would click. This other user, for example, would click only on ad 2 or ad 8, but not on any of the other ads. And that's the only way we can indeed simulate the Thompson Sampling or UCB algorithms. So I hope it's clear. Let's scroll down to the bottom: we have 10,000 users in total, as we said. And so we're going to run the UCB algorithm first, and then the Thompson Sampling algorithm.
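The key point is that the algorithm only ever sees the one cell for the ad it actually shows; the rest of the row stays hidden from it. Here is a short sketch using rows shaped like the two users described above. The helper names are hypothetical, and ads are numbered from 1 as in the lecture.

```python
import numpy as np

# Two rows in the 0/1 layout (columns = ad 1 ... ad 10), matching the two users described.
first_user = np.array([1, 0, 0, 0, 1, 0, 0, 0, 1, 0])
other_user = np.array([0, 1, 0, 0, 0, 0, 0, 1, 0, 0])


def ads_user_would_click(row):
    """Return the 1-based ad numbers this user would click, according to the simulation."""
    return (np.flatnonzero(row) + 1).tolist()


def reward_for(row, ad_number):
    """The only information the algorithm gets: the reward for the ad it showed (1-based)."""
    return int(row[ad_number - 1])


print(ads_user_would_click(first_user))  # [1, 5, 9]
print(ads_user_would_click(other_user))  # [2, 8]
print(reward_for(first_user, 5), reward_for(first_user, 2))  # 1 0
```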
The goal is to figure out the ad that has the highest conversion rate, that is, the ad the users clicked on the most. I know we could do this with a naive strategy, a simple algorithm where we collect some simple statistics to see which ad is most frequently clicked on. But remember, as Kirill explained in the intuition lectures, each time we display an ad on the website or in the Google search engine, it incurs a cost: every ad impression has a cost. Therefore we need to figure out as fast as possible, in the minimum number of rounds, which ad converts the most, meaning which is the best ad, the one the users are most attracted to. The users here are represented as rounds because we show the ads to the users one by one, one round after the other. And that's why we need an algorithm stronger than simple statistics. That stronger algorithm will be first UCB and then Thompson Sampling, and we will even see which of the two is the more powerful.

All right, I think that's enough explanation for this dataset. Now we're going to start the implementation; I can't wait. This is a very exciting and widely used algorithm in online advertising and digital marketing. So let's do this. Let's click this implementation and open it with Google Colaboratory or Jupyter Notebook, as you want. Now it is loading the notebook, and here you go: welcome to the UCB implementation. As usual, we're going to create a copy, because this notebook is in read-only mode. So in order to re-implement this from scratch, we're going to click File here and then Save a copy in Drive. This will create a copy in which we will be able to re-implement the whole algorithm from scratch.

All right, so it is opening. You notice that I have my data preprocessing template opened; that's because we will use it very quickly, just to import the libraries and the dataset. And now, before we start, let's delete all the code cells here, but not the text cells, and very soon we should be able to start. It's a simple implementation with a simple structure, but the cell implementing UCB will actually be long, and that's where you will practice by doing some steps of the implementation yourself first, before we do them together. So let's have a look. Welcome to the UCB implementation: we will start by importing the libraries, and then we will import the dataset. Then we will implement the full UCB algorithm, just following the three steps on the slide. And finally, we will visualize the results, by which I mean that we will plot a histogram where we will clearly see which ad was selected the most.
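Putting the pieces together, here is a self-contained sketch of the plan just outlined, with two caveats. It substitutes a randomly generated dataset with hypothetical click rates for the course CSV (which the notebook would presumably load with `pandas.read_csv`), and it follows the standard UCB formula with a 3/2 constant, which may differ in details from the notebook's code. For comparison, it also runs the naive baseline of showing a random ad each round, and it computes the per-ad selection counts that the histogram would display.

```python
import math

import numpy as np

# Hypothetical simulated dataset (same layout as the course CSV: rows = users, columns = ads).
AD_CLICK_RATES = np.array([0.05, 0.13, 0.09, 0.16, 0.30, 0.02, 0.11, 0.17, 0.08, 0.10])
rng = np.random.default_rng(0)
dataset = (rng.random((10_000, AD_CLICK_RATES.size)) < AD_CLICK_RATES).astype(int)


def random_selection(data, seed=1):
    """Naive baseline: show a uniformly random ad each round."""
    n_rounds, n_ads = data.shape
    ads_selected = np.random.default_rng(seed).integers(0, n_ads, size=n_rounds)
    total_reward = int(data[np.arange(n_rounds), ads_selected].sum())
    return ads_selected.tolist(), total_reward


def upper_confidence_bound(data):
    """UCB sketch: track counts (Step 1), compute bounds (Step 2), pick the max (Step 3)."""
    n_rounds, n_ads = data.shape
    numbers_of_selections = [0] * n_ads  # N_i(n)
    sums_of_rewards = [0] * n_ads        # R_i(n)
    ads_selected, total_reward = [], 0
    for n in range(n_rounds):
        best_ad, max_upper_bound = 0, -math.inf
        for i in range(n_ads):
            if numbers_of_selections[i] > 0:
                average_reward = sums_of_rewards[i] / numbers_of_selections[i]
                # n + 1 because rounds are counted from 1 in the formula
                delta_i = math.sqrt(1.5 * math.log(n + 1) / numbers_of_selections[i])
                upper_bound = average_reward + delta_i
            else:
                upper_bound = math.inf  # untried ads win, so each ad is shown at least once
            if upper_bound > max_upper_bound:
                best_ad, max_upper_bound = i, upper_bound
        reward = int(data[n, best_ad])  # only the shown ad's cell is revealed
        ads_selected.append(best_ad)
        numbers_of_selections[best_ad] += 1
        sums_of_rewards[best_ad] += reward
        total_reward += reward
    return ads_selected, total_reward


random_ads, random_reward = random_selection(dataset)
ucb_ads, ucb_reward = upper_confidence_bound(dataset)
selection_counts = np.bincount(ucb_ads, minlength=dataset.shape[1])  # histogram data
print("random total reward:", random_reward)
print("UCB total reward:   ", ucb_reward)
print("times each ad was selected by UCB:", selection_counts)
```

With matplotlib installed, `plt.hist(ucb_ads)` would draw the histogram of selections. Giving untried ads an infinite upper bound is a common initialization convention rather than something stated in the lecture; it simply guarantees every ad is shown once before the confidence terms take over.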
The ad selected the most is, of course, the one identified as the strongest ad, the most attractive ad for the users. And one thing I forgot to say, which is really, really important: this dataset actually assumes that each ad has a fixed conversion rate. So ad number 1 has a certain conversion rate, ad number 2 has another conversion rate, and the same goes for all the other ads. That is, of course, because it is a required assumption of both the UCB and Thompson Sampling algorithms, which are basically reinforcement learning algorithms for online learning. And it's the case in reality anyway. For example, slot machines in a casino each have a fixed conversion rate, unless the casino changes it over time, but that's another question. Usually, an ad that you show online has a fixed conversion rate, because over time it converts the same proportion of people. So we will assume this, and besides, it's close to reality.
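What a fixed conversion rate means in practice can be seen with a tiny simulation: if every impression of an ad is an independent click with the same probability, the observed click-through rate settles toward that probability as impressions accumulate. The 0.2 rate and the `running_ctr` helper are hypothetical, chosen only for illustration.

```python
import numpy as np


def running_ctr(rate, n_impressions, seed=0):
    """Observed click-through rate after each impression of an ad with a fixed click probability."""
    rng = np.random.default_rng(seed)
    clicks = rng.random(n_impressions) < rate  # one Bernoulli(rate) outcome per impression
    return np.cumsum(clicks) / np.arange(1, n_impressions + 1)


ctr = running_ctr(rate=0.2, n_impressions=100_000)  # 0.2 is a hypothetical fixed rate
for shown in (10, 100, 1_000, 100_000):
    print(f"after {shown:>7} impressions: observed CTR = {ctr[shown - 1]:.3f}")
```

If the true rate drifted over time, this running average would lag behind it, which is why algorithms built on the fixed-rate assumption can be misled by ads whose appeal changes.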
This fixed conversion rate is an important assumption of online learning. All right, so now we're ready to begin this implementation. I hope you're excited, and I hope you understood this dataset and the fact that we're doing a simulation, because we don't really have a choice. And so, if everything is all good, well, my friends, let's begin this implementation in the next tutorial. Until then, enjoy machine learning!