1 00:00:00,233 --> 00:00:00,633 Hello my 2 00:00:00,633 --> 00:00:03,833 friends, and welcome to this new practical activity, 3 00:00:03,900 --> 00:00:08,433 which will be, you know, an exciting one because after implementing the UCB 4 00:00:08,433 --> 00:00:12,800 algorithm, we're about to implement a new one, and we will see if this new one, 5 00:00:12,800 --> 00:00:15,800 which is Thompson Sampling, will beat the UCB. 6 00:00:16,000 --> 00:00:19,433 So I just want to say here that it's important that you first check out 7 00:00:19,433 --> 00:00:22,433 the UCB section, you know, and the practical activity, 8 00:00:22,466 --> 00:00:25,800 before doing this one, because we will work on the same data set. 9 00:00:25,966 --> 00:00:29,000 And I will mention the UCB several times so that we can compare 10 00:00:29,000 --> 00:00:30,300 the two performances. 11 00:00:30,300 --> 00:00:30,633 All right. 12 00:00:30,633 --> 00:00:34,400 So make sure to first check out the section on Upper Confidence Bound. 13 00:00:34,400 --> 00:00:37,166 And if that's the case, you're ready to go. Okay. 14 00:00:37,166 --> 00:00:40,133 And then let's make sure everyone here is on the same page. 15 00:00:40,133 --> 00:00:41,733 I gave you the link to this folder, 16 00:00:41,733 --> 00:00:42,400 Machine Learning 17 00:00:42,400 --> 00:00:45,933 A-Z Codes and Datasets, right before this tutorial in the article. 18 00:00:45,933 --> 00:00:47,533 So make sure to connect to it. 19 00:00:47,533 --> 00:00:50,166 And now, all good, we can start. So let's do this. 20 00:00:50,166 --> 00:00:54,433 I can't wait to not only implement that new model, Thompson Sampling, 21 00:00:54,600 --> 00:00:58,533 but also to see if we're going to beat the UCB, meaning, 22 00:00:58,633 --> 00:01:03,266 if we're going to manage to catch that best ad in 500 rounds or less. 23 00:01:03,266 --> 00:01:03,933 Okay. 24 00:01:03,933 --> 00:01:08,100 So once again, I give you the slide of the whole Thompson Sampling algorithm. 
25 00:01:08,333 --> 00:01:10,466 So I really recommend that you open it. 26 00:01:10,466 --> 00:01:13,466 You have the three steps which we will implement together 27 00:01:13,466 --> 00:01:14,733 in the following tutorials. 28 00:01:14,733 --> 00:01:17,933 And once again, you will implement the steps on your own 29 00:01:17,933 --> 00:01:19,233 before we do it together, 30 00:01:19,233 --> 00:01:22,866 because that's the best way to practice and progress in machine learning. 31 00:01:23,133 --> 00:01:24,766 So that's the slide. 32 00:01:24,766 --> 00:01:27,233 And now we're going to go into the Python folder, 33 00:01:27,233 --> 00:01:29,233 in which you're going to find two files. 34 00:01:29,233 --> 00:01:33,366 Well, the exact same data set as before, with the 10,000 rounds, 35 00:01:33,366 --> 00:01:38,300 or, you know, the 10,000 users in the rows, and then the ten ads in the columns. 36 00:01:38,466 --> 00:01:42,900 And so I remind you that we're going to show ads successively to each of these users. 37 00:01:43,033 --> 00:01:46,033 And because this data set is a simulation, well, 38 00:01:46,033 --> 00:01:49,033 we know for each of them which ad they're going to click on. 39 00:01:49,133 --> 00:01:53,900 So for example, this user is going to click only on this ad, ad number eight. 40 00:01:54,200 --> 00:01:54,800 All right. 41 00:01:54,800 --> 00:01:56,566 So that's the same data set. 42 00:01:56,566 --> 00:02:01,900 And now here is the implementation of Thompson Sampling in the .ipynb format. 43 00:02:01,900 --> 00:02:06,666 So if you're ready, let's open it in either Google Colaboratory or Jupyter Notebook. 44 00:02:06,966 --> 00:02:09,200 And I will show you what we're going to do with it. 
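To picture the data set described above, here is a minimal sketch that simulates a table with the same layout: one row per user, one column per ad, and a 1 wherever that user would click. The function name `simulate_clicks`, the seed, and the conversion rates are hypothetical, chosen only for illustration; they are not the values behind the course data set.

```python
import random


def simulate_clicks(n_users, conversion_rates, seed=0):
    """Return n_users rows; row[i] is 1 if that user would click ad i, else 0."""
    rng = random.Random(seed)  # local generator so the sketch is reproducible
    return [
        [1 if rng.random() < rate else 0 for rate in conversion_rates]
        for _ in range(n_users)
    ]


# Hypothetical conversion rates for ten ads (illustrative values only).
rates = [0.17, 0.13, 0.07, 0.12, 0.27, 0.01, 0.11, 0.21, 0.10, 0.05]
dataset = simulate_clicks(10_000, rates)

print(len(dataset), len(dataset[0]))  # number of users (rows), number of ads (columns)
print(dataset[0])                     # 0/1 click flags of the first simulated user
```

In the real notebook the table is loaded from the file in the Python folder rather than generated, so this sketch only mirrors its shape.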
45 00:02:09,200 --> 00:02:12,433 We won't actually have to re-implement it from scratch 46 00:02:12,433 --> 00:02:17,000 because it is actually very similar to the UCB implementation, 47 00:02:17,233 --> 00:02:19,866 in the sense that we have the same data preprocessing, 48 00:02:19,866 --> 00:02:23,300 the same beginning of the implementation when implementing the model 49 00:02:23,300 --> 00:02:26,400 in the cell, and the same code for the histogram at the end. 50 00:02:26,400 --> 00:02:30,666 So we will actually only re-implement a part of this cell, 51 00:02:30,666 --> 00:02:32,100 you know, where we indeed have 52 00:02:32,100 --> 00:02:35,100 to implement those three steps of Thompson Sampling. 53 00:02:35,133 --> 00:02:35,900 Okay. 54 00:02:35,900 --> 00:02:39,266 But first of all, as usual, this notebook is in read-only mode. 55 00:02:39,266 --> 00:02:40,900 That's because you all have access to it. 56 00:02:40,900 --> 00:02:42,866 We can't modify it, of course. 57 00:02:42,866 --> 00:02:45,766 Therefore, we're going to create a copy here by clicking File 58 00:02:45,766 --> 00:02:48,433 and then Save a copy in Drive. 59 00:02:48,433 --> 00:02:49,800 And this will create a copy. 60 00:02:49,800 --> 00:02:52,800 And that's where we will re-implement part 61 00:02:52,800 --> 00:02:56,266 of this implementation, where we have something different than before 62 00:02:56,266 --> 00:02:57,633 with UCB. 63 00:02:57,633 --> 00:02:57,933 All right. 64 00:02:57,933 --> 00:03:01,633 So, speaking of this, let's see where we have that difference. Okay. 65 00:03:01,633 --> 00:03:04,933 So first of all, you see that we have the exact same structure as before. 66 00:03:04,933 --> 00:03:07,100 You know, we first import the libraries. 67 00:03:07,100 --> 00:03:08,600 Then we import the data set. 68 00:03:08,600 --> 00:03:11,466 Then we implement the Thompson Sampling algorithm. 69 00:03:11,466 --> 00:03:14,400 And finally we visualize the results in the histogram. 
70 00:03:14,400 --> 00:03:17,500 I'm not going to click on it because we will keep the surprise for the end. 71 00:03:17,733 --> 00:03:18,833 But there you go. 72 00:03:18,833 --> 00:03:20,633 That's the exact same structure. 73 00:03:20,633 --> 00:03:24,900 Now, not only is this the exact same structure, but here, this is the same code. 74 00:03:24,966 --> 00:03:25,666 Here as well, 75 00:03:25,666 --> 00:03:26,533 this is the same code. 76 00:03:26,533 --> 00:03:29,533 You know, we import the data set the exact same way. 77 00:03:29,600 --> 00:03:34,100 And here, well, here we import a different library, which is the random library. 78 00:03:34,100 --> 00:03:37,166 And that's because, you know, we will have to work with beta distributions. 79 00:03:37,400 --> 00:03:40,933 You know, when we take a random draw from the beta distributions, 80 00:03:41,133 --> 00:03:45,633 well, we do this with this random library, instead of the math library 81 00:03:45,900 --> 00:03:49,800 in the UCB implementation, which, by the way, I've kept here because, 82 00:03:49,800 --> 00:03:54,400 you know, at the end we will compare the two results of Thompson Sampling and UCB. 83 00:03:54,833 --> 00:03:57,066 So there you go. We import this library. 84 00:03:57,066 --> 00:03:59,333 Then we have the exact same parameters here. 85 00:03:59,333 --> 00:04:00,900 That's the total number of rounds, 86 00:04:00,900 --> 00:04:05,400 or, you know, the total number of users to whom we successively show the ads. 87 00:04:05,733 --> 00:04:07,166 Then this is the number of ads. 88 00:04:07,166 --> 00:04:10,333 You know, we have ten ads among which we want to find the best one, 89 00:04:10,333 --> 00:04:12,633 you know, the one with the highest conversion rate. 90 00:04:12,633 --> 00:04:16,466 And about that, I want to remind you that there is this important assumption 91 00:04:16,466 --> 00:04:20,466 in the data set that each ad has a fixed conversion rate. 
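The random draw from a beta distribution mentioned above can be sketched with `random.betavariate` from Python's standard library. The variable names `N` and `d`, the helper `draw_beta`, and the `+ 1` parameterization of the beta distribution are assumptions based on the usual form of Thompson Sampling for 0/1 rewards; this part of the video does not show them.

```python
import random

N = 10_000  # total number of rounds, i.e. users (name assumed for illustration)
d = 10      # number of ads (name assumed for illustration)


def draw_beta(n_rewards_1, n_rewards_0, rng=random):
    """Draw one sample from Beta(n_rewards_1 + 1, n_rewards_0 + 1).

    Assumption: the counts of 1-rewards and 0-rewards of an ad, each plus one,
    are used as the two shape parameters.
    """
    return rng.betavariate(n_rewards_1 + 1, n_rewards_0 + 1)


rng = random.Random(42)  # seeded generator so the sketch is reproducible

# An ad clicked 80 times out of 100 tends to give higher draws
# than an ad clicked 20 times out of 100.
often_clicked = [draw_beta(80, 20, rng) for _ in range(1000)]
rarely_clicked = [draw_beta(20, 80, rng) for _ in range(1000)]

print(sum(often_clicked) / 1000, sum(rarely_clicked) / 1000)
```

Each draw is a plausible guess at an ad's conversion rate, which is why the random library, rather than the math library, is the one needed here.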
92 00:04:20,633 --> 00:04:23,700 And the goal of our implementation, you know, the goal of what we're doing 93 00:04:23,700 --> 00:04:26,766 here, you know, reinforcement learning with online learning, 94 00:04:26,766 --> 00:04:30,700 well, the goal is to find the ad with the highest conversion rate. 95 00:04:31,066 --> 00:04:31,600 All right. 96 00:04:31,600 --> 00:04:33,400 So, so far, exactly the same. 97 00:04:33,400 --> 00:04:35,266 Just that the library changes. 98 00:04:35,266 --> 00:04:37,133 You know, we're using a different library. 99 00:04:37,133 --> 00:04:38,300 Then here it's the same. 100 00:04:38,300 --> 00:04:40,800 We create this variable to prepare, 101 00:04:40,800 --> 00:04:44,433 you know, the list of all the ads that will be selected over the rounds. 102 00:04:44,566 --> 00:04:46,900 And we initialize that list as an empty list. 103 00:04:46,900 --> 00:04:48,366 So exactly the same. 104 00:04:48,366 --> 00:04:50,633 Now, this is where things change. 105 00:04:50,633 --> 00:04:53,700 So actually we will delete everything from here. 106 00:04:53,700 --> 00:04:55,900 You know, we will delete all of this cell, 107 00:04:55,900 --> 00:04:58,533 you know, all the part of the cell from here. 108 00:04:58,533 --> 00:05:02,133 And we will re-implement all this because all the rest is the same, 109 00:05:02,133 --> 00:05:03,633 and the same goes for this histogram. 110 00:05:03,633 --> 00:05:07,166 You know, this is the exact same code as in the UCB. 111 00:05:07,633 --> 00:05:10,900 Basically everything here is the exact same code as in the UCB, 112 00:05:11,133 --> 00:05:14,733 except for this line where we import a different library, and only what we're 113 00:05:14,733 --> 00:05:18,766 about to implement right now is specific to Thompson Sampling. 114 00:05:18,900 --> 00:05:23,666 So that's why I told you to really study the practical activity of UCB first. 
115 00:05:23,666 --> 00:05:25,200 If you did not do it already, 116 00:05:25,200 --> 00:05:27,000 because indeed we're going to start from here 117 00:05:27,000 --> 00:05:29,600 instead of re-implementing everything we already did. 118 00:05:29,600 --> 00:05:32,300 So there you go, my friends. Now we're ready to start. 119 00:05:32,300 --> 00:05:35,966 So join me in the next tutorial to implement Thompson Sampling. 120 00:05:35,966 --> 00:05:39,500 And then in another tutorial after that, we will of course visualize 121 00:05:39,500 --> 00:05:42,733 the final results and compare them with UCB. 122 00:05:43,000 --> 00:05:44,166 I can't wait to start! 123 00:05:44,166 --> 00:05:47,500 See you in the next tutorial, and until then, enjoy machine learning!
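As a preview of the part of the cell that gets re-implemented, here is a minimal sketch of how the shared pieces (the empty list of selected ads, the loop over the rounds) could fit around the beta draws. It assumes the common Beta-Bernoulli formulation of Thompson Sampling; the three steps on the slide are not reproduced in this transcript, and every name below, as well as the tiny simulated data set and its conversion rates, is hypothetical.

```python
import random


def thompson_sampling(rows, d, seed=0):
    """Run Thompson Sampling over rows of 0/1 rewards (one row per round).

    Assumption: Beta-Bernoulli form, one Beta(ones + 1, zeros + 1) draw per ad.
    """
    rng = random.Random(seed)
    ads_selected = []               # ad chosen at each round
    numbers_of_rewards_1 = [0] * d  # per ad: rounds where it was shown and clicked
    numbers_of_rewards_0 = [0] * d  # per ad: rounds where it was shown and not clicked
    total_reward = 0
    for row in rows:
        # Draw one sample per ad and select the ad with the highest draw.
        draws = [
            rng.betavariate(numbers_of_rewards_1[i] + 1, numbers_of_rewards_0[i] + 1)
            for i in range(d)
        ]
        ad = max(range(d), key=draws.__getitem__)
        ads_selected.append(ad)
        # Only the reward of the selected ad is observed in this round.
        reward = row[ad]
        if reward == 1:
            numbers_of_rewards_1[ad] += 1
        else:
            numbers_of_rewards_0[ad] += 1
        total_reward += reward
    return ads_selected, total_reward


# Tiny simulated example with three ads and hypothetical conversion rates.
data_rng = random.Random(1)
rates = [0.05, 0.10, 0.30]
rows = [[int(data_rng.random() < p) for p in rates] for _ in range(2000)]

ads_selected, total_reward = thompson_sampling(rows, d=len(rates))
counts = [ads_selected.count(i) for i in range(len(rates))]
print(counts, total_reward)  # selections per ad, and total number of clicks collected
```

The selection counts are what the histogram at the end of the notebook visualizes, so in this sketch the ad with the highest simulated rate should dominate them.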