1 00:00:01,266 --> 00:00:04,233 Hello and welcome back to the course on Machine Learning. 2 00:00:04,233 --> 00:00:08,000 I hope you enjoyed the previous tutorials, and now you're quite confident 3 00:00:08,000 --> 00:00:09,600 with the upper confidence bound 4 00:00:09,600 --> 00:00:12,933 and the Thompson sampling algorithm, or at least the intuition behind them. 5 00:00:13,366 --> 00:00:13,800 And today 6 00:00:13,800 --> 00:00:17,633 we're going to quickly compare the two because they do solve the same problem. 7 00:00:17,633 --> 00:00:20,900 They solve the problem of the multi armed bandit. 8 00:00:20,933 --> 00:00:25,433 And let's have a look at some of the pros and cons of each of the algorithms. 9 00:00:25,700 --> 00:00:28,700 So of course there's lots and lots of different 10 00:00:28,800 --> 00:00:31,633 advantages and disadvantages and differences between the two. 11 00:00:31,633 --> 00:00:34,500 But we'll just highlight the main ones. 12 00:00:34,500 --> 00:00:36,400 So here we've got the two algorithms on the left. 13 00:00:36,400 --> 00:00:40,266 We've got the UCB and the image from the intuition tutorial 14 00:00:40,266 --> 00:00:43,733 that will help us to remember what it's what it's about. 15 00:00:43,733 --> 00:00:46,866 And the Thompson sampling algorithm and same the image behind it. 16 00:00:47,666 --> 00:00:51,066 So, the first characteristic is probably 17 00:00:51,600 --> 00:00:55,533 that is different is that, that the UCB is a deterministic algorithm. 18 00:00:55,533 --> 00:00:59,800 And actually there's lots of different, 19 00:00:59,800 --> 00:01:03,833 there's lots of different, modifications to the UCB algorithm. 20 00:01:03,866 --> 00:01:05,100 And you can find them online. 21 00:01:05,100 --> 00:01:06,733 There's lots of different white papers 22 00:01:06,733 --> 00:01:10,200 on how the YouTube algorithm can be modified to improve it and to, 23 00:01:10,533 --> 00:01:14,733 make it better at, one thing or like ads, one advantage made. 24 00:01:14,733 --> 00:01:19,300 It makes this, a bit better a result, but 25 00:01:19,300 --> 00:01:22,466 it makes it more computationally intensive or the other way around and so on. 26 00:01:22,500 --> 00:01:27,633 So, but all of that, all of those algorithms belong to a family, of UCB 27 00:01:27,633 --> 00:01:30,700 algorithms or upper confidence bound algorithms, which are all deterministic. 28 00:01:30,700 --> 00:01:34,500 And basically it what that means is, is that it is very straightforward. 29 00:01:34,500 --> 00:01:36,733 Right? So, once, 30 00:01:38,066 --> 00:01:38,800 you have 31 00:01:38,800 --> 00:01:43,800 a certain round, it is very, it's very straightforward. 32 00:01:43,800 --> 00:01:44,633 What's going to happen. 33 00:01:44,633 --> 00:01:47,866 You just look at the upper confidence boundary, whichever one has the highest. 34 00:01:48,133 --> 00:01:49,500 That's the one you pick. 35 00:01:49,500 --> 00:01:51,966 you pull the algorithm. Yes. 36 00:01:51,966 --> 00:01:53,500 you pull, you pull the lever. Yes. 37 00:01:53,500 --> 00:01:56,933 Then you do get, like a random value from the machine, but that's, 38 00:01:57,266 --> 00:02:01,200 that's that's the it's, that's on the side of the machine. 39 00:02:01,200 --> 00:02:03,633 So that's randomness is on the side of machine. 40 00:02:03,633 --> 00:02:04,433 On the machine. 41 00:02:04,433 --> 00:02:08,766 And then, but then whatever value you get, it is very determined. 42 00:02:08,766 --> 00:02:12,333 It's very deterministic what the UCB is going to do with that value. 43 00:02:12,933 --> 00:02:16,500 and so all of the steps that the UCB actually takes, 44 00:02:16,500 --> 00:02:17,700 they're very deterministic. 45 00:02:17,700 --> 00:02:20,766 They there's no randomness in the algorithm itself. 46 00:02:21,466 --> 00:02:25,333 Whereas on the other hand, the Thompson sampling algorithm is a probabilistic 47 00:02:25,333 --> 00:02:29,366 algorithm because in the algorithm itself it has these distributions 48 00:02:30,033 --> 00:02:33,300 which represent our perception of the world and where 49 00:02:33,300 --> 00:02:39,333 we think the actual expected returns of each of those, machines might lie. 50 00:02:39,533 --> 00:02:43,533 And therefore, every time we are, 51 00:02:43,800 --> 00:02:48,200 implementing or, iterating in the Thompson sampling algorithm, 52 00:02:48,200 --> 00:02:52,566 we actually generate random values from those distributions. So 53 00:02:53,866 --> 00:02:56,866 if you rerun around in the UCB algorithm, 54 00:02:56,900 --> 00:03:00,200 just one given round, after you've received the value from 55 00:03:00,466 --> 00:03:04,000 the previous value from, the machine, and then you rerun 56 00:03:04,000 --> 00:03:06,033 the round, it's always going to be the same result. 57 00:03:06,033 --> 00:03:08,933 Whereas in the Thompson sampling algorithm, after you've received 58 00:03:08,933 --> 00:03:11,933 the previous value from the machine and you rerun the current round, 59 00:03:12,400 --> 00:03:15,400 it's all it's always going to be different because you're always, 60 00:03:15,600 --> 00:03:19,333 sampling from your distributions, 61 00:03:19,500 --> 00:03:22,533 which, characterize your perception of the world. 62 00:03:22,933 --> 00:03:27,200 And, that is a whole different type of algorithm. 63 00:03:27,200 --> 00:03:28,833 It's a probabilistic algorithm. 64 00:03:28,833 --> 00:03:32,566 And those two things, they actually have, a few different, 65 00:03:32,933 --> 00:03:35,866 implications. Right. 66 00:03:35,866 --> 00:03:39,200 So, for instance, one important one is that 67 00:03:39,200 --> 00:03:42,533 the UCB requires an update at every round. 68 00:03:42,533 --> 00:03:42,800 Right? 69 00:03:42,800 --> 00:03:47,933 So, basically the value that you get back from the you, from the machine. 70 00:03:47,933 --> 00:03:50,133 So once you've pulled the lever and you get a value back 71 00:03:50,133 --> 00:03:53,333 from that machine, that value, you have to incorporate it right away 72 00:03:53,533 --> 00:03:55,166 in order to proceed to the next round. 73 00:03:55,166 --> 00:04:00,633 You cannot proceed to the next round until you have, incorporated that value 74 00:04:00,633 --> 00:04:05,300 until you have made an adjustment, to the algorithm based on that value. 75 00:04:06,400 --> 00:04:07,300 because if you don't make the 76 00:04:07,300 --> 00:04:10,866 adjustment, then nothing changes and you're going to be stuck. 77 00:04:11,233 --> 00:04:15,133 Whereas in the Thompson sampling, it can accommodate delayed feedback. 78 00:04:15,133 --> 00:04:16,166 And this is very important. 79 00:04:16,166 --> 00:04:19,566 This basically means that if you pull the lever and you only will get, 80 00:04:19,866 --> 00:04:22,966 the result, or you will only know the result 81 00:04:22,966 --> 00:04:26,700 of pulling the lever, 500 rounds down the track, right? 82 00:04:26,733 --> 00:04:27,366 Not right away. 83 00:04:27,366 --> 00:04:30,466 You'll only get to know the result 500 rounds later. 84 00:04:30,866 --> 00:04:32,733 The Thompson sampling algorithm will still work. 85 00:04:32,733 --> 00:04:33,533 Why will it still work? 86 00:04:33,533 --> 00:04:36,533 Because, if you now run, 87 00:04:36,666 --> 00:04:39,533 the algorithm without even updating your perception of the world, 88 00:04:39,533 --> 00:04:42,666 you still going to get a new, 89 00:04:42,666 --> 00:04:45,666 set of hypothetical, 90 00:04:45,733 --> 00:04:46,633 bandits, right? 91 00:04:46,633 --> 00:04:50,466 You're going to generate a new a new expected return for every band 92 00:04:50,500 --> 00:04:53,566 because you are generating them in a probabilistic manner. 93 00:04:53,800 --> 00:04:57,900 And this is very important to understand because, this gives the Thompson sampling 94 00:04:57,900 --> 00:04:58,700 that advantage, 95 00:04:58,700 --> 00:05:02,433 that you don't have to update the algorithm with the result every time. 96 00:05:02,633 --> 00:05:06,300 And in terms of, I mean, in terms of vendors, of course, it doesn't, 97 00:05:07,000 --> 00:05:10,400 really matter that much because if you're playing in the casino and, 98 00:05:10,966 --> 00:05:14,033 or if some hypothetical person is playing in the casino 99 00:05:14,033 --> 00:05:15,733 and they're pulling these levers, 100 00:05:15,733 --> 00:05:18,566 they get see the results right away so they could update the algorithm. 101 00:05:18,566 --> 00:05:23,100 But in terms of websites and ads, that is a big deal, right? 102 00:05:23,100 --> 00:05:26,300 So not even just, not even just, 103 00:05:27,133 --> 00:05:29,833 displaying ads, on, on a website. 104 00:05:29,833 --> 00:05:34,300 Or you could use this for like AB testing different instead of AB testing. 105 00:05:34,600 --> 00:05:38,100 the different, layouts of your website. 106 00:05:38,100 --> 00:05:38,500 Right. 107 00:05:38,500 --> 00:05:41,633 You could, you could use, a Thompson sampling algorithm to, 108 00:05:41,900 --> 00:05:46,500 have that power, balance between exploitation, exploration. 109 00:05:46,500 --> 00:05:47,100 Right away. 110 00:05:47,100 --> 00:05:50,766 So basically anything you're doing on the web with Thompson 111 00:05:50,766 --> 00:05:55,233 sampling or like, solving a, multi-armed bandit problem 112 00:05:55,633 --> 00:05:58,633 for your business or for a business on on the web. 113 00:05:59,300 --> 00:06:04,366 you you're getting all these thousands and tens of thousands of clicks, right? 114 00:06:04,366 --> 00:06:08,166 And to update the algorithm right away, that would be very 115 00:06:08,200 --> 00:06:12,033 highly computationally, or computationally costly. 116 00:06:12,666 --> 00:06:16,166 or it might require additional resources and, complex process. 117 00:06:16,166 --> 00:06:17,833 Whereas if you, 118 00:06:17,833 --> 00:06:20,400 what is what the Thompson sampling algorithm allows you to do is to 119 00:06:20,400 --> 00:06:24,600 update your data set or your information algorithm in a batch manner. 120 00:06:24,600 --> 00:06:28,566 So you you wait until you get 500 clicks, or you wait until you get 5000 clicks, 121 00:06:28,733 --> 00:06:30,800 you update the algorithm, then you let it run 122 00:06:30,800 --> 00:06:34,033 and then it runs, runs, runs, and then you get another 5000 clicks and then, 123 00:06:34,633 --> 00:06:37,266 you update the algorithm again and it will still work. 124 00:06:37,266 --> 00:06:39,833 And that's a very, important thing. 125 00:06:39,833 --> 00:06:44,033 That's that flexibility that the Thompson sampling algorithm creates. 126 00:06:44,866 --> 00:06:48,166 And, finally, again, we're not going to go into too much detail 127 00:06:48,166 --> 00:06:51,300 on the pros and cons, but, the, 128 00:06:51,900 --> 00:06:54,300 Thompson sampling algorithm is actually 129 00:06:54,300 --> 00:06:56,966 it has better empirical evidence than they used to be. 130 00:06:56,966 --> 00:07:00,766 And, they you will find this phrase better empirical evidence. 131 00:07:00,766 --> 00:07:05,366 And that's because up until recently, the, 132 00:07:05,533 --> 00:07:08,933 the theory behind the Thompson sampling algorithm or, 133 00:07:09,166 --> 00:07:12,000 you know, the whole research wasn't it wasn't complete. 134 00:07:12,000 --> 00:07:17,100 It's only been researched very detail in a lot of detail just a few years ago. 135 00:07:17,400 --> 00:07:20,966 And, you know, now you can find, like, a lot of information on the Thompson 136 00:07:20,966 --> 00:07:23,966 sampling algorithm, but previously people just see that, 137 00:07:24,066 --> 00:07:26,800 from experimental evidence, 138 00:07:26,800 --> 00:07:29,900 the Thompson sampling algorithm does work better than the UCB. 139 00:07:29,900 --> 00:07:32,000 And that's exactly what we're going to see. 140 00:07:32,000 --> 00:07:34,033 spoiler alert. 141 00:07:34,033 --> 00:07:35,666 That's exactly what you're going to see 142 00:07:35,666 --> 00:07:39,700 in the, practical tutorials for this section. 143 00:07:39,700 --> 00:07:40,700 So we're going to be now 144 00:07:40,700 --> 00:07:44,200 coding the Or Hudlin will walk you through coding the same, 145 00:07:44,666 --> 00:07:48,300 exercise, the same problem that we had previously with the UCB. 146 00:07:48,300 --> 00:07:49,200 Now we're going to be 147 00:07:49,200 --> 00:07:52,500 or you're going to be solving it with, the Thompson sampling algorithm. 148 00:07:52,700 --> 00:07:56,100 And you will see actually some very interesting results. 149 00:07:56,333 --> 00:07:57,733 So we'll leave it at that. 150 00:07:57,733 --> 00:08:00,266 And, I hope you enjoyed these intuition tutorials. 151 00:08:00,266 --> 00:08:03,433 And off we go to the practical side of things. 152 00:08:03,600 --> 00:08:04,800 Can't wait for you to get started. 153 00:08:04,800 --> 00:08:08,900 Had lunch will show you all around and I'll see you back here next time. 154 00:08:08,933 --> 00:08:10,733 And until then, enjoy machine learning.