1
00:00:00,233 --> 00:00:02,433
Hello and welcome to this R tutorial.

2
00:00:02,433 --> 00:00:05,633
So in the previous tutorial we implemented Thompson sampling from scratch.

3
00:00:05,866 --> 00:00:08,833
And now it's time for the moment we've all been waiting for.

4
00:00:08,833 --> 00:00:12,000
Let's see if Thompson sampling can beat UCB.

5
00:00:12,500 --> 00:00:15,266
So in fact, we are ready to execute

6
00:00:15,266 --> 00:00:18,933
this code section here and find out the final result.

7
00:00:19,200 --> 00:00:24,833
So let's remember: random selection gave us a total reward of 1200 on average.

8
00:00:25,133 --> 00:00:30,033
The UCB algorithm gave us a total reward of 2178.

9
00:00:30,433 --> 00:00:34,733
And now let's see if Thompson sampling can beat that.

10
00:00:35,166 --> 00:00:39,700
Now let's select everything from here to the top,

11
00:00:39,866 --> 00:00:41,900
because, you know, we haven't imported the dataset yet.

12
00:00:41,900 --> 00:00:45,233
So we will execute everything all at once to immediately

13
00:00:45,233 --> 00:00:48,300
get this final result that we were so excited to find out about.

14
00:00:48,566 --> 00:00:51,833
So, ready? I'm going to press Command + Enter to execute.

15
00:00:52,200 --> 00:00:55,533
And let's see who the big winner is.

16
00:00:56,233 --> 00:00:56,766
Here we go.

17
00:00:56,766 --> 00:01:00,700
And it turns out to be Thompson sampling,

18
00:01:00,700 --> 00:01:05,100
because we got a total reward of 2602.

19
00:01:05,933 --> 00:01:07,733
Now, there is a random factor here,

20
00:01:07,733 --> 00:01:09,633
so let's not claim victory yet.

21
00:01:09,633 --> 00:01:13,700
We are going to execute that again to see the new total reward.

22
00:01:13,700 --> 00:01:16,533
Will we get 2600? Almost.

23
00:01:16,533 --> 00:01:17,433
We can do that again.

24
00:01:17,433 --> 00:01:22,800
And basically it's averaging around 2600.

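
The comparison above can be reproduced in spirit with a short Python sketch (the tutorial itself uses R, and its dataset is not included here). It assumes the standard Bernoulli Thompson sampling setup: each ad keeps a count of clicks and non-clicks, every round draws one sample from Beta(clicks + 1, non-clicks + 1) for each ad, and the ad with the largest draw is shown. The click matrix is simulated from made-up click-through rates (`HYPOTHETICAL_CTRS`, a name introduced only for this sketch), so the totals will not match 2602 or 2178; changing the seed shows the same run-to-run variation seen in the video.

```python
import random

# Hypothetical click-through rates for 10 ads; NOT the tutorial's dataset.
# Index 4 (ad version 5 in 1-based numbering) is made the best ad on purpose.
HYPOTHETICAL_CTRS = [0.05, 0.10, 0.08, 0.03, 0.25, 0.02, 0.10, 0.15, 0.05, 0.04]


def simulate_clicks(ctrs, n_rounds, seed):
    """Row n holds, for each ad, 1 if simulated user n would click it, else 0."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for p in ctrs] for _ in range(n_rounds)]


def thompson_sampling(clicks, seed):
    """Bernoulli Thompson sampling with a Beta(1, 1) prior on each ad."""
    rng = random.Random(seed)
    d = len(clicks[0])
    rewards_1 = [0] * d  # rounds where ad i was shown and clicked
    rewards_0 = [0] * d  # rounds where ad i was shown and not clicked
    ads_selected = []
    total_reward = 0
    for row in clicks:
        # One posterior draw per ad; show the ad with the highest draw.
        draws = [rng.betavariate(rewards_1[i] + 1, rewards_0[i] + 1) for i in range(d)]
        ad = max(range(d), key=draws.__getitem__)
        reward = row[ad]
        if reward:
            rewards_1[ad] += 1
        else:
            rewards_0[ad] += 1
        ads_selected.append(ad)
        total_reward += reward
    return ads_selected, total_reward


def random_selection(clicks, seed):
    """Baseline: show a uniformly random ad each round and sum the clicks."""
    rng = random.Random(seed)
    return sum(row[rng.randrange(len(row))] for row in clicks)


if __name__ == "__main__":
    clicks = simulate_clicks(HYPOTHETICAL_CTRS, 10_000, seed=0)
    for seed in (1, 2, 3):
        _, ts_total = thompson_sampling(clicks, seed)
        print(f"seed {seed}: Thompson {ts_total}, random {random_selection(clicks, seed)}")
```

Keeping separate random generators for the simulated users and for the algorithm means the same users can be replayed while only the algorithm's own randomness changes between runs, which isolates the "random factor" discussed above.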
25
00:01:23,133 --> 00:01:27,633
So yes, it's definitely beating the Upper Confidence Bound algorithm.

26
00:01:27,966 --> 00:01:30,600
And by the way, remember that with the UCB algorithm

27
00:01:30,600 --> 00:01:34,233
we almost doubled the total reward of the random selection algorithm.

28
00:01:34,300 --> 00:01:37,833
But now with Thompson sampling we're not only beating the UCB algorithm,

29
00:01:38,033 --> 00:01:41,966
but we are also doing better than doubling the random selection total reward,

30
00:01:42,266 --> 00:01:46,466
because we get this 2600 total reward on average,

31
00:01:46,633 --> 00:01:49,466
which is more than double 1200,

32
00:01:49,466 --> 00:01:52,866
the average total reward of the random selection algorithm.

33
00:01:53,366 --> 00:01:54,100
So great.

34
00:01:54,100 --> 00:01:56,600
Definitely, Thompson sampling is the big winner.

35
00:01:56,600 --> 00:01:59,266
And now we have one last thing to check.

36
00:01:59,266 --> 00:02:01,866
You know, remember we need to check that Thompson sampling

37
00:02:01,866 --> 00:02:05,333
also gives us the best ad, the one with the highest conversion rate,

38
00:02:05,633 --> 00:02:08,633
you know, the one the users of the social network would click on the most.

39
00:02:09,033 --> 00:02:12,766
And so we need to make sure that it's also the ad version

40
00:02:12,766 --> 00:02:16,233
number five, which was the ad version found by the UCB algorithm.

41
00:02:16,600 --> 00:02:21,366
And to check that out very efficiently, we can select this code section here

42
00:02:21,766 --> 00:02:25,000
and execute it to look at the histogram.

43
00:02:25,033 --> 00:02:26,200
And here we go.

44
00:02:26,200 --> 00:02:30,833
Again, the ad version that was selected the most is ad version number five.

45
00:02:31,233 --> 00:02:35,400
And by the way, in UCB we had some higher bars here, if I remember correctly.

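
The histogram check boils down to counting how often each ad was shown and picking the most frequent one. Below is a minimal Python sketch that prints a text histogram instead of drawing R's plot; the `ads_selected` list is a small hypothetical log using 0-based indices, not output from the tutorial, and the helper names are introduced only for this illustration.

```python
from collections import Counter


def selection_counts(ads_selected, n_ads):
    """Number of rounds in which each ad (0-based index) was shown."""
    counts = Counter(ads_selected)
    return [counts.get(i, 0) for i in range(n_ads)]


def text_histogram(counts, width=40):
    """One row per ad, labelled with 1-based ad versions, bars scaled to the peak."""
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        bar = "#" * round(width * c / peak) if peak else ""
        lines.append(f"Ad {i + 1:>2} | {bar} {c}")
    return "\n".join(lines)


# Hypothetical selection log (0-based ad indices), not the tutorial's result.
ads_selected = [4, 4, 7, 4, 1, 4, 4, 7, 4, 4]
counts = selection_counts(ads_selected, n_ads=10)
best = counts.index(max(counts))
print(text_histogram(counts))
print(f"Most selected: ad version {best + 1}")
```

Note the off-by-one between Python's 0-based indices and the 1-based "ad version number five" used in the video: index 4 is ad version 5.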
46
00:02:35,700 --> 00:02:38,700
But here with Thompson sampling we can clearly see that

47
00:02:38,733 --> 00:02:42,800
it was ad version number five here that was selected the most.

48
00:02:43,100 --> 00:02:45,966
You know, this bar here corresponding to ad version number

49
00:02:45,966 --> 00:02:48,966
five is clearly dominating the other bars.

50
00:02:49,233 --> 00:02:50,000
And that's because

51
00:02:50,000 --> 00:02:53,866
Thompson sampling quickly figured out which ad is the best to select.

52
00:02:53,900 --> 00:02:54,466
That is,

53
00:02:54,466 --> 00:02:57,533
it quickly figured out which ad has the best click-through rate.

54
00:02:57,666 --> 00:03:01,033
And so now we can congratulate ourselves, because we clearly solved

55
00:03:01,033 --> 00:03:04,500
this click-through rate optimization problem very efficiently.

56
00:03:04,900 --> 00:03:08,733
And the best algorithm that we found for this is Thompson sampling.

57
00:03:09,466 --> 00:03:09,866
All right.

58
00:03:09,866 --> 00:03:13,200
So congratulations on having implemented these two

59
00:03:13,200 --> 00:03:16,200
algorithms, UCB and Thompson sampling.

60
00:03:16,300 --> 00:03:17,733
That's the end of this section.

61
00:03:17,733 --> 00:03:20,733
And that's also the end of this part, Reinforcement Learning.

62
00:03:20,766 --> 00:03:23,066
So I look forward to seeing you in the next part,

63
00:03:23,066 --> 00:03:25,366
Natural Language Processing.

64
00:03:25,366 --> 00:03:26,900
Until then, enjoy machine learning.
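
One way to see why Thompson sampling settles on the best ad quickly, assuming the usual Beta(clicks + 1, non-clicks + 1) posterior per ad: for Beta(a, b) the mean is a / (a + b) and the variance is ab / ((a + b)^2 (a + b + 1)), so an ad shown many times gets a narrow posterior while a rarely shown ad keeps a wide one. The wide posterior still produces occasional draws that beat the leader, which is how the algorithm keeps exploring a little. The counts below are hypothetical, and `prob_draw_wins` is a helper introduced only for this sketch.

```python
import math
import random


def beta_summary(clicks, non_clicks):
    """Mean and standard deviation of the Beta(clicks + 1, non_clicks + 1) posterior."""
    a, b = clicks + 1, non_clicks + 1
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


def prob_draw_wins(challenger, leader, n_samples=20_000, seed=0):
    """Monte Carlo estimate of how often a draw from the challenger's posterior
    exceeds a draw from the leader's. Each argument is (clicks, non_clicks)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_samples):
        challenger_draw = rng.betavariate(challenger[0] + 1, challenger[1] + 1)
        leader_draw = rng.betavariate(leader[0] + 1, leader[1] + 1)
        wins += challenger_draw > leader_draw
    return wins / n_samples


# Hypothetical counts: a well-explored leading ad vs. a barely explored one.
leader = (250, 750)   # shown 1000 times
challenger = (1, 9)   # shown 10 times
print(beta_summary(*leader))      # narrow posterior, mean near 0.25
print(beta_summary(*challenger))  # wide posterior, mean near 0.17
print(prob_draw_wins(challenger, leader))
```

With ten ads the leader has to beat every challenger's draw in the same round, so this two-ad estimate is only an illustration of the mechanism, not the selection rate you would observe in the full run.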