1 00:00:00,166 --> 00:00:00,966 Okay, my friends. 2 00:00:00,966 --> 00:00:01,266 Are you 3 00:00:01,266 --> 00:00:02,700 ready to now implement 4 00:00:02,700 --> 00:00:04,733 step three of this upper 5 00:00:04,733 --> 00:00:06,366 confidence bound implementation? 6 00:00:06,366 --> 00:00:09,300 We already implemented step two, and we even got 7 00:00:09,300 --> 00:00:11,533 these values for each of the ads. 8 00:00:11,533 --> 00:00:13,833 And now we need to use a trick to 9 00:00:13,833 --> 00:00:15,900 find the maximum of them. 10 00:00:15,900 --> 00:00:16,400 All right. 11 00:00:16,400 --> 00:00:17,333 So in the 12 00:00:17,333 --> 00:00:21,133 last tutorial we left things at this else, you know, else 13 00:00:21,166 --> 00:00:24,000 meaning if the numbers of selections of 14 00:00:24,000 --> 00:00:25,966 this ad number i we're dealing with right 15 00:00:25,966 --> 00:00:28,600 now in the second for loop is equal to zero, 16 00:00:28,600 --> 00:00:33,066 meaning if the ad was not selected yet. Well, in that case, 17 00:00:33,066 --> 00:00:36,866 what we have to do is absolutely select it, right? 18 00:00:36,866 --> 00:00:40,766 We must select it. If the ad has not been selected yet, we have to select it. 19 00:00:41,033 --> 00:00:41,966 Why is that? 20 00:00:41,966 --> 00:00:45,300 That's because, as we explained at the end of the previous tutorial, 21 00:00:45,366 --> 00:00:49,833 we need to make sure that the denominator N_i(n), which is the number of times 22 00:00:49,833 --> 00:00:52,300 ad i was selected, is different 23 00:00:52,300 --> 00:00:54,600 from zero, so that we can indeed compute 24 00:00:54,600 --> 00:00:55,666 the average reward 25 00:00:55,666 --> 00:00:58,300 and then the upper confidence bound. 26 00:00:58,300 --> 00:00:59,400 So there is a priority 27 00:00:59,400 --> 00:01:02,266 now, you know, at the beginning, you know, in the first rounds, 28 00:01:02,266 --> 00:01:04,500 to select all the ads.
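For reference, the quantities just mentioned can be written out. This is only a sketch under assumptions: N_i(n) is the number of times ad i was selected up to round n, as described above, while the symbol R_i(n) for the sum of rewards of ad i and the 3/2 factor inside the square root are not stated in this transcript and are assumed from the usual form of the algorithm.

```latex
\bar{r}_i(n) = \frac{R_i(n)}{N_i(n)}, \qquad
\Delta_i(n) = \sqrt{\frac{3}{2}\,\frac{\log n}{N_i(n)}}, \qquad
\mathrm{UCB}_i(n) = \bar{r}_i(n) + \Delta_i(n)
```

Both fractions divide by N_i(n), so neither the average reward nor the upper confidence bound is defined while ad i has zero selections.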
29 00:01:04,500 --> 00:01:08,566 And therefore the trick that we have to use is to set 30 00:01:08,733 --> 00:01:11,866 the upper bound of this particular ad we're dealing with 31 00:01:11,866 --> 00:01:12,933 right now in the second for 32 00:01:12,933 --> 00:01:16,466 loop to a super high value, so that it will be 33 00:01:16,466 --> 00:01:19,466 indeed the maximum upper bound, and therefore the one that will be 34 00:01:19,466 --> 00:01:20,966 selected, because at the end we 35 00:01:20,966 --> 00:01:22,900 will of course select the ad with the highest 36 00:01:22,900 --> 00:01:24,300 upper confidence bound. 37 00:01:24,300 --> 00:01:25,766 So the trick right now 38 00:01:25,766 --> 00:01:30,000 is to take that variable again, which so far, thanks to this 39 00:01:30,000 --> 00:01:31,966 if condition, is equal to this value. 40 00:01:31,966 --> 00:01:34,866 But if we're not in this if condition and if we are in this 41 00:01:34,866 --> 00:01:37,800 else, well, we would like that same variable 42 00:01:37,800 --> 00:01:40,566 to be equal to a super high value, 43 00:01:40,566 --> 00:01:42,133 like, you know, infinity. 44 00:01:42,133 --> 00:01:44,366 But we can't set it equal to infinity. 45 00:01:44,366 --> 00:01:45,100 However, we can 46 00:01:45,100 --> 00:01:47,333 set it equal to a super high value like 47 00:01:47,333 --> 00:01:49,100 one times ten to the 48 00:01:49,100 --> 00:01:51,000 power of 400. 49 00:01:51,000 --> 00:01:51,633 That's a classic 50 00:01:51,633 --> 00:01:54,633 trick in Python to, you know, use infinity. 51 00:01:54,766 --> 00:01:57,100 This is a super high value, which will 52 00:01:57,100 --> 00:01:59,100 definitely be the maximum of the upper 53 00:01:59,100 --> 00:02:03,900 bounds, so that if that ad was not selected yet, well, we will select it because 54 00:02:03,900 --> 00:02:06,966 indeed it will have the maximum of the upper confidence bounds. 55 00:02:07,433 --> 00:02:09,100 And now we need to finish
56 00:02:09,100 --> 00:02:12,266 with a final if condition to make sure 57 00:02:12,300 --> 00:02:14,566 that indeed we select the ad with the highest 58 00:02:14,566 --> 00:02:16,100 upper confidence bound. 59 00:02:16,100 --> 00:02:19,100 And the trick to do that is to play with 60 00:02:19,200 --> 00:02:20,700 max upper bound here, which 61 00:02:20,700 --> 00:02:22,133 so far is initialized to zero, 62 00:02:22,133 --> 00:02:25,066 you know, before we start this second loop, and 63 00:02:25,066 --> 00:02:28,200 that upper bound. And the if condition that 64 00:02:28,200 --> 00:02:33,133 we have to add here is to check if, well, the upper 65 00:02:33,133 --> 00:02:35,433 bound of the ad 66 00:02:35,433 --> 00:02:38,966 we're dealing with right now in the second for loop, and which we've just computed 67 00:02:39,133 --> 00:02:40,500 either through this first if 68 00:02:40,500 --> 00:02:42,433 condition or in this else, 69 00:02:42,433 --> 00:02:45,100 we have to check if that upper bound 70 00:02:45,100 --> 00:02:46,600 is larger 71 00:02:46,600 --> 00:02:49,600 than the maximum upper bound. 72 00:02:49,800 --> 00:02:50,300 Right. 73 00:02:50,300 --> 00:02:52,633 Because, okay, at the beginning, you know, before 74 00:02:52,633 --> 00:02:55,200 we start this for loop, max upper bound is equal to zero. 75 00:02:55,200 --> 00:02:56,600 Then we will compute 76 00:02:56,600 --> 00:03:00,866 this for the first ad. We will get either this value, if this ad had 77 00:03:00,866 --> 00:03:03,466 already been selected, or we will get this value. 78 00:03:03,466 --> 00:03:05,600 And of course, since this or this will be larger 79 00:03:05,600 --> 00:03:06,800 than zero, 80 00:03:06,800 --> 00:03:09,133 then this max upper bound will 81 00:03:09,133 --> 00:03:12,100 be updated, right? And this is exactly our next step here. 82 00:03:12,100 --> 00:03:15,466 If indeed upper bound is larger than this maximum upper bound,
83 00:03:15,733 --> 00:03:17,866 well, we need to update 84 00:03:17,866 --> 00:03:20,666 that max upper bound 85 00:03:20,666 --> 00:03:21,500 value, 86 00:03:21,500 --> 00:03:23,766 there we go, to be equal to 87 00:03:23,766 --> 00:03:26,266 that new upper bound 88 00:03:26,266 --> 00:03:28,600 just computed, either through this if 89 00:03:28,600 --> 00:03:30,233 condition or this else. 90 00:03:30,233 --> 00:03:30,833 All right. 91 00:03:30,833 --> 00:03:32,533 So upper bound, okay. 92 00:03:32,533 --> 00:03:34,966 And then in the next step of this second for 93 00:03:34,966 --> 00:03:35,700 loop, well, 94 00:03:35,700 --> 00:03:38,100 we will compute a new value of the upper 95 00:03:38,100 --> 00:03:40,800 bound here, if, you know, the ad was already selected. 96 00:03:40,800 --> 00:03:42,900 And if this new value of the upper bound 97 00:03:42,900 --> 00:03:43,833 is larger 98 00:03:43,833 --> 00:03:46,466 than this new max upper bound, which 99 00:03:46,466 --> 00:03:49,800 was just updated to the previous upper bound of the previous ad, 100 00:03:49,966 --> 00:03:51,800 well, we will update it again. 101 00:03:51,800 --> 00:03:54,400 You see the trick? Each time, 102 00:03:54,400 --> 00:03:58,266 you know, at each iteration of this second loop, we compute a new upper bound. 103 00:03:58,500 --> 00:04:00,700 We compare that upper bound with 104 00:04:00,700 --> 00:04:02,133 the largest upper 105 00:04:02,133 --> 00:04:04,466 bound collected so far, you know, with the previous ads. 106 00:04:04,466 --> 00:04:08,333 And if that new upper bound is larger than this maximum upper bound, well, we 107 00:04:08,333 --> 00:04:10,833 update the new maximum upper bound. 108 00:04:10,833 --> 00:04:13,566 And of course, for the ads that have not been selected 109 00:04:13,566 --> 00:04:16,100 yet, this upper bound will always be 110 00:04:16,100 --> 00:04:17,533 larger than the max upper bound.
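The second loop described so far can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact code on screen: the function name `select_ad`, the parameter names `numbers_of_selections`, `sums_of_rewards` and `n`, and the 3/2 factor in the confidence term are all assumptions made for illustration. It also records the index of the winning ad, since the comparison is only useful if the index is kept.

```python
import math


def select_ad(numbers_of_selections, sums_of_rewards, n):
    """Return the index of the ad with the highest upper confidence bound.

    All names are hypothetical. n is the current round, counted from 1.
    """
    ad = 0
    max_upper_bound = 0
    for i in range(len(numbers_of_selections)):
        if numbers_of_selections[i] > 0:
            # The ad was already selected: the denominator is non-zero.
            average_reward = sums_of_rewards[i] / numbers_of_selections[i]
            # Assumed confidence term: sqrt(3/2 * log(n) / N_i(n)).
            delta_i = math.sqrt(3 / 2 * math.log(n) / numbers_of_selections[i])
            upper_bound = average_reward + delta_i
        else:
            # The ad was never selected: force it to win the comparison.
            # In Python the literal 1e400 overflows to float('inf').
            upper_bound = 1e400
        if upper_bound > max_upper_bound:
            # Keep the largest bound seen so far and remember its index.
            max_upper_bound = upper_bound
            ad = i
    return ad
```

Two details are worth knowing. The literal `1e400` is too large for a Python float, so it evaluates to infinity and compares equal to `float('inf')`. And because the comparison is strict, when several ads are still unselected the first of them is picked, since infinity is not larger than infinity.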
111 00:04:17,533 --> 00:04:20,100 And therefore this ad will be selected. 112 00:04:20,100 --> 00:04:24,433 And speaking of "this ad will be selected", well, that's exactly the final step 113 00:04:24,433 --> 00:04:25,500 we have to do here. 114 00:04:25,500 --> 00:04:27,300 We have to select the ad. 115 00:04:27,300 --> 00:04:28,700 And in order to select it, 116 00:04:28,700 --> 00:04:32,866 well, we need to update that variable here, ad equals zero, to, 117 00:04:33,000 --> 00:04:35,766 of course, i, you know, the index 118 00:04:35,766 --> 00:04:37,800 i of that ad 119 00:04:37,800 --> 00:04:40,500 we're dealing with right now in the second for loop. 120 00:04:40,500 --> 00:04:41,766 And that, my friends, 121 00:04:41,766 --> 00:04:42,333 that's how 122 00:04:42,333 --> 00:04:43,900 you implement step three, 123 00:04:43,900 --> 00:04:46,000 by making sure at the same time 124 00:04:46,000 --> 00:04:47,433 that you select the ad 125 00:04:47,433 --> 00:04:49,533 that was not selected yet. 126 00:04:49,533 --> 00:04:51,800 And at some point, you know, after a couple of rounds, well, all 127 00:04:51,800 --> 00:04:52,833 the ads will be selected. 128 00:04:52,833 --> 00:04:55,366 Well, you know, actually, after the first ten rounds, all 129 00:04:55,366 --> 00:04:57,366 the ads will automatically be selected. 130 00:04:57,366 --> 00:04:59,866 And then we will only be in this condition. 131 00:04:59,866 --> 00:05:02,866 You know, this else condition will never happen after the first ten rounds. 132 00:05:02,900 --> 00:05:05,766 Okay. So there you go. Now congratulations. 133 00:05:05,766 --> 00:05:07,800 Step three is implemented. 134 00:05:07,800 --> 00:05:08,866 We selected the ad that 135 00:05:08,866 --> 00:05:11,000 has the maximum upper confidence bound. 136 00:05:11,000 --> 00:05:13,533 And now what we simply need to do is just 137 00:05:13,533 --> 00:05:14,933 finish this 138 00:05:14,933 --> 00:05:16,200 main code here, you know, this
139 00:05:16,200 --> 00:05:19,200 cell, by going back to this first 140 00:05:19,533 --> 00:05:20,166 for loop, 141 00:05:20,166 --> 00:05:22,066 you know, iterating through the rounds, you know, 142 00:05:22,066 --> 00:05:24,666 through the users who connect to the website. 143 00:05:24,666 --> 00:05:25,200 And, well, the 144 00:05:25,200 --> 00:05:29,700 way to do this is just to update each of these variables, which we created at the 145 00:05:29,700 --> 00:05:31,666 beginning, you know, before starting the for loop, 146 00:05:31,666 --> 00:05:33,200 to indeed get, you know, that 147 00:05:33,200 --> 00:05:35,233 full list of all the ads selected 148 00:05:35,233 --> 00:05:36,366 over the rounds. 149 00:05:36,366 --> 00:05:37,033 Then of course 150 00:05:37,033 --> 00:05:39,166 update this variable to update 151 00:05:39,166 --> 00:05:41,600 the number of selections for each of the ads. 152 00:05:41,600 --> 00:05:44,900 Then of course update this one, which gives the accumulated rewards 153 00:05:44,900 --> 00:05:46,066 for each of the ads. 154 00:05:46,066 --> 00:05:50,400 And finally update the total accumulated reward over the rounds. 155 00:05:50,600 --> 00:05:52,633 All right, so I will let you do this. 156 00:05:52,633 --> 00:05:53,933 Please try to do it yourself 157 00:05:53,933 --> 00:05:56,900 before the next tutorial, and we will implement this solution 158 00:05:56,900 --> 00:06:01,500 together in the next tutorial, which will at the same time finish and complete 159 00:06:01,500 --> 00:06:02,666 that cell 160 00:06:02,666 --> 00:06:05,666 implementing the upper confidence bound algorithm. 161 00:06:06,033 --> 00:06:06,433 All right. 162 00:06:06,433 --> 00:06:07,033 So good luck 163 00:06:07,033 --> 00:06:10,166 again, and I'll see you in the next tutorial for the solution. 164 00:06:10,300 --> 00:06:12,166 And until then, enjoy machine learning.