1 00:00:00,100 --> 00:00:03,166 All right, my friends, let's finish this implementation. 2 00:00:03,166 --> 00:00:04,800 Indeed, we're very close to the end. 3 00:00:04,800 --> 00:00:08,133 We have already implemented step one, step two, and step three, 4 00:00:08,133 --> 00:00:11,433 and now we just need to finish this by, you know, updating 5 00:00:11,433 --> 00:00:13,433 these variables that were created and 6 00:00:13,433 --> 00:00:16,666 initialized right before this big for loop, iterating 7 00:00:16,666 --> 00:00:18,966 through the rounds, you know, through the users. 8 00:00:18,966 --> 00:00:21,066 All right. So let's start simply with this one, 9 00:00:21,066 --> 00:00:26,200 ads selected, which is the full list of all the ads selected over the rounds. 10 00:00:26,500 --> 00:00:28,166 So let's paste it here. 11 00:00:28,166 --> 00:00:29,933 You know, here I'm back inside 12 00:00:29,933 --> 00:00:32,366 this first for loop iterating through rounds. 13 00:00:32,366 --> 00:00:33,133 And right now, 14 00:00:33,133 --> 00:00:36,833 how do you think we must update this ads selected variable? 15 00:00:37,133 --> 00:00:42,633 Well, simply, of course, we have to add to this list the ad that was just selected, 16 00:00:42,633 --> 00:00:44,566 you know, right here, in this 17 00:00:44,566 --> 00:00:45,800 second for loop. 18 00:00:45,800 --> 00:00:46,300 All right. 19 00:00:46,300 --> 00:00:48,733 And, you know, the trick there is this append 20 00:00:48,733 --> 00:00:49,900 function in Python that 21 00:00:49,900 --> 00:00:52,466 allows you to add an element to a list. 22 00:00:52,466 --> 00:00:54,300 And, well, that's exactly what we're going to use. 23 00:00:54,300 --> 00:00:56,066 Now we're going to add a dot here, 24 00:00:56,066 --> 00:00:59,066 then add append for the append function. 25 00:00:59,266 --> 00:01:02,733 And inside this function you have to input the element which you want to add 26 00:01:02,733 --> 00:01:04,700 to this ads selected list. 
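A minimal sketch of the append step just described. The names ads_selected and ad are inferred from the narration, and the value 3 is only a placeholder for whichever ad the selection loop picked:

```python
# Hypothetical names: ads_selected and ad are inferred from the narration.
ads_selected = []         # created and initialized before the loop over rounds
ad = 3                    # placeholder: index of the ad selected in this round
ads_selected.append(ad)   # list.append adds one element to the end of the list
```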
Right. 27 00:01:04,700 --> 00:01:05,066 And that's, of 28 00:01:05,066 --> 00:01:10,633 course, ad, you know, that ad variable that was computed here in this second for loop. 29 00:01:11,166 --> 00:01:11,466 All right. 30 00:01:11,466 --> 00:01:14,633 So that's how you update this first variable, ads selected. 31 00:01:15,033 --> 00:01:16,400 Then, next step. 32 00:01:16,400 --> 00:01:19,133 Well, next step is simply to update this variable 33 00:01:19,133 --> 00:01:20,300 now, numbers 34 00:01:20,300 --> 00:01:22,700 of selections, which is a list 35 00:01:22,700 --> 00:01:25,500 of ten values corresponding to the ten ads. 36 00:01:25,500 --> 00:01:27,300 And for each of these values, well, you have 37 00:01:27,300 --> 00:01:28,866 the number of times the 38 00:01:28,866 --> 00:01:31,233 ad was selected up to 39 00:01:31,233 --> 00:01:33,166 round n, that same round 40 00:01:33,166 --> 00:01:35,900 n we're dealing with right now in this first for loop. 41 00:01:35,900 --> 00:01:38,900 All right. So let's paste it right here. 42 00:01:38,900 --> 00:01:39,900 And now, according to you, 43 00:01:39,900 --> 00:01:42,300 how are we going to update this variable? 44 00:01:43,300 --> 00:01:45,100 Well, since this ad, 45 00:01:45,100 --> 00:01:46,766 you know, was just selected 46 00:01:46,766 --> 00:01:49,300 in this new round n, well, therefore we need to 47 00:01:49,300 --> 00:01:52,466 update the element of index ad, 48 00:01:52,666 --> 00:01:55,233 basically the index of the ad that was just selected. 49 00:01:55,233 --> 00:01:57,166 Inside this numbers of selections list, 50 00:01:57,166 --> 00:01:57,766 we need to 51 00:01:57,766 --> 00:01:58,533 update that 52 00:01:58,533 --> 00:02:01,100 element by incrementing it by one. 53 00:02:01,100 --> 00:02:03,300 All right. So here, not only do we need to take this 54 00:02:03,300 --> 00:02:06,700 list, numbers of selections, but we need to take mostly the 
55 00:02:07,100 --> 00:02:10,166 element of index ad, meaning the index of the 56 00:02:10,166 --> 00:02:11,966 ad that was just selected. 57 00:02:11,966 --> 00:02:14,133 And then you have two ways to do this. 58 00:02:14,133 --> 00:02:17,233 The first, classic way is doing a plus equals 59 00:02:17,533 --> 00:02:20,566 one, which increments this particular number, 60 00:02:20,566 --> 00:02:24,833 you know, the element of index ad inside this numbers of selections list, by one. 61 00:02:25,033 --> 00:02:27,500 Or, you know, if you don't like this notation, you can 62 00:02:27,500 --> 00:02:29,100 simply do an equals, 63 00:02:29,100 --> 00:02:34,100 then you copy this again, you paste it 64 00:02:34,100 --> 00:02:35,933 here, and you just add plus 65 00:02:35,933 --> 00:02:37,300 one, as you want. 66 00:02:37,300 --> 00:02:39,533 That's exactly the same; it depends on how 67 00:02:39,533 --> 00:02:41,033 you prefer to see it. 68 00:02:41,033 --> 00:02:43,766 Okay, then let's update this variable, 69 00:02:43,766 --> 00:02:44,433 sums 70 00:02:44,433 --> 00:02:48,700 of rewards, which is the accumulated reward for each of the ads, 71 00:02:48,700 --> 00:02:51,700 in the same kind of list of ten elements corresponding to the ads. 72 00:02:51,933 --> 00:02:54,433 So I'm copying this, 73 00:02:54,433 --> 00:02:56,400 then I'm pasting it here. 74 00:02:56,400 --> 00:02:58,533 Then, of course, what we need to change inside this 75 00:02:58,533 --> 00:03:01,366 list is that element of index ad, 76 00:03:01,366 --> 00:03:02,533 you know, that index of 77 00:03:02,533 --> 00:03:04,666 the ad that was just selected. 78 00:03:04,666 --> 00:03:05,333 So there we go. 79 00:03:05,333 --> 00:03:08,166 Let's add here, in a pair of square brackets, 80 00:03:08,166 --> 00:03:11,100 ad, the index of the ad that was just selected. 81 00:03:11,100 --> 00:03:14,100 And then, same, we're going to update it again. 82 00:03:14,166 --> 00:03:16,333 You know, we're going to take that. 
83 00:03:16,333 --> 00:03:19,766 And then, according to you, what do we have to add to this 84 00:03:19,766 --> 00:03:20,733 sums of rewards, 85 00:03:20,733 --> 00:03:22,466 and especially to this element of index 86 00:03:22,466 --> 00:03:24,966 ad inside this sums of rewards list? 87 00:03:24,966 --> 00:03:30,900 Well, that's exactly the reward that we got by selecting this ad, 88 00:03:30,900 --> 00:03:33,733 you know, this ad which had the maximum upper bound. 89 00:03:33,733 --> 00:03:35,333 And where do we have these rewards, 90 00:03:35,333 --> 00:03:36,533 you know, for each of the ads? 91 00:03:36,533 --> 00:03:39,333 Well, that's of course in our dataset, right? 92 00:03:39,333 --> 00:03:43,100 This dataset is a simulation telling us, for each user, 93 00:03:43,200 --> 00:03:45,000 on which ad they're going to click. 94 00:03:45,000 --> 00:03:46,500 You know, we don't know that in reality, 95 00:03:46,500 --> 00:03:48,800 but this dataset is a simulation, 96 00:03:48,800 --> 00:03:50,733 and therefore we have, for each user 97 00:03:50,733 --> 00:03:53,333 and for each ad, whether the user clicks, yes 98 00:03:53,333 --> 00:03:54,733 or no, on the ad. 99 00:03:54,733 --> 00:03:56,866 And since now we know 100 00:03:56,866 --> 00:03:59,300 which user we're dealing with, thanks to this, 101 00:03:59,300 --> 00:04:01,700 you know, round n, because the round here corresponds 102 00:04:01,700 --> 00:04:03,000 to the user, 103 00:04:03,000 --> 00:04:04,466 and since we also know which 104 00:04:04,466 --> 00:04:05,933 ad was selected, 105 00:04:05,933 --> 00:04:06,400 you know, 106 00:04:06,400 --> 00:04:10,466 because of this second for loop, well, we can directly access 107 00:04:10,700 --> 00:04:13,933 the reward that was just received at this particular round 108 00:04:13,933 --> 00:04:16,733 and for this particular ad that was selected. 109 00:04:16,733 --> 00:04:20,600 And the way to do this is simply to take here our dataset. 
110 00:04:21,000 --> 00:04:23,000 Right, because this 111 00:04:23,000 --> 00:04:24,033 is the original dataset. 112 00:04:24,033 --> 00:04:25,533 But remember that we created the 113 00:04:25,533 --> 00:04:28,200 pandas DataFrame in this dataset variable. 114 00:04:28,200 --> 00:04:30,200 So we need to take our dataset. 115 00:04:30,200 --> 00:04:31,066 And then, remember, we need 116 00:04:31,066 --> 00:04:35,533 to add a .values in order to access a particular value of this dataset. 117 00:04:35,900 --> 00:04:39,200 And then we need to enter, inside a pair of square brackets, 118 00:04:39,433 --> 00:04:40,633 well, first, 119 00:04:40,633 --> 00:04:41,566 the index of the 120 00:04:41,566 --> 00:04:44,500 row of the cell we want to access, which is of course 121 00:04:44,500 --> 00:04:47,933 n, because n corresponds to the user, meaning the row. 122 00:04:48,166 --> 00:04:50,533 And then we need to enter the index of 123 00:04:50,533 --> 00:04:52,566 the column of the cell we're dealing with, 124 00:04:52,566 --> 00:04:55,433 which is of course the ad that was selected, 125 00:04:55,433 --> 00:04:57,000 because now we need to get 126 00:04:57,000 --> 00:05:02,200 the reward of the ad that we selected for the particular user we're dealing 127 00:05:02,200 --> 00:05:04,133 with right now in this first for loop. 128 00:05:04,133 --> 00:05:04,500 And that's 129 00:05:04,500 --> 00:05:06,700 exactly dataset.values 130 00:05:06,700 --> 00:05:08,366 [n, ad]. Okay. 131 00:05:08,366 --> 00:05:10,333 And now I'm just going to do something so that 132 00:05:10,333 --> 00:05:14,666 it's really clear to you that this is the reward we got at this 133 00:05:14,666 --> 00:05:15,366 particular round 134 00:05:15,366 --> 00:05:18,766 n. And in order to do this, I'm going to take that here, 135 00:05:19,100 --> 00:05:22,000 cut it, and right 
136 00:05:22,000 --> 00:05:25,700 between these two lines of code, I'm going to create a new variable 137 00:05:25,700 --> 00:05:27,600 which I'm going to call reward, 138 00:05:27,600 --> 00:05:28,633 you know, so that I can really 139 00:05:28,633 --> 00:05:31,500 highlight that this is the reward, 140 00:05:31,500 --> 00:05:33,700 you know, the reward collected after 141 00:05:33,700 --> 00:05:34,400 showing this 142 00:05:34,400 --> 00:05:37,400 ad to this user n. This is the reward. 143 00:05:37,400 --> 00:05:39,933 And therefore here I'm just going to add reward. 144 00:05:39,933 --> 00:05:41,700 All right, so that you can clearly see that 145 00:05:41,700 --> 00:05:44,133 very important concept in reinforcement learning. 146 00:05:44,133 --> 00:05:44,400 You know, 147 00:05:44,400 --> 00:05:47,400 it's all about the reward, the reward that is collected at 148 00:05:47,400 --> 00:05:49,900 each round, and then the accumulated reward. 149 00:05:49,900 --> 00:05:50,500 And speaking 150 00:05:50,500 --> 00:05:53,000 of the accumulated reward, well, that's exactly 151 00:05:53,000 --> 00:05:54,533 our next step here, because 152 00:05:54,533 --> 00:05:54,933 indeed, 153 00:05:54,933 --> 00:05:59,033 the last variable which we have to update is that variable 154 00:05:59,166 --> 00:06:00,300 computing the 155 00:06:00,300 --> 00:06:03,233 total reward we get up to round n. 156 00:06:03,233 --> 00:06:05,433 And now you will perfectly know how 157 00:06:05,433 --> 00:06:08,900 to update that total reward variable. 158 00:06:08,900 --> 00:06:11,266 According to you, what do we have to do here? 159 00:06:11,266 --> 00:06:12,566 Well, you know, 160 00:06:12,566 --> 00:06:16,066 we just need to add to this total reward variable 161 00:06:16,500 --> 00:06:20,166 that last reward which we just got at 162 00:06:20,166 --> 00:06:21,266 this round n, meaning 163 00:06:21,266 --> 00:06:23,833 this reward of selecting the ad 
164 00:06:23,833 --> 00:06:26,766 and showing it to the user n. All right. 165 00:06:26,766 --> 00:06:27,966 And that's it, my friends. 166 00:06:27,966 --> 00:06:28,500 Now the 167 00:06:28,500 --> 00:06:31,133 UCB algorithm is fully 168 00:06:31,133 --> 00:06:32,400 implemented. 169 00:06:32,400 --> 00:06:35,833 If this sounds a bit overwhelming, well, I really encourage you 170 00:06:35,833 --> 00:06:36,566 to try 171 00:06:36,566 --> 00:06:37,466 to implement that 172 00:06:37,466 --> 00:06:39,900 fully again, because really you just have to follow the 173 00:06:39,900 --> 00:06:42,000 logic and then everything makes sense. 174 00:06:42,000 --> 00:06:43,400 So I understand that, you know, 175 00:06:43,400 --> 00:06:44,933 this is the first time we implement 176 00:06:44,933 --> 00:06:46,500 such code because, you know, so far 177 00:06:46,500 --> 00:06:48,300 we've only worked with libraries. 178 00:06:48,300 --> 00:06:50,566 It was simpler before, but you also 179 00:06:50,566 --> 00:06:53,566 need to know how to implement such algorithms from scratch. 180 00:06:53,766 --> 00:06:55,566 And so here it's really good that we did it. 181 00:06:55,566 --> 00:06:58,033 But don't worry if it feels a bit overwhelming. 182 00:06:58,033 --> 00:06:59,633 You just have to, you know, 183 00:06:59,633 --> 00:07:01,400 either rewatch the videos or try to 184 00:07:01,400 --> 00:07:03,533 re-implement this from scratch yourself. 185 00:07:03,533 --> 00:07:04,300 And I promise you 186 00:07:04,300 --> 00:07:07,300 that this will sound easy peasy. Okay. 187 00:07:07,300 --> 00:07:08,500 So no worries. 188 00:07:08,500 --> 00:07:09,000 Do this 189 00:07:09,000 --> 00:07:12,166 again if needed, and whenever you're ready, meet me in the next 190 00:07:12,166 --> 00:07:13,833 tutorial to plot the 191 00:07:13,833 --> 00:07:15,366 final histogram, which 192 00:07:15,366 --> 00:07:18,366 will show us exactly which ad was 
193 00:07:18,366 --> 00:07:20,133 identified as the best ad, 194 00:07:20,133 --> 00:07:22,200 or the ad with the highest conversion rate, 195 00:07:22,200 --> 00:07:24,466 by this UCB algorithm. 196 00:07:24,466 --> 00:07:25,566 I can't wait to show this to 197 00:07:25,566 --> 00:07:27,966 you, and until then, enjoy machine learning.
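
For anyone re-implementing this from scratch, here is a rough sketch of the four updates walked through in this video, put together in one loop. Several things in it are assumptions rather than the code shown on screen: the snake_case variable names and d are inferred from the narration, the tiny dataset is made up, the selection step is replaced by a hard-coded list called chosen_ads, and a plain nested list stands in for the pandas DataFrame so the sketch runs without any library.

```python
# Made-up simulated dataset: one row per user (round), one column per ad,
# 1 if that user would click the ad, 0 otherwise.
dataset = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 0],
]
d = 3  # number of ads (hypothetical name)

# Variables created and initialized before the loop over rounds.
ads_selected = []
numbers_of_selections = [0] * d
sums_of_rewards = [0] * d
total_reward = 0

# Placeholder for the selection step (the second for loop in the video):
# the ad chosen in each round is hard-coded here instead of computed.
chosen_ads = [0, 1, 1, 2]

for n, ad in enumerate(chosen_ads):
    ads_selected.append(ad)          # record the ad shown in this round
    numbers_of_selections[ad] += 1   # same as: x[ad] = x[ad] + 1
    reward = dataset[n][ad]          # with a DataFrame: dataset.values[n, ad]
    sums_of_rewards[ad] += reward    # accumulated reward for that ad
    total_reward += reward           # accumulated reward over all rounds

print(ads_selected, numbers_of_selections, sums_of_rewards, total_reward)
```

Naming the intermediate reward variable, as in the video, keeps the last two updates readable: both add the same per-round reward, one to the selected ad's entry and one to the running total.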