1 00:00:00,133 --> 00:00:00,966 Hello, my friends. 2 00:00:00,966 --> 00:00:02,400 All right, let's do this. 3 00:00:02,400 --> 00:00:04,466 Let's start this implementation. 4 00:00:04,466 --> 00:00:06,233 So the first step is, of 5 00:00:06,233 --> 00:00:08,733 course, to implement the data preprocessing phase. 6 00:00:08,733 --> 00:00:10,200 And it will be slightly 7 00:00:10,200 --> 00:00:11,300 different from what we 8 00:00:11,300 --> 00:00:14,733 did before, because we will actually only use, from 9 00:00:14,733 --> 00:00:16,600 our data preprocessing template, 10 00:00:16,600 --> 00:00:17,700 this code cell 11 00:00:17,700 --> 00:00:19,266 where we import, you know, the 12 00:00:19,266 --> 00:00:20,466 widely used libraries 13 00:00:20,466 --> 00:00:22,466 like NumPy, Matplotlib and pandas. 14 00:00:22,466 --> 00:00:23,266 And then only 15 00:00:23,266 --> 00:00:25,266 this line of code, you know, to 16 00:00:25,266 --> 00:00:27,133 load the data set. And that's it. 17 00:00:27,133 --> 00:00:27,700 Then we won't 18 00:00:27,700 --> 00:00:28,100 create a 19 00:00:28,100 --> 00:00:31,566 matrix of features or dependent variable vector, because here we're doing a 20 00:00:31,566 --> 00:00:33,900 totally different thing: association rule learning. 21 00:00:33,900 --> 00:00:35,166 And we don't have to split the 22 00:00:35,166 --> 00:00:36,900 data set into a training set and test 23 00:00:36,900 --> 00:00:40,800 set, because we will directly learn all the rules from the whole data set. 24 00:00:40,900 --> 00:00:41,633 Okay. 25 00:00:41,633 --> 00:00:43,100 So right now we just want 26 00:00:43,100 --> 00:00:44,300 this first, you know, 27 00:00:44,300 --> 00:00:47,100 to import the main libraries. 28 00:00:47,100 --> 00:00:48,833 And then we'll get that 29 00:00:48,833 --> 00:00:50,333 line of code to import the 30 00:00:50,333 --> 00:00:51,000 data set. 31 00:00:51,000 --> 00:00:53,233 So here I'm creating a new code cell. 
32 00:00:53,233 --> 00:00:54,666 I'm importing this. 33 00:00:54,666 --> 00:00:56,500 And then we can load the data set. 34 00:00:56,500 --> 00:00:59,166 However, there is something important you should know about 35 00:00:59,166 --> 00:01:01,466 this Apriori implementation. 36 00:01:01,466 --> 00:01:04,066 It is the fact that, for the first time, we won't 37 00:01:04,066 --> 00:01:05,466 use scikit-learn. 38 00:01:05,466 --> 00:01:08,433 Unfortunately, the scikit-learn library doesn't 39 00:01:08,433 --> 00:01:11,233 include any classes or functions for 40 00:01:11,233 --> 00:01:11,866 Apriori. 41 00:01:11,866 --> 00:01:14,633 Basically, it doesn't include the Apriori model. 42 00:01:14,633 --> 00:01:16,166 Therefore, we're not going to 43 00:01:16,166 --> 00:01:17,433 use scikit-learn to 44 00:01:17,433 --> 00:01:18,633 train the model. 45 00:01:18,633 --> 00:01:20,233 We will actually use another 46 00:01:20,233 --> 00:01:22,866 library, which is called apyori. You know, 47 00:01:22,866 --> 00:01:25,000 apyori.py is actually a 48 00:01:25,000 --> 00:01:27,500 Python implementation containing the whole 49 00:01:27,500 --> 00:01:29,633 algorithm of the Apriori model. 50 00:01:29,633 --> 00:01:30,166 And that's 51 00:01:30,166 --> 00:01:30,633 what we 52 00:01:30,633 --> 00:01:33,400 will get, you know, and use to train our 53 00:01:33,400 --> 00:01:34,633 Apriori model here on the 54 00:01:34,633 --> 00:01:37,766 whole data set. But exceptionally, because, 55 00:01:37,766 --> 00:01:38,633 you know, Google 56 00:01:38,633 --> 00:01:40,266 Colab contains most of 57 00:01:40,266 --> 00:01:42,966 the libraries and packages pre-installed, 58 00:01:42,966 --> 00:01:45,266 but here, exceptionally, Google Colab 59 00:01:45,266 --> 00:01:46,500 doesn't include that 60 00:01:46,500 --> 00:01:47,633 apyori module. 61 00:01:47,633 --> 00:01:50,266 So we will actually have to install it. 62 00:01:50,266 --> 00:01:52,633 And this 
is very good that you see this, because you 63 00:01:52,633 --> 00:01:56,866 will encounter this situation sometimes, you know, rarely, but sometimes. 64 00:01:57,133 --> 00:01:59,400 And so you need to know how to install 65 00:01:59,400 --> 00:02:01,333 a certain package or a certain library 66 00:02:01,333 --> 00:02:03,133 from the web, you know, from online. 67 00:02:03,133 --> 00:02:06,300 Because what we'll do now is we will enter a command, you know, a pip 68 00:02:06,300 --> 00:02:07,266 command, which 69 00:02:07,266 --> 00:02:09,533 will first download the binary file 70 00:02:09,533 --> 00:02:11,466 from the internet, you know, through a link, 71 00:02:11,466 --> 00:02:14,200 and then we'll install it inside this 72 00:02:14,200 --> 00:02:15,433 particular notebook. 73 00:02:15,433 --> 00:02:15,866 All right. 74 00:02:15,866 --> 00:02:17,566 Let me show you. Let's do this. 75 00:02:17,566 --> 00:02:19,766 So we're actually going to implement this 76 00:02:19,766 --> 00:02:22,000 right before, you know, importing the main libraries. 77 00:02:22,000 --> 00:02:25,166 Usually the packages are installed as a first step. 78 00:02:25,533 --> 00:02:28,666 And to do this we need to start with an exclamation mark. 79 00:02:28,800 --> 00:02:31,100 Then pip, you know, that's the pip command that 80 00:02:31,100 --> 00:02:33,366 allows us to install packages. 81 00:02:33,366 --> 00:02:34,600 Then, very simply, 82 00:02:34,600 --> 00:02:35,100 well, speaking 83 00:02:35,100 --> 00:02:37,466 of installing packages, the next thing to enter here 84 00:02:37,466 --> 00:02:38,800 is install. 85 00:02:38,800 --> 00:02:40,933 And then you simply enter the name of 86 00:02:40,933 --> 00:02:42,966 that library or module 87 00:02:42,966 --> 00:02:44,266 that you want to install. 88 00:02:44,266 --> 00:02:47,100 And in our case, as we said, it is apyori. 89 00:02:47,100 --> 00:02:47,700 Be careful. 90 00:02:47,700 --> 00:02:48,200 It is not a 
91 00:02:48,200 --> 00:02:53,266 priori, it is apyori, like that: A-P-Y-O-R-I. Okay. 92 00:02:53,566 --> 00:02:56,933 So exclamation mark, pip install apyori. 93 00:02:57,366 --> 00:02:59,166 All right. And now let me show you what it does. 94 00:02:59,166 --> 00:03:01,733 As I said, it's going to download it first, 95 00:03:01,733 --> 00:03:04,166 and then it will install it inside the 96 00:03:04,166 --> 00:03:06,300 notebook. So here it is, 97 00:03:06,300 --> 00:03:08,100 finding it on the web. 98 00:03:08,100 --> 00:03:09,633 And very soon we will see in the 99 00:03:09,633 --> 00:03:11,666 output the download. 100 00:03:11,666 --> 00:03:13,833 All right, there we go. The download is starting now: 101 00:03:14,800 --> 00:03:15,733 collecting it by 102 00:03:15,733 --> 00:03:16,733 downloading 103 00:03:16,733 --> 00:03:20,833 it from the link, and then installing it, and then, you know, 104 00:03:20,833 --> 00:03:22,400 "Successfully installed apyori". 105 00:03:22,400 --> 00:03:24,533 See, it just takes a few seconds. 106 00:03:24,533 --> 00:03:25,766 And this is the version of this 107 00:03:25,766 --> 00:03:28,166 apyori package: 1.1.2. 108 00:03:28,166 --> 00:03:30,166 Maybe you'll get a different one, you know, if you 109 00:03:30,166 --> 00:03:33,166 take this course way after I recorded this tutorial. 110 00:03:33,266 --> 00:03:33,933 All right. So good. 111 00:03:33,933 --> 00:03:35,566 Now you know how to install a package 112 00:03:35,566 --> 00:03:36,966 within Google Colab. 113 00:03:36,966 --> 00:03:38,700 But don't worry, most of the 114 00:03:38,700 --> 00:03:41,700 packages, including the deep learning ones like TensorFlow, 115 00:03:41,800 --> 00:03:44,033 are already pre-installed. 116 00:03:44,033 --> 00:03:45,500 Okay, good. 117 00:03:45,500 --> 00:03:48,400 So now, well, let's import the libraries now, 118 00:03:48,400 --> 00:03:49,633 as we're here. 119 00:03:49,633 --> 00:03:52,066 And now let's implement the 
data preprocessing phase. 120 00:03:52,066 --> 00:03:53,866 So let's create a new code cell. 121 00:03:53,866 --> 00:03:54,433 Then, as 122 00:03:54,433 --> 00:03:57,433 we said, we will just get that first 123 00:03:57,433 --> 00:04:00,000 line, importing the data set, from our data 124 00:04:00,000 --> 00:04:02,400 preprocessing template. 125 00:04:02,400 --> 00:04:04,800 And we're going to paste that 126 00:04:04,800 --> 00:04:06,833 right here in our copy. 127 00:04:06,833 --> 00:04:07,500 And that's it. 128 00:04:07,500 --> 00:04:09,133 That's all we'll get from the template. 129 00:04:09,133 --> 00:04:12,566 And now we will adapt it and make it 100% customized 130 00:04:12,566 --> 00:04:14,733 to the Apriori model. 131 00:04:14,733 --> 00:04:15,033 All right. 132 00:04:15,033 --> 00:04:16,866 So first let's just replace 133 00:04:16,866 --> 00:04:18,233 the name of the data set here by 134 00:04:18,233 --> 00:04:21,233 the right name, which is, remember, Market 135 00:04:21,833 --> 00:04:22,600 underscore 136 00:04:22,600 --> 00:04:26,200 Basket underscore Optimization. 137 00:04:27,200 --> 00:04:27,866 Okay. 138 00:04:27,866 --> 00:04:29,466 And now that we did this, let's 139 00:04:29,466 --> 00:04:31,300 think of the right thing to do, 140 00:04:31,300 --> 00:04:34,000 which is to upload the data set here. 141 00:04:34,000 --> 00:04:34,666 Oh, good. 142 00:04:34,666 --> 00:04:36,633 So let's click upload. 143 00:04:36,633 --> 00:04:38,700 All right, then make sure to find, 144 00:04:38,700 --> 00:04:41,366 you know, that Machine Learning A-Z codes and data sets 145 00:04:41,366 --> 00:04:44,166 folder on your machine. And if you haven't downloaded it yet, 146 00:04:44,166 --> 00:04:44,633 you can 147 00:04:44,633 --> 00:04:47,700 download it right before this tutorial, at the bottom of the article. 148 00:04:47,933 --> 00:04:48,300 All right. 149 00:04:48,300 --> 00:04:51,866 So let's go inside and let's go to part five, Association 
150 00:04:51,866 --> 00:04:52,733 Rule Learning, 151 00:04:52,733 --> 00:04:55,533 then Apriori, then Python. And there we go. 152 00:04:55,533 --> 00:04:58,700 Let's select Market Basket Optimization. 153 00:04:59,100 --> 00:05:01,466 Let's click open. Click okay. 154 00:05:01,466 --> 00:05:03,266 And it will load it. 155 00:05:03,266 --> 00:05:04,966 And there we go. We have it. 156 00:05:04,966 --> 00:05:07,300 Don't try to open it, because it is actually very big 157 00:05:07,300 --> 00:05:08,966 and Google Colab won't allow you. 158 00:05:08,966 --> 00:05:12,333 But, you know, if you want to have a look at it, you can go back to the folder 159 00:05:12,566 --> 00:05:13,600 and, you know, 160 00:05:13,600 --> 00:05:15,400 just double-click it here, right. 161 00:05:15,400 --> 00:05:18,500 And you can see it very well and see all the transactions. Okay. 162 00:05:18,500 --> 00:05:21,166 So we'll actually leave it like that in case we want to see it. 163 00:05:21,166 --> 00:05:22,200 And now, back to 164 00:05:22,200 --> 00:05:23,466 our implementation. 165 00:05:23,466 --> 00:05:25,233 Let's see what we have to do next. 166 00:05:25,233 --> 00:05:28,300 So we already loaded the data set as 167 00:05:28,300 --> 00:05:29,800 a pandas DataFrame. 168 00:05:29,800 --> 00:05:32,466 However, we have to be careful with something. 169 00:05:32,466 --> 00:05:33,366 Look at the data set 170 00:05:33,366 --> 00:05:35,366 again and notice something. 171 00:05:35,366 --> 00:05:38,000 Notice that, you know, in our previous data sets, 172 00:05:38,000 --> 00:05:39,333 the first row here, 173 00:05:39,333 --> 00:05:42,000 of this index one here, actually contained, 174 00:05:42,000 --> 00:05:44,733 before, the names of the columns, right. 175 00:05:44,733 --> 00:05:47,733 For example, if we take our Social Network Ads data 176 00:05:47,733 --> 00:05:48,900 set in part three, 177 00:05:48,900 --> 00:05:50,866 Classification, we'll remember this 
178 00:05:50,866 --> 00:05:53,366 first row here had first the age, 179 00:05:53,366 --> 00:05:54,700 the name of the first column, 180 00:05:54,700 --> 00:05:58,366 and then the estimated salary, which was the name of the second column, 181 00:05:58,633 --> 00:06:01,466 and then the name of the dependent variable, purchased. 182 00:06:01,466 --> 00:06:04,133 And here, in this particular data set, 183 00:06:04,133 --> 00:06:06,833 well, you know, we don't have names of columns, because 184 00:06:06,833 --> 00:06:08,433 each of these elements 185 00:06:08,433 --> 00:06:10,000 here corresponds to just 186 00:06:10,000 --> 00:06:11,066 different products. Okay. 187 00:06:11,066 --> 00:06:13,800 So there was no need to name any of these columns. 188 00:06:13,800 --> 00:06:16,700 And therefore here we need to add something in our 189 00:06:16,700 --> 00:06:19,000 read underscore CSV function to 190 00:06:19,000 --> 00:06:21,033 tell it that the first row 191 00:06:21,033 --> 00:06:24,233 doesn't contain the names of the columns, because if we don't say this, 192 00:06:24,233 --> 00:06:25,500 well, actually it will not 193 00:06:25,500 --> 00:06:27,500 take that first transaction, 194 00:06:27,500 --> 00:06:28,533 because it will think 195 00:06:28,533 --> 00:06:31,100 that this first row is just the names of the columns. 196 00:06:31,100 --> 00:06:32,500 We have to specify this, and the 197 00:06:32,500 --> 00:06:34,633 way to specify this, and it's good that 198 00:06:34,633 --> 00:06:36,566 you know this trick in pandas, 199 00:06:36,566 --> 00:06:37,033 is just to 200 00:06:37,033 --> 00:06:39,766 add another parameter in this read underscore 201 00:06:39,766 --> 00:06:40,800 CSV function, 202 00:06:40,800 --> 00:06:43,833 which is header, and we set it equal 203 00:06:43,833 --> 00:06:45,500 to None. Right. 204 00:06:45,500 --> 00:06:49,466 So that we can specify that indeed there are no headers, 205 00:06:49,466 --> 00:06:51,766 meaning no column names. 
Okay. 206 00:06:51,766 --> 00:06:52,800 That's what it means. 207 00:06:52,800 --> 00:06:54,600 So now you know this. And therefore it will 208 00:06:54,600 --> 00:06:56,366 take into account, you know, it will 209 00:06:56,366 --> 00:06:59,066 take the first row of this Market Basket 210 00:06:59,066 --> 00:07:01,966 Optimization data set containing all the transactions. 211 00:07:01,966 --> 00:07:04,500 And with this, it will take the first transaction. 212 00:07:04,500 --> 00:07:07,466 All right. Good. So that was the first step. 213 00:07:07,466 --> 00:07:10,366 But then we actually have another big step. 214 00:07:10,366 --> 00:07:10,766 And that 215 00:07:10,766 --> 00:07:14,066 has to do with the fact that when we train the 216 00:07:14,066 --> 00:07:15,666 Apriori model on the data 217 00:07:15,666 --> 00:07:17,200 set, well, we will use a 218 00:07:17,200 --> 00:07:20,100 certain function that is actually called apriori. 219 00:07:20,100 --> 00:07:21,100 And of course that function 220 00:07:21,100 --> 00:07:24,100 will take as input the data set. 221 00:07:24,300 --> 00:07:25,200 But the thing is 222 00:07:25,200 --> 00:07:28,200 that it expects this data set to have a 223 00:07:28,200 --> 00:07:33,233 certain format, and that format is unfortunately not a pandas DataFrame. 224 00:07:33,433 --> 00:07:37,933 And therefore we have to recreate the data set, you know, from this original 225 00:07:37,933 --> 00:07:40,066 pandas DataFrame, so that it can have 226 00:07:40,066 --> 00:07:43,000 this format expected by the apriori 227 00:07:43,000 --> 00:07:43,666 function, 228 00:07:43,666 --> 00:07:46,100 which will train the Apriori model on the whole 229 00:07:46,100 --> 00:07:46,900 data set. 230 00:07:46,900 --> 00:07:49,133 And so now the question is: what is this format? 231 00:07:49,133 --> 00:07:52,400 Well, that format is simply a list of transactions. 
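Before moving on to the list format, the effect of the header parameter can be seen on a tiny example. This is only a sketch: the three-line CSV text and its product names are made up for illustration and are read from memory rather than from the uploaded file.

```python
import io

import pandas as pd

# Hypothetical three-transaction file with no header row (illustrative data only).
csv_text = "shrimp,almonds,avocado\nburgers,eggs\nchutney\n"

# Default behaviour: the first row is consumed as column names.
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: every row, including the first, is kept as data.
without_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(len(with_header))     # data rows left when row 0 is treated as a header
print(len(without_header))  # data rows when no header is assumed
```

Comparing the two lengths shows that, without header=None, the first transaction would be lost to the column names.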
232 00:07:52,400 --> 00:07:54,266 You know, instead of having the data set 233 00:07:54,266 --> 00:07:56,233 as a pandas DataFrame, we want to 234 00:07:56,233 --> 00:07:58,733 have the data set as a list of transactions. 235 00:07:58,733 --> 00:08:00,733 You know, the transactions listed 236 00:08:00,733 --> 00:08:01,666 one by one, 237 00:08:01,666 --> 00:08:04,666 with the different products purchased by the customers. 238 00:08:05,100 --> 00:08:06,833 And that's exactly what we have to do now. 239 00:08:06,833 --> 00:08:08,133 We have to create that list. 240 00:08:08,133 --> 00:08:10,033 And the first step to create that list is 241 00:08:10,033 --> 00:08:11,100 actually to initialize 242 00:08:11,100 --> 00:08:13,600 it as an empty list. Because then, in order 243 00:08:13,600 --> 00:08:16,266 to create that list, we will have to populate it with 244 00:08:16,266 --> 00:08:18,966 the different transactions of the data set. 245 00:08:18,966 --> 00:08:19,633 And the way we 246 00:08:19,633 --> 00:08:21,900 will do this is of course with a for loop. 247 00:08:21,900 --> 00:08:22,866 You know, a for loop 248 00:08:22,866 --> 00:08:26,133 iterating over all the 7500 transactions of 249 00:08:26,133 --> 00:08:29,000 the data set so as to populate it. Okay. 250 00:08:29,000 --> 00:08:29,700 So let's do this. 251 00:08:29,700 --> 00:08:32,466 First let's initialize that list. 252 00:08:32,466 --> 00:08:35,033 We will call it transactions, and we will 253 00:08:35,033 --> 00:08:37,400 initialize it as an empty list. 254 00:08:37,400 --> 00:08:37,733 All right. 255 00:08:37,733 --> 00:08:40,300 So, so far it is just an empty list. And now 256 00:08:40,300 --> 00:08:44,133 we will start a for loop to populate this list of 257 00:08:44,133 --> 00:08:45,800 transactions with all 258 00:08:45,800 --> 00:08:48,866 the transactions contained in that pandas DataFrame 259 00:08:48,966 --> 00:08:50,166 data set. 260 00:08:50,166 --> 00:08:50,633 Okay. 
261 00:08:50,633 --> 00:08:52,733 So as we make that loop, you 262 00:08:52,733 --> 00:08:54,033 will totally understand 263 00:08:54,033 --> 00:08:55,233 what we're doing. 264 00:08:55,233 --> 00:08:56,266 So for, 265 00:08:56,266 --> 00:08:58,900 then the classic iteration variable i, 266 00:08:58,900 --> 00:08:59,400 you know, which 267 00:08:59,400 --> 00:09:02,566 will take all the values from zero to 268 00:09:02,566 --> 00:09:04,266 7500. 269 00:09:04,266 --> 00:09:07,266 But remember that the upper bound of a range is excluded. 270 00:09:07,300 --> 00:09:10,533 So we will actually have to go up to 7501. 271 00:09:10,833 --> 00:09:13,833 Therefore, here this iteration variable will go 272 00:09:13,933 --> 00:09:16,766 in the range from 273 00:09:16,766 --> 00:09:18,366 zero to 274 00:09:18,366 --> 00:09:21,566 7501. All right. 275 00:09:21,600 --> 00:09:25,700 There are actually 7501 transactions, 276 00:09:25,700 --> 00:09:28,766 because we start from zero, and not 7500. 277 00:09:28,766 --> 00:09:31,200 Right. We can check it out very quickly. 278 00:09:31,200 --> 00:09:32,733 You know, here we start from one. 279 00:09:32,733 --> 00:09:36,000 And when we scroll down, we go down to, let's see, 280 00:09:36,000 --> 00:09:39,166 yes, 7501. So that's the exact 281 00:09:39,333 --> 00:09:40,700 number of transactions. 282 00:09:40,700 --> 00:09:41,400 All right. 283 00:09:41,400 --> 00:09:42,533 So all good 284 00:09:42,533 --> 00:09:43,333 for the range. 285 00:09:43,333 --> 00:09:46,333 Now don't forget the colon. 286 00:09:46,500 --> 00:09:48,700 And now we begin the for loop. 287 00:09:48,700 --> 00:09:51,800 And, well, you know, building this list of transactions is very easy. 288 00:09:51,800 --> 00:09:53,700 We will just use the append 289 00:09:53,700 --> 00:09:56,400 function, which means add, and which will simply, 290 00:09:56,400 --> 00:09:58,366 you know, add the different transactions of 291 00:09:58,366 --> 00:09:59,966 the data set one by one. 
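The range and append pattern described above can be tried on a toy scale. In this sketch the three-row `rows` table is a made-up stand-in for the data set, and each row is appended as it is; in the notebook the upper bound would be 7501.

```python
# Made-up stand-in for the data set: 3 transactions instead of 7501.
rows = [["shrimp", "almonds"], ["burgers", "eggs"], ["chutney"]]

transactions = []  # start from an empty list

# range(0, 3) yields 0, 1, 2: the upper bound itself is excluded,
# which is why 7501 rows need range(0, 7501).
for i in range(0, 3):
    transactions.append(rows[i])  # add the transactions one by one

print(list(range(0, 3)))
print(len(transactions))
```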
292 00:09:59,966 --> 00:10:02,700 All right. It's a very classic way to build a list. 293 00:10:02,700 --> 00:10:03,400 You know, you use 294 00:10:03,400 --> 00:10:06,266 the append function to add your different elements one by one. 295 00:10:06,266 --> 00:10:07,033 So let me 296 00:10:07,033 --> 00:10:08,500 show this to you. Very simply, 297 00:10:08,500 --> 00:10:08,866 what we need 298 00:10:08,866 --> 00:10:10,200 to do here is take 299 00:10:10,200 --> 00:10:13,133 our transactions list, okay, 300 00:10:13,133 --> 00:10:14,400 from which we're going to 301 00:10:14,400 --> 00:10:15,033 call this 302 00:10:15,033 --> 00:10:17,933 append function, which is a pre-built 303 00:10:17,933 --> 00:10:19,800 function of a Python list. 304 00:10:19,800 --> 00:10:20,900 You know, Python has 305 00:10:20,900 --> 00:10:21,733 all these pre-built 306 00:10:21,733 --> 00:10:22,633 functions. 307 00:10:22,633 --> 00:10:23,400 And inside, 308 00:10:23,400 --> 00:10:25,800 well, we're going to add the transaction. 309 00:10:25,800 --> 00:10:27,300 But we have to add it in 310 00:10:27,300 --> 00:10:28,800 a pair of square brackets, 311 00:10:28,800 --> 00:10:30,133 because it will contain, you know, 312 00:10:30,133 --> 00:10:32,100 all the different elements, you know, all the different 313 00:10:32,100 --> 00:10:33,900 products purchased by 314 00:10:33,900 --> 00:10:36,666 the underlying customer in the transaction. 315 00:10:36,666 --> 00:10:37,800 And this transaction must 316 00:10:37,800 --> 00:10:40,233 be created as a list of products. 317 00:10:40,233 --> 00:10:40,800 And that's why 318 00:10:40,800 --> 00:10:43,333 we have this new pair of square brackets, 319 00:10:43,333 --> 00:10:45,866 to make that transaction a list of products. 320 00:10:45,866 --> 00:10:47,066 So in the end we're 321 00:10:47,066 --> 00:10:49,166 actually creating a list of lists. 322 00:10:49,166 --> 00:10:51,033 You know, each transaction in the big 
323 00:10:51,033 --> 00:10:54,000 list of transactions is actually a list. All right. 324 00:10:54,000 --> 00:10:56,033 And so now there's a new little trick, 325 00:10:56,033 --> 00:10:58,233 but which is good that you know. 326 00:10:58,233 --> 00:11:00,900 It is a single row for loop. 327 00:11:00,900 --> 00:11:01,800 Because now, actually, 328 00:11:01,800 --> 00:11:04,066 we need to do a second for loop to, 329 00:11:04,066 --> 00:11:05,066 you know, 330 00:11:05,066 --> 00:11:08,866 get all the products of each transaction. 331 00:11:08,866 --> 00:11:10,200 So let me scroll back up. 332 00:11:10,200 --> 00:11:11,700 You know, the first for loop will go 333 00:11:11,700 --> 00:11:14,766 from this transaction to this one, to this one, to this one, 334 00:11:14,766 --> 00:11:18,300 down to the last one at the bottom, 7501. 335 00:11:18,600 --> 00:11:19,200 But then we need to 336 00:11:19,200 --> 00:11:21,533 do a second for loop, which will loop 337 00:11:21,533 --> 00:11:24,400 over the different products in each transaction. 338 00:11:24,400 --> 00:11:26,566 So, you know, it will add this product, 339 00:11:26,566 --> 00:11:28,533 then this one, then this one, then this one. 340 00:11:28,533 --> 00:11:31,833 And since the maximum number of products in one basket 341 00:11:31,833 --> 00:11:33,766 is actually 20, you know, 342 00:11:33,766 --> 00:11:35,366 I put this first 343 00:11:35,366 --> 00:11:37,166 transaction at the top so that we 344 00:11:37,166 --> 00:11:39,766 can have the maximum size of the basket, 345 00:11:39,766 --> 00:11:40,800 which is 20. 346 00:11:40,800 --> 00:11:43,100 And therefore what we'll do now is the second for 347 00:11:43,100 --> 00:11:46,566 loop, which will iterate from zero to 20. 348 00:11:46,900 --> 00:11:48,900 It will start from zero here, 349 00:11:48,900 --> 00:11:50,833 you know, that's the index of the first column. 
350 00:11:50,833 --> 00:11:53,933 And then it will iterate to this one, then to this one, then to this one, 351 00:11:53,933 --> 00:11:57,266 up to the final index, which is index 19, 352 00:11:57,266 --> 00:12:00,333 actually, because it starts from zero. There are 20 columns, 353 00:12:00,466 --> 00:12:02,833 so it goes up to the index 19. 354 00:12:02,833 --> 00:12:03,700 Okay. So that's the 355 00:12:03,700 --> 00:12:04,566 second for loop. 356 00:12:04,566 --> 00:12:05,400 And that's what 357 00:12:05,400 --> 00:12:06,900 we'll do right away 358 00:12:06,900 --> 00:12:07,866 inside this 359 00:12:07,866 --> 00:12:09,466 append function, to 360 00:12:09,466 --> 00:12:10,366 add all these 361 00:12:10,366 --> 00:12:12,933 different elements in each transaction. 362 00:12:12,933 --> 00:12:17,066 And for the transactions that don't have 20 elements, well, that's totally fine. 363 00:12:17,300 --> 00:12:19,366 We will still iterate up to the 364 00:12:19,366 --> 00:12:21,466 end, you know, up to the index 19. 365 00:12:21,466 --> 00:12:23,133 But we will just populate the list 366 00:12:23,133 --> 00:12:26,833 with NaNs, you know, NaN values, which mean empty, 367 00:12:26,833 --> 00:12:30,900 so that our model will understand that the transaction here only has 368 00:12:30,900 --> 00:12:32,800 three products. Okay. 369 00:12:32,800 --> 00:12:35,100 So we can totally iterate up to 20. 370 00:12:35,100 --> 00:12:37,566 And therefore let's start this second 371 00:12:37,566 --> 00:12:38,500 for loop here. 372 00:12:38,500 --> 00:12:41,133 For, now we need to take another 373 00:12:41,133 --> 00:12:44,766 iteration variable, which we'll naturally call j. For 374 00:12:44,766 --> 00:12:47,500 j in the range from 375 00:12:47,500 --> 00:12:49,366 zero to, be careful, 376 00:12:49,366 --> 00:12:51,700 not 19, but 20, 377 00:12:51,700 --> 00:12:54,500 because the upper bound is once again excluded. 
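The "single row for loop" is what Python calls a list comprehension. As a minimal sketch, assuming a made-up row of 4 columns in place of the 20 columns of the data set, the comprehension below gives the same result as an explicit inner loop over the column index j:

```python
# Made-up row with 4 columns instead of 20; None marks an empty cell here
# (pandas would show NaN instead).
row = ["burgers", "meatballs", "eggs", None]

# Explicit loop: visit column indices 0..3 (the upper bound 4 is excluded).
products_loop = []
for j in range(0, 4):
    products_loop.append(row[j])

# Same result as a list comprehension: the value to keep comes before "for".
products_comp = [row[j] for j in range(0, 4)]

print(products_comp == products_loop)
```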
378 00:12:54,500 --> 00:12:57,233 So from 0 to 20. Then what we will do, 379 00:12:57,233 --> 00:12:58,200 and that's 380 00:12:58,200 --> 00:13:00,266 the syntax in a single 381 00:13:00,266 --> 00:13:03,600 row for loop, is we have to put what we want to do with this 382 00:13:03,600 --> 00:13:04,166 for loop 383 00:13:04,166 --> 00:13:08,966 at the beginning, you know, before the for. And what we want to do is just get that 384 00:13:08,966 --> 00:13:10,733 product in the transaction, 385 00:13:10,733 --> 00:13:12,733 well, from the index zero to 386 00:13:12,733 --> 00:13:15,066 the index 19. And to access this product, 387 00:13:15,066 --> 00:13:17,500 well, we will of course use our data set 388 00:13:17,500 --> 00:13:19,333 and play with the right indexes 389 00:13:19,333 --> 00:13:20,733 to get the right product. 390 00:13:20,733 --> 00:13:21,033 All right. 391 00:13:21,033 --> 00:13:25,233 So here we need to call first the data set, okay. 392 00:13:25,600 --> 00:13:28,100 And now let's add some brackets 393 00:13:28,100 --> 00:13:29,833 to add, you know, the index of the row 394 00:13:29,833 --> 00:13:32,966 and the index of the column that contains that product 395 00:13:32,966 --> 00:13:35,333 we want to include in this transaction. 396 00:13:35,333 --> 00:13:36,566 So first, according to you, 397 00:13:36,566 --> 00:13:38,566 what will be the index of the row? 398 00:13:38,566 --> 00:13:41,433 Well, that will be of course i, because i 399 00:13:41,433 --> 00:13:44,566 iterates through all the rows of the data set. 400 00:13:44,800 --> 00:13:46,633 And therefore now we're dealing actually 401 00:13:46,633 --> 00:13:47,700 with that particular 402 00:13:47,700 --> 00:13:50,166 transaction of index i, which we're 403 00:13:50,166 --> 00:13:51,800 appending into that transactions 404 00:13:51,800 --> 00:13:55,766 list, and therefore the row of the data set we are at 405 00:13:55,766 --> 00:13:56,600 right now 
406 00:13:56,600 --> 00:13:58,100 is exactly i. 407 00:13:58,100 --> 00:14:00,466 Okay, i will start from zero. 408 00:14:00,466 --> 00:14:03,000 It will first get that transaction. 409 00:14:03,000 --> 00:14:04,500 Then it will be equal to one, 410 00:14:04,500 --> 00:14:05,166 so it will get 411 00:14:05,166 --> 00:14:08,400 that transaction, and then this one, and this one, etc. 412 00:14:08,600 --> 00:14:09,166 All right. 413 00:14:09,166 --> 00:14:12,166 And so now we're dealing with a particular transaction of a particular row. 414 00:14:12,300 --> 00:14:12,833 And that's 415 00:14:12,833 --> 00:14:14,666 exactly the index of the 416 00:14:14,666 --> 00:14:16,233 data set we need to input here. 417 00:14:16,233 --> 00:14:19,433 And now for the column, which index do you think it's going to be? 418 00:14:19,700 --> 00:14:22,400 Well, that will be of course that index, you know, which 419 00:14:22,400 --> 00:14:24,266 iterates through all the 420 00:14:24,266 --> 00:14:25,933 columns of, actually, 421 00:14:25,933 --> 00:14:27,833 the transaction. Right. 422 00:14:27,833 --> 00:14:29,500 j will go from zero 423 00:14:29,500 --> 00:14:32,700 to 1, to 2, to 3, to 4, up to 19. 424 00:14:32,833 --> 00:14:34,000 And therefore here 425 00:14:34,000 --> 00:14:36,000 the index we want for the column is of 426 00:14:36,000 --> 00:14:37,733 course j. 427 00:14:37,733 --> 00:14:38,366 All right. 428 00:14:38,366 --> 00:14:39,666 So very good. 429 00:14:39,666 --> 00:14:42,000 But now we need to add one final thing. 430 00:14:42,000 --> 00:14:43,200 You know, unfortunately we can 431 00:14:43,200 --> 00:14:46,366 not directly access the cell of row i 432 00:14:46,666 --> 00:14:49,133 and column j in the data set. In order to 433 00:14:49,133 --> 00:14:53,200 access the cell we just need to add here dot values, 434 00:14:53,466 --> 00:14:57,100 because that's part of that data set structure. 435 00:14:57,100 --> 00:14:59,866 You know, this is an advanced 
structure by pandas 436 00:14:59,866 --> 00:15:02,300 that allows us to get access to the cells. All right. 437 00:15:02,300 --> 00:15:04,000 So, good that you know this. 438 00:15:04,000 --> 00:15:05,900 And now we're almost ready. 439 00:15:05,900 --> 00:15:07,800 We almost have everything. 440 00:15:07,800 --> 00:15:09,200 The only thing that's left 441 00:15:09,200 --> 00:15:13,233 has, again, to do with, you know, a certain expectation of the future 442 00:15:13,233 --> 00:15:16,233 apriori function which we'll use to train our Apriori model. 443 00:15:16,500 --> 00:15:18,300 It is the fact that all the 444 00:15:18,300 --> 00:15:19,666 elements in the list 445 00:15:19,666 --> 00:15:21,600 must be strings. 446 00:15:21,600 --> 00:15:23,433 They must be strings, otherwise 447 00:15:23,433 --> 00:15:26,433 the Apriori model won't be able to learn the rules. 448 00:15:26,700 --> 00:15:27,300 And to make 449 00:15:27,300 --> 00:15:31,800 sure that these, you know, values we populate inside 450 00:15:31,800 --> 00:15:34,200 each of our transactions in our list, 451 00:15:34,200 --> 00:15:37,000 well, in order to make sure that these are strings, we can 452 00:15:37,000 --> 00:15:38,366 force it this way, by 453 00:15:38,366 --> 00:15:39,900 putting that into the 454 00:15:39,900 --> 00:15:41,266 string function, 455 00:15:41,266 --> 00:15:44,400 str, which will take as input exactly these 456 00:15:44,400 --> 00:15:45,866 elements, meaning this 457 00:15:45,866 --> 00:15:47,666 product of the transactions. 458 00:15:47,666 --> 00:15:47,933 All right. 459 00:15:47,933 --> 00:15:49,333 So now we make sure 460 00:15:49,333 --> 00:15:50,766 that the products are 461 00:15:50,766 --> 00:15:53,300 strings, you know, actually in quotes. 462 00:15:53,300 --> 00:15:54,100 And this 463 00:15:54,100 --> 00:15:58,466 will give exactly what the Apriori model is expecting as 464 00:15:58,466 --> 00:16:00,366 the format of its inputs. 
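Putting the pieces together, the preprocessing cell could look roughly like the sketch below. It is not the notebook's exact code: a small in-memory CSV with made-up products stands in for the uploaded file, so the ranges come from the shape of that toy table (3 rows, 3 columns) rather than being 7501 and 20.

```python
import io

import pandas as pd

# Made-up stand-in for the uploaded CSV (3 transactions, at most 3 products).
csv_text = "shrimp,almonds,avocado\nburgers,eggs\nchutney\n"
dataset = pd.read_csv(io.StringIO(csv_text), header=None)

n_rows, n_cols = dataset.shape  # 7501 rows and 20 columns in the tutorial's data set

transactions = []
for i in range(0, n_rows):
    # One inner list per transaction; str() turns every cell, including
    # empty NaN cells, into a string.
    transactions.append([str(dataset.values[i, j]) for j in range(0, n_cols)])

print(transactions)
```

With this approach, empty cells are not dropped: they end up in each transaction as the string 'nan'.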
465 00:16:00,366 --> 00:16:01,800 Good. Perfect. 466 00:16:01,800 --> 00:16:04,200 So now we're done with data preprocessing. 467 00:16:04,200 --> 00:16:06,566 As I told you, it's a bit different than before. 468 00:16:06,566 --> 00:16:07,400 But now you know 469 00:16:07,400 --> 00:16:10,366 that for the Apriori model, you have to just create a 470 00:16:10,366 --> 00:16:12,866 list of transactions and make sure that all 471 00:16:12,866 --> 00:16:15,400 the elements in each of your transactions are 472 00:16:15,400 --> 00:16:16,500 strings. 473 00:16:16,500 --> 00:16:17,133 Perfect. 474 00:16:17,133 --> 00:16:20,100 So now let's execute this cell, because we're ready. 475 00:16:20,100 --> 00:16:21,900 Let's hope we didn't make any mistake. 476 00:16:21,900 --> 00:16:25,033 Play, running this cell, and all 477 00:16:25,033 --> 00:16:25,933 good. Perfect. 478 00:16:25,933 --> 00:16:27,666 It just ran successfully. 479 00:16:27,666 --> 00:16:29,400 Now we have this list of 480 00:16:29,400 --> 00:16:32,966 transactions containing all the same transactions as in 481 00:16:32,966 --> 00:16:33,966 this data set, 482 00:16:33,966 --> 00:16:36,900 but in the format of a list. Good. 483 00:16:36,900 --> 00:16:38,366 So now we're going to take a little break, 484 00:16:38,366 --> 00:16:40,033 because then comes the real 485 00:16:40,033 --> 00:16:41,400 big important step: 486 00:16:41,400 --> 00:16:42,500 training the 487 00:16:42,500 --> 00:16:45,466 Apriori model on the data set. 488 00:16:45,466 --> 00:16:48,733 And to do this we'll call this apriori function 489 00:16:49,033 --> 00:16:52,200 that will take as input exactly this list 490 00:16:52,200 --> 00:16:54,266 of transactions that is now correctly 491 00:16:54,266 --> 00:16:59,300 populated in the right format to train the Apriori model on the data set. 492 00:17:00,000 --> 00:17:02,333 So take a little break, and as soon as you're ready, 493 00:17:02,333 --> 00:17:04,833 let's implement that 
next step together. 494 00:17:04,833 --> 00:17:06,766 And until then, enjoy machine learning.