Hello and welcome to this R tutorial. In the previous section, we implemented the Apriori model to find relevant association rules that could help us find a more strategic placement of some products in a grocery store in the South of France. Thanks to these association rules, the manager of this store is likely to optimize sales and therefore increase revenue. Accordingly, we managed to create some added value in this business thanks to association rule learning.

Today we are going to implement a new association rule learning model, called the Eclat model. The first thing I want to say is meant to prevent any disappointment you might have: the Eclat model is actually very simple compared to what we did previously, because it is basically a simplified Apriori model. Previously we had two parameters, the support and the confidence, and we also had the lift, by the way, when we ranked our rules by decreasing lift. But in Eclat we will only have one parameter: the support. Therefore, what we obtain will not be the rules that we obtained earlier.
Those were rules like "people who bought this also bought that." Instead, we will just get the different sets of products that are most frequently purchased together. That's what we'll get, and we have to expect this. But the Eclat model can be very useful if you don't have much time, if you want some simple results, or if you don't want to play with too many parameters like the support and the confidence. Here, we won't even have to compute a value for the support, and we won't have to choose a value for the confidence either, because there is no confidence parameter. So you'll see, this is the simplest way of using association rule learning. Let's do it right now; you'll see what I mean at the end of this tutorial.

Let's start with the very first step: setting the right folder as the working directory. We go to our Machine Learning A-Z folder, then Part 5, Association Rule Learning, and right now we are in the Eclat folder. Here it is: we have the market basket optimization dataset, because we will work on the same business problem. We can click on this More button here.
Then choose Set As Working Directory.

Since I told you the model is a simplified version of the Apriori model, what we'll do now is take our Apriori code, and you'll see that it will be almost the same: we will just need to change one thing. So I'm going to select everything from here to here, copy it, and paste it into the Eclat file.

In the data preprocessing section, we don't have anything to change. We are just going to import the dataset with the read.transactions function. Be careful: not with the read.csv function, because we still need our sparse matrix, which will also be the input of the eclat function, just as it was for the apriori function. So we select this and execute. Of course we get the same number of duplicates, since the CSV file hasn't changed: just five duplicates. Then we can use the summary function to get some info about this dataset, but of course it's the same info as before: we have about 7,500 transactions, 119 products, and the density is 0.03.
That means that the proportion of non-zero values in the matrix is 0.03, that is, 3%. Then of course we have the most frequent items: mineral water comes first, then eggs. We can have a better look at these most frequent items by selecting this line and executing it, and of course we get the same frequency plot. So that's exactly like before; we don't have anything to change here.

However, now we enter the second code section, which trains the model on the dataset. First, we just replace apriori here by eclat. Here we go. It's that simple: here we had the apriori function to train the Apriori model, and guess what the function to train the Eclat model is going to be? It's eclat. Very simple. We are almost ready to train it on our dataset, but of course the Eclat model is much simpler: it doesn't include the confidence among its parameters. If we go to the slide of the Eclat algorithm from the intuition tutorial, we can see that the algorithm has three steps, and the first step is to set a minimum support. Remember, for the Apriori algorithm, the first step was to set a minimum support and a minimum confidence.
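In the tutorial, the density comes from summary() on the transactions object built by read.transactions(), and step 1 is just a number passed to eclat(). Both ideas come down to counting over the transactions-by-items 0/1 matrix. The minimal base-R sketch below assumes a tiny hypothetical set of baskets (not the tutorial's CSV) and a hypothetical threshold: it computes the density as the share of non-zero cells, then, with a minimum support fixed, keeps the single items that reach it, which are the starting points of the search.

```r
# Hypothetical toy baskets (NOT the tutorial's market basket data)
baskets <- list(
  c("mineral water", "spaghetti", "eggs"),
  c("mineral water", "chocolate"),
  c("spaghetti", "mineral water"),
  c("eggs", "chocolate", "mineral water"),
  c("light cream", "chicken")
)

# Transactions x items 0/1 matrix: the dense view of what a sparse
# transactions object stores (1 = item present in that basket)
items <- sort(unique(unlist(baskets)))
m <- t(sapply(baskets, function(b) as.integer(items %in% b)))
colnames(m) <- items

# Density, as reported by summary(): share of non-zero cells
dens <- sum(m) / length(m)

# Support of each single item: share of baskets that contain it
item_support <- colMeans(m)

# Step 1: fix a minimum support (hypothetical value for this toy data),
# then keep the single items that reach it, most frequent first
min_support <- 0.35
frequent_items <- sort(item_support[item_support >= min_support], decreasing = TRUE)

print(dens)
print(frequent_items)
```

Here 12 of the 30 cells are non-zero, so the density is 0.4; the tutorial's value of 0.03 is the same ratio computed on its much larger and sparser matrix.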
Now we only need to set a minimum support, so we don't need this confidence parameter. Actually, this parameter only exists for the Apriori model, so if we keep it we will get an error. So I'll just remove it, and we can leave the minimum support at 0.004, but you will see that it's not even necessary here; you'll see why at the end. However, we do need to add another parameter. As I mentioned, the algorithm is just going to return the sets of items most frequently bought together, and it wouldn't be interesting to have sets of only one item. So in order to get sets of at least two items, we add another parameter here, which we actually encountered earlier: the minlen parameter. We set it to two, because we want the different sets of at least two items most frequently purchased together. Okay, now we're ready. See how simple it was? So let's select this and train the model on our dataset. Done.

Okay, so now things change a little bit. First, of course, we see that we just trained the model. Then we have the parameter specification, like before.
But this time there is no confidence parameter. We have the support, which we set to 0.004; that is the minimum support. And this time we have minlen equal to two. Remember, before, the minlen parameter was set to one, but we didn't have any problem with that because all our rules contained at least two products. When using Eclat, however, we need to set minlen to two, because otherwise we would get some sets of only one item. Then we have some more advanced information here.

And then there is something I would like to highlight: it says the number of sets, not rules. Remember, before, let's say we had 845: it said 845 rules, because we had rules of the form "if people buy light cream, then they're likely to buy chicken, with a confidence of 40%," that is, with a 40% chance. Here, since we don't get rules of this form and only get sets of items, this time we indeed have 845 sets. So even if Eclat is considered an association rule learning model, it doesn't return rules; it actually returns sets. Okay.
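To make the sets-versus-rules distinction concrete, here is a small base-R sketch on hypothetical baskets, chosen so that the rule confidence comes out at the 40% used in the example above (they are not the tutorial's data). An Eclat result is an unordered set with its support; an Apriori rule adds a direction and a confidence, computed as the support of the left-hand and right-hand sides together divided by the support of the left-hand side. The support() helper is a hypothetical name introduced only for this illustration.

```r
# Hypothetical toy baskets, chosen so the rule confidence comes out at 40%
baskets <- list(
  c("light cream", "chicken"),
  c("light cream", "chicken", "eggs"),
  c("light cream", "mineral water"),
  c("light cream", "spaghetti"),
  c("light cream", "eggs"),
  c("chicken", "mineral water"),
  c("mineral water", "eggs"),
  c("spaghetti", "mineral water"),
  c("eggs"),
  c("chocolate")
)

# Hypothetical helper: support = share of baskets containing every item of the set
support <- function(itemset, baskets) {
  sum(vapply(baskets, function(b) all(itemset %in% b), logical(1))) / length(baskets)
}

# Eclat-style output: an unordered set and its support (no direction, no confidence)
s_pair <- support(c("light cream", "chicken"), baskets)
stopifnot(s_pair == support(c("chicken", "light cream"), baskets))  # order does not matter

# Apriori-style output: a directed rule lhs => rhs, with
# confidence = support(lhs and rhs together) / support(lhs)
conf_rule    <- s_pair / support("light cream", baskets)  # light cream => chicken
conf_reverse <- s_pair / support("chicken", baskets)      # chicken => light cream

cat("support({light cream, chicken}) =", s_pair, "\n")
cat("confidence(light cream => chicken) =", conf_rule, "\n")
cat("confidence(chicken => light cream) =", conf_reverse, "\n")
```

The one set {light cream, chicken} yields two rules with different confidences (0.4 and about 0.67), which is exactly the information an Eclat result does not carry.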
So we're going to have a look at the sets right now. We are ready to move to the last step of our model, and we can actually jump back to the slide. As we can see, step two is to take all the subsets in transactions having higher support than the minimum support; the eclat function already did that for us. Then the last step, step three, is to sort these subsets by decreasing support. So this time it's not by decreasing lift like for Apriori; there is no lift in Eclat. This time we're going to sort the sets by decreasing support, and we're going to take the first ten sets, the ones with the ten highest supports.

So we are actually ready: the whole code is ready. We did it very efficiently, since it was very simple. So finally, let's have a look at the results. You'll see that they're much simpler and, honestly, less interesting than Apriori's, but that's what we get. So I'm going to select this line and execute it. As I just told you, we simply get the different sets of items most frequently purchased together.
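In arules, steps 1 and 2 happen inside eclat(), and step 3 corresponds to sorting the returned itemsets by support. As a hedged look under the hood, the sketch below re-implements the three slide steps in base R with a hypothetical eclat_sketch() function. It assumes the classic Eclat idea of a vertical layout: each item keeps its tid-list (the indices of the baskets containing it), itemsets grow depth-first by intersecting tid-lists, and any set below the minimum support is pruned, since none of its supersets can reach the threshold. The min_len argument plays the role that minlen plays in arules. This is a teaching sketch on made-up baskets, not the arules implementation.

```r
# Hypothetical sketch of Eclat in base R (NOT the arules implementation).
# Vertical layout: each item maps to its tid-list, i.e. the indices of the
# baskets that contain it. Larger itemsets get their tid-list by intersection.
eclat_sketch <- function(baskets, min_support, min_len = 2) {
  n <- length(baskets)
  items <- sort(unique(unlist(baskets)))
  tids <- lapply(items, function(i) {
    which(vapply(baskets, function(b) i %in% b, logical(1)))
  })
  names(tids) <- items

  found <- list()  # collected frequent sets: list(items = ..., support = ...)

  # Depth-first search: extend `prefix` with each later candidate item
  extend <- function(prefix, prefix_tids, candidates) {
    for (k in seq_along(candidates)) {
      item <- candidates[k]
      new_tids <- if (length(prefix) == 0) tids[[item]] else intersect(prefix_tids, tids[[item]])
      supp <- length(new_tids) / n
      # Steps 1-2: keep only sets reaching the minimum support; a set below it
      # cannot have a frequent superset, so its branch is not explored further
      if (supp >= min_support) {
        set <- c(prefix, item)
        if (length(set) >= min_len) {
          found[[length(found) + 1]] <<- list(items = set, support = supp)
        }
        extend(set, new_tids, candidates[-seq_len(k)])
      }
    }
  }
  extend(character(0), integer(0), items)

  res <- data.frame(
    items   = vapply(found, function(f) paste0("{", paste(f$items, collapse = ", "), "}"), character(1)),
    support = vapply(found, function(f) f$support, numeric(1)),
    stringsAsFactors = FALSE
  )
  # Step 3: sort by decreasing support (ties broken alphabetically for a stable order)
  res <- res[order(-res$support, res$items), , drop = FALSE]
  rownames(res) <- NULL
  res
}

# Usage on hypothetical toy baskets
baskets <- list(
  c("mineral water", "spaghetti", "eggs"),
  c("mineral water", "spaghetti"),
  c("mineral water", "chocolate"),
  c("chocolate", "mineral water", "eggs"),
  c("spaghetti", "eggs"),
  c("mineral water", "spaghetti", "chocolate")
)
sets <- eclat_sketch(baskets, min_support = 0.3, min_len = 2)
print(head(sets, 10))  # top sets by support, like taking the first ten in the tutorial
```

On these six baskets, four pairs survive a minimum support of 0.3, with {chocolate, mineral water} and {mineral water, spaghetti} tied at 0.5 at the top; no three-item set reaches the threshold. The alphabetical tie-break is only there to make the printed order deterministic.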
For example, the set of items most frequently purchased together is mineral water and spaghetti, with a support of 0.059. Then it's chocolate and mineral water, then eggs and mineral water, and that's of course strongly related to the most frequently purchased products in this store. So those are the Eclat results: much less interesting than Apriori's, but they can be very useful if you're looking for some very simple information.

And by the way, as I was telling you, if we change the support and set it to 0.003, as we did for the Apriori model in the first place, you'll see that we get the same ranking here, because, as you can see, the supports of these first ten sets of items most frequently purchased together are all higher than both 0.004 and 0.003. So indeed, if we retrain the model with a minimum support of 0.003, then select this line again and execute it, we get the same ranking, with mineral water and spaghetti coming first and frozen vegetables and mineral water as the tenth set of products most frequently purchased together.

All right, so that was the Eclat model.
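The reason the ranking survives a lower minimum support is that a looser threshold only admits extra sets whose support lies between the two thresholds, and those all rank below every set that already cleared the stricter one. So the top n is unchanged as long as at least n sets clear the stricter threshold. The base-R sketch below checks this on hypothetical toy baskets, with thresholds 0.30 and 0.15 standing in for the tutorial's two values; pair_supports() and ranking() are hypothetical helpers that brute-force all two-item sets.

```r
# Hypothetical toy baskets (not the tutorial's data)
baskets <- list(
  c("mineral water", "spaghetti", "eggs"),
  c("mineral water", "spaghetti"),
  c("mineral water", "chocolate"),
  c("chocolate", "mineral water", "eggs"),
  c("spaghetti", "eggs"),
  c("mineral water", "spaghetti", "chocolate")
)

# Hypothetical helper: support of every 2-item set, by brute force
pair_supports <- function(baskets) {
  items <- sort(unique(unlist(baskets)))
  pairs <- combn(items, 2)  # one candidate pair per column
  supp <- apply(pairs, 2, function(p) {
    sum(vapply(baskets, function(b) all(p %in% b), logical(1))) / length(baskets)
  })
  names(supp) <- apply(pairs, 2, paste, collapse = " & ")
  supp
}

# Hypothetical helper: filter by minimum support, then rank by decreasing support
# (ties broken alphabetically so the ranking is deterministic)
ranking <- function(supp, min_support, n) {
  kept <- supp[supp >= min_support]
  head(names(kept)[order(-kept, names(kept))], n)
}

supp <- pair_supports(baskets)
top_high <- ranking(supp, min_support = 0.30, n = 4)  # stricter threshold
top_low  <- ranking(supp, min_support = 0.15, n = 4)  # looser threshold
print(top_high)
print(identical(top_high, top_low))
```

At 0.15 the two pairs with support 1/6 are admitted, but they land at the bottom of the ranking, so the first four names are identical under both thresholds.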
Again, if you want a serious analysis of how to create added value for your business and optimize sales and revenue, you should go for Apriori. But if you are looking for some very simple information, like the sets of products most frequently purchased together, then you can go for Eclat.

So congratulations! You now know how to implement two association rule learning models, Apriori and Eclat, and you know what to use them for. Thank you very much for watching these tutorials, and I look forward to seeing you in the next part, Reinforcement Learning. Until then, enjoy machine learning!