1 00:00:00,200 --> 00:00:02,533 Hello and welcome to this R tutorial. 2 00:00:02,533 --> 00:00:05,533 So in the previous tutorials we did the data preprocessing step. 3 00:00:05,533 --> 00:00:08,533 And then we trained our apriori model on the data set 4 00:00:08,800 --> 00:00:12,900 with a minimum support of 0.003, 5 00:00:12,900 --> 00:00:16,000 that is 0.3%, and a minimum confidence of 40%. 6 00:00:16,233 --> 00:00:19,933 Now that this is done, we are finally getting to the exciting step, 7 00:00:19,933 --> 00:00:24,333 which is to visualize the results, that is, to look at the rules explicitly. 8 00:00:24,633 --> 00:00:27,633 We're going to get the list of the strongest rules, 9 00:00:27,800 --> 00:00:32,133 and we will eventually know how to place the products to optimize the sales. 10 00:00:32,533 --> 00:00:34,500 Okay, so let's do it without waiting. 11 00:00:34,500 --> 00:00:36,433 This will actually be very easy as well. 12 00:00:36,433 --> 00:00:38,100 We will only need one line. 13 00:00:38,100 --> 00:00:42,166 But before we write that line, let's jump back to the apriori algorithm 14 00:00:42,166 --> 00:00:42,866 intuition, 15 00:00:42,866 --> 00:00:44,700 slide two, to see the steps 16 00:00:44,700 --> 00:00:48,000 we already did in the algorithm and the steps that are left. 17 00:00:48,466 --> 00:00:51,666 Okay, so step one was to set the minimum support and confidence. 18 00:00:51,900 --> 00:00:53,200 That was done here. 19 00:00:53,200 --> 00:00:58,200 We set the minimum support here to 0.3% and the minimum confidence to 40%. 20 00:00:58,500 --> 00:00:59,933 That's not our final word. 21 00:00:59,933 --> 00:01:02,933 We might need to change the confidence again. 22 00:01:02,933 --> 00:01:05,466 But we did the first step anyway. 23 00:01:05,466 --> 00:01:08,900 And then the second step is to take all the subsets in transactions 24 00:01:09,133 --> 00:01:13,100 having higher support than the minimum support of 0.3%.
25 00:01:13,633 --> 00:01:14,400 Okay. 26 00:01:14,400 --> 00:01:19,600 So this step two was actually completed when we trained the apriori model on our data set. 27 00:01:19,600 --> 00:01:22,166 The function did it itself. 28 00:01:22,166 --> 00:01:23,800 And the same goes for step three. 29 00:01:23,800 --> 00:01:27,033 Step three is to take all the rules of these subsets, 30 00:01:27,033 --> 00:01:30,033 the subsets with a support higher than 0.3%, 31 00:01:30,566 --> 00:01:33,233 and to keep only the rules 32 00:01:33,233 --> 00:01:36,933 that have a higher confidence than this minimum confidence of 40%. 33 00:01:37,200 --> 00:01:40,000 And again, this was completed when we trained 34 00:01:40,000 --> 00:01:43,966 the apriori model on the data set, thanks to this apriori function. 35 00:01:44,366 --> 00:01:45,900 Okay. So step three is done. 36 00:01:45,900 --> 00:01:48,200 And now we're finally getting to step four. 37 00:01:48,200 --> 00:01:51,566 That's the step that will lead us to the visualization of the results. 38 00:01:51,966 --> 00:01:56,500 Step four is to sort the rules by decreasing lift. 39 00:01:57,033 --> 00:01:59,533 As I explained in the intuition tutorial, 40 00:01:59,533 --> 00:02:03,600 the lift is the best metric to measure the relevance of a rule. 41 00:02:03,800 --> 00:02:06,366 That's why we're sorting the rules by their lift 42 00:02:06,366 --> 00:02:09,366 rather than by their confidence or their support. 43 00:02:09,566 --> 00:02:12,833 So that's what we're going to do in this single line of code. 44 00:02:13,300 --> 00:02:16,500 And to do this we're going to use the inspect function. 45 00:02:16,766 --> 00:02:18,066 So we type inspect here. 46 00:02:19,333 --> 00:02:21,033 And in this inspect function 47 00:02:21,033 --> 00:02:25,200 we can directly look at the rules by typing rules here.
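In the video, steps two and three are done for us by the R `apriori()` function (from the arules package used in this course). As a rough illustration of what those steps compute, and not the arules implementation, here is a small from-scratch Python sketch. A level-wise search keeps the itemsets whose support reaches the minimum support. Rules with a single item on the right-hand side, like the ones shown in the video, are then kept when their confidence reaches the minimum confidence. The function names and the toy baskets are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise (apriori-style) search: returns {itemset: support}."""
    n = len(transactions)

    def support(itemset):
        # fraction of transactions that contain every item of the itemset
        return sum(itemset <= t for t in transactions) / n

    level = {frozenset([item]) for t in transactions for item in t}
    level = {s for s in level if support(s) >= min_support}
    frequent, k = {}, 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        # join step: unions of two frequent (k-1)-itemsets containing exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

def single_consequent_rules(frequent, min_confidence):
    """Rules lhs -> rhs (one item) built from frequent itemsets, filtered by confidence."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for rhs in itemset:
            lhs = itemset - {rhs}
            confidence = supp / frequent[lhs]   # lhs is frequent since itemset is
            if confidence >= min_confidence:
                lift = confidence / frequent[frozenset([rhs])]
                rules.append((lhs, rhs, supp, confidence, lift))
    return rules

# Hypothetical toy baskets, not the course data set
baskets = [frozenset(b) for b in [
    {"spaghetti", "tomato sauce", "ground beef"},
    {"spaghetti", "tomato sauce", "ground beef", "mineral water"},
    {"spaghetti", "mineral water"},
    {"mineral water", "chocolate"},
    {"mineral water", "olive oil", "whole wheat pasta"},
]]
frequent = frequent_itemsets(baskets, min_support=0.4)
rules = single_consequent_rules(frequent, min_confidence=0.8)
for lhs, rhs, supp, conf, lift in rules:
    print(sorted(lhs), "->", rhs, f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

The prune step relies on the fact that every subset of a frequent itemset is also frequent. That is also why `frequent[lhs]` is always available when the rules are built.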
48 00:02:25,566 --> 00:02:28,100 Then, if we want to look at the first ten rules, 49 00:02:28,100 --> 00:02:32,733 we need to add [1:10] in square brackets. 50 00:02:32,733 --> 00:02:35,100 So here we will get the first ten rules. 51 00:02:35,100 --> 00:02:38,900 But that's not very interesting, because this will just give us 52 00:02:38,900 --> 00:02:41,933 the first ten rules found by our apriori model. 53 00:02:41,933 --> 00:02:44,466 Those are not necessarily the rules 54 00:02:44,466 --> 00:02:46,066 with the highest lifts, 55 00:02:46,066 --> 00:02:48,100 that is, the most relevant rules. 56 00:02:48,100 --> 00:02:51,900 That's why we now need to sort the rules, 57 00:02:52,066 --> 00:02:55,633 to get the ten rules that have the ten highest lifts. 58 00:02:56,100 --> 00:02:57,400 So let's do this. 59 00:02:57,400 --> 00:03:01,033 We need to add another function here, the sort function, 60 00:03:01,033 --> 00:03:05,966 which, as its name suggests, sorts a table by a variable in decreasing or increasing order. 61 00:03:06,133 --> 00:03:10,266 The first argument of this sort function is rules, 62 00:03:10,466 --> 00:03:13,733 which contains all the rules found by our apriori model. 63 00:03:14,233 --> 00:03:17,400 And the second argument is by, 64 00:03:17,600 --> 00:03:20,966 which tells what we want to sort the rules by. 65 00:03:21,200 --> 00:03:24,200 And of course we want to sort the rules by their lift. 66 00:03:24,500 --> 00:03:28,566 So here we add by = 'lift'. 67 00:03:29,033 --> 00:03:32,766 And we keep the [1:10] here to get the ten rules 68 00:03:32,766 --> 00:03:34,766 that have the ten highest lifts. 69 00:03:34,766 --> 00:03:35,100 All right. 70 00:03:35,100 --> 00:03:36,900 So that's actually ready. 71 00:03:36,900 --> 00:03:38,633 We are ready to visualize the results.
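In R, the line built here reads `inspect(sort(rules, by = 'lift')[1:10])`. The Python sketch below mirrors only its sort-and-slice logic. It uses a few hypothetical rules with made-up metric values, not the results from the video, and the `Rule` class and `top_rules_by_lift` helper are illustrative names. R's `[1:10]` is 1-based and inclusive. Python's `[:10]` also takes the first ten elements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    lhs: tuple          # items on the left-hand side
    rhs: str            # single item on the right-hand side
    support: float
    confidence: float
    lift: float

def top_rules_by_lift(rules, k=10):
    """Sort rules by decreasing lift and keep the first k (like sort(..., by = 'lift')[1:k])."""
    return sorted(rules, key=lambda r: r.lift, reverse=True)[:k]

# Hypothetical rules with made-up metrics, in the order a miner might emit them
rules = [
    Rule(("a",), "b", 0.10, 0.50, 1.1),
    Rule(("c", "d"), "e", 0.01, 0.40, 6.0),
    Rule(("f",), "g", 0.02, 0.30, 3.5),
    Rule(("h",), "i", 0.05, 0.60, 1.8),
]

first_two = rules[:2]                      # emission order, like rules[1:2] in R
best_two = top_rules_by_lift(rules, k=2)   # most relevant first
for r in best_two:
    print(r.lhs, "->", r.rhs, "lift", r.lift)
```

Slicing without sorting returns whatever order the rules were generated in. That is why the unsorted first ten rules in the video are not the most relevant ones.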
72 00:03:38,633 --> 00:03:43,500 That is, to clearly see what the rules are, and what the strongest rules are. Okay. 73 00:03:43,500 --> 00:03:46,500 So let's execute this. 74 00:03:47,000 --> 00:03:49,033 And here we go. All right. 75 00:03:49,033 --> 00:03:50,033 Here are the rules. 76 00:03:50,033 --> 00:03:53,466 As I told you, we get very explicit and clear rules. 77 00:03:53,833 --> 00:03:57,300 The first rule is: if people buy mineral water and whole wheat pasta, 78 00:03:57,300 --> 00:04:00,900 they will also buy olive oil in 40% of the cases. 79 00:04:01,300 --> 00:04:04,433 Then the second rule is: if people buy spaghetti and tomato sauce, 80 00:04:04,633 --> 00:04:08,166 they will also buy ground beef in 48% of the cases. 81 00:04:08,600 --> 00:04:11,900 Then, if people buy French fries and herb and pepper, they will also buy 82 00:04:11,900 --> 00:04:14,900 ground beef in 46% of the cases. 83 00:04:15,000 --> 00:04:16,466 So that's how the rules work. 84 00:04:16,466 --> 00:04:19,200 That's how this package presents them. 85 00:04:19,200 --> 00:04:23,033 And so these are the first ten rules with the ten highest lifts. 86 00:04:23,466 --> 00:04:27,366 However, do you see something that is not necessarily very relevant here? 87 00:04:27,933 --> 00:04:30,600 Well, if we take a closer look at these rules, we can notice 88 00:04:30,600 --> 00:04:34,566 that some products are present in a set of items 89 00:04:34,733 --> 00:04:39,033 not because they make a good association, but because they have a high support. 90 00:04:39,500 --> 00:04:42,400 A good example of this is the chocolate here. 91 00:04:42,400 --> 00:04:46,066 This set of items, composed of chocolate and herb and pepper, corresponds 92 00:04:46,066 --> 00:04:49,500 to customers who had chocolate and herb and pepper in their basket. 93 00:04:49,866 --> 00:04:52,500 And according to this rule, these customers 94 00:04:52,500 --> 00:04:55,666 also buy ground beef in 44% of the cases.
95 00:04:56,066 --> 00:04:57,966 But the thing is, they don't buy ground beef 96 00:04:57,966 --> 00:05:00,966 because they had some chocolate in their basket in the first place. 97 00:05:01,133 --> 00:05:05,033 Again, that doesn't make much sense, but this time it's not because the confidence is too low, 98 00:05:05,033 --> 00:05:08,400 it's because chocolate has a very high support. 99 00:05:08,833 --> 00:05:09,800 We can see that here. 100 00:05:09,800 --> 00:05:13,766 Chocolate is the fifth most purchased product among these French 101 00:05:13,766 --> 00:05:15,500 customers in the south of France, 102 00:05:15,500 --> 00:05:19,400 and therefore chocolate falls in a lot of baskets, 103 00:05:19,633 --> 00:05:23,766 especially in basket number six here, composed of chocolate and herb 104 00:05:23,766 --> 00:05:24,433 and pepper, 105 00:05:24,433 --> 00:05:27,333 as well as basket number seven here, composed of chocolate, 106 00:05:27,333 --> 00:05:28,900 mineral water and shrimp. 107 00:05:28,900 --> 00:05:32,366 And by the way, in this same basket there is mineral water, 108 00:05:32,500 --> 00:05:35,433 which also falls in basket number eight here. 109 00:05:35,433 --> 00:05:39,166 That's because mineral water is the most purchased product in the store. 110 00:05:39,333 --> 00:05:42,000 So of course it falls in a lot of baskets. 111 00:05:42,000 --> 00:05:44,433 So these products have high support. 112 00:05:44,433 --> 00:05:47,433 However, this doesn't mean that we have to change the support now, 113 00:05:47,533 --> 00:05:51,566 because we still want to respect our initial business starting point, which was to 114 00:05:51,600 --> 00:05:55,200 consider only the products that are bought at least 3 or 4 times a day. 115 00:05:55,466 --> 00:05:58,600 But what might be helpful to change now is the confidence. 116 00:05:58,966 --> 00:06:03,100 Indeed, we asked for the rules that have at least 40% confidence.
117 00:06:03,100 --> 00:06:07,100 You can see that all the rules here have a confidence over 40%. 118 00:06:07,433 --> 00:06:09,866 And the reason these rules have high confidence 119 00:06:09,866 --> 00:06:12,866 is that they are associated with baskets 120 00:06:12,900 --> 00:06:16,200 that contain the most purchased products in the store. 121 00:06:16,633 --> 00:06:20,100 Because of course, if the baskets contain the most purchased products in the store, 122 00:06:20,400 --> 00:06:24,700 these products will end up together in the basket, and it won't be 123 00:06:24,700 --> 00:06:28,433 because of an association related to the principle 124 00:06:28,433 --> 00:06:29,500 "people who bought also bought", 125 00:06:29,500 --> 00:06:31,033 but it will be 126 00:06:31,033 --> 00:06:34,600 simply because the baskets contain the most purchased products overall. 127 00:06:34,966 --> 00:06:38,533 So, to avoid this, the first idea would be to change the support. 128 00:06:38,733 --> 00:06:42,400 But we don't want to change our business-related starting point, you know, 129 00:06:42,400 --> 00:06:45,400 to consider the products purchased at least 3 or 4 times a day. 130 00:06:45,533 --> 00:06:49,600 So the remaining idea is, of course, to change the confidence. 131 00:06:49,933 --> 00:06:51,533 Because if we reduce the confidence 132 00:06:51,533 --> 00:06:53,833 now, we won't only get these specific rules 133 00:06:53,833 --> 00:06:55,200 that exist because they're 134 00:06:55,200 --> 00:06:58,800 associated with the most purchased products falling in the same baskets. 135 00:06:59,133 --> 00:07:02,133 We will also get the more relevant rules that we are looking for, 136 00:07:02,133 --> 00:07:04,100 the ones related to the principle 137 00:07:04,100 --> 00:07:05,866 "people who bought also bought". 138 00:07:05,866 --> 00:07:07,900 So that's what we're going to do right now.
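The argument above can be made concrete with the standard definitions: confidence(A → B) = support(A ∪ B) / support(A), and lift(A → B) = confidence(A → B) / support(B). A rule whose right-hand side is in most baskets anyway can reach a high confidence while its lift stays close to 1. A genuine association can have a lower confidence but a much higher lift. The basket counts below are hypothetical, chosen only to show the effect, and `rule_metrics` is an illustrative helper.

```python
def rule_metrics(n_baskets, n_lhs, n_rhs, n_both):
    """Support, confidence and lift of a rule lhs -> rhs from basket counts."""
    support = n_both / n_baskets                 # share of baskets containing lhs and rhs
    confidence = n_both / n_lhs                  # share of lhs baskets that also contain rhs
    lift = confidence / (n_rhs / n_baskets)      # confidence relative to rhs's own support
    return support, confidence, lift

# Hypothetical counts out of 1,000 baskets.
# A very popular item (in 800 baskets) bought independently of the lhs item:
popular = rule_metrics(1000, n_lhs=200, n_rhs=800, n_both=160)
# A niche pair bought together far more often than chance would predict:
niche = rule_metrics(1000, n_lhs=20, n_rhs=30, n_both=12)

print("popular rhs: support=%.3f confidence=%.2f lift=%.2f" % popular)
print("niche pair:  support=%.3f confidence=%.2f lift=%.2f" % niche)
```

With a minimum confidence of, say, 0.7, only the first rule would survive, even though its lift is 20 times lower. This is why the tutorial lowers the confidence threshold and then relies on the lift to rank what comes out.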
139 00:07:07,900 --> 00:07:10,600 We're going to change the confidence: we'll reduce it. 140 00:07:10,600 --> 00:07:13,600 And we're going to do what we already did the first time. 141 00:07:13,733 --> 00:07:16,500 Remember, we had a confidence of 0.8. 142 00:07:16,500 --> 00:07:17,933 That gave us no rules. 143 00:07:17,933 --> 00:07:22,200 Then we divided it by two to obtain this confidence of 0.4. 144 00:07:22,333 --> 00:07:25,333 That gave us these rules related to the most purchased products. 145 00:07:25,566 --> 00:07:28,800 So now we're going to do the same and divide the confidence by two 146 00:07:28,800 --> 00:07:31,866 again, to get a confidence of 20%. 147 00:07:32,200 --> 00:07:35,200 And that should lead us to some more relevant rules, 148 00:07:35,400 --> 00:07:39,666 the association rules that we're looking for, 149 00:07:39,866 --> 00:07:41,300 related to the principle 150 00:07:41,300 --> 00:07:43,200 "people who bought also bought". 151 00:07:43,200 --> 00:07:45,466 Okay, so let's do it. 152 00:07:45,466 --> 00:07:48,000 Let's try to get these new rules. 153 00:07:48,000 --> 00:07:51,000 By selecting this line and executing it, 154 00:07:51,033 --> 00:07:53,300 we train the apriori model again, 155 00:07:53,300 --> 00:07:54,900 and so we will get some new rules. 156 00:07:54,900 --> 00:07:57,300 So let's do it. Execute. 157 00:07:57,300 --> 00:08:01,466 And here it is. Okay, first of all, we get a lot more rules. 158 00:08:01,466 --> 00:08:04,366 We get 1,348 rules. 159 00:08:04,366 --> 00:08:07,666 And that's expected of course, because since we reduced 160 00:08:07,666 --> 00:08:13,000 the confidence down to 0.2, the algorithm found a lot more rules. 161 00:08:13,000 --> 00:08:17,133 But don't worry, we will not look at all 1,348 rules.
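The "divide the confidence by two" search described here can be sketched as a loop over a fixed pool of candidate rules that already meet the minimum support. Each halving of the minimum confidence keeps every rule kept before and may add new ones, which is why the rule count jumps. The candidate confidences and helper names below are hypothetical placeholders, not the 1,348 rules from the video.

```python
def confidence_schedule(start=0.8, steps=3):
    """Minimum-confidence values obtained by repeatedly dividing by two."""
    return [start / 2 ** i for i in range(steps)]

def rules_kept(rule_confidences, min_confidence):
    """Names of the candidate rules whose confidence reaches the threshold."""
    return {name for name, conf in rule_confidences.items() if conf >= min_confidence}

# Hypothetical confidences of candidate rules that already meet the minimum support
candidates = {"r1": 0.62, "r2": 0.45, "r3": 0.41, "r4": 0.33,
              "r5": 0.27, "r6": 0.21, "r7": 0.12}

previous = set()
for threshold in confidence_schedule():
    kept = rules_kept(candidates, threshold)
    assert previous <= kept  # lowering the threshold never drops a rule
    print(f"min confidence {threshold:.1f}: {len(kept)} rules")
    previous = kept
```

Because the kept sets are nested, lowering the confidence never removes the rules seen before. The new, more relevant rules only become visible at the top once the enlarged set is sorted by lift.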
162 00:08:17,300 --> 00:08:21,533 We will still look at the first ten rules with the ten highest lifts, and that's 163 00:08:21,533 --> 00:08:25,500 what we're going to do right now by selecting this line and executing it. 164 00:08:25,966 --> 00:08:28,133 And here are the new rules. Okay. 165 00:08:28,133 --> 00:08:29,166 So let's have a look. 166 00:08:29,166 --> 00:08:33,333 The first rule is: if people buy mineral water and whole wheat pasta, 167 00:08:33,333 --> 00:08:36,600 they will buy olive oil in 40% of the cases. 168 00:08:36,933 --> 00:08:37,533 Okay. 169 00:08:37,533 --> 00:08:38,933 So that's a rule 170 00:08:38,933 --> 00:08:41,933 that makes a lot of sense, even if there is still mineral water. 171 00:08:42,166 --> 00:08:43,500 But, you know, 172 00:08:43,500 --> 00:08:44,200 it's still a rule 173 00:08:44,200 --> 00:08:48,133 that makes sense, because this might be related to some people who 174 00:08:48,333 --> 00:08:53,333 are looking to have healthy meals with mineral water, whole wheat pasta, 175 00:08:53,500 --> 00:08:56,433 and of course olive oil, which is very healthy as well. 176 00:08:56,433 --> 00:09:00,766 So these go well together, and this is actually quite a relevant rule. 177 00:09:01,100 --> 00:09:04,100 Olive oil should be placed not too far from whole wheat pasta. 178 00:09:04,900 --> 00:09:05,166 Okay. 179 00:09:05,166 --> 00:09:08,400 Then the second rule is: if people buy frozen vegetables, 180 00:09:08,400 --> 00:09:11,933 milk and mineral water, then they will buy soup 181 00:09:12,200 --> 00:09:15,200 in 27% of the cases. 182 00:09:15,233 --> 00:09:18,033 Again, this is a rule that actually makes sense, 183 00:09:18,033 --> 00:09:19,500 still related to the need 184 00:09:19,500 --> 00:09:23,200 for healthy meals, and milk can go quite well with soup. 185 00:09:23,500 --> 00:09:26,500 Well, I know this is the case for French people. 186 00:09:26,700 --> 00:09:29,700 French people do tend to put some milk in their soup.
187 00:09:30,033 --> 00:09:33,033 And oh, by the way, speaking of French traditions, 188 00:09:33,200 --> 00:09:36,233 this is typically what French people love: 189 00:09:36,233 --> 00:09:38,733 absolutely, fromage blanc with honey. 190 00:09:38,733 --> 00:09:40,833 For those of you who don't know it, fromage blanc is 191 00:09:40,833 --> 00:09:43,900 a kind of fresh cheese; I invite you to look it up on Wikipedia. 192 00:09:44,166 --> 00:09:47,166 But anyway, it goes very well with honey. 193 00:09:47,366 --> 00:09:51,300 In a lot of French restaurants you will find fromage blanc with honey 194 00:09:51,300 --> 00:09:52,700 on the dessert menu. 195 00:09:52,700 --> 00:09:56,366 But people also make it at home. 196 00:09:56,366 --> 00:09:59,700 So that's a very good association rule, 197 00:10:00,033 --> 00:10:02,566 even if fromage blanc is very different from honey. 198 00:10:02,566 --> 00:10:05,433 Well, these two products associate very well. 199 00:10:05,433 --> 00:10:09,533 And the direction counts here, because if we buy fromage blanc, 200 00:10:09,566 --> 00:10:10,866 then we want to buy honey, 201 00:10:10,866 --> 00:10:13,866 because honey goes very well on fromage blanc. 202 00:10:14,000 --> 00:10:17,866 But if we buy honey, we don't necessarily want to buy fromage blanc, 203 00:10:17,900 --> 00:10:20,900 because we don't want fromage blanc in our honey. 204 00:10:21,066 --> 00:10:26,100 Rather, we want honey on our fromage blanc, and not fromage blanc on our honey. 205 00:10:26,566 --> 00:10:27,600 All right. 206 00:10:27,600 --> 00:10:29,200 And then what do we have? We have: 207 00:10:29,200 --> 00:10:33,466 if people buy spaghetti and tomato sauce, they will buy ground beef. 208 00:10:33,566 --> 00:10:35,266 Well, that makes a lot of sense. 209 00:10:35,266 --> 00:10:38,566 That's of course to make some spaghetti bolognese. 210 00:10:39,000 --> 00:10:41,400 All right.
Not very surprising, quite classic. 211 00:10:41,400 --> 00:10:42,766 You know, we don't need 212 00:10:42,766 --> 00:10:46,366 a machine learning algorithm to find out about this rule, but 213 00:10:46,666 --> 00:10:51,800 it's actually what French people love associating in their basket. 214 00:10:51,800 --> 00:10:55,600 So, of course, the ground beef shouldn't be too far from the spaghetti 215 00:10:55,766 --> 00:10:57,600 and the tomato sauce. 216 00:10:57,600 --> 00:11:00,600 And then this rule: if people buy light cream, 217 00:11:00,866 --> 00:11:03,866 they will buy chicken in 29% of the cases. 218 00:11:04,366 --> 00:11:07,366 This one is not necessarily obvious. 219 00:11:07,500 --> 00:11:08,566 You know, 220 00:11:08,566 --> 00:11:12,333 if the manager of the store had to place the products himself 221 00:11:12,333 --> 00:11:15,633 without any algorithm, this manager would not necessarily have 222 00:11:15,633 --> 00:11:19,133 thought to place the chicken next to the light cream. 223 00:11:19,700 --> 00:11:23,466 And if we try to explain this rule, it's probably because 224 00:11:23,700 --> 00:11:27,700 people who buy light cream want to pay attention to what they eat. 225 00:11:27,933 --> 00:11:32,866 And since chicken is a lighter and maybe healthier meat than red meat 226 00:11:32,933 --> 00:11:36,433 like ground beef, if they buy light cream and want to associate it 227 00:11:36,433 --> 00:11:39,433 with some meat, they would rather choose chicken. 228 00:11:39,766 --> 00:11:43,966 That's, you know, if we try to explain the rules with some common sense. 229 00:11:44,566 --> 00:11:45,366 And then what do we have? 230 00:11:45,366 --> 00:11:49,766 We have: if people buy pasta, they will buy escalope in 37% of the cases. 231 00:11:50,166 --> 00:11:51,966 Well, why not? That goes well together.
232 00:11:51,966 --> 00:11:53,566 Well, that's simply what 233 00:11:53,566 --> 00:11:57,000 French people in the south of France love to eat in their meals. 234 00:11:57,400 --> 00:11:59,733 And maybe it's also related to French taste 235 00:11:59,733 --> 00:12:02,733 that French people associate light cream with chicken. 236 00:12:02,833 --> 00:12:05,833 If we were in India, it would probably have been butter here, 237 00:12:06,066 --> 00:12:09,066 for, you know, butter chicken, which is a very good Indian dish. 238 00:12:09,366 --> 00:12:09,666 Okay. 239 00:12:09,666 --> 00:12:12,566 And then we have French fries and herb and pepper, which go 240 00:12:12,566 --> 00:12:13,966 very well with ground beef. 241 00:12:13,966 --> 00:12:16,600 Of course, that's a classic French meal. 242 00:12:16,600 --> 00:12:18,866 Then cereals and spaghetti lead to ground beef. 243 00:12:18,866 --> 00:12:19,200 Okay. 244 00:12:19,200 --> 00:12:21,733 That's a rule that doesn't necessarily make much sense. 245 00:12:21,733 --> 00:12:25,500 Maybe it's due to the same logic I was explaining earlier, that is, 246 00:12:25,500 --> 00:12:29,266 a lot of people buy cereals, and a lot of people buy spaghetti. 247 00:12:29,466 --> 00:12:32,466 So spaghetti and cereals often fall in the same basket. 248 00:12:32,833 --> 00:12:36,466 And since a lot of people associate spaghetti with ground beef, 249 00:12:36,466 --> 00:12:40,466 we find this rule: if people buy cereals and spaghetti, they will buy ground beef. 250 00:12:40,533 --> 00:12:42,600 So let's be careful with this one. 251 00:12:42,600 --> 00:12:45,933 Cereals are not necessarily associated with ground beef. 252 00:12:46,200 --> 00:12:49,900 But we can investigate further. Okay. 253 00:12:50,233 --> 00:12:52,166 And then we have these last two rules.
254 00:12:52,166 --> 00:12:55,166 If people buy frozen vegetables, mineral water and soup, 255 00:12:55,200 --> 00:12:58,500 they're likely to buy milk in 60% of the cases. 256 00:12:58,500 --> 00:13:01,200 Well, this rule is probably related to two facts. 257 00:13:01,200 --> 00:13:04,766 First, milk goes very well with soup for French people, 258 00:13:05,066 --> 00:13:08,433 and second, all of this looks very healthy. 259 00:13:08,666 --> 00:13:10,866 So these go quite well together. 260 00:13:10,866 --> 00:13:14,500 And then the last rule: French fries and ground beef, which again go 261 00:13:14,500 --> 00:13:16,033 very well with herb and pepper. Yes, 262 00:13:16,033 --> 00:13:17,066 because of course 263 00:13:17,066 --> 00:13:20,700 here we have French fries and herb and pepper that lead to ground beef. 264 00:13:21,000 --> 00:13:25,433 With association rules, the other direction can sometimes be true as well. 265 00:13:25,433 --> 00:13:27,166 And that's exactly the case here. 266 00:13:27,166 --> 00:13:28,100 Here we have: 267 00:13:28,100 --> 00:13:32,233 if people buy ground beef and French fries, they will also buy herb and pepper. 268 00:13:32,233 --> 00:13:36,300 That's a kind of triangle of association rules that we observe here, with the 269 00:13:36,300 --> 00:13:40,800 three corners of the triangle being French fries, ground beef, and herb and pepper. 270 00:13:41,366 --> 00:13:42,600 That's quite common, 271 00:13:42,600 --> 00:13:46,866 but we don't always observe both directions. Okay. 272 00:13:47,133 --> 00:13:50,600 So that's definitely very helpful for this store. 273 00:13:50,600 --> 00:13:54,300 Now, thanks to these rules, we can experiment with a new placement 274 00:13:54,300 --> 00:13:55,300 of the products here. 275 00:13:55,300 --> 00:13:59,133 We can also look at the first 20 rules with the 20 highest lifts. 276 00:13:59,600 --> 00:14:02,466 We won't do it now because we get the point here.
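Why one direction can show up without the other follows from the definitions. Both rules built from the same pair share the same support and the same lift, since lift(A → B) = support(A ∪ B) / (support(A) · support(B)) is symmetric in A and B. Their confidences are divided by different antecedent supports, so one direction can pass the minimum confidence while the other does not. The counts below are hypothetical, loosely modeled on the fromage blanc and honey discussion, and the helper names are illustrative.

```python
def confidence(n_both, n_lhs):
    """Share of baskets containing the lhs that also contain the rhs."""
    return n_both / n_lhs

def lift(n_both, n_lhs, n_rhs, n_baskets):
    """support(lhs and rhs) / (support(lhs) * support(rhs)), symmetric in lhs and rhs."""
    return (n_both / n_baskets) / ((n_lhs / n_baskets) * (n_rhs / n_baskets))

# Hypothetical counts out of 1,000 baskets (not taken from the course data set)
n_baskets, n_fromage_blanc, n_honey, n_both = 1000, 40, 100, 15
min_confidence = 0.2

conf_fb_to_honey = confidence(n_both, n_fromage_blanc)   # fromage blanc -> honey: 0.375
conf_honey_to_fb = confidence(n_both, n_honey)           # honey -> fromage blanc: 0.15
lift_fb_to_honey = lift(n_both, n_fromage_blanc, n_honey, n_baskets)
lift_honey_to_fb = lift(n_both, n_honey, n_fromage_blanc, n_baskets)

print("fromage blanc -> honey kept:", conf_fb_to_honey >= min_confidence)  # True
print("honey -> fromage blanc kept:", conf_honey_to_fb >= min_confidence)  # False
print("same lift both ways:", lift_fb_to_honey == lift_honey_to_fb)        # True
```

A "triangle" like French fries, ground beef, and herb and pepper appears when several rules built from the same itemset all clear the confidence threshold.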
277 00:14:02,466 --> 00:14:07,066 However, what we can do is actually try another support value. 278 00:14:07,233 --> 00:14:11,166 We tried three values for the confidence, but we only tried one 279 00:14:11,166 --> 00:14:12,600 value of the support. 280 00:14:12,600 --> 00:14:15,100 And remember the hypothesis 281 00:14:15,100 --> 00:14:17,033 in our business starting point, 282 00:14:17,033 --> 00:14:18,133 when we said that we wanted 283 00:14:18,133 --> 00:14:21,666 to consider the products that are bought at least 3 or 4 times a day. 284 00:14:21,900 --> 00:14:27,000 Well, this support here, 0.003, is related to the business starting point 285 00:14:27,000 --> 00:14:30,000 of considering products that are bought at least three times a day. 286 00:14:30,400 --> 00:14:33,866 So what if we now consider products that are bought at least four times a day? 287 00:14:34,200 --> 00:14:36,166 That's what we're going to do right now. 288 00:14:36,166 --> 00:14:39,200 This is the only other support value that we're going to try. 289 00:14:39,500 --> 00:14:40,866 So let's try this. 290 00:14:40,866 --> 00:14:43,833 Remember the computation that leads to the support. 291 00:14:43,833 --> 00:14:46,833 If we consider the products that are bought at least four times a day, 292 00:14:47,100 --> 00:14:50,100 then on average these products will be bought 293 00:14:50,400 --> 00:14:53,400 4 × 7 = 28 times a week. 294 00:14:53,666 --> 00:14:56,400 And then, to get the support, we need to divide this 295 00:14:56,400 --> 00:15:01,466 by the total number of transactions, which is 7,500. 296 00:15:02,266 --> 00:15:04,033 All right. So let's see what we get. 297 00:15:04,033 --> 00:15:06,300 We get a minimum support of 298 00:15:07,333 --> 00:15:09,666 0.0037. 299 00:15:09,666 --> 00:15:14,266 So this time, if we round it, we get a minimum support of 0.004.
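The support computation described here fits in a tiny helper. It assumes, as the computation in the video implies, that the data set covers one week of transactions (hence the factor 7) and contains 7,500 transactions. The function name `min_support` is hypothetical.

```python
def min_support(purchases_per_day, days=7, n_transactions=7500):
    """Minimum support for products bought at least `purchases_per_day` times a day."""
    return purchases_per_day * days / n_transactions

for per_day in (3, 4):
    s = min_support(per_day)
    print(f"{per_day} times a day -> support {s:.4f}, rounded {round(s, 3)}")
```

For three purchases a day this gives 21 / 7500 = 0.0028, rounded to the 0.003 used so far. For four purchases a day it gives 28 / 7500 ≈ 0.0037, rounded to 0.004.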
300 00:15:14,266 --> 00:15:18,400 So let's try this 0.004, which corresponds to products 301 00:15:18,666 --> 00:15:22,233 that were purchased at least four times a day instead of three previously. 302 00:15:22,500 --> 00:15:26,900 So let's train the apriori model again to obtain the new rules. 303 00:15:27,133 --> 00:15:30,700 I'm going to execute this, and we get the new rules. 304 00:15:30,900 --> 00:15:34,866 And since we increased the support, we find a smaller 305 00:15:34,866 --> 00:15:38,533 number of rules: 811 rules, compared to previously, 306 00:15:38,533 --> 00:15:40,933 when we had more than 1,000 rules. 307 00:15:40,933 --> 00:15:45,600 That's because we increased the support and kept the same 20% confidence. 308 00:15:45,900 --> 00:15:46,500 Okay. 309 00:15:46,500 --> 00:15:50,566 So now let's have a look at these new rules, ordered by decreasing lift. 310 00:15:51,166 --> 00:15:53,733 Okay, so let's execute this. 311 00:15:53,733 --> 00:15:56,100 And here are the new rules. Okay, very interesting. 312 00:15:56,100 --> 00:16:00,766 First, by increasing the support, and therefore excluding some products 313 00:16:00,766 --> 00:16:05,466 that have a support between 0.003 and 0.004, 314 00:16:05,866 --> 00:16:08,166 well, remember this rule: 315 00:16:08,166 --> 00:16:11,000 if people buy light cream, they will also buy chicken. 316 00:16:11,000 --> 00:16:13,400 This rule has now become the top rule. 317 00:16:13,400 --> 00:16:15,733 That's definitely a great rule to consider. 318 00:16:15,733 --> 00:16:18,833 Chicken should definitely be close to light cream. 319 00:16:19,300 --> 00:16:19,566 Okay. 320 00:16:19,566 --> 00:16:22,566 And then we have: if people buy pasta, they buy escalope. 321 00:16:22,700 --> 00:16:24,933 Well, we also had this rule before. 322 00:16:24,933 --> 00:16:26,600 However, we have a new rule here: 323 00:16:26,600 --> 00:16:29,600 if people buy pasta, they will also buy shrimp.
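Raising the minimum support only removes itemsets, so it can only remove rules. In particular, any itemset containing a product whose support lies between 0.003 and 0.004 has a support below 0.004 too, so that product drops out along with every rule that contains it. A sketch with hypothetical per-product transaction counts out of 7,500. The `frequent_items` helper and the product names are illustrative.

```python
def frequent_items(purchase_counts, n_transactions, min_support):
    """Products whose support (share of transactions containing them) reaches min_support."""
    return {item for item, count in purchase_counts.items()
            if count / n_transactions >= min_support}

# Hypothetical numbers of transactions containing each product, out of 7,500
counts = {"product A": 60, "product B": 25, "product C": 12}

kept_at_0003 = frequent_items(counts, 7500, 0.003)
kept_at_0004 = frequent_items(counts, 7500, 0.004)
print("dropped when moving to 0.004:", sorted(kept_at_0003 - kept_at_0004))
```

Here "product B" has a support of 25 / 7500 ≈ 0.0033. It passes the old threshold but not the new one, which is the kind of product excluded in the video.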
324 00:16:29,700 --> 00:16:31,800 Yes, of course, this is the south of France. 325 00:16:31,800 --> 00:16:33,300 People are close to the sea, 326 00:16:33,300 --> 00:16:36,866 either the Mediterranean Sea or the Atlantic Ocean. 327 00:16:37,166 --> 00:16:41,200 And of course, people love associating shrimp with their pasta. 328 00:16:41,666 --> 00:16:44,833 That's a very common and exquisite meal in the south of France. 329 00:16:45,433 --> 00:16:46,166 And then what do we have? 330 00:16:46,166 --> 00:16:49,166 We have: if people buy eggs and ground beef, 331 00:16:49,433 --> 00:16:53,033 they're likely to buy herb and pepper in 20% of the cases. 332 00:16:53,333 --> 00:16:57,200 Well, of course, pepper is very good on both eggs and ground beef. 333 00:16:57,733 --> 00:17:02,400 And then we have the same kind of rules as before, except for this new rule here: 334 00:17:02,833 --> 00:17:07,500 French people love to associate mushroom cream sauce with escalope. 335 00:17:07,833 --> 00:17:09,600 Indeed, it's a very good meal. 336 00:17:09,600 --> 00:17:12,600 Mushroom cream sauce goes very well with escalope. 337 00:17:12,666 --> 00:17:13,033 Okay. 338 00:17:13,033 --> 00:17:17,766 So these are interesting new rules that we obtain when we increase the support. 339 00:17:18,133 --> 00:17:22,100 The manager of the store should now, you know, experiment with 340 00:17:22,100 --> 00:17:26,133 these new rules by placing the products of these rules 341 00:17:26,133 --> 00:17:27,233 next to each other, 342 00:17:27,233 --> 00:17:31,000 run the experiment for a few weeks, and then observe the impact on the sales: 343 00:17:31,266 --> 00:17:34,266 how much the sales increase, how much the revenue increases, 344 00:17:34,466 --> 00:17:36,633 and then see if the business goals are achieved.
345 00:17:36,633 --> 00:17:39,600 And if that's the case, try to strengthen the rules 346 00:17:39,600 --> 00:17:43,900 or try some more powerful rules by changing the support and confidence. Or, 347 00:17:43,900 --> 00:17:47,200 on the other hand, if the business goals are not achieved, well, same: 348 00:17:47,200 --> 00:17:50,133 we can try to get some new rules by increasing the confidence 349 00:17:50,133 --> 00:17:51,700 and maybe the support. 350 00:17:51,700 --> 00:17:54,133 Well, that's all about experimentation. 351 00:17:54,133 --> 00:17:57,900 That's what data analysts and managers do in retail stores, 352 00:17:57,900 --> 00:18:01,800 whether they are online stores, grocery stores or any kind of store. 353 00:18:02,000 --> 00:18:05,333 They use these association rules and update them. 354 00:18:05,333 --> 00:18:09,166 And of course, they combine these rules with other recommendation system 355 00:18:09,166 --> 00:18:13,133 techniques, like collaborative filtering with the user profiles, 356 00:18:13,133 --> 00:18:17,700 which can add some additional relevant information, and also other, more advanced 357 00:18:17,700 --> 00:18:21,433 techniques like neighborhood models and latent factor models. 358 00:18:21,433 --> 00:18:25,133 They combine a lot of models to increase the sales and the revenue. 359 00:18:25,800 --> 00:18:28,866 But here you have a very good, very powerful technique: 360 00:18:28,866 --> 00:18:30,233 association rules. 361 00:18:30,233 --> 00:18:35,000 So congratulations on having implemented this first recommendation system, 362 00:18:35,000 --> 00:18:36,000 in some way. 363 00:18:36,000 --> 00:18:40,033 I really hope this will be useful for your business and your work. 364 00:18:40,233 --> 00:18:42,733 Don't hesitate if you have any questions about this. 365 00:18:42,733 --> 00:18:45,300 We will answer your questions very quickly. 366 00:18:45,300 --> 00:18:47,333 So that was the apriori algorithm.
367 00:18:47,333 --> 00:18:49,333 I was very happy to build this model with you. 368 00:18:49,333 --> 00:18:50,733 Congratulations again. 369 00:18:50,733 --> 00:18:54,766 Now, in the next section, we are going to implement the Eclat algorithm, 370 00:18:54,766 --> 00:18:58,866 which is very close to the apriori algorithm but much simpler. 371 00:18:58,866 --> 00:19:01,866 So that's another very good solution if you want to be very efficient 372 00:19:01,933 --> 00:19:05,100 and don't have much time to experiment with different values of the support 373 00:19:05,333 --> 00:19:08,033 and the confidence, because you will see that 374 00:19:08,033 --> 00:19:11,200 there is no confidence parameter in this algorithm. 375 00:19:11,666 --> 00:19:12,133 So I look 376 00:19:12,133 --> 00:19:15,800 forward to building this new association rule model with you in the next section. 377 00:19:15,933 --> 00:19:17,633 Until then, enjoy machine learning.