1 00:00:01,733 --> 00:00:04,133 Hello and welcome back to the course on Machine Learning. 2 00:00:04,133 --> 00:00:07,900 Today we're continuing to explore association rule learning. 3 00:00:07,900 --> 00:00:10,900 And we're talking about the intuition behind the Eclat model. 4 00:00:11,200 --> 00:00:15,100 So the Eclat model is very, very simple. 5 00:00:15,100 --> 00:00:17,066 After we have already studied 6 00:00:17,066 --> 00:00:20,900 the Apriori model, it's kind of like a simplified version. 7 00:00:21,000 --> 00:00:23,166 All right. So let's have a look. 8 00:00:23,166 --> 00:00:26,266 The Eclat model also talks about "people 9 00:00:26,266 --> 00:00:27,300 who bought also bought." 10 00:00:27,300 --> 00:00:30,300 So it's kind of like a recommender system. 11 00:00:30,366 --> 00:00:33,566 And similar to what we had in the 12 00:00:34,000 --> 00:00:37,000 Apriori algorithm, 13 00:00:37,266 --> 00:00:40,500 here we've got, for instance, movies and we've got some potential rules. 14 00:00:40,500 --> 00:00:42,600 So basically exactly the same thing. 15 00:00:42,600 --> 00:00:45,666 If you've got your 16 00:00:45,666 --> 00:00:49,733 movie lists, or the movies that people liked, people 17 00:00:49,733 --> 00:00:53,100 who like movie one, or just generally it looks like 18 00:00:53,700 --> 00:00:57,133 if somebody likes movie one, they're very likely to like movie two; 19 00:00:58,000 --> 00:01:01,000 if they like movie two, they're likely to like movie four. 20 00:01:01,100 --> 00:01:03,466 If they like movie one, they might like movie three. 21 00:01:03,466 --> 00:01:08,066 Again, these rules will have different strengths. 22 00:01:08,066 --> 00:01:12,000 But here we aren't actually going to be talking about rules per se, 23 00:01:12,000 --> 00:01:15,366 because the Eclat model is different to the Apriori model. 24 00:01:15,366 --> 00:01:20,833 In the Apriori model we came up with rules towards the end as the output. 
25 00:01:20,833 --> 00:01:23,833 And based on the lift, we could judge 26 00:01:23,866 --> 00:01:27,133 the strength of each rule, whereas here 27 00:01:27,566 --> 00:01:31,633 we are going to be talking about sets, and you'll see why just now. 28 00:01:31,833 --> 00:01:35,366 So here we've got the market basket optimization. 29 00:01:35,366 --> 00:01:40,066 Same thing: people who buy burgers are likely to buy French fries as well. 30 00:01:40,066 --> 00:01:42,466 People who buy vegetables are likely to buy fruits. 31 00:01:42,466 --> 00:01:44,200 And these are just some potential rules, 32 00:01:44,200 --> 00:01:46,200 so we're not saying they're strong, 33 00:01:47,200 --> 00:01:49,900 or some potential outcomes that we're looking at. 34 00:01:49,900 --> 00:01:52,000 We're not saying that they're strong, we're not selecting; 35 00:01:52,000 --> 00:01:54,533 we're just saying what could potentially be. 36 00:01:54,533 --> 00:01:58,700 And then the Eclat model is responsible for actually 37 00:01:58,700 --> 00:02:03,300 going through all of these combinations and telling us what we should focus on. 38 00:02:04,633 --> 00:02:04,933 All right. 39 00:02:04,933 --> 00:02:08,833 So in the Eclat model, just like in the Apriori model, 40 00:02:08,833 --> 00:02:12,166 we have the support factor. 41 00:02:12,166 --> 00:02:17,666 So previously, in the Apriori model, or the algorithm, we had support, 42 00:02:17,666 --> 00:02:20,400 we had confidence and we had lift. In the Eclat model 43 00:02:20,400 --> 00:02:21,900 we only have support. 44 00:02:21,900 --> 00:02:25,333 So we are only looking at, okay, 45 00:02:25,333 --> 00:02:29,633 so people who are watching 46 00:02:30,266 --> 00:02:33,700 certain combinations of movies, how often does this happen? 47 00:02:33,700 --> 00:02:37,366 And here just bear in mind that M doesn't mean just one movie. 48 00:02:37,366 --> 00:02:39,800 And this was the same for Apriori. 
49 00:02:39,800 --> 00:02:40,500 It was just easier 50 00:02:40,500 --> 00:02:45,600 for us to understand the intuition based on one movie or one product. 51 00:02:45,600 --> 00:02:51,766 But actually, M and I, what they stand for is a set of items or a set of movies. 52 00:02:51,766 --> 00:02:56,000 So specifically in the Eclat model, it doesn't really make sense to 53 00:02:56,766 --> 00:03:01,200 look at, you know, an item by itself, because we don't have the 54 00:03:01,600 --> 00:03:05,400 confidence and the lift factors. 55 00:03:05,400 --> 00:03:06,433 We're only looking at support. 56 00:03:06,433 --> 00:03:10,733 So we're just looking at how frequently this set of items occurs. 57 00:03:10,866 --> 00:03:11,566 So if we're just going to 58 00:03:11,566 --> 00:03:15,466 look at a set of items which contains only one item, 59 00:03:16,033 --> 00:03:18,633 then we're just looking at the frequency, at what 60 00:03:18,633 --> 00:03:21,033 the popularity of movies is. And that is very trivial. 61 00:03:21,033 --> 00:03:22,200 So we're not going to be looking at that. 62 00:03:22,200 --> 00:03:25,933 We're going to be aiming for at least two items in a set. 63 00:03:26,066 --> 00:03:31,266 And therefore M here stands for a set of two movies or more. 64 00:03:31,633 --> 00:03:37,166 And what we're calculating for support, we're calculating, 65 00:03:37,166 --> 00:03:38,000 okay, 66 00:03:38,000 --> 00:03:42,666 how often does this set of two movies, let's say 67 00:03:43,033 --> 00:03:48,433 Interstellar and Ex Machina, how often does it 68 00:03:48,433 --> 00:03:52,433 occur in all of the watch lists, or what percentage of watch lists, 69 00:03:52,433 --> 00:03:56,066 or what percentage of 70 00:03:56,066 --> 00:04:00,366 lists of movies that people liked, contain those two together? 71 00:04:00,400 --> 00:04:02,566 Not just one of them, but those two together. 
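The support calculation described here can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `support`, the variable names, and the four watch lists are made up for the example and do not come from the course material.

```python
def support(item_set, watch_lists):
    """Fraction of watch lists that contain every movie in item_set."""
    item_set = set(item_set)
    # Count only the lists that contain ALL movies of the set together.
    containing = sum(1 for watch_list in watch_lists if item_set <= set(watch_list))
    return containing / len(watch_lists)


# Made-up watch lists, for illustration only.
watch_lists = [
    ["Interstellar", "Ex Machina"],
    ["Interstellar", "Ex Machina", "Movie 3"],
    ["Interstellar"],
    ["Movie 3", "Movie 4"],
]

# Two of the four lists contain both movies together.
pair_support = support({"Interstellar", "Ex Machina"}, watch_lists)
print(pair_support)
```

The third list contains Interstellar on its own, so it counts towards the popularity of that single movie but not towards the support of the pair.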
72 00:04:02,566 --> 00:04:06,500 And let's say, hypothetically, 73 00:04:06,500 --> 00:04:11,633 if 100% of the lists that you had in a large data set contained both movies 74 00:04:11,633 --> 00:04:16,333 together, then that would imply that, you know, anybody who likes 75 00:04:16,566 --> 00:04:20,200 Interstellar likes Ex Machina, anybody who likes Ex Machina likes Interstellar, 76 00:04:20,466 --> 00:04:24,266 and pretty much if anybody has seen even one of those movies, 77 00:04:24,266 --> 00:04:27,266 then you need to recommend the other one to them. 78 00:04:27,633 --> 00:04:30,933 Or if you had like 80% of 79 00:04:30,933 --> 00:04:34,666 your lists with those two movies together, 80 00:04:34,900 --> 00:04:38,600 that would basically mean there's a high likelihood that they come in pairs, right? 81 00:04:38,600 --> 00:04:39,866 That if somebody liked one of them, 82 00:04:39,866 --> 00:04:42,166 then they'll like the other. Same thing for transactions. 83 00:04:42,166 --> 00:04:43,433 Like if you have 84 00:04:43,433 --> 00:04:48,700 chips and burgers in, you know, 75% of all of your orders, right? 85 00:04:48,700 --> 00:04:53,300 Then if somebody is just buying burgers, 86 00:04:53,900 --> 00:04:58,200 then when you recommend chips to them, there's at least a 75% chance that they will also 87 00:04:58,700 --> 00:05:01,733 be interested or will like to buy chips with their burgers. 88 00:05:02,233 --> 00:05:06,000 And that's a very, very trivial approach. 89 00:05:06,833 --> 00:05:09,866 And that's pretty much it. 90 00:05:09,866 --> 00:05:11,866 That's all there is to that model. 91 00:05:11,866 --> 00:05:12,900 It's much faster. 92 00:05:12,900 --> 00:05:16,333 And the steps involved are: set a minimum support. 93 00:05:16,333 --> 00:05:21,166 So you want to set your support level, the level 94 00:05:21,166 --> 00:05:24,866 below which you want to disregard anything. 
95 00:05:25,466 --> 00:05:28,633 Then you take all the subsets in transactions having higher support 96 00:05:28,633 --> 00:05:32,300 than the minimum support, and then you sort these subsets by decreasing support. 97 00:05:32,300 --> 00:05:36,933 And basically at the top you will have the strongest 98 00:05:37,300 --> 00:05:40,900 combinations of items, which you should look at. 99 00:05:41,200 --> 00:05:44,666 Maybe, you know, you'll look at the top ten or top five or something like that. 100 00:05:44,666 --> 00:05:46,133 So that's pretty much it. 101 00:05:46,133 --> 00:05:48,200 That's all the Eclat model is. 102 00:05:48,200 --> 00:05:51,966 And as you can see, it's much easier to understand after you already know 103 00:05:52,266 --> 00:05:55,166 a bit about Apriori. 104 00:05:55,166 --> 00:05:57,000 All right. Hope you enjoyed this tutorial. 105 00:05:57,000 --> 00:06:00,000 And we're going to learn to 106 00:06:00,033 --> 00:06:02,833 implement this in practice, and I'll see you here next time. 107 00:06:02,833 --> 00:06:04,300 And until then, happy analyzing.
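The three steps from the lecture (set a minimum support, keep the item sets whose support is higher, sort by decreasing support) might look roughly like this in Python. It is a minimal sketch under assumptions: the function name `frequent_item_sets`, its parameters, and the sample orders are hypothetical. It also enumerates every combination by brute force, which is only practical for small baskets. Full Eclat implementations typically work on transaction-id lists and intersect them instead.

```python
from itertools import combinations


def frequent_item_sets(transactions, min_support, min_size=2):
    """Return (item_set, support) pairs sorted by decreasing support."""
    baskets = [set(t) for t in transactions]
    total = len(baskets)

    # Count, for every item set of at least min_size items,
    # how many transactions contain it.
    counts = {}
    for basket in baskets:
        for size in range(min_size, len(basket) + 1):
            for combo in combinations(sorted(basket), size):
                counts[combo] = counts.get(combo, 0) + 1

    # Keep the sets whose support is strictly higher than the minimum,
    # following the wording used in the lecture.
    results = [
        (combo, count / total)
        for combo, count in counts.items()
        if count / total > min_support
    ]

    # Strongest combinations first; ties are broken alphabetically.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


# Made-up orders, for illustration only.
orders = [
    ["burger", "chips"],
    ["burger", "chips", "soda"],
    ["burger", "chips"],
    ["vegetables", "fruit"],
]

for item_set, item_support in frequent_item_sets(orders, min_support=0.3):
    print(item_set, item_support)
```

With these sample orders, only the burger and chips pair clears a minimum support of 0.3, because it appears in three of the four orders. Taking the first few entries of the returned list corresponds to looking at the top ten or top five combinations.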