1 00:00:00,033 --> 00:00:00,333 All right. 2 00:00:00,333 --> 00:00:04,633 So remember that the cluster numbers don't go from 1 to 5, 3 00:00:04,633 --> 00:00:08,466 but from 0 to 4, because indexes in Python start from zero. 4 00:00:08,666 --> 00:00:11,533 So let's see, let's open this again. 5 00:00:11,533 --> 00:00:15,766 What these numbers mean is that, well, the first customer, customer ID 6 00:00:15,766 --> 00:00:18,300 number one, belongs to the last cluster, 7 00:00:18,300 --> 00:00:20,000 you know, cluster number five. 8 00:00:20,000 --> 00:00:23,166 Then the second customer belongs to cluster number four. 9 00:00:23,500 --> 00:00:27,833 The third customer belongs to cluster number five, or cluster of index four, as you want. 10 00:00:28,200 --> 00:00:31,466 Then customer number four belongs to cluster number four. 11 00:00:31,500 --> 00:00:33,533 All right. So this is how you should read it. 12 00:00:33,533 --> 00:00:37,200 And the last customer in this data set, you know, 13 00:00:37,766 --> 00:00:41,100 the customer number 200, actually, 14 00:00:41,200 --> 00:00:45,966 this one of age 30, earning a high salary and spending a lot 15 00:00:45,966 --> 00:00:50,433 in the mall, belongs to the third cluster, or cluster of index two. 16 00:00:51,300 --> 00:00:53,100 All right. So that's how you should read it. 17 00:00:53,100 --> 00:00:56,700 And now let's visualize the final clusters, 18 00:00:56,700 --> 00:01:00,300 you know, now that we have this dependent variable that we just created 19 00:01:00,533 --> 00:01:03,533 through the hierarchical clustering process. 20 00:01:03,533 --> 00:01:06,100 And so there you go, I'm going to close this.
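The zero-based reading described above can be sketched in a few lines. This is only an illustration: `y_hc` is a hypothetical name, and the four values are the labels read out for the first four customers, not the full output of the model.

```python
# Hypothetical labels for the first four customers, as read out above.
# These are zero-based indexes, so they run from 0 to 4 for five clusters.
y_hc = [4, 3, 4, 3]

# Convert each zero-based index into the cluster number used when speaking.
cluster_numbers = [label + 1 for label in y_hc]

for customer_id, (label, number) in enumerate(zip(y_hc, cluster_numbers), start=1):
    print(f"Customer {customer_id}: cluster of index {label} = cluster number {number}")
```

So an index of 4 is the fifth (last) cluster, and an index of 2 is the third cluster.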
21 00:01:06,100 --> 00:01:09,833 We're going to run that cell to indeed find, 22 00:01:09,900 --> 00:01:15,033 actually, you know, the same clusters as with K-means, with, remember, 23 00:01:15,333 --> 00:01:20,200 this cluster representing the customers that earn a low salary and don't 24 00:01:20,200 --> 00:01:24,300 spend much in the mall, and therefore we should just not target them too much, 25 00:01:24,300 --> 00:01:28,900 because we want to be socially responsible and not push them to consume too much. 26 00:01:29,000 --> 00:01:29,733 Right? 27 00:01:29,733 --> 00:01:33,300 However, this cluster is the cluster of the customers 28 00:01:33,300 --> 00:01:36,866 having a high annual income but a low spending score. 29 00:01:36,866 --> 00:01:40,066 And therefore we want to target these customers to offer them 30 00:01:40,066 --> 00:01:43,833 some more attractive deals in order to incentivize them 31 00:01:43,833 --> 00:01:47,833 to spend more in the mall, because otherwise the mall is missing out. 32 00:01:48,200 --> 00:01:51,000 Then this cluster is the cluster of customers 33 00:01:51,000 --> 00:01:54,300 having a low annual income but a high spending score. 34 00:01:54,433 --> 00:01:56,966 And therefore with these customers, you know, you want to be 35 00:01:56,966 --> 00:02:00,700 as socially responsible as possible and maybe protect them 36 00:02:00,700 --> 00:02:04,300 from spending too much and potentially more than they could afford. 37 00:02:04,466 --> 00:02:09,133 So to these customers, we, for example, want to reduce any kind of advertising. 38 00:02:09,433 --> 00:02:11,966 Then we have this cluster, which is the best cluster, you know, 39 00:02:11,966 --> 00:02:15,466 the one we want to target the most, because it is the cluster of the customers 40 00:02:15,466 --> 00:02:19,233 having a high annual income and at the same time spending a lot.
41 00:02:19,433 --> 00:02:23,233 So we definitely want to target these customers to, you know, 42 00:02:23,533 --> 00:02:26,500 offer them the new products and new deals, because we know that 43 00:02:26,500 --> 00:02:29,766 we have a high chance of getting a high conversion rate with them. 44 00:02:29,766 --> 00:02:30,500 All right. 45 00:02:30,500 --> 00:02:32,866 And then we have this cluster, which is the average cluster, 46 00:02:32,866 --> 00:02:35,866 you know, average annual income and average spending score. 47 00:02:36,000 --> 00:02:39,166 And for this cluster, well, we don't have much specific to do. 48 00:02:39,333 --> 00:02:39,966 All right. 49 00:02:39,966 --> 00:02:42,866 So these are the same five clusters as with K-means. 50 00:02:42,866 --> 00:02:47,466 But now I'm very curious to see what we get with three clusters. 51 00:02:47,700 --> 00:02:50,666 And therefore what we're going to do is try 52 00:02:50,666 --> 00:02:53,500 n_clusters equal to three here. 53 00:02:53,500 --> 00:02:54,666 But then be careful. 54 00:02:54,666 --> 00:02:57,166 We need to actually remove two lines here 55 00:02:57,166 --> 00:02:59,133 when, you know, visualizing the clusters, 56 00:02:59,133 --> 00:03:02,766 because each scatter plot here corresponds to one cluster. 57 00:03:02,933 --> 00:03:03,466 And therefore, 58 00:03:03,466 --> 00:03:04,400 now, since we're about to 59 00:03:04,400 --> 00:03:08,166 have three clusters, well, we need to remove two clusters here. 60 00:03:08,166 --> 00:03:11,166 So we're going to remove cluster four and cluster five. 61 00:03:11,400 --> 00:03:12,533 All right. 62 00:03:12,533 --> 00:03:15,300 And therefore we're just going to end up with, you know, cluster 63 00:03:15,300 --> 00:03:18,600 one, cluster two and cluster three, of colors red, blue and green. 64 00:03:18,733 --> 00:03:19,566 All right. 65 00:03:19,566 --> 00:03:22,900 So let's just run this again.
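A minimal sketch of the change just described, assuming scikit-learn's `AgglomerativeClustering` with Ward linkage as the hierarchical clustering model. The data here is synthetic (three made-up blobs standing in for annual income and spending score), and the names `X`, `hc`, `y_hc` and `colors` are assumptions for illustration. Instead of deleting two scatter lines by hand, this variant loops over the clusters, which is a swapped-in alternative rather than the approach taken in the video.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering

# Synthetic stand-in for the two features (annual income, spending score).
# These blobs are invented for illustration, not the real mall data.
rng = np.random.default_rng(0)
centers = np.array([[25.0, 50.0], [85.0, 80.0], [85.0, 20.0]])
X = np.vstack([c + rng.normal(scale=3.0, size=(30, 2)) for c in centers])

# Ask the model for three clusters instead of five.
n_clusters = 3
hc = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
y_hc = hc.fit_predict(X)  # zero-based labels: 0, 1, 2

# One scatter call per cluster; looping avoids editing lines by hand
# whenever n_clusters changes.
colors = ["red", "blue", "green", "cyan", "magenta"]
for k in range(n_clusters):
    plt.scatter(X[y_hc == k, 0], X[y_hc == k, 1],
                s=100, c=colors[k], label=f"Cluster {k + 1}")
plt.title("Clusters of customers")
plt.xlabel("Annual income")
plt.ylabel("Spending score")
plt.legend()
# In a script, call plt.show() here to display the figure.
```

With `n_clusters` set back to 5, the same loop would draw all five clusters without further edits.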
66 00:03:22,900 --> 00:03:27,400 You know, we can leave the previous cells and just rerun this one to indeed 67 00:03:27,400 --> 00:03:32,166 get a new hierarchical clustering model, this time identifying three clusters. 68 00:03:32,466 --> 00:03:37,133 We can print this again in order to get the new dependent variable, with, this time, 69 00:03:37,133 --> 00:03:40,100 indeed three clusters: the cluster of index zero, 70 00:03:40,100 --> 00:03:43,700 which seems to contain most of the first customers, 71 00:03:44,033 --> 00:03:47,700 then the cluster of index one, the second cluster, and the cluster 72 00:03:47,733 --> 00:03:50,266 of index two, the third cluster. Okay. 73 00:03:50,266 --> 00:03:54,166 And now I'm really curious to see what we get when visualizing the clusters. 74 00:03:54,166 --> 00:03:55,000 So here we go. 75 00:03:55,000 --> 00:04:00,066 We just have to play this cell again, and let's see what we get. Okay. 76 00:04:00,066 --> 00:04:01,533 So yeah, really, 77 00:04:01,533 --> 00:04:05,800 five clusters was actually a better number, because here, with three clusters, 78 00:04:05,800 --> 00:04:09,266 well, the model just puts all these customers, you know, 79 00:04:09,533 --> 00:04:13,533 actually the low income customers with both a low spending score 80 00:04:13,533 --> 00:04:17,700 and a high spending score, into the same cluster, also taking the average one.
81 00:04:17,933 --> 00:04:21,166 And then we have these two other clusters, the high spending score 82 00:04:21,166 --> 00:04:26,233 with the high annual income and the low spending score with the high annual income. 83 00:04:26,233 --> 00:04:30,533 And you know, this still actually makes some sense, because remember that 84 00:04:30,733 --> 00:04:35,133 the clusters of customers that we really want to target are, after all, 85 00:04:35,133 --> 00:04:37,333 this one and this one, 86 00:04:37,333 --> 00:04:40,366 and this, you know, is something we don't really want to target 87 00:04:40,366 --> 00:04:45,266 but maybe protect, you know, as per your social responsibility. 88 00:04:45,533 --> 00:04:47,833 So this actually still makes kind of sense. 89 00:04:47,833 --> 00:04:49,433 And we indeed end up 90 00:04:49,433 --> 00:04:53,633 with the same focus of targeting these two important clusters of customers 91 00:04:53,633 --> 00:04:56,366 that can indeed boost the sales. 92 00:04:56,366 --> 00:04:58,433 All right. So that was very interesting. 93 00:04:58,433 --> 00:05:03,266 I didn't actually expect to show you the result with three clusters. 94 00:05:03,266 --> 00:05:07,800 I was just curious to see, and that's very interesting because indeed we end up 95 00:05:07,800 --> 00:05:13,066 with kind of the same final marketing decisions for targeting our customers. 96 00:05:13,600 --> 00:05:16,066 All right. So I hope you enjoyed clustering. 97 00:05:16,066 --> 00:05:17,066 Now we're going to move on 98 00:05:17,066 --> 00:05:21,433 to the next part, part five, on association rule learning. 99 00:05:21,533 --> 00:05:22,700 It's going to be pretty exciting. 100 00:05:22,700 --> 00:05:26,233 We're going to work on two new models, Apriori and Eclat.
101 00:05:26,233 --> 00:05:29,400 And so I will either meet you in this next part 102 00:05:29,433 --> 00:05:33,600 or, if you want to learn R as well, I will meet you in the next section 103 00:05:33,600 --> 00:05:36,666 to build the Hierarchical Clustering model in R. 104 00:05:36,900 --> 00:05:40,500 And either way, I look forward to building another model with you, 105 00:05:40,733 --> 00:05:42,566 and until then, enjoy machine learning.