1
00:00:00,033 --> 00:00:00,333
All right.

2
00:00:00,333 --> 00:00:04,633
So remember that the cluster numbers
don't go from 1 to 5,

3
00:00:04,633 --> 00:00:08,466
but from 0 to 4
because indexes in Python start from zero.
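
As a small sketch of that indexing point: the labels below are the ones read out in this walkthrough, while the dictionary and the helper `cluster_number` are hypothetical names used only for illustration.

```python
# 0-based cluster indexes as read out in this walkthrough:
# customers 1 to 4, then the last customer, number 200.
labels_by_customer = {1: 4, 2: 3, 3: 4, 4: 3, 200: 2}


def cluster_number(label):
    """Convert a 0-based cluster index into a 1-based cluster number."""
    return label + 1


for customer_id, label in labels_by_customer.items():
    print(f"Customer {customer_id}: index {label} -> cluster number {cluster_number(label)}")
```

So an index of 4 reads as cluster number five, and an index of 2 reads as the third cluster.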

4
00:00:08,666 --> 00:00:11,533
So let's see, let's open this again.

5
00:00:11,533 --> 00:00:15,766
What these numbers mean is that,
well, the first customer, customer ID

6
00:00:15,766 --> 00:00:18,300
number one, belongs to the last cluster,

7
00:00:18,300 --> 00:00:20,000
you know, cluster number five.

8
00:00:20,000 --> 00:00:23,166
Then the second customer belongs
to cluster number four.

9
00:00:23,500 --> 00:00:27,833
The third customer belongs to cluster number
five, or cluster of index four, as you want.

10
00:00:28,200 --> 00:00:31,466
Then customer number
four belongs to cluster number four.

11
00:00:31,500 --> 00:00:33,533
All right.
So this is how you should read it.

12
00:00:33,533 --> 00:00:37,200
And the last customer in this data set,
you know,

13
00:00:37,766 --> 00:00:41,100
customer number 200, actually,

14
00:00:41,200 --> 00:00:45,966
this one, of age 30 and earning
a high salary and spending a lot

15
00:00:45,966 --> 00:00:50,433
in the mall, belongs to the third cluster,
or cluster of index two.

16
00:00:51,300 --> 00:00:53,100
All right.
So that's how you should read it.

17
00:00:53,100 --> 00:00:56,700
And now let's visualize
the final clusters.

18
00:00:56,700 --> 00:01:00,300
You know, now that we have this
dependent variable that we just created

19
00:01:00,533 --> 00:01:03,533
through
the hierarchical clustering process.

20
00:01:03,533 --> 00:01:06,100
And so there you go
I'm going to close this.

21
00:01:06,100 --> 00:01:09,833
We're going to run that cell
to indeed find

22
00:01:09,900 --> 00:01:15,033
actually you know the same clusters
as with K-means with remember

23
00:01:15,333 --> 00:01:20,200
this cluster representing the customers
that earn a low salary and don't

24
00:01:20,200 --> 00:01:24,300
spend much in the mall, and therefore
we should just not target them too much

25
00:01:24,300 --> 00:01:28,900
because we want to be socially responsible
and not push them to consume too much.

26
00:01:29,000 --> 00:01:29,733
Right?

27
00:01:29,733 --> 00:01:33,300
However, this cluster
is the cluster of the customers

28
00:01:33,300 --> 00:01:36,866
having a high annual income
but a low spending score.

29
00:01:36,866 --> 00:01:40,066
And therefore we want to target
these customers to offer them

30
00:01:40,066 --> 00:01:43,833
some more attractive deals
in order to incentivize them

31
00:01:43,833 --> 00:01:47,833
to spend more in the mall,
because otherwise the mall is missing out.

32
00:01:48,200 --> 00:01:51,000
Then this cluster is
the cluster of customers

33
00:01:51,000 --> 00:01:54,300
having a low annual income
but a high spending score.

34
00:01:54,433 --> 00:01:56,966
And therefore with these customers,
you know you want to be

35
00:01:56,966 --> 00:02:00,700
as socially responsible as possible
and maybe protect them

36
00:02:00,700 --> 00:02:04,300
from spending too much and potentially
more than they could afford.

37
00:02:04,466 --> 00:02:09,133
So to these customers, we, for example,
want to reduce any kind of advertising.

38
00:02:09,433 --> 00:02:11,966
Then we have this cluster,
which is the best cluster, you know,

39
00:02:11,966 --> 00:02:15,466
the one we want to target the most
because it is the cluster of the customers

40
00:02:15,466 --> 00:02:19,233
having a high annual income
and at the same time spending a lot.

41
00:02:19,433 --> 00:02:23,233
So we definitely want to target
these customers to, you know,

42
00:02:23,533 --> 00:02:26,500
offer them the new products and new deals,
because we know that

43
00:02:26,500 --> 00:02:29,766
we have a high chance
of getting a high conversion rate with them.

44
00:02:29,766 --> 00:02:30,500
All right.

45
00:02:30,500 --> 00:02:32,866
And then we have this cluster
which is the average cluster,

46
00:02:32,866 --> 00:02:35,866
you know, average
annual income and average spending score.

47
00:02:36,000 --> 00:02:39,166
And for this cluster, well,
we don't have much specific to do.

48
00:02:39,333 --> 00:02:39,966
All right.

49
00:02:39,966 --> 00:02:42,866
So these are the same five clusters
as with K-means.

50
00:02:42,866 --> 00:02:47,466
But now I'm very curious
to see what we get with three clusters.

51
00:02:47,700 --> 00:02:50,666
And therefore,
what we're going to do is try

52
00:02:50,666 --> 00:02:53,500
n_clusters equal to three here.

53
00:02:53,500 --> 00:02:54,666
But then be careful.

54
00:02:54,666 --> 00:02:57,166
We actually need to remove two lines here

55
00:02:57,166 --> 00:02:59,133
when, you know, visualizing the clusters,

56
00:02:59,133 --> 00:03:02,766
because each scatter plot here
corresponds to one cluster.

57
00:03:02,933 --> 00:03:03,466
And therefore,

58
00:03:03,466 --> 00:03:04,400
now, since we're about to

59
00:03:04,400 --> 00:03:08,166
have three clusters,
well, we need to remove two clusters here.

60
00:03:08,166 --> 00:03:11,166
So we're going to remove cluster
four and cluster five.

61
00:03:11,400 --> 00:03:12,533
All right.

62
00:03:12,533 --> 00:03:15,300
And therefore we're
just going to end up with, you know, cluster

63
00:03:15,300 --> 00:03:18,600
one, cluster two, and cluster
three, of colors red, blue, and green.

64
00:03:18,733 --> 00:03:19,566
All right.

65
00:03:19,566 --> 00:03:22,900
So let's just run this again.

66
00:03:22,900 --> 00:03:27,400
You know we can leave the previous cells
and just rerun this one to indeed

67
00:03:27,400 --> 00:03:32,166
get a new hierarchical clustering model,
this time identifying three clusters.
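
As a rough sketch of the step just described, refitting with three clusters and keeping one scatter group per cluster could look like the block below. It assumes scikit-learn's `AgglomerativeClustering` with Ward linkage. The data is synthetic and made up for illustration, not the mall data set. The names `X`, `y_hc`, `groups`, and the last two colors are assumptions. Looping over `n_clusters` is a swapped-in alternative to deleting the two scatter lines by hand.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Synthetic stand-in for the (annual income, spending score) columns.
# These values are made up for illustration; they are not the mall data set.
rng = np.random.default_rng(0)
centers = np.array([[25, 20], [25, 80], [55, 50], [85, 15], [85, 85]])
X = np.vstack([c + rng.normal(scale=4.0, size=(20, 2)) for c in centers])

n_clusters = 3  # change this one value instead of deleting plot lines
hc = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
y_hc = hc.fit_predict(X)  # 0-based cluster index for every row of X

# red, blue, green come from the walkthrough; the last two are assumed.
colors = ["red", "blue", "green", "cyan", "magenta"]

groups = []
for k in range(n_clusters):
    points = X[y_hc == k]  # rows assigned to cluster index k
    groups.append((f"Cluster {k + 1}", colors[k], points))
    # With matplotlib, each group would be one scatter call, for example:
    # plt.scatter(points[:, 0], points[:, 1], c=colors[k], label=f"Cluster {k + 1}")

for name, color, points in groups:
    print(name, color, len(points))
```

Because the loop runs over `range(n_clusters)`, the number of scatter groups always matches the number of clusters in the model.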

68
00:03:32,466 --> 00:03:37,133
We can print this again in order to get
the new dependent variable with, this time,

69
00:03:37,133 --> 00:03:40,100
indeed three clusters:
the cluster of index zero,

70
00:03:40,100 --> 00:03:43,700
which seems to contain
most of the first customers,

71
00:03:44,033 --> 00:03:47,700
then the cluster of index
one, the second cluster, and the cluster

72
00:03:47,733 --> 00:03:50,266
of index two, the third cluster. Okay.

73
00:03:50,266 --> 00:03:54,166
And now I'm really curious to see
what we get when visualizing the clusters.

74
00:03:54,166 --> 00:03:55,000
So here we go.

75
00:03:55,000 --> 00:04:00,066
We just have to play this cell again
and let's see what we get okay.

76
00:04:00,066 --> 00:04:01,533
So yeah really

77
00:04:01,533 --> 00:04:05,800
five clusters was actually a better number
because here, with three clusters,

78
00:04:05,800 --> 00:04:09,266
well, the model just puts
all these customers, you know,

79
00:04:09,533 --> 00:04:13,533
actually the low income customers
with both a low spending score

80
00:04:13,533 --> 00:04:17,700
and a high spending score into the same cluster,
also taking the average one.

81
00:04:17,933 --> 00:04:21,166
And then we have these two other clusters,
the high spending score

82
00:04:21,166 --> 00:04:26,233
with the high annual income and the low
spending score with the high annual income

83
00:04:26,233 --> 00:04:30,533
and you know, this still actually
makes some sense, because remember that

84
00:04:30,733 --> 00:04:35,133
the clusters of customers
that we really want to target, after all,

85
00:04:35,133 --> 00:04:37,333
are this one and this one,

86
00:04:37,333 --> 00:04:40,366
and this, you know, is something
we don't really want to target

87
00:04:40,366 --> 00:04:45,266
but maybe protect, you know,
as per your social responsibility.

88
00:04:45,533 --> 00:04:47,833
So this actually
still makes kind of sense.

89
00:04:47,833 --> 00:04:49,433
And we indeed end up

90
00:04:49,433 --> 00:04:53,633
with the same focus of targeting
these two important clusters of customers

91
00:04:53,633 --> 00:04:56,366
that can indeed boost sales.

92
00:04:56,366 --> 00:04:58,433
All right. So that was very interesting.

93
00:04:58,433 --> 00:05:03,266
I didn't actually expect to show you
the result with three clusters.

94
00:05:03,266 --> 00:05:07,800
I was just curious to see and that's
very interesting because indeed we end up

95
00:05:07,800 --> 00:05:13,066
with kind of the same final marketing
decisions of targeting our customers.

96
00:05:13,600 --> 00:05:16,066
All right.
So I hope you enjoyed clustering.

97
00:05:16,066 --> 00:05:17,066
Now we're going to move on

98
00:05:17,066 --> 00:05:21,433
to the next part, part
five on association rule learning.

99
00:05:21,533 --> 00:05:22,700
It's going to be pretty exciting.

100
00:05:22,700 --> 00:05:26,233
We're going to work on two new models,
Apriori and Eclat.

101
00:05:26,233 --> 00:05:29,400
And so I will either
meet you in this next part

102
00:05:29,433 --> 00:05:33,600
or, if you want to learn R as well,
I will meet you in the next section

103
00:05:33,600 --> 00:05:36,666
to build the Hierarchical
Clustering model in R.

104
00:05:36,900 --> 00:05:40,500
And either way, I look forward
to building another model with you

105
00:05:40,733 --> 00:05:42,566
and until then, enjoy machine learning.