1
00:00:00,900 --> 00:00:03,100
Hello and welcome back to the course
on Machine Learning.

2
00:00:03,100 --> 00:00:06,600
We've already talked about hierarchical
clustering and how the algorithm works.

3
00:00:06,666 --> 00:00:09,666
We also talked about dendrograms
and how they're constructed.

4
00:00:09,700 --> 00:00:13,166
Today we're going to put the two together
and learn how to get

5
00:00:13,166 --> 00:00:16,833
the maximum value out
of our hierarchical clustering algorithms.

6
00:00:17,100 --> 00:00:19,066
So let's get straight into it.

7
00:00:19,066 --> 00:00:21,866
All right so here we've got an example.

8
00:00:21,866 --> 00:00:22,733
The example that we looked

9
00:00:22,733 --> 00:00:26,900
at previously where on the left
we've got the points, you know, the scatterplot.

10
00:00:26,900 --> 00:00:31,133
And then here on the right we've got the
dendrogram, which contains the memory

11
00:00:31,200 --> 00:00:35,700
of how the clusters were formed during
the hierarchical clustering algorithm.

12
00:00:35,700 --> 00:00:37,166
So here we can tell right away that,

13
00:00:37,166 --> 00:00:39,800
first of all, P2
and P3 were combined into a cluster,

14
00:00:39,800 --> 00:00:41,166
because their

15
00:00:41,166 --> 00:00:44,100
height is the lowest;
the height of this bar is the lowest.

16
00:00:44,100 --> 00:00:46,800
Then we look at the next lowest bar,
which is this one.

17
00:00:46,800 --> 00:00:50,700
So P5 and P6 are the least dissimilar
out of the remaining.

18
00:00:51,000 --> 00:00:53,000
And then these are
pretty much the same height.

19
00:00:53,000 --> 00:00:56,766
But first we form this cluster:
we combine these into one cluster.

20
00:00:56,766 --> 00:00:59,700
So P1 was added to the cluster of P2 and P3.

21
00:00:59,700 --> 00:01:02,400
Then P4 was added to the cluster of P5 and P6.

22
00:01:02,400 --> 00:01:05,400
And then at the end all of the points
were combined into one cluster.

23
00:01:05,400 --> 00:01:07,466
So that's what the dendrogram
is telling us.

24
00:01:07,466 --> 00:01:11,333
As you can see right away, it's giving us
a lot of additional information

25
00:01:11,333 --> 00:01:12,800
on top of the scatterplot.

26
00:01:12,800 --> 00:01:17,400
And it contains that memory
of the hierarchical clustering algorithm.

27
00:01:17,900 --> 00:01:21,233
So how do we use this dendrogram
to understand

28
00:01:21,333 --> 00:01:25,900
how to best execute,
or get the most value out of, hierarchical clustering?

29
00:01:26,266 --> 00:01:27,600
So let's have a look.

30
00:01:27,600 --> 00:01:29,500
What we need to do with the dendrogram,

31
00:01:29,500 --> 00:01:33,000
or what we can do is
look at the horizontal levels

32
00:01:33,000 --> 00:01:37,666
and set thresholds. So we can set height
thresholds, or rather distance

33
00:01:37,666 --> 00:01:40,766
thresholds, which are also called
dissimilarity thresholds,

34
00:01:40,933 --> 00:01:44,566
because this vertical axis measures
the Euclidean distance between points,

35
00:01:44,566 --> 00:01:49,266
which also represents the dissimilarity
between points or clusters.
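For reference, here is a minimal sketch of the Euclidean distance that the vertical axis is measuring. The function name and the two coordinate pairs are assumptions for illustration; they are not taken from the lecture's scatterplot.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical coordinates, chosen only to make the arithmetic easy to follow.
point_a = (1.0, 1.0)
point_b = (4.0, 5.0)

# sqrt(3^2 + 4^2) = 5.0
print(euclidean_distance(point_a, point_b))
```

The larger this value, the more dissimilar the two points are, and the higher up the dendrogram they would be joined.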

36
00:01:49,500 --> 00:01:53,433
So what we can do is set a threshold
for the dissimilarity.

37
00:01:53,433 --> 00:01:57,233
And we can say that we don't want
the dissimilarity to be greater

38
00:01:57,233 --> 00:01:58,366
than this level.

39
00:01:58,366 --> 00:02:01,666
So again, it doesn't matter
what the absolute value is, it matters

40
00:02:01,666 --> 00:02:04,866
what the relative values are
and how it looks on this image.

41
00:02:04,866 --> 00:02:07,666
So we're setting the dissimilarity
threshold.

42
00:02:07,666 --> 00:02:11,400
We're saying that
if we come across clusters

43
00:02:11,566 --> 00:02:13,500
that are above this threshold, we don't want them.

44
00:02:13,500 --> 00:02:18,900
So we don't want a cluster to have
dissimilarity within it above this threshold.

45
00:02:18,900 --> 00:02:21,600
So what that will do
is it'll give us two clusters.

46
00:02:21,600 --> 00:02:22,400
And let's have a look at them.

47
00:02:22,400 --> 00:02:25,133
There's our first cluster
and there's our second cluster.

48
00:02:25,133 --> 00:02:26,700
And that makes sense.

49
00:02:26,700 --> 00:02:30,900
So what it's telling us
is that within each one of these clusters,

50
00:02:30,900 --> 00:02:34,100
the dissimilarity is always less
than our threshold.

51
00:02:34,100 --> 00:02:36,000
So let's say we've got some values here.

52
00:02:36,000 --> 00:02:38,900
Let's say this is 1.5. This is 2.0.

53
00:02:38,900 --> 00:02:42,600
So let's say
we want to set the threshold at 1.7.

54
00:02:43,000 --> 00:02:47,966
And what this is doing is
it is not allowing any clusters

55
00:02:47,966 --> 00:02:52,733
that would have dissimilarity
of greater than 1.7 within them.

56
00:02:53,133 --> 00:02:57,600
And as you can see from the dendrogram,
we can tell that for everything below

57
00:02:57,600 --> 00:03:00,600
that level, this cluster and this cluster,

58
00:03:00,633 --> 00:03:03,633
they don't have dissimilarity of 1.7,

59
00:03:03,900 --> 00:03:08,566
because the dissimilarity is represented
by these vertical lines.

60
00:03:09,266 --> 00:03:11,600
And that's how the concept
of thresholding works.
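As a rough sketch of the thresholding idea just described, the code below replays the merge history from the lecture's dendrogram and stops at a chosen threshold. The merge order follows the lecture; the merge heights, the names `MERGES` and `cut_dendrogram`, and the data layout are assumptions made up for illustration.

```python
# Merge history as (left cluster, right cluster, height).
# The order follows the lecture; the heights are illustrative assumptions.
POINTS = ["P1", "P2", "P3", "P4", "P5", "P6"]
MERGES = [
    ({"P2"}, {"P3"}, 0.4),
    ({"P5"}, {"P6"}, 0.6),
    ({"P1"}, {"P2", "P3"}, 1.0),
    ({"P4"}, {"P5", "P6"}, 1.1),
    ({"P1", "P2", "P3"}, {"P4", "P5", "P6"}, 2.2),
]


def cut_dendrogram(points, merges, threshold):
    """Return the clusters that remain when no merge above `threshold` is allowed."""
    clusters = [frozenset([p]) for p in points]
    for left, right, height in sorted(merges, key=lambda m: m[2]):
        if height > threshold:
            break  # every later merge is at least this high
        a, b = frozenset(left), frozenset(right)
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a | b)
    return sorted(sorted(c) for c in clusters)


print(cut_dendrogram(POINTS, MERGES, 1.7))  # two clusters
print(cut_dendrogram(POINTS, MERGES, 0.9))  # four clusters
print(cut_dendrogram(POINTS, MERGES, 0.3))  # every point on its own
```

With these assumed heights, the three thresholds reproduce the three cases from the lecture: two clusters at 1.7, four clusters just below the level where P1 and P4 are joined, and six clusters at 0.3.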

61
00:03:11,600 --> 00:03:16,600
And the interesting part about a dendrogram
is that you can quickly tell how many clusters

62
00:03:16,600 --> 00:03:20,500
you will have at a certain threshold
by just looking at how many

63
00:03:20,500 --> 00:03:23,666
vertical lines this horizontal threshold
actually crosses.

64
00:03:23,866 --> 00:03:27,100
So here you can see it
crosses one, two vertical lines.

65
00:03:27,100 --> 00:03:28,766
That means we will have two clusters.

66
00:03:28,766 --> 00:03:33,066
There will be this cluster of points
P1, P2, P3 and this cluster of P4, P5, P6.
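The counting rule above (number of vertical lines crossed equals number of clusters) can be sketched without drawing anything: every merge at or below the threshold removes one cluster. The function name and the merge heights are assumptions for illustration, and the sketch assumes merge heights never decrease as clusters grow.

```python
def count_clusters(merge_heights, threshold):
    """Number of vertical lines a horizontal threshold would cross.

    n points are joined by n - 1 merges; each merge at or below the
    threshold reduces the cluster count by one.
    """
    n_points = len(merge_heights) + 1
    merges_kept = sum(1 for h in merge_heights if h <= threshold)
    return n_points - merges_kept


# Illustrative heights for the five merges of the six-point example.
heights = [0.4, 0.6, 1.0, 1.1, 2.2]

print(count_clusters(heights, 1.7))  # 2
print(count_clusters(heights, 0.9))  # 4
print(count_clusters(heights, 0.3))  # 6
```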

67
00:03:33,566 --> 00:03:35,833
All right.
So let's have a look at another example.

68
00:03:35,833 --> 00:03:40,466
Let's have a look at an example
where we put the threshold at this level.

69
00:03:40,466 --> 00:03:45,266
So somewhere just below where we combined,
as you remember,

70
00:03:45,266 --> 00:03:50,166
we had P5, P6 in one cluster, P2, P3 in
one cluster, P4 by itself, P1 by itself.

71
00:03:50,400 --> 00:03:54,100
And then we combined P1 with this cluster,
P4 with this cluster.

72
00:03:54,300 --> 00:03:57,900
So let's say we're setting the threshold
at just before

73
00:03:57,900 --> 00:04:01,033
that level of dissimilarity,
which allowed us to combine

74
00:04:01,033 --> 00:04:04,200
P1 with this cluster
and P4 with this cluster.

75
00:04:04,433 --> 00:04:07,833
So what that will do is it
will give us a certain number of clusters.

76
00:04:07,833 --> 00:04:11,466
So can you tell just by looking at the
dendrogram how many clusters we'll have?

77
00:04:11,600 --> 00:04:12,533
Exactly correct.

78
00:04:12,533 --> 00:04:13,800
We're going to have four clusters

79
00:04:13,800 --> 00:04:17,900
because it crosses four vertical lines
one, two, three, four.

80
00:04:17,933 --> 00:04:18,200
Right.

81
00:04:18,200 --> 00:04:20,233
So we're going to have a cluster with P1,

82
00:04:20,233 --> 00:04:24,433
a cluster with P2 and P3, a cluster
with P4, and a cluster with P5 and P6.

83
00:04:24,600 --> 00:04:26,766
Let's have a look: four clusters.

84
00:04:26,766 --> 00:04:27,533
And there they are.

85
00:04:27,533 --> 00:04:31,133
So that is what we're going to get
if we set the

86
00:04:31,133 --> 00:04:34,700
dissimilarity
or distance threshold at that level.

87
00:04:35,233 --> 00:04:36,500
Let's try another one.

88
00:04:36,500 --> 00:04:42,000
Let's say we want to set our dissimilarity
threshold very low at 0.3,

89
00:04:42,000 --> 00:04:46,633
meaning that we don't want clusters
that have any points

90
00:04:46,633 --> 00:04:50,933
within them that have dissimilarity
greater than this threshold.

91
00:04:50,933 --> 00:04:53,633
So we're not going to allow any clusters
like that.

92
00:04:53,633 --> 00:04:56,900
And the interesting part here is
that we're actually setting the threshold

93
00:04:57,066 --> 00:05:01,333
below our very first cluster
that we created over here, P2 and P3.

94
00:05:01,333 --> 00:05:04,833
So we're not even going to allow P2
and P3 to be combined in one cluster.

95
00:05:04,833 --> 00:05:05,700
We're going to say

96
00:05:05,700 --> 00:05:09,866
that dissimilarity level, that distance
between them is too great, too high.

97
00:05:09,866 --> 00:05:12,900
We don't think, based

98
00:05:12,900 --> 00:05:16,433
on our business knowledge
or based on our other internal research

99
00:05:16,433 --> 00:05:20,400
or external research,
that any points with

100
00:05:20,466 --> 00:05:24,966
dissimilarity greater than this
level should be combined into a cluster.

101
00:05:25,200 --> 00:05:27,200
It just doesn't make sense

102
00:05:27,200 --> 00:05:31,066
from a financial perspective,
from a business perspective,

103
00:05:31,066 --> 00:05:34,800
from a perspective of knowledge
about what this dataset is about.

104
00:05:35,166 --> 00:05:39,200
And what that will do is it'll create
six clusters because we cross six lines

105
00:05:39,233 --> 00:05:43,000
one, two, three, four, five, six,
and there they are: every single point

106
00:05:43,000 --> 00:05:44,566
will be in its own cluster.

107
00:05:44,566 --> 00:05:47,400
As you can see, we've got six clusters.

108
00:05:47,400 --> 00:05:52,366
So that is how a dendrogram works or
how you can get value out of a dendrogram.

109
00:05:52,366 --> 00:05:56,600
And you can set this threshold
at different levels to understand

110
00:05:56,933 --> 00:05:58,066
how many clusters you'll get.

111
00:05:58,066 --> 00:06:00,066
Just by looking at the dendrogram,
you can tell right away.

112
00:06:00,066 --> 00:06:03,800
And that way you can find
the optimal level for the threshold,

113
00:06:03,800 --> 00:06:08,133
or the optimal number of clusters
that suits your project the best.

114
00:06:08,966 --> 00:06:11,433
But how do you find the actual

115
00:06:11,433 --> 00:06:14,900
optimal number of clusters,
not just a number that you think is optimal?

116
00:06:14,900 --> 00:06:16,400
Is the dendrogram giving us

117
00:06:16,400 --> 00:06:19,400
any ideas
about the optimal number of clusters?

118
00:06:19,500 --> 00:06:22,500
Well,
what can we tell from the dendrogram

119
00:06:22,500 --> 00:06:27,233
that might be a good guide for us
to select the optimal number of clusters?

120
00:06:27,600 --> 00:06:30,600
Well, there's a great giveaway
that the dendrogram contains,

121
00:06:30,833 --> 00:06:35,833
and that is the vertical distance
because it is measuring a dissimilarity.

122
00:06:35,833 --> 00:06:39,000
So one of the standard approaches
is just to look for

123
00:06:39,000 --> 00:06:42,500
the highest vertical distance
that you can find on the dendrogram.

124
00:06:42,500 --> 00:06:47,766
So basically, any vertical line
that does not cross any horizontal lines.

125
00:06:48,066 --> 00:06:51,166
So for instance
this line can be considered.

126
00:06:51,200 --> 00:06:52,633
This line can be considered.

127
00:06:52,633 --> 00:06:55,500
This line cannot be considered
for that search

128
00:06:55,500 --> 00:06:58,800
because it crosses hypothetical
horizontal lines.

129
00:06:58,800 --> 00:07:02,033
So what you need to do is take
every horizontal line you have

130
00:07:02,033 --> 00:07:05,033
and just imagine it extends all the way
across the dendrogram.

131
00:07:05,133 --> 00:07:07,100
Every single horizontal line you have.

132
00:07:07,100 --> 00:07:13,033
And now find the longest line
among your existing vertical lines

133
00:07:13,033 --> 00:07:16,866
that doesn't cross
any of these extended horizontal lines.

134
00:07:16,866 --> 00:07:19,833
So for instance,
even this line cannot be considered

135
00:07:19,833 --> 00:07:24,066
for that purpose because it would
hypothetically cross this horizontal line

136
00:07:24,066 --> 00:07:27,700
that we have coming from
this red line between 5 and 6.

137
00:07:28,100 --> 00:07:32,100
Again, this line cannot be considered
because it's crossing this line.

138
00:07:32,100 --> 00:07:35,600
So you would need to look at this line,
for example, or this line.

139
00:07:35,733 --> 00:07:39,433
Or if you wanted to use this line,
you would need to use only a bit of it,

140
00:07:39,433 --> 00:07:41,033
that part or this part.

141
00:07:41,033 --> 00:07:44,500
So you can only use parts of lines
that are between horizontal lines.

142
00:07:45,066 --> 00:07:47,566
So out of all of the lines
that you have here,

143
00:07:47,566 --> 00:07:52,333
which is the longest that doesn't cross
any extended horizontal lines?

144
00:07:52,700 --> 00:07:53,500
Well, that's correct.

145
00:07:53,500 --> 00:07:56,500
This one over here is the longest one.

146
00:07:56,500 --> 00:07:59,500
Or basically, in our example,
the green and the red

147
00:07:59,500 --> 00:08:01,200
were about the same height.

148
00:08:01,200 --> 00:08:04,200
So this one
or this one are the longest ones.

149
00:08:04,566 --> 00:08:06,933
And so this is the largest distance

150
00:08:06,933 --> 00:08:10,800
and therefore the best,
or the recommended, approach

151
00:08:11,100 --> 00:08:13,666
(again, it's not a set-in-stone approach,

152
00:08:13,666 --> 00:08:16,000
it's just one of the things
that you could do)

153
00:08:16,000 --> 00:08:20,766
is to take a threshold
that will cross this largest distance.

154
00:08:20,766 --> 00:08:23,100
So cross that largest distance
with a threshold,

155
00:08:23,100 --> 00:08:25,366
and then you use that threshold
to calculate

156
00:08:25,366 --> 00:08:27,900
the optimal number of clusters
and actually find them.
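Here is a minimal sketch of the largest-distance approach just described. Because the extended horizontal lines sit at the merge heights, the longest uncrossed vertical stretch is the biggest gap between two consecutive merge heights. The function name, the heights, the choice to place the threshold in the middle of the gap, and the choice to count the stretch from zero up to the first merge are all assumptions for illustration.

```python
def largest_gap_cut(merge_heights):
    """Pick a threshold inside the biggest gap between consecutive merge levels.

    Returns (threshold, number_of_clusters). Level 0.0 is included so the
    stretch below the first merge is considered too (an assumption).
    """
    levels = [0.0] + sorted(merge_heights)
    # Each gap is the vertical stretch between two neighbouring horizontal lines.
    gaps = [(upper - lower, lower, upper) for lower, upper in zip(levels, levels[1:])]
    _, lower, upper = max(gaps)
    threshold = (lower + upper) / 2  # anywhere inside the gap gives the same clusters
    n_points = len(merge_heights) + 1
    n_clusters = n_points - sum(1 for h in merge_heights if h <= threshold)
    return threshold, n_clusters


# Illustrative heights for the five merges of the six-point example.
threshold, n_clusters = largest_gap_cut([0.4, 0.6, 1.0, 1.1, 2.2])
print(threshold, n_clusters)
```

With these assumed heights, the biggest gap is the one just below the final merge, so the suggested cut leaves two clusters, matching the lecture's example.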

157
00:08:27,900 --> 00:08:32,200
So once we've crossed this
largest distance with our threshold,

158
00:08:32,600 --> 00:08:34,066
it doesn't matter where you set it:
you can set it here,

159
00:08:34,066 --> 00:08:37,000
you can set it lower or you can set it higher,
as long as it crosses this line.

160
00:08:37,000 --> 00:08:40,266
Then the two clusters are this one
and this one.

161
00:08:40,300 --> 00:08:44,266
As you can see, that is considered
to be one of the approaches,

162
00:08:44,800 --> 00:08:48,233
and this approach is telling us that
the optimal number of clusters is two,

163
00:08:48,233 --> 00:08:49,400
and these are them.

164
00:08:49,400 --> 00:08:51,600
And in this case it kind of makes sense.

165
00:08:51,600 --> 00:08:56,833
You can see that indeed these points
look as if they're closer together,

166
00:08:57,100 --> 00:08:59,500
and these points
look as if they're closer together,

167
00:08:59,500 --> 00:09:04,166
so getting any clusters
in between them, or even breaking them up

168
00:09:04,166 --> 00:09:08,000
into more clusters, wouldn't
make as much sense as this does.

169
00:09:08,566 --> 00:09:09,633
And so there you go.

170
00:09:09,633 --> 00:09:12,033
That's
one of the approaches that you can use.

171
00:09:12,033 --> 00:09:15,033
You can still look at this whole problem
using

172
00:09:15,033 --> 00:09:18,266
a similar approach to K-means,
where you use the elbow method.

173
00:09:18,266 --> 00:09:19,766
So you could use something like that.

174
00:09:19,766 --> 00:09:21,700
But in hierarchical clustering

175
00:09:21,700 --> 00:09:25,000
we're going to focus on this approach
with the largest distance.

176
00:09:25,533 --> 00:09:28,533
And now let's quickly
have a knowledge test.

177
00:09:28,833 --> 00:09:32,833
So I have two charts here,
which are hidden. On the left,

178
00:09:32,833 --> 00:09:35,733
we've got the scatterplot. On the right,
we've got the dendrogram.

179
00:09:35,733 --> 00:09:37,833
I'm going to show you only the dendrogram.

180
00:09:37,833 --> 00:09:40,900
And I would like you to try to understand
or try to assess

181
00:09:40,900 --> 00:09:43,900
very quickly
what's going on on the scatterplot.

182
00:09:44,266 --> 00:09:48,333
So for instance, we'd like to know,
even without seeing the scatterplot

183
00:09:48,333 --> 00:09:49,066
or the dataset

184
00:09:49,066 --> 00:09:52,433
at the moment, we'd like to know
what is the optimal number of clusters

185
00:09:52,433 --> 00:09:54,466
in this dataset
just by looking at the dendrogram.

186
00:09:54,466 --> 00:09:55,800
Can you identify that?

187
00:09:55,800 --> 00:10:00,600
So if you like you can pause the video
and just look at these vertical

188
00:10:00,600 --> 00:10:03,700
and horizontal lines and try to find out
based on the method that we discussed

189
00:10:03,700 --> 00:10:06,700
what would be
the optimal number of clusters.

190
00:10:07,066 --> 00:10:12,333
So in three, two, one... I'm going to now
reveal how I would solve this challenge.

191
00:10:12,333 --> 00:10:14,000
Well,
what I would do is I would look for

192
00:10:14,000 --> 00:10:18,233
the longest vertical line that doesn't
cross any extended horizontal lines.

193
00:10:18,233 --> 00:10:21,000
So if you extend that, extend that, extend
that,

194
00:10:21,000 --> 00:10:23,300
you can see that
it's probably this line over here.

195
00:10:23,300 --> 00:10:26,300
And so that's the largest distance.

196
00:10:26,400 --> 00:10:30,500
That means we need to cross it with
a horizontal line, with our threshold.

197
00:10:30,900 --> 00:10:33,300
And that will give us
the number of clusters

198
00:10:33,300 --> 00:10:37,400
which is three clusters because it crosses
three lines here one, two, three.

199
00:10:37,700 --> 00:10:42,000
And if we look at the chart, as you can
see, indeed we do have three clusters.

200
00:10:42,000 --> 00:10:43,900
And it does look like that

201
00:10:43,900 --> 00:10:47,666
is the optimal number of clusters
for this business problem.

202
00:10:48,100 --> 00:10:51,066
So hopefully you enjoyed this tutorial.

203
00:10:51,066 --> 00:10:53,800
We walked through all of this manually
so that you have a better intuitive

204
00:10:53,800 --> 00:10:58,366
understanding of how the hierarchical
clustering algorithm and dendrogram work.

205
00:10:58,633 --> 00:11:03,600
And next, Hadelin will show you around
in R and Python, and together

206
00:11:03,600 --> 00:11:09,166
you will create some amazing
analysis around hierarchical clustering.

207
00:11:09,600 --> 00:11:12,633
And together with him
you will solve a business problem

208
00:11:12,900 --> 00:11:15,466
using the hierarchical clustering
algorithm.

209
00:11:15,466 --> 00:11:17,266
So you've got some fun
tutorials ahead of you

210
00:11:17,266 --> 00:11:19,133
and I look forward
to seeing you next time.

211
00:11:19,133 --> 00:11:21,100
Until then, enjoy machine learning.