1
00:00:00,600 --> 00:00:03,200
Hello and welcome back to the course
on Machine Learning.

2
00:00:03,200 --> 00:00:07,133
And in this section we're kicking off
the tutorials for hierarchical clustering.

3
00:00:07,500 --> 00:00:10,566
All right so another very interesting
topic ahead of us.

4
00:00:10,566 --> 00:00:15,733
And as always let's make the complex as
simple and demystify what it's all about.

5
00:00:16,200 --> 00:00:18,600
All right
so what is hierarchical clustering.

6
00:00:18,600 --> 00:00:19,933
Well believe it or not.

7
00:00:19,933 --> 00:00:24,000
But if you have points on 
your scatterplot or

8
00:00:24,000 --> 00:00:27,900
data points like we looked at previously,
this is a two dimensional space.

9
00:00:28,200 --> 00:00:31,566
If you apply hierarchical clustering
or let's just say H.C.

10
00:00:31,566 --> 00:00:35,233
for short,
what will happen is you will get clusters.

11
00:00:35,233 --> 00:00:38,833
Again very, very similar to K-means.

12
00:00:38,833 --> 00:00:42,300
In fact, sometimes as a result, often
the results can be exactly

13
00:00:42,300 --> 00:00:47,966
the same as k means clustering,
but the whole process is a bit different.

14
00:00:48,033 --> 00:00:50,566
So let's talk about it
in a bit more detail.

15
00:00:50,566 --> 00:00:53,200
So the first thing that we need to note
is that there's two types

16
00:00:53,200 --> 00:00:56,900
of hierarchical clustering
agglomerative and divisive.

17
00:00:56,933 --> 00:00:59,733
So agglomerative
is the bottom up approach.

18
00:00:59,733 --> 00:01:02,700
And you'll see in more detail just now
what that means.

19
00:01:02,700 --> 00:01:05,966
So we'll be starting at the very bottom
and then building our way up.

20
00:01:06,000 --> 00:01:07,200
Divisive is the opposite.

21
00:01:07,200 --> 00:01:10,200
Starting at the top
and devising a clusters into multiple.

22
00:01:10,600 --> 00:01:15,600
In this course we'll be focusing on
the agglomerative approach, the divisive.

23
00:01:15,700 --> 00:01:19,133
We'll mention it a bit just now
when we're talking about agglomerative.

24
00:01:19,133 --> 00:01:21,500
But it's basically the same thing
but the other way around.

25
00:01:21,500 --> 00:01:23,366
It's the reverse of the agglomerative.

26
00:01:23,366 --> 00:01:27,100
And if you like, you can definitely
research more about the divisive approach.

27
00:01:27,166 --> 00:01:30,166
But for now,
we're going to focus on the matter of one.

28
00:01:30,733 --> 00:01:31,066
All right.

29
00:01:31,066 --> 00:01:34,200
So the agglomerative
hierarchical clustering how does it work.

30
00:01:34,600 --> 00:01:36,966
Well to start off we're going to
break it down into step by step approach.

31
00:01:36,966 --> 00:01:40,366
And then we'll look at an example
and manually perform the clustering.

32
00:01:40,733 --> 00:01:46,000
All right so step one in HTK is to make
each data point a single point cluster.

33
00:01:46,000 --> 00:01:47,133
So that forms and clusters.

34
00:01:47,133 --> 00:01:50,400
So if you have n data points
your first step is to

35
00:01:50,400 --> 00:01:53,400
look at each
one of them as an individual cluster.

36
00:01:53,466 --> 00:01:58,500
Then step two is to take the two closest
data points and make them one cluster.

37
00:01:58,500 --> 00:02:02,400
So to combine them into one cluster
that forms n minus one clusters.

38
00:02:02,933 --> 00:02:06,100
Then step three is to take
the two closest clusters out of the ones.

39
00:02:06,100 --> 00:02:10,800
Now that you now have and make them one
cluster that forms n minus two clusters.

40
00:02:11,400 --> 00:02:15,133
Then step four is to repeat step three
until there is only one cluster.

41
00:02:15,133 --> 00:02:19,800
Basically, just keep repeating step three
and combining points into bigger

42
00:02:19,800 --> 00:02:23,433
and bigger and bigger clusters
until there's only one huge cluster left.

43
00:02:24,000 --> 00:02:27,600
So you just keep repeating step three
and at the end you're done.

44
00:02:27,600 --> 00:02:30,600
So at the end
you'll have one huge cluster left.

45
00:02:31,133 --> 00:02:34,166
And how to get from one to 2
or 3 clusters.

46
00:02:34,166 --> 00:02:36,300
How do you get the final result
that you actually want?

47
00:02:36,300 --> 00:02:38,800
We'll talk about that
in this section as well.

48
00:02:38,800 --> 00:02:40,700
So that's that's the goal of course.

49
00:02:40,700 --> 00:02:43,566
But one thing that stands out here

50
00:02:43,566 --> 00:02:46,800
is the words closest clusters.

51
00:02:47,533 --> 00:02:49,466
Now we've already talked about distances.

52
00:02:49,466 --> 00:02:53,266
And we mentioned Euclidean as you can use
Euclidean distance or other distances.

53
00:02:53,266 --> 00:02:56,033
And that's totally fine when you're
working with individual points.

54
00:02:56,033 --> 00:02:59,100
But here we're actually going a step
even further.

55
00:02:59,100 --> 00:02:59,666
We're talking about

56
00:02:59,666 --> 00:03:03,466
not just the proximity of points,
but actually proximity of clusters.

57
00:03:03,600 --> 00:03:04,900
And this is something worth noting.

58
00:03:04,900 --> 00:03:07,000
So I'd like to pause here for a bit.

59
00:03:07,000 --> 00:03:12,266
And or I kind of step to the side
and talk about the closeness of clusters

60
00:03:12,266 --> 00:03:13,533
and how you measure

61
00:03:13,533 --> 00:03:15,066
distance between classes, because that

62
00:03:15,066 --> 00:03:18,900
can really affect your algorithm
if you're using hierarchical clustering.

63
00:03:18,900 --> 00:03:21,400
So let's talk about that
for for a few minutes.

64
00:03:21,400 --> 00:03:25,300
So first of all Euclidean distance
just to get this out of the way

65
00:03:25,333 --> 00:03:28,933
once and for all Euclidean
distance is in a two dimensional space.

66
00:03:28,933 --> 00:03:31,333
It's calculated like that.
So the distance.

67
00:03:31,333 --> 00:03:35,200
So if you got two points
P1 with the coordinates x1

68
00:03:35,200 --> 00:03:38,900
and y1 p2 of coordinates
x2 and y2, then Euclidean distance

69
00:03:38,933 --> 00:03:44,300
or the length of this line is calculated
as x2 minus x1, so that the distance

70
00:03:44,300 --> 00:03:49,800
between the x's squared
plus the distance between the y's squared.

71
00:03:49,800 --> 00:03:53,066
So basically and then the you add them up
and then you take the square root.

72
00:03:53,066 --> 00:03:57,233
So basically it's a you've
got a right angled triangle over here.

73
00:03:57,466 --> 00:03:59,700
And you've got the cat at here.

74
00:03:59,700 --> 00:04:03,133
And you got another one cat at here
I hope I'm pronouncing is right.

75
00:04:03,433 --> 00:04:06,366
And then this is the hypotenuse. Right.

76
00:04:06,366 --> 00:04:10,733
So that is how you calculate the distance
between the two points.

77
00:04:10,733 --> 00:04:14,333
So it's basic
geometry back from high school.

78
00:04:14,666 --> 00:04:15,666
And there you go.

79
00:04:15,666 --> 00:04:18,233
That's, that's
just how Euclidean distance is calculated.

80
00:04:18,233 --> 00:04:20,200
And that's what
we're going to be working with.

81
00:04:20,200 --> 00:04:20,733
But again

82
00:04:20,733 --> 00:04:25,533
there are other types of distances that
you could be invoking in your algorithms.

83
00:04:25,533 --> 00:04:27,866
And it really depends on the scenario

84
00:04:27,866 --> 00:04:32,166
and what exactly how your algorithm
is going to be structured.

85
00:04:32,166 --> 00:04:35,466
But in our examples we're going to be
working with Euclidean distances

86
00:04:35,466 --> 00:04:39,000
because they're 
the more natural type of distance.

87
00:04:39,466 --> 00:04:43,000
And now let's talk about the distance
between two clusters.

88
00:04:43,000 --> 00:04:45,233
So let's say you have two clusters
the red and the blue.

89
00:04:45,233 --> 00:04:47,600
How do you measure the distance
between them.

90
00:04:47,600 --> 00:04:50,800
What what is the definition
of the distance between two clusters.

91
00:04:50,800 --> 00:04:52,700
It's not a not as obvious

92
00:04:52,700 --> 00:04:55,900
as it might sound at the start,
because there can be a couple options.

93
00:04:55,900 --> 00:04:59,300
For instance, you can option
one is just take the two closest points

94
00:04:59,600 --> 00:05:02,733
and measure that and call that
the distance between the two clusters.

95
00:05:03,300 --> 00:05:05,800
Option two
is actually take the two furthest points

96
00:05:05,800 --> 00:05:07,666
and call
that the distance between two clusters.

97
00:05:07,666 --> 00:05:09,200
That's also valid approach.

98
00:05:09,200 --> 00:05:12,400
Option three take the average
to take the average of all the distances

99
00:05:12,766 --> 00:05:15,566
between the different points
in the cluster, so all the combinations

100
00:05:15,566 --> 00:05:18,566
of different points,
and take the average of that distance.

101
00:05:18,600 --> 00:05:21,133
Option four is take the distance
between the centroids.

102
00:05:21,133 --> 00:05:22,933
So find the centroids
and find the distance

103
00:05:22,933 --> 00:05:26,900
between the centroids and call that
the distance between the two clusters.

104
00:05:27,100 --> 00:05:30,300
So it's a very important part of

105
00:05:30,300 --> 00:05:34,033
hierarchical clustering what you define
as the distance between two clusters.

106
00:05:34,033 --> 00:05:38,700
Because that can significantly impact
the output of your algorithm.

107
00:05:38,866 --> 00:05:41,866
Now we're not going to delve
much deeper into this.

108
00:05:41,900 --> 00:05:44,366
It's just something to remember to note.

109
00:05:44,366 --> 00:05:49,266
And based on your particular situation,
whether it's a business problem

110
00:05:49,266 --> 00:05:54,700
or other type of, data science problem,
based on what exactly you think

111
00:05:54,700 --> 00:05:57,900
is the best approach, you're going to need
to define that in your algorithm.

112
00:05:57,900 --> 00:06:01,266
So just keep that in mind
that for the hierarchical clustering

113
00:06:01,266 --> 00:06:04,866
algorithm, the distance
between clusters is a crucial element.

114
00:06:04,866 --> 00:06:08,666
And you need to remember
what exactly you're setting at is

115
00:06:08,666 --> 00:06:12,033
you're setting it to be
how you defining it in your approach.

116
00:06:12,500 --> 00:06:12,900
All right.

117
00:06:12,900 --> 00:06:14,533
So we've talked about that.

118
00:06:14,533 --> 00:06:16,633
Now let's move back to our example.

119
00:06:16,633 --> 00:06:19,600
So we've already looked at the step
by step rules.

120
00:06:19,600 --> 00:06:22,966
And I I'm a big fan of a step
by step approach.

121
00:06:23,100 --> 00:06:24,866
Now we have this step by step approach.

122
00:06:24,866 --> 00:06:26,100
And it might have seemed

123
00:06:26,100 --> 00:06:29,166
a bit overwhelming as always,
because we didn't have an example.

124
00:06:29,166 --> 00:06:31,200
But now we're going to have that example

125
00:06:31,200 --> 00:06:34,666
and we're going to look at how to build
those hierarchical clusters.

126
00:06:34,866 --> 00:06:40,200
So step one make each data point a single
point cluster that forms a six clusters.

127
00:06:40,200 --> 00:06:40,900
Let's have a look at that.

128
00:06:40,900 --> 00:06:43,866
You can see
every single point is a separate cluster.

129
00:06:43,866 --> 00:06:48,566
Next take the two closest data points
and make them one cluster

130
00:06:49,200 --> 00:06:51,966
where we can see that
these two points are the closest.

131
00:06:51,966 --> 00:06:53,466
So we're putting them together
in one cluster.

132
00:06:53,466 --> 00:06:55,800
Now we have five clusters 123, four.

133
00:06:55,800 --> 00:06:57,600
And these two are one cluster.

134
00:06:57,600 --> 00:06:58,933
All right. Step three.

135
00:06:58,933 --> 00:07:00,566
Take the two closest clusters

136
00:07:00,566 --> 00:07:03,566
out of the ones we had
and turn them into one cluster.

137
00:07:03,633 --> 00:07:07,333
So out of the clusters we had because
remember each point out of these four.

138
00:07:07,500 --> 00:07:09,933
So if I go back here
each point here is a separate cluster.

139
00:07:09,933 --> 00:07:10,966
And this is a cluster.

140
00:07:10,966 --> 00:07:13,600
So now just measure the distances
between clusters.

141
00:07:13,600 --> 00:07:16,200
And let's say in our example
we're going to be talking about

142
00:07:16,200 --> 00:07:18,666
the distance between clusters
as the minimum distance.

143
00:07:18,666 --> 00:07:21,600
So that would be the distance
between those two clusters.

144
00:07:21,600 --> 00:07:24,300
That would be the distance that would be
that that would be and so on.

145
00:07:24,300 --> 00:07:26,466
So we measure all the distances
between clusters.

146
00:07:26,466 --> 00:07:29,300
And you find out that these two
are actually the closest clusters.

147
00:07:29,300 --> 00:07:31,266
And you combine them into one cluster.

148
00:07:31,266 --> 00:07:32,766
Next you repeat step three.

149
00:07:32,766 --> 00:07:36,533
So next out of these two clusters
out of one, two, three, four clusters.

150
00:07:36,533 --> 00:07:38,400
So you can see we had five.
Now we have four.

151
00:07:38,400 --> 00:07:40,000
Every time
you reduce the number of clusters

152
00:07:40,000 --> 00:07:43,100
by one out of these four clusters
which are the cluster as well the

153
00:07:43,166 --> 00:07:45,400
this seems as the closest cluster.

154
00:07:45,400 --> 00:07:46,700
So we're going to come by that.

155
00:07:46,700 --> 00:07:48,733
Now out of these three clusters
which are the cluster.

156
00:07:48,733 --> 00:07:51,366
So it looks like these
two are so combining them.

157
00:07:51,366 --> 00:07:53,333
And now we've only got two clusters left.

158
00:07:53,333 --> 00:07:54,900
So the last step would be to combine them

159
00:07:54,900 --> 00:07:57,900
because they are by default
they're going to be the closest.

160
00:07:58,366 --> 00:07:58,900
So there we go.

161
00:07:58,900 --> 00:08:01,500
And so that is the end of our algorithm.

162
00:08:01,500 --> 00:08:02,866
That's how it converges.

163
00:08:02,866 --> 00:08:07,033
You've we've gone through the process
of create

164
00:08:07,033 --> 00:08:11,366
of going from all points
seen as an individual cluster.

165
00:08:11,366 --> 00:08:15,366
So each point is an individual cluster
to now we only have one cluster

166
00:08:15,533 --> 00:08:17,200
that combines all of the points.

167
00:08:17,200 --> 00:08:18,900
And so what's the purpose of all of this?

168
00:08:18,900 --> 00:08:20,466
what's the purpose of this exercise?

169
00:08:20,466 --> 00:08:24,033
The way the hierarchical clustering
algorithm works is that it

170
00:08:24,166 --> 00:08:28,300
maintains a memory
of how we went through this process,

171
00:08:28,566 --> 00:08:31,200
and that memory is stored in a dendrogram.

172
00:08:31,200 --> 00:08:33,500
And that's exactly what we're going
to talk in the next tutorial.

173
00:08:33,500 --> 00:08:37,566
And once we've covered the dendrogram
it will make total sense

174
00:08:37,566 --> 00:08:42,033
why the hierarchical clustering algorithm
does what it does and how it works.

175
00:08:42,333 --> 00:08:43,733
So hopefully you enjoyed today's tutorial.

176
00:08:43,733 --> 00:08:45,233
Can't wait to see you on the next one!

177
00:08:45,233 --> 00:08:47,100
Until then, enjoy machine learning!