1 00:00:00,600 --> 00:00:03,200 Hello and welcome back to the course on Machine Learning. 2 00:00:03,200 --> 00:00:07,133 And in this section we're kicking off the tutorials for hierarchical clustering. 3 00:00:07,500 --> 00:00:10,566 All right so another very interesting topic ahead of us. 4 00:00:10,566 --> 00:00:15,733 And as always let's make the complex as simple and demystify what it's all about. 5 00:00:16,200 --> 00:00:18,600 All right so what is hierarchical clustering. 6 00:00:18,600 --> 00:00:19,933 Well believe it or not. 7 00:00:19,933 --> 00:00:24,000 But if you have points on your scatterplot or 8 00:00:24,000 --> 00:00:27,900 data points like we looked at previously, this is a two dimensional space. 9 00:00:28,200 --> 00:00:31,566 If you apply hierarchical clustering or let's just say H.C. 10 00:00:31,566 --> 00:00:35,233 for short, what will happen is you will get clusters. 11 00:00:35,233 --> 00:00:38,833 Again very, very similar to K-means. 12 00:00:38,833 --> 00:00:42,300 In fact, sometimes as a result, often the results can be exactly 13 00:00:42,300 --> 00:00:47,966 the same as k means clustering, but the whole process is a bit different. 14 00:00:48,033 --> 00:00:50,566 So let's talk about it in a bit more detail. 15 00:00:50,566 --> 00:00:53,200 So the first thing that we need to note is that there's two types 16 00:00:53,200 --> 00:00:56,900 of hierarchical clustering agglomerative and divisive. 17 00:00:56,933 --> 00:00:59,733 So agglomerative is the bottom up approach. 18 00:00:59,733 --> 00:01:02,700 And you'll see in more detail just now what that means. 19 00:01:02,700 --> 00:01:05,966 So we'll be starting at the very bottom and then building our way up. 20 00:01:06,000 --> 00:01:07,200 Divisive is the opposite. 21 00:01:07,200 --> 00:01:10,200 Starting at the top and devising a clusters into multiple. 22 00:01:10,600 --> 00:01:15,600 In this course we'll be focusing on the agglomerative approach, the divisive. 23 00:01:15,700 --> 00:01:19,133 We'll mention it a bit just now when we're talking about agglomerative. 24 00:01:19,133 --> 00:01:21,500 But it's basically the same thing but the other way around. 25 00:01:21,500 --> 00:01:23,366 It's the reverse of the agglomerative. 26 00:01:23,366 --> 00:01:27,100 And if you like, you can definitely research more about the divisive approach. 27 00:01:27,166 --> 00:01:30,166 But for now, we're going to focus on the matter of one. 28 00:01:30,733 --> 00:01:31,066 All right. 29 00:01:31,066 --> 00:01:34,200 So the agglomerative hierarchical clustering how does it work. 30 00:01:34,600 --> 00:01:36,966 Well to start off we're going to break it down into step by step approach. 31 00:01:36,966 --> 00:01:40,366 And then we'll look at an example and manually perform the clustering. 32 00:01:40,733 --> 00:01:46,000 All right so step one in HTK is to make each data point a single point cluster. 33 00:01:46,000 --> 00:01:47,133 So that forms and clusters. 34 00:01:47,133 --> 00:01:50,400 So if you have n data points your first step is to 35 00:01:50,400 --> 00:01:53,400 look at each one of them as an individual cluster. 36 00:01:53,466 --> 00:01:58,500 Then step two is to take the two closest data points and make them one cluster. 37 00:01:58,500 --> 00:02:02,400 So to combine them into one cluster that forms n minus one clusters. 38 00:02:02,933 --> 00:02:06,100 Then step three is to take the two closest clusters out of the ones. 39 00:02:06,100 --> 00:02:10,800 Now that you now have and make them one cluster that forms n minus two clusters. 40 00:02:11,400 --> 00:02:15,133 Then step four is to repeat step three until there is only one cluster. 41 00:02:15,133 --> 00:02:19,800 Basically, just keep repeating step three and combining points into bigger 42 00:02:19,800 --> 00:02:23,433 and bigger and bigger clusters until there's only one huge cluster left. 43 00:02:24,000 --> 00:02:27,600 So you just keep repeating step three and at the end you're done. 44 00:02:27,600 --> 00:02:30,600 So at the end you'll have one huge cluster left. 45 00:02:31,133 --> 00:02:34,166 And how to get from one to 2 or 3 clusters. 46 00:02:34,166 --> 00:02:36,300 How do you get the final result that you actually want? 47 00:02:36,300 --> 00:02:38,800 We'll talk about that in this section as well. 48 00:02:38,800 --> 00:02:40,700 So that's that's the goal of course. 49 00:02:40,700 --> 00:02:43,566 But one thing that stands out here 50 00:02:43,566 --> 00:02:46,800 is the words closest clusters. 51 00:02:47,533 --> 00:02:49,466 Now we've already talked about distances. 52 00:02:49,466 --> 00:02:53,266 And we mentioned Euclidean as you can use Euclidean distance or other distances. 53 00:02:53,266 --> 00:02:56,033 And that's totally fine when you're working with individual points. 54 00:02:56,033 --> 00:02:59,100 But here we're actually going a step even further. 55 00:02:59,100 --> 00:02:59,666 We're talking about 56 00:02:59,666 --> 00:03:03,466 not just the proximity of points, but actually proximity of clusters. 57 00:03:03,600 --> 00:03:04,900 And this is something worth noting. 58 00:03:04,900 --> 00:03:07,000 So I'd like to pause here for a bit. 59 00:03:07,000 --> 00:03:12,266 And or I kind of step to the side and talk about the closeness of clusters 60 00:03:12,266 --> 00:03:13,533 and how you measure 61 00:03:13,533 --> 00:03:15,066 distance between classes, because that 62 00:03:15,066 --> 00:03:18,900 can really affect your algorithm if you're using hierarchical clustering. 63 00:03:18,900 --> 00:03:21,400 So let's talk about that for for a few minutes. 64 00:03:21,400 --> 00:03:25,300 So first of all Euclidean distance just to get this out of the way 65 00:03:25,333 --> 00:03:28,933 once and for all Euclidean distance is in a two dimensional space. 66 00:03:28,933 --> 00:03:31,333 It's calculated like that. So the distance. 67 00:03:31,333 --> 00:03:35,200 So if you got two points P1 with the coordinates x1 68 00:03:35,200 --> 00:03:38,900 and y1 p2 of coordinates x2 and y2, then Euclidean distance 69 00:03:38,933 --> 00:03:44,300 or the length of this line is calculated as x2 minus x1, so that the distance 70 00:03:44,300 --> 00:03:49,800 between the x's squared plus the distance between the y's squared. 71 00:03:49,800 --> 00:03:53,066 So basically and then the you add them up and then you take the square root. 72 00:03:53,066 --> 00:03:57,233 So basically it's a you've got a right angled triangle over here. 73 00:03:57,466 --> 00:03:59,700 And you've got the cat at here. 74 00:03:59,700 --> 00:04:03,133 And you got another one cat at here I hope I'm pronouncing is right. 75 00:04:03,433 --> 00:04:06,366 And then this is the hypotenuse. Right. 76 00:04:06,366 --> 00:04:10,733 So that is how you calculate the distance between the two points. 77 00:04:10,733 --> 00:04:14,333 So it's basic geometry back from high school. 78 00:04:14,666 --> 00:04:15,666 And there you go. 79 00:04:15,666 --> 00:04:18,233 That's, that's just how Euclidean distance is calculated. 80 00:04:18,233 --> 00:04:20,200 And that's what we're going to be working with. 81 00:04:20,200 --> 00:04:20,733 But again 82 00:04:20,733 --> 00:04:25,533 there are other types of distances that you could be invoking in your algorithms. 83 00:04:25,533 --> 00:04:27,866 And it really depends on the scenario 84 00:04:27,866 --> 00:04:32,166 and what exactly how your algorithm is going to be structured. 85 00:04:32,166 --> 00:04:35,466 But in our examples we're going to be working with Euclidean distances 86 00:04:35,466 --> 00:04:39,000 because they're the more natural type of distance. 87 00:04:39,466 --> 00:04:43,000 And now let's talk about the distance between two clusters. 88 00:04:43,000 --> 00:04:45,233 So let's say you have two clusters the red and the blue. 89 00:04:45,233 --> 00:04:47,600 How do you measure the distance between them. 90 00:04:47,600 --> 00:04:50,800 What what is the definition of the distance between two clusters. 91 00:04:50,800 --> 00:04:52,700 It's not a not as obvious 92 00:04:52,700 --> 00:04:55,900 as it might sound at the start, because there can be a couple options. 93 00:04:55,900 --> 00:04:59,300 For instance, you can option one is just take the two closest points 94 00:04:59,600 --> 00:05:02,733 and measure that and call that the distance between the two clusters. 95 00:05:03,300 --> 00:05:05,800 Option two is actually take the two furthest points 96 00:05:05,800 --> 00:05:07,666 and call that the distance between two clusters. 97 00:05:07,666 --> 00:05:09,200 That's also valid approach. 98 00:05:09,200 --> 00:05:12,400 Option three take the average to take the average of all the distances 99 00:05:12,766 --> 00:05:15,566 between the different points in the cluster, so all the combinations 100 00:05:15,566 --> 00:05:18,566 of different points, and take the average of that distance. 101 00:05:18,600 --> 00:05:21,133 Option four is take the distance between the centroids. 102 00:05:21,133 --> 00:05:22,933 So find the centroids and find the distance 103 00:05:22,933 --> 00:05:26,900 between the centroids and call that the distance between the two clusters. 104 00:05:27,100 --> 00:05:30,300 So it's a very important part of 105 00:05:30,300 --> 00:05:34,033 hierarchical clustering what you define as the distance between two clusters. 106 00:05:34,033 --> 00:05:38,700 Because that can significantly impact the output of your algorithm. 107 00:05:38,866 --> 00:05:41,866 Now we're not going to delve much deeper into this. 108 00:05:41,900 --> 00:05:44,366 It's just something to remember to note. 109 00:05:44,366 --> 00:05:49,266 And based on your particular situation, whether it's a business problem 110 00:05:49,266 --> 00:05:54,700 or other type of, data science problem, based on what exactly you think 111 00:05:54,700 --> 00:05:57,900 is the best approach, you're going to need to define that in your algorithm. 112 00:05:57,900 --> 00:06:01,266 So just keep that in mind that for the hierarchical clustering 113 00:06:01,266 --> 00:06:04,866 algorithm, the distance between clusters is a crucial element. 114 00:06:04,866 --> 00:06:08,666 And you need to remember what exactly you're setting at is 115 00:06:08,666 --> 00:06:12,033 you're setting it to be how you defining it in your approach. 116 00:06:12,500 --> 00:06:12,900 All right. 117 00:06:12,900 --> 00:06:14,533 So we've talked about that. 118 00:06:14,533 --> 00:06:16,633 Now let's move back to our example. 119 00:06:16,633 --> 00:06:19,600 So we've already looked at the step by step rules. 120 00:06:19,600 --> 00:06:22,966 And I I'm a big fan of a step by step approach. 121 00:06:23,100 --> 00:06:24,866 Now we have this step by step approach. 122 00:06:24,866 --> 00:06:26,100 And it might have seemed 123 00:06:26,100 --> 00:06:29,166 a bit overwhelming as always, because we didn't have an example. 124 00:06:29,166 --> 00:06:31,200 But now we're going to have that example 125 00:06:31,200 --> 00:06:34,666 and we're going to look at how to build those hierarchical clusters. 126 00:06:34,866 --> 00:06:40,200 So step one make each data point a single point cluster that forms a six clusters. 127 00:06:40,200 --> 00:06:40,900 Let's have a look at that. 128 00:06:40,900 --> 00:06:43,866 You can see every single point is a separate cluster. 129 00:06:43,866 --> 00:06:48,566 Next take the two closest data points and make them one cluster 130 00:06:49,200 --> 00:06:51,966 where we can see that these two points are the closest. 131 00:06:51,966 --> 00:06:53,466 So we're putting them together in one cluster. 132 00:06:53,466 --> 00:06:55,800 Now we have five clusters 123, four. 133 00:06:55,800 --> 00:06:57,600 And these two are one cluster. 134 00:06:57,600 --> 00:06:58,933 All right. Step three. 135 00:06:58,933 --> 00:07:00,566 Take the two closest clusters 136 00:07:00,566 --> 00:07:03,566 out of the ones we had and turn them into one cluster. 137 00:07:03,633 --> 00:07:07,333 So out of the clusters we had because remember each point out of these four. 138 00:07:07,500 --> 00:07:09,933 So if I go back here each point here is a separate cluster. 139 00:07:09,933 --> 00:07:10,966 And this is a cluster. 140 00:07:10,966 --> 00:07:13,600 So now just measure the distances between clusters. 141 00:07:13,600 --> 00:07:16,200 And let's say in our example we're going to be talking about 142 00:07:16,200 --> 00:07:18,666 the distance between clusters as the minimum distance. 143 00:07:18,666 --> 00:07:21,600 So that would be the distance between those two clusters. 144 00:07:21,600 --> 00:07:24,300 That would be the distance that would be that that would be and so on. 145 00:07:24,300 --> 00:07:26,466 So we measure all the distances between clusters. 146 00:07:26,466 --> 00:07:29,300 And you find out that these two are actually the closest clusters. 147 00:07:29,300 --> 00:07:31,266 And you combine them into one cluster. 148 00:07:31,266 --> 00:07:32,766 Next you repeat step three. 149 00:07:32,766 --> 00:07:36,533 So next out of these two clusters out of one, two, three, four clusters. 150 00:07:36,533 --> 00:07:38,400 So you can see we had five. Now we have four. 151 00:07:38,400 --> 00:07:40,000 Every time you reduce the number of clusters 152 00:07:40,000 --> 00:07:43,100 by one out of these four clusters which are the cluster as well the 153 00:07:43,166 --> 00:07:45,400 this seems as the closest cluster. 154 00:07:45,400 --> 00:07:46,700 So we're going to come by that. 155 00:07:46,700 --> 00:07:48,733 Now out of these three clusters which are the cluster. 156 00:07:48,733 --> 00:07:51,366 So it looks like these two are so combining them. 157 00:07:51,366 --> 00:07:53,333 And now we've only got two clusters left. 158 00:07:53,333 --> 00:07:54,900 So the last step would be to combine them 159 00:07:54,900 --> 00:07:57,900 because they are by default they're going to be the closest. 160 00:07:58,366 --> 00:07:58,900 So there we go. 161 00:07:58,900 --> 00:08:01,500 And so that is the end of our algorithm. 162 00:08:01,500 --> 00:08:02,866 That's how it converges. 163 00:08:02,866 --> 00:08:07,033 You've we've gone through the process of create 164 00:08:07,033 --> 00:08:11,366 of going from all points seen as an individual cluster. 165 00:08:11,366 --> 00:08:15,366 So each point is an individual cluster to now we only have one cluster 166 00:08:15,533 --> 00:08:17,200 that combines all of the points. 167 00:08:17,200 --> 00:08:18,900 And so what's the purpose of all of this? 168 00:08:18,900 --> 00:08:20,466 what's the purpose of this exercise? 169 00:08:20,466 --> 00:08:24,033 The way the hierarchical clustering algorithm works is that it 170 00:08:24,166 --> 00:08:28,300 maintains a memory of how we went through this process, 171 00:08:28,566 --> 00:08:31,200 and that memory is stored in a dendrogram. 172 00:08:31,200 --> 00:08:33,500 And that's exactly what we're going to talk in the next tutorial. 173 00:08:33,500 --> 00:08:37,566 And once we've covered the dendrogram it will make total sense 174 00:08:37,566 --> 00:08:42,033 why the hierarchical clustering algorithm does what it does and how it works. 175 00:08:42,333 --> 00:08:43,733 So hopefully you enjoyed today's tutorial. 176 00:08:43,733 --> 00:08:45,233 Can't wait to see you on the next one! 177 00:08:45,233 --> 00:08:47,100 Until then, enjoy machine learning!