1 00:00:00,233 --> 00:00:02,633 Hello and welcome to this R tutorial. 2 00:00:02,633 --> 00:00:05,533 So in the previous tutorial, we plotted our dendrogram 3 00:00:05,533 --> 00:00:10,366 to find the optimal number of clusters, and the result was five clusters. 4 00:00:11,100 --> 00:00:13,500 So now in this new step, since we have this 5 00:00:13,500 --> 00:00:16,600 optimal number of clusters, we are going to fit the hierarchical 6 00:00:16,600 --> 00:00:20,900 clustering algorithm to our data X with, obviously, five clusters. 7 00:00:21,400 --> 00:00:21,700 Okay. 8 00:00:21,700 --> 00:00:26,633 So the first thing that we need to do is to create an object of the hclust class. 9 00:00:26,933 --> 00:00:29,933 And actually this is something that we already did. 10 00:00:30,100 --> 00:00:34,033 Because if we go back to step two here, when we wrote dendrogram 11 00:00:34,033 --> 00:00:37,800 equals hclust, then the distance matrix and the Ward method, 12 00:00:38,266 --> 00:00:41,666 we actually created an object of the hclust class. 13 00:00:42,200 --> 00:00:44,733 And we're going to do the exact same thing right now: 14 00:00:44,733 --> 00:00:48,333 we're going to create the exact same object of the hclust class. 15 00:00:48,666 --> 00:00:50,433 So I'm going to copy this line of code, 16 00:00:51,766 --> 00:00:53,666 paste it here in step three. 17 00:00:53,666 --> 00:00:56,900 And I'm just going to change the name of the object 18 00:00:57,266 --> 00:00:59,066 just to make things nice and clear. 19 00:00:59,066 --> 00:01:02,066 So let's replace dendrogram here with hc. 20 00:01:02,433 --> 00:01:05,600 And now we have the same object as before, only with a different name. 21 00:01:06,033 --> 00:01:09,266 And this actually makes sense, because by building a dendrogram 22 00:01:09,266 --> 00:01:12,466 we actually do all the steps of hierarchical clustering.
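As a rough sketch of the step just described, the same hclust call can be stored once as dendrogram and once as hc. The data here is a hypothetical stand-in (200 random customers with two made-up columns, income and score), since the actual data set is not shown in this part of the tutorial. The Euclidean distance and the "ward.D" spelling of the Ward method are also assumptions; R offers "ward.D2" as well.

```r
# Stand-in data (assumption): 200 hypothetical customers, two numeric features.
set.seed(42)
X <- data.frame(
  income = runif(200, min = 15, max = 140),
  score  = runif(200, min = 1, max = 100)
)

# Step 2 (previous tutorial): build the tree that was plotted as a dendrogram.
# dist() computes the distance matrix; "ward.D" is assumed for the Ward method.
dendrogram <- hclust(d = dist(X, method = "euclidean"), method = "ward.D")

# Step 3: the exact same call, stored under a clearer name.
hc <- hclust(d = dist(X, method = "euclidean"), method = "ward.D")

# Both objects are of class "hclust" and record every merge of the algorithm,
# one row per merge, so there are n - 1 rows for n observations.
print(class(hc))
print(nrow(hc$merge))
```

Because hc stores the full merge history, it already contains the stage at which exactly five clusters remain; no refitting is needed to extract them.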
23 00:01:12,866 --> 00:01:16,133 And among all these steps there is the step in which 24 00:01:16,366 --> 00:01:19,366 the algorithm found five clusters. 25 00:01:20,100 --> 00:01:23,133 So that means that this object contains the information 26 00:01:23,133 --> 00:01:25,800 for when we have five clusters. 27 00:01:25,800 --> 00:01:29,600 And so now let's use this object to build our vector of clusters. 28 00:01:29,600 --> 00:01:33,800 That is the vector that will tell us which cluster each customer belongs to. 29 00:01:34,366 --> 00:01:36,866 And to build this vector we're going to use one of the methods 30 00:01:36,866 --> 00:01:40,500 of the hclust class, which is the cutree method. 31 00:01:40,900 --> 00:01:46,300 So we're going to call our vector of clusters y_hc, equals. 32 00:01:46,766 --> 00:01:49,766 Then we use the cutree method of the hclust class. 33 00:01:49,800 --> 00:01:53,466 So let's type cutree and then press F1. 34 00:01:54,566 --> 00:01:55,000 And then here 35 00:01:55,000 --> 00:01:58,000 we see that this method takes three arguments. 36 00:01:58,100 --> 00:02:00,000 The first argument is tree. 37 00:02:00,000 --> 00:02:03,133 And of course it's our dendrogram that we just renamed hc. 38 00:02:03,633 --> 00:02:06,633 Then the second argument is k, the number of clusters. 39 00:02:07,533 --> 00:02:10,166 So here we input five, 40 00:02:10,166 --> 00:02:13,166 and we will leave the default value for the third parameter. 41 00:02:13,600 --> 00:02:13,933 Okay. 42 00:02:13,933 --> 00:02:17,966 Now it's interesting to take a step back, because actually this method is called 43 00:02:18,133 --> 00:02:22,500 cutree, which is a very well chosen name, because indeed we are cutting the tree 44 00:02:22,500 --> 00:02:25,633 to take the part of the tree that contains our five clusters. 45 00:02:25,866 --> 00:02:27,600 So that makes a lot of sense.
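The cutree step above can be sketched as follows. As before, X is a hypothetical stand-in data set (random values, made-up column names), and the Euclidean distance and "ward.D" linkage are assumptions rather than settings confirmed by the transcript. The third argument of cutree, h (the height at which to cut), is left at its default.

```r
# Stand-in data (assumption): 200 hypothetical customers, two numeric features.
set.seed(42)
X <- data.frame(
  income = runif(200, min = 15, max = 140),
  score  = runif(200, min = 1, max = 100)
)

# Fit hierarchical clustering; "ward.D" is assumed for the Ward method.
hc <- hclust(d = dist(X, method = "euclidean"), method = "ward.D")

# Cut the tree so that exactly five clusters remain.
# tree = the hclust object, k = the number of clusters.
y_hc <- cutree(tree = hc, k = 5)

# y_hc holds one integer cluster label per customer, in row order.
print(length(y_hc))
print(table(y_hc))   # number of customers in each of the five clusters
print(head(y_hc))    # cluster labels of the first few customers
```

With real customer data the labels and cluster sizes will differ from this random example; only the shape of the result (one label from 1 to 5 per row of X) carries over.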
46 00:02:27,600 --> 00:02:31,300 And now, since we are actually done with fitting hierarchical clustering 47 00:02:31,333 --> 00:02:36,100 to our data set, let's select this code section here and execute 48 00:02:36,100 --> 00:02:41,433 to get our y_hc vector of clusters, which appears right here. 49 00:02:42,900 --> 00:02:43,600 And now if we 50 00:02:43,600 --> 00:02:49,100 type y_hc in the console here and press Enter, we have all the clusters 51 00:02:49,100 --> 00:02:52,633 that the hierarchical clustering algorithm assigned to each customer. 52 00:02:52,966 --> 00:02:56,266 So for example, customer number one belongs to cluster one. 53 00:02:56,266 --> 00:02:58,766 And customer number two belongs to cluster two. 54 00:02:58,766 --> 00:03:01,600 Customer number 106 belongs to cluster three. 55 00:03:01,600 --> 00:03:04,600 And our last customer belongs to cluster four. 56 00:03:05,733 --> 00:03:06,133 Okay. 57 00:03:06,133 --> 00:03:08,966 So congratulations, we found the right number of clusters. 58 00:03:08,966 --> 00:03:12,500 And we correctly fitted hierarchical clustering to our data set. 59 00:03:12,833 --> 00:03:15,600 And now, time for fun. In the next tutorial 60 00:03:15,600 --> 00:03:18,900 we will be visualizing the hierarchical clustering results.