1 00:00:00,300 --> 00:00:00,566 All right. 2 00:00:00,566 --> 00:00:03,000 So let's start from the top. We have to start from the top. 3 00:00:03,000 --> 00:00:04,500 And so from the top it goes this way. 4 00:00:04,500 --> 00:00:05,666 We start from here. 5 00:00:05,666 --> 00:00:07,533 That's the first horizontal bar. 6 00:00:07,533 --> 00:00:10,466 And when I move down, you know, move down like that. 7 00:00:10,466 --> 00:00:11,133 There you go. 8 00:00:11,133 --> 00:00:13,800 That's when I meet the next horizontal bar. 9 00:00:13,800 --> 00:00:15,033 You know, this one. 10 00:00:15,033 --> 00:00:19,166 So actually we made a very small vertical distance here, 11 00:00:19,166 --> 00:00:23,700 which means that indeed the optimal number of clusters is definitely not two. 12 00:00:23,733 --> 00:00:26,700 Right. Because here we have two vertical bars. 13 00:00:26,700 --> 00:00:30,933 Then the next step is to start from this second horizontal bar. 14 00:00:31,200 --> 00:00:34,200 And so let's do this with our mouse. 15 00:00:34,266 --> 00:00:35,100 All right. 16 00:00:35,100 --> 00:00:37,700 We start from here, the second horizontal bar. 17 00:00:37,700 --> 00:00:41,766 And we're going to go down until we meet the next horizontal bar, 18 00:00:41,900 --> 00:00:44,966 which is right here. This one, right. 19 00:00:45,000 --> 00:00:47,433 That's the second horizontal bar. This was the first. 20 00:00:47,433 --> 00:00:49,433 And from this first to the second, 21 00:00:49,433 --> 00:00:52,566 well, we made that vertical distance. Okay. 22 00:00:52,833 --> 00:00:54,033 So that's pretty big. 23 00:00:54,033 --> 00:00:58,233 Which means that maybe, you know, if it is the largest vertical move we can make, 24 00:00:58,433 --> 00:01:03,400 well, in that case, the optimal number of clusters would be one, two, three. 25 00:01:03,466 --> 00:01:06,566 You know, the number of vertical bars we have 26 00:01:06,566 --> 00:01:10,266 within this vertical move, this one, this one and this one. 
27 00:01:10,266 --> 00:01:12,766 So that would go for three clusters. 28 00:01:12,766 --> 00:01:15,033 All right. So maybe that's the optimal number of clusters. 29 00:01:15,033 --> 00:01:16,466 But then let's continue. 30 00:01:16,466 --> 00:01:18,800 Let's continue from here. Right. 31 00:01:18,800 --> 00:01:20,400 So we're going to continue from here, 32 00:01:20,400 --> 00:01:23,400 that last horizontal bar we met. 33 00:01:23,600 --> 00:01:26,533 And now I'm going to expand over here. 34 00:01:26,533 --> 00:01:29,966 And same, we're going to go down until we meet the next horizontal bar. 35 00:01:30,000 --> 00:01:32,866 And there we go. That's the next horizontal bar. 36 00:01:32,866 --> 00:01:34,766 And here we made a small vertical distance. 37 00:01:34,766 --> 00:01:35,900 So definitely 38 00:01:35,900 --> 00:01:39,466 it is shorter than the previous vertical distance we made just before. 39 00:01:39,666 --> 00:01:44,533 And therefore the optimal number of clusters is definitely not one, two, 40 00:01:44,766 --> 00:01:47,433 three, four. It's definitely not four. 41 00:01:47,433 --> 00:01:49,266 Okay. Okay. But let's continue. 42 00:01:49,266 --> 00:01:53,900 Maybe there is a higher vertical move we can make in the next step. 43 00:01:53,933 --> 00:01:55,300 And you know exactly that 44 00:01:55,300 --> 00:01:58,300 this is going to be the case because we already did K-means. 45 00:01:58,466 --> 00:01:59,333 But there we go. 46 00:01:59,333 --> 00:02:00,300 That's the next move. 47 00:02:00,300 --> 00:02:01,100 We start from here, 48 00:02:01,100 --> 00:02:03,766 that last horizontal bar we met. 49 00:02:03,766 --> 00:02:08,666 And I'm going to expand this in order to, you know, not miss any horizontal bar. 50 00:02:08,666 --> 00:02:10,066 And actually I should even take 51 00:02:10,066 --> 00:02:13,700 the whole width, you know, I should take exactly this. 52 00:02:14,000 --> 00:02:14,333 All right. 
53 00:02:14,333 --> 00:02:17,333 So that we can make sure we don't miss any horizontal bar. 54 00:02:17,466 --> 00:02:19,100 So we start from here, and here we go. 55 00:02:19,100 --> 00:02:21,700 Let's move down, let's move down, move down. 56 00:02:21,700 --> 00:02:25,166 Move down until we meet the next horizontal bar. 57 00:02:25,500 --> 00:02:27,866 And here we go. 58 00:02:27,866 --> 00:02:28,166 Right. 59 00:02:28,166 --> 00:02:32,933 You see, that's the next horizontal bar we met, the one from that last cluster 60 00:02:32,933 --> 00:02:33,833 here. 61 00:02:33,833 --> 00:02:38,633 And well, now the question is, is that vertical move, 62 00:02:38,633 --> 00:02:39,600 you know, vertical distance 63 00:02:39,600 --> 00:02:44,066 we just made here bigger than the one we made in the second step? 64 00:02:44,066 --> 00:02:45,500 Meaning this one. 65 00:02:45,500 --> 00:02:48,366 Well, it seems to be the case, right. 66 00:02:48,366 --> 00:02:53,333 It seems to be that this vertical distance is actually larger 67 00:02:53,333 --> 00:02:57,033 than, you know, this vertical distance. 68 00:02:57,400 --> 00:02:59,866 How could we actually measure this, you know, to be exactly sure? 69 00:02:59,866 --> 00:03:04,700 Well, have a look at, you know, the pixels here, 264 times 66. 70 00:03:04,700 --> 00:03:05,633 I don't know if you can see it. 71 00:03:05,633 --> 00:03:08,633 Well, but if I move up. 72 00:03:08,633 --> 00:03:10,866 Remember there was 66. 73 00:03:10,866 --> 00:03:12,833 Well actually zero, okay. So that's very easy. 74 00:03:12,833 --> 00:03:15,566 So there are actually 66 pixels here. 75 00:03:15,566 --> 00:03:16,633 All right. Cool. 76 00:03:16,633 --> 00:03:20,100 And now let's do this again here from that horizontal bar 77 00:03:20,566 --> 00:03:23,433 and up to here. 78 00:03:23,433 --> 00:03:24,566 And wow, okay. 
79 00:03:24,566 --> 00:03:26,400 So that was very short actually 80 00:03:26,400 --> 00:03:29,900 because here the vertical distance is actually 67 pixels. 81 00:03:29,900 --> 00:03:35,133 You see it, you know, 272 times 67. 272 is actually the width of that rectangle 82 00:03:35,133 --> 00:03:36,633 I just made with my mouse. 83 00:03:36,633 --> 00:03:39,433 And 67 is the height. And that's the height 84 00:03:39,433 --> 00:03:43,033 we're interested in, because that corresponds to the distance of the vertical move. 85 00:03:43,333 --> 00:03:46,300 So actually 67 versus 66. 86 00:03:46,300 --> 00:03:51,100 So well, you know, three clusters is actually a good number of clusters. 87 00:03:51,266 --> 00:03:55,433 But you know, since we already did it with K-means, and with the elbow method 88 00:03:55,433 --> 00:03:59,300 we clearly identified that the optimal number of clusters was five, 89 00:03:59,400 --> 00:04:02,500 well, we're going to take that one-pixel difference here 90 00:04:02,833 --> 00:04:07,200 in order to still keep, well, five as the optimal number of clusters. 91 00:04:07,200 --> 00:04:10,200 But it's really interesting that we did this dendrogram. 92 00:04:10,200 --> 00:04:15,100 I actually didn't notice before that we were so close. From my vision, 93 00:04:15,100 --> 00:04:18,966 I had the impression that this distance was larger, but there you go. 94 00:04:19,100 --> 00:04:21,000 This one was also very good. 95 00:04:21,000 --> 00:04:24,000 Let's check it again, because I just want to make sure. 96 00:04:24,233 --> 00:04:26,700 Yeah, 66, but even, you know, 67; 97 00:04:26,700 --> 00:04:29,900 it depends down to where you go exactly in that horizontal bar. 98 00:04:29,900 --> 00:04:32,066 Actually, you know, to make it super thorough 99 00:04:32,066 --> 00:04:35,800 we would need to start from here, you know, at the bottom 100 00:04:35,800 --> 00:04:39,700 of the horizontal bar up to the top of the next vertical bar. 
101 00:04:39,700 --> 00:04:41,600 So here we have 65. 102 00:04:41,600 --> 00:04:44,600 And for that next distance, 103 00:04:44,700 --> 00:04:45,300 well, you know, 104 00:04:45,300 --> 00:04:50,133 I'm starting from the bottom of the horizontal bar and going here: 65. 105 00:04:50,166 --> 00:04:51,300 The same. It's the same. 106 00:04:51,300 --> 00:04:54,466 So actually three clusters or five clusters is very good. 107 00:04:54,700 --> 00:04:59,200 And so feel free to, you know, continue this implementation with either 3 or 5. 108 00:04:59,433 --> 00:05:03,300 But since with K-means we identified five clusters as the optimal number 109 00:05:03,300 --> 00:05:07,033 of clusters, well, we're still going to go with five here, but there you go. 110 00:05:07,033 --> 00:05:10,200 That's a very good example because indeed in some situations 111 00:05:10,200 --> 00:05:13,200 we can have two optimal numbers of clusters. 112 00:05:13,733 --> 00:05:14,100 All right. 113 00:05:14,100 --> 00:05:15,466 So I'm really glad I showed you this. 114 00:05:15,466 --> 00:05:19,800 And mostly this shows you the importance of trying several models 115 00:05:20,000 --> 00:05:23,366 because indeed having another model can give you extra insight 116 00:05:23,733 --> 00:05:25,700 when doing your machine learning task. 117 00:05:25,700 --> 00:05:28,800 And with hierarchical clustering, that extra insight 118 00:05:29,066 --> 00:05:32,666 is that three clusters actually makes a lot of sense as well. 119 00:05:32,933 --> 00:05:33,700 So there you go. 120 00:05:33,700 --> 00:05:37,800 Make sure to keep hierarchical clustering in your toolkit in order to get 121 00:05:37,800 --> 00:05:42,800 these extra insights in your future data sets for your future clustering problems. 122 00:05:43,233 --> 00:05:44,033 All right. Perfect. 123 00:05:44,033 --> 00:05:47,366 So let's close this and let's go back to our implementation. 
124 00:05:47,600 --> 00:05:51,300 And now for the next step we're going to train that hierarchical 125 00:05:51,300 --> 00:05:55,833 clustering model with as we said five clusters in the next tutorial. 126 00:05:56,100 --> 00:05:58,966 And until then enjoy clustering and machine learning.
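
Supplementary note: the mouse-and-pixel measurement in this video can also be done numerically. What follows is only a minimal sketch, not the code used in the course. It assumes you already have the heights of the dendrogram's horizontal bars as plain numbers (with SciPy, these would be the third column of the matrix returned by `scipy.cluster.hierarchy.linkage`). The function names `gap_per_cluster_count` and `best_cluster_counts` are hypothetical helpers, and the heights in the example are made-up values chosen to mimic the situation in the video, where two gaps tie at 65.

```python
def gap_per_cluster_count(merge_heights):
    """Map each candidate cluster count k to the vertical gap that yields it.

    merge_heights: heights of the dendrogram's horizontal bars, in any order.
    """
    heights = sorted(merge_heights, reverse=True)  # top bar first
    gaps = {}
    for step, (upper, lower) in enumerate(zip(heights, heights[1:])):
        # A cut between the bar at index `step` and the next bar below it
        # crosses step + 2 vertical lines, so it gives step + 2 clusters.
        gaps[step + 2] = upper - lower
    return gaps


def best_cluster_counts(merge_heights):
    """Return every cluster count whose gap equals the largest gap (ties kept)."""
    gaps = gap_per_cluster_count(merge_heights)
    if not gaps:
        raise ValueError("need at least two merge heights")
    largest = max(gaps.values())
    return [k for k, gap in gaps.items() if gap == largest]


# Made-up bar heights (illustration only, not read from the real dendrogram).
heights = [400, 380, 315, 300, 235, 230]
print(gap_per_cluster_count(heights))
print(best_cluster_counts(heights))
```

Returning all tied cluster counts, rather than a single winner, matches the point made in the video: when two gaps are equally large, both numbers of clusters are reasonable and the final choice can rest on other evidence, such as the elbow method from K-means.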