1
00:00:00,700 --> 00:00:03,366
Hello and welcome back to the course on Machine Learning.

2
00:00:03,366 --> 00:00:07,333
Today we're talking about Support Vector Machines, or SVMs for short.

3
00:00:07,500 --> 00:00:12,800
SVMs were initially developed in the 1960s.

4
00:00:12,900 --> 00:00:15,900
Then they were refined in the 1990s.

5
00:00:15,966 --> 00:00:19,233
And only now are they becoming very popular in machine learning,

6
00:00:19,233 --> 00:00:22,633
because they are demonstrating that they can be very, very powerful

7
00:00:22,666 --> 00:00:26,633
and because they are somewhat different from other machine learning algorithms.

8
00:00:26,766 --> 00:00:29,833
We'll find out what makes them special towards the end of this tutorial.

9
00:00:29,966 --> 00:00:33,900
But for now, let's understand how support vector machines actually work.

10
00:00:34,466 --> 00:00:34,700
All right.

11
00:00:34,700 --> 00:00:38,266
So here, as usual, we've got points in a two-dimensional space.

12
00:00:38,300 --> 00:00:41,600
For simplicity's sake, we've got just two columns, x1 and x2.

13
00:00:41,933 --> 00:00:44,133
And we've got some observations.

14
00:00:44,133 --> 00:00:47,100
Some are red, some are green, so we've already classified them.

15
00:00:47,100 --> 00:00:51,433
But now, how do we derive a line that's going to separate them?

16
00:00:51,433 --> 00:00:53,733
How do we actually separate these points?

17
00:00:53,733 --> 00:00:55,633
Because that separation,

18
00:00:55,633 --> 00:00:59,400
or in other words, that decision boundary, is going to be very important

19
00:00:59,400 --> 00:01:02,033
for us going forward, when we start adding new points.

20
00:01:02,033 --> 00:01:04,300
That's the whole point of classification.

21
00:01:04,300 --> 00:01:05,700
That's the purpose of our classification.
22
00:01:05,700 --> 00:01:08,166
We want to create a boundary between these two classes

23
00:01:08,166 --> 00:01:09,400
so that when, in the future, we

24
00:01:09,400 --> 00:01:12,633
add new points that we want to classify and that haven't been classified yet,

25
00:01:12,900 --> 00:01:14,733
we will know where they fall:

26
00:01:14,733 --> 00:01:17,200
either in the green area or in the red area.

27
00:01:17,200 --> 00:01:19,533
So how can we separate these points

28
00:01:19,533 --> 00:01:20,466
we see here?

29
00:01:20,466 --> 00:01:24,700
Well, one way is to draw a line like that in our two-dimensional space.

30
00:01:24,700 --> 00:01:25,300
And then

31
00:01:25,300 --> 00:01:29,200
anything to the right will be green and anything to the left will be red.

32
00:01:29,200 --> 00:01:33,166
And if a new point falls somewhere in this space, we will know right away

33
00:01:33,166 --> 00:01:35,400
whether it's red or green, because we'll know which side it falls on.

34
00:01:35,400 --> 00:01:38,400
However, there are other ways: we can draw a horizontal line like that,

35
00:01:38,566 --> 00:01:41,500
or we can draw a diagonal line like that.

36
00:01:41,500 --> 00:01:44,800
We can draw another diagonal line, or yet another diagonal.

37
00:01:44,900 --> 00:01:48,566
So there are lots of different lines that we can create that will achieve

38
00:01:48,566 --> 00:01:51,566
the same result: they will separate our points into two classes.

39
00:01:51,666 --> 00:01:55,833
But at the same time, they will all have different

40
00:01:56,133 --> 00:01:56,933
consequences in the future.

41
00:01:56,933 --> 00:01:58,666
When we add new points,

42
00:01:58,666 --> 00:02:01,466
depending on which line we chose, the same new point can be classified

43
00:02:01,466 --> 00:02:03,366
as part of the green zone or as part of the red zone.

44
00:02:03,366 --> 00:02:05,100
So we want to find the optimal line.

45
00:02:05,100 --> 00:02:07,166
And that's what SVMs are all about.
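To make the point about "different consequences" concrete, here is a small Python sketch. The six points and the two lines are made up for illustration (they are not the points on the lecture's slide), and the function name `classify` is a hypothetical helper. Both lines separate the labelled points perfectly, yet they disagree about a new point.

```python
# Hypothetical toy data: (x1, x2) points that are already labelled.
points = [
    ((1.0, 2.0), "red"), ((2.0, 3.0), "red"), ((1.5, 1.0), "red"),
    ((5.0, 2.0), "green"), ((6.0, 3.5), "green"), ((5.5, 1.0), "green"),
]

def classify(w, b, x):
    """Return "green" if x lies on the positive side of the line w . x + b = 0, else "red"."""
    score = w[0] * x[0] + w[1] * x[1] + b
    return "green" if score > 0 else "red"

# Two candidate boundaries, each given as (w, b):
vertical = ((1.0, 0.0), -3.5)   # the line x1 = 3.5
diagonal = ((1.0, -1.0), -1.0)  # the line x1 - x2 = 1

# Both separate the training points perfectly...
for w, b in (vertical, diagonal):
    assert all(classify(w, b, x) == label for x, label in points)

# ...but they disagree about a new, unlabelled point.
new_point = (3.0, 1.0)
print(classify(*vertical, new_point))  # red
print(classify(*diagonal, new_point))  # green
```

Picking one line out of all the valid ones is therefore a real decision, and the maximum margin criterion is the SVM's way of making it.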
46
00:02:07,166 --> 00:02:10,800
They're about finding the best line, or the best

47
00:02:11,400 --> 00:02:15,066
decision boundary, which will help us separate our space into classes.

48
00:02:15,566 --> 00:02:20,666
So let's find out how the SVM actually searches for this line.

49
00:02:20,966 --> 00:02:25,466
Well, the line is chosen by maximizing the margin.

50
00:02:25,466 --> 00:02:27,033
So here you can see a line.

51
00:02:27,033 --> 00:02:30,033
This is the line an SVM will draw.

52
00:02:30,200 --> 00:02:35,366
Basically, it's a line that separates these two classes of points,

53
00:02:35,566 --> 00:02:39,900
and at the same time it has the maximum margin, which is this distance here.

54
00:02:39,900 --> 00:02:43,966
This line is drawn equidistant from this point and this point,

55
00:02:44,333 --> 00:02:47,300
and we'll see exactly why these points in a second.

56
00:02:47,300 --> 00:02:51,100
The distance between the line and each one of these points

57
00:02:51,100 --> 00:02:54,266
is the same, and that's the margin.

58
00:02:54,266 --> 00:02:57,933
The sum of these two distances has to be maximized

59
00:02:58,233 --> 00:03:01,233
in order for this line to be the result of the SVM.

60
00:03:01,566 --> 00:03:04,800
And these two points are called the support vectors.

61
00:03:05,200 --> 00:03:07,366
Why they're called vectors, we'll also find out in a second.

62
00:03:07,366 --> 00:03:13,066
But basically, these two points are supporting this whole algorithm.

63
00:03:13,066 --> 00:03:16,066
Even if you get rid of all the rest of the points,

64
00:03:16,100 --> 00:03:18,700
nothing will change; the result will be exactly the same.

65
00:03:18,700 --> 00:03:23,333
These other points don't contribute to the result of the algorithm.

66
00:03:23,600 --> 00:03:26,400
Only these two points are contributing,

67
00:03:26,400 --> 00:03:29,400
and that's why they're called the support vectors.
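The idea of searching for the line with the maximum margin can be sketched by brute force in two dimensions: try many directions for the line's unit normal vector `w`, measure the empty gap between the classes in that direction, and keep the widest. This is only an illustration under stated assumptions: the points are made up, the data is assumed to be linearly separable (as in the lecture), and the helper names `max_margin_line` and `support_vectors` are hypothetical. Real SVM libraries do not scan angles like this; they solve the problem as a quadratic optimization.

```python
import math

def dot(w, p):
    return w[0] * p[0] + w[1] * p[1]

def max_margin_line(red, green, steps=3600):
    """Brute-force search for the separating line with the widest margin.

    For each unit normal direction w, project every point onto w. The gap
    between the largest red projection and the smallest green projection is
    the width of the empty band between the classes in that direction; the
    boundary goes in the middle of the band, equidistant from both sides.
    """
    best = None
    for k in range(steps):
        angle = 2 * math.pi * k / steps
        w = (math.cos(angle), math.sin(angle))
        hi_red = max(dot(w, p) for p in red)
        lo_green = min(dot(w, p) for p in green)
        gap = lo_green - hi_red  # sum of the two distances to the boundary
        if best is None or gap > best[0]:
            best = (gap, w, -(hi_red + lo_green) / 2)
    return best  # (margin width, unit normal w, offset b)

def support_vectors(points, w, b, gap, tol=1e-9):
    """Points lying exactly half the margin away from w . x + b = 0."""
    return [p for p in points if abs(abs(dot(w, p) + b) - gap / 2) < tol]

red = [(2.0, 2.0), (1.0, 3.0), (0.0, 1.0), (1.0, 0.0)]
green = [(5.0, 2.0), (6.0, 3.0), (7.0, 1.0), (6.0, 0.0)]

gap, w, b = max_margin_line(red, green)
print(gap, w, b)  # 3.0 (1.0, 0.0) -3.5, i.e. the line x1 = 3.5
print(support_vectors(red + green, w, b, gap))  # [(2.0, 2.0), (5.0, 2.0)]

# Keep only the support vectors: the search returns the same line.
print(max_margin_line([(2.0, 2.0)], [(5.0, 2.0)]))  # (3.0, (1.0, 0.0), -3.5)
```

With `steps=3600` the direction is sampled every 0.1 degrees, so in general the result is only approximate; for these particular points the best direction (straight along x1) happens to lie exactly on the grid. The last line checks the lecture's claim that discarding every point except the support vectors leaves the result unchanged.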
68
00:03:29,400 --> 00:03:32,333
You could call them support points, but in reality they're vectors.

69
00:03:32,333 --> 00:03:36,100
And here's why: in a multidimensional space,

70
00:03:36,100 --> 00:03:39,700
when you have more than just two variables, say three, five, ten,

71
00:03:39,900 --> 00:03:41,900
or 100 variables, each

72
00:03:41,900 --> 00:03:45,000
point is no longer really a point, because you can't visualize it on a

73
00:03:45,000 --> 00:03:48,133
two-dimensional plane or even in a three-dimensional space.

74
00:03:48,533 --> 00:03:52,266
Therefore, each of the points that we see here

75
00:03:52,500 --> 00:03:55,566
is actually a vector in a multidimensional space.

76
00:03:55,566 --> 00:04:00,600
So the more general term for the points that we see here is vectors.

77
00:04:00,600 --> 00:04:04,500
This is something that is studied in university

78
00:04:04,733 --> 00:04:06,033
or high school mathematics.

79
00:04:06,033 --> 00:04:09,366
So, generally speaking, they're all vectors.

80
00:04:09,366 --> 00:04:10,800
It's just in this particular example,

81
00:04:10,800 --> 00:04:13,833
where we have two dimensions, that we can call them points.

82
00:04:13,833 --> 00:04:14,900
But in reality they're vectors,

83
00:04:14,900 --> 00:04:16,866
and that's why they're called support vectors.

84
00:04:16,866 --> 00:04:19,833
These two specific vectors are the ones

85
00:04:19,833 --> 00:04:22,933
supporting this decision boundary,

86
00:04:22,933 --> 00:04:25,633
or the way we're building this algorithm. That's why they're important,

87
00:04:25,633 --> 00:04:29,100
and that's why this whole algorithm is called the support vector machine.

88
00:04:29,400 --> 00:04:30,933
So now, what else do we have here?
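Because each observation is just a vector of feature values, the geometry from the 2D picture carries over unchanged to any number of variables. As a minimal sketch, the distance from a vector x to a hyperplane w · x + b = 0 is |w · x + b| / ||w||, however many features there are. The function name and the example vectors below are arbitrary and purely illustrative.

```python
import math

def distance_to_hyperplane(x, w, b):
    """Distance from vector x to the hyperplane w . x + b = 0, in any dimension."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same number of features")
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))

# 2 features: the point (2, 2) and the line x1 = 3.5 (w = (1, 0), b = -3.5)
print(distance_to_hyperplane([2.0, 2.0], [1.0, 0.0], -3.5))  # 1.5

# 5 features: the same formula, even though we can no longer draw it
x = [1.0, 0.0, 2.0, -1.0, 3.0]
w = [0.5, -1.0, 0.0, 2.0, 1.0]
print(distance_to_hyperplane(x, w, b=1.0))  # 1.0
```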
89
00:04:30,933 --> 00:04:34,800
Well, we've got the line in the middle, which is called the maximum margin

90
00:04:34,800 --> 00:04:37,833
hyperplane or the maximum margin classifier.

91
00:04:37,833 --> 00:04:43,000
In a two-dimensional space, the classifier is just a line,

92
00:04:43,100 --> 00:04:46,666
but in a multidimensional space, it's a hyperplane.

93
00:04:47,100 --> 00:04:50,100
I know it's a bit of a confusing term,

94
00:04:50,266 --> 00:04:52,733
but that's what it's called: a maximum margin hyperplane.

95
00:04:52,733 --> 00:04:55,400
All the other lines that we saw were also hyperplanes,

96
00:04:55,400 --> 00:04:57,833
but they weren't maximum margin hyperplanes.

97
00:04:57,833 --> 00:04:59,100
You can check that yourself:

98
00:04:59,100 --> 00:05:03,000
draw a different hyperplane here and check what its margin will be.

99
00:05:03,000 --> 00:05:06,266
It'll always be smaller, because this is the one with the maximum margin.

100
00:05:06,533 --> 00:05:09,266
Then you've got the green and the red dotted lines.

101
00:05:09,266 --> 00:05:11,533
The green one is called the positive hyperplane,

102
00:05:11,533 --> 00:05:14,000
and the red one is called the negative hyperplane.

103
00:05:14,000 --> 00:05:16,933
It doesn't really matter which one you call which;

104
00:05:16,933 --> 00:05:19,966
the point is that one of them is positive and the other one is negative.

105
00:05:20,200 --> 00:05:23,933
Basically, anything to the right of the maximum margin hyperplane, on the positive side, is classified

106
00:05:24,166 --> 00:05:27,833
as the green, or positive, category.

107
00:05:27,833 --> 00:05:28,900
Anything to the left

108
00:05:28,900 --> 00:05:32,500
is classified as the negative, or red, category in our case.

109
00:05:32,500 --> 00:05:36,900
So that's how the support vector machine algorithm works.
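For reference, the three lines in the picture have a standard textbook form for the hard-margin (linearly separable) case, which this lecture does not derive. The labelling y_i = +1 for green (positive) and y_i = -1 for red (negative) is an assumed convention, and scaling w and b so that the support vectors sit exactly on the ±1 hyperplanes is a normalization choice. Minimizing ½‖w‖² is the same as maximizing the margin 2/‖w‖, and new points are classified by which side of the middle hyperplane they fall on.

```latex
\begin{aligned}
\text{maximum margin hyperplane:} &\quad \mathbf{w}\cdot\mathbf{x} + b = 0\\
\text{positive hyperplane:} &\quad \mathbf{w}\cdot\mathbf{x} + b = +1\\
\text{negative hyperplane:} &\quad \mathbf{w}\cdot\mathbf{x} + b = -1\\
\text{margin width:} &\quad \frac{2}{\lVert \mathbf{w} \rVert}\\
\text{training problem:} &\quad \min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w}\rVert^2
  \quad\text{s.t.}\quad y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 \ \ \text{for all } i\\
\text{decision rule:} &\quad \hat{y} = \operatorname{sign}(\mathbf{w}\cdot\mathbf{x} + b)
\end{aligned}
```

The support vectors are exactly the training points for which the constraint holds with equality, y_i (w · x_i + b) = 1.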
110
00:05:36,900 --> 00:05:37,400
Of course, there's

111
00:05:37,400 --> 00:05:42,000
some complicated mathematics behind it, but the essence, the intuitive part of it,

112
00:05:42,133 --> 00:05:47,100
is exactly this: we're working with a linearly separable

113
00:05:47,633 --> 00:05:51,033
dataset, where it's given to us

114
00:05:51,033 --> 00:05:55,000
that we can put a line through our chart that will separate the two categories.

115
00:05:55,400 --> 00:05:59,100
And then we're just searching for the one with the maximum margin.

116
00:05:59,666 --> 00:06:02,766
So conceptually, it's actually a pretty

117
00:06:03,133 --> 00:06:05,700
simple algorithm when you think about it this way.

118
00:06:05,700 --> 00:06:09,833
Without going into the mathematics, the question is: what's so special

119
00:06:09,833 --> 00:06:10,800
about SVMs?

120
00:06:10,800 --> 00:06:14,000
Why are they so popular, and why are they different

121
00:06:14,233 --> 00:06:16,666
from other machine learning algorithms?

122
00:06:16,666 --> 00:06:18,766
That's exactly what we're going to talk about right now.

123
00:06:18,766 --> 00:06:22,466
Imagine you're trying to teach a machine

124
00:06:22,466 --> 00:06:25,633
how to distinguish between apples and oranges:

125
00:06:25,633 --> 00:06:29,333
how to classify a fruit as either an apple or an orange.

126
00:06:29,333 --> 00:06:31,300
So you tell the machine:

127
00:06:31,300 --> 00:06:34,033
All right, I'm going to give you some training data.

128
00:06:34,033 --> 00:06:36,566
Here, have a look at all of these apples.

129
00:06:36,566 --> 00:06:38,366
These are apples, and these are oranges.

130
00:06:38,366 --> 00:06:40,333
Analyze them. Look at them.

131
00:06:40,333 --> 00:06:42,166
See what features they have.
132
00:06:42,166 --> 00:06:43,833
And next, I'm

133
00:06:43,833 --> 00:06:46,433
going to give you a fruit, which will be either an apple or an orange,

134
00:06:46,433 --> 00:06:49,066
and you'll need to classify it and tell me

135
00:06:49,066 --> 00:06:50,700
whether it's an apple or an orange. Right?

136
00:06:50,700 --> 00:06:54,766
So that's a fairly standard machine learning problem.

137
00:06:55,300 --> 00:06:59,900
Now, in our case, let's say that on the right we have oranges

138
00:06:59,900 --> 00:07:01,466
and on the left we have apples.

139
00:07:01,466 --> 00:07:04,766
What many machine learning algorithms would do is

140
00:07:04,766 --> 00:07:09,366
look at the most apple-like apples and the most orange-like oranges.

141
00:07:09,366 --> 00:07:12,666
They would look at the most stock-standard, common type of apple

142
00:07:12,866 --> 00:07:16,133
and the most stock-standard, common type of orange.

143
00:07:16,366 --> 00:07:19,166
In our case, that would be some apple over there,

144
00:07:19,166 --> 00:07:24,400
in the very heart of the apple class, far away from the oranges.

145
00:07:24,633 --> 00:07:26,566
And for the oranges, it would be somewhere over there,

146
00:07:26,566 --> 00:07:29,900
also in the very heart of the orange class, far away from the apples.

147
00:07:29,900 --> 00:07:34,533
So the machine would try to learn from the apples that are very

148
00:07:35,066 --> 00:07:37,400
much like apples, so it would know what an apple is.

149
00:07:37,400 --> 00:07:42,000
And it would also try to learn from the oranges, so it would know what an orange actually is.

150
00:07:42,200 --> 00:07:44,900
That's roughly how many machine learning algorithms work.

151
00:07:44,900 --> 00:07:48,500
And then, based on that, it would be able to make predictions

152
00:07:48,500 --> 00:07:53,866
and classify new data points that you give it.
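One simple way to make "learning from the most typical apples and oranges" concrete is a nearest-centroid classifier, used here purely as a stand-in: it summarizes each class by its average member and assigns a new fruit to the closer average. It is not meant to represent every other algorithm. The two features and all the numbers are invented for illustration, and `math.dist` requires Python 3.8 or later.

```python
import math

# Hypothetical 2-feature measurements; the numbers are invented for illustration.
apples = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (3.0, 3.0)]   # (3, 3): an orange-looking apple
oranges = [(6.0, 6.0), (7.0, 6.0), (6.0, 7.0), (4.0, 4.0)]  # (4, 4): an apple-looking orange

def centroid(points):
    """Average point of a class: its 'most typical' member."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def nearest_centroid(x, centroids):
    """Assign x to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

centroids = {"apple": centroid(apples), "orange": centroid(oranges)}
print(centroids)  # {'apple': (1.75, 1.75), 'orange': (5.75, 5.75)}
print(nearest_centroid((2.5, 2.5), centroids))  # apple
print(nearest_centroid((5.0, 5.5), centroids))  # orange
```

In this toy data, the orange-looking apple (3, 3) and the apple-looking orange (4, 4) each make up only a quarter of their class average. A maximum margin classifier trained on the same eight points behaves the other way round: its boundary is determined entirely by those two borderline points.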
153
00:07:54,166 --> 00:07:57,566
In the case of Support Vector Machines, it's a bit different.

154
00:07:57,700 --> 00:08:01,033
Instead of looking at the most stock-standard apples

155
00:08:01,033 --> 00:08:04,700
and oranges, what Support Vector Machines do

156
00:08:04,700 --> 00:08:09,400
is look at the apples that are very much like an orange.

157
00:08:09,400 --> 00:08:13,266
So here you can see an apple which is not your standard apple; it's orange in color.

158
00:08:13,266 --> 00:08:16,166
Right? So it's very easy to confuse this apple with an orange.

159
00:08:16,166 --> 00:08:19,800
And they would look at oranges which are not stock-standard oranges,

160
00:08:19,800 --> 00:08:21,600
which are more like apples than anything else.

161
00:08:21,600 --> 00:08:23,233
So ignore the lemon here.

162
00:08:23,233 --> 00:08:26,233
It just happens to be in the image. Out of the oranges,

163
00:08:26,400 --> 00:08:30,733
the SVM would pick the one that looks the most like an apple.

164
00:08:30,733 --> 00:08:32,433
In this case, we have a green orange.

165
00:08:32,433 --> 00:08:35,100
It's not normal to have a green orange;

166
00:08:35,100 --> 00:08:37,433
when you think of an orange, you think of an orange-colored orange.

167
00:08:37,433 --> 00:08:41,700
And those are the support vectors.

168
00:08:41,700 --> 00:08:43,233
You can see that the support vectors

169
00:08:43,233 --> 00:08:45,033
are actually very close to the boundary.

170
00:08:45,033 --> 00:08:48,766
The apple, the red one, is very close

171
00:08:48,766 --> 00:08:49,400
to the green ones,

172
00:08:49,400 --> 00:08:51,466
and the orange, the green one

173
00:08:51,466 --> 00:08:54,966
here, is very close to the red ones.
174
00:08:55,200 --> 00:08:58,766
So, in that sense, you can think of the support vector machine as a more extreme type of algorithm,

175
00:08:58,766 --> 00:09:02,666
a very rebellious type of algorithm, or a very risky type of algorithm,

176
00:09:02,666 --> 00:09:08,366
because it looks at the very extreme cases, which are very close to the boundary,

177
00:09:08,400 --> 00:09:12,866
and it uses those to construct its analysis.

178
00:09:12,966 --> 00:09:18,033
And that in itself makes support vector machine algorithms very special

179
00:09:18,033 --> 00:09:22,766
and very different from most other machine learning algorithms.

180
00:09:22,766 --> 00:09:26,400
And that's why, at times, they can perform much better

181
00:09:26,633 --> 00:09:29,633
than non-support vector machine algorithms.

182
00:09:29,933 --> 00:09:30,700
So there you go.

183
00:09:30,700 --> 00:09:35,000
I hope this explanation and intuition of support vector machines was useful.

184
00:09:35,000 --> 00:09:39,233
And now you not only know how they work, but also why they are different

185
00:09:39,466 --> 00:09:42,500
from other algorithms out there that are used in machine learning.

186
00:09:42,900 --> 00:09:45,066
And on that note, we're going to end today's tutorial.

187
00:09:45,066 --> 00:09:46,733
I look forward to seeing you next time.

188
00:09:46,733 --> 00:09:48,566
And until then, enjoy machine learning.