1 00:00:00,333 --> 00:00:02,566 Hello and welcome back to the course on Machine Learning. 2 00:00:02,566 --> 00:00:06,066 Today we will finally find out about the kernel trick. 3 00:00:06,300 --> 00:00:08,200 So let's get started. 4 00:00:08,200 --> 00:00:11,533 So here we've got the Gaussian or the radial basis function kernel. 5 00:00:11,533 --> 00:00:14,300 And those are two interchangeable terms. 6 00:00:14,300 --> 00:00:17,366 And now let's have a look at this function. 7 00:00:17,366 --> 00:00:19,666 So K stands for kernel. 8 00:00:19,666 --> 00:00:25,600 And it's a function applied to two vectors. The first is the x vector. 9 00:00:25,600 --> 00:00:29,400 So this is just some point in our data set. 10 00:00:29,400 --> 00:00:34,433 And L stands for landmark. The i means there might be several landmarks, 11 00:00:34,433 --> 00:00:36,533 but we're not going to worry about 12 00:00:36,533 --> 00:00:39,533 i for now. We're just going to look at this as a landmark. 13 00:00:39,666 --> 00:00:45,100 And then that equals the exponential, e to the power of minus 14 00:00:45,333 --> 00:00:48,766 the distance (that's what the double vertical lines mean) between x 15 00:00:48,766 --> 00:00:53,000 and the landmark, squared, divided by two sigma squared. 16 00:00:53,300 --> 00:00:56,400 So I know this might all seem very confusing right now. 17 00:00:56,400 --> 00:01:00,833 And you're probably wondering, Carol, this makes no sense whatsoever. 18 00:01:01,233 --> 00:01:03,000 What does this even mean? 19 00:01:03,000 --> 00:01:07,500 Well, let's explore this through a visual example. 20 00:01:07,766 --> 00:01:13,466 So here I've got an image which represents this particular function 21 00:01:13,633 --> 00:01:17,200 for a specific sigma, for a specific landmark. 22 00:01:17,800 --> 00:01:20,400 This is what it looks like 23 00:01:20,400 --> 00:01:23,566 when you visualize it.
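Written out, the formula described in words above would look roughly like this. It is a reconstruction from the spoken description, not a copy of the slide; the symbols x, l^i (the i-th landmark) and sigma are the ones named in the talk.

```latex
K(\vec{x}, \vec{l}^{\,i}) = \exp\!\left( -\frac{\lVert \vec{x} - \vec{l}^{\,i} \rVert^{2}}{2\sigma^{2}} \right)
```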
24 00:01:23,566 --> 00:01:26,566 And so what's happening here is, we've got l, 25 00:01:26,566 --> 00:01:31,300 the landmark, which is actually in the middle of this plane. 26 00:01:31,300 --> 00:01:35,366 So in the middle of this two dimensional space, 27 00:01:35,366 --> 00:01:36,966 let's imagine this is the x coordinate. 28 00:01:36,966 --> 00:01:39,533 This is the y coordinate. In the middle we've got zero, zero. 29 00:01:39,533 --> 00:01:42,533 And that's where the landmark is actually located. 30 00:01:42,600 --> 00:01:46,433 And then the vertical here, the vertical axis, 31 00:01:46,433 --> 00:01:50,400 represents the result that we get when we calculate this 32 00:01:51,266 --> 00:01:53,900 for every other point on this 33 00:01:54,866 --> 00:01:58,533 XY field or plane. 34 00:01:59,266 --> 00:02:03,633 If we take any other point, then this is the result of this calculation. 35 00:02:03,633 --> 00:02:04,800 So let's say 36 00:02:04,800 --> 00:02:07,866 we put that point in here and then we calculate the distance to the landmark. 37 00:02:08,300 --> 00:02:11,300 Then we square it and divide by two sigma squared, 38 00:02:11,300 --> 00:02:14,666 where sigma is some fixed parameter that we decided upon earlier. 39 00:02:14,900 --> 00:02:19,033 And then we take the negative of that, and then we raise e 40 00:02:19,033 --> 00:02:20,100 to that power. 41 00:02:20,100 --> 00:02:21,100 Then we get this result. 42 00:02:21,100 --> 00:02:22,900 That's what it will look like. 43 00:02:22,900 --> 00:02:25,033 So let's go through this step by step. 44 00:02:25,033 --> 00:02:28,033 Let's look at the tip of this. It 45 00:02:28,866 --> 00:02:31,900 is right in the middle of this whole 46 00:02:32,166 --> 00:02:34,933 XY plane. 47 00:02:34,933 --> 00:02:38,166 And so if you project that back onto the plane 48 00:02:38,500 --> 00:02:42,566 at the bottom, you'll have the kernel, or rather, not the kernel.
49 00:02:42,566 --> 00:02:45,300 That's the landmark. We're gonna call it the landmark. 50 00:02:45,300 --> 00:02:49,400 That's where the middle of this bottom square is. 51 00:02:49,400 --> 00:02:52,833 And that's from where we're measuring the distance. 52 00:02:52,833 --> 00:02:57,566 So this distance, x minus l i, this distance 53 00:02:57,566 --> 00:03:00,566 that we're squaring here, will be measured from that landmark. 54 00:03:00,566 --> 00:03:02,766 So let's take a point and look at it. 55 00:03:02,766 --> 00:03:06,500 So there is a point. It's somewhere on the plane. 56 00:03:06,500 --> 00:03:08,833 It's quite far away from the landmark. 57 00:03:08,833 --> 00:03:09,866 There's a distance. 58 00:03:09,866 --> 00:03:11,900 So we're taking that distance. 59 00:03:11,900 --> 00:03:16,066 We're squaring that distance, dividing it by two sigma squared. 60 00:03:16,066 --> 00:03:17,233 Taking the negative. 61 00:03:17,233 --> 00:03:20,100 And then we want to see what the result will be. 62 00:03:20,100 --> 00:03:23,066 And so, how can we confirm that this visualization 63 00:03:23,066 --> 00:03:26,666 is indeed aligned with this formula? 64 00:03:26,666 --> 00:03:28,500 So it's pretty simple. 65 00:03:28,500 --> 00:03:31,566 The distance here, let's assume it's quite large. 66 00:03:31,566 --> 00:03:31,800 Right. 67 00:03:31,800 --> 00:03:33,133 So it's quite a large distance 68 00:03:33,133 --> 00:03:35,500 compared to some other points that are closer to the landmark. 69 00:03:35,500 --> 00:03:39,666 So basically the distance here is a large number. 70 00:03:39,933 --> 00:03:43,900 And if we take a large number and we square it, right, 71 00:03:43,900 --> 00:03:45,900 we get an even larger number. 72 00:03:45,900 --> 00:03:49,100 And then we divide by two sigma squared. 73 00:03:49,100 --> 00:03:51,366 We're assuming it's still a large number. 74 00:03:51,366 --> 00:03:52,566 Again, that depends on the sigma.
75 00:03:52,566 --> 00:03:56,800 And we'll find out the role of sigma further down in this tutorial. 76 00:03:56,800 --> 00:04:00,166 But assuming this is still a very large number. 77 00:04:00,466 --> 00:04:04,166 So you've got a very large number here, and then you apply the minus. 78 00:04:04,200 --> 00:04:07,500 You're making it a very large but negative number. 79 00:04:07,500 --> 00:04:09,666 So if you take the exponent e 80 00:04:09,666 --> 00:04:14,100 and you raise it to the power of a very large 81 00:04:14,100 --> 00:04:18,633 negative number, so e to the power of, let's say, -1,000,000. 82 00:04:18,633 --> 00:04:18,966 Right. 83 00:04:18,966 --> 00:04:23,333 You know, just for argument's sake, or -1000, what does that give us? 84 00:04:23,466 --> 00:04:26,666 That gives us a value very close to zero. 85 00:04:26,666 --> 00:04:30,533 So it's basically equivalent to saying one divided 86 00:04:30,533 --> 00:04:32,666 by e to the power of a thousand. 87 00:04:32,666 --> 00:04:35,400 And that is a very, very small number. 88 00:04:35,400 --> 00:04:38,733 So that basically means when you're far away from the 89 00:04:39,700 --> 00:04:42,700 landmark or from the center, you get 90 00:04:43,066 --> 00:04:46,466 pretty much zero on the vertical axis, which aligns with our image here. 91 00:04:46,866 --> 00:04:49,066 Now let's have a look at another example. 92 00:04:49,066 --> 00:04:52,800 So this point is actually closer to the landmark. 93 00:04:52,800 --> 00:04:55,400 And here if we measure the distance, it's quite small. 94 00:04:55,400 --> 00:04:58,833 So now if you take a small number and you square it, you still have a small 95 00:04:58,833 --> 00:05:01,333 number. You divide by two sigma squared, you still have a small number. 96 00:05:01,333 --> 00:05:06,633 So you look at e to the power of minus a small number.
97 00:05:06,633 --> 00:05:10,533 So let's say e to the power of minus, I don't know, 98 00:05:11,233 --> 00:05:14,800 e to the power of minus 99 00:05:16,566 --> 00:05:17,900 one, for example, 100 00:05:17,900 --> 00:05:21,433 or e to the power of -0.1. 101 00:05:22,100 --> 00:05:23,633 So that basically means 102 00:05:23,633 --> 00:05:26,633 this number is close to zero. 103 00:05:26,666 --> 00:05:31,666 As you get closer to the landmark, this number gets closer to zero. 104 00:05:31,666 --> 00:05:34,666 And we know that with e to the power of 105 00:05:35,066 --> 00:05:39,600 -0.01, -0.0001 and so on, 106 00:05:39,866 --> 00:05:42,900 basically as the power gets close to zero, 107 00:05:42,900 --> 00:05:45,900 we get closer to e to the power of zero. 108 00:05:46,266 --> 00:05:47,933 And e to the power of zero is one. 109 00:05:47,933 --> 00:05:51,133 So basically, as you get closer to your landmark, 110 00:05:51,133 --> 00:05:55,100 this number over here gets smaller and smaller 111 00:05:55,100 --> 00:05:59,166 and smaller, and the whole right part over here converges to one. 112 00:05:59,166 --> 00:06:01,266 So it gets bigger and bigger. 113 00:06:01,266 --> 00:06:04,600 And you climb this hill up to the top, where you get to one 114 00:06:05,066 --> 00:06:06,900 at the very landmark itself. 115 00:06:06,900 --> 00:06:11,300 So when you hit exactly on the landmark, you get to the top. 116 00:06:11,766 --> 00:06:15,333 And so that is just a quick way of checking that 117 00:06:15,333 --> 00:06:19,500 this image is indeed the kernel function that we're looking at. 118 00:06:20,100 --> 00:06:22,800 And why is this all useful? 119 00:06:22,800 --> 00:06:23,800 Why do we need this?
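The far-versus-near check walked through above can be reproduced numerically. This is only a minimal sketch: the function name `gaussian_kernel`, the choice of sigma = 1 and the sample distances are illustrative assumptions, not values from the lecture.

```python
import math

def gaussian_kernel(distance, sigma=1.0):
    # Value of exp(-distance^2 / (2 * sigma^2)) for a given distance to the landmark.
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))

# Illustrative distances (assumed, not from the lecture).
far = gaussian_kernel(10.0)         # far from the landmark: practically zero
near = gaussian_kernel(0.1)         # close to the landmark: just under one
at_landmark = gaussian_kernel(0.0)  # exactly on the landmark: e^0 = 1

print(far, near, at_landmark)
```

The three values follow the shape of the hill in the picture: almost zero far out, rising towards one as the point approaches the landmark.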
120 00:06:23,800 --> 00:06:27,800 Well, because we're going to use this kernel function to 121 00:06:28,300 --> 00:06:32,666 separate our data set, to build that decision boundary. 122 00:06:32,666 --> 00:06:33,300 So let's have a look. 123 00:06:34,900 --> 00:06:37,800 There is our two dimensional space, right? 124 00:06:37,800 --> 00:06:41,600 And there is our x1, x2, just like we had here x1, x2. 125 00:06:42,033 --> 00:06:45,666 And now what we're going to do is we're going to take the landmark 126 00:06:45,666 --> 00:06:48,666 and put it somewhere 127 00:06:48,900 --> 00:06:51,333 in among our data set. 128 00:06:51,333 --> 00:06:55,100 And there is a whole methodology on how the machine learning algorithm, 129 00:06:55,100 --> 00:06:58,800 when you implement it in R or Python or any other language, does it. 130 00:06:59,100 --> 00:07:01,633 And we're not going to go into detail on that 131 00:07:01,633 --> 00:07:03,133 because we're just focusing on the intuition. 132 00:07:03,133 --> 00:07:07,466 But basically there's a way to find an optimal placement for these landmarks. 133 00:07:07,900 --> 00:07:10,133 And so the landmark is placed. 134 00:07:10,133 --> 00:07:14,600 And next what happens is the distance, 135 00:07:14,833 --> 00:07:18,033 as you can see here, the circle, the circumference around 136 00:07:18,033 --> 00:07:22,200 this kernel function, 137 00:07:22,200 --> 00:07:26,933 is actually projected here onto our visualization. 138 00:07:26,933 --> 00:07:31,866 So what this circumference allows us to do is it allows us to 139 00:07:32,200 --> 00:07:35,866 take all of the points that are within that circumference 140 00:07:36,066 --> 00:07:39,066 and, like, 141 00:07:39,400 --> 00:07:42,033 assign them a value above zero. 142 00:07:42,033 --> 00:07:45,266 So anything outside the circumference, all of this blue stuff.
143 00:07:45,566 --> 00:07:49,166 So basically all of these red points, they'll get a value of zero, right? 144 00:07:49,433 --> 00:07:53,200 If you apply this function, or a value very, very close to zero. 145 00:07:53,466 --> 00:07:56,900 If, on the other hand, any point falls within the circumference, 146 00:07:57,066 --> 00:08:00,500 based on this function it will get a value above zero. 147 00:08:00,666 --> 00:08:04,733 And that is how we can separate the two classes, the green from the red. 148 00:08:05,100 --> 00:08:08,100 Just if we pick the right sigma. 149 00:08:08,100 --> 00:08:13,566 So here we know that sigma, well, we don't know that yet, 150 00:08:13,566 --> 00:08:16,566 but sigma's role is that it 151 00:08:16,700 --> 00:08:19,400 defines how wide this circumference is. 152 00:08:19,400 --> 00:08:21,900 So if you increase sigma, 153 00:08:21,900 --> 00:08:24,066 the circumference will increase. This picture doesn't change, 154 00:08:24,066 --> 00:08:26,300 but it should change. 155 00:08:26,300 --> 00:08:29,966 The circumference here would increase and it would take up more space. 156 00:08:30,300 --> 00:08:34,033 Or if you decrease sigma, the circumference will decrease 157 00:08:34,300 --> 00:08:37,766 and therefore you'll take fewer points. 158 00:08:37,766 --> 00:08:43,200 And so basically, by finding the right sigma, you can set up 159 00:08:43,200 --> 00:08:46,933 the correct kernel function to 160 00:08:47,900 --> 00:08:49,833 assign 161 00:08:49,833 --> 00:08:54,500 zero values to all of the points that you don't want in your classification 162 00:08:54,500 --> 00:08:59,266 and values above zero to the points that you do want in your classifier. 163 00:08:59,833 --> 00:09:03,066 And that will allow you to separate 164 00:09:03,066 --> 00:09:06,433 the two, that will allow you to classify each one.
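The effect of sigma on how many points fall "inside the circumference" can be sketched as follows. The sample points, the three sigma values and the `threshold` used to decide what counts as "practically zero" are all made-up assumptions for illustration; the lecture does not give concrete numbers.

```python
import numpy as np

def rbf(points, landmark, sigma):
    # Kernel value for each row of `points` relative to one landmark.
    squared_distances = np.sum((points - landmark) ** 2, axis=1)
    return np.exp(-squared_distances / (2.0 * sigma ** 2))

landmark = np.array([0.0, 0.0])
points = np.array([[0.5, 0.0], [2.0, 0.0], [5.0, 0.0]])  # made-up sample points
threshold = 0.01  # hypothetical cut-off for "practically zero"

inside_counts = {}
for sigma in (0.5, 1.0, 3.0):
    values = rbf(points, landmark, sigma)
    # Count the points whose kernel value is noticeably above zero.
    inside_counts[sigma] = int(np.sum(values > threshold))
    print(f"sigma={sigma}: values={np.round(values, 4)}, inside={inside_counts[sigma]}")
```

With these assumed numbers, a larger sigma pulls more of the sample points above the cut-off, which matches the description of the circumference widening.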
165 00:09:06,433 --> 00:09:09,566 And that, in essence, is the kernel trick. We have created 166 00:09:09,600 --> 00:09:12,600 a decision boundary 167 00:09:12,633 --> 00:09:16,600 without actually going into a higher dimensional space, 168 00:09:16,600 --> 00:09:22,200 without having to create a mapping function 169 00:09:22,700 --> 00:09:26,000 that's going to, you know, take our two dimensional space 170 00:09:26,000 --> 00:09:28,466 and do all the computation. The point is, we're not doing 171 00:09:28,466 --> 00:09:30,433 the computations in the higher dimensional space. 172 00:09:30,433 --> 00:09:34,766 We're still doing the computations in the low dimensional space. Yes, 173 00:09:34,766 --> 00:09:39,666 we have this visual representation that involves a higher dimensional space. 174 00:09:39,933 --> 00:09:43,233 But at the same time, if you look at the computational part, 175 00:09:43,966 --> 00:09:47,400 we're just calculating this formula and then we're saying, 176 00:09:47,666 --> 00:09:52,900 if this is equal to zero, then assign, you know, class red. If this is 177 00:09:53,100 --> 00:09:57,566 greater than zero, then assign class green. 178 00:09:57,833 --> 00:10:00,766 If you look at the computation, it's actually still happening 179 00:10:00,766 --> 00:10:02,366 in the two dimensional space. 180 00:10:02,366 --> 00:10:04,266 And that's called the kernel trick. 181 00:10:04,266 --> 00:10:07,766 So all of a sudden you can adjust your 182 00:10:08,066 --> 00:10:11,400 decision boundary, and it's non-linear. 183 00:10:11,666 --> 00:10:16,666 And moreover you find yourself being able to solve much harder, 184 00:10:16,833 --> 00:10:19,933 much more complex problems like this, for example. 185 00:10:19,933 --> 00:10:23,200 So here this is a very simplified formula.
186 00:10:23,200 --> 00:10:25,100 But if you take two kernel functions 187 00:10:25,100 --> 00:10:28,166 and you just add them up, in reality there have to be coefficients too, 188 00:10:28,200 --> 00:10:31,200 here, and here, and then another one before. 189 00:10:31,366 --> 00:10:36,000 So there have to be coefficients with the kernel formula. 190 00:10:36,000 --> 00:10:37,133 It's a bit more complex than that. 191 00:10:37,133 --> 00:10:41,166 But in simple terms, if you take two kernel functions 192 00:10:41,166 --> 00:10:45,666 and you just add them up, then because 193 00:10:46,166 --> 00:10:50,666 the value of this function, if, let's say, 194 00:10:50,666 --> 00:10:53,833 this is your landmark one, because the value of the function 195 00:10:53,833 --> 00:10:56,400 becomes zero when you get further away, right, 196 00:10:56,400 --> 00:10:57,600 they don't really interfere. 197 00:10:57,600 --> 00:11:01,500 So as you move away from this landmark, so this landmark 198 00:11:01,500 --> 00:11:05,400 will only encapsulate all of these points around here. 199 00:11:05,400 --> 00:11:08,400 And then as you move away, this will be zero everywhere, right? 200 00:11:08,400 --> 00:11:10,666 Everywhere else, including at these points. 201 00:11:10,666 --> 00:11:13,333 But then this one will be non-zero 202 00:11:13,333 --> 00:11:14,700 close to this landmark. 203 00:11:14,700 --> 00:11:15,966 And so if you just add them up, 204 00:11:15,966 --> 00:11:19,500 you will get non-zero values for exactly all of those points. 205 00:11:19,833 --> 00:11:21,666 And so therefore you can draw 206 00:11:21,666 --> 00:11:24,666 a non-linear decision boundary which even looks like this.
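The idea of adding two kernel functions can be sketched in a few lines. This follows the simplified version from the talk (no coefficients), and it rests on several assumptions: the landmark positions, sigma = 1 and the test points are invented, and because a floating point exponential is never exactly zero, a small hypothetical `threshold` stands in for the "greater than zero" rule.

```python
import numpy as np

def rbf(point, landmark, sigma=1.0):
    # Gaussian kernel value of one point relative to one landmark.
    squared_distance = np.sum((np.asarray(point) - np.asarray(landmark)) ** 2)
    return np.exp(-squared_distance / (2.0 * sigma ** 2))

# Two made-up landmarks, far enough apart that their kernels barely overlap.
landmark_1 = (0.0, 0.0)
landmark_2 = (10.0, 0.0)

def combined(point):
    # Simplified sum of the two kernels, without the coefficients used in practice.
    return rbf(point, landmark_1) + rbf(point, landmark_2)

def classify(point, threshold=0.01):
    # Hypothetical cut-off replacing the idealised "greater than zero" test.
    return "green" if combined(point) > threshold else "red"

for p in [(0.5, 0.0), (9.5, 0.5), (5.0, 0.0), (5.0, 8.0)]:
    print(p, round(float(combined(p)), 6), classify(p))
```

Points near either landmark come out green and points far from both come out red, so the green region is two separate blobs, which no single straight line could carve out.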
207 00:11:24,900 --> 00:11:30,066 And the formula here would be: the point is assigned to the green class 208 00:11:30,066 --> 00:11:32,133 when this equation is greater than zero, 209 00:11:32,133 --> 00:11:35,633 and the point is assigned to the red class when this equation is equal to zero. 210 00:11:35,933 --> 00:11:37,966 Now again, this is a very simplified example. 211 00:11:37,966 --> 00:11:40,866 In reality, this is a bit different. 212 00:11:40,866 --> 00:11:43,366 This one is greater than or equal to zero. This one is less than zero. 213 00:11:43,366 --> 00:11:44,933 And that's because we have the coefficients. 214 00:11:44,933 --> 00:11:50,533 And there is a bit more complex mathematics behind it. 215 00:11:50,533 --> 00:11:53,400 But we don't really need to go into those steps. 216 00:11:53,400 --> 00:11:59,200 The point is that we understand here that we can create this nonlinear, 217 00:11:59,800 --> 00:12:02,800 very complex decision boundary 218 00:12:02,966 --> 00:12:06,000 without having to go into a higher dimensional space. 219 00:12:06,000 --> 00:12:07,533 Everything is still happening 220 00:12:07,533 --> 00:12:11,333 in those same dimensions simply because we're applying the kernel functions. 221 00:12:11,566 --> 00:12:14,333 And that is why this method is called the kernel trick. 222 00:12:14,333 --> 00:12:17,966 I hope you enjoyed this explanation, and I look forward to seeing you next time. 223 00:12:17,966 --> 00:12:19,666 Until then, happy analyzing.