1 00:00:00,333 --> 00:00:02,666 Hello and welcome back to the course on Machine Learning. 2 00:00:02,666 --> 00:00:07,233 In today's tutorial, we will find out how we can take our non-linearly separable 3 00:00:07,233 --> 00:00:12,033 data set, map it to a higher dimension and get a linearly separable data set, 4 00:00:12,200 --> 00:00:17,000 invoke the support vector machine algorithm, build a decision boundary 5 00:00:17,000 --> 00:00:21,000 for our data set, and then project all of that back into our original dimensions. 6 00:00:21,266 --> 00:00:24,266 So quite a lot to cover. Let's get started. 7 00:00:24,600 --> 00:00:26,933 First off, we're going to look at a simplified example. 8 00:00:26,933 --> 00:00:29,600 We're going to look at a one-dimensional data set. 9 00:00:29,600 --> 00:00:33,833 So normally we visualize everything in the PowerPoint with two dimensions 10 00:00:34,066 --> 00:00:36,633 to make it, you know, look pretty, 11 00:00:36,633 --> 00:00:40,466 and so that we can kind of understand how it would work in multiple dimensions. 12 00:00:40,633 --> 00:00:44,333 But right now, that will be a bit too complex for us to start with. 13 00:00:44,366 --> 00:00:47,300 So we're going to start with a single dimension. 14 00:00:47,300 --> 00:00:49,333 So here we've got the X1 dimension. 15 00:00:49,333 --> 00:00:52,533 We've got some points here. 16 00:00:52,533 --> 00:00:54,100 So we've got nine data points. 17 00:00:54,100 --> 00:00:57,500 And as we can see, they are non-linearly separable. 18 00:00:57,500 --> 00:01:01,433 So in a one-dimensional space, in a single 19 00:01:02,066 --> 00:01:06,333 dimension, a linear separator would not be a line, it would be a dot. 20 00:01:06,333 --> 00:01:06,633 Right. 21 00:01:06,633 --> 00:01:10,233 So in a two-dimensional space, a linear 22 00:01:10,700 --> 00:01:15,700 separator is a line, in a three-dimensional space it's a hyperplane.
23 00:01:15,700 --> 00:01:18,133 But in a one-dimensional space it's a single dot. 24 00:01:18,133 --> 00:01:21,266 So can we separate the green from the red with a single dot 25 00:01:21,266 --> 00:01:21,700 here? 26 00:01:21,700 --> 00:01:25,833 No, we cannot. If we put it here, then these will be separated from that. 27 00:01:25,833 --> 00:01:27,633 If we put it here, these will be separated from that. 28 00:01:27,633 --> 00:01:30,633 So this is a non-linearly separable data set. 29 00:01:31,300 --> 00:01:34,033 Now how can we apply 30 00:01:34,033 --> 00:01:38,400 the method of increasing the dimensionality of this space 31 00:01:38,400 --> 00:01:42,066 to make it a linearly separable data set in a higher dimension? 32 00:01:42,066 --> 00:01:43,566 That's what we're going to look at. 33 00:01:43,566 --> 00:01:46,033 And it might seem impossible, right? 34 00:01:46,033 --> 00:01:49,766 So the first time I learned about this, for me it was like, wow, how can you take 35 00:01:50,300 --> 00:01:54,600 a non-linearly separable data set and somehow magically increase 36 00:01:54,600 --> 00:01:58,566 the dimensionality and get a linearly separable data 37 00:01:58,566 --> 00:02:02,833 set? That, you know, sounded absurd, but it is actually possible. 38 00:02:02,833 --> 00:02:05,300 And that's what we're going to see right now. 39 00:02:05,300 --> 00:02:07,900 So we're going to create this mapping function on the fly. 40 00:02:07,900 --> 00:02:13,033 So let's say that this point over here is around five. 41 00:02:13,033 --> 00:02:14,833 So this is zero over here. 42 00:02:14,833 --> 00:02:17,600 And then somewhere here we've got five. It doesn't really matter. 43 00:02:17,600 --> 00:02:18,333 It can be any number. 44 00:02:18,333 --> 00:02:22,433 But just for argument's sake, let's say that this point over here is five, right? 45 00:02:22,433 --> 00:02:23,233 And then it keeps going.
46 00:02:23,233 --> 00:02:26,066 So our first step to build the mapping function, 47 00:02:26,066 --> 00:02:28,533 and there can be multiple mapping functions that you can build, 48 00:02:28,533 --> 00:02:31,533 I'm just going to show you one that came to my mind. 49 00:02:31,600 --> 00:02:35,400 So the first step will be to go f equals x minus five, 50 00:02:35,500 --> 00:02:38,966 so to subtract five from our data set. 51 00:02:38,966 --> 00:02:43,066 And what is that going to do? 52 00:02:43,066 --> 00:02:46,966 That is going to move everything to the left. 53 00:02:46,966 --> 00:02:52,500 So basically, now, this is what the result looks like. 54 00:02:52,500 --> 00:02:56,533 So if you subtract five from x, you get, you know, 55 00:02:56,533 --> 00:03:00,000 these ones will go into negative, these ones will stay in positive. 56 00:03:00,900 --> 00:03:04,266 And then the next step would be to square all of that. 57 00:03:04,266 --> 00:03:07,766 So f is now equal to x minus five, squared. 58 00:03:07,766 --> 00:03:09,066 So what will that all look like? 59 00:03:09,066 --> 00:03:13,066 Well, basically you'll have this squared function 60 00:03:13,066 --> 00:03:17,033 going through your chart. 61 00:03:17,033 --> 00:03:20,033 And then all of these will be projected onto the function. 62 00:03:20,266 --> 00:03:21,166 There we go. 63 00:03:21,166 --> 00:03:24,266 So that's what it looks like: f equals x minus five, squared. 64 00:03:24,633 --> 00:03:28,133 And now what we want to do is we just want to see 65 00:03:28,133 --> 00:03:29,966 that it is indeed linearly separable. 66 00:03:29,966 --> 00:03:30,800 So there we go. 67 00:03:30,800 --> 00:03:32,933 There's our linear separator. 68 00:03:32,933 --> 00:03:36,200 So in a two-dimensional space, as we remember, a linear separator 69 00:03:36,200 --> 00:03:37,766 is a straight line.
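The mapping described here, f = (x - 5)^2, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tutorial never gives the coordinates of its nine points or says which colour sits in the middle, so the `green` and `red` values below are made up, and the helper name `mapping` is hypothetical.

```python
# Hypothetical coordinates: the tutorial shows nine points on the X1 axis
# but never states their values or which colour sits in the middle.
green = [4.0, 5.0, 6.0]                 # assumed to cluster around x = 5
red = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]    # assumed to lie on both sides


def mapping(x, center=5.0):
    """Map a 1D point x to the 2D point (x, f) with f = (x - center)^2."""
    return (x, (x - center) ** 2)


green_2d = [mapping(x) for x in green]
red_2d = [mapping(x) for x in red]

# In the (x, f) plane, a horizontal line f = threshold is a linear separator.
highest_green = max(f for _, f in green_2d)
lowest_red = min(f for _, f in red_2d)
threshold = (highest_green + lowest_red) / 2

# Check that every green point falls below the line and every red point above.
separable = all(f < threshold for _, f in green_2d) and all(
    f > threshold for _, f in red_2d
)
print("threshold:", threshold, "separable:", separable)
```

With these made-up points, no single dot on the original axis splits the two colours, yet after the mapping one straight line does.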
70 00:03:37,766 --> 00:03:40,766 And as you can see, this data set 71 00:03:40,966 --> 00:03:43,733 became linearly separable in this dimension. 72 00:03:43,733 --> 00:03:46,566 I know it's surprising and even a bit shocking. 73 00:03:46,566 --> 00:03:48,200 But indeed it is the case. 74 00:03:48,200 --> 00:03:52,400 So you can see that we were able to take this line and separate all of the 75 00:03:52,700 --> 00:03:57,900 red elements of our data set from the green elements, and that's it. 76 00:03:58,200 --> 00:04:01,500 And then what we would do next from here is we would project everything back 77 00:04:01,500 --> 00:04:05,433 onto our original space, and we would know how 78 00:04:05,433 --> 00:04:09,666 to functionally separate the green from the red. 79 00:04:10,400 --> 00:04:15,000 And that is what happens when you map something to a higher dimension. 80 00:04:15,300 --> 00:04:20,500 So now, knowing this example and seeing that it works in reality, 81 00:04:20,966 --> 00:04:24,566 we can proceed to a higher dimension, you know, to start 82 00:04:24,566 --> 00:04:26,666 with a two-dimensional space. So let's have a look. 83 00:04:26,666 --> 00:04:28,233 So there's our two-dimensional space. 84 00:04:28,233 --> 00:04:30,933 And basically you would apply the same principle. 85 00:04:30,933 --> 00:04:35,666 So here you cannot just invoke the support vector machine algorithm, 86 00:04:35,666 --> 00:04:38,933 because it is a non-linearly 87 00:04:39,333 --> 00:04:42,766 separable data set in this space. 88 00:04:42,766 --> 00:04:45,833 But then you would apply some sort of mapping function. 89 00:04:45,833 --> 00:04:51,133 Right now we won't go into detail on exactly what mapping function it would be. 90 00:04:51,300 --> 00:04:54,133 And again, there could be multiple different options and so on.
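Returning to the one-dimensional example for a moment, the step of projecting the separator back onto the original space can be sketched as follows. The separating line f = 2.5 is an illustrative value, not one given in the tutorial, and the function names and the labels "inner" and "outer" are hypothetical.

```python
import math


def project_back(threshold, center=5.0):
    """Turn the 2D separator f = threshold, with f = (x - center)^2,
    into the two cut points it corresponds to on the original 1D axis."""
    half_width = math.sqrt(threshold)
    return center - half_width, center + half_width


def classify(x, threshold, center=5.0):
    """Label a 1D point by which side of the line f = threshold it maps to."""
    return "inner" if (x - center) ** 2 < threshold else "outer"


# 2.5 is an illustrative separator height, not a value from the tutorial.
low, high = project_back(threshold=2.5)
print("points between", low, "and", high, "belong to the inner class")
print(classify(4.0, 2.5), classify(8.0, 2.5))
```

The single straight line in two dimensions becomes two cut points in one dimension, which is why the separation could not be achieved with one dot on the original axis.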
91 00:04:54,133 --> 00:04:56,933 But basically, based on the previous example, 92 00:04:56,933 --> 00:05:00,700 we now know that it's possible. We've seen empirical evidence 93 00:05:00,700 --> 00:05:01,800 that it is possible 94 00:05:01,800 --> 00:05:03,433 to do. The same thing applies 95 00:05:03,433 --> 00:05:05,866 to a two-dimensional space moving into a three-dimensional space: 96 00:05:05,866 --> 00:05:08,733 you would map it into a three-dimensional space, and then somehow 97 00:05:08,733 --> 00:05:13,666 it would become a linearly separable data set in this space. 98 00:05:13,666 --> 00:05:16,200 And here we've got the new dimension, which is Z. 99 00:05:16,200 --> 00:05:18,433 And in a three-dimensional space, 100 00:05:18,433 --> 00:05:22,166 the linear separator is no longer a line, it's a hyperplane. 101 00:05:22,400 --> 00:05:25,433 And so this hyperplane separates the two 102 00:05:26,000 --> 00:05:29,000 parts of our data set in the way we want. 103 00:05:29,200 --> 00:05:33,400 So the support vector machine algorithm has helped us build this hyperplane. 104 00:05:33,733 --> 00:05:36,400 And then basically, we've got this result that 105 00:05:36,400 --> 00:05:39,700 we just projected back into our two-dimensional space. 106 00:05:39,700 --> 00:05:43,800 And we've got this circle that encompasses our 107 00:05:44,666 --> 00:05:47,966 classes, or separates our classes. 108 00:05:48,266 --> 00:05:49,433 And there we go. 109 00:05:49,433 --> 00:05:51,300 We've got the non-linear separator. 110 00:05:51,300 --> 00:05:57,100 So as you can see, even though we've got 111 00:05:57,433 --> 00:06:00,433 a bit of a more complex problem where we cannot 112 00:06:00,466 --> 00:06:04,200 directly apply the support vector machine algorithm as we used to,
113 00:06:04,800 --> 00:06:09,400 we can still go into a higher dimension 114 00:06:09,400 --> 00:06:13,633 and then apply the support vector machine algorithm. And, like, 115 00:06:13,633 --> 00:06:16,866 we won't go into detail on whether it's possible all the time, 116 00:06:16,866 --> 00:06:18,533 or if there are cases when it's not possible, 117 00:06:18,533 --> 00:06:19,633 and what you do there. 118 00:06:19,633 --> 00:06:23,666 But the point is that there is a solution: you can 119 00:06:23,866 --> 00:06:27,800 explore the higher dimensions, so this is not a dead end. 120 00:06:27,800 --> 00:06:30,266 You can just do that. But there's a problem. 121 00:06:30,266 --> 00:06:33,500 The problem with this approach is that mapping 122 00:06:33,500 --> 00:06:36,600 to a higher-dimensional space can be highly compute-intensive. 123 00:06:36,600 --> 00:06:41,233 So it might require a lot of computation, a lot of processing power. 124 00:06:41,233 --> 00:06:46,400 And, you know, the larger your data set, the more of a problem this can cause. 125 00:06:47,366 --> 00:06:49,200 And therefore, 126 00:06:49,200 --> 00:06:52,333 this approach isn't the best, because you can imagine, like, 127 00:06:52,566 --> 00:06:56,966 you have a data set, and then mapping it to a higher dimension, 128 00:06:57,000 --> 00:07:00,166 performing all the calculations there, and then coming back to your lower 129 00:07:00,166 --> 00:07:03,200 dimension, that can, 130 00:07:03,200 --> 00:07:06,600 even for a computer, not just in our minds as humans, 131 00:07:06,800 --> 00:07:09,733 but for a computer, that can 132 00:07:09,733 --> 00:07:13,266 cause a lot of delays. 133 00:07:13,266 --> 00:07:16,566 It can cause a lot of, like, processing 134 00:07:16,566 --> 00:07:19,666 backlog and issues in that sense. 135 00:07:19,666 --> 00:07:21,533 And we don't want that to happen.
136 00:07:21,533 --> 00:07:22,966 And therefore, we're going to explore 137 00:07:22,966 --> 00:07:26,533 something else, a different approach, 138 00:07:26,766 --> 00:07:30,033 which is called, in mathematics, the kernel trick. 139 00:07:30,033 --> 00:07:34,500 And that approach is going to help us 140 00:07:34,500 --> 00:07:38,700 get very similar results, 141 00:07:38,700 --> 00:07:42,600 but without having to go to a higher-dimensional space. 142 00:07:42,600 --> 00:07:45,200 And we're going to talk about that in the next 143 00:07:45,200 --> 00:07:47,933 tutorial. It's going to be exciting, and I can't wait to see you there. 144 00:07:47,933 --> 00:07:49,800 Until then, happy analyzing.
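The two-dimensional example from this tutorial can be sketched in the same spirit. Several things here are assumptions rather than anything stated in the tutorial: the data points are made up, the mapping z = x1^2 + x2^2 is just one possible choice (the tutorial does not say which mapping function it uses), and the separating plane is a flat plane z = c placed halfway between the two classes as a stand-in for the hyperplane a support vector machine would fit. The name `lift` is hypothetical.

```python
import math

# Hypothetical 2D data: one class near the origin, the other surrounding it.
inner = [(0.0, 0.0), (1.0, 0.0), (0.0, -1.0), (-0.5, 0.5)]
outer = [(3.0, 0.0), (0.0, 3.0), (-3.0, 0.0), (2.0, -2.5)]


def lift(point):
    """Map (x1, x2) to (x1, x2, z) with z = x1^2 + x2^2 (an assumed mapping)."""
    x1, x2 = point
    return (x1, x2, x1 ** 2 + x2 ** 2)


inner_3d = [lift(p) for p in inner]
outer_3d = [lift(p) for p in outer]

# Stand-in for the SVM step (not an SVM fit): a flat plane z = c placed
# halfway between the highest inner point and the lowest outer point.
c = (max(p[2] for p in inner_3d) + min(p[2] for p in outer_3d)) / 2

# Check that the plane separates the two classes in three dimensions.
separable = all(p[2] < c for p in inner_3d) and all(p[2] > c for p in outer_3d)

# Projected back to 2D, the plane z = c is the circle x1^2 + x2^2 = c.
radius = math.sqrt(c)
print("plane z =", c, "-> circle of radius", radius, "separable:", separable)
```

Note that this sketch computes the new coordinate explicitly for every point, which is the kind of extra work the tutorial flags as expensive for large data sets.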