1 00:00:00,433 --> 00:00:02,100 Welcome back to the Machine 2 00:00:02,100 --> 00:00:06,600 Learning A-Z course. Super excited to have you back on board. 3 00:00:06,600 --> 00:00:10,866 And today we're kicking off the support vector regression intuition. 4 00:00:11,066 --> 00:00:12,766 Super pumped about these tutorials. 5 00:00:12,766 --> 00:00:15,766 Got some very exciting slides coming up for you. 6 00:00:15,833 --> 00:00:17,733 So what is support vector regression? 7 00:00:17,733 --> 00:00:23,333 Well, support vector regression was invented back in the 90s by Vladimir 8 00:00:23,333 --> 00:00:28,066 Vapnik and his colleagues, who were working at Bell Labs at the time. 9 00:00:28,066 --> 00:00:32,333 That was AT&T Bell Labs back then, and now it's Nokia Bell Labs. 10 00:00:33,000 --> 00:00:36,600 And a lot of support vector 11 00:00:36,600 --> 00:00:40,300 machine and support vector regression is discussed in Vladimir 12 00:00:40,366 --> 00:00:44,333 Vapnik's book, The Nature of Statistical Learning, 1992. 13 00:00:44,766 --> 00:00:48,000 In this course, we will be covering both support vector 14 00:00:48,366 --> 00:00:51,800 machine, in the part of the course to do with classification, 15 00:00:51,800 --> 00:00:53,133 and support vector regression. 16 00:00:53,133 --> 00:00:54,600 We'll be talking about it here. 17 00:00:54,600 --> 00:00:58,233 And in addition to that, we'll also be talking about 18 00:00:58,233 --> 00:01:02,100 kernel support vector machine and kernel support 19 00:01:02,100 --> 00:01:06,400 vector regression and the kernel trick and many other things like that. 20 00:01:06,600 --> 00:01:09,066 So there's a lot of exciting tutorials on these topics, 21 00:01:09,066 --> 00:01:11,833 but for now we're just going to limit ourselves to support vector 22 00:01:11,833 --> 00:01:15,633 regression, specifically linear support vector regression. 23 00:01:16,000 --> 00:01:19,300 So here we go. Here we've got two plots.
24 00:01:19,300 --> 00:01:22,333 And we need that in order to compare SVR to 25 00:01:22,700 --> 00:01:26,066 the simple linear regression. That'll really help us understand things. 26 00:01:26,266 --> 00:01:29,633 So here on the left we've got some random dots. 27 00:01:29,933 --> 00:01:31,500 I'm going to copy them over to the right. 28 00:01:31,500 --> 00:01:33,200 So we know that these are identical. 29 00:01:33,200 --> 00:01:34,500 There's no tricks involved. 30 00:01:34,500 --> 00:01:36,300 They're absolutely identical. 31 00:01:36,300 --> 00:01:39,300 This is an absolutely identical set of data. 32 00:01:39,300 --> 00:01:41,166 And let's start with the one on the left. 33 00:01:41,166 --> 00:01:44,133 We're going to apply a simple linear regression. 34 00:01:44,133 --> 00:01:47,200 We've already discussed what it's like, but let's quickly refresh. 35 00:01:47,200 --> 00:01:51,000 So basically we're going to have this line go through the data. 36 00:01:51,000 --> 00:01:52,766 And how is this line derived? 37 00:01:52,766 --> 00:01:55,800 Well, a method called the ordinary least 38 00:01:55,833 --> 00:01:59,300 squares method will be applied to find this line. 39 00:01:59,300 --> 00:02:03,933 Basically we want to minimize the distance between 40 00:02:03,933 --> 00:02:07,800 this value y, the actual value in the data, 41 00:02:07,800 --> 00:02:10,333 and y hat, basically what it would have been on the trend line. 42 00:02:10,333 --> 00:02:12,466 We take the difference here, or the difference here. 43 00:02:12,466 --> 00:02:14,233 We square it and we want to minimize the sum of those squares. 44 00:02:14,233 --> 00:02:16,166 That's the ordinary least squares method. 45 00:02:16,166 --> 00:02:18,800 So essentially what we're doing is minimizing 46 00:02:18,800 --> 00:02:23,533 the error. We want to have a line with the minimum error possible.
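As a quick refresher on the ordinary least squares idea, here is a minimal Python sketch for a single feature. The data points and the names `fit_ols` and `sum_squared_errors` are made up for illustration and are not from the course; the closed-form slope and intercept are the standard one-feature result.

```python
def fit_ols(xs, ys):
    """Fit y_hat = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution for one feature:
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of (y - y_hat) ** 2 over all points, the quantity OLS minimizes."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up data points, purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

slope, intercept = fit_ols(xs, ys)
print(slope, intercept, sum_squared_errors(xs, ys, slope, intercept))
```

For this data the fitted slope comes out at roughly 1.97 and the intercept at roughly 0.09. Nudging either value away from the fit makes `sum_squared_errors` larger, which is what "minimum error" means here.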
47 00:02:23,533 --> 00:02:26,800 So that's the intuition behind 48 00:02:27,200 --> 00:02:30,000 a simple linear regression, something we already talked about. 49 00:02:30,000 --> 00:02:34,366 Now, how does this support vector regression work? 50 00:02:34,466 --> 00:02:35,866 Well, let's have a look on the right. 51 00:02:35,866 --> 00:02:39,766 With SVR, instead of a simple line, you'll see a tube. 52 00:02:40,333 --> 00:02:43,566 And here you have the regression line in the middle. 53 00:02:43,566 --> 00:02:46,266 But then there's this tube around it. And what does this tube do? 54 00:02:46,266 --> 00:02:50,800 Well, this tube has a width of epsilon, and the width is measured vertically. 55 00:02:50,833 --> 00:02:54,766 This is important: along this axis, not perpendicular to the tube, but vertically. 56 00:02:55,300 --> 00:03:00,166 And this tube itself is called the epsilon insensitive tube. 57 00:03:00,300 --> 00:03:01,366 And what does that mean? 58 00:03:01,366 --> 00:03:05,800 Well, that means that for any points in our data 59 00:03:05,800 --> 00:03:09,333 set that fall inside the tube, 60 00:03:09,333 --> 00:03:10,966 we'll be disregarding the error. 61 00:03:10,966 --> 00:03:14,933 So basically this tube, think of it as a margin of error 62 00:03:14,933 --> 00:03:18,366 that we are allowing our model to have, 63 00:03:18,733 --> 00:03:21,966 and not care about any error inside here. 64 00:03:21,966 --> 00:03:26,666 So any discrepancy, or any distance, between this 65 00:03:26,966 --> 00:03:31,333 point over here and the line, as in, for instance, here we could see, 66 00:03:31,733 --> 00:03:34,733 let's look at which point that is, that's 67 00:03:35,300 --> 00:03:38,533 one, two, three, this third point. You can see the line is even different, right? 68 00:03:38,533 --> 00:03:40,666 The results can be different and probably will be different.
69 00:03:40,666 --> 00:03:43,633 So this third point here, there's a distance between it and the line here. 70 00:03:43,633 --> 00:03:45,300 And do we care about this error here? 71 00:03:45,300 --> 00:03:48,800 We don't care about this error because it falls within this epsilon 72 00:03:49,000 --> 00:03:50,100 insensitive tube. 73 00:03:50,100 --> 00:03:52,900 So we're disregarding any kind of errors in here. 74 00:03:52,900 --> 00:03:55,200 And that's kind of the key behind support vector regression. 75 00:03:55,200 --> 00:03:58,966 And it gives a little bit of movement 76 00:03:58,966 --> 00:04:03,100 or a little bit of buffer to our model. 77 00:04:03,633 --> 00:04:06,900 And at the same time we have points that are outside 78 00:04:06,900 --> 00:04:09,133 the epsilon insensitive tube. 79 00:04:09,133 --> 00:04:10,500 There they are. 80 00:04:10,500 --> 00:04:13,166 And for them we do care about the error. 81 00:04:13,166 --> 00:04:17,033 And that error will be measured as the distance between the point 82 00:04:17,033 --> 00:04:18,066 and the tube itself. 83 00:04:18,066 --> 00:04:20,400 So not the trend line, but the tube itself. 84 00:04:20,400 --> 00:04:22,500 These distances have names. 85 00:04:22,500 --> 00:04:24,533 They're either xi star, 86 00:04:24,533 --> 00:04:28,166 if the point is below the tube, or xi, 87 00:04:28,333 --> 00:04:31,733 if the point is above the tube, and they're called, 88 00:04:31,900 --> 00:04:34,533 so these values are called slack variables. 89 00:04:34,533 --> 00:04:38,666 So it's either xi star if it's below, or xi without the star if it's above. 90 00:04:39,000 --> 00:04:40,733 And we do care about the error. 91 00:04:40,733 --> 00:04:43,800 So we care about these distances, and the way we care about it, 92 00:04:43,800 --> 00:04:47,433 so we're going to try to avoid formulas, but we'll give additional 93 00:04:47,433 --> 00:04:51,233 reading, something you can look into further down at the end of this tutorial.
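To make the tube and the slack variables concrete, here is a small Python sketch. It assumes the tube extends epsilon above and epsilon below the regression line, measured vertically, and it uses the usual notation xi (ξ) for a point above the tube and xi star (ξ*) for a point below it. The line `y_hat = 2x + 1`, the value of `epsilon`, and the three points are hypothetical, chosen only to show one point inside, one above, and one below.

```python
def slack_variables(y, y_hat, epsilon):
    """Return (xi, xi_star) for one point.

    xi      > 0 only if the point lies above the tube (y > y_hat + epsilon).
    xi_star > 0 only if the point lies below the tube (y < y_hat - epsilon).
    Both are 0 for a point inside the tube: its error is disregarded.
    """
    xi = max(0.0, y - (y_hat + epsilon))
    xi_star = max(0.0, (y_hat - epsilon) - y)
    return xi, xi_star


def predict(x):
    """Hypothetical regression line, assumed already fitted."""
    return 2.0 * x + 1.0


epsilon = 1.0  # assumed vertical half-width of the tube
points = [(1.0, 3.5), (2.0, 7.0), (3.0, 4.5)]  # made-up (x, y) pairs

slacks = [slack_variables(y, predict(x), epsilon) for x, y in points]
total_slack = sum(xi + xi_star for xi, xi_star in slacks)
print(slacks, total_slack)
```

The first point sits inside the tube, so both of its slack values are zero. The second is above the tube and the third is below it, and each distance is measured to the edge of the tube, not to the line.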
94 00:04:51,233 --> 00:04:55,866 But just for completeness' sake, here is the formula. 95 00:04:55,966 --> 00:04:59,233 So in the case of OLS, it was a simple 96 00:04:59,566 --> 00:05:02,933 ordinary least squares, like that. Here it's a bit more complex. 97 00:05:03,433 --> 00:05:05,700 We're not going to talk about this part over here, 98 00:05:05,700 --> 00:05:08,000 but what we're focusing on is this. 99 00:05:08,000 --> 00:05:10,100 And we can see that we're minimizing. 100 00:05:10,100 --> 00:05:14,233 We want these distances, the sum of these distances, 101 00:05:14,233 --> 00:05:16,933 to be minimal. Once again, there will be additional 102 00:05:16,933 --> 00:05:18,733 reading at the end if you'd like to go into it. 103 00:05:18,733 --> 00:05:21,700 But effectively, these points, 104 00:05:21,700 --> 00:05:24,466 the ones that are outside our tube, are dictating 105 00:05:24,466 --> 00:05:28,133 what the tube will look like, how the tube will be positioned. 106 00:05:28,133 --> 00:05:31,233 So the error within the tube is completely disregarded. 107 00:05:31,233 --> 00:05:34,200 We don't care about that error, unlike in ordinary least squares. 108 00:05:34,200 --> 00:05:37,733 So we're giving some kind of buffer or flexibility to our tube, 109 00:05:38,066 --> 00:05:42,600 or, like, accounting 110 00:05:42,600 --> 00:05:47,100 for some kind of error that we might expect in the data. 111 00:05:47,100 --> 00:05:47,800 It's normal 112 00:05:47,800 --> 00:05:51,900 sometimes for there to be error, but these ones, they are important to us. 113 00:05:52,333 --> 00:05:55,300 And also one final thing. 114 00:05:55,300 --> 00:05:58,200 Why is this method called support vector regression? 115 00:05:58,200 --> 00:06:02,800 Well, because effectively these points, all of these points outside, 116 00:06:03,000 --> 00:06:07,200 well, any point, actually any point on this plot, is a vector, right?
117 00:06:07,200 --> 00:06:07,800 It can be represented 118 00:06:07,800 --> 00:06:10,933 as a vector in this two dimensional space, or a multi dimensional space 119 00:06:10,933 --> 00:06:13,000 if you have more features. 120 00:06:13,000 --> 00:06:15,766 So in this case it's represented by a two dimensional vector. 121 00:06:15,766 --> 00:06:18,400 So all these points are vectors. 122 00:06:18,400 --> 00:06:19,800 But the ones that we've highlighted 123 00:06:19,800 --> 00:06:23,633 in red, the ones outside the tube, they're the support vectors, 124 00:06:23,633 --> 00:06:27,633 because they are dictating how this tube is created. 125 00:06:27,633 --> 00:06:32,700 So basically they're supporting the structure or formation of this tube. 126 00:06:32,700 --> 00:06:35,133 And that's why they're called support vectors. 127 00:06:35,133 --> 00:06:37,200 And that's why this is support vector regression. 128 00:06:38,166 --> 00:06:38,933 And so there we go. 129 00:06:38,933 --> 00:06:40,200 That's what it's all about. 130 00:06:40,200 --> 00:06:43,800 It's just important to remember the epsilon insensitive tube and that 131 00:06:43,800 --> 00:06:45,300 support vector regression 132 00:06:45,300 --> 00:06:49,000 just cares about the errors of anything that's lying outside this tube. 133 00:06:49,666 --> 00:06:52,666 And to finish off, as promised, here's the additional reading. 134 00:06:52,900 --> 00:06:56,666 So if you'd like to learn a bit more, have a look at chapter four, Support Vector 135 00:06:56,666 --> 00:07:00,500 Regression, in a book called Efficient Learning Machines: Theories, Concepts, 136 00:07:00,500 --> 00:07:05,400 and Applications for Engineers and System Designers by Mariette Awad and Rahul Khanna. 137 00:07:05,800 --> 00:07:09,766 And here's a link where it's aggregated on this 138 00:07:10,066 --> 00:07:14,066 portal for published work.
139 00:07:14,533 --> 00:07:19,166 Something you'll note that might be a little bit confusing here: 140 00:07:19,166 --> 00:07:21,000 they say these are potential support vectors. 141 00:07:21,000 --> 00:07:23,133 They're referring only to the ones close. 142 00:07:23,133 --> 00:07:27,900 I've had a look at different literature, and the literature I prefer 143 00:07:27,900 --> 00:07:31,866 in this respect is the one that says that the support 144 00:07:31,866 --> 00:07:36,500 vectors are the ones that are outside. 145 00:07:36,500 --> 00:07:39,633 Basically any point outside the tube is a support vector. 146 00:07:39,633 --> 00:07:44,100 So that's how we discussed it inside this tutorial. 147 00:07:44,100 --> 00:07:46,733 But have a look at this different nomenclature. 148 00:07:46,733 --> 00:07:50,666 Maybe that'll be a good perspective, to have a different view. 149 00:07:50,666 --> 00:07:55,866 But overall the first couple of paragraphs describing the whole problem, they're 150 00:07:56,066 --> 00:07:58,033 very well written. I liked how they were written. 151 00:07:58,033 --> 00:08:01,600 And I think it can be a valuable addition if you're looking for additional reading. 152 00:08:02,133 --> 00:08:04,500 So there we go. That's support vector regression. 153 00:08:04,500 --> 00:08:07,200 Hope you enjoyed this tutorial and look forward to seeing you next time. 154 00:08:07,200 --> 00:08:09,066 Until then, happy analyzing.
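Following the nomenclature used in this tutorial, where any point outside the tube is a support vector, here is a minimal Python sketch that labels them. It assumes a line and an epsilon that are already known. A real SVR solver finds the line and the support vectors together, so this only illustrates the definition, and other literature may also count points on or near the tube boundary. The function name, the line `y_hat = 2x + 1`, and the data are hypothetical.

```python
def support_vector_indices(xs, ys, slope, intercept, epsilon):
    """Indices of points lying strictly outside the epsilon insensitive tube.

    The tube is assumed to extend epsilon above and below the line,
    measured vertically.
    """
    return [
        i
        for i, (x, y) in enumerate(zip(xs, ys))
        if abs(y - (slope * x + intercept)) > epsilon
    ]


# Made-up data and a hypothetical, already fitted line y_hat = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.5, 7.0, 4.5, 9.0, 11.5]

outside = support_vector_indices(xs, ys, slope=2.0, intercept=1.0, epsilon=1.0)
print(outside)  # prints [1, 2]
```

Only the second and third points fall outside the tube here. The other three sit inside it, so moving them around within the tube would not change which points are labelled.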