1 00:00:00,700 --> 00:00:07,330 As discussed in the last video, we are now going to start our discussion of the support vector classifier 2 00:00:07,840 --> 00:00:11,440 so that we can handle scenarios that are not perfectly separable. 3 00:00:12,190 --> 00:00:17,710 And secondly, so that our model is less sensitive to individual observations. 4 00:00:19,960 --> 00:00:23,830 So let me first tell you what a support vector classifier is. 5 00:00:24,980 --> 00:00:32,840 A support vector classifier is a soft margin classifier. This means that we will have a 6 00:00:32,850 --> 00:00:40,290 hyperplane and margins around the hyperplane, but all the observations need not be on the correct side 7 00:00:40,290 --> 00:00:41,390 of the margin. 8 00:00:42,780 --> 00:00:50,250 We will allow some observations to be present within the margin area and some may even go on the wrong 9 00:00:50,250 --> 00:00:51,390 side of the hyperplane. 10 00:00:53,350 --> 00:00:54,660 So if you look at this graph. 11 00:00:58,380 --> 00:01:03,720 The purple points are supposed to be on the left side of this margin. 12 00:01:06,160 --> 00:01:12,330 Whereas on the other margin, the blue points are supposed to be on the right side of this margin. 13 00:01:13,910 --> 00:01:16,210 But if you look at point number eight. 14 00:01:18,750 --> 00:01:23,010 This point is supposed to be on the right-hand side of the upper margin. 15 00:01:24,360 --> 00:01:25,770 But this is not so. 16 00:01:25,900 --> 00:01:33,090 This point is present on the wrong side of the margin. However, since it is still to the right of the 17 00:01:33,090 --> 00:01:35,560 hyperplane, it is correctly classified. 18 00:01:36,580 --> 00:01:42,910 So it is on the wrong side of the margin but still correctly classified. Now consider point number twelve. 
19 00:01:44,440 --> 00:01:51,100 This point is a blue point, which is supposed to be on the right side of the upper margin, whereas 20 00:01:51,550 --> 00:01:54,160 it is on the left side of the hyperplane itself. 21 00:01:54,760 --> 00:02:00,070 So this point will be incorrectly classified by our support vector classifier. 22 00:02:02,280 --> 00:02:05,110 So there are two types of errors that we are allowing. 23 00:02:05,370 --> 00:02:13,560 One is where the classification is correct, but the point is on the wrong side of the margin, which is represented 24 00:02:13,560 --> 00:02:15,720 by these blue circled points. 25 00:02:16,820 --> 00:02:18,030 That is, point number one and eight. 26 00:02:19,370 --> 00:02:22,590 And the second type of error that we allow is 27 00:02:23,790 --> 00:02:30,000 allowing the point to be misclassified, that is, being on the wrong side of the hyperplane. 28 00:02:31,180 --> 00:02:35,810 So point number eleven and point number twelve are misclassified. 29 00:02:40,500 --> 00:02:45,330 Now let's see how a support vector classifier is created. 30 00:02:46,930 --> 00:02:53,290 The underlying concept stays the same, that we are trying to get a hyperplane with maximum margin, 31 00:02:53,890 --> 00:02:56,680 but with one additional constraint. 32 00:02:58,250 --> 00:03:01,810 The constraint we are going to add is a budget of mistakes. 33 00:03:03,420 --> 00:03:10,020 We know some points are going to be misclassified, so we will create a budget B, which will be the 34 00:03:10,140 --> 00:03:14,070 amount of misclassification that is acceptable to us. 35 00:03:15,950 --> 00:03:23,320 So the software will be trying to find a hyperplane with maximum margin while staying within this budget. 36 00:03:27,330 --> 00:03:28,790 So here, in this example. 37 00:03:30,120 --> 00:03:34,740 If we find out the distance of point one from its margin. 38 00:03:36,610 --> 00:03:39,880 So if we find the distance of this point from this margin, then. 
39 00:03:41,400 --> 00:03:46,100 This is the amount of error made with this point. 40 00:03:47,730 --> 00:03:49,830 Let us say this distance is x1. 41 00:03:51,920 --> 00:03:55,220 Now, if I look at the other purple point, which is eleven, 42 00:03:56,780 --> 00:04:04,430 the distance for point eleven will be from this margin to point eleven, which is, say, x2. 43 00:04:05,770 --> 00:04:09,580 So for the purple points, we have x1 and x2 44 00:04:11,030 --> 00:04:12,700 as the misclassification errors. 45 00:04:14,490 --> 00:04:15,580 For the blue points, 46 00:04:16,650 --> 00:04:17,720 say, point eight, 47 00:04:18,690 --> 00:04:21,330 the upper line was the margin. 48 00:04:21,660 --> 00:04:26,460 So the distance from that upper line would be the error of classification. 49 00:04:26,550 --> 00:04:28,830 So that would be x3. 50 00:04:30,970 --> 00:04:36,760 Similarly, for point twelve, we can find out x4, which is the distance of point twelve from the 51 00:04:36,760 --> 00:04:37,360 upper margin. 52 00:04:38,750 --> 00:04:46,250 We will add all these misclassification errors and we will say that the sum of all these errors should be less 53 00:04:46,250 --> 00:04:48,600 than the predefined budget, 54 00:04:48,800 --> 00:04:49,220 B. 55 00:04:53,470 --> 00:04:56,430 In this way, we will allow some misclassification 56 00:04:57,780 --> 00:05:00,030 while trying to maximize the margin. 57 00:05:02,410 --> 00:05:09,850 Another way of implementing this same concept, which is often found in software packages, is to use a 58 00:05:09,910 --> 00:05:10,930 cost parameter. 59 00:05:12,560 --> 00:05:14,090 This is effectively the same thing. 60 00:05:14,630 --> 00:05:16,730 The only difference is that instead of having 61 00:05:17,840 --> 00:05:19,460 a unit cost and a budget 62 00:05:19,540 --> 00:05:24,100 B, we will have C times the cost and a unit budget. 63 00:05:25,400 --> 00:05:27,770 So C is kind of an inverse of B. 
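The budget idea described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the hyperplane, the margin half-width, the four points P1 to P4 and the budget value are all made up for the example (they are not the points from the graph in the video), and the error of a point is taken, as in the lecture, to be its distance from its own margin.

```python
import math

# Hypothetical 2D hyperplane w.x + intercept = 0 (illustrative values only).
w = (1.0, 0.0)
intercept = 0.0
margin = 1.0   # assumed half-width: the two margins sit at signed distance +1 and -1
budget = 2.5   # assumed budget B for the total error

# name -> (label, point). Label +1 should lie at signed distance >= +margin,
# label -1 at signed distance <= -margin.
points = {
    "P1": (+1, (2.0, 1.0)),   # beyond its margin
    "P2": (+1, (0.4, 0.0)),   # inside the margin, correct side of the hyperplane
    "P3": (+1, (-0.5, 2.0)),  # wrong side of the hyperplane
    "P4": (-1, (-3.0, 0.5)),  # beyond its margin
}

def signed_distance(x):
    """Signed distance of point x from the hyperplane."""
    return (w[0] * x[0] + w[1] * x[1] + intercept) / math.hypot(*w)

def margin_error(label, x):
    """Distance by which a point falls short of its own margin (0 if it does not)."""
    return max(0.0, margin - label * signed_distance(x))

def status(label, x):
    """Which of the three situations from the lecture the point is in."""
    d = label * signed_distance(x)
    if d >= margin:
        return "outside margin"
    if d > 0:
        return "inside margin, correctly classified"
    return "misclassified"

for name, (label, x) in points.items():
    print(name, status(label, x), round(margin_error(label, x), 3))

# The constraint from the lecture: the sum of all errors must stay within B.
total_error = sum(margin_error(label, x) for label, x in points.values())
print("total error:", round(total_error, 3), "within budget:", total_error <= budget)
```

A real support vector classifier searches over hyperplanes to maximize the margin subject to this constraint; the sketch only evaluates the constraint for one fixed hyperplane.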
64 00:05:28,980 --> 00:05:37,080 This C will be a hyperparameter. We will choose the value of C using cross-validation so that 65 00:05:37,080 --> 00:05:39,630 we get the minimum test set error. 66 00:05:42,790 --> 00:05:47,940 One more thing to note here is that these points, point number one, eight, 67 00:05:48,050 --> 00:05:48,910 eleven and twelve, 68 00:05:49,950 --> 00:05:52,950 which lie within the margin or on the margin, 69 00:05:54,060 --> 00:05:58,320 all these points will be called support vectors. 70 00:05:59,910 --> 00:06:07,830 Since now, instead of just the points on the boundary, the margin depends on the presence of these points 71 00:06:07,830 --> 00:06:08,220 as well. 72 00:06:09,930 --> 00:06:17,040 So all the points within these margins and points lying exactly on the margin, all these points will 73 00:06:17,040 --> 00:06:19,080 be considered as the support vectors. 74 00:06:20,520 --> 00:06:24,760 Any point outside the margin is not that relevant. 75 00:06:26,570 --> 00:06:28,940 Our support vector classifier will change 76 00:06:29,110 --> 00:06:32,300 when any of these support vectors are changed. 77 00:06:38,110 --> 00:06:43,200 Now, let us see the impact of increasing or decreasing the value of this parameter C. 78 00:06:45,280 --> 00:06:48,570 If we decrease the value of C and make it smaller, 79 00:06:49,800 --> 00:06:53,580 this would mean that there is less cost of misclassification. 80 00:06:55,230 --> 00:07:02,250 In such a scenario, more cases will be allowed within the margins and also more cases will be allowed 81 00:07:02,250 --> 00:07:03,390 to be misclassified. 82 00:07:05,090 --> 00:07:11,040 On the other hand, if you increase the value of C, the cost of making a mistake is high, so there 83 00:07:11,040 --> 00:07:15,830 will be fewer points between the margins and fewer misclassified points. 
84 00:07:18,910 --> 00:07:25,210 But doing this will increase the sensitivity of our model to individual observations, which may result 85 00:07:25,210 --> 00:07:26,110 in overfitting. 86 00:07:27,390 --> 00:07:35,190 This is why we need to carefully tune the value of C. A small value could lead to a lot of misclassification 87 00:07:35,670 --> 00:07:37,980 and a large value could lead to overfitting. 88 00:07:39,080 --> 00:07:44,970 So we will try to find the optimum value of C at which we get the best test set performance. 89 00:07:46,880 --> 00:07:51,380 Here I am also sharing a link to a web page on the Stanford website. 90 00:07:52,340 --> 00:07:58,840 It's a great tool to visualize the effect of adding and removing points in a 2D predictor space 91 00:07:59,330 --> 00:08:01,370 and the effect of changing the value of C. 92 00:08:02,960 --> 00:08:03,760 Let me show you this tool. 93 00:08:06,390 --> 00:08:12,630 So this is the web page. Since we have discussed only the support vector classifier, which is a linear classifier, 94 00:08:13,290 --> 00:08:16,890 we will toggle this to the linear classifier. 95 00:08:17,280 --> 00:08:21,630 We'll be discussing the kernel-based nonlinear classifiers in the coming lectures. 96 00:08:23,050 --> 00:08:24,970 So for now, this is a linear classifier. 97 00:08:25,810 --> 00:08:28,570 There are two types of points, red points 98 00:08:28,570 --> 00:08:29,280 and green points. 99 00:08:30,190 --> 00:08:36,280 Some of the points are already added. We can click within this two-dimensional space to create a red 100 00:08:36,280 --> 00:08:36,700 point. 101 00:08:36,910 --> 00:08:43,600 And if you want to create a green point, you press Shift and then do a mouse click to add a green point. 102 00:08:46,160 --> 00:08:52,850 So if you add green points near the boundary, you'll see that the classifier keeps on changing. 103 00:08:59,840 --> 00:09:03,020 And if you scroll down. 
104 00:09:04,630 --> 00:09:07,640 There is this parameter C, which is the cost parameter. 105 00:09:08,710 --> 00:09:11,860 As I told you, if you have a lower value of cost, 106 00:09:13,050 --> 00:09:19,260 there will be more misclassifications and more points will be present within the margins. 107 00:09:19,500 --> 00:09:23,880 So you will have large margins if the value of C is decreased. 108 00:09:25,440 --> 00:09:27,240 You can see this is what is happening. 109 00:09:28,260 --> 00:09:32,980 If you increase the value of C, fewer points will be misclassified. 110 00:09:33,960 --> 00:09:39,420 But our system will become very sensitive to individual observations. 111 00:09:44,780 --> 00:09:54,140 So using this 2D tool, you can easily visualize the effect of adding different types of points anywhere 112 00:09:54,140 --> 00:09:54,830 in this space, 113 00:09:55,130 --> 00:09:56,900 plus see the effect of changing 114 00:09:56,900 --> 00:09:58,790 the value of C on the margin. 115 00:09:58,820 --> 00:09:59,120 Good.
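For reference, the cost-parameter form discussed in this lecture is commonly written as the optimization problem below. This is the standard textbook soft-margin formulation, not something shown in the video, and the notation is assumed: here x_i is the i-th observation (not the error distances called x1, x2, x3, x4 in the lecture), y_i is its class label (+1 or -1), w and b define the hyperplane, and the slack ξ_i plays the role of the per-point error, measured relative to the margin width.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & y_i\,(w^{\top} x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \quad i = 1, \dots, n
\end{aligned}
```

In this form the margin width is 2 / ‖w‖. A point with 0 < ξ_i ≤ 1 is inside the margin but still correctly classified, and a point with ξ_i > 1 is misclassified. A small C makes slack cheap, so the solution accepts more violations in exchange for a wider margin; a large C makes slack expensive, which narrows the margin and makes the fit more sensitive to individual observations.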