1 00:00:00,870 --> 00:00:05,130 In this video, we are going to talk about support vector machines. 2 00:00:07,460 --> 00:00:10,110 They are an extension of the support vector classifier. 3 00:00:11,350 --> 00:00:18,790 As I told you in the last video, support vector machines use kernels to be able to create non-linear boundaries. 4 00:00:19,660 --> 00:00:21,130 And what is a kernel? 5 00:00:22,380 --> 00:00:27,350 A kernel is simply some functional relationship between two observations. 6 00:00:28,930 --> 00:00:34,570 And as we change this function, we will get different shapes of decision boundaries. 7 00:00:36,590 --> 00:00:44,390 Basically, we try to take the impact of each observation at a particular point, and this impact is 8 00:00:44,390 --> 00:00:45,560 given by a function. 9 00:00:51,390 --> 00:00:53,400 If you take the function to be this, 10 00:00:55,780 --> 00:00:58,330 that is, the inner product of two points, 11 00:00:58,900 --> 00:01:03,910 you will end up with a straight line. By doing some mathematics, 12 00:01:03,970 --> 00:01:11,200 it can be shown that this linear kernel is effectively the support vector classifier. But the 13 00:01:11,260 --> 00:01:12,940 mathematics will get complicated. 14 00:01:13,180 --> 00:01:16,210 Plus, you do not need it to solve any business problem. 15 00:01:17,350 --> 00:01:20,690 Therefore, we will not be covering the mathematics behind it. 16 00:01:29,590 --> 00:01:36,040 Next, if you want a polynomial type of boundary, the kernel function looks like this. 17 00:01:37,480 --> 00:01:39,520 It is one plus the inner product, 18 00:01:39,750 --> 00:01:46,390 raised to the power d. The value of d will determine the flexibility of this polynomial boundary. 19 00:01:47,620 --> 00:01:53,230 d is also known as the degree of this polynomial. In our example, 20 00:01:53,620 --> 00:01:57,220 we were not able to create a good linear separator.
21 00:01:57,970 --> 00:02:02,680 You can see that by using a polynomial boundary of sufficiently high degree, 22 00:02:03,310 --> 00:02:05,840 we can classify the observations much better. 23 00:02:11,360 --> 00:02:17,120 Lastly, the radial kernel is one of the most frequently used kernel functions. 24 00:02:18,050 --> 00:02:23,120 It is given by this complicated-looking formula. In this formula, 25 00:02:23,150 --> 00:02:26,720 the first part is the radial distance between two points. 26 00:02:31,170 --> 00:02:36,110 And this radial distance is multiplied by a positive constant, gamma. 27 00:02:37,820 --> 00:02:42,290 Now, if the distance is large or the value of gamma is large, 28 00:02:42,650 --> 00:02:43,910 this product will be large. 29 00:02:44,910 --> 00:02:49,770 And due to this negative sign, the exponential value will be very small. 30 00:02:52,280 --> 00:02:57,830 So even if the distance of a point is not much, when gamma is large, 31 00:02:59,040 --> 00:03:01,950 nearby points will not have much effect. 32 00:03:02,400 --> 00:03:07,140 Only very close points will have an impact at any given point. 33 00:03:09,550 --> 00:03:14,980 Therefore, a large value of gamma will give us very tight margins. 34 00:03:18,370 --> 00:03:25,860 In short, gamma defines how much influence a single training example is going to have at any particular 35 00:03:25,870 --> 00:03:26,290 point. 36 00:03:30,290 --> 00:03:35,420 So a larger value of gamma means only closer points have an impact, 37 00:03:36,020 --> 00:03:41,230 and farther away points do not have any impact at that particular point. 38 00:03:44,480 --> 00:03:49,960 Thus gamma will also be a hyperparameter, which we need to tune using cross-validation. 39 00:03:50,870 --> 00:03:58,520 And we will try to find a value of gamma at which the error rate will be minimum. Recall the link that I showed 40 00:03:58,520 --> 00:03:59,010 you earlier 41 00:03:59,090 --> 00:03:59,510 as well.
42 00:04:01,910 --> 00:04:05,270 We can use it to visualize the radial kernel too. 43 00:04:05,630 --> 00:04:06,530 So let us check it out. 44 00:04:09,990 --> 00:04:13,890 So by default, this has the radial kernel selected. 45 00:04:15,420 --> 00:04:16,440 If you scroll down, 46 00:04:18,760 --> 00:04:21,940 you can see the two hyperparameters that you can tune. 47 00:04:22,120 --> 00:04:28,890 One is C and one is the kernel sigma, which is the same as the gamma that we discussed earlier. 48 00:04:30,940 --> 00:04:35,040 If you vary C, you will see that if you decrease it, 49 00:04:37,950 --> 00:04:47,000 the cost of misclassification decreases, so our classifier will allow more misclassifications. 50 00:04:47,430 --> 00:04:51,570 So basically it is changing the whole area into green. 51 00:04:51,600 --> 00:04:54,930 That is, it is classifying all the points as green. 52 00:04:56,430 --> 00:05:04,790 But if I increase the cost of misclassification, our model will start to classify more accurately and 53 00:05:05,160 --> 00:05:11,330 it will give us some boundary within which all the red points are situated 54 00:05:11,760 --> 00:05:14,460 and outside which we have all the green points. 55 00:05:17,690 --> 00:05:24,350 This is the kernel sigma, which is the gamma. As I told you, a larger value of gamma means that there is 56 00:05:24,470 --> 00:05:26,960 more impact of nearby points. 57 00:05:27,590 --> 00:05:30,980 So let us reduce the value of gamma first. 58 00:05:33,360 --> 00:05:38,590 As you can see, initially, when I have a very low value of gamma, 59 00:05:40,260 --> 00:05:44,050 everything is green, since we have more green points overall. 60 00:05:44,280 --> 00:05:45,300 So everything is green. 61 00:05:45,810 --> 00:05:47,950 When I start to increase the value of gamma, 62 00:05:49,590 --> 00:05:55,170 you can see small red areas coming up near the red points only.
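Note: the three kernels described in this video can be written out as a small sketch to see how gamma controls the influence of one point on another. The function names here are illustrative, and the radial kernel uses the squared Euclidean distance, which is the common textbook form and an assumption about the formula shown on screen.

```python
import math


def inner_product(x, y):
    """Sum of element-wise products of two points."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """Linear kernel: the inner product of the two points."""
    return inner_product(x, y)


def polynomial_kernel(x, y, d=2):
    """Polynomial kernel: (1 + inner product) raised to the degree d."""
    return (1 + inner_product(x, y)) ** d


def radial_kernel(x, y, gamma=1.0):
    """Radial kernel: exp(-gamma * squared distance between the points)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


point = (0.0, 0.0)
neighbour = (1.0, 1.0)  # squared distance from point is 2

# As gamma grows, the influence of the same neighbour shrinks towards zero.
for gamma in (0.1, 1.0, 10.0):
    print(gamma, radial_kernel(point, neighbour, gamma))
```

With a small gamma the neighbour still has a noticeable influence; with a large gamma its influence is close to zero, so only very close points matter.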
63 00:05:55,860 --> 00:06:03,210 So the points which are near the red observations will be classified as red, 64 00:06:03,750 --> 00:06:10,830 since we are increasing the importance of nearby points only. If I continue to increase the importance 65 00:06:10,830 --> 00:06:15,280 of nearby points, I start to get a bigger boundary. 66 00:06:17,570 --> 00:06:28,130 But after a certain level, the impact of all individual points will be so reduced that the boundary 67 00:06:28,130 --> 00:06:29,780 will start behaving erratically. 68 00:06:31,010 --> 00:06:40,760 So it is important that we find optimum values of both these hyperparameters, C and gamma, so 69 00:06:40,760 --> 00:06:44,540 that we get the correct boundary to do classification.
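Note: one possible way to search for good values of C and gamma with cross-validation is sketched below. It assumes scikit-learn is installed. The toy dataset (`make_circles`), the candidate grids and the 5-fold setting are illustrative choices, not values taken from this video.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data: one class forms an inner ring, the other an outer ring,
# so no straight line can separate them.
X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)

# Candidate values for the two hyperparameters (assumed grid).
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.01, 0.1, 1, 10],
}

# 5-fold cross-validation over every (C, gamma) pair with a radial kernel.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)           # the pair with the best mean CV accuracy
print(round(search.best_score_, 3))  # that mean CV accuracy
```

The pair reported in `best_params_` is the one with the highest mean cross-validated accuracy, which is the same as the lowest cross-validated error rate.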