1
00:00:00,700 --> 00:00:07,330
As discussed in the last video, we are now going to start our discussion of the support vector classifier

2
00:00:07,840 --> 00:00:11,440
so that we can handle scenarios that are not perfectly separable.

3
00:00:12,190 --> 00:00:17,710
And secondly, so that our model is less sensitive to individual observations.

4
00:00:19,960 --> 00:00:23,830
So let me first tell you what a support vector classifier is.

5
00:00:24,980 --> 00:00:32,840
A support vector classifier is a soft-margin classifier. This means that we will still have a

6
00:00:32,850 --> 00:00:40,290
hyperplane and margins around it, but not all observations need to be on the correct side

7
00:00:40,290 --> 00:00:41,390
of the margin.

8
00:00:42,780 --> 00:00:50,250
We will allow some observations to be present within the margin area and some may even go on the wrong

9
00:00:50,250 --> 00:00:51,390
side of the hyperplane.

10
00:00:53,350 --> 00:00:54,660
So let's look at this graph.

11
00:00:58,380 --> 00:01:03,720
The purple points are supposed to be on the left side of this margin.

12
00:01:06,160 --> 00:01:12,330
On the other hand, the blue points are supposed to be on the right side of the other margin.

13
00:01:13,910 --> 00:01:16,210
But look at point number eight.

14
00:01:18,750 --> 00:01:23,010
This point is supposed to be on the right-hand side of the upper margin.

15
00:01:24,360 --> 00:01:25,770
But this is not so.

16
00:01:25,900 --> 00:01:33,090
This point is on the wrong side of the margin. However, since it is still on the correct side of the

17
00:01:33,090 --> 00:01:35,560
hyperplane, it is correctly classified.

18
00:01:36,580 --> 00:01:42,910
So it is on the wrong side of the margin but correctly classified. Now consider point number twelve.

19
00:01:44,440 --> 00:01:51,100
This is a blue point, which is supposed to be on the right side of the upper margin, whereas

20
00:01:51,550 --> 00:01:54,160
it is actually on the left side of the hyperplane itself.

21
00:01:54,760 --> 00:02:00,070
So this point will be incorrectly classified by our support vector classifier.

22
00:02:02,280 --> 00:02:05,110
So there are two types of errors that we allow.

23
00:02:05,370 --> 00:02:13,560
One is where the classification is correct, but the point is on the wrong side of the margin, as represented

24
00:02:13,560 --> 00:02:15,720
by these blue-circled points,

25
00:02:16,820 --> 00:02:18,030
that is, points number one and eight.

26
00:02:19,370 --> 00:02:22,590
And the second type of error that we allow is

27
00:02:23,790 --> 00:02:30,000
allowing a point to be misclassified, that is, to be on the wrong side of the hyperplane.

28
00:02:31,180 --> 00:02:35,810
So point number eleven and point number twelve are misclassified.

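The following minimal Python sketch shows how a point falls into one of the two error types. It assumes the classifier is written as w·x + b, with the hyperplane at w·x + b = 0 and the two margins at w·x + b = +1 and -1. That is a common scaling convention; it is not stated in this lecture. Labels y are assumed to be +1 or -1. The function name `categorize` and the example points are hypothetical.

```python
def categorize(w, b, x, y):
    """Place a labeled point relative to a linear soft-margin classifier.

    Assumes hyperplane w.x + b = 0, margins w.x + b = +1 / -1, and y in {+1, -1}.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    m = y * score  # positive means the point is on the correct side of the hyperplane
    if m >= 1:
        return "outside margin"  # no error at all
    if m > 0:
        return "inside margin, correctly classified"  # first error type (like points 1 and 8)
    return "misclassified"  # second error type (like points 11 and 12)


# Hypothetical 2-D example: hyperplane x1 = 0, margins at x1 = +1 and x1 = -1.
w, b = (1.0, 0.0), 0.0
print(categorize(w, b, (2.0, 0.0), 1))   # -> outside margin
print(categorize(w, b, (0.5, 0.0), 1))   # -> inside margin, correctly classified
print(categorize(w, b, (-0.5, 0.0), 1))  # -> misclassified
```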
29
00:02:40,500 --> 00:02:45,330
Now let's see how a support vector classifier is created.

30
00:02:46,930 --> 00:02:53,290
The underlying concept stays the same: we are trying to find a hyperplane with the maximum margin,

31
00:02:53,890 --> 00:02:56,680
but with one additional constraint.

32
00:02:58,250 --> 00:03:01,810
The constraint we are going to add is a budget for mistakes.

33
00:03:03,420 --> 00:03:10,020
We know some points are going to be misclassified, so we will create a budget B, which will be the

34
00:03:10,140 --> 00:03:14,070
amount of misclassification that is acceptable to us.

35
00:03:15,950 --> 00:03:23,320
So the software will try to find a hyperplane with the maximum margin while staying within this budget.

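The budget idea is often written as the optimization problem below, in the style of common textbook treatments. The symbols β, M and the slack variables ε_i are notation introduced here and do not come from the lecture. B is the lecture's budget, and y_i is +1 or -1. M is the margin half-width, and ε_i measures how far observation i may go past its margin, relative to M. Note that some texts call this budget C, which is a different quantity from the cost parameter C used in software packages.

```latex
\begin{aligned}
\max_{\beta_0,\dots,\beta_p,\;\epsilon_1,\dots,\epsilon_n,\;M}\quad & M\\
\text{subject to}\quad & \textstyle\sum_{j=1}^{p}\beta_j^2 = 1,\\
& y_i\,(\beta_0+\beta_1x_{i1}+\dots+\beta_px_{ip}) \ge M(1-\epsilon_i),\quad i=1,\dots,n,\\
& \epsilon_i \ge 0,\qquad \textstyle\sum_{i=1}^{n}\epsilon_i \le B.
\end{aligned}
```

The value of ε_i tells you where each point sits:

- ε_i = 0: the point is on the correct side of its margin.
- 0 < ε_i < 1: the point is on the wrong side of the margin but still correctly classified, like points 1 and 8.
- ε_i > 1: the point is on the wrong side of the hyperplane, like points 11 and 12.

Setting B = 0 allows no violations and gives back the maximal margin classifier.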
36
00:03:27,330 --> 00:03:28,790
So, back to our example.

37
00:03:30,120 --> 00:03:34,740
We find the distance of point one from its margin.

38
00:03:36,610 --> 00:03:39,880
So if we measure the distance of this point from this margin, then

39
00:03:41,400 --> 00:03:46,100
this is the amount of error associated with this point.

40
00:03:47,730 --> 00:03:49,830
Let us say this distance is X1.

41
00:03:51,920 --> 00:03:55,220
Now, if I look at the other purple point, which is point eleven,

42
00:03:56,780 --> 00:04:04,430
its distance is measured from this margin to point eleven, which is, say, X2.

43
00:04:05,770 --> 00:04:09,580
So for the two purple points, we have X1 and X2

44
00:04:11,030 --> 00:04:12,700
as the misclassification errors.

45
00:04:14,490 --> 00:04:15,580
For the blue points,

46
00:04:16,650 --> 00:04:17,720
say, point eight,

47
00:04:18,690 --> 00:04:21,330
the upper line was the margin.

48
00:04:21,660 --> 00:04:26,460
So the distance from that upper line would be the classification error.

49
00:04:26,550 --> 00:04:28,830
So that would be X3.

50
00:04:30,970 --> 00:04:36,760
Similarly, for point twelve, we can find X4, which is the distance of point twelve from the

51
00:04:36,760 --> 00:04:37,360
upper margin.

52
00:04:38,750 --> 00:04:46,250
We will add up all these misclassification errors and require that their total be less

53
00:04:46,250 --> 00:04:48,600
than a predefined budget

54
00:04:48,800 --> 00:04:49,220
B.

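The budget check described above (add up X1, X2, X3, X4 and compare the total with B) can be sketched in Python as follows. Like the earlier convention, this sketch assumes margins at w·x + b = +1 and -1. Under that assumption the margin half-width is 1/||w||, and a point's distance past its own margin is max(0, 1 - y(w·x + b)) / ||w||. The function names and the example points are hypothetical.

```python
import math


def violation_distance(w, b, x, y):
    """Distance by which (x, y) lies past its own margin; 0.0 if it is on the correct side.

    Assumes margins at w.x + b = +1 / -1 and y in {+1, -1}.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score) / norm_w


def within_budget(w, b, points, budget):
    """Sum the per-point errors (X1 + X2 + ...) and check the total against budget B."""
    total = sum(violation_distance(w, b, x, y) for x, y in points)
    return total, total <= budget


# Hypothetical data: hyperplane x1 = 0, margins at x1 = +0.5 and x1 = -0.5.
w, b = (2.0, 0.0), 0.0
points = [((1.0, 0.0), 1),    # correct side of its margin: error 0
          ((0.25, 0.0), 1),   # inside the margin: error 0.25
          ((-0.25, 0.0), 1),  # wrong side of the hyperplane: error 0.75
          ((0.5, 0.0), -1)]   # wrong side of the hyperplane: error 1.0
print(within_budget(w, b, points, budget=2.5))  # -> (2.0, True)
```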
55
00:04:53,470 --> 00:04:56,430
In this way, we will allow some misclassification

56
00:04:57,780 --> 00:05:00,030
while trying to maximize the margin.

57
00:05:02,410 --> 00:05:09,850
Another way of implementing this same concept, which is often found in software packages, is to use a

58
00:05:09,910 --> 00:05:10,930
cost parameter.

59
00:05:12,560 --> 00:05:14,090
This is effectively the same thing.

60
00:05:14,630 --> 00:05:16,730
The only difference is that instead of having

61
00:05:17,840 --> 00:05:19,460
a unit cost and a budget

62
00:05:19,540 --> 00:05:24,100
B, we will have a cost C per unit of error and a unit budget.

63
00:05:25,400 --> 00:05:27,770
So C is a kind of inverse of B.

64
00:05:28,980 --> 00:05:37,080
C is a hyperparameter, and we will choose its value using cross-validation so that

65
00:05:37,080 --> 00:05:39,630
we get the minimum test-set error.

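A common way to write the cost-parameter version is to minimize 0.5·||w||² + C·Σ max(0, 1 - y_i(w·x_i + b)). The first term rewards a wide margin, because the half-width is 1/||w||. The second term charges C for every unit of margin violation (hinge loss). This sketch assumes that form, which is widely used, but we have not checked it against any specific package. The function name and the two candidate hyperplanes are hypothetical. The example shows why C acts roughly as an inverse of the budget:

- A small C prefers a wide margin that tolerates violations.
- A large C prefers a narrow margin with no violations.

```python
def soft_margin_objective(w, b, points, C):
    """0.5*||w||^2 (margin term) + C * total hinge loss (cost of margin violations)."""
    reg = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
                for x, y in points)
    return reg + C * hinge


# Hypothetical data; both candidates use the hyperplane x1 = 0.
points = [((2.0, 0.0), 1), ((0.5, 0.0), 1), ((-2.0, 0.0), -1), ((-0.5, 0.0), -1)]
wide = ((1.0, 0.0), 0.0)    # margins at x1 = +/-1: two points fall inside them
narrow = ((4.0, 0.0), 0.0)  # margins at x1 = +/-0.25: no violations

for C in (1.0, 10.0):
    best = min((wide, narrow), key=lambda h: soft_margin_objective(h[0], h[1], points, C))
    print(C, "wide" if best is wide else "narrow")
# 1.0 wide     (wide costs 0.5 + 1 * 1.0 = 1.5, narrow costs 8.0)
# 10.0 narrow  (wide costs 0.5 + 10 * 1.0 = 10.5, narrow costs 8.0)
```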
66
00:05:42,790 --> 00:05:47,940
One more thing to note here: points number one, eight,

67
00:05:48,050 --> 00:05:48,910
eleven, and twelve,

68
00:05:49,950 --> 00:05:52,950
which lie on the margin or on the wrong side of it,

69
00:05:54,060 --> 00:05:58,320
are all called support vectors.

70
00:05:59,910 --> 00:06:07,830
This is because now, instead of just the points on the boundary, the margin depends on the presence of these points

71
00:06:07,830 --> 00:06:08,220
as well.

72
00:06:09,930 --> 00:06:17,040
So all the points within the margins and the points lying exactly on the margin will

73
00:06:17,040 --> 00:06:19,080
be considered support vectors.

74
00:06:20,520 --> 00:06:24,760
Any point outside the margin, on its correct side, is not that relevant.

75
00:06:26,570 --> 00:06:28,940
Our support vector classifier will change

76
00:06:29,110 --> 00:06:32,300
when any of these support vectors are changed.

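Following the definition above, support vectors are the points that lie on their margin or violate it. For a given hyperplane they can be picked out as sketched below. The sketch assumes the same convention as before: margins at w·x + b = +1 and -1, so the test is y(w·x + b) ≤ 1, with a small tolerance for points exactly on the margin. The function name and data are hypothetical.

```python
def support_vectors(w, b, points, tol=1e-9):
    """Return the points on or past their margin, i.e. with y*(w.x + b) <= 1."""
    result = []
    for x, y in points:
        m = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        if m <= 1 + tol:  # on the margin (m == 1), inside it, or misclassified (m <= 0)
            result.append((x, y))
    return result


# Hypothetical data: hyperplane x1 = 0, margins at x1 = +0.5 and x1 = -0.5.
pts = [((1.0, 0.0), 1), ((0.5, 0.0), 1), ((0.25, 0.0), 1), ((-1.0, 0.0), -1), ((0.5, 0.0), -1)]
print(support_vectors((2.0, 0.0), 0.0, pts))
# -> [((0.5, 0.0), 1), ((0.25, 0.0), 1), ((0.5, 0.0), -1)]
```

The points at x1 = 1.0 and x1 = -1.0 are on the correct side of their margins, so moving them slightly would not change the classifier.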
77
00:06:38,110 --> 00:06:43,200
Now, let us see what is the impact of increasing or decreasing the value of this parameter C.

78
00:06:45,280 --> 00:06:48,570
If we decrease the value of C and make it small,

79
00:06:49,800 --> 00:06:53,580
this means that the cost of misclassification is lower.

80
00:06:55,230 --> 00:07:02,250
In such a scenario, more cases will be allowed within the margins, and more cases will also be allowed

81
00:07:02,250 --> 00:07:03,390
to be misclassified.

82
00:07:05,090 --> 00:07:11,040
On the other hand, if you increase the value of C, the cost of making a mistake is high, so there

83
00:07:11,040 --> 00:07:15,830
will be fewer points between the margins and fewer misclassified points.

84
00:07:18,910 --> 00:07:25,210
But doing this will increase the sensitivity of our model to individual observations, which may result

85
00:07:25,210 --> 00:07:26,110
in overfitting.

86
00:07:27,390 --> 00:07:35,190
This is why we need to carefully tune the value of C. A small value could lead to a lot of misclassification,

87
00:07:35,670 --> 00:07:37,980
and a large value could lead to overfitting.

88
00:07:39,080 --> 00:07:44,970
So we will try to find the optimum value of C at which we get the best test-set performance.

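Choosing C by cross-validation means the following steps:

1. Train on all folds but one.
2. Count the mistakes on the held-out fold.
3. Repeat for every fold and every candidate C.
4. Keep the C with the lowest estimated test error.

The sketch below shows that loop in plain Python. It uses a simple full-batch subgradient-descent solver for the cost-parameter objective 0.5·||w||² + C·Σ hinge. This is an illustrative stand-in; software packages use their own, more careful solvers. All function names, the candidate grid, the step size, the epoch count and the data are hypothetical.

```python
def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(points, C, epochs=300, lr=0.01):
    """Illustrative full-batch subgradient descent on 0.5*||w||^2 + C * sum of hinge losses."""
    dim = len(points[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the 0.5*||w||^2 term
        for x, y in points:
            if y * score(w, b, x) < 1:  # hinge loss is active for this point
                for j in range(dim):
                    grad_w[j] -= C * y * x[j]
                grad_b -= C * y
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def cv_error(points, C, k=4):
    """k-fold cross-validation estimate of the misclassification rate for a given C."""
    mistakes = 0
    for fold in range(k):
        held_out = points[fold::k]  # every k-th point, starting at index `fold`
        train = [p for i, p in enumerate(points) if i % k != fold]
        w, b = train_linear_svm(train, C)
        mistakes += sum(1 for x, y in held_out
                        if (1 if score(w, b, x) >= 0 else -1) != y)
    return mistakes / len(points)


def choose_C(points, grid, k=4):
    """Return the candidate C with the lowest CV error (ties go to the earlier value)."""
    errors = {C: cv_error(points, C, k) for C in grid}
    return min(grid, key=errors.get), errors


# Hypothetical 2-D data: +1 above the line x2 = x1, -1 below it, plus one noisy point.
data = ([((float(i), float(i) + 1.5), 1) for i in range(-4, 4)]
        + [((float(i), float(i) - 1.5), -1) for i in range(-4, 4)]
        + [((0.0, -0.5), 1)])
best_C, errors = choose_C(data, [0.01, 0.1, 1.0, 10.0])
print(best_C, errors)
```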
89
00:07:46,880 --> 00:07:51,380
Here I am also sharing a link to a web page on the Stanford website.

90
00:07:52,340 --> 00:07:58,840
It's a great tool to visualize the effect of adding and removing points in a 2D predictor space

91
00:07:59,330 --> 00:08:01,370
and the effect of changing the value of C.

92
00:08:02,960 --> 00:08:03,760
Let me show you this, too.

93
00:08:06,390 --> 00:08:12,630
So this is the page. Since we have discussed only the support vector classifier, which is a linear classifier,

94
00:08:13,290 --> 00:08:16,890
we will toggle this to the linear classifier.

95
00:08:17,280 --> 00:08:21,630
We'll be discussing the kernel-based nonlinear classifiers in the coming lectures.

96
00:08:23,050 --> 00:08:24,970
So for now, this is a linear classifier.

97
00:08:25,810 --> 00:08:28,570
There are two types of points: red points

98
00:08:28,570 --> 00:08:29,280
and green points.

99
00:08:30,190 --> 00:08:36,280
Some points have already been added. We can click within this two-dimensional space to create a red

100
00:08:36,280 --> 00:08:36,700
point.

101
00:08:36,910 --> 00:08:43,600
And if you want to create a green point, you hold Shift and then click the mouse to add a green point.

102
00:08:46,160 --> 00:08:52,850
So if you add green points near the boundary, you'll see that the classifier keeps changing.

103
00:08:59,840 --> 00:09:03,020
And if you scroll down,

104
00:09:04,630 --> 00:09:07,640
there is this parameter C, which is the cost parameter.

105
00:09:08,710 --> 00:09:11,860
As I told you, if you have a lower value of cost,

106
00:09:13,050 --> 00:09:19,260
there will be more misclassifications, and more points will be present within the margins.

107
00:09:19,500 --> 00:09:23,880
So you will have wider margins if the value of C is decreased.

108
00:09:25,440 --> 00:09:27,240
You can see this is what is happening.

109
00:09:28,260 --> 00:09:32,980
If you increase the value of C, fewer points will be misclassified.

110
00:09:33,960 --> 00:09:39,420
But our system will become very sensitive to individual observations.

111
00:09:44,780 --> 00:09:54,140
So using this 2D tool, you can easily visualize the effect of adding different types of points anywhere

112
00:09:54,140 --> 00:09:54,830
in this space,

113
00:09:55,130 --> 00:09:56,900
plus see the effect of changing

114
00:09:56,900 --> 00:09:58,790
the value of C on the margin.

115
00:09:58,820 --> 00:09:59,120
Good.