1
00:00:00,150 --> 00:00:07,050
Hello all. In our previous session, we gained some basic knowledge of logistic

2
00:00:07,050 --> 00:00:14,790
regression and why we should not use linear regression for classification problems, because

3
00:00:14,790 --> 00:00:16,140
of its drawbacks.

4
00:00:16,140 --> 00:00:20,940
And these are exactly the two drawbacks: the line is affected by outliers,

5
00:00:20,940 --> 00:00:25,450
and what if my probability is greater than one or less than zero?

6
00:00:25,470 --> 00:00:28,910
So in such case, I have to use this logistic regression.

7
00:00:29,190 --> 00:00:35,730
So in this session we are going to discuss four major cases, and then I'm going to show you how you

8
00:00:35,730 --> 00:00:42,310
can apply the sigmoid to your linear regression best-fit line to achieve your goal.

9
00:00:42,840 --> 00:00:47,940
So let's say this is exactly my very first case.

10
00:00:48,060 --> 00:00:52,140
And here these X marks are my positive data points.

11
00:00:52,140 --> 00:00:59,280
My y will be plus one for positive data points, and the circles will be my negative data points.

12
00:00:59,280 --> 00:01:02,190
And for these, my y will be negative.

13
00:01:02,700 --> 00:01:09,600
So in this case, for all the data points above this plane, I have a positive distance,

14
00:01:09,720 --> 00:01:13,560
and below this plane I have a negative distance.

15
00:01:13,770 --> 00:01:16,730
That's what we established in our previous session.

16
00:01:17,250 --> 00:01:23,420
So we know this is the plane. Now suppose I consider, let's say, this data point.

17
00:01:24,120 --> 00:01:26,840
So we know this data point is above the plane.

18
00:01:27,270 --> 00:01:36,480
So in this case, I have a positive y, and my distance, which is nothing but w

19
00:01:36,480 --> 00:01:37,260
transpose

20
00:01:38,260 --> 00:01:46,810
x1, is also positive because the point is above the plane. So if I take the product of

21
00:01:46,810 --> 00:01:51,620
both, which is nothing but y1 times w transpose x1,

22
00:01:51,970 --> 00:01:54,180
this will also be positive.

23
00:01:54,490 --> 00:02:04,810
So whenever I have a case where my y is positive, my distance is also positive, and the

24
00:02:04,810 --> 00:02:08,060
product of both is also positive,

25
00:02:08,380 --> 00:02:13,240
then in such a case, my classification has happened correctly.

26
00:02:13,540 --> 00:02:20,690
So it means this X data point has been classified correctly by this straight line.

27
00:02:21,010 --> 00:02:23,950
So this is my correctly classified data point.

28
00:02:24,340 --> 00:02:27,470
So I'm going to assume another case over here.

29
00:02:27,490 --> 00:02:37,210
So that was my very first case. Now assume I have a case two, and in case two I am going to consider

30
00:02:37,390 --> 00:02:44,800
this data point, which is a negative point. So for this, I have y equal to minus one, and

31
00:02:44,800 --> 00:02:49,480
my distance is also negative because it is below the plane.

32
00:02:49,690 --> 00:02:57,270
And if I multiply both, which is nothing but minus one times some negative number,

33
00:02:57,550 --> 00:03:00,950
it will also give me a positive value.

34
00:03:01,480 --> 00:03:09,020
So y and the distance are both negative, but when I multiply them, the product is positive.

35
00:03:09,370 --> 00:03:14,070
So in such case my data point gets classified correctly.

36
00:03:14,230 --> 00:03:22,260
So it means the circle data point also gets correctly classified by this straight line.

37
00:03:22,480 --> 00:03:29,500
So let me discuss my case number three in which I'm going to consider a negative data point.

38
00:03:30,100 --> 00:03:38,650
Assume this one. For this, my y will be negative, or I can say minus one, but my distance, which

39
00:03:38,650 --> 00:03:43,870
is w transpose x, will be positive because the point is above the plane.

40
00:03:44,410 --> 00:03:51,910
So if I multiply both, it will give me a negative value, which directly

41
00:03:51,910 --> 00:03:55,120
indicates that it is an incorrectly

42
00:03:56,560 --> 00:04:07,570
classified data point. That's what we have to deal with. In a similar way, assume I have

43
00:04:07,570 --> 00:04:09,460
a case number four over here.

44
00:04:09,880 --> 00:04:13,010
So what exactly can be my case number four?

45
00:04:13,690 --> 00:04:16,080
Suppose this is my data point.

46
00:04:16,180 --> 00:04:18,100
Suppose this is actually my data point.

47
00:04:18,340 --> 00:04:25,860
So for this, my y will be plus one, but my distance will be negative because it is below the plane.

48
00:04:26,290 --> 00:04:33,730
And if I multiply both, it will give me a negative value, which directly

49
00:04:33,730 --> 00:04:38,500
indicates an incorrectly classified data point.

50
00:04:39,070 --> 00:04:43,910
So anyhow, we have to deal with both of these cases.

51
00:04:43,930 --> 00:04:45,860
The third one and the fourth one.
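As a rough sketch of the four cases above: the weight vector w and the points here are made-up values for illustration only, and the helper names are hypothetical. A point counts as correctly classified when y times w transpose x is positive.
```python
# Hypothetical illustration: w and the sample points are assumed values.
def signed_distance(w, x):
    # w transpose x, written as a plain dot product
    return sum(wi * xi for wi, xi in zip(w, x))
def is_correct(y, w, x):
    # Correct classification means y * (w transpose x) is positive
    return y * signed_distance(w, x) > 0
w = [0.0, 1.0]  # assumed plane: "above" means x[1] > 0
cases = {
    "case 1": (+1, [1.0, 2.0]),   # positive point above the plane
    "case 2": (-1, [1.0, -2.0]),  # negative point below the plane
    "case 3": (-1, [1.0, 2.0]),   # negative point above the plane
    "case 4": (+1, [1.0, -2.0]),  # positive point below the plane
}
results = {name: is_correct(y, w, x) for name, (y, x) in cases.items()}
print(results)
```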

52
00:04:46,030 --> 00:04:58,030
So if I have to define my cost function, I can say my cost function is nothing but a summation

53
00:04:58,030 --> 00:05:05,140
over i equal to 1 to n, because I have n data points, and each term is nothing but a product

54
00:05:05,140 --> 00:05:11,770
of the y which is assigned to each and every data point and the distance of each and

55
00:05:11,770 --> 00:05:14,700
every data point to the plane.

56
00:05:15,130 --> 00:05:19,630
So that's nothing but w transpose times x_i.

57
00:05:19,660 --> 00:05:22,250
Similarly, I have y_i.

58
00:05:22,510 --> 00:05:29,620
So this is nothing but my distance, and this is the y value which is assigned to each data

59
00:05:29,620 --> 00:05:29,960
point.

60
00:05:30,370 --> 00:05:38,020
So this is exactly my cost function, or I can say this is my optimizer.

61
00:05:39,060 --> 00:05:46,480
It should be as large as possible, so we have to maximize it.

62
00:05:46,740 --> 00:05:54,240
So if I have to create a straight line, or a best-fit line, which linearly separates these data points,

63
00:05:54,750 --> 00:06:03,300
I have to make sure that the summation over all the points of y times the distance is maximum.

64
00:06:03,480 --> 00:06:05,340
That's what we have to make sure.

65
00:06:05,700 --> 00:06:09,840
So from this cost function, you can see that

66
00:06:09,870 --> 00:06:14,370
y_i is constant, and x_i, which is nothing but

67
00:06:14,370 --> 00:06:16,710
my data point, is also constant.

68
00:06:17,040 --> 00:06:19,130
So what exactly is varying over here?

69
00:06:19,560 --> 00:06:20,010
This

70
00:06:21,520 --> 00:06:29,440
w transpose, which is the coefficient assigned to my line, the coefficient assigned

71
00:06:29,440 --> 00:06:34,990
to my best-fit line. That is what varies; it is the coefficient.

72
00:06:35,230 --> 00:06:43,510
So it means I have to update this coefficient, or I can say I have to update this weight, in such a way

73
00:06:43,780 --> 00:06:47,110
that it will maximize this summation.

74
00:06:47,410 --> 00:06:54,940
And when I get the maximum summation, that line will be termed the best-fit line that

75
00:06:54,940 --> 00:06:57,690
linearly separates our classes.

76
00:06:57,850 --> 00:07:02,570
So this is how logistic regression actually works.

77
00:07:02,620 --> 00:07:11,560
So assume I have these data points, and I have to draw a best-fit line. Let's say

78
00:07:11,580 --> 00:07:14,880
this one or let's say this one as well.

79
00:07:15,890 --> 00:07:25,580
And let's take this one as well. So I can have multiple candidate lines here. So which one is my best-fit line,

80
00:07:25,580 --> 00:07:26,200
actually?

81
00:07:26,300 --> 00:07:31,880
My best-fit line will be the line which has the maximum value of my optimizer.

82
00:07:31,890 --> 00:07:33,730
And what exactly was the optimizer?

83
00:07:34,100 --> 00:07:37,010
That is nothing but the sum over i equal to 1 to

84
00:07:37,010 --> 00:07:48,590
n of y_i times w transpose x_i. So whichever line has the maximum value, that

85
00:07:48,770 --> 00:07:53,680
best fit line will get selected for my prediction purpose.
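A minimal sketch of that selection step, assuming made-up points and two hypothetical candidate lines (the names cost, points and candidates are illustrative, not from the session). It scores each candidate by the sum of y_i times w transpose x_i and keeps the largest.
```python
# Hypothetical data: (y, x) pairs with y in {+1, -1}; values are assumed.
def cost(w, points):
    # Sum over all points of y_i * (w transpose x_i)
    return sum(y * sum(wi * xi for wi, xi in zip(w, x)) for y, x in points)
points = [(+1, [1.0, 2.0]), (+1, [2.0, 3.0]), (-1, [1.0, -2.0]), (-1, [3.0, -1.0])]
candidates = {"line A": [0.0, 1.0], "line B": [1.0, 0.0]}  # assumed weight vectors
scores = {name: cost(w, points) for name, w in candidates.items()}
best = max(scores, key=scores.get)  # candidate with the maximum summation
print(scores, best)
```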

86
00:07:53,930 --> 00:07:59,340
But the optimizer that we have written over here is still not final.

87
00:07:59,390 --> 00:08:06,440
I have to update it. Assume I have this many data points, and if I have to draw a best

88
00:08:06,440 --> 00:08:09,330
fit line, that is nothing but this one.

89
00:08:09,680 --> 00:08:17,240
So this is the best-fit line that can linearly separate both of these sets of data points, both of the

90
00:08:17,240 --> 00:08:18,280
different classes.

91
00:08:18,380 --> 00:08:21,580
But what if I have an outlier?

92
00:08:21,590 --> 00:08:25,140
Suppose my outlier lies right here.

93
00:08:25,400 --> 00:08:30,700
So what will my best-fit line be if I have an outlier over here?

94
00:08:31,160 --> 00:08:39,170
So let's say if I am going to consider the previous best fit line, which is this one, and if I am

95
00:08:39,170 --> 00:08:44,680
going to consider, let's say, the distance between this one and distance between this one.

96
00:08:44,990 --> 00:08:52,430
So this will be my negative distance. Why? Because this data point refers to my negative

97
00:08:52,430 --> 00:08:56,990
data points, and this cross data point refers to my positive data points.

98
00:08:57,140 --> 00:09:02,570
And here my y is going to be minus one, and here my y will be plus one.

99
00:09:02,990 --> 00:09:09,740
So here in all the cases I have minus two and here in all the cases I have plus two.

100
00:09:09,950 --> 00:09:15,860
So I'm going to assume the distance between the straight line and each data point is two.

101
00:09:16,460 --> 00:09:18,560
So what does this two refer to?

102
00:09:18,890 --> 00:09:22,760
That is nothing but w transpose times x.

103
00:09:23,360 --> 00:09:30,680
So here, if I consider this one distance, assuming I consider this one, this

104
00:09:30,680 --> 00:09:33,950
distance will give me somewhere around one hundred.

105
00:09:33,980 --> 00:09:35,120
I'm just assuming.

106
00:09:35,870 --> 00:09:43,340
So if you do a summation of all of these, you will see I have minus eight here, so I'm

107
00:09:43,340 --> 00:09:48,610
going to write minus eight, and here I have plus eight, and here I have this one hundred.

108
00:09:48,980 --> 00:09:53,380
So these will cancel out, and this will give me one hundred.

109
00:09:53,540 --> 00:09:55,280
So that hundred

110
00:09:55,730 --> 00:10:01,730
is just on account of this outlier, this data point, which is exactly my outlier.

111
00:10:01,970 --> 00:10:08,720
And if I consider the previous case, in that case this will be minus eight and this will

112
00:10:08,720 --> 00:10:13,240
be plus eight, so in that case I have a sum of zero.

113
00:10:13,550 --> 00:10:19,850
So if I consider this one here, you will see I have a very high fluctuation in the value.

114
00:10:19,850 --> 00:10:27,710
You will see this is just on account of the outlier that is present over here. Now assume I'm going

115
00:10:27,710 --> 00:10:29,780
to consider another best-fit line.

116
00:10:30,200 --> 00:10:34,260
Let's say this time I have this best-fit line.

117
00:10:34,730 --> 00:10:38,990
So in this case, you can observe all of these data points.

118
00:10:38,990 --> 00:10:46,290
These circle data points will give me a negative distance, and all of these will give me a positive distance.

119
00:10:46,490 --> 00:10:48,240
So all of these will cancel out.

120
00:10:48,560 --> 00:10:54,710
So this will be my remaining distance, which is nothing but a negative distance.

121
00:10:54,950 --> 00:11:01,010
So you can see, whenever I change my best-fit line, how much fluctuation we are going

122
00:11:01,010 --> 00:11:04,290
to observe using this best-fit line.

123
00:11:04,460 --> 00:11:09,540
So anyhow, we have to overcome this fluctuation, which is going to happen over here.

124
00:11:09,710 --> 00:11:18,160
The simple answer is to apply a function to this optimizer. What optimizer did I have previously?

125
00:11:18,170 --> 00:11:27,440
That is nothing but y_i times w transpose x_i, and if I have multiple data points, I can take a summation

126
00:11:27,440 --> 00:11:27,930
as well.

127
00:11:28,250 --> 00:11:30,770
So this is exactly my optimizer.

128
00:11:31,070 --> 00:11:33,950
So I have to apply a function over here.

129
00:11:34,130 --> 00:11:35,930
So what exactly is this function?

130
00:11:36,320 --> 00:11:42,370
This function is nothing but my sigmoid function.

131
00:11:42,800 --> 00:11:48,330
So we have to apply this sigmoid function to each of these values.

132
00:11:48,770 --> 00:11:54,600
So after applying this function, it will convert this entire value.

133
00:11:54,620 --> 00:12:01,580
Let's say I'm going to call it z. So whatever data point I have over here,

134
00:12:01,730 --> 00:12:05,350
any data point will give me a value in this form.

135
00:12:05,870 --> 00:12:13,180
So whatever value it returns, I'm just going to send this value to my sigmoid function. And what

136
00:12:13,180 --> 00:12:14,450
will this sigmoid do?

137
00:12:14,780 --> 00:12:15,380
It will,

138
00:12:15,660 --> 00:12:22,770
if this z is high, let's say the value that it returns is one hundred, the sigmoid

139
00:12:22,950 --> 00:12:27,490
will convert this hundred into the range of 0 to 1.

140
00:12:27,690 --> 00:12:30,100
So that's what my sigmoid will do.

141
00:12:30,480 --> 00:12:38,280
And since I have so much fluctuation in the cost function value, we can prevent this using the sigmoid.

142
00:12:38,640 --> 00:12:46,980
So how exactly will the sigmoid prevent it? The sigmoid function is nothing but one upon one plus e to the

143
00:12:46,990 --> 00:12:47,900
power minus z.

144
00:12:48,150 --> 00:12:50,550
And this is exactly my z.

145
00:12:50,700 --> 00:12:58,160
So we have to pass this z here, and this will convert the entire value into the range zero to one.

146
00:12:58,530 --> 00:13:00,420
That's what my sigmoid will do.
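As a small sketch of the formula just described, one upon one plus e to the power minus z: the sample values, including the outlier-sized 100, are assumed for illustration, and this simple form is only meant for inputs that are not large negative numbers.
```python
import math
def sigmoid(z):
    # 1 / (1 + e^(-z)); math.exp(-z) can overflow for very large negative z
    return 1.0 / (1.0 + math.exp(-z))
values = [-8.0, 0.0, 8.0, 100.0]  # assumed values; 100 plays the outlier
squashed = [sigmoid(z) for z in values]
print(squashed)  # every entry lies between 0 and 1
```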

147
00:13:00,570 --> 00:13:01,870
So what will the sigmoid do?

148
00:13:01,890 --> 00:13:08,650
Basically, it will remove the effect of this outlier by just converting a high number,

149
00:13:08,820 --> 00:13:13,980
let's say this high number one hundred, into some value between zero and one.

150
00:13:14,160 --> 00:13:18,640
So that's basically the task of this sigmoid function,

151
00:13:18,930 --> 00:13:23,160
which we are going to use in logistic regression.

152
00:13:23,610 --> 00:13:26,670
So our previous use case was.

153
00:13:27,090 --> 00:13:28,610
Yeah, this one.

154
00:13:28,650 --> 00:13:29,010
Yeah.

155
00:13:29,400 --> 00:13:33,850
So in this case, you can observe this was exactly one hundred.

156
00:13:34,080 --> 00:13:41,520
So what we have to do first is pass this one hundred to my sigmoid function, as you can

157
00:13:41,520 --> 00:13:43,410
see over here.

158
00:13:43,680 --> 00:13:49,020
So this will give me some value between zero and one, and my problem is solved.

159
00:13:49,410 --> 00:13:53,050
That's what we are trying to achieve, using logistic regression.

160
00:13:53,070 --> 00:13:54,740
That's what my sigmoid will do.

161
00:13:55,020 --> 00:13:59,330
It will remove the impact of this outlier.

162
00:13:59,400 --> 00:14:01,500
That's what I want to achieve.

163
00:14:01,650 --> 00:14:09,300
So how exactly do we achieve this goal with logistic regression? Whatever input, or I

164
00:14:09,300 --> 00:14:13,150
can say whatever output, this linear regression gives,

165
00:14:13,290 --> 00:14:15,000
Suppose it gives me one hundred.

166
00:14:15,280 --> 00:14:18,930
So I'm just going to apply a sigmoid over this one hundred.

167
00:14:19,230 --> 00:14:27,720
So once I apply this sigmoid over this hundred using this formula, it will give me this kind of

168
00:14:27,720 --> 00:14:28,020
curve.

169
00:14:28,290 --> 00:14:32,650
That's how logistic regression works internally.

170
00:14:32,940 --> 00:14:36,630
So I hope you liked this introduction to logistic regression.

171
00:14:37,290 --> 00:14:37,900
Thank you.

172
00:14:37,950 --> 00:14:38,850
Have a nice day.

173
00:14:39,210 --> 00:14:40,010
Keep learning.

174
00:14:40,020 --> 00:14:40,830
Keep going.

175
00:14:41,220 --> 00:14:42,120
Keep practicing.