[00:00:00] Hello, all. In our previous session we covered the basics of logistic regression and why we shouldn't use linear regression for a classification problem. The reason is its two drawbacks: the best-fit line gets dragged around by outliers, and the predicted probability can come out greater than one or less than zero. In such cases, I have to use logistic regression.

[00:00:29] In this session we are going to discuss four major cases. Then I'm going to show you how you can apply a sigmoid on your linear regression best-fit line to achieve your goal.

[00:00:42] Let's say this is my very first case. These crosses are my positive data points, and for positive data points my Y will be +1. The circles are my negative data points, and for these my Y will be -1. All the data points above this plane have a positive distance, and all the points below this plane have a negative distance. That's what we established in our previous session.

[00:01:17] So this is the plane, and let's consider this data point, x1. We know this data point is above the plane, so its Y is positive. Its distance is nothing but W transpose x1. Here I treat W as a unit vector, so that W transpose x is the signed distance from the plane. This distance is also positive, because the point is above the plane. So if I take the product of both, which is y1 times W transpose x1, this will also be positive.

[00:01:54] So whenever my Y is positive, my distance is positive, and the product of both is positive, my classification has happened correctly. It means this cross data point has been classified correctly by this straight line. So this is my correctly classified data point.

[00:02:24] Now I'm going to assume another case. That was my very first case. In case two, I am going to consider this data point, which is a negative point, so for it I have Y equal to -1. My distance is also negative, because the point is below the plane. If I multiply the two, that is -1 times some negative number, I get a positive value. So Y and the distance are both negative, but their product is positive, and in such a case my data point is classified correctly. It means this circle data point is also correctly classified by this straight line.

So let me discuss case number three, in which I'm going to consider a negative data point.
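The rule behind cases one and two is just a sign check on y · Wᵀx, and a few lines of Python can show it. This is a minimal sketch under a few assumptions. The plane passes through the origin (no bias term). The weight vector is a hypothetical w = (0, 1), so the plane is the horizontal axis and "above" means Wᵀx > 0. The two points are made up, and the helper names are illustrative rather than taken from any library.

```python
# Sign rule for a plane through the origin: y * w^T x > 0 means "correctly classified".
# Assumptions: no bias term, labels y in {+1, -1}, and w is a hypothetical weight vector.

def dot(w, x):
    """Return w^T x for two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))

def is_correctly_classified(w, x, y):
    """True when the label y and the side of the plane (the sign of w^T x) agree."""
    return y * dot(w, x) > 0

# Hypothetical plane: w = (0, 1) is the horizontal axis; "above" means w^T x > 0.
w = (0.0, 1.0)

case_1 = ((1.0, 2.0), +1)   # positive point (cross) above the plane
case_2 = ((1.0, -3.0), -1)  # negative point (circle) below the plane

for name, (x, y) in [("case 1", case_1), ("case 2", case_2)]:
    print(name, "y * w^T x =", y * dot(w, x), "correct:", is_correctly_classified(w, x, y))
```

In both cases the product comes out positive. Flipping either label makes it negative, which is exactly what happens in cases three and four.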
[00:03:30] Take this one. For this point my Y will be negative, that is -1. But my distance, W transpose x, will be positive, because the point is above the plane. So if I multiply the two, I get a negative value, which directly tells me it is an incorrectly classified data point. That's what we have to deal with.

[00:04:09] So what can my case number four be? Suppose this is my data point. For this one, my Y will be positive, but my distance will be negative because the point is below the plane. If I multiply the two, I get a negative value, which again means an incorrectly classified data point. So we have to deal with both of these cases, the third one and the fourth one.

[00:04:46] So how do I define my cost function? It is nothing but a summation from i = 1 to n, because I have n data points. Each term is a product: the y_i assigned to each data point times the distance of that data point to the plane. That distance is nothing but W transpose x_i. So I have W transpose x_i, and similarly I have y_i.
[00:05:22] So x_i runs over my data points, and y_i is the Y value assigned to each data point. This is my cost function, which I can also call my optimizer. It should be as large as possible, so we have to maximize it. If I want a straight line, a best-fit line, that linearly separates these data points, I have to make sure this summation of labels times distances is at its maximum. That's what we have to make sure of.

[00:06:05] From this cost function you can see that y_i is constant, and x_i, which is my data point, is also constant. So what is varying here? It is W, the coefficients assigned to my best-fit line. So I have to update these coefficients, or I can say these weights, in such a way that they maximize this summation. The line that gives the maximum summation is termed the best-fit line, and it linearly classifies our classes. This is the basic geometric idea behind logistic regression.
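The idea of picking the line with the largest objective can be made concrete with a brute-force sketch. It scores a few candidate lines with Σ y_i · Wᵀx_i and keeps the best one. Everything in it is hypothetical: six made-up points, three candidate directions, and lines through the origin. The candidates are kept at unit length so that Wᵀx is a signed distance and the scores are comparable, because scaling W up would inflate the sum. Real logistic regression fits W with an iterative optimizer rather than by scanning a list of candidates.

```python
# Score candidate lines by sum_i y_i * w^T x_i and keep the largest.
# Assumptions: lines pass through the origin, candidate w's are unit vectors
# (so w^T x is a signed distance), and all data and candidates are made up.

def dot(w, x):
    """Return w^T x for two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))

def objective(w, points, labels):
    """Sum of y_i * w^T x_i over the data set (larger is better)."""
    return sum(y * dot(w, x) for x, y in zip(points, labels))

points = [(1, 2), (2, 3), (-1, 2),      # crosses, label +1
          (1, -2), (-2, -1), (0, -3)]   # circles, label -1
labels = [+1, +1, +1, -1, -1, -1]

candidates = {
    "horizontal (0, 1)": (0.0, 1.0),
    "vertical (1, 0)": (1.0, 0.0),
    "tilted (0.6, 0.8)": (0.6, 0.8),
}

scores = {name: objective(w, points, labels) for name, w in candidates.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.1f}")
print("selected:", best)
```

The vertical candidate scores lowest. It puts the circle (1, -2) on its positive side (case three) and the cross (-1, 2) on its negative side (case four), and each of those contributes a negative term to the sum.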
[00:07:02] So assume I have these data points and I have to draw a best-fit line. It could be this one, or this one, or this one as well, so I can have multiple candidate lines here. Which one is actually the best-fit line? It is the line with the maximum value of my optimizer. And what exactly was the optimizer? It is the summation from i = 1 to n of y_i times W transpose x_i. Whichever line has the maximum value of this gets selected for my prediction purpose.

[00:07:53] The optimizer we have written here still holds. Now assume I have this many data points. If I have to draw a best-fit line here, it is this one. It linearly separates both sets of data points, the two different classes. But what if I have an outlier? Suppose my outlier is over here. What would my best-fit line be then?

[00:08:31] Let's say I consider the previous best-fit line, this one, and look at the distance of this point and the distance of this point.
[00:08:44] The circle's distance will be negative, because this point lies on the negative side, close to my negative data points. The cross's distance will be positive, because it lies on the side of my positive data points. Here my Y is -1, and here my Y is +1. Assume the distance between the straight line and each data point, that is W transpose x, is 2. Then every circle gives me -2 and every cross gives me +2. Now take the distance for this outlier, and assume it comes out at one hundred. I'm just assuming these numbers.

[00:09:35] If you sum all of this up, you will see I have -8 from the circles and +8 from the crosses, plus this one hundred. The -8 and +8 cancel, and the total is one hundred. That one hundred is there purely on account of this outlier. In the previous case as well, it was -8, +8, and this one hundred. So when I compare lines like this, you will see a huge fluctuation in the value.
[00:10:19] All of this fluctuation is just on account of the outlier present here. Now assume I consider another best-fit line, and let's say this time I pick this one. In this case you can observe that the circle data points give me negative distances and the crosses give me positive distances, so these all cancel. What remains is the outlier's distance, which is now negative. So you can see how much the value fluctuates whenever I change my best-fit line.

[00:11:04] We have to overcome this fluctuation. The simple answer is to apply a function to the optimizer. The optimizer was y_i times W transpose x_i, summed over all the data points when I have more than one. The function I apply is the sigmoid function. We apply it to each of these values, and it converts each one into a bounded value.
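The effect of squashing each term can be seen with made-up numbers. This sketch skips the geometry and lists hypothetical per-point values m_i = y_i · Wᵀx_i for two candidate lines over the same 11 points. Line 1 classifies ten points correctly by 1 unit but has one outlier 100 units on the wrong side. Line 2 gets six points right and five wrong, each by 1 unit. The sketch compares the raw sum Σ m_i with the squashed sum Σ σ(m_i), using σ(z) = 1 / (1 + e^(-z)). The helper names and all numbers are assumptions for illustration.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), split by sign so exp() never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so e^z lies in (0, 1)
    return e / (1.0 + e)

def raw_objective(margins):
    """Sum of y_i * w^T x_i: a single large term can dominate."""
    return sum(margins)

def sigmoid_objective(margins):
    """Sum of sigmoid(y_i * w^T x_i): every term is squashed into (0, 1)."""
    return sum(sigmoid(m) for m in margins)

# Hypothetical margins m_i = y_i * w^T x_i for the same 11 points.
line_1 = [1.0] * 10 + [-100.0]   # 10 correct, 1 outlier far on the wrong side
line_2 = [1.0] * 6 + [-1.0] * 5  # 6 correct, 5 wrong

for name, margins in [("line 1", line_1), ("line 2", line_2)]:
    correct = sum(m > 0 for m in margins)
    print(f"{name}: {correct}/{len(margins)} correct, "
          f"raw {raw_objective(margins):.1f}, sigmoid {sigmoid_objective(margins):.3f}")
```

With the raw sum, line 2 wins (1.0 against -90.0) even though it misclassifies five points, because the outlier alone drags line 1 down by 100. After the sigmoid, the outlier can cost line 1 at most one unit, so line 1 wins (about 7.31 against 5.73). Library implementations usually maximize Σ log σ(y_i · Wᵀx_i), the log-likelihood, instead of this plain sum. The sketch only illustrates the squashing idea from this session.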
[00:11:54] Let's call this value Z. Every data point gives me a value in this form. I send that value to my sigmoid function, and here is what the sigmoid does. Say this Z is high, and the value returned is one hundred. The sigmoid converts this hundred into the range 0 to 1; for one hundred, the result is extremely close to 1. We have a lot of fluctuation in the cost function value, and we can prevent it using the sigmoid.

[00:12:38] How exactly does it prevent it? The sigmoid function is nothing but one over one plus e to the power minus Z, that is, σ(Z) = 1 / (1 + e^(-Z)), where Z is the value above. We pass Z into it, and it converts the entire value into the range between zero and one. That limits the effect of the outlier. A high value, like this one hundred, becomes something between zero and one, so a single far-away point can no longer dominate the sum. That is the task of the sigmoid function we are going to use in logistic regression. So let's go back to our previous use case. Yes, this one.
[00:13:29] In this case, you can observe that the outlier's value was exactly one hundred. So the very first thing we do is pass this one hundred into the sigmoid function, as you can see here. It gives me a value between zero and one, and my problem is solved. That's what we are trying to achieve with logistic regression: the sigmoid removes the outsized impact of this outlier.

[00:14:01] So how exactly do we achieve this goal in logistic regression? Take whatever output the linear regression gives. Suppose it gives me one hundred; I apply a sigmoid to it. Once I apply the sigmoid to this hundred using this formula, the output follows this kind of S-shaped curve. That's how logistic regression works internally.

[00:14:32] So I hope you loved this introduction to logistic regression. Thank you. Have a nice day. Keep learning, keep going, keep practicing.