In this video, we are starting with the first classification technique, which is logistic regression.

Let us consider again the same credit default data set that we saw in the last lecture. In this, we have three predictor variables: student, balance, and income. And as a response variable, we have whether the person defaulted on the credit or not. The default is coded as zero and one: if the person defaulted, then one, and if not defaulted, then zero.

If we fit a simple linear regression with just one variable, say balance, this is what we will get. So using the values of balance, we are predicting the probability of default based on this line.

Now we can choose any boundary value, above which we will classify a point as one, and below which we will classify it as zero. So suppose I select 0.2 as the cutoff value. This 0.2 will be called the boundary value. If we want to be less conservative in predicting who is going to default, we can use larger values, say 0.5, as the boundary value. Then any point which lies beyond 0.5 will be considered as risky for giving credit.

You can see the problem with this regression model: for small values of balance, we are getting negative values for the probability of default. How do we interpret these negative values?
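The problem described above can be sketched numerically: fitting a straight line to 0/1 default labels by least squares can predict "probabilities" below zero for small balances. The balance values and labels below are made up for illustration and are not the course's data set.

```python
import numpy as np

# Hypothetical balance values (x) and default labels (y: 1 = defaulted, 0 = did not)
balance = np.array([300, 500, 800, 1200, 1600, 2000, 2300, 2500], dtype=float)
default = np.array([0, 0, 0, 0, 0, 1, 1, 1], dtype=float)

# Fit a straight line y = b0 + b1 * x by least squares
b1, b0 = np.polyfit(balance, default, deg=1)

# "Probability" predicted by the line for a very small balance
p_small = b0 + b1 * 100
print(p_small)  # negative -- not a valid probability
```

For this toy data the line dips below zero well before balance reaches zero, which is exactly the interpretation problem the lecture points out.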
There is no sensible interpretation for these values. To address this problem, we must model this probability with a function that gives output between zero and one for all values of X.

In logistic regression, we use the logistic function, which is also called the sigmoid function. This is where logistic regression gets its name from.

You can see that this is the linear part. Initially we had Y equal to beta zero plus beta one X. That linear relationship has been put inside this function to get the probability. To understand why we did this, let us look at this graph.

The logistic function has this S-shaped curve, and it is bounded between Y equal to zero and Y equal to one. You can see that if I put X equal to infinity here, Y tends to one, and if I put X equal to minus infinity here, Y tends to zero. These are sensible limits, and the output of this function is better able to capture the range of probabilities than the linear regression model.

So this is how the final function will look. We can also see that we are finding the probability of Y equal to one.
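A minimal sketch of the sigmoid with the linear part passed through it. The coefficient values here are illustrative placeholders, not fitted values from the lecture's data.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# The linear part b0 + b1*x is passed through the sigmoid,
# so the output is always a valid probability.
b0, b1 = -10.0, 0.005   # illustrative coefficients, not fitted values

for x in (0, 1500, 5000):
    p = sigmoid(b0 + b1 * x)
    print(x, round(p, 4))
```

For very negative inputs the output approaches zero, and for very positive inputs it approaches one, matching the limits described above.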
Using this logistic function, what I mean is: if the X value is, say, 1500, and the corresponding Y value is 0.1, we will be saying that there is a 10 percent chance that the person will default.

Now, keep in mind that since Y is not linearly related to X (it is some other function of X), we cannot interpret beta zero and beta one in the same way as we do in linear regression. That is, if we increase X by one unit, Y will not increase by beta one units. The relationship is a bit more complex than that, and we will discuss later how to interpret the results of a logistic regression model.

Now, if you remember from the last lecture, the third issue with the linear regression classification model is that if we have an outlying point, the line tends to change its slope and then misclassify most of the points. But that problem is also handled by the sigmoid function. Even if we have an outlying point, this curve adjusts and covers those outlying points. So any point here, if we extend this graph, will also lie on this curve, and the slope will not change much. So that covers these issues.
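The point that a one-unit increase in X does not change the probability by a fixed beta-one amount can be seen directly. Again, the coefficients below are illustrative placeholders, not fitted values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative (not fitted) coefficients
b0, b1 = -10.0, 0.005

def p_default(x):
    return sigmoid(b0 + b1 * x)

# A one-unit increase in x does NOT change the probability by a fixed amount:
change_low = p_default(501) - p_default(500)    # near the flat left tail
change_mid = p_default(2001) - p_default(2000)  # near the steep middle
print(change_low, change_mid)
```

The change in the middle of the curve is orders of magnitude larger than in the flat tail, which is why the linear-regression style interpretation of the coefficients does not carry over.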
Using the sigmoid function, we have solved the issue of having probability values outside zero and one, and also the problem of outlying data points. Now that we have a model that can give us the probability values using this formula, we need to find the values of beta zero and beta one such that we have minimum error, or maximum correctness, in prediction.

In the linear regression model, we used a method called ordinary least squares. In logistic regression, the general method used is called the maximum likelihood method. The intuition behind it is simple.

Let us go back to the example to understand how maximum likelihood is applied. When a person is defaulting, the default value is one, and the predicted probability, which is p(X), should be as close to one as possible in such a case. And when the person is not defaulting, the predicted probability should be as close to zero as possible.

So we want to maximize p(X) if the person is defaulting, and we want to maximize one minus p(X) if the person is not defaulting. So this likelihood function is to be maximized, given these two conditions. The maximum likelihood method is just maximizing this condition. Beyond this formula, I will not show you any more mathematics, because the mathematics does get a bit complex.
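The intuition above can be sketched as a (log-)likelihood calculation: the likelihood multiplies p(x) for defaulters and 1 - p(x) for non-defaulters, so coefficients that fit the data well score higher. The toy data and the two coefficient pairs compared are illustrative, not from the course.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Sum of log p(x) for defaulters and log(1 - p(x)) for non-defaulters."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Toy data: defaults occur at higher balances (illustrative, not the course data)
xs = [300, 800, 1200, 2000, 2300, 2500]
ys = [0,   0,   0,    1,    1,    1]

# A coefficient pair that separates the classes well scores higher
good = log_likelihood(-8.0, 0.005, xs, ys)
poor = log_likelihood(0.0, 0.0, xs, ys)   # predicts p = 0.5 everywhere
print(good, poor)
```

Maximum likelihood estimation searches over beta zero and beta one for the pair that makes this score as large as possible; in practice the software does this search for us.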
And frankly, as business managers, you will never need more mathematics than this. Our software packages can easily fit this model, and so we do not need to concern ourselves with the details of this fitting procedure.

However, remember this part: just as in linear regression we used ordinary least squares, in logistic regression we use the maximum likelihood method to estimate the coefficients. The maximum likelihood method is a very popular approach used in many classification techniques.

In the next video, we will learn how to train a logistic regression model on a dataset using only one predictor variable.