In this video, we are starting with the first classification technique, which is logistic regression.

Let us consider again the same credit default data set that we saw in the last lecture. In this, we have three predictor variables: student, balance, and income. And as a response variable, we have whether the person defaulted on the credit or not. The default is coded as zero and one: if the person defaulted, then one, and if not defaulted, then zero.

If we fit a simple linear regression with just one variable, say balance, this is what we will get. So using the values of balance, we are predicting the probability of default based on this line.

Now we can choose any boundary value, above which we will classify a point as one, and below which we will classify it as zero. So suppose I select 0.2 as the cutoff value. This 0.2 will be called the boundary value. If we want to be less conservative in predicting who is going to default, we can use larger values, say 0.5, as the boundary value. Then any point which lies beyond 0.5 will be considered as risky for giving credit.

You can see the problem with this regression model: for small values of balance, we are getting negative values for the probability of default. How do we interpret these negative values?
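The problem described above can be sketched numerically: fitting a straight line to 0/1 default labels by least squares can predict "probabilities" below zero for small balances. The balance values and labels below are made up for illustration and are not the course's data set.

```python
import numpy as np

# Hypothetical balance values (x) and default labels (y: 1 = defaulted, 0 = did not)
balance = np.array([300, 500, 800, 1200, 1600, 2000, 2300, 2500], dtype=float)
default = np.array([0, 0, 0, 0, 0, 1, 1, 1], dtype=float)

# Fit a straight line y = b0 + b1 * x by least squares
b1, b0 = np.polyfit(balance, default, deg=1)

# "Probability" predicted by the line for a very small balance
p_small = b0 + b1 * 100
print(p_small)  # negative -- not a valid probability
```

For this toy data the line dips below zero well before balance reaches zero, which is exactly the interpretation problem the lecture points out.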
There is no sensible interpretation for these values. To address this problem, we must model this probability with a function that gives output between zero and one for all values of X.

In logistic regression, we use the logistic function, which is also called the sigmoid function. This is where logistic regression gets its name from.

You can see that this is the linear part. Initially we had Y equal to beta zero plus beta one X. That linear relationship has been put inside this function to get the probability. To understand why we did this, let us look at this graph.

The logistic function has this S-shaped curve, and it is bounded between Y equal to zero and Y equal to one. You can see that if I put X equal to infinity here, Y tends to one, and if I put X equal to minus infinity here, Y tends to zero. These are sensible limits, and the output of this function is better able to capture the range of probabilities than the linear regression model.

So this is how the final function will look. We can also see that we are finding the probability of Y equal to one.
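A minimal sketch of the sigmoid with the linear part passed through it. The coefficient values here are illustrative placeholders, not fitted values from the lecture's data.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# The linear part b0 + b1*x is passed through the sigmoid,
# so the output is always a valid probability.
b0, b1 = -10.0, 0.005   # illustrative coefficients, not fitted values

for x in (0, 1500, 5000):
    p = sigmoid(b0 + b1 * x)
    print(x, round(p, 4))
```

For very negative inputs the output approaches zero, and for very positive inputs it approaches one, matching the limits described above.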
Using this logistic function, what I mean is: if the X value is, say, 1500, and the corresponding Y value is 0.1, we will be saying that there is a 10 percent chance that the person will default.

Now, keep in mind that since Y is not linearly related to X (it is some other function of X), we cannot interpret beta zero and beta one in the same way as we do in linear regression. That is, if we increase X by one unit, Y will not increase by beta one units. The relationship is a bit more complex than that, and we will discuss later how to interpret the results of a logistic regression model.

Now, if you remember from the last lecture, the third issue with the linear regression classification model is that if we have an outlying point, the line tends to change its slope and then misclassify most of the points. But that problem is also handled by the sigmoid function. Even if we have an outlying point, this curve adjusts and covers those outlying points. So any point here, if we extend this graph, will also lie on this curve, and the slope will not change much. So that covers these issues.
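The point that a one-unit increase in X does not change the probability by a fixed beta-one amount can be seen directly. Again, the coefficients below are illustrative placeholders, not fitted values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative (not fitted) coefficients
b0, b1 = -10.0, 0.005

def p_default(x):
    return sigmoid(b0 + b1 * x)

# A one-unit increase in x does NOT change the probability by a fixed amount:
change_low = p_default(501) - p_default(500)    # near the flat left tail
change_mid = p_default(2001) - p_default(2000)  # near the steep middle
print(change_low, change_mid)
```

The change in the middle of the curve is orders of magnitude larger than in the flat tail, which is why the linear-regression style interpretation of the coefficients does not carry over.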
Using the sigmoid function, we have solved the issue of having probability values outside zero and one, and also the problem of outlying data points. Now that we have a model that can give us the probability values using this formula, we need to find the values of beta zero and beta one such that we have minimum error, or maximum correctness, in prediction.

In the linear regression model, we used a method called ordinary least squares. In logistic regression, the general method used is called the maximum likelihood method. The intuition behind it is simple.

Let us go back to the example to understand how maximum likelihood is applied. When a person is defaulting, the default value is one, and the predicted probability, which is p(X), should be as close to one as possible in such a case. And when the person is not defaulting, the predicted probability should be as close to zero as possible.

So we want to maximize p(X) if the person is defaulting, and we want to maximize one minus p(X) if the person is not defaulting. So this likelihood function is to be maximized, given these two conditions. The maximum likelihood method is just maximizing this condition. Beyond this formula, I will not show you any more mathematics, because the mathematics does get a bit complex.
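The intuition above can be sketched as a (log-)likelihood calculation: the likelihood multiplies p(x) for defaulters and 1 - p(x) for non-defaulters, so coefficients that fit the data well score higher. The toy data and the two coefficient pairs compared are illustrative, not from the course.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Sum of log p(x) for defaulters and log(1 - p(x)) for non-defaulters."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Toy data: defaults occur at higher balances (illustrative, not the course data)
xs = [300, 800, 1200, 2000, 2300, 2500]
ys = [0,   0,   0,    1,    1,    1]

# A coefficient pair that separates the classes well scores higher
good = log_likelihood(-8.0, 0.005, xs, ys)
poor = log_likelihood(0.0, 0.0, xs, ys)   # predicts p = 0.5 everywhere
print(good, poor)
```

Maximum likelihood estimation searches over beta zero and beta one for the pair that makes this score as large as possible; in practice the software does this search for us.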
And frankly, as business managers, you will never need more mathematics than this. Our software packages can easily fit this model, and so we do not need to concern ourselves with the details of this fitting procedure.

However, remember this part: just as in linear regression we used ordinary least squares, in logistic regression we use the maximum likelihood method to estimate the coefficients. The maximum likelihood method is a very popular approach used in many classification techniques.

In the next video, we will learn how to train a logistic regression model on a dataset using only one predictor variable.