1
00:00:00,420 --> 00:00:04,260
In the last video, we learned how to create our logistic regression model.

2
00:00:04,560 --> 00:00:10,440
In this lecture, we will see how to predict the values of y and how to create a confusion matrix

3
00:00:10,830 --> 00:00:17,220
from those predicted values. To get the predicted values from our model,

4
00:00:17,970 --> 00:00:24,870
we just have to write .predict. To get the probability output from our model,

5
00:00:26,040 --> 00:00:26,820
we have to write

6
00:00:27,570 --> 00:00:29,850
predict_proba.

7
00:00:30,990 --> 00:00:35,130
So my model variable is clf_lrs.

8
00:00:36,750 --> 00:00:43,770
Therefore, I would write clf_lrs.predict_proba, and then in brackets,

9
00:00:43,800 --> 00:00:49,320
I have to mention the independent variables on which I want to predict the probabilities.

10
00:00:51,790 --> 00:00:52,660
If I run this,

11
00:00:55,190 --> 00:00:57,480
the output is in the form of an array.

12
00:00:58,520 --> 00:01:01,520
The first column here is the probability of zero.

13
00:01:02,390 --> 00:01:05,750
that is, the probability of not sold, and the second column

14
00:01:05,750 --> 00:01:09,370
here is the probability of one, that is, the probability of sold.

15
00:01:12,180 --> 00:01:15,390
If you add these two columns for any row, the result is one.
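
A minimal scikit-learn sketch of this step. The lecture's house dataset is not shown here, so this assumes a synthetic dataset from `make_classification` and a freshly fitted `LogisticRegression` stored as `clf_lrs`; the data and its size are stand-ins, not the course's values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the lecture's X (independent variables) and y (0 = not sold, 1 = sold)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf_lrs = LogisticRegression()
clf_lrs.fit(X, y)

proba = clf_lrs.predict_proba(X)   # one row per record, one column per class
print(clf_lrs.classes_)            # column order of proba: class 0, then class 1
print(proba[:3])                   # column 0 = P(not sold), column 1 = P(sold)

# The two probabilities in every row add up to 1
print(np.allclose(proba.sum(axis=1), 1.0))
```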

16
00:01:18,330 --> 00:01:25,140
So for the first row, the probability of the house being sold is 0.87; for the second

17
00:01:25,140 --> 00:01:25,590
record,

18
00:01:25,980 --> 00:01:29,400
the probability is 0.63, and so on.

19
00:01:30,510 --> 00:01:38,460
Now, as we discussed earlier, we want to set a boundary condition for classifying a house as not sold or

20
00:01:38,460 --> 00:01:42,930
sold. By default, our boundary condition is 0.5.

21
00:01:44,030 --> 00:01:50,840
This means that if the probability is less than 0.5, we are saying that the house cannot be sold

22
00:01:50,840 --> 00:01:54,880
within three months; if the probability is greater than 0.5,

23
00:01:55,160 --> 00:01:58,420
we are saying that we will be able to sell the house in three months.

24
00:02:00,620 --> 00:02:09,620
Now, as we discussed earlier, the costs associated with false positives and false negatives are different,

25
00:02:10,130 --> 00:02:14,270
and therefore we may want to choose a different boundary condition.

26
00:02:14,840 --> 00:02:18,740
We will use these probability values to choose that boundary condition.

27
00:02:19,070 --> 00:02:26,780
But before that, we will see how to predict 0/1 classes using 0.5 as our boundary

28
00:02:26,780 --> 00:02:27,350
condition.

29
00:02:28,550 --> 00:02:37,220
So by default, the predict function, that is, clf_lrs.predict, takes the boundary

30
00:02:37,220 --> 00:02:39,500
condition as zero point five.
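
A sketch, again on synthetic stand-in data, of what this default amounts to. For binary logistic regression, scikit-learn's `predict` picks class 1 when the decision function is positive, which is equivalent to P(sold) > 0.5, so thresholding the second `predict_proba` column by hand should reproduce it. The variable names `p_sold` and `y_manual` are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (not the lecture's house dataset)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf_lrs = LogisticRegression().fit(X, y)

y_pred = clf_lrs.predict(X)               # hard 0/1 classes, default boundary
p_sold = clf_lrs.predict_proba(X)[:, 1]   # P(sold) for every row
y_manual = (p_sold > 0.5).astype(int)     # apply the 0.5 cut-off by hand

print(y_pred[:10])
print(np.array_equal(y_pred, y_manual))
```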

31
00:02:40,130 --> 00:02:41,480
So if I run this code.

32
00:02:45,120 --> 00:02:52,980
You can see we have 0, 1, 0, 1; the output is not in the form of probabilities,

33
00:02:53,190 --> 00:02:58,020
but in the form of classes, that is, sold or not sold.

34
00:03:00,070 --> 00:03:06,700
If you compare with our probability values, for the first record our probability was

35
00:03:06,790 --> 00:03:10,690
0.87, and since 0.87 is greater than 0.5,

36
00:03:10,840 --> 00:03:15,190
we are getting 1. For the second record,

37
00:03:15,220 --> 00:03:17,500
the probability is 0.63,

38
00:03:18,340 --> 00:03:21,250
and since this is more than 0.5, we are getting 1

39
00:03:21,520 --> 00:03:22,810
in the second record as well.

40
00:03:24,550 --> 00:03:30,690
If you look at the third record, the probability here is zero point zero two.

41
00:03:31,480 --> 00:03:33,790
Since this is less than 0.5,

42
00:03:35,420 --> 00:03:37,570
we are getting 0 in the third record.

43
00:03:40,800 --> 00:03:44,330
Now, to set a custom boundary condition,

44
00:03:44,890 --> 00:03:49,700
we will use the predict_proba probability values.

45
00:03:50,630 --> 00:03:56,150
So if you see here, I am using predict_proba.

46
00:03:58,310 --> 00:04:01,490
Then I am selecting the second column from this.

47
00:04:01,760 --> 00:04:03,320
That's why I have written one here.

48
00:04:03,740 --> 00:04:10,540
So all the rows and only the second column. Then I'm comparing these second-column values with

49
00:04:10,550 --> 00:04:12,020
0.3.

50
00:04:12,050 --> 00:04:13,790
Here, 0.3 is my boundary condition.

51
00:04:15,890 --> 00:04:21,720
So if the value is greater than 0.3, the output will be True.

52
00:04:22,190 --> 00:04:26,420
And if the probability value here is less than 0.3,

53
00:04:27,560 --> 00:04:29,050
the output will be False.

54
00:04:30,470 --> 00:04:32,240
And I am saving these values

55
00:04:32,510 --> 00:04:37,550
in y_pred_03. If I run this code

56
00:04:40,110 --> 00:04:44,040
and look at a sample of my y_pred_03 values,

57
00:04:54,230 --> 00:04:59,600
you can see I'm getting these True and False values, depending on the condition I have

58
00:04:59,600 --> 00:05:00,170
mentioned.

59
00:05:01,440 --> 00:05:07,840
True means 1 and False means 0; 1 stands for sold and 0 for not sold.
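
A minimal sketch of the custom 0.3 boundary, assuming the same kind of synthetic stand-in data and a fitted `clf_lrs`. Comparing the second `predict_proba` column with 0.3 gives a boolean array; `.astype(int)` turns True/False into 1/0 if integer labels are preferred (the name `y_pred_03_int` is illustrative).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (not the lecture's house dataset)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf_lrs = LogisticRegression().fit(X, y)

# Boolean array: True where P(sold) exceeds the custom 0.3 boundary
y_pred_03 = clf_lrs.predict_proba(X)[:, 1] > 0.3
print(y_pred_03[:10])

# True -> 1 (sold), False -> 0 (not sold)
y_pred_03_int = y_pred_03.astype(int)
print(y_pred_03_int[:10])
```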

60
00:05:11,340 --> 00:05:16,040
Now we have the actual values of y and the predicted values of y.

61
00:05:17,540 --> 00:05:25,580
Now we want to check the accuracy of our model, which means how many times we are actually predicting

62
00:05:25,580 --> 00:05:31,070
the correct outcome and how many times we are predicting the wrong outcome.
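
Accuracy is the share of rows where the prediction matches the actual value. A small sketch with hypothetical, hand-written label arrays (not the lecture's data), computing it by hand and with scikit-learn's `accuracy_score`:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical actual and predicted labels (1 = sold, 0 = not sold)
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])

correct = (y_true == y_pred)           # True where the prediction is right
manual_accuracy = correct.mean()       # 6 correct out of 8
print(manual_accuracy)                 # 0.75
print(accuracy_score(y_true, y_pred))  # 0.75
```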

63
00:05:32,780 --> 00:05:34,250
Correct outcomes are

64
00:05:35,920 --> 00:05:39,660
true positives, which means the actual value is true,

65
00:05:39,790 --> 00:05:42,800
that is, 1, and the predicted value is also true,

66
00:05:42,940 --> 00:05:46,660
that is, 1; or true negatives,

67
00:05:46,990 --> 00:05:50,860
that is, the actual value is 0 and the predicted value is also 0.

68
00:05:52,100 --> 00:05:56,470
The wrong outcomes are false positives and false negatives.

69
00:05:56,650 --> 00:05:59,200
We have already covered this in an earlier lecture.
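
The four categories can be counted directly with boolean masks. A sketch using the same hypothetical label arrays as above (illustrative data, not the lecture's):

```python
import numpy as np

# Hypothetical actual and predicted labels (1 = sold, 0 = not sold)
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # actual sold, predicted sold
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # actual not sold, predicted not sold
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # actual not sold, predicted sold
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # actual sold, predicted not sold

print(tp, tn, fp, fn)  # 3 3 1 1
```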

70
00:06:02,220 --> 00:06:09,780
We also discussed the confusion matrix, which sorts predictions into these four categories, and we will draw that confusion

71
00:06:09,780 --> 00:06:13,050
matrix from the output of our model.

72
00:06:16,110 --> 00:06:20,490
To create a confusion matrix, we first have to import confusion_matrix.

73
00:06:21,840 --> 00:06:25,020
We will import it from sklearn.metrics.

74
00:06:26,780 --> 00:06:30,560
And then we'll use the confusion_matrix function.

75
00:06:31,280 --> 00:06:32,690
There are two arguments here.

76
00:06:32,990 --> 00:06:35,060
First one is the actual values of Y.

77
00:06:35,570 --> 00:06:38,510
And the second one is the predicted values of y.

78
00:06:40,010 --> 00:06:43,070
Remember, we created y_pred with 0.5,

79
00:06:43,130 --> 00:06:45,340
that is, with 0.5 as our boundary condition.
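
A sketch of the call described above, assuming synthetic stand-in data and a fitted `clf_lrs`: the actual values go first and the predicted values second.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Synthetic stand-in data (not the lecture's house dataset)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf_lrs = LogisticRegression().fit(X, y)
y_pred = clf_lrs.predict(X)    # default 0.5 boundary

# First argument: actual values; second argument: predicted values
cm = confusion_matrix(y, y_pred)
print(cm)
```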

80
00:06:46,190 --> 00:06:47,540
So let's run this.

81
00:06:52,060 --> 00:06:57,160
Here, these rows are the actual classes.

82
00:06:57,310 --> 00:06:59,600
So the first row is for 0,

83
00:06:59,980 --> 00:07:01,780
and the second row is for 1.

84
00:07:02,200 --> 00:07:03,630
These are the actual outcomes.

85
00:07:03,640 --> 00:07:05,200
The rows are for actual outcomes.

86
00:07:05,830 --> 00:07:09,190
And these columns are for predicted classes.

87
00:07:09,640 --> 00:07:13,390
So the first column is for zero and the second column is for one.
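
To see the row/column layout on numbers small enough to check by eye, here is a sketch with a hypothetical five-record example: one true negative, one false positive, one false negative and two true positives.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: rows of the result = actual class, columns = predicted class
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[1 1]    row 0 = actual 0: [predicted 0, predicted 1]
#  [1 2]]   row 1 = actual 1: [predicted 0, predicted 1]
```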

88
00:07:15,250 --> 00:07:19,120
This 195 stands for 0 and 0.

89
00:07:19,210 --> 00:07:22,180
That means the actual value was 0,

90
00:07:22,270 --> 00:07:23,160
that is, not sold,

91
00:07:23,330 --> 00:07:25,390
and the predicted value was also 0,

92
00:07:25,480 --> 00:07:26,380
that is, not sold.

93
00:07:27,160 --> 00:07:29,260
These are also known as true negatives.

94
00:07:32,220 --> 00:07:41,970
This 81 is in the 0 row: the actual value of these 81 records was 0, but the predicted value is 1.

95
00:07:42,030 --> 00:07:46,850
Since these are in the second column, they are known as false positives.

96
00:07:49,470 --> 00:07:52,200
These 77 are in the second row.

97
00:07:52,350 --> 00:07:57,050
That is, they actually belong to the second class, that is, sold.

98
00:07:59,040 --> 00:08:01,650
But we predicted them as not sold.

99
00:08:01,800 --> 00:08:03,690
So these are false negatives.

100
00:08:05,400 --> 00:08:06,670
This 153 is

101
00:08:08,170 --> 00:08:14,330
for 1 and 1; that is, the actual value is sold and the predicted value is also sold. These are true positives.
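
Using the four counts quoted in this lecture for the 0.5 boundary, a short sketch of how scikit-learn's `[[TN, FP], [FN, TP]]` layout unpacks with `ravel()` and how overall accuracy follows from it (the arithmetic is mine; the counts are from the lecture):

```python
import numpy as np

# Confusion matrix values quoted in the lecture for the 0.5 boundary
cm_05 = np.array([[195, 81],
                  [77, 153]])

# Layout is [[TN, FP], [FN, TP]], so ravel() yields them in that order
tn, fp, fn, tp = cm_05.ravel()

accuracy = (tn + tp) / cm_05.sum()   # correct predictions / all records
print(tn, fp, fn, tp)                # 195 81 77 153
print(round(accuracy, 3))            # 0.688
```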

102
00:08:15,860 --> 00:08:23,690
Now let's create a confusion matrix for our second set of predicted values, where we used 0.3 as

103
00:08:23,690 --> 00:08:24,750
our boundary condition.

104
00:08:34,630 --> 00:08:35,520
Let's run this.

105
00:08:39,910 --> 00:08:44,510
Here is the confusion matrix with 0.3 as our boundary condition.

106
00:08:47,540 --> 00:08:54,530
Now, since we are using 0.3 as the boundary condition, as compared to 0.5 here,

107
00:08:54,740 --> 00:09:02,510
we are categorizing more values into the 1 category, since we are lowering our probability

108
00:09:02,510 --> 00:09:03,080
threshold.

109
00:09:03,980 --> 00:09:10,340
So, for example, if for some record the probability is 0.4, then using the 0.5

110
00:09:10,340 --> 00:09:14,030
threshold, we are categorizing it as not sold.

111
00:09:14,570 --> 00:09:17,960
But in this case we are categorizing it as sold.

112
00:09:18,740 --> 00:09:23,330
That's why the numbers here are inflated in the second column.

113
00:09:23,810 --> 00:09:26,310
The counts of predicted ones are inflated.

114
00:09:27,740 --> 00:09:38,000
So you can see our false positives have increased from 81 to 154, and the false negatives

115
00:09:38,390 --> 00:09:40,880
have decreased from 77 to 17.
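
A sketch of the same comparison on synthetic stand-in data (so the counts will differ from the lecture's). Lowering the threshold can only move records from predicted 0 to predicted 1, so false positives cannot decrease and false negatives cannot increase.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Synthetic stand-in data (not the lecture's house dataset)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf_lrs = LogisticRegression().fit(X, y)
p_sold = clf_lrs.predict_proba(X)[:, 1]

results = {}
for threshold in (0.5, 0.3):
    y_hat = (p_sold > threshold).astype(int)
    # labels=[0, 1] keeps the matrix 2x2 even if one class is never predicted
    tn, fp, fn, tp = confusion_matrix(y, y_hat, labels=[0, 1]).ravel()
    results[threshold] = (int(fp), int(fn))
    print(f"threshold={threshold}: FP={fp}, FN={fn}")
```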

116
00:09:42,380 --> 00:09:49,880
So if you have different costs associated with your false positives and false negatives, you can change

117
00:09:49,910 --> 00:09:55,910
this threshold level to change the distribution of values between these two categories.
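
One way to act on unequal costs is to try several thresholds and keep the one with the lowest total cost, cost_fp × FP + cost_fn × FN. This is a minimal sketch of that idea, not the course's method: the helper `pick_threshold`, the cost values and the small probability array are all hypothetical.

```python
import numpy as np

def pick_threshold(y_true, p_pos, cost_fp, cost_fn, thresholds):
    """Hypothetical helper: return (threshold, cost) with the lowest misclassification cost."""
    y_true = np.asarray(y_true)
    p_pos = np.asarray(p_pos)
    best_t, best_cost = None, float("inf")
    for t in thresholds:
        y_hat = (p_pos > t).astype(int)
        fp = np.sum((y_true == 0) & (y_hat == 1))   # predicted sold, actually not sold
        fn = np.sum((y_true == 1) & (y_hat == 0))   # predicted not sold, actually sold
        cost = int(cost_fp * fp + cost_fn * fn)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

# Hypothetical actual labels and predicted P(sold)
y_true = [0, 0, 1, 1, 1, 0]
p_pos = [0.2, 0.45, 0.35, 0.6, 0.8, 0.1]

print(pick_threshold(y_true, p_pos, cost_fp=1, cost_fn=5, thresholds=[0.3, 0.4, 0.5]))  # (0.3, 1)
print(pick_threshold(y_true, p_pos, cost_fp=5, cost_fn=1, thresholds=[0.3, 0.4, 0.5]))  # (0.5, 1)
```

When missing a sale (a false negative) is the expensive mistake, the lower threshold wins; when a false positive is expensive, the higher one does.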