In this tutorial we will be talking about false positives and false negatives. As you remember, we were using the logistic regression function to observe where random values of the independent variable end up in terms of y hat, that is, in terms of the predicted value of the dependent variable. We agreed that anything below the 50% line is projected downwards onto the zero horizontal line, and anything above the 50% line is projected upwards onto the 100% horizontal line. That allowed us to turn probabilities into actual predictions: either yes or no.

Now let's take a step back. Where did we get these four values from? We took four random values of the independent variable and looked at what would happen to them: how we would use the logistic regression function to find out what probability they have and what y hat values they have. So how about we take another step back and forget about these four random values. Instead of taking four random values of the independent variable, let's take four known values. In fact, let's take four values of the independent variable from our data set.
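As a minimal sketch of the rule recapped above, assuming a single-variable model with made-up coefficients `b0` and `b1` (hypothetical values, not fitted to the tutorial's data), the logistic function turns a value of the independent variable into a probability, and the 50% line turns that probability into a yes/no prediction:

```python
import math

def predicted_probability(x, b0, b1):
    """Logistic function for one independent variable: 1 / (1 + e^-(b0 + b1*x))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def predict(x, b0, b1, threshold=0.5):
    """Project the probability onto the 0 line or the 1 (100%) line."""
    # Below the 50% line -> 0 ("no"); at or above it -> 1 ("yes").
    return 1 if predicted_probability(x, b0, b1) >= threshold else 0

# Hypothetical coefficients and ages, chosen only for illustration.
B0, B1 = -4.0, 0.1
for age in (20, 35, 45, 60):
    p = predicted_probability(age, B0, B1)
    print(f"age {age}: probability {p:.2f} -> y_hat = {predict(age, B0, B1)}")
```

Treating a probability of exactly 50% as "yes" is just a tie-breaking choice made in this sketch; the tutorial only describes values above and below the line.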
So let's pick out four values that we know exist in our data set, values we used to build this logistic regression, and do the same thing with them: let's see where they end up if we apply the model to them.

As you can see here, the label of the vertical axis has changed to y, because the red marks are the actual values of the dependent variable. We already know the result for these people. Observations number one and number three, at the bottom, did not take up the email offer, and observations number two and number four, at the top, did take up the email offer. So let's see what happens to them if we apply our logistic regression model.

Step number one is to project these values onto the curve. Makes sense, right? We just want to see where they end up on the curve: those are the blue dots over there, showing where the curve models them. From here we can read off the probabilities; you just project to the left. You can see that for observation number one it's roughly 10 to 20%, so let's say 15%.
For observation number two it's about 40%, for observation number three I would say about 70%, and for observation number four about 85%.

But we're not interested in the probabilities per se right now. What we want to get to is the actual y hat, the predicted value: we want to see whether the model tells us that these people are going to take up the offer or not. Why do we want to do this? Because we already know the result, right? We already know what the outcome was, so we can assess the model: see how well it's working and whether it makes any mistakes.

So let's proceed with our logic for getting y hat. What was the logic? The same thing we discussed at the start of this tutorial. We're using this arbitrary horizontal line at 50%. Anything below this line is projected onto the horizontal line at zero.
That's where we're saying the offer is not going to be taken up. Anything above the 50% line is projected onto the horizontal line at 1, or 100%, where we're saying that the people who end up on that line are predicted to take up the offer.

So let's go ahead and do that. There we go: in gray we have our projections, our predicted values, so y hat is in gray. It's very interesting to see both y and y hat on one chart: what actually happened is in red, and what we predicted would happen is in gray.

Right away you can see that for observations number one and number four we predicted correctly. For person number one, we predicted that he won't take up the offer, and he actually did not take up the offer, because the red mark is on the same horizontal line. For observation number four, same thing: we predicted that the person will take up the offer, and they did take up the offer. That's good.

But now let's have a look at observations number two and number three.
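The comparison above can be sketched in a few lines, assuming the rough probabilities read off the chart in this tutorial (15%, 40%, 70%, 85%; eyeballed values, not exact model outputs) and the known outcomes for the four observations:

```python
# Approximate probabilities read off the chart for observations 1-4
probabilities = [0.15, 0.40, 0.70, 0.85]
# Actual outcomes (red marks): 1 = took up the email offer, 0 = did not
y = [0, 1, 0, 1]

# y hat (gray marks): project each probability onto 0 or 1 using the 50% line
y_hat = [1 if p >= 0.5 else 0 for p in probabilities]

for i, (actual, predicted) in enumerate(zip(y, y_hat), start=1):
    verdict = "correct" if actual == predicted else "error"
    print(f"observation {i}: y={actual}, y_hat={predicted} -> {verdict}")
```

Observations one and four come out correct, while two and three are the mismatches discussed next.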
You can see that for observation number two the gray mark is at the bottom, meaning that the model predicts, based on this person's characteristics (in this case just age, because we're doing a single-variable logistic regression), that this person is not going to take up the offer. However, the red mark is at the top, meaning that this person did take up the offer, so the logistic regression made an error here.

Same thing for person number three. The gray mark is at the top, which means the model predicts that the person will take up the offer. But the red mark is at the bottom, meaning the person didn't actually take up the offer, so the logistic regression made a mistake once again.

These mistakes actually have specific names. The mistake at the top, where we predicted "yes" for person number three, is a false positive, or a type I error. What does false positive mean? It means we predicted a positive outcome, but it was false.
In other words, we predicted an effect that did not occur.

The other mistake you see here has a different name: it's called a false negative. We predicted that there won't be an effect, but the effect actually did occur. Our prediction was negative, meaning no effect, but it was false, so it's a false negative. It's also called a type II error.

It's important to distinguish between the two, and the way I personally remember them is that I think of type I as less dangerous than type II. That's not necessarily the case, but that's how I remember them. Type I is kind of like a warning, which is why there's an orange exclamation mark. It's a false positive: you said something was going to happen, but it didn't. Say you predicted an earthquake, but there wasn't an earthquake. You know, that's not the end of the world.
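To make the naming concrete, here is a small sketch that labels each (actual, predicted) pair, assuming "took up the offer" (1) is the positive class and reusing the y and y hat values from the four observations in this tutorial; the helper name `outcome_label` is made up for illustration:

```python
def outcome_label(actual, predicted):
    """Label one (y, y_hat) pair, treating 1 ('took up the offer') as positive."""
    if predicted == 1 and actual == 1:
        return "true positive"
    if predicted == 0 and actual == 0:
        return "true negative"
    if predicted == 1 and actual == 0:
        # Predicted an effect that did not occur
        return "false positive (type I error)"
    # Predicted no effect, but the effect did occur
    return "false negative (type II error)"

y = [0, 1, 0, 1]      # what actually happened (red marks)
y_hat = [0, 0, 1, 1]  # what the model predicted (gray marks)

labels = [outcome_label(a, p) for a, p in zip(y, y_hat)]
for i, label in enumerate(labels, start=1):
    print(f"observation {i}: {label}")
```

Observation two comes out as the false negative and observation three as the false positive, matching the chart.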
But a false negative is a bit worse, once again in my understanding, because if you say something is not going to happen but it actually does happen, you can't even be prepared for it. That's why, in my mind, type II is the false negative, and that's how I remember them personally. But once again, both can be pretty serious errors, especially when you are dealing with things like medical conclusions.

So those are false positives and false negatives. We will be using them more when we talk about the confusion matrix in the next tutorial. I look forward to seeing you next time. Until then, happy analyzing.