1 00:00:00,633 --> 00:00:01,100 All right. 2 00:00:01,100 --> 00:00:01,900 Let's do this. 3 00:00:01,900 --> 00:00:04,266 Logistic regression intuition. 4 00:00:04,266 --> 00:00:07,266 And you can probably already tell by my voice that I'm pretty excited. 5 00:00:07,333 --> 00:00:09,366 There's some very interesting slides coming up. 6 00:00:09,366 --> 00:00:11,700 And this is quite an important topic. 7 00:00:11,700 --> 00:00:14,300 But at the same time it is quite challenging. 8 00:00:14,300 --> 00:00:16,733 So a quick heads up, there will be some math. 9 00:00:16,733 --> 00:00:20,866 And I've done a few run-throughs of this presentation already. 10 00:00:21,133 --> 00:00:26,566 And I really will try my best to convey everything in the simplest way possible. 11 00:00:26,600 --> 00:00:28,666 So let's get into it. 12 00:00:28,666 --> 00:00:30,800 We already know about linear regression. 13 00:00:30,800 --> 00:00:32,833 We know that there is a simple linear regression, 14 00:00:32,833 --> 00:00:36,100 and it has a very short formula with one independent variable. 15 00:00:36,400 --> 00:00:39,833 And we have also looked into multiple linear regression, 16 00:00:40,033 --> 00:00:42,500 which has many independent variables. 17 00:00:42,500 --> 00:00:48,166 So we already know how to deal with this type of challenge. 18 00:00:48,166 --> 00:00:51,900 So when we have a scatterplot like that, where on the horizontal axis 19 00:00:51,900 --> 00:00:53,433 we've got the independent variable 20 00:00:53,433 --> 00:00:55,866 and on the vertical axis we've got the dependent variable. 21 00:00:55,866 --> 00:00:59,966 And this is an example we looked at, salary versus experience. 22 00:01:01,133 --> 00:01:03,200 How do we create a model here? 23 00:01:03,200 --> 00:01:06,066 So we use a simple linear regression. 24 00:01:06,066 --> 00:01:07,800 It puts a line through our data. 25 00:01:07,800 --> 00:01:11,866 And that line is modeling our observations.
26 00:01:11,866 --> 00:01:15,466 So we can basically forecast things and 27 00:01:15,466 --> 00:01:18,466 compare our actual observations to our model and so on. 28 00:01:18,800 --> 00:01:23,100 So we know how to deal with challenges like that or problems like that. 29 00:01:23,433 --> 00:01:26,600 But your company has hired you as a data scientist. 30 00:01:26,833 --> 00:01:30,566 What they do is they send out email offers to customers 31 00:01:30,600 --> 00:01:33,600 with, like, a proposal to buy certain products. 32 00:01:34,366 --> 00:01:39,000 It might be a clothing store, it might be a grocery store or something like that. 33 00:01:39,000 --> 00:01:42,000 So what they do is basically they send out 34 00:01:42,000 --> 00:01:45,200 an offer in an email 35 00:01:45,466 --> 00:01:48,900 to a lot of customers to purchase certain products. 36 00:01:49,233 --> 00:01:52,300 And here you've got a sample of those customers that they contacted recently. 37 00:01:52,300 --> 00:01:53,500 You've got their age. 38 00:01:53,500 --> 00:01:58,200 And also you have a variable for whether or not they took action. 39 00:01:58,533 --> 00:02:01,733 So did the person take up an action, perform an action? 40 00:02:01,733 --> 00:02:04,133 Did they take up an offer? Did they buy a product? 41 00:02:04,133 --> 00:02:06,966 Did they open up an email, respond to our email and so on? 42 00:02:06,966 --> 00:02:08,433 So was the action taken or not? 43 00:02:08,433 --> 00:02:11,433 And it's very black and white, very different. 44 00:02:11,833 --> 00:02:14,733 But at the same time, even though we don't know what to do, 45 00:02:14,733 --> 00:02:17,233 we don't know what's going on here, it's not what we're expecting, 46 00:02:17,233 --> 00:02:18,233 at the same time, 47 00:02:18,233 --> 00:02:22,133 intuitively, we can see that there is some sort of correlation.
48 00:02:22,266 --> 00:02:25,866 We can see that the observations on the bottom 49 00:02:25,866 --> 00:02:27,066 are a bit more to the left. 50 00:02:27,066 --> 00:02:31,733 Observations on the top are a bit more to the right, implying kind of that probably 51 00:02:32,400 --> 00:02:35,400 older people are more likely to take action 52 00:02:35,733 --> 00:02:39,566 based on this offer, and the younger people are more likely to ignore it. 53 00:02:40,400 --> 00:02:43,066 So can we somehow model this? 54 00:02:43,066 --> 00:02:46,233 How about we try the existing method 55 00:02:46,233 --> 00:02:49,366 in our toolkit, which is the linear regression? 56 00:02:49,733 --> 00:02:51,133 Let's run a linear regression. 57 00:02:51,133 --> 00:02:52,400 And that's what it looks like. 58 00:02:52,400 --> 00:02:53,700 As you can tell, 59 00:02:53,700 --> 00:02:56,933 it doesn't look like the best approach, doesn't look like the best method 60 00:02:57,200 --> 00:03:00,200 to solve this problem. 61 00:03:00,266 --> 00:03:03,033 So let's look into this in a bit more detail. 62 00:03:06,066 --> 00:03:07,166 We're going to draw 63 00:03:07,166 --> 00:03:10,166 the other horizontal line over here. 64 00:03:11,100 --> 00:03:15,133 Instead of trying to predict exactly what's going to happen 65 00:03:15,133 --> 00:03:19,066 for any given person, let's imagine a person and let's say we want to predict 66 00:03:19,066 --> 00:03:23,400 for that person, knowing their age, we want to predict whether they will 67 00:03:23,833 --> 00:03:25,333 take up the offer or not. 68 00:03:25,333 --> 00:03:29,000 But instead of predicting exactly whether they're going to take it up or not, 69 00:03:29,266 --> 00:03:32,266 how about instead we 70 00:03:32,500 --> 00:03:36,733 predict the probability? We will state a probability 71 00:03:36,733 --> 00:03:39,833 or a likelihood of that person taking up that offer.
72 00:03:40,200 --> 00:03:44,200 And if you think of it in that way, right away things start becoming clearer. 73 00:03:44,700 --> 00:03:45,166 Right away 74 00:03:45,166 --> 00:03:49,400 you can see that, okay, so this chart is actually from 0 to 1. 75 00:03:49,766 --> 00:03:52,766 And I also know that probabilities are from 0 to 1. 76 00:03:53,033 --> 00:03:53,933 Oh, that's interesting. 77 00:03:53,933 --> 00:03:57,233 So basically I could fit in probabilities between 0 and 1. 78 00:03:57,266 --> 00:03:58,833 The fact that the red dots, 79 00:03:58,833 --> 00:04:02,766 the red observations, are already either 0 or 1 and nowhere in between, 80 00:04:02,933 --> 00:04:05,100 well, that's simply because we already know the results. 81 00:04:05,100 --> 00:04:07,133 We already know that they're either there or there. 82 00:04:07,133 --> 00:04:11,866 But for something that we're predicting, it kind of makes sense to say, well, 83 00:04:13,300 --> 00:04:14,500 I don't know for sure. 84 00:04:14,500 --> 00:04:16,333 I don't know 100% whether he'll take it up or not. 85 00:04:16,333 --> 00:04:19,700 But I know maybe, maybe with an 80% chance he'll take it up. 86 00:04:19,933 --> 00:04:20,466 Right. 87 00:04:20,466 --> 00:04:23,766 And when you think of it that way, the linear regression line, 88 00:04:23,766 --> 00:04:28,200 or at least that part that's in the middle between 0 and 1, it makes sense, right? 89 00:04:28,233 --> 00:04:32,366 Well, it makes some sense because that is basically 90 00:04:32,866 --> 00:04:35,933 it's telling you that anybody between those ages of, 91 00:04:36,200 --> 00:04:39,200 for instance, where it's crossing the horizontal line for the first time, 92 00:04:39,600 --> 00:04:43,466 where it's crossing the horizontal axis, it might be like 25, 93 00:04:44,266 --> 00:04:47,266 or let's say 35, and where it's crossing 94 00:04:47,300 --> 00:04:51,500 the horizontal line at one, it might be, let's say 55.
95 00:04:51,533 --> 00:04:54,166 So those people between 35 and 55, 96 00:04:55,566 --> 00:04:56,700 they, 97 00:04:56,700 --> 00:04:59,766 anything in between, any person whose age falls in between, 98 00:04:59,933 --> 00:05:03,733 there's a probability of them taking up this offer. 99 00:05:03,733 --> 00:05:07,900 And that probability is increasing as we move to the right, 100 00:05:07,900 --> 00:05:10,900 as we take older and older people, that probability is increasing. 101 00:05:10,966 --> 00:05:14,633 So the part of the linear regression in the middle kind of makes sense. 102 00:05:14,633 --> 00:05:16,700 And we can do something with it. 103 00:05:16,700 --> 00:05:20,100 But the parts that don't make sense at all are the ones at the top 104 00:05:20,100 --> 00:05:20,700 and at the bottom. 105 00:05:20,700 --> 00:05:25,366 Because a probability can never be less than zero, and it can never be above one. 106 00:05:25,366 --> 00:05:27,200 So what is the linear regression 107 00:05:27,200 --> 00:05:29,266 trying to give us a hint about here? 108 00:05:29,266 --> 00:05:33,833 Well, what it's probably saying, what we could interpret it as, is that 109 00:05:34,233 --> 00:05:37,033 people above that age, that 110 00:05:37,033 --> 00:05:40,033 nominal age, we said 55, above that age, they 111 00:05:40,533 --> 00:05:45,300 are very, very likely to take up the offer, actually more than 100%. 112 00:05:45,300 --> 00:05:47,700 So basically, they're definitely taking it up. 113 00:05:47,700 --> 00:05:51,166 Anybody below 35 on the other side, on the left, 114 00:05:51,566 --> 00:05:53,500 they're definitely not taking it. 115 00:05:53,500 --> 00:05:56,400 So essentially what we're saying is 116 00:05:56,400 --> 00:05:59,433 if we ever take that approach, then we would have to replace this linear 117 00:05:59,433 --> 00:06:01,366 regression line with a line that looks like that.
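The "cut off the ends" idea just described can be sketched in a few lines of Python. The ages 35 and 55 are the nominal crossing points mentioned in the talk; the function name and the assumption that the line runs straight between those two points are purely illustrative.

```python
def clipped_linear_probability(age, low_age=35.0, high_age=55.0):
    """Straight line from 0 at low_age to 1 at high_age, flattened outside [0, 1].

    low_age and high_age are the nominal ages from the talk; the function
    itself is a hypothetical illustration, not a fitted model.
    """
    raw = (age - low_age) / (high_age - low_age)  # plain linear regression line
    return min(1.0, max(0.0, raw))                # cut off the parts below 0 and above 1


for age in (25, 35, 45, 55, 65):
    print(age, clipped_linear_probability(age))
```

Anyone younger than 35 gets exactly 0 and anyone older than 55 gets exactly 1, which is the very basic model with horizontal end pieces.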
118 00:06:01,366 --> 00:06:04,600 So let's just cut those bits off and replace them with horizontal parts. 119 00:06:05,433 --> 00:06:06,766 And that would 120 00:06:08,233 --> 00:06:10,233 be very 121 00:06:10,233 --> 00:06:14,866 basic, but it still would be an attempt at creating a model for this situation. 122 00:06:14,866 --> 00:06:18,300 So we would still be able to use this to make some sort of predictions 123 00:06:18,300 --> 00:06:19,366 and assumptions 124 00:06:19,366 --> 00:06:22,466 that, let's say, talk about 125 00:06:22,466 --> 00:06:25,466 the correlation between the action and the age of a person. 126 00:06:25,466 --> 00:06:28,633 So that's a very basic understanding. 127 00:06:28,633 --> 00:06:31,633 And that's kind of the start of our 128 00:06:32,000 --> 00:06:35,000 understanding of the intuition behind logistic regression. 129 00:06:35,033 --> 00:06:38,666 So let's see what the actual scientific approach is. 130 00:06:39,166 --> 00:06:42,300 So here we've got the line that we looked at 131 00:06:42,666 --> 00:06:45,666 and it is described by this equation. 132 00:06:45,700 --> 00:06:48,200 Now this part, this is the most fun part. 133 00:06:48,200 --> 00:06:49,533 So bear with me. 134 00:06:49,533 --> 00:06:54,033 If you apply to this equation a sigmoid function, which looks like that, 135 00:06:55,166 --> 00:06:56,633 so you put the y into the 136 00:06:56,633 --> 00:07:01,466 sigmoid function in purple, and then you solve 137 00:07:01,633 --> 00:07:05,933 for y from the purple box and you put y back into the blue box, 138 00:07:06,300 --> 00:07:07,900 then you will get the green box. 139 00:07:07,900 --> 00:07:12,466 So basically your linear regression will start to look like this. 140 00:07:12,466 --> 00:07:15,900 And this is the formula for logistic regression. 141 00:07:16,333 --> 00:07:19,733 And what that will do to the chart, and most importantly 142 00:07:19,766 --> 00:07:22,200 this is the visual part.
143 00:07:22,200 --> 00:07:24,000 It will convert it from 144 00:07:24,000 --> 00:07:27,466 the chart that we see at the top to this new chart, 145 00:07:27,866 --> 00:07:31,066 which is actually the logistic regression function. 146 00:07:31,966 --> 00:07:36,333 So if at this stage you're asking yourself 147 00:07:37,166 --> 00:07:40,500 what just happened, then you're not alone. 148 00:07:41,133 --> 00:07:46,133 The first time I saw this or I learned this, this was the expression on my face. 149 00:07:46,800 --> 00:07:49,766 If you came through with all of that, 150 00:07:49,766 --> 00:07:50,700 that's super great. 151 00:07:50,700 --> 00:07:52,800 That means you'll fly through this section. 152 00:07:52,800 --> 00:07:55,200 But if you're confused right now, not a problem. 153 00:07:55,200 --> 00:07:57,566 I was the same when I was in your shoes. 154 00:07:57,566 --> 00:08:00,433 So let's take this step by step. 155 00:08:00,433 --> 00:08:02,000 Let's look at it step by step, 156 00:08:02,000 --> 00:08:04,900 exactly what happened. So there's our graph. 157 00:08:06,033 --> 00:08:08,733 There's our independent variable. 158 00:08:08,733 --> 00:08:11,100 There's our outcome. 159 00:08:11,100 --> 00:08:12,000 Yes or no. 160 00:08:12,000 --> 00:08:15,000 So that's the y, the dependent variable. 161 00:08:15,433 --> 00:08:20,033 And there are our observations in our data set. Based on these observations, 162 00:08:20,266 --> 00:08:23,400 and using this formula, 163 00:08:23,400 --> 00:08:27,233 which we're going to take as given, this is the logistic regression formula, 164 00:08:27,833 --> 00:08:29,200 using this formula 165 00:08:29,200 --> 00:08:32,900 and these observations, we come up with this line. 166 00:08:33,800 --> 00:08:35,933 And what is important to understand here: 167 00:08:35,933 --> 00:08:37,933 it's not a magical line.
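For readers who want the substitution from the coloured boxes written out, here is one way to do it. The slides are not visible in the transcript, so the symbols ($b_0$, $b_1$, $x$, $p$) are assumed standard notation rather than a copy of what is on screen.

```latex
% Straight line from simple linear regression
y = b_0 + b_1 x

% Sigmoid function applied to y
p = \frac{1}{1 + e^{-y}}

% Solve the sigmoid for y:  1 + e^{-y} = 1/p,  so  e^{-y} = (1 - p)/p
y = \ln\!\left(\frac{p}{1 - p}\right)

% Substitute y back into the straight-line equation
\ln\!\left(\frac{p}{1 - p}\right) = b_0 + b_1 x
```

The last line is the logistic regression formula in its usual form: the left side is the log-odds of the action being taken, and the right side is the familiar straight line.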
168 00:08:37,933 --> 00:08:40,600 This line for the logistic regression is the same 169 00:08:40,600 --> 00:08:45,733 as a slope or a trend line for a linear regression. 170 00:08:45,733 --> 00:08:48,733 So basically what this line is doing is 171 00:08:48,733 --> 00:08:52,133 it is using the formula, it's following the formula. 172 00:08:52,366 --> 00:08:56,900 And it's the best fitting line that can fit this data set. 173 00:08:56,900 --> 00:09:00,333 So basically we're doing exactly the same thing as for linear regression. 174 00:09:00,933 --> 00:09:03,433 But it just looks different, that's all. 175 00:09:03,433 --> 00:09:07,100 So there are heaps of these lines that you can draw that look like that. 176 00:09:07,400 --> 00:09:09,933 But only one of them is the best fitting line. 177 00:09:09,933 --> 00:09:14,200 So the point of the logistic regression is to find that best fitting line. 178 00:09:14,466 --> 00:09:15,400 And this is it. 179 00:09:15,400 --> 00:09:20,533 So we found the best fitting line that follows that equation. And 180 00:09:21,633 --> 00:09:23,233 it fits these 181 00:09:23,233 --> 00:09:26,233 variables, or rather these observations, that we had in our data set. 182 00:09:26,533 --> 00:09:29,733 After that we can forget about the equation. 183 00:09:29,733 --> 00:09:31,600 We forget about the variables. We've got our line. 184 00:09:31,600 --> 00:09:35,400 So this is our logistic regression function 185 00:09:35,400 --> 00:09:37,600 we found. Same thing as with the linear regression. 186 00:09:37,600 --> 00:09:40,033 We've created the model. We've built the model. You can see it. 187 00:09:40,033 --> 00:09:43,033 This is the model in front of you right there. 188 00:09:43,400 --> 00:09:46,200 Now what can we do with this logistic regression? 189 00:09:46,200 --> 00:09:49,800 Well, we can use it to predict probabilities. 190 00:09:49,800 --> 00:09:54,100 And we've already touched on probabilities, that they lie between 0 and 1.
191 00:09:54,100 --> 00:09:58,033 And that instead of predicting for sure that something will or will not happen, 192 00:09:58,033 --> 00:09:59,633 how about we predict a probability? 193 00:09:59,633 --> 00:10:03,000 So let's look at... Oh, by the way, 194 00:10:03,066 --> 00:10:06,000 probability here is called p hat. 195 00:10:06,000 --> 00:10:09,000 So that's a little sign above the p. 196 00:10:09,266 --> 00:10:10,766 It gives it the name p hat. 197 00:10:10,766 --> 00:10:13,800 And anything you see with a hat in this section 198 00:10:13,800 --> 00:10:14,933 just basically means that it's 199 00:10:16,133 --> 00:10:17,100 something we're predicting. 200 00:10:17,100 --> 00:10:21,500 So that's the way to remember it, that picture: p hat. 201 00:10:21,733 --> 00:10:24,733 So we're predicting this probability. 202 00:10:25,100 --> 00:10:25,466 Okay. 203 00:10:25,466 --> 00:10:31,200 So let's take four random values for the independent variable, for x. 204 00:10:31,200 --> 00:10:33,166 We're going to say 20, 30, 40, 50. 205 00:10:33,166 --> 00:10:34,966 Let's see what happens with these values. 206 00:10:34,966 --> 00:10:37,600 So let's put them on the x line. 207 00:10:37,600 --> 00:10:38,633 Those are the dots. 208 00:10:38,633 --> 00:10:42,133 And I specifically put dots, not x's or crosses, 209 00:10:42,333 --> 00:10:45,500 because the fact that they're on the horizontal, 210 00:10:45,500 --> 00:10:49,000 the bottom line, doesn't mean that the probability is zero or that 211 00:10:49,300 --> 00:10:52,066 their dependent variable is zero. 212 00:10:52,066 --> 00:10:54,933 No, they're just there because they're on the x axis. 213 00:10:54,933 --> 00:10:55,800 We just plotted them there. 214 00:10:55,800 --> 00:10:58,800 It has nothing to do with the vertical axis. 215 00:10:59,466 --> 00:11:02,666 Now, what you need to do to find the probabilities is 216 00:11:02,666 --> 00:11:05,666 you need to project these values onto your curve.
217 00:11:05,700 --> 00:11:08,700 Once you project them you get these blue, 218 00:11:09,400 --> 00:11:12,166 light blue dots, or blue 219 00:11:12,166 --> 00:11:14,800 observations, which we've plotted, basically. 220 00:11:14,800 --> 00:11:16,266 So these are the fitted values. 221 00:11:16,266 --> 00:11:18,333 As you remember, in Gretl you have, in red, 222 00:11:18,333 --> 00:11:21,000 the actual, and in blue you have the fitted values. 223 00:11:21,000 --> 00:11:23,300 So these are your fitted values. 224 00:11:23,300 --> 00:11:26,200 And now if you project them, 225 00:11:26,200 --> 00:11:29,800 if you want the probabilities, you need to project them to the left like that. 226 00:11:30,300 --> 00:11:31,600 And let's have a look at these probabilities. 227 00:11:31,600 --> 00:11:35,000 So for the person who's 20 years old the probability of taking up 228 00:11:35,000 --> 00:11:38,300 this offer is very low, perhaps 0.7%. 229 00:11:38,300 --> 00:11:40,066 So less than one percent 230 00:11:40,066 --> 00:11:44,100 to take up this offer. For the person who's 30 years old, 231 00:11:44,300 --> 00:11:45,533 the probability is higher. 232 00:11:45,533 --> 00:11:48,166 It's about 23% to take up this offer. 233 00:11:48,166 --> 00:11:51,600 For the person who's 40 years old, their probability to take up 234 00:11:51,600 --> 00:11:54,600 this offer is 85% according to this model, 235 00:11:54,633 --> 00:11:58,200 and the person who's 50 years old, their probability is 99.4%. 236 00:11:59,033 --> 00:12:03,366 So that's the first thing that you can get out of a logistic regression. 237 00:12:03,366 --> 00:12:04,300 That's what we're going to be using. 238 00:12:04,300 --> 00:12:08,800 We're going to be using it very actively when we're talking 239 00:12:08,800 --> 00:12:12,666 about building geodemographic segmentations, because you use 240 00:12:12,666 --> 00:12:14,133 this probability as a score.
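Projecting an age onto the curve and then to the left is the same as plugging the age into the logistic function. The sketch below shows that, and then ranks the four ages by their score. The coefficients B0 and B1 are hypothetical values picked only so that the curve crosses 50% at age 35; they are not the fitted coefficients behind the slide, so the printed probabilities will differ from the ones quoted in the talk.

```python
import math

# Hypothetical coefficients (assumption): chosen so that p hat = 0.5 at age 35.
B0 = -10.5
B1 = 0.3


def predicted_probability(age):
    """p hat: the sigmoid applied to the straight line B0 + B1 * age."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * age)))


ages = [20, 30, 40, 50]
p_hat = {age: predicted_probability(age) for age in ages}

# Use the probability as a score: most likely to take up the offer first.
ranked = sorted(ages, key=lambda age: p_hat[age], reverse=True)
for age in ranked:
    print(f"age {age}: p hat = {p_hat[age]:.3f}")
```

Every score stays strictly between 0 and 1, and the ordering of people by score is what a segmentation would use.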
241 00:12:14,133 --> 00:12:16,300 And we'll talk about this more. 242 00:12:16,300 --> 00:12:20,666 So you can actually rank people: who is the most likely to take up your offer, 243 00:12:20,666 --> 00:12:22,866 who's the least likely to take your offer up. 244 00:12:22,866 --> 00:12:26,100 So it's actually even better than just having a one or a zero. 245 00:12:26,166 --> 00:12:27,700 You have a probability. 246 00:12:27,700 --> 00:12:30,700 So you can order people by this probability. 247 00:12:31,033 --> 00:12:35,400 Anyway, you might want to say, well, I don't want the probability, 248 00:12:35,400 --> 00:12:38,666 I want a prediction, because this is a regression. 249 00:12:38,933 --> 00:12:43,566 I want a prediction for the y value. 250 00:12:43,666 --> 00:12:46,066 So okay. We can do that. 251 00:12:46,066 --> 00:12:50,100 Let's get rid of those probabilities. 252 00:12:50,100 --> 00:12:53,966 Now can we get the y, the actual? Obviously we can't get the actual 253 00:12:53,966 --> 00:12:57,833 because the actual is something that we can only observe in our data 254 00:12:57,833 --> 00:12:59,100 set or in real life. 255 00:12:59,100 --> 00:13:01,400 We can only get a prediction for the actual. 256 00:13:01,400 --> 00:13:06,166 So y hat, as the hat suggests, is a predicted value 257 00:13:06,166 --> 00:13:08,866 for the dependent variable. 258 00:13:08,866 --> 00:13:10,000 How do you get y hat? 259 00:13:10,000 --> 00:13:12,666 Well, the approach is very arbitrary. 260 00:13:12,666 --> 00:13:13,900 You have to select a line. 261 00:13:15,266 --> 00:13:16,433 Let's wait for that. Okay. 262 00:13:16,433 --> 00:13:17,633 So you have to select a line. 263 00:13:17,633 --> 00:13:19,900 In this case we're going to take 50%. 264 00:13:19,900 --> 00:13:22,633 You can select it anywhere, but 50% is usually selected 265 00:13:22,633 --> 00:13:23,533 because it's in the middle.
266 00:13:23,533 --> 00:13:28,266 And therefore you have symmetry. And anything below this line, 267 00:13:28,266 --> 00:13:32,300 so anything that falls on the curve below this line, will be projected 268 00:13:32,300 --> 00:13:36,000 downwards onto the zero line, which makes sense. 269 00:13:36,000 --> 00:13:39,000 So it's basically saying if your probability, 270 00:13:39,066 --> 00:13:42,733 your predicted probability of taking up this offer is less than 50%, let's say 271 00:13:42,733 --> 00:13:44,533 it's 40% or 20%, 272 00:13:44,533 --> 00:13:45,600 then we're just going to say that 273 00:13:45,600 --> 00:13:48,600 you're probably not going to take up this offer. 274 00:13:48,833 --> 00:13:50,466 And so that's what's happening. 275 00:13:50,466 --> 00:13:56,066 The person with 0.7%, the person with whatever it was, 23%, 276 00:13:56,766 --> 00:14:01,566 their predicted probabilities are not zero, but they're below 50. 277 00:14:01,566 --> 00:14:04,766 So if you're, 278 00:14:04,766 --> 00:14:07,766 if you do require a y hat, so a predicted 279 00:14:08,066 --> 00:14:11,600 value, a yes or no value, then it makes sense that 280 00:14:11,600 --> 00:14:14,233 if something is below 50%, you're probably going to say that 281 00:14:14,233 --> 00:14:16,400 they're not going to take up the offer. 282 00:14:16,400 --> 00:14:19,066 And if you think about it, oh yeah, so there you go. 283 00:14:19,066 --> 00:14:21,000 For both of them, y hat is zero. 284 00:14:21,000 --> 00:14:25,400 Now anything above the horizontal line that we've selected, the 50% line, 285 00:14:27,600 --> 00:14:30,600 it is agreed that all of those values 286 00:14:30,600 --> 00:14:34,433 that fall onto the curve above that line are projected upwards. 287 00:14:34,433 --> 00:14:38,033 They're projected onto the yes line, the one line.
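The projection onto the zero line or the one line amounts to comparing p hat with the chosen cut-off. A minimal sketch follows, using the four probabilities quoted earlier in the talk. The function name is hypothetical, and sending a probability of exactly 50% to zero is an assumption, since the talk does not say what happens on the line itself.

```python
def predict_label(p_hat, threshold=0.5):
    """y hat: 1 ("yes") if p hat is above the threshold, otherwise 0 ("no").

    A p hat exactly equal to the threshold is mapped to 0 here (assumption).
    """
    return 1 if p_hat > threshold else 0


# Probabilities quoted in the talk for ages 20, 30, 40 and 50.
quoted_p_hat = {20: 0.007, 30: 0.23, 40: 0.85, 50: 0.994}

y_hat = {age: predict_label(p) for age, p in quoted_p_hat.items()}
print(y_hat)

# Moving the line changes the answer for people near it.
y_hat_strict = {age: predict_label(p, threshold=0.9) for age, p in quoted_p_hat.items()}
print(y_hat_strict)
```

With the 50% line the 40-year-old is predicted as a yes; with a stricter 90% line the same person becomes a no, which is why the choice of line depends on the problem at hand.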
288 00:14:38,400 --> 00:14:43,100 So the person that had a probability of 85% is projected 289 00:14:43,100 --> 00:14:47,000 upwards, and the person that had the probability of 99.4% is projected upwards. 290 00:14:47,166 --> 00:14:48,100 Also makes sense, right? 291 00:14:48,100 --> 00:14:52,400 So if, if you're predicting 292 00:14:52,400 --> 00:14:55,033 that somebody's probability of taking up an offer is 85%, 293 00:14:55,033 --> 00:14:58,566 then if you have to say yes or no, then you're probably going to say yes. 294 00:14:58,700 --> 00:14:59,966 You're going to say yes. 295 00:14:59,966 --> 00:15:01,200 This person will take up the offer 296 00:15:01,200 --> 00:15:04,200 if you have to choose one of the two. 297 00:15:04,433 --> 00:15:08,966 So those are our predicted y hat values. 298 00:15:09,700 --> 00:15:12,700 In this case they're both one for those two observations. 299 00:15:13,500 --> 00:15:15,666 And those are the two things 300 00:15:15,666 --> 00:15:18,400 you can get out of the logistic regression. 301 00:15:18,400 --> 00:15:19,233 So you get the 302 00:15:20,566 --> 00:15:22,100 probabilities, which are important. 303 00:15:22,100 --> 00:15:23,266 Also, you can get the y hat, 304 00:15:23,266 --> 00:15:26,400 so the predicted values for the dependent variable. Once again, it's important 305 00:15:26,400 --> 00:15:30,600 to think of it as doing exactly the same thing as a linear regression. 306 00:15:30,800 --> 00:15:33,800 It's 307 00:15:34,100 --> 00:15:36,966 fitting this line even though it's not a straight line. 308 00:15:36,966 --> 00:15:41,066 And the values are not scattered. 309 00:15:41,366 --> 00:15:46,900 Everything looks bizarre in its uniformity or in the way it is structured.
310 00:15:46,900 --> 00:15:52,200 Its structure makes it look very bizarre, but still, it's pretty much 311 00:15:52,200 --> 00:15:56,933 the same way: we agreed on a line or a formula for a curve, 312 00:15:57,100 --> 00:16:00,000 and we're trying to fit the best curve to our data. 313 00:16:00,000 --> 00:16:02,166 Once we've done that, we've got a model, 314 00:16:02,166 --> 00:16:05,233 we've got the coefficients, which we'll talk about later, 315 00:16:05,466 --> 00:16:08,366 and we can start drawing conclusions 316 00:16:08,366 --> 00:16:11,366 or insights from this model. 317 00:16:11,400 --> 00:16:16,100 And some of the insights are: we can get a probability of somebody taking action 318 00:16:16,100 --> 00:16:21,600 or of the event occurring, or basically of the answer being yes. 319 00:16:21,600 --> 00:16:23,633 So it's not a yes or no, it's a probability. 320 00:16:23,633 --> 00:16:26,066 So 85% or 20% or whatever. 321 00:16:26,066 --> 00:16:29,233 So that's when we project to the left onto the y axis. 322 00:16:29,533 --> 00:16:32,933 And also we can get a predicted value for the dependent variable 323 00:16:33,300 --> 00:16:36,333 based on where we select this arbitrary line, 50%. 324 00:16:36,333 --> 00:16:38,000 You can select it anywhere you like. 325 00:16:38,000 --> 00:16:40,300 You can select higher or lower. 326 00:16:40,300 --> 00:16:43,866 It depends on your knowledge about the problem at hand. 327 00:16:44,200 --> 00:16:47,100 And as you understand, depending on where you select it, 328 00:16:47,100 --> 00:16:50,100 that will significantly affect your predicted values. 329 00:16:50,333 --> 00:16:53,333 So I really hope this explanation was 330 00:16:54,900 --> 00:16:58,166 trivial enough and yet insightful enough 331 00:16:58,166 --> 00:17:02,166 for you to gain an intuitive understanding of logistic regression. 332 00:17:02,633 --> 00:17:03,900 I look forward to seeing you then.
333 00:17:03,900 --> 00:17:05,933 And until next time, happy analyzing.